Measuring Ad Effectiveness: A/B Testing for Campaign Success and Attribution

Henry Kpano
7 min read · Jan 22, 2025


In this article, I walk through a practical implementation of A/B testing to measure the effectiveness of two separate ads that were rolled out. These ads were targeted at driving engagement with the company's products. As a starting point, note that the population in this analysis was randomized but not evenly split between the two groups.

Note: In A/B testing, the aspects of the test that are measured depend on what the Product/Project Manager wants to learn. Ads may be run to target different page elements or different user behavioral responses; it entirely depends on the Manager responsible for the project. Not all A/B tests have to be analyzed for conversion (conversion, in this case, refers to purchasing a product or service).

Based on the above, the scenario in this article is as follows.

Marketing companies want to run successful campaigns, but the market is complex, and several options can work. So, they normally conduct A/B tests, a randomized experimentation process in which two or more versions of a variable (web page, page element, banner, etc.) are shown to different segments of people simultaneously to determine which version has the maximum impact and drives business metrics.

The companies are interested in answering two questions:

  • Would the campaign be successful?
  • If the campaign was successful, how much of that success could be attributed to the ads?

With the second question in mind, we normally run an A/B test. The majority of people (the experimental group) are exposed to the ad, while a small portion of people (the control group) instead see a Public Service Announcement (PSA), or nothing, in the exact size and place the ad would normally appear.

The goal with this dataset is to analyze the two groups, find out whether the ads were successful, estimate how much the company can make from the ads, and determine whether the difference between the groups is statistically significant.

Data dictionary:

  • Index: The row index
  • User ID: Unique identifier for the user
  • test group: If “ad” the person saw the advertisement, if “psa” they only saw the public service announcement
  • converted: True if the person bought the product, otherwise False
  • total ads: Number of ads seen by the person
  • most ads day: Day on which the person saw the most ads
  • most ads hour: Hour of the day in which the person saw the most ads

Data Quality Checks: This is where we check the data for inconsistencies, discrepancies, and outliers. These checks matter because such issues can distort the actual findings of the analysis. On the data used here, I conducted the following data quality checks (a minimal code sketch follows the list):

  1. Duplicate checks: To verify whether there are repeated records in the data.
  2. Missing data checks: To check whether any values are missing from the data. From the checks, there was no missing data.
  3. Outlier checks: To flag values that seem too high or too low, which could distort the analysis outcome.
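
A minimal sketch of these checks in pandas, assuming the dataset is available as a CSV (the file name and the lowercase column names are assumptions; adjust them to match your file):

```python
import pandas as pd

# Load the dataset (file name is an assumption; adjust to your export).
df = pd.read_csv("marketing_AB.csv")

# 1. Duplicate checks: repeated rows and repeated user IDs.
print("Duplicate rows:", df.duplicated().sum())
print("Duplicate user IDs:", df["user id"].duplicated().sum())

# 2. Missing data checks: count of missing values per column.
print(df.isna().sum())

# 3. Outlier checks: flag unusually high ad counts with the 1.5 * IQR rule.
q1, q3 = df["total ads"].quantile([0.25, 0.75])
outliers = df[df["total ads"] > q3 + 1.5 * (q3 - q1)]
print("Potential outliers on 'total ads':", len(outliers))
```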

In every A/B test, two hypotheses must be stated before any conclusion can be drawn from the findings. Below are the hypotheses for this analysis.

Hypothesis 1 (Null Hypothesis): The ads make no difference to the number of customers who purchase from the platform; the control and experimental groups convert at the same rate.

Hypothesis 2 (Alternative Hypothesis): The new ads lead to an increase in customer purchases (and, by extension, in website traffic).
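
In notation, a sketch of how these hypotheses are usually formalized (here p_ad and p_psa denote the true conversion rates of the experimental and control groups; the alternative is written one-sided because it is framed above as an increase):

$$H_0: p_{\text{ad}} = p_{\text{psa}} \qquad \text{vs.} \qquad H_1: p_{\text{ad}} > p_{\text{psa}}$$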

Step 1: Analyze the control group, i.e., users in the “psa” test group. We will analyze various aspects, such as the number converted, the conversion rate, day-of-the-week conversion, hourly conversion, the percentage not converted, and the relationship between the number of ads a person saw and conversion.
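
For readers who prefer code to dashboards, a minimal pandas sketch of these breakdowns could look like the following (it reuses the df loaded earlier; swapping “psa” for “ad” gives the Step 2 numbers):

```python
# Conversion breakdowns for the control group ("psa").
ctrl = df[df["test group"] == "psa"]

print(f"Control group size: {len(ctrl)}")
print(f"Control conversion rate: {ctrl['converted'].mean():.2%}")

# Conversion rate by the day and by the hour a user saw the most ads.
by_day = ctrl.groupby("most ads day")["converted"].mean()
by_hour = ctrl.groupby("most ads hour")["converted"].mean()
print(by_day.sort_values(ascending=False).head())
print(by_hour.sort_values(ascending=False).head())
```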

From my Tableau dashboard

From the dashboard above, the following findings can be drawn.

  1. The randomization used to determine the control group’s size left it on the low side: the control group makes up only 4% of the total population. From another perspective, this could also have been a deliberate choice by the Product Manager.
  2. Monday had the highest conversion of all the days on which users engaged with the ad, with a conversion rate of 2.26%. This might be the day users are most active and most inclined to engage, so investing more in this day could lead to more conversions.
  3. Although Monday had the highest conversion, Thursday had the most ads seen by customers, i.e., the highest engagement.
  4. Conversions peak around 16:00 hrs. This might be the hour when customers engage most with marketing ads.
  5. Based on the above findings, the Product Manager and Analyst can invest more in days with higher conversions. Out of 582,481 total engagements, Monday had low engagement but the highest conversion. The overall conversion rate for the control group is 1.79%.

Step 2: Analyze the experimental group, i.e., users in the “ad” test group. We will analyze the same aspects: the number converted, the conversion rate, day-of-the-week conversion, hourly conversion, the percentage not converted, and the relationship between the number of ads a person saw and conversion.

From my Tableau dashboard

From the dashboard above, the following findings can be drawn.

  1. The experimental group had a high number of users, making up 96% of the total population. This population imbalance might impact the findings.
  2. Just like the control group, the experimental group had its highest conversion rate on Monday, at 3.32%. This reveals a consistent pattern in which day has the highest conversion rate.
  3. Even though Monday had the highest conversion rate, Friday had the highest ad engagement. This might mean the ad is reaching and appealing to a specific group of individuals on Monday more than on Friday.
  4. The hour with the highest conversion was the same as in the control group: from the dashboard, 16:00 hrs had the highest conversion at 3.09%. It would be prudent for the Product Manager to take a deeper dive into this finding with more data.
  5. From the dashboard, 2.55% of users converted after engaging with the experimental ad, an improvement over the control group. It would be prudent for the team to dig into the factors and user behaviors that could push this percentage higher. A further statistical analysis (Step 3) will show how the two groups compare statistically.

Step 3: Analyze both groups based on statistical principles.

From my Tableau dashboard

Below are some points that can help the analyst and Product Manager make decisions from the findings.

Point 1: The sample sizes in this analysis are not proportionally distributed or evenly randomized: the control group made up only 4.00% of the total population, while the experimental group made up 96.00%. It would be prudent to reconsider the split between the two groups in future tests to get a more balanced comparison to analyze.

Point 2: From the dashboard, a user saw the ad 24.82 times on average before converting. That is a lot of impressions per conversion. Reducing this frequency can help facilitate growth and quicker iteration of new ideas to improve future ads.
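
One way this figure can be approximated from the raw data (a sketch reusing the df loaded earlier; whether it matches the dashboard exactly depends on how the dashboard defines the metric):

```python
# Average number of ads seen by users who converted.
avg_ads_for_converters = df.loc[df["converted"], "total ads"].mean()
print(f"Average ads seen by converters: {avg_ads_for_converters:.2f}")
```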

Point 3: The lift rate from the analysis indicates that the experimental ad performed better by 43.09%. In other words, the experimental group converted users at a 43.09% higher rate than the control group.
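
Lift here is the relative improvement of the experimental conversion rate over the control conversion rate. As a rough worked check with the rounded rates quoted above (the reported 43.09% comes from the unrounded rates):

$$\text{lift} = \frac{p_{\text{ad}} - p_{\text{psa}}}{p_{\text{psa}}} \approx \frac{0.0255 - 0.0179}{0.0179} \approx 0.42 \;(\approx 43\%)$$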

Point 4: Standard Error: This is calculated to quantify the level of sampling error in the comparison between the two groups. From the analysis, this error is almost insignificant relative to the result. A large error could influence the analysis, leading to a wrong conclusion or decision by management. The formula for calculating it is below.

Test Statistics Formula and Standard Error Formula
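
For readers who cannot view the formula image, these are the standard two-proportion z-test formulas, written here with a pooled proportion (an assumption on my part, though it is consistent with the 0.001044 and 7.370 values reported below); x and n denote the number of conversions and the group size in each group:

$$\hat{p} = \frac{x_{\text{psa}} + x_{\text{ad}}}{n_{\text{psa}} + n_{\text{ad}}}, \qquad SE = \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_{\text{psa}}} + \frac{1}{n_{\text{ad}}}\right)}, \qquad TS = \frac{p_{\text{ad}} - p_{\text{psa}}}{SE}$$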

From the dashboard, the standard error works out to 0.001044. This is almost insignificant relative to the observed difference in conversion rates, largely because of the very large sample sizes involved.

Point 5: Test Statistic: From the analysis, the test yields a test statistic of 7.370. This indicates that the treatment group made a larger impact than the control group. From a statistical point of view:

If the test statistic (TS) is greater than the critical z-score, or less than the negative critical z-score, reject the null hypothesis. Here, 7.370 is far above any conventional critical value (for example, 1.96 at the 5% significance level), so the null hypothesis is rejected: the treatment group made more gains and would yield more impact than the control group.
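
As a sketch of how this test can be reproduced in code, reusing the df loaded earlier (the pooled-standard-error form and the 5% two-sided critical value of 1.96 are assumptions, since they are not stated explicitly above):

```python
import numpy as np

# Split the conversion outcomes by test group.
ctrl = df.loc[df["test group"] == "psa", "converted"]
expr = df.loc[df["test group"] == "ad", "converted"]
n_ctrl, n_exp = len(ctrl), len(expr)
p_ctrl, p_exp = ctrl.mean(), expr.mean()

# Pooled proportion and pooled standard error for a two-proportion z-test.
p_pool = (ctrl.sum() + expr.sum()) / (n_ctrl + n_exp)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_ctrl + 1 / n_exp))

# Test statistic and decision against the critical z-score.
ts = (p_exp - p_ctrl) / se
z_critical = 1.96  # assumed 5% two-sided significance level
print(f"SE = {se:.6f}, TS = {ts:.3f}, reject H0: {abs(ts) > z_critical}")
```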

Conclusion: Several conclusions and suggestions can be drawn from the analysis.

Firstly, both ads make the most impact on Mondays and around 16:00 hrs, which means a deeper dive into these windows would be helpful. This can surface more interesting user behavioral findings to help improve the experimental ads.

Secondly, the experimental ad shows all the trends and impacts seen in the control group while improving on its performance. Further refinement can improve the conversion rate, leading to a higher lift rate.

Thirdly, the experimental ads can be rolled out to all users, since they outperform the control across the aspects examined. A deeper dive would help identify further focus areas for improvement.

For further consultation or questions, you can reach me via email at henrykpano@gmail.com.
