A/B Testing
In the digital realm, an A/B test is an experiment conducted to assess the performance of different versions of an online experience, based on metrics like signup rates. This is achieved by randomly presenting each version to users and subsequently analyzing the outcomes.
This robust statistical method should be employed when we seek to make data-driven decisions for
web designs (e.g. conversion rate optimization)
user interfaces (e.g. detecting the impact of releasing a new feature)
marketing strategies (e.g. evaluating the value of advertising)
or other operational changes.
Specifically, it's most insightful when we can clearly define and measure performance indicators like
conversion rates
click-through rates
signup rates
or engagement levels.
A/B testing is ideal when you have two distinct variants to compare and sufficiently large sample sizes (which we will discuss further) to reach statistically significant conclusions. However, even though A/B tests provide us with invaluable insights, we shouldn't conduct them if:
we don't have the infrastructure
we don't have enough traffic
we lack a clear hypothesis to test
the test subject is not impactful
there are ethical consequences (e.g. testing inappropriate content)
the opportunity cost is high (e.g. the cost of withholding a certain feature from a large number of users)
In cases where A/B testing is not an option, we can:
Perform user experience research using focus groups and surveys to gain insights into the preferred options.
Scrutinize user activity logs to enhance our understanding of which version is a more suitable fit.
Implement the product change and subsequently conduct a retrospective analysis by examining historical data to verify if the targeted metric responds as anticipated.
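For the retrospective option, a minimal sketch might look like the following, assuming a per-visit event log loaded into pandas; the file name events.csv, the columns date and converted, and the release date are all placeholder assumptions:

```python
import pandas as pd
from scipy import stats

# Hypothetical event log with one row per user visit:
# columns "date" (datetime) and "converted" (0/1) are assumed here.
events = pd.read_csv("events.csv", parse_dates=["date"])

release_date = pd.Timestamp("2023-06-01")  # assumed product-change date
before = events.loc[events["date"] < release_date, "converted"]
after = events.loc[events["date"] >= release_date, "converted"]

print(f"Conversion before: {before.mean():.4f}, after: {after.mean():.4f}")

# Welch's t-test as a rough check; seasonality and other confounders are
# not controlled for, which is why this is weaker than a true A/B test.
t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```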
When A/B testing is available to us, here are the parameters we need to set or consider prior to the test:
Decide on the metric to test (e.g. conversion rate)
Determine thresholds for alpha (significance level, generally 5%) and power (i.e. 1-beta, generally 80%) depending on the value of the minimal detectable effect (MDE)
Define the sample size based on the MDE, power, and metric variance (see the sketch after this list)
Set the experiment length (at least 2 weeks to account for the day-of-week effect or the novelty effect)
Randomly select a sample of users (or the entire population if the available dataset is small)
Randomly assign those users to control and treatment groups (groups should be similar in terms of counts, demographics, income, geolocation, new vs. returning users, etc.)
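As a minimal sketch of the sizing, duration, and assignment steps, assuming a two-proportion test on conversion rate; the baseline rate, MDE, daily traffic figure, and experiment salt below are placeholder assumptions:

```python
import hashlib
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumed inputs: baseline conversion rate and the minimal detectable effect (MDE).
baseline_rate = 0.10          # current conversion rate (assumption)
mde = 0.01                    # smallest absolute lift worth detecting (assumption)
alpha, power = 0.05, 0.80     # significance level and power from the list above

# Required sample size per group for a two-proportion test.
effect_size = proportion_effectsize(baseline_rate, baseline_rate + mde)
n_per_group = math.ceil(NormalIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, ratio=1.0))
print(f"Users needed per group: {n_per_group}")

# Rough experiment length given assumed daily traffic, with a 2-week floor
# to cover day-of-week and novelty effects.
daily_users = 5_000           # assumption
days_needed = max(14, math.ceil(2 * n_per_group / daily_users))
print(f"Run the test for at least {days_needed} days")

# Deterministic 50/50 assignment by hashing the user id, so a user always
# lands in the same group across sessions.
def assign_group(user_id: str, salt: str = "exp_signup_v1") -> str:
    bucket = int(hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < 50 else "control"

print(assign_group("user_42"))
```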
For more detailed information on
sample sizes and thresholds, please check https://guessthetest.com/calculating-sample-size-in-a-b-testing-everything-you-need-to-know/
duration, please check https://guessthetest.com/how-long-should-you-run-an-a-b-test/
After running the test and logging user actions for each group for the desired duration, we should then be
computing the metrics of interest (e.g. conversion rate)
testing for statistically significant differences to make a decision (see the sketch below).
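A minimal sketch of that final evaluation using a two-proportion z-test from statsmodels; the conversion counts and visitor totals below are made-up placeholders:

```python
from statsmodels.stats.proportion import proportions_ztest

# Assumed (made-up) results after the experiment window closes.
conversions = [530, 585]      # control, treatment successes
visitors = [5_000, 5_000]     # users exposed to each variant

rates = [c / n for c, n in zip(conversions, visitors)]
print(f"Control: {rates[0]:.3%}, Treatment: {rates[1]:.3%}")

# Two-proportion z-test; reject the null of equal rates if p < alpha.
z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
alpha = 0.05
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
print("Ship the treatment" if p_value < alpha and rates[1] > rates[0]
      else "No significant improvement detected")
```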
A/B testing specifically plays a crucial role when the cost of making uninformed changes could be high, or when fine-tuning an existing system where small improvements could lead to substantial gains over time. However, knowing when to apply or not to apply A/B testing is the first step!