At first glance, the question hardly makes sense. After all, the whole point of A/B testing is to land on the “right” approach. Businesses expose consumers to Option A or Option B—for example, a landing page or a digital ad—and then measure which performs better on a key variable such as engagement or purchase size. The results then inform future decisions.
But while this has become common practice across industries, one key question remains: Is A/B testing being done in the most efficient and cost-effective way?
This is a question of interest to Achal Bassamboo, a Kellogg professor of operations, and Vikas Deep, a Kellogg PhD student advised by Bassamboo. “Companies like Google, Amazon, and many others run thousands of A/B experiments to optimize their product design,” says Deep. “And testing is very expensive.”
So Bassamboo, Deep, and Sandeep Juneja at Ashoka University developed a mathematical model to understand the most efficient way—that is, the way that uses the fewest consumer observations—to determine whether and by how much one option works better than another.
They found that by examining the variation in how consumers respond to each option, decision makers can significantly reduce the number of consumers needed, and with it the cost of the experiment. “How beneficial could it be?” says Bassamboo. “It could reduce the number of observations required by 50 percent.”
“We’ve solved the problem of minimizing the cost or duration of the experiment while still having confidence in the results of A/B testing,” says Deep.
Improving on 50–50
When performing A/B testing, the most common policy is a randomized controlled trial (RCT), in which consumers are randomly assigned to Option A or B—such as the different landing pages mentioned earlier—with an equal chance of being assigned to either. “The simplicity of this policy is that you don’t have to think about anything when you give the consumer one choice over another,” says Bassamboo. “It’s a static policy.”
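For concreteness, a static policy like this amounts to a coin flip for every arriving consumer. The sketch below (the function and names are ours, chosen for illustration) shows just how little the rule has to track:

```python
import random

def assign_static(consumer_id: int) -> str:
    """Static RCT policy: send each arriving consumer to Landing
    Page A or B with equal probability, regardless of anything
    observed so far."""
    return "A" if random.random() < 0.5 else "B"

# Assign ten arriving consumers:
print([assign_static(i) for i in range(10)])
```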
But the researchers wanted to consider a different assignment rule, one that takes into account a key factor: variation in the measure of interest. In the example, it might be engagement, as measured by the time a consumer spends on the website after viewing one of the landing pages. So, is the variation in engagement different between the two pages—and can this difference in variation be leveraged to more effectively allocate users to each page?
To answer this question, the researchers constructed a mathematical model of A/B testing that takes this variation into account. “It takes the arrivals that come in and tries to learn something about the variation in each of the arms to understand how to allocate consumers to options in the future,” says Bassamboo. “So it’s adaptive.”
According to their model, this “smart” adaptive approach can reduce the duration or size of the experiment by up to 50 percent, saving significant costs. “If you have the money, it’s good to go with an RCT,” says Bassamboo. “But this is a way to use your budget to experiment more carefully.”
More specifically, RCTs perform well when the standard deviation, or amount of variation around the mean, in observations from the two options is similar. But if Landing Page Option A, to continue our example, results in a wide spread of engagement time (perhaps resonating with some users while disappointing others) while Option B yields a much narrower spread, then the researchers’ adaptive policy wins out. This is because it assigns a higher percentage of website visitors to the option with the higher variability—Option A, in this case—to better estimate which landing page is driving the most time on site and by how much.
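To make the adaptive idea concrete, here is a minimal sketch of one such rule, assigning each new consumer to an option with probability proportional to a running estimate of that option’s standard deviation. This is our illustrative simplification, not the researchers’ exact policy:

```python
import random
import statistics

def assign_adaptive(observations: dict[str, list[float]]) -> str:
    """Adaptive policy sketch: estimate each option's standard
    deviation from the engagement times seen so far, then assign
    the next consumer with probability proportional to those
    estimates. Falls back to a coin flip until both arms have
    enough data."""
    if any(len(obs) < 2 for obs in observations.values()):
        return random.choice(["A", "B"])
    sd = {opt: statistics.stdev(obs) for opt, obs in observations.items()}
    total = sd["A"] + sd["B"]
    if total == 0:
        return random.choice(["A", "B"])
    return "A" if random.random() < sd["A"] / total else "B"

# Simulated traffic: Option A's engagement times vary widely,
# Option B's barely at all, so A should draw most visitors.
observations = {"A": [], "B": []}
for _ in range(1000):
    arm = assign_adaptive(observations)
    mean, spread = (5.0, 4.0) if arm == "A" else (5.5, 0.5)
    observations[arm].append(random.gauss(mean, spread))
print({arm: len(obs) for arm, obs in observations.items()})
```

Run on this simulated traffic, the rule steers the bulk of visitors to the high-variability option, exactly the reallocation described above.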
The researchers emphasize that the proposed policy changes only one thing about the experiment: the proportion of incoming consumers assigned to each of the two options.
Learn what you want to learn
The researchers also point out that getting the most out of the adaptive approach requires knowing what you are trying to learn from the test. “It all depends on what question is at the heart of the puzzle you’re trying to solve,” says Bassamboo.
Specifically, are you just trying to figure out which option is better—Landing Page A or B? Or is your goal to get an accurate measurement of how much better a given option is?
“No one type of question is always more difficult to answer than another,” notes Bassamboo, and pursuing one question will eventually lead you to the other. But given your goal, the optimal approach “may vary quite a bit.”
In most settings, he explains, if the objective is an accurate measure of how much better Option A is than Option B, the number of observations allocated to Option A will be proportional to the standard deviation of Option A’s result. However, “This may not be optimal if the goal is simply to find the best option,” he says.
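That rule matches the classic Neyman allocation for estimating a difference in means, which lends itself to a quick worked example (the numbers and function below are ours, for illustration): if Option A’s engagement times have a standard deviation of 4 minutes and Option B’s a standard deviation of 1 minute, A should receive four times the traffic, or 80 percent of visitors.

```python
def neyman_allocation(sd_a: float, sd_b: float) -> tuple[float, float]:
    """Split traffic in proportion to each option's standard
    deviation -- the classic rule for estimating the difference
    in means with the fewest observations."""
    total = sd_a + sd_b
    return sd_a / total, sd_b / total

# Option A's engagement times vary four times as much as Option B's:
share_a, share_b = neyman_allocation(sd_a=4.0, sd_b=1.0)
print(f"A: {share_a:.0%}, B: {share_b:.0%}")  # A: 80%, B: 20%
```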
“You have to optimize the allocation policy for the goal you have in mind to make full use of it,” says Deep.