It is conventional wisdom that A/B testing (or, in proper terms, randomized controlled experiments) is the gold standard for causal analysis, meaning that if you run an A/B test, you know what caused an effect. In practice, this is not always true. Sometimes, the A/B test only provides a statistical understanding of causes, not an average Joe's understanding.
Let's start with a hypothetical example in which the two understandings align. The only difference between the test and control groups was (junk) mail sent to the test group advertising Amazon's Prime Day. If a higher proportion of the customers who got the junk mail shopped at Amazon on Prime Day than of the customers in the control group, then everyone agrees that the mailing caused an increase in shopping activity (assuming the difference was statistically significant). Of course, not everyone was influenced by the mail, but on average, more people in the test group shopped.
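To make "statistically significant" concrete, here is a minimal sketch of how such a difference in proportions is typically tested, using the two-sample z-test from statsmodels. All counts are invented for illustration; the post does not report any actual numbers.

```python
# Hypothetical mailing experiment: 50,000 customers per group, with
# 2,600 Prime Day shoppers in the test (mailed) group and 2,400 in control.
from statsmodels.stats.proportion import proportions_ztest

shoppers = [2600, 2400]      # Prime Day shoppers: [test, control]
customers = [50000, 50000]   # group sizes

# One-sided test: did the mailed group shop at a higher rate?
stat, p_value = proportions_ztest(shoppers, customers, alternative='larger')
print(f"test rate    = {shoppers[0] / customers[0]:.4f}")
print(f"control rate = {shoppers[1] / customers[1]:.4f}")
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A small p-value (conventionally < 0.05) is what is meant above by
# "the difference was statistically significant".
```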
This consensus falls apart in the canonical A/B testing example. Imagine that the only difference between the test and control groups is the background color of the home page. The test group saw a darker background than the control group. The test group had a lower bounce rate than the control group. Statistically, the background color is regarded as the cause of the drop in bounce rate.
But what does that last sentence mean to the average Joe? There seems to be a missing link. It really isn't the color that is influencing someone's browsing behavior. It may be that the darker color makes certain text or images on the home page more visible, which causes more visitors, on average, to browse more pages and stick around longer. Or it could be the opposite: the darker color may make certain text less visible, which causes visitors to stick around longer because they are struggling to find what they are looking for!
In other words, the background color may only be an indicator of the cause, not the true cause itself.
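A toy simulation makes the point sharper. In the sketch below (all variables and parameters are invented), the background color affects bounce only through a hypothetical `visibility` variable. The A/B test correctly detects the color effect, yet the mechanism lives entirely in visibility.

```python
# Toy mediation story: background color -> text visibility -> bounce.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

dark = rng.integers(0, 2, n)  # 1 = dark background (the test arm)
# Assumed mechanism: a dark background improves text visibility...
visibility = 0.5 + 0.3 * dark + rng.normal(0, 0.1, n)
# ...and better visibility lowers the chance of bouncing.
p_bounce = 1 / (1 + np.exp(4 * (visibility - 0.6)))
bounce = rng.random(n) < p_bounce

print(f"bounce rate | dark  = {bounce[dark == 1].mean():.3f}")
print(f"bounce rate | light = {bounce[dark == 0].mean():.3f}")
# The experiment shows dark backgrounds "cause" lower bounce rates, but any
# intervention that moved `visibility` by the same amount (larger font, better
# contrast, clearer layout) would move bounce by the same amount, with no
# color change at all.
```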
Some people have argued that it doesn't matter. You don't need to know what the true cause is. All you care about is that this result can be replicated.
I disagree with that position because taking that stance frequently leads to misguided future actions. The belief that the background color somehow caused lower bounce rates inevitably leads you to test a variety of other colors (red, green, yellow, orange, blue, purple, ...). Almost all of those tests would be a waste of time. If, on the other hand, you learn that making your text clearer to visitors is the true cause, then you may instead test a larger font, different font types, placement of the text, etc.
In short, you should have a mental model of the cause-and-effect mechanism.
This is similar to a problem that arises in medicine. We might know that a drug increases the level of some chemical. Now, if that drug works, the assumption is often that the disease is the result of low levels of that chemical. This can be true, but it can also be false. It may well be that the increase is offsetting some other effect.
Posted by: Ken | 07/22/2015 at 02:20 AM