I'm in the midst of preparing my talk this Friday at a University of Denver conference (more info here). The talk will focus on the range of research methods that have been utilized in pandemic-related scientific research, why research methods matter, and how they are related to research quality.
I have some materials that probably will get edited out of the final presentation, but I don't want to just throw them away. So I am posting some of them here. Today, I'd like to use a practical example to illustrate the miracle of randomization, which is an important feature of controlled scientific experiments.
It's hard to imagine running an A/B test or a clinical trial without randomization, but this idea is not that old - the British statistician R.A. Fisher is often credited with inventing it in the 1920s. Randomization means that participants are randomly assigned (e.g., via a coin flip) to A or B groups (or test and control groups).
The consequence of randomizing treatment is profound. It leads to "covariate balance". If 50% of the research population is female, we expect roughly 50% of the test group, as well as 50% of the control group, to be female. The margins of error around those percentages shrink as the total sample size grows.
The one act of randomization confers balance between the test and control groups on all covariates of interest - not just gender but also race, income groups, comorbidities, employment status, anything that is relevant to your study.
What's more, you get balance on other covariates for free! Any covariate, including things you've never dreamed about.
It is this last point that has enormous practical implications.
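Here is a tiny simulation sketch (in Python; the covariates and numbers are made up for illustration) of what that free balance looks like. We randomly split a population with a coin flip, then check balance on a covariate we track (gender) and one we never considered (email provider):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# A simulated population with two covariates: one we track (female)
# and one we never thought about (uses a Yahoo.com email address).
female = rng.random(n) < 0.50
yahoo = rng.random(n) < 0.10

# Randomize: a fair coin flip assigns each person to test or control.
in_test = rng.random(n) < 0.50

for name, covariate in [("female", female), ("yahoo", yahoo)]:
    print(f"{name}: test={covariate[in_test].mean():.3f}, "
          f"control={covariate[~in_test].mean():.3f}")
```

Neither covariate was consulted during assignment, yet both end up nearly identical across the two groups; with 100,000 people, the gaps are a fraction of a percentage point.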
Let me explain using a real-life A/B test. Let's say Amazon wants to test sending customers an email on Christmas Eve, and measure whether this email increases spending in the following week. The data scientist draws up the list of targeted customers, then randomly selects half of this list to receive the email, with the other half excluded from the mailing.
If everything goes as planned, at the end of the week, the data scientist can pull the data, tabulate the spending for both groups of customers, and attribute any difference in spending to the email. Of course, the analyst knows that household income, gender, religion, ethnicity, education, prior spending, etc. are all factors that affect holiday spending. Since treatment was randomized, all of those factors are well balanced between the test and control groups, and therefore cannot explain any statistically significant change in spending.
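A minimal sketch of this design, with a hypothetical customer list and simulated spending standing in for the real data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical list of targeted customers (IDs made up).
customers = pd.DataFrame({"customer_id": np.arange(10_000)})

# Coin flip: half receive the email (test), half are held out (control).
customers["group"] = np.where(
    rng.random(len(customers)) < 0.5, "test", "control")

# ...one week later, join in each customer's observed spending
# (simulated here with a skewed distribution).
customers["spend"] = rng.gamma(shape=2.0, scale=50.0, size=len(customers))

# Compare average spending between the two groups.
print(customers.groupby("group")["spend"].agg(["mean", "count"]))
```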
So far so good.
In a meeting with the IT team, it was disclosed that a systems error caused all Yahoo.com email addresses to be dropped from the mailing. Let's assume that people still using Yahoo.com emails are bigger spenders on Amazon (reasonable, since these people are probably older). Has this error ruined our test?
The short answer is no. That's because randomization balances not only variables we know about but also variables we don't. In this case, if the analyst pulls the proportion of Yahoo.com email addresses from the test and control groups, it's highly likely that the difference is immaterial. Therefore, we can still interpret any difference in spending as caused by the email.
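If the analyst wants to verify this, a standard two-proportion z-test will do; the counts below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def balance_check(x_test, n_test, x_control, n_control):
    """Two-proportion z-test: is the Yahoo.com share similar in both groups?"""
    p_test, p_control = x_test / n_test, x_control / n_control
    p_pool = (x_test + x_control) / (n_test + n_control)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    return p_test, p_control, 2 * norm.sf(abs(z))  # two-sided p-value

# Hypothetical counts of Yahoo.com addresses in each group.
print(balance_check(x_test=510, n_test=5000, x_control=489, n_control=5000))
```

A large p-value here means the Yahoo.com shares are statistically indistinguishable, which is exactly what randomization leads us to expect.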
It gets even better!
You might think the above scenario is concocted and unrealistic (though I can assure you it's not). What makes it a tad unrealistic is the eventual discovery of the error. Most errors are committed and never revealed.
For example, people with Yahoo.com emails may not have been dropped by our IT team; they may simply be victims of spam filters. Most email providers use one of several spam filters, which are black-box algorithms that change constantly. For any number of reasons, it's possible that Yahoo's spam filter dumped Amazon's emails into the spam folder during the week of our email test, so that people with Yahoo.com emails mostly didn't see them.
This situation cannot be predicted by the data scientist, nor is it likely to be discovered after the fact. However, our test analysis can proceed as usual because, again, the test group will have about the same proportion of Yahoo.com emails as the control group. To the extent that a difference in spending is observed, it isn't due to the spam filter.
We can call this an unintended consequence of randomization. This is really, really cool stuff.
***
Not long ago, I wrote about real-world studies of vaccine effectiveness that mimic randomized controlled trials by synthesizing test and control groups. The matching process involves explicitly balancing the two groups on a list of matching variables (age, gender, etc.).
This process does not have the magic of a randomized experiment. In practice, balance is enforced on those matching variables, but there is no reason to believe the two groups are also balanced on known covariates not used for matching. It is even less likely that balance is achieved on unknown covariates.
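For contrast, here is a toy sketch of exact matching (a greedy 1:1 match with made-up records, not the actual procedure used in those studies) - note that balance is imposed only on the listed variables:

```python
import pandas as pd

# Made-up records: a few vaccinated and unvaccinated people.
vaccinated = pd.DataFrame({
    "person_id": [1, 2, 3],
    "age_band": ["60-69", "70-79", "60-69"],
    "gender": ["F", "M", "F"],
})
unvaccinated = pd.DataFrame({
    "person_id": [10, 11, 12, 13],
    "age_band": ["60-69", "60-69", "70-79", "80-89"],
    "gender": ["F", "F", "M", "M"],
})

# Exact matching: pair each vaccinated person with an unvaccinated
# person sharing the same age band and gender (greedy, with possible
# reuse of matches - purely for illustration). Balance is enforced
# only on these two variables; nothing constrains any other covariate.
matched = vaccinated.merge(
    unvaccinated, on=["age_band", "gender"], suffixes=("_vax", "_unvax")
).drop_duplicates(subset="person_id_vax")
print(matched)
```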
This is one reason why experiments are considered the gold standard for causal studies.