If you're a regular reader, you may be shocked by this headline - you might think I've gone postal.
I believe all models of data incorporate unverified (even unverifiable) assumptions - whether those assumptions are implicit or explicit in the model, and whether the modeler is aware of them or not. In fact, when a modeler claims a model is free of assumptions, you know they know not whereof they speak.
A typical assertion of this type is "we did not adjust the data because doing so involves making subjective assumptions". The belief is that by "not making an assumption", the analysis is "objective".
How does this conversation even come about? Usually it's because a source of bias has been identified. (Otherwise, there would have been no reason to adjust the data.) When analysts say we've left the data intact so as not to inject subjectivity, they have just made the subjective decision that the bias is ignorable.
***
So why am I now saying don't make assumptions?
I'm now speaking from the perspective of consuming other people's analytical findings, like reading press releases or tweets about scientific research.
A primary goal of science communications is to inform readers of the researchers' conclusions, and to summarize the scientific evidence well enough to convince readers of their veracity.
When we read the précis of scientific papers, we frequently make assumptions about what the researchers have done.
***
I invite you to think about your own process of comprehension as you read the following excerpt from the BBC News coverage of why the U.K. government approved Pfizer's Covid-19 vaccine for teenagers.
The third paragraph addresses the benefits of the vaccine.
After reading the above paragraph, you may have picked up the following information:
a) a clinical trial was conducted for teenagers in the U.S.
b) the trial's result is based on "more than 2,000" participants
c) the key result was 16 cases in the placebo group versus 0 in the vaccine group
d) a secondary analysis shows the vaccine works in these teenagers "as well as young adults aged 16-25".
If, after reading those sentences, you think you learned that the clinical trial for teenagers replicated the prior adult trials, producing equally amazing results, then you have made assumptions and inferences beyond what the press release actually said.
If you dig into the scientific report in NEJM about the Pfizer teenager trial, you will learn the following:
- The findings came from an interim analysis, i.e. an early peek at results.
- The primary endpoint of the trial was not 16 vs 0 but the comparison between teenagers and young adults.
- This comparison between the two age groups was not based on adjudicated symptomatic PCR-positive cases detected from 14 days after the 2nd shot in placebo vs vaccinated (the criterion used in the adult trials) but on antibody levels in blood samples a month after the second shot.
- Antibodies were tested on 200 teenagers, just 20 percent of the vaccinated teenagers.
- The placebo group in the trial was not utilized for efficacy.
- Having antibodies does not automatically mean these individuals are protected from contracting or transmitting Covid-19. In fact, no one knows the minimum level of antibodies required to stop Covid-19, which is why Phase 3 trials typically measure clinically relevant outcomes such as cases, rather than laboratory outcomes such as antibody levels.
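The age-group comparison just described is a so-called immunobridging analysis: efficacy is inferred by comparing geometric mean antibody titers (GMTs) across groups. Here is a minimal sketch with entirely made-up titer values - the actual data, units, and success criterion (typically a noninferiority margin on the GMT ratio, set in the trial protocol) are not in the press release:

```python
import math

def gmt(titers):
    """Geometric mean titer: exponentiate the mean of the log titers."""
    return math.exp(sum(math.log(t) for t in titers) / len(titers))

# Hypothetical titer readings (arbitrary units) - NOT the trial's data.
teens  = [1200, 980, 1500, 1100, 1340, 900]
adults = [1000, 850, 1250, 990, 1100, 800]

ratio = gmt(teens) / gmt(adults)
print(f"GMT ratio (teens/adults): {ratio:.2f}")
# Immunobridging declares success when the ratio (or its confidence
# interval's lower bound) clears a pre-specified margin; this sketch
# shows only the point estimate.
```

Note that nothing in this calculation involves counting Covid-19 cases, which is why it is a fundamentally different kind of evidence from the adult trials.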
These facts mean
a) the clinical trial setup was not utilized for evaluating benefits because the primary analysis compared vaccinated individuals in two age groups
b) the sample size of 2,000 was irrelevant to the primary analysis
c) the 16 vs 0 metric is a post-hoc exploratory result (see this previous post for why it was not pre-specified as the primary endpoint, and should not suddenly become the primary endpoint because the trial outcome pleases the investigators.)
d) When they said "vaccine works as well," they did not mean on the basis of symptomatic, PCR-positive Covid-19 cases - as one might infer from the previous sentence - but in terms of the level of antibodies induced by the vaccination.
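For intuition on the 16 vs 0 figure: vaccine efficacy is conventionally estimated as one minus the ratio of attack rates. A minimal sketch, assuming (hypothetically) 1,000 participants per arm, since the press release gave only a total of "more than 2,000":

```python
def vaccine_efficacy(cases_vax, n_vax, cases_placebo, n_placebo):
    """VE = 1 - (attack rate in vaccinated / attack rate in placebo)."""
    ar_vax = cases_vax / n_vax
    ar_placebo = cases_placebo / n_placebo
    return 1 - ar_vax / ar_placebo

# 0 vaccinated cases vs 16 placebo cases; arm sizes are assumed, not reported.
ve = vaccine_efficacy(0, 1000, 16, 1000)
print(f"Point estimate of VE: {ve:.0%}")
```

With zero cases in the vaccine arm, the point estimate is 100 percent, but so few events imply a very wide confidence interval - one reason a post-hoc count like this should not stand in for a pre-specified primary endpoint.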
***
Before I read the NEJM report, I also thought the clinical trial followed the same design as the adult trial. What's interesting to me is that we inferred all the wrong information when we consumed the news article.
One possibility is a deference to authority: the headline says the U.K. government endorsed those findings based on the science. But notice that the report merely said benefits outweigh the risks, provided some numbers, and then we the readers inferred that the results were strong. The scientific statements did not contain any value judgement.
BBC News did not
a) say the design of the trial was the same as for adults
b) claim 2,000 is a sufficiently large sample size to draw sound conclusions
c) claim that 16 vs 0 constitutes good/strong/amazing evidence
d) define what "working as well" across two different age groups means.
If you have some statistical training, you may have made assumptions about what the researchers did. Specifically, you may have assumed that:
a) the trial was complete and the analysis final
b) the sample size of 2,000 was designed to deliver valid conclusions about the vaccine's efficacy
c) the 16 vs 0 comparison met the minimum criterion for the primary efficacy endpoint as outlined in a pre-specified protocol
d) the comparison between teenagers and young adults also concerned efficacy as measured in the adult trials.
I have pointedly called these assumptions because the paragraph did not provide any of those details. We used our professional judgment to infer what methods produced the given results. We also extended professional courtesy by assuming that the investigators did their jobs, e.g. in deciding that 2,000 participants would be enough to estimate vaccine efficacy to an acceptable level of precision (say, within plus/minus 10% rather than 30%).
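The precision point can be made concrete. The width of a vaccine-efficacy confidence interval is driven by the number of accrued cases, not the number of enrolled participants. A back-of-envelope sketch using a normal approximation on the log rate ratio - all numbers here are illustrative, not from any trial:

```python
import math

def ve_ci(cases_vax, cases_placebo, n_per_arm, z=1.96):
    """Crude 95% CI for vaccine efficacy via a normal approximation
    on the log rate ratio (equal-sized arms; illustrative only)."""
    rr = (cases_vax / n_per_arm) / (cases_placebo / n_per_arm)
    se = math.sqrt(1 / cases_vax + 1 / cases_placebo)
    lo = 1 - math.exp(math.log(rr) + z * se)
    hi = 1 - math.exp(math.log(rr) - z * se)
    return lo, hi

# Same 90% point estimate of VE, but 4x the cases -> a much tighter interval.
for cv, cp in [(2, 20), (8, 80)]:
    lo, hi = ve_ci(cv, cp, 10000)
    print(f"{cv} vs {cp} cases: 95% CI ({lo:.0%}, {hi:.0%})")
```

This is why an interim look with few events, whatever its point estimate, cannot deliver the precision a full trial was designed for.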
***
The preceding discussion is not meant to disparage the Pfizer study. A less rigorous study is not worthless. Analyses that use other types of controls are acceptable.
I'm using this example to illustrate the process of comprehension when we read communications of science. We expect that if the scientists did something out of the ordinary, they will disclose it in the summary. When the details are absent, we may fill in the blanks by assuming the conventional.
What I have learned in many years of practice is to never make assumptions when reading other people's analyses. Read their detailed methodologies; get your hands on anything you can about how they measured the data, how they defined the metrics, what they included in and excluded from the analysis, and so on.
Don't make assumptions. Find out what the analysts actually did.
P.S. I previously pointed out that the NEJM paper also has one key omission (link). If you really want to understand the science, you sometimes have to dig many layers deep.