One of the keys to vetting any Big Data/OCCAM study is taking note of the decisions the researchers made in conducting the analysis. Most of these decisions involve subjective adjustments or unverifiable assumptions. Neither of those things is inherently bad - indeed, almost any analysis you come across relies on one or, more likely, both. But as consumers of such analyses, we must be aware of what those decisions are.
- The authors selected a period of time to study. For the research paper, this was January 2010 to September 2013. The database has existed since 1998, and no explanation is given for why the earlier years are irrelevant. Moreover, in the poster presentation, the analysis was based on March 2010 to November 2012, a different but overlapping period. In any case, it is assumed that what happened during those months is representative.
- Heart attack admissions are assumed to be a reliable indicator of heart attacks. (To be fair, the researchers explain in the publications that they use admissions requiring PCI as a proxy for heart attacks, but as per usual, the reporters drop the modifiers, thus becoming complicit in "story time": selling us one bill of goods (admissions) and delivering another (heart attacks).)
- What happened at "non-federal" hospitals is assumed to be the same as what happened at other hospitals.
- What happened in Michigan is assumed to be representative of what happened in the 47 other states that observe DST. Also assumed is the absence of a similar effect in the two states that do not change their clocks.
- Cases of heart attack admission that did not result in PCI are not tracked by the data collector, and are assumed to be unimportant.
- The data is assumed to be correct. Procedures to collect data and to define cases are assumed to be consistent across all participating hospitals.
- The sample size is really small: the data contain a total of four Spring Forwards and three Fall Backs.
- What annoyed Andrew: no adjustment is made for multiple comparisons, which amounts to assuming that the observed effect is not a random fluctuation. This is a strong assumption.
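To see why unadjusted multiple comparisons matter, here is a minimal simulation (my own sketch, not the authors' code; the choice of seven tests is a hypothetical, standing in for, say, one test per day of the week following the shift). Under a pure null, each individual test has a 5% false-positive rate, but the chance that at least one of several tests comes up "significant" is much higher.

```python
import random

random.seed(42)

ALPHA = 0.05       # per-test significance threshold
N_TESTS = 7        # hypothetical: one test per day after the time shift
N_SIMS = 10_000    # simulated "studies" in which no real effect exists

# Under the null hypothesis, p-values are uniformly distributed on [0, 1].
false_positive_studies = 0
for _ in range(N_SIMS):
    p_values = [random.random() for _ in range(N_TESTS)]
    if min(p_values) < ALPHA:   # does any one test look "significant"?
        false_positive_studies += 1

fwer = false_positive_studies / N_SIMS
print(f"Family-wise error rate with {N_TESTS} tests: {fwer:.2f}")
# Theory says 1 - 0.95**7, about 0.30 - six times the nominal 5% level.
```

In other words, finding one "significant" day among several examined, without adjustment, is roughly what a no-effect world would produce about a third of the time.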
- The effect of DST (if it exists) is assumed to be linear in the number of days since the DST time shift. In other words, patients admitted on the Tuesday after Spring Forward are assumed to be twice as exposed to DST as patients admitted on the Monday. It's hard for me to get my head around this assumption.
If enough of these assumptions or modeling decisions bother you, you should ignore the study and move on.
This study, like many others, is a perfect illustration of story time. In such studies, the researchers present some data analyses that tie factor A with outcome B; frequently, neither factor A nor outcome B is directly measured so the researchers start to speculate about a web of causation. Sleepy readers may not realize that much of the discussion is pure speculation while the result from the data analysis is extremely limited.
Story time occurs right here:
> Our study corroborates prior work showing that Monday carries the highest risk of AMI. This may be attributed to an abrupt change in the sleep–wake cycle and increased stress in relation to the start of a new work week.
The first sentence is based on their data but the second sentence is pure speculation. There is absolutely nothing in this study to confirm or invalidate the claim that the sleep-wake cycle of the average Michigan resident who presents himself or herself to the hospital ward was disrupted, or that the stress experienced by said resident had increased.
The fallacy of causation creep shows up as well. The authors said, "Our data argue that DST could potentially accelerate events that were likely to occur in particularly vulnerable patients and does not impact overall incidence."
If the DST effect is merely a correlation, and not a cause, it would not follow that by changing DST, one can affect the outcome. The only way the above statement holds is if one interprets the correlation as causation. Their "data" have done no arguing; it is the humans who are making this claim.
For those mathematically inclined, here is the description of the statistical model used in estimating the "trend" of heart attacks (recall that the gap between the actual counts and this trend is claimed as the DST effect):
> This model allowed for a cubic trend in numeric date as well as seasonal factors reflecting weekday (Monday–Friday), monthly (January–December) and yearly (2010–2013) effects. The model also adjusted for the additional hour on the day of each fall time change, as well as the loss of an hour on the day of spring time changes through the inclusion of an offset term.
>
> ... The impact of the spring and fall time changes on AMI incidence adjusting for seasonality and trend was assessed through the addition of indicator variables reflecting the days following spring and fall time changes as predictors to the initial trend/seasonality model.
I'm a bit confused by this description, which implies that the weeks of the DST time shifts are included in the model used to estimate the trend and seasonality. If this model's prediction is then compared to the actual admission counts in the week after the DST time shift to compute relative risk, aren't they just looking at the residuals of the model fit?
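The residuals question can be made concrete. In ordinary least squares (a simplification - the paper fit a count model, where the identity below holds only approximately), the Frisch–Waugh–Lovell theorem says the coefficient on an added post-shift indicator is exactly what you get by taking the residuals of the trend/seasonality fit and comparing them against the residualized indicator. A sketch on synthetic data (every number and variable here is my invention, chosen only to mimic the model's shape):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily admission counts: one year with a mild trend and a
# weekday pattern. All figures are made up for illustration.
n = 364
day = np.arange(n)
t = day / n                      # scaled date, for numerical stability
weekday = day % 7

# Base design matrix: intercept, cubic trend, weekday dummies -
# roughly mirroring the trend/seasonality model the paper describes.
X_base = np.column_stack(
    [np.ones(n), t, t**2, t**3]
    + [(weekday == w).astype(float) for w in range(1, 7)]
)

# Hypothetical indicator for the week following a "spring" time change.
d = ((day >= 70) & (day < 77)).astype(float)

# Fake outcome: no true DST effect, just trend + weekday + noise.
y = 30 + 5 * t + 2 * (weekday == 0) + rng.normal(0, 3, n)

# (1) Full regression: base model plus the post-shift indicator.
X_full = np.column_stack([X_base, d])
coef_full = np.linalg.lstsq(X_full, y, rcond=None)[0][-1]

# (2) The "residuals" route: fit the base model alone, then relate the
# residualized outcome to the residualized indicator.
ry = y - X_base @ np.linalg.lstsq(X_base, y, rcond=None)[0]
rd = d - X_base @ np.linalg.lstsq(X_base, d, rcond=None)[0]
coef_resid = (rd @ ry) / (rd @ rd)

# The two estimates coincide: the added-indicator coefficient is
# literally a statement about the base model's residuals.
assert abs(coef_full - coef_resid) < 1e-8
```

So, at least in the linear analogue, the "DST effect" estimated this way is indeed a summary of the residuals of the trend/seasonality model in the post-shift week.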
Also conspicuously absent is any mention of a hospital effect, a geographical effect or a patient demographics effect, all of which I'd think are plausible predictors of heart-attack admissions.