The second part (pp.66-87) of Chapter 2 is concerned with how to use hospital data to compare the "skill" of doctors. A doctor who has "skill" creates greater-than-average improvement in the outcomes of his or her patients, after controlling for other factors, such as the type of patients. The technical hurdle comes from the non-random assignment of patients to doctors (by triage nurses), so that, for example, the best doctors may get the patients with lower-than-average chances of survival. If we compare the average survival of patients by doctor, the difference could reflect the "skill" of the doctors, or the survivability of the assigned patients, or some combination of both. This, as L&D point out, is a form of selection bias. We would like to control for patient assignment, and isolate the "skill" factor.
The structure of this section closely mirrors the process of a statistical analysis; it is intelligently laid out, and so makes for enjoyable reading. The steps in this process include:
- understanding the origin of the data to be analyzed (the description of Craig Feied at Washington Hospital Center is not merely space filler!)
- exploring the data, starting with listing the variables, and examining basic summary statistics, a step that should never be skipped even if the work is mundane
- describing the problem, and noting statistical challenges (e.g. selection bias, deaths not tracked in the original data)
- collecting additional data as needed (such as appending demographic information, and appending dates of death of those patients who died outside the hospital)
- conducting simple analyses of individual variables on the outcome (e.g. what is the relationship between the type of complaint upon arrival and subsequent survival)
- constructing a statistical model (often a regression model) to handle multiple variables, and other complexities such as interaction effects, covariates, adjustments
- interpreting the results
- analyzing the results (e.g., they created profiles of the "good" and "bad" doctors, which is only possible after establishing which doctors have skill).
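The "controlling for other factors" step can be illustrated with a toy direct-standardization calculation: compare doctors on survival rates re-weighted to a common case mix. Everything below (doctors, patient types, outcomes) is made up for illustration; the book does not publish its actual model, so this is just one standard way the adjustment could be done.

```python
# Sketch of "controlling for patient mix" via direct standardization.
# All records are hypothetical.

from collections import defaultdict

# (doctor, patient_type, survived) records -- made up
records = [
    ("A", "low_risk", 1), ("A", "low_risk", 1), ("A", "high_risk", 0),
    ("A", "high_risk", 1), ("B", "low_risk", 1), ("B", "low_risk", 0),
    ("B", "low_risk", 1), ("B", "high_risk", 0),
]

def adjusted_survival(records, doctor):
    """Survival rate for one doctor, standardized to the overall case mix."""
    # stratum-specific survival for this doctor
    by_type = defaultdict(list)
    for doc, ptype, surv in records:
        if doc == doctor:
            by_type[ptype].append(surv)
    # overall case-mix weights, pooled across all doctors
    counts = defaultdict(int)
    for _, ptype, _ in records:
        counts[ptype] += 1
    total = sum(counts.values())
    # weighted average of this doctor's stratum rates
    return sum((counts[t] / total) * (sum(v) / len(v))
               for t, v in by_type.items())

print(adjusted_survival(records, "A"))   # 0.8125
print(adjusted_survival(records, "B"))   # ~0.417
```

The raw comparison would penalize a doctor whose strata are skewed toward high-risk patients; the re-weighting puts both doctors on the same footing (at the cost of assuming every doctor sees some patients in each stratum).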
I have trouble understanding exactly what their methodology is. (This shows why equations are much better than words when it comes to describing methodology, but if we wanted to learn methods, we wouldn't be reading Freakonomics.) The Notes reference a working paper by Mark Duggan and Levitt, which I could not locate online, nor at either of their home pages. So I will summarize what they indicate in the book, and then add some comments on what I think the other key issues are for this problem.
The Methodology (primarily pp.78-79)
- Two or three ER doctors work per shift at this hospital. When a patient shows up, a triage nurse assigns him or her to one of the doctors. The exact method of assignment is not known, but the nurse likely uses her knowledge of which doctor is particularly good with which type of patients, for example, patients complaining of shortness of breath, or older patients. The nurses' assignment is therefore non-random; in other words, the mix of patients seen by each doctor cannot be assumed to be identical.
- They say that patients can be modeled as random arrivals at the shift level ("de facto, accidental randomization"). More precisely, "the patients who show up between 2 and 3 pm on one Thursday in October are, on average, likely to be similar to the patients who show up the following Thursday, or the Thursday after that." This sounds reasonable (ignoring seasonality), and it should be easy enough to check the assumption because they have data on the patient-doctor assignments, and various characteristics of the patients.
- Now, they say "while we exploit the information about which doctors are working on a shift, we don't factor in which doctor actually treats a particular patient". They stress that this is a paradox -- in order to do this analysis properly, they need to throw away the data concerning doctor-patient matches.
- So I think this is a two-level model, or a two-step analysis: first at the shift level, and then at the doctor level. At the first level, they evaluate entire shifts: "if the patients who came on the first Thursday have worse outcomes than the patients who came on the second or third Thursday, one likely explanation is that the doctors on that shift weren't as good."
- Then they get to the doctor level: "if you look at a particular doctor's record across hundreds of shifts and see that the patients on those shifts have worse outcomes than is typical, you have a pretty strong indication that the doctor is at the root of the problem." This again sounds reasonable. But I am a bit unclear how they avoided the nonrandom assignment problem once they drilled down to the doctor level -- if the triage nurses noticed that Dr. X is particularly good with older patients, then even across all shifts, Dr. X would likely have treated an excess of older patients, no?
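My reading of the two-step logic can be sketched in code. The roster, shifts, and outcomes below are all hypothetical; the point is only that the doctor-patient match is never consulted -- each doctor is scored by the outcomes of every patient arriving on his or her shifts.

```python
# Sketch of the two-step logic as I read it: (1) score each shift by the
# outcomes of ALL patients who arrived during it; (2) average those shift
# scores across the shifts each doctor worked. Data are made up.

from collections import defaultdict
from statistics import mean

# shift -> doctors on duty (hypothetical roster)
roster = {
    "s1": ["X", "Y"], "s2": ["X", "Z"], "s3": ["Y", "Z"], "s4": ["X", "Y"],
}
# shift -> patient outcomes (1 = survived), regardless of treating doctor
outcomes = {
    "s1": [1, 0, 1], "s2": [1, 1, 1], "s3": [0, 1, 0], "s4": [1, 1, 0],
}

# Step 1: shift-level survival rate
shift_score = {s: mean(o) for s, o in outcomes.items()}

# Step 2: doctor-level average over the shifts they worked
doctor_shifts = defaultdict(list)
for shift, docs in roster.items():
    for d in docs:
        doctor_shifts[d].append(shift_score[shift])

doctor_score = {d: mean(scores) for d, scores in doctor_shifts.items()}
print(doctor_score)
```

With hundreds of real shifts instead of four, a doctor who persistently appears on low-scoring shifts stands out, without ever using the nonrandom within-shift assignments.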
They have clearly thought a lot about this problem, and I am not disagreeing with the method. I just don't comprehend it completely. It would be interesting to see some exploratory data on the doctor-patient matching, which could give color to the nonrandom assignment issue.
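A related exploratory check is easy to describe: the shift-level randomization claim ("this Thursday's 2-3 pm patients resemble next Thursday's") can be tested directly from arrival data. A toy sketch, with made-up ages as the only covariate; a real check would span many patient characteristics.

```python
# Quick check of the "accidental randomization" assumption: for a fixed
# weekly slot (e.g., Thursday 2-3 pm), do patient characteristics look
# stable from week to week? All arrival records are hypothetical.

from collections import defaultdict
from statistics import mean, stdev

# (week, weekday, hour, patient_age) -- made up
arrivals = [
    (1, "Thu", 14, 61), (1, "Thu", 14, 45), (1, "Thu", 14, 58),
    (2, "Thu", 14, 52), (2, "Thu", 14, 66), (2, "Thu", 14, 49),
    (3, "Thu", 14, 55), (3, "Thu", 14, 60), (3, "Thu", 14, 47),
]

ages_by_week = defaultdict(list)
for week, day, hour, age in arrivals:
    if (day, hour) == ("Thu", 14):
        ages_by_week[week].append(age)

weekly_means = [mean(a) for a in ages_by_week.values()]
print(weekly_means)        # should cluster if the assumption holds
print(stdev(weekly_means)) # small spread supports the assumption
```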
I now list some other issues that L&D don't address directly but are worthy of attention:
- In biostatistics, this is a survival analysis problem. One key characteristic of this data is so-called right censoring: a number of ER patients will still be alive at the end of the study, so their time to death is not known. Thus, instead of measuring the average time to death, we look at the one-year death rate (or survival rate), the five-year death rate, etc.
- The data set contains eight years' worth of patients. Patients who entered the ER eight years ago have been "observed" for eight years, while those who entered one month ago have been "observed" for only one month. If the type of patients cannot be assumed to be constant over time, there is a built-in bias.
- Another cohort effect is that over the eight years, there may have been medical advances, either across many ailments, or across specific ailments. A key pitfall in analyzing observational studies such as this is the failure to account for every knowable effect; what we don't know could really be fatal. An important tip is to think outside the database: start with what could affect the outcome we're measuring; don't start with what data we have collected.
- As an industry statistician, I always think about whether the analysis brings about an "actionable" result. (This is admittedly not a major issue for academia.) With this in mind, the question I'd ask is: for a given ailment, which treatment method is likely to bring better patient outcomes, after controlling for doctors, and other factors? I'm not sure what a hospital might do with a ranking of doctor "skill" (averaged across types of patients, ailments, etc.).
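The right-censoring point above can be made concrete with a minimal calculation: a naive one-year death rate that simply drops patients censored before one year. Toy data; a real analysis would use Kaplan-Meier estimates, which make use of the partial follow-up as well.

```python
# Right censoring in miniature: a patient contributes to the one-year
# death rate only if (a) they died within a year, or (b) they were
# observed alive for at least a full year. Patients censored earlier
# are excluded in this naive version.

def one_year_death_rate(patients):
    """patients: list of (followup_days, died) tuples (hypothetical)."""
    died_within_year = 0
    at_risk = 0
    for followup, died in patients:
        if died and followup <= 365:
            died_within_year += 1
            at_risk += 1
        elif followup >= 365:  # survived and fully observed for a year
            at_risk += 1
        # else: censored before one year -- contributes nothing here;
        # Kaplan-Meier would use the partial follow-up
    return died_within_year / at_risk

patients = [(30, True), (400, False), (200, False), (365, False), (100, True)]
print(one_year_death_rate(patients))  # 0.5
```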
Other thoughts on the rest of Part 2:
p.62 -- What Alan Krueger conducted sounds like a matched "case-control" study. Cases were the martyrs, and controls were men matched by age. This type of study is usually analyzed using odds ratios, e.g. for the poor family factor, this would be ((0.28)/(0.72))/((0.33)/(0.67)) = 0.79 (reciprocal is 1.27). The odds of a martyr coming from a poor family are about 80% of the odds of a non-martyr coming from a poor family. Based on this calculation, I suspect the factor is not statistically significant. Good study but not the results we would hope for, and it confirms that there is no easy way to predict martyrdom.
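The odds-ratio arithmetic, using the proportions in the text (28% of martyrs vs. 33% of matched controls from poor families):

```python
# Odds ratio for the "poor family" factor, from the proportions above.

def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)

or_poor = odds(0.28) / odds(0.33)
print(round(or_poor, 2))      # 0.79
print(round(1 / or_poor, 2))  # 1.27
```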
p.63 -- L&D express surprise at finding that terrorists have above-average education and social status. Note that people who perpetrate credit card scams are often PhDs.
pp.65-6 -- Useful walkthrough of how to compute the total cost of terrorist attacks, beyond just counting dead bodies.
p.68 -- They argue that 9/11 exposed the lack of "surge capacity" in our emergency rooms in hospitals. "If there had been a thousand victims, would they even have gotten inside?" I'm not sure about the wisdom of designing emergency rooms to accommodate extremely rare events like 9/11. Would like to see an economic analysis of cost and benefit.
They follow with some eye-opening information on the design of emergency rooms, narrated by Craig Feied. The bit on air recirculation inside a hospital is startling. And patients do die from ailments they pick up after entering the hospital.
pp.70-72 -- Hospitals had poor data back in those days, and Feied, the ER modernizer, had to get his hands dirty collecting the data, which is so very true of most data projects. The following sentence begs to be translated back into technical jargon: "Their system would deconstruct each piece of data from every department and store it in a way that allowed it to interact with any other piece of data, or any other 1 billion pieces."
p.73 -- I am not sure I want to meet Mr. Feied. They say "when challenged, he wouldn't rest until he found a way to charm, or, if need be, threaten his way to victory."