One of the secrets of great data analysis is thoughtful data collection. Great data collection is necessary but not sufficient for great data analysis.
I recently had the unfortunate need to select a new doctor. Every time I have had to do this, it has been an exercise in frustration and desperation. After wasting hours and hours perusing the "data" on doctors, I inevitably give up and just throw a dart at the wall.
Every medical insurer points you to its extensive online resource called the doctors' directory. Apparently, we are supposed to pick a doctor from this directory. There is a lot of data in this directory. A casual search returns hundreds of matches. What data are available to help me narrow down my selection?
- Which school did the doctor graduate from?
- When did the doctor graduate?
- What was the name of the degree?
- How many languages does the doctor speak?
- Which hospitals is the doctor affiliated with?
- Which medical group does the doctor operate within (if any)?
- What are the fields of specialization?
- What is the address of the office?
Conspicuously absent are any data that measure the quality or outcomes of the doctor. There is neither a quantitative nor a qualitative measure of quality or patient satisfaction. We don't know anything about wait times. It is very challenging to know how big the doctor's practice is.
The data that are provided are essentially just that--data that convey almost no information. I don't think it matters which school the doctor went to, nor what the degree is called. Age might be somewhat useful as it indicates the amount of experience, but the year of graduation, from which age could be guessed, is often suppressed. Ethnicity is perhaps useful but it is not present; in some cases the name reveals this information, but not usually.
Hospital affiliations could have been useful if so many doctors were not affiliated with so many hospitals. I asked a friend of mine who is a doctor whether some hospitals are more "selective" than others, the way some universities are, but he told me hospital affiliation conveys no information.
Fields of specialization are also useless to me, as I am not looking for a specialist.
Languages spoken is an oddity. If I interpret the data literally, it seems that American doctors have an obsession with learning foreign languages. It is incredible how many of them speak three, four or more languages, including relatively exotic ones. Chances are these doctors have people in the office who speak those foreign languages. In any case, since my primary language is English, I have no inclination to select doctors based on what other languages they (or their staff) speak.
So the only piece of data I can use is the address. Is the doctor close to my home or work?
And that seems to be a poor way of selecting doctors.
PS. While writing this, I was reminded of a continuous stream of useless real-time data: those signal bars on our cellphones. The number of bars and the speed at which a webpage loads are much less correlated than one would expect.
It's okay if we treat the data as a joke. But somewhere in the world, some data scientists are using the data to do serious work.