Yesterday, Larry Cahoon, a 29-year veteran at the Census Bureau, answered some questions. The rest of the interview is printed below.
KF: How can a data analyst improve his or her skills?
I have to say my best training has been many, many hours of just playing with statistics, playing with graphics, and reading the analysis others have done. The more data I see, the more analysis I do, the more graphics I look at and produce, the more I learn about how to look at data and how to see the pitfalls in the data analysis. This is getting down in the mud and dealing with real data with all of its warts. My wife tells people I’m a statistician through and through, because every time she looks at my computer there is a graphic of some type on the screen.
Finally, I have always been an avid reader of Science Fiction. No one can read Science Fiction without being forced to consider a problem from different perspectives and to take differing assumptions into consideration. This in turn has helped me develop the ability to question the assumptions being made in any data analysis.
KF: What are your pet peeves with published data interpretations?
I seem to return again and again to the same issues with the analyses I see in the newspapers, online, and just about anywhere. The most basic problem is one of incomplete analysis.
We see so many papers and news reports where a difference in the data is observed and the author then goes off on an entire line of speculation with no data whatsoever to justify it. This line of thinking frequently ends with the claim that two things are correlated, and therefore we have cause and effect.
The media then fan these reports by writing stories without asking basic questions, such as whether the data itself is any good or whether there is any evidence for the claims being made. The media act as if the claims have been proven, especially in how they headline the story.
My second pet peeve is what I call an emphasis on a one-dimensional world. This is usually reflected in simple statements like: A causes B. The world is much more complex than that. Those who investigate airline accidents have been telling us for some time that there is seldom just one cause for an accident. Rather, there are a number of causes. We need to carry that knowledge over to our statistical analysis and reporting.
KF: Which source(s) do you turn to for reliable data analysis?
I can’t say that I have any favorite source for data analysis. If forced to name one, I would say that I tend to like the work of the Pew Research Center. Their surveys seem to be well designed, the questions they ask well thought out, and the analysis something I can trust.
I also like the data that is available from the Federal Government, but the government agencies rightly avoid most detailed data analysis in an effort to remain nonpartisan.
KF: Thank you so much for your time. We're lucky that you continue to blog in your retirement.
You can view the earlier installments of the interview series here.