A short while ago, I introduced Larry Cahoon's blog, GoodStatsBadStats. He started the blog almost two years ago after retiring from the US Census Bureau (site offline due to government shutdown), where he spent 29 years working on the statistical design of most of the household surveys conducted by the Bureau. Cahoon received his PhD from Carnegie Mellon University.
Cahoon spent the final seven years of his career working on the 2010 decennial census. He was involved in almost all areas of census design and operations. His work focused on the many issues of both overcoverage and undercoverage in the census, the effectiveness of the publicity campaign, and the use of the American Community Survey as a replacement for the long form of the census.
Larry was very gracious and offered great insights and long answers. I have divided the interview into two parts. Part 2 will appear tomorrow.
KF: What are the key skills of statistical reasoning?
There are three things on the top of my list when I think about statistical reasoning skills. A solid knowledge of statistical methods and principles is a necessary starting point. I extend this to include the ability to think in terms of probability. This is a way of thinking or a way of viewing the world at the most basic level.
Equally important is a solid foundation in logical reasoning. The third piece is recognizing and dealing with the assumptions that must be made in any statistical analysis. One needs to be able to look at the problem from a multitude of perspectives and ask if the assumptions being made are good ones. Can I make a logical argument why they are true and why a contrary set of assumptions is false? The ability to consider alternate assumptions and confounding factors that may not be obvious is very important to any statistical analysis.
KF: Was graduate school useful training for your career?
In my own background, my first year of graduate school just about drove me crazy as I did nothing but statistics. Most of it was extremely theoretical work. I have always said that the master's degree they gave me after that year was in some sense worthless as I got very little exposure to the practical side of the statistical profession.
But what I did learn from the full immersion was to think in terms of probability. I came to see statistics not just as theory but as a way of thinking. Thinking in probability terms became second nature. It helped that I was trained in a Bayesian environment long before it reached the levels of use that we now see.
I saw statistical testing as a sometimes useful but limited tool. It became clear to me that the much more important questions are: what is the best estimate, and what decision rule would come out of the analysis? It is just as important to know how the data is going to be used as it is to know how the analysis is to be done.
KF: How did your work at the Census Bureau influence you?
To do good statistics, knowledge of the subject matter it is being applied to is critical. I also learned early on that issues of variance and bias in any estimate are actually more important than the estimate itself. If I don’t know things like the variability inherent in an estimate and the bias issues in that estimate, then I really don’t know very much.
A favorite saying among the statisticians at the Census Bureau where I worked is that the biases are almost always greater than the sampling error. So my first goal is always to understand the data source, the data quality and what it actually measures.
But I still have to make decisions based on the data I have. The real question then becomes: given the estimate on hand, what I know about the variance of that estimate, and the biases in that estimate, what decision am I going to make?
KF: I want to reiterate this point to my readers who are not statisticians. In data analysis, we use available data (the sample) to make a general statement, for example, using the responses of subjects enrolled in a clinical trial to describe the effect of a new drug on all potential patients. Imagine you are trying to hit the bull's eye. Shotgun #1 produces a wide scatter centered on the target, while Shotgun #2 produces a narrow scatter but the average shot lands wide of the target. We say that #1 has high variance and low bias while #2 has high bias but low variance. Both types of errors contribute to the shot being off the bull's eye.
Because the Census Bureau typically uses large samples, the sampling error (variance) is very manageable. What is hard to control is bias, meaning the entire sample is not representative of the population under study. This is Shotgun #2.
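For readers who like to see the numbers, here is a rough simulation of the two shotguns. It is only a sketch under my own assumptions: the target sits at 0, I track only the horizontal miss distance, the scatter is normally distributed, and the bias and spread values are made up for illustration. The function name `simulate` is hypothetical. The point it illustrates is that the average squared miss breaks down into variance plus squared bias.

```python
import random
import statistics


def simulate(bias, spread, n_shots=10_000, seed=0):
    """Fire n_shots at a target located at 0 (illustrative assumption).

    bias   -- how far the average shot lands from the target
    spread -- standard deviation of the scatter around that average
    Returns (mean shot position, variance of shots, mean squared miss).
    """
    rng = random.Random(seed)
    shots = [rng.gauss(bias, spread) for _ in range(n_shots)]
    mean_shot = statistics.fmean(shots)
    variance = statistics.pvariance(shots, mu=mean_shot)
    # Mean squared distance from the target at 0
    mse = statistics.fmean([s * s for s in shots])
    return mean_shot, variance, mse


# Shotgun #1: centered on the target, wide scatter (made-up numbers)
m1, v1, mse1 = simulate(bias=0.0, spread=3.0)
# Shotgun #2: tight scatter, but aimed wide of the target (made-up numbers)
m2, v2, mse2 = simulate(bias=3.0, spread=0.5)

for name, m, v, mse in [("Shotgun #1", m1, v1, mse1), ("Shotgun #2", m2, v2, mse2)]:
    print(f"{name}: bias={m:.2f}  variance={v:.2f}  "
          f"bias^2 + variance={m * m + v:.2f}  mean squared miss={mse:.2f}")
```

In each row, the last two figures agree: the total miss is the sum of the two kinds of error. Firing more shots tells you more precisely where Shotgun #2 is aimed, but it does not move that aim closer to the target, which is why a large sample does not cure bias.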
As I matured in my career, I learned that diversity matters. The greater the exposure and the more diverse the exposure to real world statistics, the better the practitioner would become. So while I worked for many years in the area of survey design, when I went to the annual statistical meetings, I always made the effort to maximize my exposure to other areas of statistics. I would always return home with a few textbooks on areas of statistics outside of what I was actually working on at the job.
A curious nature with a desire to continue learning is essential. Today a good training route is to read as many statistical blogs as you can find the time for. Especially important is to seek out and read the work of those who disagree with me. This forces me to think much more critically about the work I am doing.
Part 2 will follow tomorrow.
Read all the previous interviews here.