A short while ago, I introduced Larry Cahoon's blog, GoodStatsBadStats. He started the blog almost two years ago after retiring from the US Census Bureau (site offline due to government shutdown), where he spent 29 years working on the statistical design of most of the household surveys conducted by the Bureau. Cahoon received his PhD from Carnegie Mellon University.

Cahoon spent the final seven years of his career working on the 2010 decennial census, where he was involved in almost all areas of census design and operations. His work focused on overcoverage and undercoverage in the census, the effectiveness of the publicity campaign, and the use of the American Community Survey as a replacement for the long form of the census.

Larry was very gracious and offered great insights and long answers. I have divided the interview into two parts. Part 2 will appear tomorrow.

***

**KF: What are the key skills of statistical reasoning?**

**LC:**

There are three things on the top of my list when I think
about statistical reasoning skills. A solid knowledge of statistical methods
and principles is a necessary starting point. I extend this to include the
ability to think in terms of probability. This is a way of thinking or a way of viewing the world at the
most basic level.

Equally important is a solid foundation in logical reasoning.
The third piece is recognizing and
dealing with the assumptions that must be made in any statistical analysis. One
needs to be able to look at the problem from a multitude of perspectives and
ask if the assumptions being made are good ones. Can I make a logical argument why
they are true and why a contrary set of assumptions is false? The ability to consider alternate
assumptions and confounding factors that may not be obvious is very important
to any statistical analysis.

**KF: Was graduate school useful training for your career?**

**LC:**

In my own background, my first year of graduate school just
about drove me crazy as I did nothing but statistics. Most of it was extremely
theoretical work. I have always said that the master’s degree they gave me
after that year was in some sense worthless as I got very little exposure to
the practical side of the statistical profession.

But what I did learn from the full immersion was to think in
terms of probability. I came to see statistics not just as theory but
as a way of thinking. Thinking in
probability terms became second nature. It helped that I was trained in a Bayesian environment long before it
reached the levels of use that we now see.

I saw statistical testing as a sometimes useful but limited
tool. It became clear to me that much more important questions are what is
the best estimate and what is the decision rule that would come out of the
analysis. It was just as important to know how the data is going to be used
as it was to know how the analysis is to be done.

**KF: How did your work at the Census Bureau influence you?**

**LC:**

To do good statistics, knowledge of the subject matter it is
being applied to is critical. I also learned early on that issues of variance and
bias in any estimate are actually more important than the estimate itself. If I don’t know things like the
variability inherent in an estimate and the bias issues in that estimate, then I
really don’t know very much.

A favorite saying among the statisticians at the Census Bureau where I worked is
that the biases are almost always greater than the sampling error. So my first
goal is always to understand the data source, the data quality and what it
actually measures.

But I also still have to make decisions based on the data I
have. The real question then becomes: given the estimate on hand, what I know about
the variance of that estimate, and the biases in that estimate, what decision am
I going to make?

**KF: I want to reiterate this point to my readers who are not statisticians. In data analysis, we use the available data (the sample) to make a general statement, for example, using the responses of subjects enrolled in a clinical trial to describe the effect of a new drug on all potential patients. Imagine you are trying to hit the bull's eye. Shotgun #1 produces a wide scatter centered on the target, while Shotgun #2 produces a narrow scatter but its average shot lands wide of the target. We say that #1 has high variance and low bias while #2 has high bias but low variance. Both types of error contribute to the shot being off the bull's eye.**

**Because the Census Bureau typically uses large samples, the sampling error (variance) is very manageable. What is hard to control is bias, meaning the entire sample is not representative of the population under study. This is Shotgun #2.**
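For readers who like to see the numbers, here is a minimal sketch of the two shotguns as a simulation. The aim offsets, spreads, shot counts and function names are all made up for illustration; nothing here comes from Census Bureau data or methods. It measures each gun's bias and variance and checks that the mean squared distance from the bull's eye equals squared bias plus variance.

```python
import random
import statistics


def fire(aim_offset, spread, n_shots, seed):
    """Simulate horizontal shot positions; the bull's eye sits at 0.

    aim_offset is the systematic error (bias) and spread is the standard
    deviation of the scatter. Both are made-up illustrative values.
    """
    rng = random.Random(seed)
    return [rng.gauss(aim_offset, spread) for _ in range(n_shots)]


def summarize(shots, target=0.0):
    """Return bias, variance, and mean squared error of the shots."""
    mean = statistics.fmean(shots)
    bias = mean - target
    variance = statistics.pvariance(shots, mu=mean)
    mse = statistics.fmean([(s - target) ** 2 for s in shots])
    return {"bias": bias, "variance": variance, "mse": mse}


# Shotgun #1: aimed correctly, wide scatter (low bias, high variance).
shotgun_1 = summarize(fire(aim_offset=0.0, spread=3.0, n_shots=10_000, seed=1))
# Shotgun #2: tight scatter, aimed off-centre (high bias, low variance).
shotgun_2 = summarize(fire(aim_offset=2.0, spread=0.5, n_shots=10_000, seed=2))

for name, result in (("Shotgun #1", shotgun_1), ("Shotgun #2", shotgun_2)):
    # Mean squared error splits into squared bias plus variance.
    recombined = result["bias"] ** 2 + result["variance"]
    print(name, result, "bias^2 + variance =", recombined)
```

Raising `n_shots` pins down where each gun's average shot lands more precisely, but it does not move that average any closer to the target. That is the sense in which a large sample tames variance and leaves bias untouched.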

**LC continues:**

As I matured in my career, I learned that diversity matters. The greater the exposure and the more
diverse the exposure to real world statistics, the better the practitioner would
become. So while I worked for many years in the area of survey design, when I
went to the annual statistical meetings, I always made the effort to maximize my
exposure to other areas of statistics. I would always return home with a few
textbooks on areas of statistics outside of what I was actually working on at
the job.

A curious nature with a desire to continue learning is essential.
Today a good training route is to read as many statistical blogs as you can
find the time for. Especially important is to seek out and read the work of
those who disagree with me. This forces me to think much more critically of the
work I am doing.

***

Part 2 will follow tomorrow.

Read all the previous interviews here.
