
Michael L


I'm a social scientist, not a statistician. My quantitative training has been mainly in econometrics rather than statistics, although, of course, the two overlap a lot. Econometricians, at least the ones I've read, seem to be familiar with the distinction you're making. They teach things such as instrumental variables, difference-in-differences, regression discontinuity designs, etc. in an effort to make the "all else equal" phrase less "pretend" and more "design" (approximately so, that is). So I think the issue in econometrics/economics might not be so much about misuse of the phrase. It might be more about overconfidence: thinking that economic theory, combined with the tools developed to address omitted variable bias and other threats to causal interpretation of findings on observational data, approximates a randomized experiment more closely than it actually does. Engineer/computer scientist Judea Pearl has written a lot about these issues. I wonder if you're familiar with his work and, if so, what your take on it is.


I thought that with regression, the more appropriate phrase is "allowing the other input variables to change" rather than "all else equal." There is commonly some covariance among input variables in observational studies, so when one variable changes, you'd naturally expect to observe changes in others.
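
To make the two readings concrete, here is a small simulation sketch. The coefficients (2, 3, and 0.5) and the helper `ols` are made up for illustration and do not come from the post. When `x2` covaries with `x1`, the slope from regressing `y` on `x1` alone absorbs the changes in `x2` that tend to accompany a change in `x1`, while the slope from regressing on both is the "holding `x2` fixed" number, and even that is only trustworthy if the model is right.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (illustrative numbers only).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x2 covaries with x1
y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)


def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


simple = ols(y, x1)        # y on x1 only: x2 is allowed to move with x1
multiple = ols(y, x1, x2)  # y on x1 and x2: x2 is held fixed by the model

# The simple slope should land near 2 + 3 * 0.5 = 3.5, the multiple slope near 2.
print("slope on x1, x2 left out:", round(simple[1], 2))
print("slope on x1, x2 included:", round(multiple[1], 2))
```

Neither number is wrong; they answer different questions, which is why the choice of phrase matters.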

Michael L


As I see it, and again this is probably due to my training in econometrics, the place to start is with what Pearl, Goldberger, and others call a structural equation. Many people call what I'm referring to as a structural equation a regression model, but I think the distinction in terminology may matter. The structural equation is the algebraic version of what an analyst thinks the relationship is between an outcome variable and a set of causal variables. I say algebraic version because such relationships can also be depicted using directed graphs. The way we were taught is that structural equations encode assumptions about causal relationships, so by definition their parameters have "all else equal" interpretations. Least squares regression is one way to estimate the parameters of structural equations, given certain assumptions and some data. And, given other assumptions and some data, hypothesis tests and confidence intervals can be used to see if the causal assumptions encoded in structural equations are supported by the data. I think what Kaiser is saying is that a randomized experiment is a way to "test" for this support by design. Econometricians have developed or refined methods they think can get at causal effects when such experiments aren't feasible, including methods that address the covariance among input variables you raised. I think the question raised by Kaiser's post is whether, in the absence of randomized experiments, the "all else equal" phrase can be anything other than "pretend mode." Many econometricians and others think the answer, at least in principle, is "yes." I believe some statisticians may think so too.
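
A rough simulation of the contrast, under assumptions of my own choosing: the structural equation is `y = 1.0 * t + 2.0 * u + noise`, where `u` is a cause nobody observes, and the helper `slope` is just the least-squares slope of `y` on `t`. None of these numbers come from the post. In the observational version `t` depends on `u`; in the randomized version `t` is assigned independently of `u`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
TRUE_EFFECT = 1.0  # assumed structural coefficient on t

u = rng.normal(size=n)  # unobserved cause of the outcome


def slope(t, y):
    """Least-squares slope from regressing y on t (with an intercept)."""
    return np.cov(t, y)[0, 1] / np.var(t, ddof=1)


# Observational data: units with high u tend to get more of t.
t_obs = 0.8 * u + rng.normal(size=n)
y_obs = TRUE_EFFECT * t_obs + 2.0 * u + rng.normal(size=n)

# Randomized experiment: t is assigned independently of u.
t_rand = rng.normal(size=n)
y_rand = TRUE_EFFECT * t_rand + 2.0 * u + rng.normal(size=n)

slope_obs = slope(t_obs, y_obs)     # picks up the effect of u as well
slope_rand = slope(t_rand, y_rand)  # should sit near TRUE_EFFECT

print("observational slope:", round(slope_obs, 2))
print("randomized slope:   ", round(slope_rand, 2))
```

The same regression is run in both cases. What makes "all else equal" hold in the second case is the assignment mechanism, not the estimator.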


Michael/Dave, thanks for the discussion. You pretty much captured my point, which is that many people who use or interpret these regression/structural equation models think all is well since they "controlled" for everything else, which is often described as "all else equal." In reality, I think statisticians, econometricians, and others like Pearl all agree that the causality is assumed. Of the approaches to modeling causality, I like the propensity scoring framework of Rubin. I have to spend more time with Pearl's model to form an opinion. In general, though, I am a bit uncomfortable with models that work under a lot of conditional independence assumptions; I just don't understand why we should believe them.

Michael L


Propensity scoring is based on assumptions that can be questioned too. Propensity scoring uses observed variables to model selection into treatment, so it seems to rest on the assumption that only observed variables affect such selection. But factors like motivation, so-called "grit," and others we often don't have data on can affect treatment selection as well. So whether we're talking about conditional independence assumptions or the assumption that only observed variables affect treatment selection, we're still talking about assumptions which may or may not be plausible in any particular case.
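
Here is a minimal sketch of that worry, with everything in it assumed for illustration: one observed binary covariate `x`, an unobserved `grit`, a true treatment effect of 1.0, and a hypothetical helper `ipw_effect` that estimates the propensity as the treated share within each level of `x` and then applies inverse-probability weighting. This is one simple form of propensity adjustment, not Rubin's full framework.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
TRUE_EFFECT = 1.0  # assumed for the simulation


def ipw_effect(x, t, y):
    """Inverse-probability-weighted effect estimate using only observed x."""
    # Estimated propensity: share treated within each level of binary x.
    e = np.where(x == 1, t[x == 1].mean(), t[x == 0].mean())
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))


x = rng.integers(0, 2, size=n)   # observed covariate
grit = rng.normal(size=n)        # unobserved; affects the outcome in both cases

# Case A: selection into treatment depends on x only.
p_a = 0.3 + 0.4 * x
t_a = (rng.random(n) < p_a).astype(float)
y_a = TRUE_EFFECT * t_a + 1.0 * x + 1.0 * grit + rng.normal(size=n)

# Case B: selection also depends on grit, which the analyst cannot see.
p_b = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 1.0 * grit)))
t_b = (rng.random(n) < p_b).astype(float)
y_b = TRUE_EFFECT * t_b + 1.0 * x + 1.0 * grit + rng.normal(size=n)

est_a = ipw_effect(x, t_a, y_a)  # should be close to TRUE_EFFECT
est_b = ipw_effect(x, t_b, y_b)  # overstated: treated units have more grit

print("selection on x only:     ", round(est_a, 2))
print("selection on x and grit: ", round(est_b, 2))
```

The procedure is identical in both cases, and nothing in the observed data distinguishes them. Only the untestable assumption about what drives selection does.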


Michael: I don't mean to say the other methods are wrong - I'm just more comfortable with one set of assumptions rather than the other set, at least until I change my mind. The post is primarily aimed at those who think that because they ran a regression, they have "controlled" for other factors, forgetting about the assumptions made in the process.


