Comments

Michael L

Kaiser,

I'm a social scientist, not a statistician. My quantitative training has been mainly in econometrics rather than statistics, although, of course, there is a lot of overlap between the two. Econometricians, at least the ones I've read, seem to be familiar with the distinction you're making. They teach techniques such as instrumental variables, difference-in-differences, and regression discontinuity designs in an effort to make the "all else equal" phrase less "pretend" and more "design" (approximately so, that is). So I think the issue in econometrics/economics might not be so much misuse of the phrase. It might be more a matter of believing that economic theory, combined with the tools econometrics has developed to address omitted variable bias and other threats to causal interpretation of observational data, approximates randomized experiments more closely than it actually does. Engineer/computer scientist Judea Pearl has written a lot about these issues. I wonder if you're familiar with his work and, if so, what your take on it is.

Dave

I thought that with regression, the more appropriate phrase is "allowing the other input variables to change" rather than "all else equal." There is commonly some covariance among input variables in observational studies, so when one variable changes, you'd naturally expect to observe changes in others.

Michael L

Dave,

As I see it, and again this is probably due to my training in econometrics, the place to start is with what Pearl, Goldberger, and others call a structural equation. Many people call what I'm referring to as a structural equation a regression model, but I think the distinction in terminology may matter. The structural equation is the algebraic version of what an analyst thinks the relationship is between an outcome variable and a set of causal variables. I say algebraic version because such relationships can also be depicted using directed graphs. The way we were taught is that structural equations encode assumptions about causal relationships, so by definition their parameters have "all else equal" interpretations. Least squares regression is one way to estimate the parameters of structural equations, given certain assumptions and some data. And, given other assumptions and some data, hypothesis tests can be conducted and confidence intervals constructed to see if the causal assumptions encoded in structural equations are supported by the data. I think what Kaiser is saying is that a randomized experiment is a way to "test" for this support by design. Econometricians have developed or refined methods they think can get at causal effects when such experiments aren't feasible, even to address the issue of covariance among input variables you raised. I think the question raised by Kaiser's post is whether, in the absence of randomized experiments, the "all else equal" phrase can be anything other than "pretend mode." Many econometricians and others think the answer, at least in principle, is "yes." I believe some statisticians may think so too.
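
To make the point about least squares and correlated inputs concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the structural equation y = 1.0*x1 + 2.0*x2 + noise, the 0.8 dependence of x1 on x2, and the helper names `simulate`, `ols_one`, and `ols_two`. It is not a method taken from Kaiser's post or from any of the authors mentioned.

```python
import random

def simulate(n=20000, seed=0):
    """Draw data from a hypothetical structural equation
    y = 1.0*x1 + 2.0*x2 + noise, with x1 and x2 correlated."""
    rng = random.Random(seed)
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        x2 = rng.gauss(0, 1)
        x1 = 0.8 * x2 + rng.gauss(0, 0.6)  # x1 co-varies with x2
        y = 1.0 * x1 + 2.0 * x2 + rng.gauss(0, 1)
        x1s.append(x1)
        x2s.append(x2)
        ys.append(y)
    return x1s, x2s, ys

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def ols_one(x, y):
    """Least squares slope of y on a single regressor (with intercept)."""
    return cov(x, y) / cov(x, x)

def ols_two(x1, x2, y):
    """Least squares slopes of y on two regressors (with intercept),
    solved from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

if __name__ == "__main__":
    x1, x2, y = simulate()
    print("slope on x1, x2 omitted: ", ols_one(x1, y))
    print("slopes on x1 and x2:     ", ols_two(x1, x2, y))
```

With these made-up numbers, the regression that includes both inputs should recover a slope near 1.0 for x1, while the regression that omits x2 should drift toward 1.0 + 2.0 * 0.8 = 2.6, the textbook omitted-variable-bias result. The "all else equal" reading only holds for the first regression, and only because the simulation guarantees that the structural equation is correct.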

Kaiser

Michael/Dave, thanks for the discussion. You pretty much captured my point: many people who use or interpret these regression/structural equation models think all is well because they "controlled" for everything else, which is often described as "all else equal." In reality, I think statisticians, econometricians, and others like Pearl all agree that the causality is assumed. Of the approaches to modeling causality, I like the propensity scoring framework of Rubin. I have to spend more time with Pearl's model to form an opinion - in general, though, I am a bit uncomfortable with those models that work under a lot of conditional independence assumptions; I just don't understand why we should believe them.

Michael L

Kaiser,

Propensity scoring is based on assumptions that can be questioned too. Propensity scoring uses observed variables to model selection into treatment, so it seems to rest on the assumption that only observed variables affect such selection. But factors like motivation, so-called "grit," and others we often don't have data on can affect treatment selection as well. So whether we're talking about conditional independence assumptions or "only observed variables affect treatment selection" assumptions, we're still talking about assumptions which may or may not be plausible in any particular case.
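
A rough sketch of this concern, under loudly stated assumptions: the simulation below invents one observed binary covariate `x`, one unobserved binary factor `u` standing in for "grit," and a true treatment effect of 2.0. The estimator is plain inverse-propensity weighting with the propensity estimated as the share treated at each level of `x`. This is a simplification chosen for illustration, not Rubin's full framework, and all names and numbers are hypothetical.

```python
import random

TRUE_EFFECT = 2.0  # hypothetical effect built into the simulation

def simulate(n, grit_matters, seed=0):
    """Generate (x, t, y) records. The unobserved factor u always
    affects the outcome; it affects treatment selection only when
    grit_matters is True. u is deliberately not recorded."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0  # observed covariate
        u = 1 if rng.random() < 0.5 else 0  # unobserved "grit"
        p = 0.3 + 0.3 * x + (0.3 * u if grit_matters else 0.0)
        t = 1 if rng.random() < p else 0
        y = TRUE_EFFECT * t + 1.0 * x + 3.0 * u + rng.gauss(0, 1)
        data.append((x, t, y))
    return data

def ipw_estimate(data):
    """Inverse-propensity-weighted estimate of the average treatment
    effect, using only the observed covariate x."""
    propensity = {}
    for level in (0, 1):
        treated_flags = [t for x, t, _ in data if x == level]
        propensity[level] = sum(treated_flags) / len(treated_flags)
    n = len(data)
    treated_mean = sum(t * y / propensity[x] for x, t, y in data) / n
    control_mean = sum((1 - t) * y / (1 - propensity[x]) for x, t, y in data) / n
    return treated_mean - control_mean

if __name__ == "__main__":
    print("selection on x only:   ", ipw_estimate(simulate(40000, False)))
    print("selection on x and u:  ", ipw_estimate(simulate(40000, True)))
```

When selection depends only on `x`, the estimate should land near the built-in effect of 2.0. When the unrecorded `u` also drives selection, the same procedure should overstate the effect, and nothing in the observed data signals the problem.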

Kaiser

Michael: I don't mean to say the other methods are wrong - I'm just more comfortable with one set of assumptions rather than the other set, at least until I change my mind. The post is primarily aimed at those who think that because they ran a regression, they have "controlled" for other factors, forgetting about the assumptions made in the process.

The comments to this entry are closed.

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.