
Co-variates and scatter plots

Mahalanobis had a nice post on what he calls the "omitted variable bias".  His main point is to use regressions with care.  Although he didn't say it outright, the post also suggests using scatter plots with care!

Scatter plots let us visualize the relationship between two selected variables.  But other (omitted) variables may be equally or more important.  For my baseball team efficiency plots, I overlaid league, location and division; for his wage-quit rate plots, he overlaid age.  He then fitted a linear regression model, leading to one line for each age group.  He used blue and red to contrast the two models.  (It would have been clearer if he had also used blue and red dots to contrast the data used to fit each line.)
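To see why overlaying the omitted variable matters, here is a minimal sketch using made-up numbers (none of this is data from either post). It assumes two hypothetical age groups, "younger" and "older", fits one line per group, and then fits a single pooled line that ignores age. The helper `fit_line` and all the constants are my own illustrative choices.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic data, invented for illustration (not taken from either post).
# Within each age group the quit rate falls by about 1 per unit of wage,
# but the older group earns more and quits less at any given wage.
rng = random.Random(0)
groups = {
    "younger": {"wage_range": (10.0, 20.0), "intercept": 40.0},
    "older": {"wage_range": (20.0, 30.0), "intercept": 35.0},
}
data = {}
for name, spec in groups.items():
    lo, hi = spec["wage_range"]
    wages = [rng.uniform(lo, hi) for _ in range(50)]
    quits = [spec["intercept"] - 1.0 * w + rng.gauss(0.0, 1.0) for w in wages]
    data[name] = (wages, quits)

# One regression line per age group (age accounted for).
group_slopes = {name: fit_line(w, q)[1] for name, (w, q) in data.items()}

# One pooled line that ignores age (age omitted).
all_wages = data["younger"][0] + data["older"][0]
all_quits = data["younger"][1] + data["older"][1]
pooled_slope = fit_line(all_wages, all_quits)[1]

for name, slope in group_slopes.items():
    print(f"{name:8s} slope: {slope:.2f}")
print(f"pooled   slope: {pooled_slope:.2f}")
```

In this setup the pooled line comes out steeper than either within-group line, because part of the age effect gets attributed to wage. Coloring the dots by group on the scatter plot would make the same point visually.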

Such omitted variables are sometimes called "co-variates".  These are variables that are not under study but are correlated with the outcome variable and thus need to be accounted for in the analysis.  Scatter plots help determine which co-variates affect the outcome.

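To further complicate matters, co-variates may also be correlated with explanatory variables, which is what gives rise to the omitted variable bias.  A related complication is "interaction" between variables, where the effect of the explanatory variable on the outcome depends on the co-variate; this is evident when the two regression lines are not parallel.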
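One way to make the "not parallel" idea precise is to write both lines as a single model. The notation below is my own shorthand, not taken from either post: $\text{older}$ is an assumed 0/1 indicator for the second age group, and the $\beta$'s are the regression coefficients.

```latex
\text{quit} = \beta_0 + \beta_1\,\text{wage} + \beta_2\,\text{older}
            + \beta_3\,(\text{wage} \times \text{older}) + \varepsilon
% older = 0:  intercept \beta_0,            slope \beta_1
% older = 1:  intercept \beta_0 + \beta_2,  slope \beta_1 + \beta_3
```

The two lines are parallel exactly when $\beta_3 = 0$. A nonzero $\beta_3$ means the wage effect differs by age group, which is what shows up on the plot as lines with different slopes.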

