## Horrid stuff 2

##### Feb 13, 2007

Jon P took my comment on negative correlation and explored it further. Given the large ranges of values cited in the original Economist chart, Jon concluded that there wasn't enough evidence to make a judgement.

I agree to a large extent.  Apart from the high variability of individual measurements, we also face the tiny sample of 5 cities.
By plotting rectangles in his chart, he implicitly assumed that the correlation of two factors is related to the product of the ranges (variability) of each factor.

A different way of looking at it is to plot only the mid-range values (i.e. ignoring the within-city variability).  The graph on the left-hand side shows very little pattern.

Resorting to the formula, I found that the correlation is -0.03: a barely detectable negative correlation.  Let's visualize this.
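The post does not say which formula was used. Assuming it is the standard Pearson correlation, then for paired values $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$ it reads:

$$
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
$$

Each point adds the product of its two deviations to the numerator, so a point above one mean and below the other pulls $r$ downward.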

In the graph on the right, I added the mean lines for both variables.  This divides the graph into four quadrants; dots that fall into the lower right and upper left quadrants contribute negatively to the correlation.  There were three of those versus two in the positive quadrants; hence, the tiny negative correlation.
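The quadrant idea can be sketched in a few lines of Python. This is only an illustration, assuming the standard Pearson formula: the function name `correlation_by_quadrant` and the five data points are hypothetical and are not the values from the Economist chart.

```python
from math import sqrt


def correlation_by_quadrant(xs, ys):
    """Return (r, n_positive, n_negative) for paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations from the two mean lines that split the plot into quadrants.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Upper-right and lower-left points give positive products;
    # upper-left and lower-right points give negative products.
    products = [a * b for a, b in zip(dx, dy)]

    # Pearson correlation: sum of products, scaled by the spread of each variable.
    r = sum(products) / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

    n_positive = sum(1 for p in products if p > 0)
    n_negative = sum(1 for p in products if p < 0)
    return r, n_positive, n_negative


# Hypothetical mid-range values for five cities (NOT the Economist data).
xs = [1.0, 2.0, 3.0, 4.0, 6.0]
ys = [2.0, 5.0, 4.0, 1.0, 3.5]

r, pos, neg = correlation_by_quadrant(xs, ys)
print(f"r = {r:.2f}, positive-quadrant dots: {pos}, negative-quadrant dots: {neg}")
```

The counts are only a rough guide to the sign: each dot is weighted by how far it sits from both mean lines, so a single distant dot can outweigh several dots that lie close to a divider.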


-0.03? So R-squared is much less than 1%. Pretty tenuous. Also, you have three points in favor of a downward slope and two in favor of upwards, but one of those downward points would be touching the quadrant divider if the marker were a little larger. You could almost draw a circle connecting all the points.

We really need more cities, and more paired measurements, to say anything meaningful about the relationship.

I took the analysis sideways, and replotted my chart based on the assumption that there was some correlation (essentially plotting the diagonals of my boxes, rather than the centroids Kaiser plotted):

http://peltiertech.com/Excel/Commentary/HorridStuff.html

Why not? It's snowing and sleeting here (central Massachusetts), school is canceled and nobody's going anywhere, and it takes only two seconds to copy a sheet and delete a few rows.

Food for thought.

My monkey brain insists on seeing a pattern to the rise and fall, either two straight lines or a parabola. But with only five points of dubious accuracy, there's probably nothing really there.

I think we can all agree on the conclusion that five samples are not enough to make reliable inference.

My graph was intended to illustrate a neat way to visualize the concept of correlation. Nothing more than that!

I'm not having a go at anybody. I'm sorry if you feel like you're being piled on.

Derek, don't worry. I wrote that comment just to make sure that the point of the posting was not lost.
