## Horrid stuff 2

##### Feb 13, 2007

Jon P took my comment on negative correlation and explored it further. Given the large ranges of values cited in the original Economist chart, Jon concluded that there wasn't enough evidence to make a judgement.

I agree to a large extent. Apart from the high variability of individual measurements, we also face the tiny sample of 5 cities. By plotting rectangles, his chart implicitly assumes that the correlation of two factors is related to the product of the ranges (variability) of each factor.

A different way of looking at it is to plot only the mid-range values (i.e. ignoring the within-city variability). The graph on the left-hand side shows very little pattern.

Resorting to the formula, I found a correlation of -0.03: a barely detectable negative correlation. Let's visualize this.

On the right graph, I added the mean lines for both variables. This divides the graph into four quadrants; dots that fall into the lower right and upper left quadrants make the correlation value negative. There were three of those versus two in the positive quadrants; hence, the tiny negative correlation.
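
For readers who want to try the quadrant idea themselves, here is a minimal sketch in Python. The five data points are made up for illustration (they are not the Economist figures), and `quadrant_correlation` is a hypothetical helper name. It splits the points at the two means, counts how many fall in the "positive" and "negative" quadrants, and computes the usual Pearson correlation from the same deviations.

```python
import math

def quadrant_correlation(xs, ys):
    """Return (r, n_positive, n_negative) for paired values xs, ys.

    A point is in a "positive" quadrant (upper right or lower left) when
    its deviations from the two means have the same sign, and in a
    "negative" quadrant (upper left or lower right) when the signs differ.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Product of deviations: its sign tells us which kind of quadrant
    # the point sits in; its size tells us how much the point counts.
    products = [a * b for a, b in zip(dx, dy)]
    n_positive = sum(1 for p in products if p > 0)
    n_negative = sum(1 for p in products if p < 0)

    # Pearson correlation: sum of products, scaled by the spread of each variable.
    r = sum(products) / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return r, n_positive, n_negative

# Made-up mid-range values for five cities (illustrative only).
x = [2, 4, 5, 7, 9]
y = [6, 4, 8, 4, 6]

r, n_positive, n_negative = quadrant_correlation(x, y)
print(f"r = {r:.2f}, positive quadrants: {n_positive}, negative quadrants: {n_negative}")
```

Note that the sign of the correlation comes from the sum of the products, not from the head count alone: a single point far from both mean lines can outweigh several points sitting close to them.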

-0.03? So R-squared is much less than 1%. Pretty tenuous. Also, you have three points in favor of a downward slope and two in favor of upwards, but one of those downward points would be touching the quadrant divider if the marker were a little larger. You could almost draw a circle connecting all the points.

We really need more cities, and more paired measurements, to say anything meaningful about the relationship.

Posted by: Jon Peltier | Feb 14, 2007 at 07:37 AM

I took the analysis sideways, and replotted my chart based on the assumption that there was some correlation (essentially plotting the diagonals of my boxes, rather than the centroids Kaiser plotted):

http://peltiertech.com/Excel/Commentary/HorridStuff.html

Why not? It's snowing and sleeting here (central Massachusetts), school is canceled and nobody's going anywhere, and it takes only two seconds to copy a sheet and delete a few rows.

Food for thought.

Posted by: Jon Peltier | Feb 14, 2007 at 08:08 AM

My monkey brain insists on seeing a pattern to the rise and fall, either two straight lines or a parabola. But with only five points of dubious accuracy, there's probably nothing really there.

Posted by: derek | Feb 14, 2007 at 09:19 AM

I think we can all agree on the conclusion that five samples are not enough to make reliable inferences.

My graph was intended to illustrate a neat way to visualize the concept of correlation. Nothing more than that!

Posted by: Kaiser | Feb 15, 2007 at 12:33 AM

I'm not having a go at anybody. I'm sorry if you feel like you're being piled on.

Posted by: derek | Feb 15, 2007 at 03:12 AM

Derek, don't worry. I wrote that comment just to make sure that the point of the posting was not lost.

Posted by: Kaiser | Feb 15, 2007 at 10:02 PM