Feb 13, 2007

Horrid stuff 2

Jon P took my comment on negative correlation and explored it further. Given the large ranges of values cited in the original Economist chart, Jon concluded that there wasn't enough evidence to make a judgement.

I agree to a large extent.  Apart from the high variability of individual measurements, we also face the tiny sample of 5 cities. 
In his chart, by plotting the rectangles, he implicitly assumed that the correlation of two factors is related to the product of the ranges (variability) of each factor.

A different way of looking at it is to plot only the mid-range values (i.e. ignoring the within-city variability).  The graph on the left hand side shows very little pattern.

Resorting to the formula, I found that the correlation is -0.03, a barely detectable negative correlation.  Let's visualize this.

On the right graph, I added the mean lines for both variables.  This divides the graph into four quadrants; dots that fall into the lower right and upper left quadrants push the correlation value negative.  There were three of those versus two in the positive quadrants; hence, the tiny negative correlation.
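To make the quadrant argument concrete, here is a minimal Python sketch. The five (x, y) pairs are made-up values chosen for illustration, not the mid-range figures from the Economist chart, and the function names are my own. Each dot contributes the product of its deviations from the two means; the sign of the sum of those products sets the sign of the correlation, so the count of dots per quadrant is only a rough guide.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around the two means."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def quadrant_counts(xs, ys):
    """Count dots in the positive quadrants (upper right, lower left)
    and in the negative quadrants (upper left, lower right)."""
    mx, my = mean(xs), mean(ys)
    products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]
    positive = sum(1 for p in products if p > 0)
    negative = sum(1 for p in products if p < 0)
    return positive, negative

# Hypothetical data for five cities (NOT the values from the original chart)
xs = [1, 2, 3, 6, 8]
ys = [5, 1, 4, 1, 4]

positive, negative = quadrant_counts(xs, ys)
r = pearson_r(xs, ys)
print(f"positive quadrants: {positive}, negative quadrants: {negative}, r = {r:.2f}")
```

With these made-up points, three dots land in the negative quadrants and two in the positive ones, and the correlation comes out slightly negative (about -0.14).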



Jan 24, 2007

Convenience charting

Statisticians have long railed against "convenience sampling", that is, the practice of selecting samples based on what's easily available, not at random.  Say, picking your friends.

Dustin J sent in this example of what can only be called "convenience charting".  Dustin said he had no clue what this chart is saying, and I am not surprised.

The chart plots a statistical object known as the "survival function".  It is likely that "survival analysis" was done, after which the chart creator  picked up the resulting statistical object and dumped it onto this "convenience chart".

If we take the top line on the "child survival" graph, it shows the probability of one child surviving up to a certain age, if the child belonged to a family with 1-3 kids.  The chance is about 92.5% that the child will survive through age 2, and 88% that the child will survive through age 18.  The difference between those percentages is due to the chance that the child may die between ages 2 and 18.

A slight transformation of the data will make this point much clearer.  What is the probability of a child dying by a certain age?  Using the example, a child has a 12% chance of dying by age 18, and a 7.5% chance of dying between ages 0 and 2.
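The transformation is just one minus the survival probability. A small Python sketch, using only the two values quoted above for the 1-3 kids group (the function names are my own):

```python
# Survival probabilities for the 1-3 kids group, as quoted in the post
survival = {2: 0.925, 18: 0.88}

def cumulative_mortality(survival_by_age):
    """Probability of dying by each age: 1 - S(age)."""
    return {age: 1.0 - s for age, s in survival_by_age.items()}

def interval_mortality(survival_by_age, start, end):
    """Probability of dying between two ages: S(start) - S(end)."""
    return survival_by_age[start] - survival_by_age[end]

mortality = cumulative_mortality(survival)
print(f"dying by age 2:  {mortality[2]:.1%}")
print(f"dying by age 18: {mortality[18]:.1%}")
print(f"dying between ages 2 and 18: {interval_mortality(survival, 2, 18):.1%}")
```

The last line is the gap between the two survival percentages, i.e. the chance of dying between ages 2 and 18.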

The junkart chart depicts this probability.  (I reverse-engineered the data, which explains why the distances between the line segments look strange.)

What this chart doesn't address is how we are to interpret the probability of "a child dying" in a family with more than one child.  Is it a random child dying?  At least one child dying?  Exactly one child dying (the other X-1 surviving)? 
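The three readings give quite different numbers. The sketch below is purely illustrative: it assumes children in a family die independently of one another with the same probability p, which is almost certainly unrealistic and is not something the Washington Post chart states. The family size n = 3 is a hypothetical choice; p = 0.12 is the by-age-18 figure from above.

```python
def p_random_child(p, n):
    """A randomly chosen child dies: simply p, whatever the family size."""
    return p

def p_at_least_one(p, n):
    """At least one of n children dies (assumes independence)."""
    return 1 - (1 - p) ** n

def p_exactly_one(p, n):
    """Exactly one of n children dies, the other n-1 survive (assumes independence)."""
    return n * p * (1 - p) ** (n - 1)

p, n = 0.12, 3  # n is hypothetical; independence is an assumption
print(f"random child:   {p_random_child(p, n):.3f}")
print(f"at least one:   {p_at_least_one(p, n):.3f}")
print(f"exactly one:    {p_exactly_one(p, n):.3f}")
```

Even under this simple assumption, the answer ranges from 12% to roughly 32%, which is why the chart needs to say which probability it is showing.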

The original chart also committed a number of standard errors.  The child survival function represents probabilities, not percentages.  The third category should be 8-11 kids, not 7-11.  If we are picky, then we would also like to see "confidence intervals" because there must have been many fewer families in the 12+ sample than the 1-3 sample.  In the second chart (which I don't have space to discuss), some data labels are missing, which indicates a presumption that all readers have seen the first chart.

Reference:  "Child, Parents Drive Each Other to Early Graves", Washington Post, Jan 14, 2007. 

Nov 15, 2006

Poll numbers

The Political Arithmetik blog has great graphics pertaining to, surprise surprise, political matters.  I really like the ones portraying Presidential approval ratings. 

This chart plots all the different polls (grey dots) at once; the blue line is the estimated approval rate over time while the scatter of grey dots provides an estimate of the reliability of the blue line.

Different polls are different random samples of the population.  Random sampling is not fool-proof; any one sample has a chance, albeit small, to poorly represent the population.  That's why the dots add greatly to the chart.
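As a rough guide to how much scatter to expect from sampling alone, here is a minimal Python sketch of the textbook margin of error for a proportion. It assumes simple random sampling, which real polls only approximate, and the approval rate and sample sizes are hypothetical values, not figures from the Political Arithmetik chart.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical approval rate and poll sizes
for n in (500, 1000, 2000):
    print(f"n = {n:5d}: 40% approval, give or take {margin_of_error(0.40, n):.1%}")
```

Under these assumptions, polls of about 1,000 people would scatter roughly three points either side of the true rate, so a band of dots conveys more than any single poll.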

Derek pointed me to a different chart, a simple dot plot that shows Bush's 2006 mid-term approval rate was the 2nd worst since 1946.  To paraphrase him, this is a scenario in which the chart does not add much because the underlying data is a simple ranked list.

He also suggested differentiating the 2nd term presidents from the one-termers. 

Shown below is another view of the data, emphasizing the time dimension.  The linked dots represent two-term presidents.  The gridlines delineate the minimum, average and maximum approval ratings over time.  Another line shows Bush's 2006 approval rating, which is the 2nd worst since 1946.

Lots of other great charts at this blog.  Check them out.

 

Oct 23, 2006

Tracking tigers

Nyt_tigers_1


This chart is fantastic work from Amanda Cox and Joe Ward at NYT.  It tracked the baseball Tigers' season, showing how they peaked in early August (with a 10-game lead) and limped into the playoffs, five days after losing the division title.  That slide, beginning in mid-September, set them back 4.5 months.  (It would help to label the "5 games behind the leader" line.)

The shading to show which team(s) were chasing them is a stroke of genius.

Further, the dot plots on the right very cleanly bring out their advantage in pitching.  The hitting numbers are mixed.

The following chart is for the Cardinals:

Nyt_cardinals

 

Reference: "World Series Preview", New York Times, Oct 21 2006.

Oct 11, 2006

Arming the competition

At the TCS blog, Tim Worstall attacked a chart comparing global levels of income inequality, originally published by the Economic Policy Institute.  His post is here.  Tim claimed that this chart proved precisely the opposite of what the EPI intended it to show: that "the poor in America have exactly the same standard of living as the poor in Finland (and Sweden)", two countries which he derided as "redistributionist paradises".  From this, Tim concluded that the U.S. is doing enough for the poor.

Stephen C., who sent in this chart, was very confused by the length of the bars: left of the divider, the larger the income index, the shorter the bar; right of the divider, the larger the income index, the longer the bar.

For the EPI, this is a case of arming the competition.  Echoing Robert's comment from yesterday, this is one chart that opines but should have murmured.

The chart is a very convoluted way to study the idea of income inequality.  The first bar states that the 90th percentile income in Finland is 1.11 times the median U.S. income, after adjusting for PPP.  Notice the simultaneous change in percentile and country, which complicates our understanding of the difference.

The median income is perhaps the simplest (not most informative) measure of income equality.  In the EPI chart, the edges of each bar describe the 10th and 90th percentile income in a country.  We know only that 80% of the population lies within each bar, but nothing about how incomes are distributed inside it.

In the revised chart, I plotted another popular measure of income equality, the ratio of 90th percentile to 10th percentile (since the data is readily available from the EPI chart).  It's clear that inequality is highest in the English-speaking Western world, where the top earners get 4-6 times as much as the bottom earners.

This income ratio is computed for each country, and can be used to compare across countries without resorting to another index. 
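A short Python sketch of why no outside index is needed: both bar edges in the EPI chart are expressed relative to the same U.S. median, so that common denominator cancels when one edge is divided by the other. The 1.11 figure for Finland's 90th percentile comes from the chart as described above; the 10th percentile value of 0.38 is a hypothetical placeholder, and the function name is my own.

```python
def ratio_90_10(p10_index, p90_index):
    """Ratio of 90th to 10th percentile income.
    Both inputs may be on any common scale (here, multiples of the
    U.S. median); the scale cancels in the division."""
    return p90_index / p10_index

# Finland: 90th percentile = 1.11 x U.S. median (from the chart);
# the 10th percentile value below is hypothetical, for illustration only
finland_p90, finland_p10 = 1.11, 0.38
print(f"90/10 ratio: {ratio_90_10(finland_p10, finland_p90):.2f}")

# Rescaling both edges by any constant leaves the ratio unchanged
scale = 25_000  # hypothetical U.S. median in dollars
print(f"same ratio in dollars: {ratio_90_10(finland_p10 * scale, finland_p90 * scale):.2f}")
```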

Reference: "America: More Like Sweden Than You Think", TCS Daily, Aug 26 2006.

Oct 06, 2006

For love of Color

Derek C. pointed us to this piece of chartjunk on Wikipedia.  This chart compares the masses of solar system objects, relative to the Earth's mass.

Derek's comment:

The bars are inappropriate, as their length is proportional to the logarithm of the ratio of the masses of the object and the Earth. Also the multiple colours are distracting.
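Derek's first point can be seen with a few numbers. In this Python sketch the relative masses are approximate, rounded values that I am supplying for illustration; they are not read off the Wikipedia chart. On a log scale the Earth itself gets a bar of zero length, anything lighter gets a negative length, and the Sun's bar is only about twice as long as Jupiter's even though the Sun is roughly a thousand times more massive.

```python
import math

# Approximate masses relative to Earth (rounded, for illustration only)
relative_mass = {"Sun": 333_000, "Jupiter": 318, "Earth": 1, "Mars": 0.107}

def log_bar_length(ratio):
    """Bar length when the axis is the base-10 log of the mass ratio."""
    return math.log10(ratio)

for name, ratio in relative_mass.items():
    print(f"{name:8s} mass ratio {ratio:>10}  log-scale bar {log_bar_length(ratio):6.2f}")
```

Since bar length no longer tracks the quantity being compared, dots on a log axis are the safer encoding.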

I'm also mystified by the first bar called "Solar System".  It seems to convey the idea that the Solar System is much larger than the Earth;  combined with the second bar ("Sun"), it tells us that every object but the Sun pales into insignificance.  If this is true, then the Solar System needs to be labelled differently as it is not a "solar system object".

Derek sent in a much improved chart:

Derekc_solar

His version is much cleaner.  The axis labels, properly oriented, are much easier to read.  The use of color is admirably restrained: I suspect that he is as baffled as I about the asterisks (now blue dots) in the original chart. I'd retain the vertical line through the Earth (relative mass = 1) to help anchor the chart.

But a job well done!  He should send it in to the powers that be at Wikipedia.

