What if the RNC assigned seating randomly

The punditry has spoken: the most important data question at the Republican Convention is where different states are located. Here is the FiveThirtyEight take on the matter:


They crunched some numbers and argue that Trump's margin of victory in the state primaries is the best indicator of how close to the front that state's delegation is situated.

Others have put this type of information on a map:


The scatter plot with the added "trendline" is often misleading. Your eyes are drawn to the line, and distracted from the points that are far away from the line. In fact, the R-squared of the regression line is only about 20%. This is quite obvious from the distribution of green shades in the map below.


So, I wanted to investigate the question of how robust this regression line is. The way statisticians address this question is as follows: imagine that the seating has been assigned completely at random - how likely is it that the actual seating plan would have arisen from random assignment?

Take the seating assignments from the scatter plot. Then randomly shuffle the assignments to create simulated random seating plans. We keep the same slots: for example, four states were given #1 positions in the actual arrangement, so in every simulation, four states get #1 positions. The only difference is that which four states get them is decided by flipping coins.

I did one hundred simulated seating plans at a time. For each plan, I created the scatter plot of seating position versus Trump margin (mirror image of the FiveThirtyEight chart), and fitted a regression line. The following shows the slopes of the first 200 simulations:


The more negative the slope, the more power Trump margin has in explaining the seating arrangement.
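
For readers who want to try the shuffling procedure themselves, here is a minimal sketch in Python. It is not the code behind the charts in this post, and the margins and seating positions are made-up numbers used only for illustration (the real data are not reproduced here). The function names `ols_slope` and `simulate_slopes` are my own.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate_slopes(margins, positions, n_sims=200, seed=2016):
    """Shuffle the seating positions across states and refit the line each time.

    The set of slots stays fixed (e.g. the same number of #1 positions);
    only which state gets which slot changes.
    """
    rng = random.Random(seed)
    shuffled = list(positions)
    slopes = []
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        slopes.append(ols_slope(margins, shuffled))
    return slopes

# HYPOTHETICAL data: Trump margin (percentage points) and seating position
# (1 = front) for eight imaginary delegations.
margins = [45, 30, 22, 18, 10, 5, -3, -12]
positions = [1, 1, 2, 2, 3, 3, 4, 4]

actual = ols_slope(margins, positions)
slopes = simulate_slopes(margins, positions)

# Share of random plans with a slope at least as negative as the actual plan
share = sum(s <= actual for s in slopes) / len(slopes)
print(f"actual slope: {actual:.4f}, share of random plans as extreme: {share:.3f}")
```

Shuffling the positions, rather than drawing new ones, is what keeps the slots identical across simulations.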

Notice that even though all these plans are created at random, the slopes vary widely in magnitude. In fact, there is one randomly created plan that sits right below the actual RNC plan shown in red. So, it is possible--but very unlikely--that the RNC plan was drawn up at random.

Another view of this phenomenon is the histogram of the slopes:


This again shows that the actual seating plan is very unlikely to be produced by a random number generator. (I plotted 500 simulations here.)

In statistics, we measure rarity by "standard errors". The actual plan is almost but not quite three standard errors away from the average random plan. A rule of thumb is that 3 standard errors or more is rare. (This corresponds to over 99% confidence.)
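
The "standard errors away" calculation can be sketched as follows, assuming the spread of the simulated slopes stands in for the standard error. The slope values in the example are made up for illustration, and the function name `standard_errors_away` is hypothetical. The last lines check the rule of thumb against the normal distribution, which is itself an assumption about the shape of the histogram.

```python
from statistics import NormalDist, mean, stdev

def standard_errors_away(actual_slope, simulated_slopes):
    """Distance of the actual slope from the average simulated slope,
    measured in standard deviations of the simulated slopes."""
    return (actual_slope - mean(simulated_slopes)) / stdev(simulated_slopes)

# HYPOTHETICAL slopes from a handful of random plans, plus a made-up actual slope
simulated = [-1.0, 0.0, 1.0]
z = standard_errors_away(-3.0, simulated)

# Share of a normal distribution lying within 3 standard errors of the mean
within_three = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(f"standard errors away: {z:.2f}, normal coverage within 3: {within_three:.4f}")
```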


PS. Does anyone have the data corresponding to the original scatter plot? There are other things I want to do with the data but I'd need to find (a) the seating position by state and (b) the primary results nicely set in a spreadsheet.

Confusion is not limited to complex dataviz

This chart looks simple and harmless but I find it disarming.


I usually love the cheeky titles in the Economist but this title is very destructive to the data visualization. The chart has nothing to do with credit scores. In fact, credit scoring is associated with consumers while countries have credit ratings.

Also, I am not a fan of the Economist way of labeling negative axes. The negative sign situated between 0 and 1 looks like a stray hyphen that the editor missed.

A line chart would have brought out the pattern more sharply:


The pairing of columns in the original chart signals that readers should compare GDP growth to population growth. A good point, since GDP scales with population.

Controlling for population size can be accomplished by the per-capita GDP growth rate.
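
As a rough sketch of that adjustment: per-capita GDP is GDP divided by population, so its growth rate follows from the two growth rates in the original chart. The figures below are made up for illustration and are not Puerto Rico's actual numbers; the function name `per_capita_growth` is my own.

```python
def per_capita_growth(gdp_growth_pct, pop_growth_pct):
    """Per-capita GDP growth (in percent) implied by GDP growth and
    population growth (both in percent) over the same period."""
    ratio = (1 + gdp_growth_pct / 100) / (1 + pop_growth_pct / 100)
    return (ratio - 1) * 100

# HYPOTHETICAL year: GDP shrinks 1% while population shrinks 2%
example = per_capita_growth(-1.0, -2.0)
print(f"per-capita GDP growth: {example:.2f}%")
```

When population falls faster than GDP, as in this made-up year, per-capita growth comes out positive even though total GDP is shrinking.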


The last three years are clearly different. By this metric, different in a good way.

This chart creates a problem for the journalist. The article is about the deal to "save" Puerto Rico, which some have criticized as colonial. Presumably, the territory has been in dire straits. There are plenty of metrics to illustrate this point but GDP growth is not one of them.