The Richest 0.1% II

I return to the NYT article on income disparity, with a note and a discussion.  First, note that "the IRS data tend to understate incomes for those at the very top because of different rules for reporting wages and capital gains".

Second, they cited Bruce Bartlett, a "fiscally conservative Republican tax expert", who found it "remarkable" that

"just 129,000 tax filers pay more than 15 percent of all federal income taxes".

Mr. Bartlett may have been musing at this chart:

[Chart: Redotaxes3]

The level of inequality appears even greater than the income disparity.  Nevertheless, his observation commits the same fallacy as when left-leaning tax experts lament that the top 20% earned nearly 60% of all income (as we saw previously from the Lorenz curve).

It is the fallacy of the hidden strawman: both statements invite us to make comparisons to the diagonal line of the Lorenz curve, which is a strawman because that line of so-called "equality" is an irrelevance.  I have debunked the argument that a society in which everyone earns equal pay (e.g. the top 20% get 20% of all income) can function.  The idea that everyone pays equal taxes (e.g. the top 0.1% pay 0.1% of all taxes) is similarly absurd.

The key mischief of both these arguments is the use of a misleading scale.  A statement about the top 1% of taxpayers implies a ranking of all taxpayers from highest to lowest, followed by the removal of information about their differential incomes (or taxes paid), retaining only the ranks for analysis.  In effect, this change of scale (from numeric to ordinal) treats every consecutive pair of taxpayers as having the same difference in income (or taxes paid), which is obviously false.

[Chart: Redotaxes4]

In this next chart, I added the blue curve, which plots the cumulative % of taxes paid against the cumulative % of income earned.  The purported tax disparity is now revealed to be much smaller than asserted: for example, the people who earned the top 20% of income paid a bit more than 30% of taxes.

Observe also that the blue curve is much less steep at the origin than the orange curve (which uses the cumulative % of population on the x-axis), revealing that Mr. Bartlett's statement is a gross exaggeration.
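The difference between the two curves can be sketched in a few lines of Python.  This is only an illustration: the incomes and taxes below are made-up numbers (not the IRS data behind the charts), and the helper name cumulative_shares is my own.  The point is that the same cumulative tax shares look very different depending on whether they are plotted against cumulative population (ranks) or cumulative income.

```python
def cumulative_shares(values):
    """Return the running share of the total (0..1) for values in the given order."""
    total = sum(values)
    shares = []
    running = 0.0
    for v in values:
        running += v
        shares.append(running / total)
    return shares

# Hypothetical taxpayers, already sorted from highest to lowest income.
# These figures are invented purely for illustration.
incomes = [500, 200, 100, 80, 60, 30, 20, 10]
taxes = [175, 60, 25, 16, 9, 3, 1, 0]

n = len(incomes)
pop_share = [(i + 1) / n for i in range(n)]   # ordinal scale: ranks only
income_share = cumulative_shares(incomes)     # numeric scale: actual income
tax_share = cumulative_shares(taxes)

# "Orange" curve: x = pop_share, y = tax_share
# "Blue" curve:   x = income_share, y = tax_share
print("population  income  taxes")
for p, i, t in zip(pop_share, income_share, tax_share):
    print(f"{p:10.1%} {i:7.1%} {t:6.1%}")
```

In this toy example the top taxpayer is one-eighth of the population but earns half of the income, so the tax share looks far more lopsided against the population axis than against the income axis.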

The Richest 0.1% I

Income distribution is often presented in "Lorenz curves".  The following is an attempt to defy conventional wisdom:


This chart fails my self-sufficiency test...


A few elements of this graph are confusing.  The same set of shading is used to classify two different variables.  The annotation of "Top" and "Bottom" appears arbitrary.  The rightmost column, representing the top 0.1% of taxpayers, has the same width as the leftmost column, representing the bottom 20%.

[Chart: Redotaxdist1]

Why not stick with a Lorenz curve?  This presentation is versatile.  The diagonal is a line of "equality": for example, the (20%,20%) point on this line indicates that the top 20% of the population (ranked by decreasing income) took exactly 20% of the income growth in 2003.

A lot of information can be read off this chart: the top 0.1% of earners took about 25% of the total income increase; the top 1% took 40%; the top 20% took almost 80%.  In general, a curve that bends away from the diagonal (like this one) depicts severe inequality.
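Reading a point off this kind of curve amounts to a simple calculation, sketched below in Python.  The function name top_share and the sample figures are hypothetical, chosen only to show the mechanics; they are not the NYT or IRS numbers.

```python
def top_share(amounts, fraction):
    """Share of the total held by the top `fraction` of the population.

    `amounts` holds one value per person (e.g. income gain); people are
    ranked by decreasing amount, as on the curve described above.
    """
    ranked = sorted(amounts, reverse=True)
    # Number of people in the top group (at least one person).
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical income gains for ten people (invented for illustration).
gains = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]

for f in (0.1, 0.2, 0.5):
    print(f"top {f:.0%} of people took {top_share(gains, f):.0%} of the gains")

# On the "equality" diagonal, the top 20% would take exactly 20%.
equal = [5] * 10
print(f"equal split: top 20% took {top_share(equal, 0.2):.0%}")
```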

However, in many practical situations, as here, comparison with the diagonal is meaningless.  The "ideal" society would probably not be one in which annual income growth is equally distributed among all taxpayers.  (I'll leave the rationalization to economists and sociologists.)

More helpful is a graph that shows relative changes in inequality.  If I add a second curve (orange) showing the distribution of income (as opposed to income growth), then we see a trend of increasing inequality!  The large bend from the diagonal indicates that income distribution is far from "equal"; what's more, the distribution of incremental income is even more skewed.

For more analysis: The Richest 0.1% II

Reference: "At the Very Top, a Surge in Income in '03", New York Times, Oct 5 2005.

The college premium: trends and changes

Those of us who "invested" a small fortune to get a college education would be comforted to see this pair of charts:

[Chart: 02uchigraphic]

These charts do a good job in delivering the key messages.  Some of you will approve of the "Percentage employed" chart which abides by the start-at-zero rule; in fact, the graph looks exactly the same even if we extended the time-line back to 1995.  For the "Earnings gap" chart, I prefer to explain the data as an index with the high-school grad earnings = 100.

[Chart: Wageprem]

The above line charts are great for revealing the trends in time series data.  They, however, hide the period-to-period changes.  The graphic on the right indicates that there were only 2 down years for college grads between 1980 and 2000.  It also shows that while 2001 was the peak year in terms of wage differential, college grads made the most gains during the early 1980s (after some big losses in the late 1970s).

[Chart: Enrollprem]

Because of the start-at-zero rule, it is impossible to see period-to-period changes in the employment rate chart.  In creating the second chart to our right, I first created an employment rate index similar to the wage premium index above, using the high-school grad employment rate = 100; then, I plotted how the index changed over time.  Here, we are surprised to see that there have been the same number of up and down quarters in the past 10 years.
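The two steps described above (index against the high-school baseline, then period-to-period changes) can be sketched in Python as follows.  The function names and the figures are hypothetical and serve only to illustrate the arithmetic; they are not the data behind the NYT charts.

```python
def to_index(series, base):
    """Express `series` as an index with the matching `base` value = 100."""
    return [100.0 * s / b for s, b in zip(series, base)]

def period_changes(index):
    """Change in the index from each period to the next."""
    return [later - earlier for earlier, later in zip(index, index[1:])]

# Hypothetical earnings for four consecutive periods (invented numbers).
college = [30000, 31500, 33000, 32500]
high_school = [20000, 20500, 21000, 21000]

index = to_index(college, high_school)   # high-school grad earnings = 100
changes = period_changes(index)

ups = sum(1 for c in changes if c > 0)
downs = sum(1 for c in changes if c < 0)

print("index:  ", [round(x, 1) for x in index])
print("changes:", [round(c, 1) for c in changes])
print(f"{ups} up periods, {downs} down periods")
```

Plotting the changes rather than the index itself is what makes the up and down periods countable at a glance.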

Reference: "College Still Counts, Though Not as Much", New York Times, Oct 2 2005.

The self-sufficiency test

One test I use to judge the worthiness of data graphics is "self-sufficiency": can the graphical elements stand on their own feet?  If one removes the numbers from the graphic, can one still understand the key messages?

Graphical elements such as bars, lines, dots and pie slices encode data into their lengths, widths, heights, sizes, angles and so on.  Oftentimes, the actual numbers are printed beside these elements.  The numbers may serve one of two purposes: sometimes, to satisfy those readers who would want to know the precise numbers, not just rough visual estimates; sometimes, to cover up flaws in the design because the graphical elements cannot be interpreted without printing the numbers.  The "self-sufficiency" test detects this last instance when the graphic designer has failed.  In these situations, the data charts are superfluous; the graphical elements did nothing more than regurgitate the underlying tables of numbers.

Below are examples of charts that fail the self-sufficiency test:

(1) The ARV therapy chart

Without the numbers, the Sub-Saharan Africa bar becomes rather meaningless: we can't tell how far "off the chart" it is.  The message concerning percent of needs met disappears when the right column of numbers is removed.

(2) The Costco chart

When the numbers are removed, there is no scale and thus no ability to gauge the sizes of the dots.  The problem of visually estimating dot sizes was addressed here.


The following charts pass the self-sufficiency test:

(1) The poll result bar chart
Here, with numbers removed, the reader can still read off the result with no problems.  Of course, it won't satisfy the readers who demand to know precise percentages but then I'd refer such people to an appendix of data tables.

(2) The poll result percentile matrix

The other format used by Clarin in Argentina also preserves its value when the numbers are removed.  In this case, patient readers can even read off the precise percentages.


(3) The Bundestag Bumps chart


All the key messages, concerning changes in party leadership and relative sizes of the parties, are intact.  The missing scale is easily remedied by putting one on the side but note that there is no need to print all the data.