In the sister blog, I featured a data graphic showing the difference in pay levels between white male workers in the U.S. and women workers of various races. That post discusses purely visual matters, in effect accepting the analysis as given. In this post, I take a deeper look at the underlying data analysis.

Let us review the analysis strategy behind the chart, and then discuss why this simple strategy is not particularly insightful.

The analyst's starting point is income data collected by the Census Bureau. Even here, income is measured in several ways. Personal income (as opposed to household or family income) appears to be the most appropriate measure. There is an additional complication of how to handle "missing values", which may arise in this context because someone is not employed and thus earns zero income. When one says "median income", does one include or exclude the zero earners? What about those who only worked part of the year?

After those questions are addressed, one can work with median incomes, computed at the right level of aggregation. The report linked to by *Business Insider* only contains aggregation by race, and aggregation by gender, each calculated separately. What is needed is a cross-tabulation. It is not possible to obtain the median income of Asian women from the median income of Asians and the median income of women - unless the analyst makes unwarranted assumptions.
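To see why the cross-tabulation cannot be recovered from the separate aggregates, here is a minimal sketch using two made-up populations. The race labels "A" and "B" and all income values are hypothetical, chosen only for illustration; they are not Census figures. Both populations have identical medians by race and identical medians by gender, yet the median for women of race A differs.

```python
from statistics import median

# Hypothetical toy populations of (race, gender, income); not Census data.
population_1 = [
    ("A", "men", 10_000), ("A", "women", 30_000),
    ("B", "men", 30_000), ("B", "women", 10_000),
]
population_2 = [
    ("A", "men", 20_000), ("A", "women", 20_000),
    ("B", "men", 20_000), ("B", "women", 20_000),
]

def median_income(population, race=None, gender=None):
    """Median income among records matching the given race and/or gender."""
    return median(
        income
        for r, g, income in population
        if (race is None or r == race) and (gender is None or g == gender)
    )

for name, pop in [("population 1", population_1), ("population 2", population_2)]:
    print(
        name,
        "| race A:", median_income(pop, race="A"),
        "| women:", median_income(pop, gender="women"),
        "| race-A women:", median_income(pop, race="A", gender="women"),
    )
```

The two populations agree on every by-race and by-gender median but disagree on the race-by-gender cell, so the cell value has to come from the data itself, not from the marginal summaries.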

So the analyst learns that white men made $46,000 in 2017 while Asian women made $27,000 and Hispanic women made $21,000. This data can be plotted directly, or after computing the race-gender discount (off the white male wage level).
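As a rough sketch of the discount computation, using only the three rounded medians quoted above (the function and variable names are mine, for illustration):

```python
# Median personal incomes quoted in the post (2017, rounded to the nearest $1,000).
median_income = {
    "white men": 46_000,
    "Asian women": 27_000,
    "Hispanic women": 21_000,
}

def discount(group_income, reference_income):
    """Fraction by which a group's median falls short of the reference median."""
    return 1 - group_income / reference_income

reference = median_income["white men"]
discounts = {
    group: discount(income, reference)
    for group, income in median_income.items()
    if group != "white men"
}

for group, d in discounts.items():
    print(f"{group}: {d:.0%} below the white male median")
```

With these rounded inputs, the discounts come to roughly 41% for Asian women and 54% for Hispanic women.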

***

The analyst wants to **make this data come alive** by using a different unit, the number of days worked.

One way to achieve this is by converting the annual salary to daily salary, then computing how many days the median Asian woman must work in order to earn $46,000, the median for the white man. This is roughly 611 days, which suggests that the Asian woman must work 246 extra days.
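The conversion can be sketched as follows. This is a minimal illustration that assumes a 365-day working year and uses the rounded medians; the function name and the `days_per_year` parameter are hypothetical. With the rounded figures the arithmetic comes to about 622 days, so a figure like 611 would correspond to slightly different (for instance, unrounded) inputs, which is an assumption here.

```python
def days_to_match(group_income, reference_income, days_per_year=365):
    """Days a worker must work, at her own daily rate, to earn the reference income.

    Assumes pay accrues evenly over `days_per_year` working days.
    """
    daily_pay = group_income / days_per_year
    return reference_income / daily_pay

# Rounded 2017 medians quoted in the post.
asian_women = 27_000
white_men = 46_000

days = days_to_match(asian_women, white_men)
extra_days = days - 365

print(f"days needed: {days:.0f}, extra days: {extra_days:.0f}")
```

Because the result scales linearly with `days_per_year`, any change to the assumed number of working days changes the "extra days" figure in the same proportion.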

In the above description, **I have seamlessly papered over several annoying details**. I have assumed, without checking against reality, that everyone works 365 days a year. In fact, no one works 365 days a year. And even if I obtained the correct value of the average number of days worked, call it X, X would still vary by race and gender. Thus, I made a further implicit assumption that such variation is not large enough to bother about.

I justify this lack of care because I rounded the median salaries to the nearest thousand. The questions I raised above ought to concern those analysts who insist on printing estimates in decimals. Is it possible to attain that level of precision while making simplifying assumptions?

***

Also notice that the analysis strategy is **counterfactual** in nature - it requires conjuring up a hypothetical scenario. It's a comparison of a white man who works exactly one year, and a woman (of any race) who works till she earns $46,000 or whatever is the current median wage for white men.

**The notion of "extra days" is an invention** since there are only 365 days in a year, and the women can never catch up unless the white guys stop working.

***

When comparing white men and Asian women, both gender and race are shifted. The *Business Insider* story leads readers to attribute the wage gap primarily to gender, but unless we see the median salaries for Asian men, Hispanic men and black men, we can't be sure.

Most likely, there is a gender "effect" as well as a race "effect". The gender effect may even be differently sized for each race. This is known as an "interaction" effect.

***

Finally, there are even more factors to be considered. It is well known that at least some of the wage gap is explained by the difference in the **mix of jobs and industries** that men and women tend to be employed in. So one can't conclude discrimination without further investigation.

Unequal pay for equal work is discrimination but unequal pay for unequal work need not be.

***

Now, check out my comments about the calendar visualization of the wage gaps by *Business Insider*.

I wonder if another way to make this data come alive would be to base it on an 8-hour workday and show that black women would have to work a 13.1 hour day to earn the same pay, etc. That way we get around the "extra days" problem.

Posted by: Howie | 08/28/2019 at 11:56 AM

Howie: Nice comment. It works so long as the additional hours do not push the total beyond 24 hours. This nicely points out why selecting the right scale is important! (Some people dismiss re-scaling as pointless.)

Posted by: Kaiser | 08/28/2019 at 12:18 PM