The management at The New York Times was smart enough to create a data science team, but apparently the newsroom has not yet been exposed to STAT 101.
Today, NYT printed this headline: "Top C.E.O. Pay Fell -- Yes, Fell -- in 2015" (link to article). I immediately thought, not so fast: are they talking about average pay or median pay?
Anyone who has taken STAT 101 knows that when analyzing data with extreme values (CEO pay would clearly fall in this bucket), one should use median values, not averages. Since this data came from a survey of 200 top-paid CEOs, the median would be the pay of the CEO ranked right in the middle of the pack (with 200 values, the midpoint between the 100th and 101st).
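To see how an average and a median can move in opposite directions, here is a minimal Python sketch. The pay figures are made up purely for illustration (they are not the Equilar data), and `pct_change` is a helper I am defining here: one outlier shrinking is enough to pull the average down even though every other value goes up.

```python
from statistics import mean, median

# Hypothetical pay figures in $ millions; NOT the Equilar data.
# The last entry in each year is a single extreme value.
pay_2014 = [12, 13, 14, 15, 16, 150]
pay_2015 = [13, 14, 15, 16, 17, 60]


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


mean_change = pct_change(mean(pay_2014), mean(pay_2015))
median_change = pct_change(median(pay_2014), median(pay_2015))

print(f"average pay change: {mean_change:+.1f}%")
print(f"median pay change:  {median_change:+.1f}%")
```

In this toy data set, five of the six values rise by $1 million, so the median goes up, yet the average drops sharply because the one extreme value fell.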
A few paragraphs in, my suspicion was confirmed:
After years of steady increases, the average compensation among the top executives in 2015 was down 15 percent from the 2014 figure of $22.6 million, according to the Equilar 200 Highest-Paid C.E.O. Rankings, conducted for The New York Times.
Are they using average values because the median values were unavailable? This seems strange since the Equilar study should have all 200 values.
Clicking on the link and looking at the very first bullet point under "Key Trends and Takeaways" yields the following:
Median pay for Equilar 200 CEOs was $16.6 million in fiscal year 2015. The CEOs on this list saw a 5% pay increase at the median.
Instead of a 15% drop in pay, the Equilar analyst concluded that there was a 5% increase in pay. We also learned that the year before, the median pay jumped by 21%.
Not only did NYT get the statistics wrong, but it also loudly screamed the wrong conclusion in the headline.
We demand better.
PS. The median of $16.6 million means that the 100 CEOs in the bottom half of this list earned between $12.2 million (the minimum to get on the list) and $16.6 million, a range of $4.4 million. If one takes the $4.4 million in the other direction and looks at the range from $16.6 million to $21.0 million, that includes 54 CEOs. Take another $4.4 million up to $25.4 million, and there are only 25 CEOs above this number. This is how you can see the pay data is heavily skewed.
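The band counts above can be laid out as a rough text histogram. This is a sketch that uses only the numbers quoted in the PS; the count for the $21.0 million to $25.4 million band is not stated directly, so it is derived here by subtraction, and the variable names are my own.

```python
total = 200          # CEOs on the list
bottom_half = 100    # $12.2M to $16.6M
next_band = 54       # $16.6M to $21.0M
above_top = 25       # above $25.4M

# Derived, not quoted: whoever is left falls between $21.0M and $25.4M.
third_band = total - bottom_half - next_band - above_top

width = 4.4  # band width in $ millions
edges = [round(12.2 + i * width, 1) for i in range(4)]

bands = [
    (f"${edges[0]}M to ${edges[1]}M", bottom_half),
    (f"${edges[1]}M to ${edges[2]}M", next_band),
    (f"${edges[2]}M to ${edges[3]}M", third_band),
    (f"above ${edges[3]}M", above_top),
]

for label, count in bands:
    # One '#' per 2 CEOs keeps the bars short.
    print(f"{label:<18} {count:>3} {'#' * (count // 2)}")
```

Half of the list sits in the first $4.4 million band, while the remaining half is spread over a long tail stretching well past $25.4 million.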