A reader didn't like this graphic in the Wall Street Journal:
One could turn every panel into a bar chart, but unfortunately the situation would not improve much. Some charts just can't be fixed by altering the visual design.
The chart is frustrating to read: typically, colors are used to signify objects that should be compared. Focus on the brown wedges for a moment: Basic EDA 46%, Data cleaning 31%, Machine learning 27%, etc. Those are proportions of respondents who said they spent 1 to 3 hours a day on the respective tasks. That is one weird way of describing time use. The people who spent 1 to 3 hours a day on EDA do not necessarily overlap with those who spent 1 to 3 hours a day on data cleaning. In addition, there is no summation formula that lets us know how any individual, or the average data scientist, spends his or her time during a typical day.
But none of this is the graphics designer's fault.
The trouble with the chart lies in the D (Data) corner of the Trifecta Checkup. The survey question was poorly posed. The data came from a study by O'Reilly Media, which asked questions of this form:
How much time did you spend on basic exploratory data analysis on average?
A. Less than 1 hour a week
B. 1 to 4 hours a week
C. 1 to 3 hours a day
D. 4 or more hours a day
It is not obvious that those four levels are collectively exhaustive. In fact, they aren't. One hour a day for five working days is a total of 5 hours a week, so option C starts at 5 hours a week while option B stops at 4. Those who spent between 4 and 5 hours a week have nowhere to go.
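The gap can be found mechanically by converting every option to hours per week and looking for stretches that no option covers. The sketch below assumes, as the post does, a 5-day working week, and additionally assumes that time is measured continuously rather than in whole hours. Under that second assumption it also turns up a gap between 3 and 4 hours a day (15 to 20 hours a week), by the same reasoning.

```python
# Assumption: a 5-day working week, and hours can take any real value.
WORKDAYS_PER_WEEK = 5

# Each answer option expressed as a (low, high) interval of hours per week.
OPTIONS = {
    "A": (0.0, 1.0),                                          # less than 1 hour a week
    "B": (1.0, 4.0),                                          # 1 to 4 hours a week
    "C": (1.0 * WORKDAYS_PER_WEEK, 3.0 * WORKDAYS_PER_WEEK),  # 1 to 3 hours a day
    "D": (4.0 * WORKDAYS_PER_WEEK, float("inf")),             # 4 or more hours a day
}


def find_gaps(options):
    """Return the (low, high) ranges of weekly hours that no option covers.

    Intervals that merely touch at an endpoint are treated as contiguous,
    so only gaps of positive width are reported.
    """
    gaps = []
    covered_up_to = 0.0
    for low, high in sorted(options.values()):
        if low > covered_up_to:
            gaps.append((covered_up_to, low))
        covered_up_to = max(covered_up_to, high)
    return gaps


print(find_gaps(OPTIONS))  # [(4.0, 5.0), (15.0, 20.0)]
```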
Further, if one had access to the individual responses, one would likely find that many respondents' answers, added up across tasks, imply working either too many or too few hours in total.
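To see how such a check could work, here is a minimal sketch that converts each answer to a range of weekly hours (again assuming a 5-day working week) and adds the ranges across tasks for one respondent. The respondents, their answers and the "plausible week" thresholds of 20 to 50 hours are all hypothetical, invented for illustration; only the three task names come from the chart.

```python
WORKDAYS_PER_WEEK = 5  # assumption: 5 working days per week

# Weekly-hour interval implied by each answer option.
OPTIONS = {
    "A": (0.0, 1.0),                                          # less than 1 hour a week
    "B": (1.0, 4.0),                                          # 1 to 4 hours a week
    "C": (1.0 * WORKDAYS_PER_WEEK, 3.0 * WORKDAYS_PER_WEEK),  # 1 to 3 hours a day
    "D": (4.0 * WORKDAYS_PER_WEEK, float("inf")),             # 4 or more hours a day
}


def implied_weekly_hours(answers):
    """Bound a respondent's total weekly hours by summing per-task intervals."""
    low = sum(OPTIONS[a][0] for a in answers.values())
    high = sum(OPTIONS[a][1] for a in answers.values())
    return low, high


def classify(answers, plausible=(20.0, 50.0)):
    """Flag totals that fall entirely outside a (hypothetical) plausible week."""
    low, high = implied_weekly_hours(answers)
    if low > plausible[1]:
        return "too many hours"
    if high < plausible[0]:
        return "too few hours"
    return "plausible"


# Hypothetical respondents, for illustration only.
respondents = {
    "r1": {"Basic EDA": "D", "Data cleaning": "D", "Machine learning": "D"},
    "r2": {"Basic EDA": "A", "Data cleaning": "B", "Machine learning": "A"},
    "r3": {"Basic EDA": "C", "Data cleaning": "C", "Machine learning": "B"},
}

for rid, answers in respondents.items():
    print(rid, implied_weekly_hours(answers), classify(answers))
```

Respondent r1 must be working at least 60 hours a week on these three tasks alone, while r2 accounts for at most 6. Because each panel is tabulated separately, neither problem is visible in the published chart.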
The panels come from separate questions that bear no relationship to each other, even though the tasks are clearly linked: there are only so many working hours in a day.
To fix this chart, one must first fix the data. To fix the data, one must ask the right questions.
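As one illustration of what a better question might enable (a hypothetical redesign, not one proposed in the original survey), suppose each respondent reported hours per week for every task. Each person's hours could then be turned into shares of their own time, and the shares averaged across respondents. This gives the kind of summation the chart lacks: the average shares add up to 100%. The sketch assumes every respondent answers every task and reports a nonzero total. The response values are invented.

```python
from math import isclose
from statistics import mean

# Hypothetical responses: hours per week on each task, one dict per respondent.
responses = [
    {"Basic EDA": 10, "Data cleaning": 15, "Machine learning": 15},
    {"Basic EDA": 20, "Data cleaning": 10, "Machine learning": 10},
]


def time_shares(hours):
    """Convert one respondent's hours into fractions of their total time."""
    total = sum(hours.values())
    return {task: h / total for task, h in hours.items()}


def average_shares(responses):
    """Average each task's share across respondents (assumes identical task lists)."""
    per_respondent = [time_shares(r) for r in responses]
    tasks = per_respondent[0].keys()
    return {task: mean(s[task] for s in per_respondent) for task in tasks}


avg = average_shares(responses)
print(avg)  # {'Basic EDA': 0.375, 'Data cleaning': 0.3125, 'Machine learning': 0.3125}
print(isclose(sum(avg.values()), 1.0))  # True
```

Averaging per-person shares weights every respondent equally; dividing total hours per task by total hours overall would instead give more weight to people who work longer weeks. Which summary is appropriate depends on the question being asked.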