
Common charting issues related to connecting lines, labels, sequencing

The following chart about "ranges and trends for digital marketing salaries" has some problems that appear in a great number of charts.

[Image: MarketingSherpa chart of the week on digital marketing salaries]

The job titles are rotated, so readers must tilt their heads to read them.

The order of the job titles is baffling. It is neither alphabetical nor sorted by salary.
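One remedy is to order the job titles by a summary statistic such as the median salary. The sketch below assumes the individual survey responses are available as (job title, salary) pairs, which the chart itself does not show. The function name `order_by_median`, the titles, and the figures are all hypothetical placeholders, not MarketingSherpa's data.

```python
from collections import defaultdict
from statistics import median


def order_by_median(responses):
    """Return job titles sorted from highest to lowest median salary.

    `responses` is an iterable of (job_title, salary) pairs (hypothetical format).
    """
    salaries = defaultdict(list)
    for title, salary in responses:
        salaries[title].append(salary)
    medians = {title: median(values) for title, values in salaries.items()}
    # Highest median first, so the chart reads from best paid to least paid.
    return sorted(medians, key=medians.get, reverse=True)


# Hypothetical responses, for illustration only.
responses = [
    ("Title A", 60_000), ("Title A", 70_000),
    ("Title B", 120_000), ("Title B", 150_000), ("Title B", 130_000),
    ("Title C", 90_000),
]
print(order_by_median(responses))  # ['Title B', 'Title C', 'Title A']
```

Sorting by the median rather than the midpoint of the range keeps the ordering from being driven by a single extreme response.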

The visual form suggests that we could read salary trends from left to right, but the only information about trends is the year-on-year salary change printed at the top of the chart.

Some readers will violently object to the lines connecting the job titles, which are discrete categories. In this case, I agree with them. I am a fan of so-called profile charts, in which discrete categories are joined by lines, but those charts work because we are comparing the "profiles" of one group against another group. Here, there is only one group.

The overall N=3,567 is odd. It says nothing about the reliability of the estimate for, say, Chief Marketing Officer, which depends on how many respondents held that particular title.

***

A dot plot works well for this dataset, like this:

[Image: redo of the digital marketing salaries chart as a dot plot]

The salary range is not a great metric, because its endpoints could be outliers.
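A quick way to see the problem is to compare the range with a statistic that ignores the tails, such as the interquartile range (an alternative chosen for this illustration, not one the post prescribes). In this minimal sketch with hypothetical salaries in thousands of dollars, a single extreme response inflates the range from 40 to 350 while the interquartile range moves only from 25.0 to 27.5. The quartiles come from Python's `statistics.quantiles` with its default "exclusive" method; other quartile conventions give slightly different numbers.

```python
from statistics import quantiles


def salary_range(values):
    """Distance between the two most extreme responses."""
    return max(values) - min(values)


def interquartile_range(values):
    """Spread of the middle half of the responses (Q3 - Q1)."""
    # quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


base = [50, 55, 60, 65, 70, 75, 80, 85, 90]  # hypothetical salaries, $000s
with_outlier = base + [400]                   # one extreme response added

print(salary_range(base), salary_range(with_outlier))                # 40 350
print(interquartile_range(base), interquartile_range(with_outlier))  # 25.0 27.5
```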

Also, the variability of salaries is affected by two factors: the variability between companies, and sampling variability (which depends on the sample size for each job title). A wide range here could mean that different companies pay different salaries for the same job title, or that very few survey respondents held that job title.

Comments

conchis

"A wide range here could mean that ... very few survey responders held that job title."

I don't think this makes sense: more responses will (weakly) increase the range; they can't possibly reduce it.
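conchis's point is easy to check by tracking the range as responses accumulate. The sketch below uses hypothetical, normally distributed salaries (in thousands of dollars); `running_ranges` is an illustrative helper, not part of any library.

```python
import random


def running_ranges(values):
    """Range (max - min) after each additional response is included."""
    ranges = []
    lo = hi = values[0]
    for v in values:
        lo, hi = min(lo, v), max(hi, v)
        ranges.append(hi - lo)
    return ranges


random.seed(1)
# Hypothetical salary responses drawn from a normal distribution.
responses = [random.gauss(100, 20) for _ in range(500)]
ranges = running_ranges(responses)

# Each new response can only widen the range or leave it unchanged.
assert all(a <= b for a, b in zip(ranges, ranges[1:]))
print(round(ranges[9], 1), round(ranges[-1], 1))
```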

jlbriggs

@conchis - I don't think that's the point. The point I take from it is that a range could be wide because there are many diverse data points, or simply because there are two very different data points; without the sample size for each category, we can't tell.

We want to know.

mbab

@conchis @jlbriggs
I believe there may have been some confusion between the range (the observed minimum and maximum of each category) and a measure related to the standard deviation.
If few people answered, the range can be really wide, and so can the standard deviation. But with more answers, the standard deviation (or any measure linked to it) can decrease, whereas, as you rightly said, more answers cannot reduce the range. (Adding values between the minimum and the maximum obviously leaves that minimum and maximum unchanged.)
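The difference between the range, the standard deviation, and the standard error of the mean shows up in a small simulation. This is a sketch under stated assumptions: salaries for one job title are drawn from a hypothetical normal distribution (mean 100, standard deviation 20, in thousands of dollars), and the standard error is computed as s / sqrt(n). The helper `summarize` is illustrative only.

```python
import random
from math import sqrt
from statistics import stdev


def summarize(values):
    """Return (range, sample standard deviation, standard error of the mean)."""
    s = stdev(values)
    return max(values) - min(values), s, s / sqrt(len(values))


random.seed(42)
# Hypothetical salary responses ($000s) for a single job title.
population_draws = [random.gauss(100, 20) for _ in range(1000)]

# Nested samples: each larger sample contains the smaller ones.
for n in (10, 100, 1000):
    rng, sd, se = summarize(population_draws[:n])
    print(f"n={n:4d}  range={rng:6.1f}  sd={sd:5.1f}  se={se:4.1f}")
```

Because each larger sample contains the smaller ones, the range can only stay the same or grow. The sample standard deviation stays in the neighbourhood of 20 (noisier at n=10) rather than shrinking; what shrinks, roughly in proportion to 1/sqrt(n), is the standard error, the quantity that governs how reliable a per-title estimate is.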

The comments to this entry are closed.