Long-time reader Georgette A. found this chart from a Linkedin post by David Curran:
She found it hard to understand. Me too.
It's one of those charts that require some time to digest. And when you figured it out, you don't get the satisfaction of time well spent.
***
If I have to write a reading guide for this chart, I'd start from the right edge. The dataset consists of the top 2000 English words, ranked by popularity. The right edge of the chart says that roughly two-thirds of these 2000 words are of Germanic origin, followed by 20% French origin, 10% Latin origin, and 3% "others".
Now, look at the middle of the chart, where the 1000 gridline lies. The analyst did the same analysis but using just the top 1000 words, instead of the top 2000 words. Not surprisingly, Germanic words predominate. In fact, Germanic words account for an even higher percentage of the total, roughly three-quarters. French words are at 16% (relative to 20%), and Latin at 7% (compared to 10%).
The trend is this: as we restrict the word list to fewer and more popular words, the more Germanic words dominate. Of the top 50 words, all but 1 is of Germanic origin. (You can't tell that directly from the chart but you can figure it out if you measure it and do some calculations.)
Said differently, there are some non-Germanic words in the English language but they tend not to be used particularly often.
As we move our eyes from left to right on this chart, we are analyzing more words but the newly added words are less popular than those included prior. The distribution of words by origin is cumulative.
The problem with this data visualization is that it doesn't "locate" where these non-Germanic words exist. It's focused on a cumulative metric so the reader has to figure out where the area has increased and where it has flat-lined. This task is quite challenging in an area chart.
***
The following chart showing the same information is more canonical in the scientific literature.
This chart also requires a reading guide for those uninitiated. (Therefore, I'm not saying it's better than the original.)
The chart shows how words of a specific origin accumulates over the top X most popular English words. Each line starts at 0% on the left and ends at 100% on the right.
Note that the "other" line hugs to the zero level until X = 400, which means that there are no words of "other" origin in the top 400 list. We can see that words of "other" origin are mostly found between top 700-1000 and top 1700-2000, where the line is steepest. We can be even more precise: about 25% of these words are found in the top 700-1000 while 45% are found in the top 1700-2000.
In such a chart, the 45 degree line acts as a reference line. Any line that follows the 45 degree line indicates an even distribution: X% of the words of origin A are found in the top X% of the distribution. Origin A's words are not more or less popular than average anywhere in the distribution.
In this chart, nothing is on top of the 45 degree line. The Germanic line is everywhere above the 45 degree line. This means that on the left side, the line is steeper than 45 degrees while on the right side, its slope is less than 45 degrees. In other words, Germanic words are biased towards the left side, i.e. they are more likely to be popular words.
For example, amongst the top 400 (20%) of the word list, Germanic words accounted for 27%.
I can't imagine this chart is easy for anyone who hasn't seen it before; but if you are a scientist or economist, you might find this one easier to digest than the original.