Neither the forest nor the trees

The NYT's Twitter feed featured an article titled "These Seven Tech Stocks are Driving the Market". The first sentence of the article reads: "The S&P 500 is at an all-time high, and investors have just a handful of stocks to thank for it."

Without having seen any data, I'd surmise from that line that (a) the S&P 500 index has gone up recently, and (b) most if not all of the gain in the index can be attributed to gains in the tech stocks mentioned in the headline. (For purists, a handful is five, not seven.)

The chart accompanying the tweet is a treemap:

Nyt_magnificentseven

The treemap is possibly the most overhyped chart type of the modern era. Its use here is tangential to the story of surging market value. That's because the treemap presents a snapshot of the composition of the index, but contains nothing about the trend (change over time) of the average index value or of its components.

***

Even in representing composition, the treemap is inferior to, gasp, a pie chart. Of course, we can only use a pie chart for small numbers of components. The following illustration takes the data from the NYT chart on the Magnificent Seven tech stocks, and compares a treemap versus a pie chart side by side:

Junkcharts_redo_nyt_magnificent7

The reason the treemap is worse is that both the width and the height of the boxes vary, while only the angle of the pie slices does. (Not saying use a pie chart, just saying the treemap is worse.)

There is a reason why the designer appended data labels to each of the seven boxes. The effect of not having those labels is readily felt when our eyes reach the next set of stocks – which carry company names but not their market values. What is the market value of Berkshire Hathaway?

Even more so, what proportion of the total is the market value of Berkshire Hathaway? Indeed, if the designer did not write down 29%, it would take a bit of work to figure out the aggregate value of yellow boxes relative to the entire box!

This design successfully draws our attention to the structural importance of various components of the whole. There are three layers - the yellow boxes (Magnificent Seven), the gray boxes with company names, and the other gray boxes. I also like how they positioned the text in the right column.

***

Going inside the NYT article itself, we find two line charts that convey the story as told.

Here's the first one:

Nyt_magnificent7_linechart1

They are comparing the most recent stock prices with those from October 12, 2022, which is identified as the previous "low". (I'm actually confused by how the most recent "low" is defined, but that's a different subject.)

This chart carries a lot of good information, even though it does not plot "all the data", as in each of the 500 S&P components individually. Over the period under analysis, the average index value has gone up about 35% while the Magnificent Seven's value has skyrocketed by 65% in aggregate. The latter accounted for 30% of the total value at the most recent time point.

If we set the S&P 500 index value in 2024 as 100, then the M7 value in 2024 is 30. After unwinding the 65% growth, the M7 value in October 2022 was 18; the S&P 500 in October 2022 was 74. Thus, the weight of M7 was 24% (18/74) in October 2022, compared to 30% now. Consequently, the weight of the other 473 stocks declined from 76% to 70%.
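Here is that arithmetic as a quick script, treating the quoted growth rates (35% for the index, 65% for M7) and the 30% current weight as given:

# Back-of-envelope check of the M7 weight shift.
sp500_now = 100.0              # index value today, normalized to 100
m7_now = 30.0                  # M7 is about 30% of the index today

sp500_then = sp500_now / 1.35  # unwind the 35% index gain -> ~74
m7_then = m7_now / 1.65        # unwind the 65% M7 gain -> ~18

print(f"M7 weight in Oct 2022: {m7_then / sp500_then:.1%}")  # ~24%, consistent with 18/74 above
print(f"M7 weight now:         {m7_now / sp500_now:.1%}")    # 30%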

This isn't even the full story because most of the action within the M7 is in Nvidia, the stock most tightly associated with the current AI hype, as shown in the other line chart.

Nyt_magnificent7_linechart2

Nvidia's value jumped by 430% in that time window. From the treemap, the total current value of M7 is $12.3 trillion while Nvidia's value is $1.4 trillion, thus Nvidia is currently 11.4% of M7. Since M7 is 29% of the total S&P 500, Nvidia is 11.4%*29% = 3% of the S&P. Thus, in 2024, against 100 for the S&P, Nvidia's share is 3. After unwinding the 430% growth, Nvidia's share in October 2022 was 0.6, about 0.8% of 74. Its weight more than tripled during this period of time.
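The same arithmetic for Nvidia, taking the treemap values and the 430% growth figure as given:

# Nvidia's weight in the S&P 500, now versus October 2022.
m7_total = 12.3                 # total M7 market value from the treemap
nvda = 1.4                      # Nvidia's market value from the treemap
m7_share = 0.29                 # M7 as a share of the S&P 500

nvda_now = (nvda / m7_total) * m7_share * 100   # ~3.3 on an index-of-100 scale
nvda_then = nvda_now / 5.3                      # unwind the 430% gain -> ~0.6
sp500_then = 100 / 1.35                         # ~74 on the same scale

print(f"Nvidia weight now:      {nvda_now / 100:.1%}")           # ~3.3%
print(f"Nvidia weight Oct 2022: {nvda_then / sp500_then:.1%}")   # ~0.8%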


A nice plot of densities, but what's behind the colors?

I came across this chart by Planet Anomaly that compares air quality across the world's cities (link). The chart is in long form. The top part looks like this:

Visualcapitalist_airqualityinches_top

The bottom part looks like this:

Visualcapitalist_airqualityinches_bottom

You can go to the Visual Capitalist website to see the entire chart.

***

Plots of densities are relatively rare. The metric for air quality is micrograms of fine particulate matter (PM) per cubic meter, so showing densities is natural.

It's pretty clear that the cities with the worst air quality at the bottom have a lot more PM in the air than the cleanest cities shown at the top.

This density chart plays looser with the data than our canonical chart types. The perceived densities of dots inside the squares do not represent the actual concentrations of PM. It's certainly not true that in New Delhi, the air is packed tightly with PM.

Further, a random number generator is required to scatter the red dots inside each square. Thus, different software or designers will make the same chart look a bit different - the densities will be the same but the locations of the dots will not be.

I don't have a problem with this. Do you?
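For the curious, here is a minimal sketch of how such a random scatter could be generated. It assumes one dot per microgram of PM, placed uniformly at random in a unit square, with made-up readings; the actual design's dot-to-PM ratio and layout are not documented, so those are my assumptions.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # a different seed gives different dot positions, same density

# Hypothetical PM2.5 readings in micrograms per cubic meter (not the chart's actual data).
cities = {"Clean city": 5, "Middling city": 35, "Polluted city": 110}

fig, axes = plt.subplots(1, len(cities), figsize=(9, 3))
for ax, (name, pm) in zip(axes, cities.items()):
    xy = rng.uniform(0, 1, size=(pm, 2))  # one dot per microgram, placed at random
    ax.scatter(xy[:, 0], xy[:, 1], s=10, color="firebrick")
    ax.set(title=f"{name}: {pm} µg/m³", xticks=[], yticks=[], xlim=(0, 1), ylim=(0, 1))
plt.tight_layout()
plt.show()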

***

Another notable feature of this chart is the double encoding. The same metric is not just presented as densities; it is also encoded in a color scale.

Visualcapitalist_airqualityinches_color_scale

I don't think this adds much.

Both color and density are hard for humans to perceive precisely, so adding color does not bring any more precision to readers.

The color scale is gradated, so it effectively divides the cities into seven groups. But I don't attach particular significance to that classification. If the grouping were important, it would be clearer to put boxes around the groups of plots. So I don't think the color scale conveys clustering to readers effectively.

There is one important grouping, which is defined by WHO's safe limit of 5 µg per cubic meter. A few cities pass this test while almost every other place fails. But the design pays no attention to this threshold: it uses the same hue on both sides, and even the tint changes are the same on either side of the limit.
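If the 5 µg threshold is the story, one option is a color scale that switches hue exactly at that value. Here is a sketch using matplotlib's TwoSlopeNorm; the colormap and the sample readings are my own choices, not the original design's.

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

pm = np.array([3, 5, 12, 25, 55, 84, 110])             # hypothetical city readings (µg per cubic meter)
norm = TwoSlopeNorm(vmin=0, vcenter=5, vmax=pm.max())  # hue flips at the WHO limit

plt.scatter(np.arange(len(pm)), np.zeros(len(pm)), c=pm, cmap="RdYlGn_r", norm=norm, s=200)
plt.colorbar(label="PM2.5 (µg per cubic meter)")
plt.yticks([])
plt.show()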

***

Another notable project that shows densities as red dots is this emotional chart by Mona Chalabi about measles, which I wrote about in 2019.

Monachalabi_measles

 


Messing with expectations

A co-worker pointed me to the following map, found in Forbes:

Forbes_gastaxmap

It shows the amount of state tax surcharge per gallon of gas in the U.S. And it's got one of the most common issues found in choropleth maps - the color scheme runs opposite to reader expectations.

Typically, if we see a red-green color scale, we would expect red to represent large numbers and green, small numbers. This map reverses the typical setup: California, the state with the heftiest gas tax, is shown green.

I know, I know - if we apply the typical color scheme, California would bleed red, and it's a blue state, damn it.

The solution is to avoid the red color. Just don't use red or blue.

Junkcharts_redo_forbes_gastaxmap_green

There is no need to use two colors either.

***

A few minor fixes. Given that all dollar amounts on the map are shown to two decimal places, the legend labels should also be shown to two decimal places, and with dollar signs.

Forbes_gastaxmap_legend

The subtitle should read "Dollars per gallon" instead of "Cents per gallon". Alternatively, keep "Cents per gallon" but convert all data labels into cents.

Some of the states are missing data labels.

***

I recast this as a set of small multiples by categorizing states into four subgroups.

Junkcharts_redo_forbes_gastaxmap_split

With this change, one can almost justify using maps because there is sort of a spatial pattern.

 

 


To a new year of pleasant surprises

Happy new year!

This year promises to be the year of AI. Already last year, we pretty much couldn't lift an eyebrow without someone making an AI claim. This year will be even noisier. Visual Capitalist acknowledged this by making the noisiest map of 2023:

Visualcapitalist_01_Generative_AI_World_map sm

I kept thinking they have a geography teacher on the team who really, really wants to give us a lesson on where each country is on the world map.

All our attention is drawn to the guiding lines and the random scatter of numbers. We have to squint to find the country names. All this noise drowns out the attempt to make sense of the data, namely, the inset of the top 10 countries in the lower left corner, and the classification of countries into five colored groups.

A small dose of editing helps. Remove most data labels except for the countries for which they have a story. Provide a data table below for those who want details.

***

In the Methodology section, the data analysts (possibly from a third party called ElectronicsHub) indicated that they used Google search volume of "over 90 of the most popular generative AI tools", calculating the "overall volume across all tools per 100k population". Then came a baffling line: "all search volumes were scaled up according to the search engine market share in each country, using figures from statscounter.com." (Note: in the following, I'm calling the data "AI-related search" for simplicity even though their measurement is restricted to the terms described above.)

It took me a while to comprehend what they could have meant by that line. I believe this is what that sentence means: Google is not the only search engine out there so by only researching Google search volume, they undercount the true search volume. How did they deal with the missing data problem? They "scaled up" so if Google is 80% of the search volume in a country, then they divide the Google volume by 80% to "scale up" to 100%.

Whenever we use a heuristic like this, we should investigate its foundations. What is the implicit assumption behind this scaling-up procedure? It is that all search engines are effectively the same. The users of non-Google search engines behave exactly like the Google search engine users. If the analysts somehow could get their hands on the data of other search engines, they would discover that the proportion of search volume that is AI-related is effectively the same as seen on Google.

This is one of those convenient, and obviously wrong, assumptions – if it were true, the market would have no need for more than one search engine, and each search engine's audience would just be a random sample from the population of all users.

Let's make up some numbers. Say Google has an 80% share of search volume in Country A, and AI-related searches make up 10% of Google's search volume; AI-related searches via Google therefore account for 8% of the country's overall search volume. The remaining search engines have a 20% share. Scaling up means taking that 8% of Google AI-related search volume and dividing by 80%, which yields 10%. Since Google accounts for 8 of those 10 percentage points, the other search engines are assigned the remaining 2% of overall search volume as AI-related searches in Country A. Thus, the implied proportion of AI-related searches on those other search engines is 2%/20% = 10%.

Now, in certain countries, Google is not quite as dominant. Let's say Google has only a 20% share of Country B's search volume. AI-related searches via Google account for 2% of overall search volume, which is 10% of Google's total. Using the same scaling-up procedure, the analysts have effectively assumed that the proportion of AI-related search volume on the dominant search engines in Country B is also 10%.
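Here are those made-up numbers in a small script, which makes the hidden assumption visible: after scaling up, the non-Google engines in both countries are forced to have the same 10% AI-related share as Google.

def scaled_total_ai_pct(google_ai_pct_of_all, google_share):
    # The heuristic: divide Google's observed AI-related volume by Google's market share.
    return google_ai_pct_of_all / google_share

# (country, Google's market share, AI-related searches via Google as % of ALL searches)
for country, google_share, google_ai_pct in [("A", 0.80, 8.0), ("B", 0.20, 2.0)]:
    total_ai_pct = scaled_total_ai_pct(google_ai_pct, google_share)
    other_engines_ai_share = (total_ai_pct - google_ai_pct) / (100 * (1 - google_share))
    print(f"Country {country}: assumed overall AI share = {total_ai_pct:.0f}%, "
          f"implied AI share on non-Google engines = {other_engines_ai_share:.0%}")
    # Note the inflation factor is 1/google_share: 1.25x in Country A, 5x in Country B.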

I'm using the above calculations to illustrate a shortcoming of this heuristic. Using this procedure inflates the search volume in countries in which Google is less dominant because the inflation factor is the reciprocal of Google's market share. The less dominant Google is, the larger the inflation factor.

What's also true? The less dominant Google is, the smaller the proportion of the total data the analysts are able to see, and the lower the quality of the available information. So the heuristic is most influential exactly where it carries the greatest uncertainty.

***

Hope your new year is full of uncertainty, and your heuristics shall lead you to pleasant surprises.

If you like the blog's content, please spread the word. I'm looking forward to sharing more content as the world of data continues to evolve at an amazing pace.

Disclosure: This blog post is not written by AI.


Several tips for visualizing matrices

Continuing my review of charts that were spammed to my inbox, today I look at the following visualization of a matrix of numbers:

Masterworks_chart9

The matrix shows pairwise correlations between the returns of 16 investment asset classes. Correlation is a number between -1 and 1. It is a symmetric scale around 0. It embeds two dimensions: the magnitude of the correlation, and its direction (positive or negative).

The correlation matrix is a special type of matrix: a bit easier to deal with as the data already come “standardized”. As with the other charts in this series, there is a good number of errors in the chart's execution.

I’ll leave the details maybe for a future post. Just check two key properties of a correlation matrix: the diagonal consisting of self-correlations should contain all 1s; and the matrix should be symmetric across that diagonal.
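Both checks are one-liners with numpy; here is a sketch on a toy matrix (not the chart's data):

import numpy as np

C = np.array([[ 1.0,  0.3, -0.2],
              [ 0.3,  1.0,  0.5],
              [-0.2,  0.5,  1.0]])   # a toy 3x3 correlation matrix

assert np.allclose(np.diag(C), 1.0), "self-correlations on the diagonal must all be 1"
assert np.allclose(C, C.T), "a correlation matrix must be symmetric across the diagonal"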

***

For this post, I want to cover nuances of visualizing matrices. The chart designer knows exactly what the message of the chart is - that the asset class called "art" is attractive because it has little correlation with other popular asset classes. Regardless of the chart's errors, it’s hard for the reader to find the message in the matrix shown above.

That's because the specific data carrying the message sit in the bottom row (and the rightmost column). The cells in this row (and column) have a light purple color, which is upstaged by the even lighter gray color used for the diagonal cells. Those diagonal cells pop out of the chart despite being the least informative (they have the same values for all correlation matrices!).

***

Several tactics can be deployed to push the message to the fore.

First, let's bring the key data to the prime location on the chart - this is the top row and left column (for cultures which read top to bottom, left to right).

Redo_masterwork9_matrix_arttop

For all the drafts in this post, I have dropped the text descriptions of the asset classes, and replaced them with numbers so that it's easier to follow the changes. (For those who're paying attention, I also edited the data to make the matrix symmetric.)

Second, let's look at the color choice. Here, the designer made a wise choice of restricting the number of color levels to three (dark, medium and light). I retained that decision in the above revision - actually, I used four colors but there are no values in one of the four sections, therefore, effectively, only three colors appear. But let's look at what happens when the number of color levels is increased.

Redo_masterwork9_matrix_colors

The more levels of color, the more strain it puts on our processing... with little reward.
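For those who want to reproduce the restricted palette, a discrete colormap does the trick. Here is a sketch with matplotlib's BoundaryNorm on stand-in data; the cut points and colors are mine and would need tuning to the actual correlations.

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

rng = np.random.default_rng(1)
C = np.corrcoef(rng.standard_normal((100, 8)), rowvar=False)  # stand-in 8x8 correlation matrix

cmap = ListedColormap(["#f2ecf7", "#b39ddb", "#5e35b1"])      # light, medium, dark
norm = BoundaryNorm([-1.0, 0.2, 0.5, 1.0], cmap.N)            # three bins spanning [-1, 1]

plt.imshow(C, cmap=cmap, norm=norm)
plt.colorbar(ticks=[-1, 0.2, 0.5, 1])
plt.show()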

Third, and most importantly, the order of the categories has a major effect on perception. I have no idea what the designer used as the sorting criterion. In step one of the fix, I moved the art category to the front but left all the other categories in the original order.

The next chart has the asset classes organized from lowest to highest average correlation. Conveniently, using this sorting metric leaves the art category in its prime spot.

Redo_masterwork9_matrix_orderbyavg

Notice that the appearance has completely changed. The new version brings out clusters in the data much more effectively. Most of the assets at the bottom of the chart have high correlations with each other.
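The reordering itself is mechanical: compute each asset's average correlation with the others, then apply that sort order to both the rows and the columns. A sketch, again on stand-in data:

import numpy as np

rng = np.random.default_rng(2)
C = np.corrcoef(rng.standard_normal((100, 8)), rowvar=False)  # stand-in correlation matrix

avg_corr = (C.sum(axis=1) - 1) / (C.shape[0] - 1)  # average correlation, excluding the self-correlation
order = np.argsort(avg_corr)                       # lowest average correlation first
C_sorted = C[np.ix_(order, order)]                 # reorder rows and columns together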

Finally, because the correlation matrix is symmetric across the diagonal of self-correlations, the two halves are mirror images and thus redundant. The following removes one of the mirrored halves, and also removes the diagonal, leading to a much cleaner look.

Redo_masterwork9_matrix_orderbyavg_tri
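With seaborn, hiding the mirrored half and the diagonal amounts to passing a mask of the upper triangle; a sketch:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(3)
C = np.corrcoef(rng.standard_normal((100, 8)), rowvar=False)

mask = np.triu(np.ones_like(C, dtype=bool))  # True on and above the diagonal -> hidden
sns.heatmap(C, mask=mask, cmap="Purples", vmin=-1, vmax=1, square=True)
plt.show()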

Next time you visualize a matrix, think about how you sort the rows/columns, how you choose the color scale, and whether to plot the mirrored image and the diagonal.

 

 

 


Parsons Student Projects

I had the pleasure of attending the final presentations of this year's graduates from Parsons's MS in Data Visualization program. You can see the projects here.

***

A few of the projects caught my eye.

A project called "Authentic Food in NYC" explores where to find "authentic" cuisine in New York restaurants. The project is notable for plowing through millions of Yelp reviews, and organizing the information within. Reviews mentioning "authentic" or "original" were extracted.

During the live presentation, the student clicked on Authentic Chinese, and the name that popped up was Nom Wah Tea Parlor, a dim sum restaurant in Chinatown that often has lines out the door.

Shuyaoxiao_authenticfood_parsons

Curiously, the ranking is created from raw counts of authentic reviews, which favors restaurants with more reviews, such as restaurants that have been operating for a longer time. It's also unclear what rule is used to transfer authenticity from reviews to restaurants: does a single review mentioning "authentic" qualify a restaurant as "authentic", or is some proportion of reviews required?
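A ranking that controls for review volume is easy to compute; here is a hedged sketch assuming a dataframe of reviews with restaurant and text columns (my column names, not the project's):

import pandas as pd

reviews = pd.DataFrame({
    "restaurant": ["Nom Wah Tea Parlor", "Nom Wah Tea Parlor", "Corner Spot"],
    "text": ["so authentic and delicious", "good dim sum", "authentic hand-pulled noodles"],
})

reviews["mentions_authentic"] = reviews["text"].str.contains(r"\bauthentic\b", case=False)
ranking = (reviews.groupby("restaurant")["mentions_authentic"]
           .agg(review_count="size", authentic_share="mean")   # share, not raw count
           .sort_values("authentic_share", ascending=False))
print(ranking)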

Later, we see a visualization of the key words found inside "authentic" reviews for each cuisine. Below are words for Chinese and Italian cuisines:

Shuyaoxiao_authenticcuisines_parsons_words

These are word clouds with a twist. Instead of encoding the word counts in the font sizes, she places each word inside a bubble, and uses bubble sizes to indicate relative frequency.

Curiously, almost all the words displayed come from menu items. There aren't any subjective words to be found. Algorithms that extract keywords frequently fail in the sense that they surface the most obvious, uninteresting facts. Take the word cloud for Taiwanese restaurants as an example:

Shuyaoxiao_authenticcuisines_parsons_taiwan

The overwhelming keyword found among reviews of Taiwanese restaurants is... "taiwanese". The next most important word is "taiwan". Among the remaining words, "886" is the name of a specific restaurant, "bento" is usually associated with Japanese cuisine, and everything else is a menu item.

Getting this right is time-consuming, and understandably not a requirement for a typical data visualization course.

The most interesting insight is found in this data table.

Shuyaoxiao_authenticcuisines_ratios

It appears that few reviewers care about authenticity when they go to French, Italian, and Japanese restaurants but the people who dine at various Asian restaurants, German restaurants, and Eastern European restaurants want "authentic" food. The student concludes: "since most Yelp reviewers are Americans, their pursuit of authenticity creates its own trap: Food authenticity becomes an americanized view of what non-American food is."

This hits home hard because I know what authentic dim sum is, and Nom Wah Tea Parlor it ain't. Let me check out what Yelpers are saying about Nom Wah:

  1. Everything was so authentic and delicious - and cheap!!!
  2. Your best bet is to go around the corner and find something more authentic.
  3. Their dumplings are amazing everything is very authentic and tasty!
  4. The food was delicious and so authentic, and the staff were helpful and efficient.
  5. Overall, this place has good authentic dim sum but it could be better.
  6. Not an authentic experience at all.
  7. this dim sum establishment is totally authentic
  8. The onions, bean sprouts and scallion did taste very authentic and appreciated that.
  9. I would skip this and try another spot less hyped and more authentic.
  10. I would have to take my parents here the next time I visit NYC because this is authentic dim sum.

These are the most recent ten reviews containing the word "authentic". Seven out of ten really do mean authentic; the other three are false friends. Text mining is tough business! The student removed "not authentic", which helps. But as seen above, "more authentic" may be negative, and there may be words between "not" and "authentic". Also, think "not inauthentic", "people say it's authentic, and it's not", etc.
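A slightly more careful rule catches some of these false friends, though it still misses constructions like "people say it's authentic, and it's not". A hedged sketch:

import re

def reads_as_authentic(review: str) -> bool:
    # Crude rule: count "authentic" only if it is not negated and not part of a "more/less authentic" comparison.
    text = review.lower()
    if re.search(r"\bnot\b(?:\s+\w+){0,3}\s+authentic\b", text):  # "not ... authentic", up to 3 words apart
        return False
    if re.search(r"\b(?:more|less)\s+authentic\b", text):         # comparisons usually point elsewhere
        return False
    return bool(re.search(r"\bauthentic\b", text))

print(reads_as_authentic("Everything was so authentic and delicious - and cheap!!!"))                      # True
print(reads_as_authentic("Not an authentic experience at all."))                                           # False
print(reads_as_authentic("Your best bet is to go around the corner and find something more authentic."))   # False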

One thing I learned from this project is that "authentic" may be a synonym for "I like it" when these diners enjoy the food at an ethnic restaurant. I'm most curious about what inauthentic onions, bean sprouts and scallion taste like.

I love the concept and execution of this project. Nice job!

***

Another project I like is about tourism in Venezuela. The back story is significant. Since a dictatorship took over the country, the government stopped reporting tourism statistics. It's known that tourism collapsed, and that it may be gradually coming back in recent years.

This student did not have access to ready-made datasets, but she imaginatively found data to pursue this story. Specifically, she mentioned grabbing the schedules of flights arriving in the country from abroad.

The flow chart is a great way to explore this data:

Ibonnet_parsons_dataviz_flightcities

A map gives a different perspective:

Ibonnet_parsons_dataviz_flightmap

I'm glad to hear the student recite some of the limitations of the data. It's easy to look at these visuals and assume that the data are entirely reliable. They aren't. We don't know what proportion of the people traveling on those flights are tourists, how full those planes are, or the nationalities of those on board. The fact that a flight originated from Panama does not mean that everyone on board is Panamanian.

***

The third project is interesting in its uniqueness. This student wants to highlight the effect of lead in paint on children's health. She used the weight of lead marbles to symbolize the impact of lead paint. She made a dress with two big pockets to hold these marbles.

Scherer_parsons_dataviz_leaddress sm

It's not your standard visualization. One can quibble that dividing the marbles into two pockets doesn't serve a visualization purpose, and so on. But in the end, it's a memorable performance.


Finding the story in complex datasets

In CT Mirror's feature about Connecticut, which I wrote about in the previous post, there is one graphic that did not rise to the same level as the others.

Ctmirror_highschools

This section deals with graduation rates of the state's high school districts. The above chart focuses on exactly five districts. The line charts are organized in a stack. No year labels are provided. The time window is the 11 years from 2010 to 2021. The column of numbers shows the difference in graduation rates over the entire time window.

The five lines look basically the same, if we ignore what looks to be noisy year-to-year fluctuations. This is due to the weird aspect ratio imposed by stacking.

Why are those five districts chosen? Upon investigation, we learn that these are the five districts with the biggest improvement in graduation rates during the 11-year time window.

The same five schools also had some of the lowest graduation rates at the start of the analysis window (2010). This must be so because if a school graduated 90% of its class in 2010, it would be mathematically impossible for it to attain a 35 percentage-point improvement! This is an unsatisfying feature of the dataviz.

***

In preparing an alternative version, I start by imagining how readers might want to utilize a visualization of this dataset. I assume that the readers may have certain school(s) they are particularly invested in, and want to see its/their graduation performance over these 11 years.

How does having the entire dataset help? For one thing, it provides context. What kind of context is relevant? As discussed above, it's futile to compare a school at the top of the ranking to one that is near the bottom. So I created groups of schools. Each school is compared to other schools that had comparable graduation rates at the start of the analysis period.
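The grouping itself is simple to construct; here is a sketch assuming a dataframe with one row per district and a rate_2010 column (hypothetical names, with cutoffs approximating the bands I used):

import pandas as pd

districts = pd.DataFrame({
    "district": ["Amistad", "District B", "District C", "District D"],
    "rate_2010": [58, 68, 78, 91],
})

bins = [0, 50, 75, 85, 90, 101]                         # group by graduation rate at the start, 2010
labels = ["<50%", "50-74%", "75-84%", "85-89%", "90%+"]
districts["start_group"] = pd.cut(districts["rate_2010"], bins=bins, labels=labels, right=False)
print(districts)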

Amistad School District, which takes pole position in the original dataviz, graduated only 58% of its pupils in 2010 but vastly improved its graduation rate by 35 percentage points over the decade. In the chart below (left panel), I plotted all of the schools that had graduation rates between 50 and 74% in 2010. The chart shows that while Amistad is a standout, almost all schools in this group experienced steady improvements. (Whether this phenomenon represents true improvement, or just grade inflation, we can't tell from this dataset alone.)

Redo_junkcharts_ctmirrorhighschoolsgraduation_1

The right panel shows the group of schools with the next higher level of graduation rates in 2010. Almost all schools in this group also increased their graduation rates, though the rate of improvement is lower than in the previous group.

The next set of charts show school districts that already achieved excellent graduation rates (over 85%) by 2010. The most interesting group of schools consists of those with 85-89% rates in 2010. Their performance in 2021 is the most unpredictable of all the school groups. The majority of districts did even better while others regressed.

Redo_junkcharts_ctmirrorhighschoolsgraduation_2

Overall, there is less variability than I'd expect in the top two school groups. They generally appeared to have been able to raise or maintain their already-high graduation rates. (Note that the scale of each chart is different, and many of the lines in the second set of charts are moving within a few percentages.)

One more note about the charts: the trend lines are "smoothed" to focus on the trends rather than the year-to-year variability. Because of smoothing, there is some awkward-looking imprecision, e.g. the end-to-end differences read off the curves do not exactly match the observed differences in the data. These discrepancies can easily be fixed if these charts were to be published.
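A centered rolling mean is one simple way to produce such smoothed trend lines; here is a sketch on made-up yearly rates (not necessarily the method used for the charts above):

import pandas as pd

# Made-up yearly graduation rates for one district, 2010-2021.
rates = pd.Series([58, 60, 57, 63, 66, 64, 70, 73, 71, 78, 80, 83],
                  index=range(2010, 2022))

smoothed = rates.rolling(window=3, center=True, min_periods=1).mean()
print(pd.DataFrame({"observed": rates, "smoothed": smoothed.round(1)}))
# Note: the smoothed endpoints no longer equal the observed 2010 and 2021 values,
# which is exactly the kind of end-to-end discrepancy mentioned above.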


Accounting app advertises that it doesn't understand fractions

I captured the following image of an ad at the airport at the wrong moment, so you can only see the dataviz but not the text that came with it. The dataviz is animated, with the blue section circling around and then coming to a halt.

Tripactions_partial sm

The text read something like "75% of the people who saw this ad subsequently purchased something". I think the advertiser was TripActions. It is an app for accounting. Too bad their numbers people don't know that 75% is three-quarters: their donut chart shows a filled portion noticeably larger than 75%.

***

Browsing around the TripActions website, I also found this pie chart.

Tripactions_Most_Popular_Recurring_Pandemic_Era_Monthly_Expenses_-_TripActions_jmogxx

The radius of successive sectors decreases as the proportions shrink. As a result, the two sectors both labeled 12% at the bottom have differently sized areas. The only way this dataviz can work is if the reader decodes the angles subtended at the center, and ignores the areas of the sectors. However, the visual cues all point readers to the areas rather than the angles.
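The distortion is easy to quantify: a sector's area is one half times the squared radius times the angle, so two sectors with the same 12% angle but different radii differ in area by the ratio of the squared radii. A quick illustration with hypothetical radii:

import math

share = 0.12                  # both sectors represent 12% of the total
theta = share * 2 * math.pi   # so they subtend the same angle at the center

for r in (1.0, 0.6):          # hypothetical radii for the two 12% sectors
    area = 0.5 * r**2 * theta
    print(f"radius {r}: area {area:.3f}")
# The smaller sector has only 0.6^2 = 36% of the larger one's area, despite the identical share.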

In this sense, the weakness of this pie chart is the same as that of the racetrack chart, discussed recently here.

In addition, the color dimension is not used wisely. Color can be used to group the expenses into categories, or it can be used to group them by proportion of total (20%+, 10-19%, 5-9%, 1-4%, <1%).

 

 


Two uses of bumps charts

Long-time reader Antonio R. submitted the following chart, which illustrates analysis from a preprint on the effect of Covid-19 on life expectancy in the U.S. (link)

Aburto_covid_lifeexpectancy

Aburto_lifeexpectancy

For this post, I want to discuss the bumps chart in the lower right corner. Bumps charts are great at showing change over time. In this case, the authors are comparing two periods, "2010-2019" and "2019-2020". By glancing at the chart, one quickly divides the causes of death into three groups: (a) COVID-19 and CVD, which experienced a big decline; (b) respiratory, accidents, others ("rest"), and despair, which experienced increases; and (c) cancer and infectious, which remained the same.

And yet, something doesn't seem right.

What isn't clear is the measured quantity. The chart title says "months gained or lost", but it takes a moment to realize the plotted data are not numbers of months but the ranks of the causes of death by their effect on life expectancy.

Observe that the distance between each cause of death is the same. Look at the first rising line (respiratory): the actual values went from 0.8 months down to 0.2.

***

While the canonical bumps chart plots ranks, the same chart form can be used to show numeric data. I prefer to use the same term for both charts. In recent years, the bumps chart showing numeric data has been called "slopegraph".

Here is a side-by-side comparison of the two charts:

Redo_aburto_covidlifeexpectancy

The one on the left is the same as the original. The one on the right plots the number of months increased or decreased.

The choice of chart form paints very different pictures. There are four blue lines on the left, indicating a relative increase in life expectancy - these causes of death contributed more to life expectancy between the two periods. Three of the four are red lines on the right chart. Cancer was shown as a flat line on the left - because it was the highest ranked item in both periods. The right chart shows that the numeric value for cancer suffered one of the largest drops.

The left chart exaggerates small numeric changes while it condenses large numeric changes.

 

 


Superb tile map offering multiple avenues for exploration

Here's a beauty by WSJ Graphics:

Wsj_powerproduction

The article is here.

This data graphic illustrates the power of the visual medium. The underlying dataset is complex: power production by type of source by state by month by year. That's more than 90,000 numbers. They all reside on this graphic.

Readers amazingly make sense of all these numbers without much effort.

It starts with the summary chart on top.

Wsj_powerproduction_us_summary

The designer made decisions. The data are presented in relative terms, as proportion of total power production. Only the first and last years are labeled, thus drawing our attention to the long-term trend. The order of the color blocks is carefully selected so that the cleaner sources are listed at the top and the dirtier sources at the bottom. The order of the legend labels mirrors the color blocks in the area chart.

It takes only a few seconds to learn that U.S. power production has largely shifted away from coal with most of it substituted by natural gas. Other than wind, the green sources of power have not gained much ground during these years - in a relative sense.

This summary chart serves as a reading guide for the rest of the chart, which is a tile map of all fifty states. Embedded in the tile map is a small-multiples arrangement.

***

The map offers multiple avenues for exploration.

Some readers may look at specific states. For example, California.

Wsj_powerproduction_california

Currently, about half of the power production in California comes from natural gas. Notably, there is no coal at all in any of these years. In addition to wind, solar energy has also gained. All of these insights come without the need for any labels or gridlines!

Wsj_powerproduction_westernstates

Browsing around California, readers find different patterns in other Western states like Oregon and Washington.

Hydroelectric energy is the dominant source in those two states, with wind gradually taking share.

At this point, readers realize that the summary chart up top hides remarkable state-level variations.

***

There are other paths through the map.

Some readers may scan the whole map, seeking patterns that pop out.

One such pattern is the cluster of states that use coal. In most of these states, the proportion of coal has declined.

Yet another path exists for those interested in specific sources of power.

For example, the trend in nuclear power usage is easily followed by tracking the purple. South Carolina, Illinois and New Hampshire are three states that rely on nuclear for more than half of their power.

Wsj_powerproduction_vermont

I wonder what happened in Vermont about 8 years ago.

The chart says they renounced nuclear energy. Here is some history. This one-time event caused a disruption in the time series, unique on the entire map.

***

This work is wonderful. Enjoy it!