Journalism suffers from an archiving challenge in the digital age, which I wrote about here.
Even worse is the fate of data graphics. This has always been an issue, as digital archives of newspapers do not save any of the graphics. (Try going to the New York Times archive to see for yourself.)
The new wave of graphing technology is making this problem worse!
The new technology embeds charting instructions within the HTML code itself, which means that the chart is assembled "on the fly". Think of each chart existing as a collection of pieces: legend title, legend text, axes, etc. This presents many challenges:
- A chart delivered as one integral image (jpg, png, etc.) is relatively easy to save. How does one save a chart that exists in a dozen pieces?
- One can never be sure what the reader is seeing. Did all the pieces render properly? Does the chart look different depending on which browser, which OS, which device is rendering it?
- In some applications, the browser might make a call to a remote database to fetch the data for rendering the chart. This means it's possible for someone looking at a chart this morning to see something different from someone else looking at the "same" chart in the afternoon. Since those are two different readers, no one will even notice the difference. Does saving the graphic now involve saving snapshots of the database too?
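To make the last point concrete, here is a minimal sketch in Python of what "saving the graphic" might have to include when the data arrives separately from the page. Everything in it is an assumption for illustration: the function names `snapshot_chart` and `same_chart` and the record layout are hypothetical, not an existing archiving tool. The idea is simply to store the markup and the data together with a capture time and a fingerprint, so that the morning and afternoon versions of the "same" chart can be told apart.

```python
import hashlib
import json
from datetime import datetime


def snapshot_chart(page_html: str, chart_data: dict, captured_at: datetime) -> dict:
    """Bundle a chart's markup and its data into one archivable record.

    Hypothetical helper: the record layout is an assumption, not a standard.
    """
    # Serialize the data deterministically so equal data gives equal digests.
    data_json = json.dumps(chart_data, sort_keys=True)
    payload = (page_html + "\n" + data_json).encode("utf-8")
    return {
        "captured_at": captured_at.isoformat(),
        "page_html": page_html,
        "chart_data": chart_data,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def same_chart(a: dict, b: dict) -> bool:
    """True when two snapshots captured identical markup and data."""
    return a["sha256"] == b["sha256"]
```

Two snapshots of the same page taken hours apart would compare as different whenever the remote data changed in between, which is exactly the difference no single reader would notice.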
***
Let me illustrate the above with a recent example from the New York Times--the graphic about jealousy in dogs I discussed last week on Junk Charts (link).
Here is a screenshot of the chart as it appeared to me at the time I saved it:
The reason I took a screenshot is that when I right-clicked to save the chart, it gave me the following:
The saved image is missing the text labels, legend titles, etc., all of which are rendered by separate instructions in the HTML code itself.
Here is the output of the code inspector. The saved image corresponds to one line of the code:
The legend text of the red box is itself a separate line of code:
If you keep going, you will learn that the second legend text is a separate line of code, and so is the third. The axis labels on the right are rendered in four separate pieces.
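For readers who want to see the "pieces" problem in code, here is a rough sketch in Python using only the standard library's `html.parser`. The markup in `SAMPLE` is made up for illustration and is not the actual New York Times source; the class and function names are likewise hypothetical. The sketch walks a chart's HTML and separates the one image that a right-click would save from the text fragments that live outside it.

```python
from html.parser import HTMLParser


class ChartPieceCollector(HTMLParser):
    """Collect image sources and text fragments from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.images = []  # what a right-click "save image" would capture
        self.texts = []   # labels and legends that live outside the image

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def list_chart_pieces(fragment: str):
    """Return (image sources, text pieces) found in the fragment."""
    collector = ChartPieceCollector()
    collector.feed(fragment)
    collector.close()
    return collector.images, collector.texts


# Made-up markup for illustration only; not the real page source.
SAMPLE = """
<div class="chart">
  <img src="chart.png">
  <div class="legend">First legend text</div>
  <div class="legend">Second legend text</div>
  <div class="axis-label">Axis label</div>
</div>
"""

images, texts = list_chart_pieces(SAMPLE)
print(len(images), "image;", len(texts), "separate text pieces")
```

An archive that keeps only the entries in `images` loses everything in `texts`, which is what happened with my right-click save.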
***
With so much work going into these data graphics, I really hope our industry will rally and figure out a way to archive the work.
PS. Would have loved to have been a fly on the wall at this meeting: http://t.co/FGUNYmLx07
Scott Klein and others attempted an ambitious definition of what needs to be saved. Their work mostly has to do with complex apps, which of course are even hairier to save than the simple static chart I discussed above.
There is at least one good reason to have the text be text, rather than part of the image: Google can find it. Also, it renders better on high-res displays.
I don't disagree with the issue, though this particular example is fairly simple. In many cases, when the data is rendered in the browser, there isn't even an image to save. The solution is to take screenshots (when you're doing it manually) or use a browser testing tool that can interpret the JavaScript (e.g., Selenium). Yet another option when something is rendered as SVG (i.e., using D3) is to use a tool like the clever SVGCrowBar to save the resulting SVG.
Posted by: Robert Kosara | 07/28/2014 at 11:27 AM
Since the NYT’s charts are made with D3, which was created by Mike Bostock (@mbostock), who himself works at the paper and oversees the interactive work, maybe you could try voicing your concerns to him directly?
Although he’s a busy guy, he’s taken kindly to the feedback I’ve given him, the times I have talked with him on Twitter.
Posted by: Pessimism | 07/28/2014 at 03:52 PM
I agree with Robert---there's a good reason to have text separate from images. Accessibility for screen readers is another point to consider (though I don't know how accessible the charts NYT uses are). How many people are going through and saving the images anyway?
Posted by: Akiva | 07/28/2014 at 11:34 PM
Akiva/Robert: I think you are missing my point. Newspapers, printed matter, and now blogs form a record of human history. Yes, few people need to save these images but few people need to save the articles either. My point is that we save all the text but we take a nonchalant attitude towards saving the images and the data graphics. Screenshots are sufficient for my blogging use but unless there is a non-manual, systematic way of saving them and linking them to the articles, they are not really an archiving tool.
Pessimism: I don't think it's Mike's problem to solve, although it would be great if he could develop tools to help the archivists. This is more the archivist's or historian's job.
Posted by: junkcharts | 07/29/2014 at 12:10 AM