While making slides for a presentation, I wanted to scan a chart from the paper copy of the Wall Street Journal. The chart was from several years ago, and I couldn't find it online. I went to the NYU Library, expecting this to be an easy task. Find the bound copies of the paper, or the microfiche, right?
Nope. Apparently, we are in the digital age now. This means that the Library subscribes only to the digital edition of the Wall Street Journal. Paper copies are not kept around.
This practice leads to a number of problems that should trouble the N=All crowd.
The print edition and the online edition often use different charts for the same article. The most recent example is the chart on Americans' use of time in WSJ (link). Future historians may find the digital version of the chart in their libraries and assume that this was the version most readers saw. In reality, only a fraction of readers saw that version, and it is quite hard to know how large that fraction was.
But the problem is worse!
Because of personalization technology (and also because of A/B testing), there are many digital editions. It is not easy to understand what most people are reading because they are probably reading different headlines or front-page articles.
I keep wondering how we keep track of what version "most" people are reading. I know for a fact that A/B testing data do not get saved for archival purposes.
A related practice on news websites is to hook you with a shocking headline; when you click through, you may find a different headline, and possibly an article that has little or nothing to do with the shocker.
This means the same person can see multiple headlines for the same online article. Unless the database also records the link structure of the site (which headline led to which article), this nuance will be missing from the web logs. Most databases do not treat online articles differently from print articles, meaning that every article has one title, one byline, etc. If multiple headlines are being used, which one would be recorded?
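To make the one-title-per-article problem concrete, here is a minimal sketch of what an archive record might look like if it kept every headline variant along with how often each was shown. Everything in it is an assumption made for illustration: the class names, the channel labels ("print", "web", "teaser"), and the idea that impression counts are available at all. It does not describe how any real newspaper database works.

```python
from dataclasses import dataclass, field


@dataclass
class HeadlineVariant:
    """One headline under which an article was shown (hypothetical schema)."""
    text: str
    channel: str          # assumed labels, e.g. "print", "web", "teaser"
    impressions: int = 0  # number of times this variant was shown


@dataclass
class ArchivedArticle:
    """An article that may carry many headlines instead of a single title."""
    article_id: str
    byline: str
    variants: list[HeadlineVariant] = field(default_factory=list)

    def record_impression(self, text: str, channel: str, count: int = 1) -> None:
        # Add to an existing variant if we have seen this headline/channel pair.
        for v in self.variants:
            if v.text == text and v.channel == channel:
                v.impressions += count
                return
        # Otherwise start tracking a new variant.
        self.variants.append(HeadlineVariant(text, channel, count))

    def share_seen(self) -> dict[tuple[str, str], float]:
        # Fraction of all recorded impressions that went to each variant.
        total = sum(v.impressions for v in self.variants)
        if total == 0:
            return {}
        return {(v.channel, v.text): v.impressions / total for v in self.variants}


article = ArchivedArticle(article_id="a1", byline="Staff")
article.record_impression("Shocking teaser headline", "teaser", 300)
article.record_impression("Plain article headline", "web", 100)
article.record_impression("Shocking teaser headline", "teaser", 100)
shares = article.share_seen()
```

With a record like this, the question "which headline did most people see?" at least has an answer. With a single title field, it does not.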
It goes back to: you may think you have all of the data. You don't.