The real climategate
Dec 02, 2009
Climategate is all the rage at the moment. What interests me about this episode is not the integrity of certain scientists, or science in general, nor the culture of academia, and certainly not the evidence of climate change. For me, the real climategate is the woeful state of statistical education. Let me explain.
Here is the infamous email: (via Nathan Silver, with my highlights)
From: Phil Jones
To: ray bradley, mann@[snipped], mhughes@[snipped]
Subject: Diagram for WMO Statement
Date: Tue, 16 Nov 1999 13:31:15 +0000
Cc: k.briffa@[snipped],t.osborn@[snipped]
Dear Ray, Mike and Malcolm,
Once Tim’s got a diagram here we’ll send that either later
today or first thing tomorrow. I’ve just completed Mike’s Nature
trick of adding in the real temps to each series for the last 20
years (ie from 1981 onwards) amd [sic] from 1961 for Keith’s to
hide the decline. Mike’s series got the annual land and marine
values while the other two got April-Sept for NH land N of 20N.
The latter two are real for 1999, while the estimate for 1999
for NH combined is +0.44C wrt 61-90. The Global estimate for
1999 with data through Oct is +0.35C cf. 0.57 for 1998.
Thanks for the comments, Ray.
Cheers, Phil
What concerns me is Phil Jones' describing what he did as a "trick" to "hide the decline". He apparently thought that he was doing something shameful. But when is it shameful to extend the plot of a time series so as to display the long-term trend, and not be misled by short-term fluctuations? Doing so provides statistical context for the data being examined. Lots of people are condemning this as a willful act to mislead the public, but if they had some statistical literacy, they would understand that finding the appropriate time scale at which to look at the data is one of the most important tasks in analyzing time series data. It's a problem when even prominent scientists do not comprehend why they should be doing this.
I have always wondered why, in climatology as well as in economics, we rarely see decomposed time-series plots (at least not in the public eye).
On the right is a plot I found online of a decomposition of beer sales, which separates out seasonality, trend and other parts of a time series. The original data is shown up top. In practice, newspapers and blogs give us plots of the original data all the time when they should show us the third plot down (the trend with the seasonal factor removed), unless the story is about seasonality.
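For readers who want to see the mechanics, here is a minimal sketch of a classical additive decomposition in Python, using only the standard library. It is not the procedure behind the beer sales plot; it is the simpler moving-average method, shown only to make the idea concrete. The quarterly numbers are synthetic and the function names are made up for this illustration.

```python
from statistics import mean


def centered_moving_average(values, period):
    """Trend estimate: centered moving average over one full seasonal cycle.

    For an even period the window covers period + 1 points, with half
    weight on the two end points. Positions too close to either end of
    the series get None.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(values, period):
    """Split a series into trend + seasonal + remainder.

    Assumes at least two full cycles of data, so that every position in
    the cycle has at least one detrended value.
    """
    trend = centered_moving_average(values, period)

    # Group the detrended values by their position in the cycle.
    detrended = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            detrended[t % period].append(y - tr)

    # Average each group, then center so the seasonal effects sum to zero.
    raw = [mean(group) for group in detrended]
    offset = mean(raw)
    cycle = [s - offset for s in raw]

    seasonal = [cycle[t % period] for t in range(len(values))]
    remainder = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(values, trend, seasonal)
    ]
    # Seasonally adjusted series: the original with the seasonal part removed.
    adjusted = [y - s for y, s in zip(values, seasonal)]
    return trend, seasonal, remainder, adjusted


# Synthetic quarterly data (made up): a steady upward trend plus a
# repeating seasonal pattern, three years long.
pattern = [10, -5, -15, 10]
values = [100 + 2 * t + pattern[t % 4] for t in range(12)]

trend, seasonal, remainder, adjusted = decompose_additive(values, period=4)
print("seasonal pattern:", seasonal[:4])
print("seasonally adjusted:", adjusted)
```

The raw values jump around from quarter to quarter, while the seasonally adjusted series climbs steadily. That contrast is the reason for plotting the adjusted series when the story is not about seasonality.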
Note to self: should include basic time-series decomposition in the intro stats syllabus; much too important a topic to leave to a second course.
What is convenient is you actually had the "beer sales".
Wouldn't you be slightly frustrated if Budweiser refused a FOIA request to get at those "beer sales"?
Just sayin'
Posted by: chad | Dec 02, 2009 at 09:04 AM
Tufte briefly discusses time-series decomposition in The Visual Display...
While most people can easily understand the need to separate trends from seasonal fluctuations, they don't have the necessary statistical skills to use advanced time-series decomposition in their analysis. In this case, the ratio-to-moving-average approach may be a good starting point.
Posted by: Jorge Camoes | Dec 02, 2009 at 10:08 AM
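The ratio-to-moving-average approach mentioned in the comment above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes an even period (4 or 12, say), strictly positive values, and at least two full cycles of data. The sales figures and the function name are hypothetical.

```python
from statistics import mean


def seasonal_indices(values, period):
    """Ratio-to-moving-average seasonal indices (multiplicative model).

    Assumes an even period, strictly positive values, and at least two
    full cycles of data.
    """
    half = period // 2
    ratios = [[] for _ in range(period)]
    for t in range(half, len(values) - half):
        window = values[t - half : t + half + 1]
        # Centered moving average over one cycle: half weight on both ends.
        moving_avg = (
            0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        ) / period
        # Ratio of the actual value to the moving average at the same point.
        ratios[t % period].append(values[t] / moving_avg)

    # Average the ratios for each position in the cycle, then rescale so
    # the indices average to 1.
    raw = [mean(group) for group in ratios]
    scale = mean(raw)
    return [r / scale for r in raw]


# Hypothetical quarterly sales: a flat level of 200 multiplied by a
# repeating seasonal factor, three years long.
factors = [1.2, 0.9, 0.7, 1.2]
sales = [200 * f for f in factors] * 3

indices = seasonal_indices(sales, period=4)
# Dividing each value by its seasonal index gives the adjusted series.
adjusted = [y / indices[t % 4] for t, y in enumerate(sales)]

print("seasonal indices:", [round(i, 3) for i in indices])
print("seasonally adjusted:", [round(a, 1) for a in adjusted])
```

Every step is a running average or a division, so the same calculation can be laid out column by column in a spreadsheet. The flat level is chosen to keep the example easy to check by hand; with a trending series the indices come out approximate rather than exact.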
Time series decomposition is vital in all official statistics - the X11 procedure and all its descendants were actually developed at the US Census Bureau.
Bill Cleveland et al.'s STL procedure is available in R, and anybody can use it quite easily to produce a decomposition like the one shown in the beer sales example.
Nonetheless, decomposition methods seem to be treated as an orphan by most statisticians whose training is primarily mathematical.
It is definitely true that teaching students about all the Box & Jenkins ARMA stuff should come AFTER an introduction to decomposition methods like STL.
Posted by: Martin | Dec 02, 2009 at 01:10 PM
"But when is it shameful to extend the plot of a time series so as to display the long-term trend"
Honestly, have you actually seen the chart Phil Jones is referring to in that email?
The "hide the decline" trick had nothing to do with extending a time series to show a long-term trend.
The trick involved cutting one time series short by a couple decades (tree ring data) and grafting on a second time series (recent instrument data).
The problem was not extending the chart, or even using two different data sets. The problem was switching data sets mid-chart, without noting the switch.
Other versions of the chart show the distinction clearly by drawing the two data sets with two different lines.
Posted by: Woeful | Dec 02, 2009 at 11:54 PM
Woeful: if it's not clear already, I am not taking sides in the actual debate concerning the particular chart as I have not reviewed the issue, and I also believe that one could not take sides based on a casual look at one chart given the complexity of this issue. In this post, I am reacting to the way they snickered as they discussed chart design. There are legitimate statistical reasons for determining scale, data fusion ("grafting"), etc. I am not saying their particular actions were legitimate; I am just saying the email shows no recognition of these statistical issues.
Posted by: Kaiser | Dec 03, 2009 at 12:33 AM
Martin: indeed, the beer sales example I found used the STL function in R.
Posted by: Kaiser | Dec 03, 2009 at 12:36 AM
Dumb question - is there a primer you might suggest for someone wishing to learn how to perform a decomposition? I have spent some time trying to learn how to do this on my own, but if there was a cookbook out there I could review, I'd love to hear your suggestions. My work environment really wants us to stick to Excel, so R and other tools are not things I have ready access to, alas.
Posted by: Ran Barton | Dec 06, 2009 at 12:22 AM
I find it odd that beer sales increase at the same rate as global temperature. Perhaps beer consumption is causing the global temperature increase! *fart*
Posted by: Frank Chillamos | Dec 07, 2009 at 07:56 PM
Ran: maybe try this reference for the STL procedure in R. Remember that R is free, open-source software, and hopefully your workplace will let you install it.
R.B. Cleveland, W.S. Cleveland, J.E. McRae, and I. Terpenning (1990) STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6, 3–73.
Posted by: Kaiser | Dec 08, 2009 at 12:47 PM
Thank you for the reference; I will track that down.
Posted by: www.facebook.com/profile.php?id=696381372 | Jan 11, 2010 at 12:45 PM
Here it is: link.
Thank you again.
Posted by: www.facebook.com/profile.php?id=696381372 | Jan 11, 2010 at 12:53 PM
I was not familiar with STL; it's fantastic. Thanks for highlighting it!
I could have used this several years ago, but I wonder: will STL work on time series where the periodic (seasonal) component has an irregular and essentially random length? e.g. the dimension of a part checked 100% where the number of parts is highly variable over time.
Posted by: Tom Hopper | Dec 10, 2010 at 04:11 AM