An interesting episode is developing in econometrics over the high-profile Reinhart-Rogoff paper that was heavily cited as a source to "prove" that high levels of national debt impede growth. It appears that the result was based on a combination of spreadsheet errors and bad assumptions.
1. Andrew Gelman has a great discussion here. His main concern is the ethics of data analysts. This is a very important point - any experienced data analyst knows that it is extremely easy to make mistakes; in fact, if a data analyst has never made errors that subsequently had to be corrected, or disseminated results that subsequently had to be revised or even retracted, one can be sure that the analyst has not done a lot of analyses. (I'm excluding those who only work with simulated data.)
Errors come from many sources, and it appears that Reinhart-Rogoff managed to touch several important ones.
a. Processing errors - in the course of an analysis, data is often transported from system to system. Even on a spreadsheet, the data may be copied and pasted from one page to another, or formulas are set up to link from one area to the next. For a team project, the spreadsheet or dataset may pass from one person to another. A tiny moment of inattention could easily ruin the data. It appears that Reinhart-Rogoff dropped some data on the floor. I should note that "Big Data" makes this problem a hundred times harder to solve. If you only have one spreadsheet, you can in theory verify each number at different stages of the analysis. Not if you have millions of rows and thousands of columns in a dataset.
b. Bad assumptions - any analysis will contain assumptions, mostly because the data has problems or holes. Making assumptions is not itself bad. But some assumptions will turn out to be bad. In this case, the assumptions were badly explained. Critics noted that some data points were deliberately dropped from the analysis, and the conclusion was sensitive to those dropped data. Based on these reports, Reinhart-Rogoff have not convinced these critics that those data points should have been dropped.
c. Bad data - the biggest reason for bad analysis is usually that the analyst does not know how the data came into being. Somebody else collected the data (or a machine did). The data was purchased from a third party. All kinds of hidden assumptions will now affect the analysis. For example, you analyze credit-card transaction data to try to understand consumer spending patterns. You see that billings have decreased. If you're not careful, you'd conclude that consumers are spending less. But there are many other possibilities. For example, maybe credit-card companies increased their fees and merchants then set higher minimums, forcing consumers to pay cash. Frequently, you learn about these types of extraneous factors only after you have already disseminated the analysis.
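To make points (a) and (b) concrete, here is a minimal sketch in Python of two cheap checks: one that fails loudly if rows go missing before an average is taken, and one that recomputes the average with each row left out in turn to see how sensitive the conclusion is to exclusions. The country labels and growth figures are made up for illustration and are not the Reinhart-Rogoff data; the names `checked_mean` and `leave_one_out` are my own.

```python
from statistics import mean

# Hypothetical growth rates (percent) for made-up countries.
# These numbers are illustrative only, not the Reinhart-Rogoff data.
growth = {"A": 2.5, "B": -0.5, "C": 1.0, "D": 3.0, "E": 2.0}
EXPECTED = ["A", "B", "C", "D", "E"]


def checked_mean(data, expected_keys):
    """Average over expected_keys, refusing to run if any row was dropped."""
    missing = sorted(set(expected_keys) - set(data))
    if missing:
        raise ValueError(f"rows missing before averaging: {missing}")
    return mean(data[k] for k in expected_keys)


def leave_one_out(data):
    """Recompute the mean with each row excluded in turn."""
    return {
        left_out: mean(v for k, v in data.items() if k != left_out)
        for left_out in data
    }


full_mean = checked_mean(growth, EXPECTED)
sensitivity = leave_one_out(growth)

print(f"all rows: {full_mean:.3f}")
for left_out, value in sensitivity.items():
    print(f"without {left_out}: {value:.3f}")
```

With these made-up numbers, leaving out the single negative row moves the average from 1.6 to 2.125. If a headline result swings that much on one exclusion, the exclusion needs a clear justification.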
Finally, while they acknowledge the issue of reverse causation, they seem very much to be trying to have it both ways — saying yes, we know about the issue, but then immediately reverting to talking as if debt was necessarily causing slow growth rather than the other way around.
In their response to the critics, Reinhart-Rogoff's key point was "We were only arguing association, not causality." (see here and here.) I'm sure we'll find specific statements to that effect in the paper, but all too often the discussion of the implications of such research includes statements that assume the cause-effect relationship, which is exactly what Krugman was complaining about.
This problem is not just found in economics, although I see it a lot with the Freakonomics style of thinking. It is also very common in the medical literature, and in other fields in which observational data is used. The usual ploy is first to acknowledge that the data could not prove causality ("we found an association between sleeping less and snoring; our data does not allow us to prove causation"), then to quietly assume that the causal link is there, and wax on about the implications ("if you want to snore less, sleep less").