Over at Andrew's blog, he's checking some numbers from Albert-Laszlo Barabasi's latest pop-sci book, and having a load of fun with it. It's something I used to do when I had more time to read pop-sci books.

One sloppy passage from the Barabasi book is this:

It’s possible to put actual monetary value on each citation a paper receives. We can, in other words calculate exactly how much a single citation is worth. Any guesses? Shockingly, in the United States each citation is worth a whopping $100,000. We know this by looking at the amount of money the nation spends on research . . . If we then divide that figure by the total number of citations collectively generated by the papers these funds paid for, we can estimate the cost of a single citation.

Barabasi claims that each citation in each research paper published in the U.S. is worth $100,000. As Andrew and his readers pointed out, the paragraph is filled with loose language and fishy analysis.

The "exact" monetary value is certainly not exactly $100,000. An "estimate" usually isn't "exact". "How much a single citation is worth" is different from "the cost of a single citation". The only thing we know for sure is that Barabasi divided the money spent on research by the total number of citations. When Andrew traced the origin of this formula, he found that Barabasi cited an unpublished report by someone else, and we don't have any details on how it is derived.
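The formula itself is just one division. Here is a minimal sketch of it with hypothetical round numbers chosen to land on Barabasi's figure (neither number comes from the book):

```python
# Barabasi's implied formula, with assumed illustrative inputs:
annual_research_spend = 60e9   # hypothetical: $60B spent on research
total_citations = 600_000      # hypothetical: citations those papers generated

cost_per_citation = annual_research_spend / total_citations
print(cost_per_citation)       # 100000.0
```

Nothing about this arithmetic says the quotient is a "value" or that it is "exact"; it is just a ratio of two aggregates.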

Spending is not the same as value. The U.S. spends the most money by far on healthcare amongst peer countries, and as we have learned, our health outcomes are far from world best (link). If an advertiser spends $5 million on a Super Bowl ad, the brand has not created $5 million in value. In fact, the business starts in a $5 million hole, and must generate incremental sales above and beyond $5 million to achieve a return on investment! Citations, as far as I know, do not recoup any investment dollars for the nation.

Another problem is to pretend there is no time lag. Epidemiologists have made this same bad assumption throughout the pandemic when they take last week's new deaths divided by last week's new cases, and call that a case fatality rate. Deaths, in reality, lag infections by weeks, so last week's new deaths derive from new cases from many weeks ago. This time-alignment error is most acutely felt in data that exhibit large temporal shifts.

If cases are growing rapidly, this method of computing the fatality rate underestimates it. The analyst may get away with this for a while, but when the growth slows, the reverse effect kicks in -- and that's when the analyst will suddenly change the formula. Look out for that!
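A toy series makes the underestimate concrete. Assume (purely for illustration) that cases double every week, the true fatality rate is 1%, and deaths lag cases by three weeks:

```python
# Toy epidemic: cases double weekly; deaths are 1% of cases three weeks earlier.
TRUE_CFR = 0.01
LAG = 3
weeks = 10
cases = [100 * 2**t for t in range(weeks)]
deaths = [TRUE_CFR * cases[t - LAG] if t >= LAG else 0 for t in range(weeks)]

t = 5
naive = deaths[t] / cases[t]          # this week's deaths / this week's cases
lagged = deaths[t] / cases[t - LAG]   # deaths matched to the cases that caused them
print(naive, lagged)                  # 0.00125 0.01
```

With three doublings between infection and death, the naive ratio comes out 8x too low (0.00125 instead of 0.01); when growth stops, the two converge, and when cases fall, the naive ratio overshoots.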

Barabasi appears to have assumed that R&D spending in a given year leads to citations in the same year. It's unclear how he dealt with citations this year of a paper published several years ago.

***

The next issue is **survivorship bias**. A paper with zero citations costs nothing, according to his formula. So if I were the research budget administrator, what should I do?

How about funding lots of researchers who publish useless papers that are neither read nor cited? I can achieve a monetary value of ... infinity! (Divide a big number by zero).

Ideally, I will score exactly one citation so the average value per citation is my entire research budget. Now, I apply for new funding based on value per citation.

The error is to compute the average based on the subset of survivors (those papers with at least one citation) while ignoring the non-survivors (papers without citations).
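The budget administrator's game can be sketched in a few lines, using a hypothetical portfolio (all numbers assumed for illustration):

```python
# Hypothetical portfolio: five grants of $100k each.
cost_per_paper = 100_000
citations = [0, 0, 0, 0, 1]   # four papers never cited, one cited once

total_spend = cost_per_paper * len(citations)
total_citations = sum(citations)   # only the lone "survivor" contributes

value_per_citation = total_spend / total_citations
print(value_per_citation)     # 500000.0 -- the whole budget rides on one citation
```

Four-fifths of the spending produced nothing, yet the headline "value per citation" only goes up as more uncited papers are funded.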

It's like computing the average rate of return of the hedge fund industry, ignoring any hedge funds that imploded during the year.
