Note to readers: There will be few updates to the blog in the next few days until southern Manhattan gets its power back.
Note to AT&T: If you listen to your customers, you won't need to "investigate". Cell phone service has been unavailable since Monday.
Lots of numbers are thrown around in the media all the time. Do you know how they are derived? This is a really important question to ask every time you see a number.
Sometimes, the source is pretty obvious. Polls and surveys provide a lot of data. Websites report a lot of numbers because every click is registered in a database. Some data are painstakingly collected by human beings, like the situational statistics collected in a baseball or football match.
What you may not realize is how unreliable some of the data sources can be. A recent New York Times article discusses a change in the estimate of how much sugar Americans consume (link). It reveals how inexact this science is.
Some of the steps involved in deriving that number:
- An average American diet is estimated
- For each food item, the sugar content is estimated
- For each food item, the "food loss" is estimated. The idea is that if food is bought but not consumed, the sugar is not consumed.
- A Nielsen survey of food purchases is used
- Interviews conducted by CDC are used
- Businesses and lobbyists for the sugar industry are consulted to provide "expertise"
- Academics are consulted to provide "expertise"; however, none of these professors were willing to talk about what "advice" they provided. They claimed that they could not recall. Apparently, no notes were taken by anyone at these meetings, and they could not remember something that happened in 2008.
This enterprise sounds impressive because it is so complex. But complexity is only a good thing if we are able to measure the finer components more precisely; in this case, even at the finer level, each step involves guesswork. The error of each step then compounds. When you add in lobbyists who have an agenda, it's a grand mess.
The news here is that the Agriculture Department decided that its past "food loss" estimates were flawed, and by using new "methods", its estimate of sugar consumption fell by a whopping 25 percent. How can we believe this new research? To me, it is impossible to create credible "food loss" estimates for hundreds of food items. If you look at the report, they have food loss estimates for each of spinach, squash, sweet potatoes, snap beans, tomatoes, broccoli, etc. and the tables go on for pages.
The researchers are not conservative either. They make huge revisions to the prior estimates. The article mentioned pumpkins, which went from 20 percent lost to 69 percent lost. Yes, they claim that nearly 70 percent of all pumpkins sold are never eaten. I mean, it's Halloween, but are we really growing pumpkins just for the fun of one night?
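To see how much a revision like this moves the final number, here is a small back-of-the-envelope sketch. It assumes, for illustration only, that estimated consumption is simply purchases times (1 minus the loss rate) and that the purchase figure stays the same; the function name `consumed_share` is mine, not something from the report.

```python
def consumed_share(loss_rate):
    """Fraction of purchased food counted as eaten (assumed model: 1 - loss)."""
    return 1.0 - loss_rate

old_loss = 0.20  # prior pumpkin loss estimate quoted in the article
new_loss = 0.69  # revised pumpkin loss estimate quoted in the article

old_eaten = consumed_share(old_loss)
new_eaten = consumed_share(new_loss)

# Relative change in estimated consumption, holding purchases fixed (assumption)
relative_change = (new_eaten - old_eaten) / old_eaten

print(f"eaten share: {old_eaten:.2f} -> {new_eaten:.2f}")
print(f"change in estimated consumption: {relative_change:.1%}")
```

Under that assumption, the eaten share drops from 80 percent to 31 percent, so the pumpkin consumption estimate falls by roughly 61 percent without a single purchase changing.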
Now these researchers told the reporter that comparing the old estimates to the new estimates is "improper because of changes in methodology". This is absurd; new methods must be compared to existing methods.
We sorely need an estimate of the margin of error of this sugar consumption number. Given that it has so many components, each of which has a wide margin of error, my suspicion is that the entire enterprise collapses under its own weight.
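One rough way to get a feel for that margin of error is a quick simulation. The sketch below is not the Agriculture Department's method: it assumes the final number is a chain of multiplied factors, that the errors are independent and roughly normal, and it uses made-up relative errors for four hypothetical steps. The function `simulate_relative_error` is my own illustration.

```python
import random

def simulate_relative_error(component_errors, trials=20000, seed=0):
    """Return an approximate 95% interval for (estimate / true value).

    Each component is assumed to be off by an independent, normally
    distributed relative error with the given standard deviation.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        ratio = 1.0
        for err in component_errors:
            ratio *= 1.0 + rng.gauss(0.0, err)
        ratios.append(ratio)
    ratios.sort()
    lo = ratios[int(0.025 * trials)]
    hi = ratios[int(0.975 * trials) - 1]
    return lo, hi

# Hypothetical relative errors (std. dev.) for four steps; not from the report
component_errors = [0.10, 0.10, 0.15, 0.20]

lo, hi = simulate_relative_error(component_errors)
single_lo, single_hi = simulate_relative_error([0.20])

print(f"worst single step alone: {single_lo:.2f}x to {single_hi:.2f}x of truth")
print(f"all four steps chained:  {lo:.2f}x to {hi:.2f}x of truth")
```

Even with these fairly generous made-up inputs, the chained interval comes out noticeably wider than that of the worst single step, which is the compounding I am worried about. Plugging in real error estimates for each step, if anyone published them, would be the honest version of this exercise.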
What are some ways to create a much more credible estimate? Look at the diet, and focus on the big items. Recruit a panel of people (like Nielsen does) and ask them to record their own behavior for a period of time: how much they purchased and how much they trashed. At least then we would have a set of real data to work with. The sample size may be smaller but the quality of the data would be much higher.
I'm sure there are many other superior methods. It's an interesting problem for statisticians to think about. For others, be careful when you're fed numbers.