More than a decade ago, some researchers at the U.S. CDC created an incredibly complicated model that was highly dependent on what were basically wild guesses, and came out with the estimate that there were 76 million cases of foodborne disease in the U.S. annually. Several researchers, including one of my students and me, pointed out that this estimate was ridiculously over-precise, given the complete guesswork that went into it. Also, it was almost certainly biased downward (i.e., a lot too low). Our article was a general analysis about accounting for errors in estimates and just used that estimate as a particularly glaring example of researchers who apparently do not understand the concept of error bars.
The reason I bring it up is that every time there is an outbreak of foodborne disease that makes the U.S. newspapers (i.e., it had a lot of victims in the U.S.), this number gets trotted out again, like in this New York Times article, written by Walecia Konrad yesterday. Granted, I have never heard of Konrad and she(?) is not one of NYT’s expert health writers, but I have seen the big names make this mistake also, as well as numerous others. And sure enough, the CDC website has that figure (and some others from the same original paper, which are also repeated by Konrad) in multiple places. Several of these have date stamps as recent as 2009 and, as far as a few minutes of searching revealed, there is no acknowledgment that this is a very rough estimate. The only way a reader would even learn the age of the statistic would be to find one of the rare invocations that actually cited the original paper.
Why is this such an embarrassment for both health reporters and epidemiologists?
Even if we assume that most of the inputs into the model were unbiased (i.e., really were the best possible guesses – which is an incredibly charitable assumption), given the uncertainty of the guesses, it would be bold to assume that the estimate was correct within a factor of two. (If you want details, they are in our paper.) But think about the implications of reporting that “76 million” figure that has been repeated a zillion times since then (plus or minus half a zillion). If they had claimed the round figure “about 75 million”, it could be read as something as imprecise as “more than 50 million but less than 100 million”. That would probably have still been too precise a claim, based on the quality of the model used to derive the estimate, but it would at least have been in the ballpark. If they had rounded their estimate to 80 million it would have implied something like “between 75 and 85 million” (because 74 million would round to 70, while 86 would round to 90). That would be way too precise a claim, but it would at least represent some vague awareness that their estimate is not perfect. By claiming 76 million and offering no indication of uncertainty bounds around it, however, they are effectively saying “we are quite sure the answer is between 75.5 million and 76.5 million”.
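The rounding arithmetic in that paragraph can be sketched in a few lines of Python. This is only an illustration of the convention described above (a figure reported to a given rounding unit implies a range of half a unit on either side); the function names `implied_interval` and `relative_half_width` are made up for this sketch and are not from our paper or from the CDC.

```python
def implied_interval(reported, rounding_unit):
    """Range of true values that would round to `reported`.

    Both arguments are in the same units (here, millions of cases).
    Assumes the usual round-to-nearest convention.
    """
    half = rounding_unit / 2
    return (reported - half, reported + half)


def relative_half_width(reported, rounding_unit):
    """Half-width of the implied range as a fraction of the reported value."""
    return (rounding_unit / 2) / reported


# "76 million", reported to the nearest million
print(implied_interval(76, 1))        # (75.5, 76.5)

# "80 million", reported to the nearest ten million
print(implied_interval(80, 10))       # (75.0, 85.0)

# Implied relative precision of "76 million": well under 1%
print(relative_half_width(76, 1) < 0.01)   # True
```

Set against a model that would be lucky to be right within a factor of two, an implied precision of better than 1% is the whole problem in one line.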
So the reported figure was ridiculous the day it was originally written. But it is even more absurd to continue to make the same quantitative claim. This figure, even if it had been correctly estimated in the late 1990s (using data that often was years older), is obviously not a scientific constant like the speed of light. This should be fairly obvious to reporters who use this number, since the reason they are writing the story is that there is an unusual event happening, as in the present case, that did not happen the year before. More important, food handling practices change over time, some for the better and some for the worse. At the very least, I would like to think that newspaper reporters and the CDC are aware of the little matter of the U.S. population increasing by more than 10% since the original analysis was done. Unless improvements in food safety perfectly balanced out the expanding population with just the right decrease in per capita disease rates, year after year, the number could not possibly remain constant.
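To put a rough number on that last point, here is a minimal Python sketch. It takes the "more than 10%" population growth at exactly 10% as a conservative assumption and treats the original 76 million as if it had been correct, purely for the sake of argument; the function name `required_rate_change` is invented for this illustration.

```python
def required_rate_change(population_growth):
    """Fractional change in the per capita rate needed to keep the
    total case count constant while the population grows.

    cases = rate * population, so holding cases fixed requires
    new_rate / old_rate = 1 / (1 + population_growth).
    """
    return 1 / (1 + population_growth) - 1


growth = 0.10          # assumed: the lower bound of "more than 10%"
original_cases = 76    # millions, the figure taken at face value

# Per capita rate would have to fall by about 9.1% to cancel the growth
print(round(required_rate_change(growth) * 100, 1))   # -9.1

# With an unchanged per capita rate, the count would instead rise
print(round(original_cases * (1 + growth), 1))        # 83.6
```

So even granting the original estimate, repeating "76 million" today amounts to a claim that per capita rates fell by about 9%, neatly and without anyone measuring it.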
In other words, pretending that this number is accurate is like estimating the number of Americans who voted Republican or were Hispanic with one study in the 1990s, and then continuing to claim the number is exactly that value forevermore. I would assume that political scientists would not make those mistakes. It is a real shame that CDC’s epidemiologists lack equivalent understanding of the way the world works.
The source of the error hardly lets Konrad and other reporters off the hook, however. The NYT article, like many others, presents the figures as if they were God’s Own Truth, not even reporting the source, let alone including caveats like “CDC claims…”. You would think that following the huge embarrassment of playing stenographer for the U.S. government’s lies that started the Iraq war, the NYT’s instructions to its reporters would include “never repeat something the U.S. government claims as if it were an indisputable fact.”
Why does this matter, and why did I decide to write about it today, since reporters have been repeating this error for years? The answer to the latter is that I was tipped over the edge by seeing on the front page of the newspaper an article about back-to-school shopping that declared that “on average, fathers are expected to spend 23 percent more on their … children than mothers”. It boggles the mind to think that anyone would believe that this prediction could be made with that kind of accuracy (not “about one quarter more than”, but down to the last percentage point). Do people not realize that even if we had complete cash register data from every purchase made during the season, we would not be able to estimate that figure so accurately retrospectively? We could not figure out that precisely how many purchases were made by dads rather than moms (e.g., if they are both at the store, who gets credit?) or what constitutes a back-to-school purchase. Do people not realize that we probably cannot even estimate how many schoolkids have dads who buy them anything with that degree of precision (due to uncertainty about who constitutes a dad, whether they are actually in the kids’ lives, etc.)?
Actually, I guess the answer is that most people do not realize. Therein lies why this matters. As soon as something has a number attached, reporters and others seem to turn off even their most basic critical abilities and common sense. It never occurs to them to ask even the simple “how could anyone possibly know this (e.g., exactly what portion of back-to-school purchases will be made by fathers)?” People seem to think that those who cook up these numbers have some magical science available to them. Modern science lets us measure the concentration of cadmium or BPA in biological samples down to the parts per billion level, so it stands to reason that they must be right when they claim that many people are being sickened by these substances, right? If they tell us that 47,318 people die each year in the U.S. from environmental tobacco smoke then it must be true, right? Surely a science so precise could not coexist with genuine controversy about whether there is any major mortality risk from ETS exposure.
And so, since we know that the U.S. government can count foodborne disease cases (which are almost never definitively diagnosed, by the way) within 1% accuracy, surely when they flatly declare “salmon genetically engineered to grow quickly is safe to eat and poses little risk to the environment”, they must be right. Right? The first bit seems likely, but the declaration by the Food and Drug Administration that there is basically no chance of ecological pollution from the new genes rings a little hollow. Maybe the country’s regulators of pain relievers and pacemakers somehow know more than the rest of us about the risk of introducing novel agents into the ecology, but I do not share the New York Times’s faith in the government (at least they attributed the claim to the FDA rather than just declaring it true). Perhaps if New York Times reporters read the New York Times, they might have noticed their several articles this year about the spread of the “Roundup ready” engineered gene from crops to weeds, making the latter resistant to what had been the best available herbicide. This is not to take a position on genetic engineering, mind you, just to speak up in favor of not being so gullible about any declaration made using the language of science.