This seems like a fitting first post for this blog, since the paper in question played a substantial role in creating the path that led to much of my current work in both tobacco harm reduction (THR) and recognizing the importance of publication bias in situ.
A few weeks ago, Brad Rodu published an analysis (“Winn’s Legacy: The Fifty Fabrication”) that pointed out how a misleading number from a thirty-year-old paper continues to appear in the anti-THR literature, in statements by the U.S. government’s executive branch to the U.S. legislature (i.e., the bureaucrats are lying to the people’s representatives), and in other forums. The figure is the erroneous claim – based on the 1981 paper by Winn and her then-new colleagues at the U.S. National Cancer Institute, which was based on her dissertation research from the 1970s – that smokeless tobacco (ST) increases the risk of oral cancer (OC) by a factor of fifty. This number misled many well-meaning anti-tobacco activists to discourage THR back in the 1980s, and now provides ammunition for the dominant anti-tobacco extremist faction to discourage THR even though they surely know they are lying when they present the number. As a result, it has contributed to the deaths of countless smokers who would have switched to the low-risk alternative had they known the truth. While it is impossible to quantify the counterfactual, the intensive use of this number in anti-THR propaganda, particularly 5 to 15 years ago, means that it probably had substantial independent effects, and it seems reasonable to guess that it killed thousands of smokers who would have otherwise been saved.
It is worth noting that this is not a case of innocent scientists publishing a scientifically valid result and then activists taking it and misusing it: The authors put the 50-fold figure in their abstract, even though they undoubtedly knew how utterly misleading it was. The NCI is among the worst historical offenders in publishing disinformation to pretend that ST causes risks similar to those from smoking, including the fifty claim, and is still committed to misleading smokers about their options for lowering their risks. Winn herself has perpetuated the fifty estimate, though she obviously knows what it really means and that it is almost always misinterpreted, as well as knowing what manipulations of the data it took to cook up the number (the main point of the present analysis). I am aware of no case where she or NCI made an effort to correct the misperceptions. Winn now seems intent on ending her career by redoubling the needless mortality she caused by discouraging harm reduction, as noted by Rodu here and here. Her initial contribution to this at the start of her career – when there was less definitive knowledge that the risks from ST are very small and THR was not so clearly the best public health intervention regarding smoking – might be seen as an accidental sin, but it is difficult to see how the current actions are forgivable.
The purpose of the present analysis is to report on some biases in that estimate that Rodu could not have included because he, like most people in the field, has never had a chance to look at the original data. In contrast with more honest sciences, it is common for epidemiologic data that is used to produce published articles, even those that have huge policy implications, to be kept secret in perpetuity. This seriously tortures the definition of “publication”, renders peer review almost meaningless, and means that a lot of policy is based on junk science. Fortunately, I am one of the few researchers who has a copy of the Winn data.
[Aside: To clarify, I believe that the causal pathway is not that fundamental problems with the science of epidemiology caused the acceptance of hiding data. Rather, an unfortunate accidental historical path that made it acceptable to keep data secret resulted in epidemiology attracting people whose work could not stand up to scrutiny because of its low quality or political bias. But because such people then came to dominate the field – particularly in areas that are highly political – and thus its publications, rules, gatekeeping, budgets, etc., what might have been a temporary problem of a young science became institutionalized by those in power who do not want their junk science exposed, or do not even realize they are doing junk science but simply want to preserve their empires.]
I have written about what the Winn data shows before, but clearly did not do so loudly enough, since Rodu was surprised when I submitted some of what appears below as a comment on his blog. Instead of posting that comment, he suggested that I write something in greater depth, and so here it is (and thanks to him for the suggestion and comments on the draft). (For those who have stumbled across this – perhaps because it is such a good example of how unethical biases are introduced into epidemiology research reports – but do not know the basic facts about smokeless tobacco, you might want to consult the FAQ at TobaccoHarmReduction.org or the background chapters in the Tobacco Harm Reduction 2010 Yearbook. For those who are aware of the basic facts but not the Winn legacy, see the Rodu posts that are linked from here.)
Before addressing what can only be learned by looking at the data, I will start with a few epistemological observations about that 50-fold increase statistic that are sufficient to show that approximately every use of that number represents either an intentional lie or fundamental ignorance of basic scientific research methods and/or the content of the original article.
1. Epidemiologic estimates are not physical constants. As anyone who paid attention to even one decent class in epidemiology knows, the results depend on the specifics of the exposure (which likely changes over time), the population and their other exposures (which inevitably change over time), and the exact outcome being measured (which might also change based on contemporary assessments of the right measure of a phenomenon). Also, methods for analyzing the exposure-disease-population combination in question often improve, rendering old analyses obsolete (see the point below about “ever users”). Thus, anyone who cites an effect estimate from more than three decades ago as if it were a constant clearly has no business reporting on health science – they obviously do not understand it.
I would hope that it is obvious to my readers that other social science measurements (e.g., what is the benefit of a college degree? what portion of babies are raised by unmarried couples?) are not constants over time and across populations, and that they would not quote an estimate based on a study of mostly rural elderly women in North Carolina in the 1970s as if it applied to everyone and were still true. Though even something that obvious is not obvious to everyone: For example, you still occasionally see the claim that there are exactly 76 million cases of foodborne disease annually in the U.S., which is based on an extremely rough modeling exercise from 1999, which in turn is based on studies from earlier than that (and which, incidentally, I and others showed to be a bad estimate even at the time, but that is a different point). Apparently it does not even occur to people who repeat that number that the population size, the quality of the food supply, and many other factors have changed dramatically in more than a decade, and thus even if the number had been exactly right at the time it would no longer be.
In fairness, a relative risk estimate will often be more stable over time than some of these other social science measures, but it is still not stable. The exposure (product type, etc.) changes, populations change (which mainly means that causal co-factors and competing causes of censoring change), and even disease ascertainment changes (diagnosis, definitions). So, in short, even if the Winn estimate had been unbiased and meaningful at the time it was published, it would be of only historical value now.
2. To repeat the point that Rodu emphasizes in his post, even if the estimate were unbiased and relevant to today, it was not an estimate of the risk of OC. First, it was limited to a particular rare form of OC. Indeed, the specific analysis in which the number was presented actually emphasizes that other, much more common, forms of OC did not show measurable increase once the rarer cancers were separated out. So claiming that the statistic represents the risk of OC as a whole is like claiming that slicing bagels is the leading cause of traumatic injury because it is the leading cause of one particular traumatic injury (laceration injuries to the hand that require urgent care) that represents a tiny fraction of all injuries. This is particularly important because the particular specific OCs that generate the statistic are so incredibly rare (and even far more so in the absence of smoking and heavy drinking) that even if there were a 50-fold increase in risk, this would not be very significant in terms of lifetime disease risk. But since anyone trafficking in the large number undoubtedly realizes that most readers/listeners will think otherwise when they hear the big number, even if the number were a valid measure for the rare disease risk, presenting it without the caveat that the risk is trivial would still constitute scare tactic propaganda rather than the honest communication we should demand of the government and others.
Second, the result applied only to the group who had used ST for more than 50 years. This point is sometimes alluded to when the number is cited, with a phrase like “those using the product the longest”, though that phrase is not something readers will pay much attention to (as the authors of such statements no doubt realize) and fails to really capture the point that these subjects had been using ST constantly since the 1920s or earlier (and the data analysis, below, shows that their usage was even more extreme than that implies). Moreover, most of the asides about the exposure group seem to use a phrase like “the heaviest users”, which implies to the readers that a 25-year-old who uses a lot of ST is at this level of risk. Rodu, I, and others have written about these points extensively, so I will not belabor them.
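To make the arithmetic behind the rare-disease point concrete, here is a minimal sketch of how a relative risk translates into absolute risk. The baseline risks used below are invented purely for illustration; they are not taken from the Winn data or from any study, and the function names are hypothetical.

```python
def absolute_risk(baseline_risk: float, relative_risk: float) -> float:
    """Risk among the exposed, given the risk among the unexposed and a relative risk.

    Capped at 1.0 because a probability cannot exceed certainty.
    """
    return min(1.0, baseline_risk * relative_risk)


def excess_cases_per_100k(baseline_risk: float, relative_risk: float) -> float:
    """Extra cases per 100,000 exposed people attributable to the exposure."""
    extra = absolute_risk(baseline_risk, relative_risk) - baseline_risk
    return extra * 100_000


# HYPOTHETICAL baseline risks, chosen only to show the arithmetic.
rare_baseline = 1 / 100_000   # a very rare disease
common_baseline = 1 / 100     # a much more common disease

# A 50-fold relative risk applied to the rare disease...
rare_excess = excess_cases_per_100k(rare_baseline, 50)
# ...versus a mere doubling of the common disease.
common_excess = excess_cases_per_100k(common_baseline, 2)

print(f"50x on the rare disease:  about {rare_excess:.0f} extra cases per 100,000")
print(f"2x on the common disease: about {common_excess:.0f} extra cases per 100,000")
```

Under these made-up inputs, the dramatic-sounding multiplier produces far fewer additional cases than the modest one, which is why a relative risk quoted without the baseline tells the reader very little about lifetime risk.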
3. The exposure studied by Winn was mostly the local variety of powdered dry snuff preferred by traditional Appalachian women in the mid-20th century and, more so, earlier than that. This means that not only do exposures change over time, as noted above, but that even at the time of the study this exposure was different from the common exposure (chewing tobacco and moist snuff). Moreover, the population itself is rather unusual, which further erodes the generalizability of the result. The one or two other studies that were able to separate out a tiny bit of data for this particular exposure and population also reported a measurable risk for oral cancer. For those who do not know, these represent outliers from the numerous studies of all American and Swedish ST products that have shown that there is no measurable risk of OC.
For more information, Rodu has written extensively about this point, as have I and others. Some of those authors have concluded that the Winn study’s main result (not the cooked up 50 statistic – see below) and the other smaller studies mean that those archaic products caused a substantial risk for oral cancer. Others suggest that we cannot be sure this is why the Winn study is such an outlier, though it is a plausible hypothesis, and we will never know for sure. But either way, this means that even if the 50 statistic were an accurate estimate of something, it would have basically no relevance to the products that people use today. Clearly it would have absolutely no relevance to the modern products that are promoted for THR.
4. The result is extremely statistically unstable (i.e., it was very dependent on the luck of the draw in terms of who ended up in the sample). This makes the result very easy to manipulate in the ways described below as publication bias in situ (PBIS). Even apart from the points below about how the data was distilled to get an impressive result, the mere fact that it is so sensitive to the particular sample Winn chose (which people who know some statistics might know by the phrase “very wide confidence interval”) means that the result should never be reported as if it were a precise estimate of the exact risk. If this were the only concern, one could still say “a very large multiple” or something like that, but it is misleading to imply that we actually have such good information that we can quantify the estimate.
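For readers who want to see what that kind of instability looks like in numbers, the following is a rough sketch using a standard odds ratio with a log-based (Woolf) 95% confidence interval. The cell counts are invented for illustration and are not the Winn data; the function name and table layout are assumptions of this sketch.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI for a 2x2 case-control table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Uses the Woolf (log odds ratio) standard error; all cells must be nonzero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts: a single exposed control in the long-duration stratum.
or_1, lo_1, hi_1 = odds_ratio_ci(a=10, b=1, c=20, d=100)
print(f"OR = {or_1:.0f}, 95% CI roughly {lo_1:.0f} to {hi_1:.0f}")

# Same table, but one more exposed control happened to land in the sample.
or_2, lo_2, hi_2 = odds_ratio_ci(a=10, b=2, c=20, d=100)
print(f"OR = {or_2:.0f}, 95% CI roughly {lo_2:.0f} to {hi_2:.0f}")
```

With counts this small, the interval spans well over an order of magnitude, and adding a single subject to one cell cuts the point estimate in half. That is what it means for a number to depend on the luck of the draw, and it is why such a result supports "a large multiple" at most, not a specific figure.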
5. A final reason why this number should not be cited as a reason to avoid THR or ST more generally is a bit more subtle, and requires knowledge of the world and not just statistics, but should be instantly recognizable as valid: Even if it were true that someone initiating ST right now would have a 50-fold increase in OC risk fifty years from now, who cares? Seriously. Anyone with enough literacy, wealth, and motivation to have access to this propaganda is of a social class high enough to, almost certainly, have access to a high-probability cure for OC fifty years from now. Just think about the progress in medical technology in the last fifty years, and then about the rate of acceleration of technology. This is nothing like the heart attack that might kill a fifty-year-old smoker tomorrow or the emphysema that will likely be irreversible in his lifetime. A dramatic multiplication of someone’s (very very low baseline) risk for oral cancer fifty (or even forty and probably even twenty) years from now simply does not matter very much. This is obviously not to say that we should not endeavor to prevent cancer or that current OCs are not terrible diseases, of course, but discouraging a behavior that could save many lives (via THR) or that people simply really like based on claims of a few cancers that will not occur until they are extremely likely to be curable is obviously indefensible.
With all that as background, I will now proceed in Part 2 to discuss what is not widely knowable about the Winn paper, because it requires analyzing the data. One might argue that since the result is so clearly irrelevant for the above reasons, there is little point in this. But since those incredibly obvious reasons do not seem to have stopped the disinformation, it cannot hurt to pile on a few others. In addition, this serves as an important illustration of the methodologic and ethical problems that are rife in epidemiologic publishing, particularly in areas where those reporting the results are more activist than scientist. For example, someone who realizes how this number was cooked will be more likely to see that the anti-THR propaganda that has been published by the Karolinska Institute over the last few years is pretty clearly cooked. Oh, and the full reference for the Winn study is N Engl J Med. 1981 Mar 26;304(13):745-9, Snuff dipping and oral cancer among women in the southern United States, Winn DM, Blot WJ, Shy CM, Pickle LW, Toledo A, Fraumeni JF Jr. – I mention this as a hint to teachers: If you are looking for teaching articles to demonstrate dramatic over-conclusion, naive epidemiology methods, and unsubstantiated policy prescriptions, you really cannot beat the New England Journal of Medicine.