The anti-snus activist shop at Karolinska Institute just published (press release; article – open access) a report that claims that using snus causes weight gain. Since KI snus researchers are notorious for changing their statistical analysis to get the anti-snus results they want (my colleagues and I, as well as Brad Rodu, have written a lot about this), the smart money says that their data do not really support their conclusion. I believe these Karolinska researchers account for the majority of studies from the last half decade that claimed to find an association between snus (the Swedish word for moist snuff) and disease. They got these results by changing their statistical methods in order to create such an association – or so we suspect, since they continue to defy a court order to release their data, which would provide proof of their unethical behavior. So it is always an interesting little exercise to figure out how they cooked up their latest “result”.
In this case, the most obvious candidate is the effect of age. The authors claim that snus use increased the chance that a man gained 5% of his body weight between 2002 and 2007 by about 30 to 40%, compared to a nonuser of tobacco. Presumably the authors are motivated by the fact that few people will worry about a trivial risk of cancer as much as they will about a 30% increase in the chance of gaining weight (which someone will probably misconstrue as a 30% chance when they put it in the propaganda). But what they do not tell us in the abstract or the press release is that the nonusers had an average age of 46 at baseline, while the snus users were on average ten years younger. Unless Swedish men are radically different from those in the Western countries I am more familiar with, an unfortunate fact of life is that many of us gain well over 5% during some period between our late 20s and 40, when family and too much work replace sports and such. Fortunately, most of us stabilize again after that.
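The gap between "a 30% increase in the chance" and "a 30% chance" is one line of arithmetic. The 20% baseline below is a made-up number for illustration, not a figure from the study.

```python
# Hypothetical illustration: a relative increase is not an absolute chance.
def risk_after_relative_increase(baseline_risk, relative_increase):
    """Return the absolute risk after applying a relative increase (0.30 = 30%)."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.20  # ASSUMED share of nonusers gaining 5% of body weight (made up)
exposed = risk_after_relative_increase(baseline, 0.30)
print(f"baseline {baseline:.0%}, exposed {exposed:.0%}, "
      f"absolute difference {exposed - baseline:.0%}")
```

With that assumed baseline, a 30% relative increase moves the chance from 20% to 26%, a difference of 6 percentage points, not a 30% chance.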
So we would expect a group of men with an average age in the mid-30s to include many in this prime fattening range, while a group with average age in the mid-40s would have fewer. Oh, but wait, they said they “controlled for” age. Doesn’t this solve the problem?
Therein lies a great example of one of the biggest lies of epidemiology. “Controlling for” is never perfect, but in many cases it is so obviously far from perfect that it is dishonest to claim to be controlling for the confounder. There are many different ways this can happen. The reason in this case is that the effect of age is not linear, but they assumed it was. At least I assume they assumed that, since they do not actually report what they did (failure to report one’s methods is typical of bad epidemiology). They wrote “Age and baseline weight were included in the analyses as continuous variables”. Apparently they are unaware that a continuous variable can enter a model in all sorts of different shapes, and it did not occur to them to specify that they were assuming a simple linear trend. I am quite confident they would have said so if they had used an appropriately complicated function.
(Aside: They published this in an online-only journal that does not have page limits. Thus, their failure to publish their methods, or alternative analyses, or sensitivity analyses, or the effect estimates for their covariates, etc., was entirely their choice. It is bad enough that health science “publishing” in dead-tree journals forbids actually including enough information. It is worse, in cases like this, when the authors choose not to report useful information.)
From the few words they wrote, I can only conclude that they controlled for nothing more than a linear trend across the entire age range, from 18 to 84 (i.e., basically the entire adult male population). That is, they assumed that the effect of being 19 rather than 18 is the same as the effect of being 25 rather than 24, or 41 rather than 40, and so on. There is presumably some minor linear trend across all ages, but that trend does not capture the fact that (I am just roughing this out without looking it up, but you can see the idea) a five-year period starting at age 18 is likely to see major weight gain, one starting at 23 less so, one starting at 30 more again, and one starting at 40 less.
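To see what a single straight line does with the rough pattern described above, here is a small sketch. The risk values are invented to mimic that pattern (high at 18, lower at 23, higher at 30, lower at 40 and beyond); they are not data from the study or from anywhere else.

```python
# Made-up probabilities of gaining >5% body weight over the next five years,
# by starting age, loosely following the rough pattern in the text. NOT data.
risk_by_age = {18: 0.45, 23: 0.25, 30: 0.35, 40: 0.20,
               50: 0.15, 60: 0.12, 70: 0.10, 80: 0.10}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


ages = list(risk_by_age)
intercept, slope = fit_line(ages, [risk_by_age[a] for a in ages])
print(f"slope: {slope:+.4f} per year of age, the same at every age")
for age in ages:
    fitted = intercept + slope * age
    print(f"age {age}: assumed {risk_by_age[age]:.2f}, line {fitted:.2f}, "
          f"residual {risk_by_age[age] - fitted:+.2f}")
```

The line assigns the same small change per year (about -0.005 with these numbers) to every age, so it ranks 23-year-olds above 30-year-olds, the reverse of the assumed pattern, and the residuals flip sign from one age to the next.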
If that is not clear, consider another example. Imagine the claim that being unmarried increases your risk of having an auto accident. It is probably true to some extent, but it is even more true that young drivers (mostly unmarried) and old drivers (more likely than average to be widowed) are much riskier than those in between. Now imagine that we “control for” age, but use statistics that assume that the effect of age is a linear trend. That line is going to be fit to a curve that looks like a bowl (high at each extreme, low in the middle), so it is going to be fairly flat and not look at all like the real age effect. Thus, having “controlled for” age with that variable, we are still left with almost all of the actual confounding effect of age, and so the statistics blame lack of marriage for the bad driving of inexperienced and rather less responsible youth and doddering and even less responsible old people.
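The marriage-and-accidents example can be simulated directly. This is a sketch under stated assumptions: accident risk depends only on age and is bowl-shaped, the share unmarried is high among the young and old, and all numbers are invented. It uses a linear probability model fitted by weighted least squares on expected counts, a deliberately simple stand-in; it is not the (unreported) model from the snus study.

```python
# Sketch: does a linear age term remove confounding by a bowl-shaped age effect?
# All numbers are hypothetical; the model is a linear probability model fitted
# by weighted least squares, not the model used in the study.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wls(rows, weights, y):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, w, yi in zip(rows, weights, y)) for i in range(k)]
    return solve(xtwx, xtwy)


def age_z(age):
    """Age centered at 50 and measured in decades."""
    return (age - 50) / 10


def accident_risk(age):
    """ASSUMED bowl-shaped risk, lowest at 50. Depends on age ONLY: marriage has no effect."""
    return 0.02 + 0.01 * age_z(age) ** 2


def p_unmarried(age):
    """ASSUMED share unmarried: high among the young, elevated among the old."""
    if age < 25:
        return 0.9
    if age >= 70:
        return 0.6
    return 0.3


def unmarried_coefficient(age_terms):
    """Coefficient on 'unmarried', using expected counts (1000 people per year of
    age, ages 18-84) instead of random draws, so the result is deterministic."""
    rows, weights, y = [], [], []
    for age in range(18, 85):
        for unmarried in (1, 0):
            share = p_unmarried(age) if unmarried else 1 - p_unmarried(age)
            rows.append([1.0, float(unmarried)] + age_terms(age))
            weights.append(1000 * share)
            y.append(accident_risk(age))
    return wls(rows, weights, y)[1]


crude = unmarried_coefficient(lambda age: [])
linear = unmarried_coefficient(lambda age: [age_z(age)])
quadratic = unmarried_coefficient(lambda age: [age_z(age), age_z(age) ** 2])

print(f"no age adjustment:    {crude:.4f}")
print(f"linear age term:      {linear:.4f}")
print(f"age + age^2 terms:    {quadratic:.4f}")
```

Although marriage has no effect at all in this setup, the unadjusted and the linear-age-adjusted coefficients both come out at roughly 0.022 extra accidents per person, so "controlling for" age linearly barely changes anything. Adding a squared age term, which happens to match the bowl shape used to generate the data, drives the spurious effect to zero. In real data the true age curve is unknown, which is why flexible functional forms or narrow age categories are used instead of a single line.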
There are useful ways to control for the effect of age on a variable. Assuming a continuous linear trend is usually not one of them, and clearly not in the case of weight gain. It seems quite plausible that the snus users included more very young adults who had not yet reached their full adult weight, and others at greatest risk of late-youth weight gain, that this explains the entire “effect”, and that controlling for a single linear trend across the whole lifespan removes very little of this confounding. One reason I suspect this explanation is that they report three different statistical models, each controlling for more variables than the last, but even the first already controls for age. I would really like to see how much difference “controlling for age” makes compared to not doing so. Since age is hugely predictive of someone’s likelihood of weight gain in the next few years, it should matter a lot if correctly controlled for, but I am betting that putting in the covariate they used had very little effect. It is very suspicious that they did not report the unadjusted association; it might have clearly demonstrated that they were not really controlling for the effect of age.
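One of the useful ways is simply to compare users and nonusers within age bands and then pool those within-band comparisons. The sketch below does this with a Mantel–Haenszel risk ratio; the counts and the two age bands are invented so that users and nonusers have identical risk within each band, while users are concentrated in the younger, higher-risk band.

```python
# Made-up counts: (gained weight, total) for snus users and nonusers in two
# hypothetical age bands. Within each band the risk is identical for users and
# nonusers, so there is no true effect; users are just younger on average.
strata = {
    "younger": {"snus": (180, 600), "none": (120, 400)},
    "older": {"snus": (30, 200), "none": (150, 1000)},
}


def crude_risk_ratio(strata):
    """Risk ratio ignoring age entirely (all bands pooled)."""
    a = sum(s["snus"][0] for s in strata.values())
    n1 = sum(s["snus"][1] for s in strata.values())
    c = sum(s["none"][0] for s in strata.values())
    n0 = sum(s["none"][1] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Age-stratified risk ratio: sum(a*n0/N) / sum(c*n1/N) over the bands."""
    num = den = 0.0
    for s in strata.values():
        (a, n1), (c, n0) = s["snus"], s["none"]
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den


print(f"crude RR:           {crude_risk_ratio(strata):.2f}")
print(f"Mantel-Haenszel RR: {mantel_haenszel_risk_ratio(strata):.2f}")
```

With these invented counts, the crude comparison shows about a 36% higher risk for users, and the age-stratified estimate is exactly 1. Coarse bands can themselves leave residual confounding if risk still varies within a band, so narrower bands are generally better when the data allow.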
There are other possible explanations for the results. This is particularly true for a second outcome they looked at, becoming obese. That outcome is likely affected by numerous confounding variables that the authors did not include at all. My hypothesis that age is the explanation is also less well supported by the result for former snus users (who had already quit at baseline and remained abstinent at follow-up); they are only slightly younger than the never-users but were also more likely to gain weight. This suggests there is something different about people who are inclined to use snus, other than age, that the study did not control for at all, since there is no plausible reason why having used snus in the past would cause weight gain. There is definitely more to critique about this study. Still, I am pretty confident that this age effect matters, and it makes for a nice Unhealthful News lesson about potential confounding and about how vacuous the statement “we controlled for…” can be.
No doubt, if confronted with this observation, the first thing the authors would do is retreat into the usual weasel words and say they were just looking to see if there was an association, not making a causal claim. This is clearly disingenuous because, though they always use the word “association” when reporting their results, the prose surrounding the results makes quite clear that they are pursuing a causal hypothesis. They certainly never say “but there is no reason to assume it is causal”. More important, if I am right about the effect of age, or about them manipulating their statistical methods in general, that weaseling is not even accurate (though I suspect that few epidemiologists would even understand why). If they analyzed the statistics in a misleading way, then there is not really even an association in the data, when considered properly. If they were just reporting the unadjusted comparison of the two variables, then they could make that claim, but they claim to be controlling for age. If properly controlling for age would make much of the result go away, as I suspect, then not only is there no evidence of causation, but there is not even an association after adjusting for age.
Of course, it is possible that when confronted with this challenge they will release their data, or even just enough additional analyses and information beyond the almost-useless reporting they provided, to show that my hypothesis is wrong. They might even report “sorry we were so cryptic about the functional form; it was continuous, but we realized that linear was the wrong functional form so we used a fourth degree polynomial [or three splines] to capture the nonlinearity that you describe; thank you for giving us a chance to clarify that and prove you wrong; nyah nyah!”. Most of you reading this will have no idea what the middle bit of that last sentence means, but there is no reason to worry about that, because I can assure you it is not true. Such an analysis is way over their heads. As for them having something they could release that would allow that “nyah nyah” at me, I am really not too worried.
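For readers curious about that middle bit: a polynomial or a spline in age replaces the single age column in a regression with several columns built from age, so the fitted age effect can bend instead of following one straight line. The helpers below are hypothetical illustrations; the centering, scaling, and knot positions are arbitrary choices, not anything taken from the study.

```python
# Hypothetical helpers showing what "a fourth degree polynomial" or "splines"
# in age would mean as regression covariates. All settings are arbitrary.
def polynomial_age_terms(age, degree=4, center=50.0, scale=10.0):
    """z, z^2, ..., z^degree, where z is age centered and scaled (assumed values)."""
    z = (age - center) / scale
    return [z ** k for k in range(1, degree + 1)]


def linear_spline_age_terms(age, knots=(25, 40, 60)):
    """Piecewise-linear spline: the slope may change at each (hypothetical) knot."""
    return [age] + [max(0, age - k) for k in knots]


for age in (20, 30, 45, 70):
    print(age, polynomial_age_terms(age), linear_spline_age_terms(age))
```

A model would then use these columns where the single age term went. The piecewise-linear spline is the simplest kind; restricted cubic splines are a common, smoother alternative.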