A new analysis, published as two journal articles (the first is the one relevant to this post) and widely over-hyped in the news (example), that looked at alcohol consumption and cardiovascular disease is a perfect example. First, it should be noted that the authors, a group at the University of Calgary, came to the conclusion that we already knew beyond any serious doubt: Moderate alcohol consumption is protective against heart disease to an impressive degree, and possibly protective against stroke to a minor degree. As I noted in UN29 this has been well known for two decades. So averaging together all of the many studies that tend to support the conclusion (and the very few that do not) tells us little we did not know already.
In theory it might help us quantify the effect. That is, we might have studies that showed a variety of results – reducing risk by 10%, 20%, or 50% – so we were already sure there was a reduction, but the synthesis might let us consolidate them into a single estimate. But this brings up one of the fatal flaws in the synthesis, the “separated at birth” assumption. An analysis of this type effectively pretends that the studies being summarized were a single large study whose dataset was chopped up into pieces and analyzed by different research groups, so that the synthetic analysis is merely putting it all back together. Obviously this is fiction, since each study was done on a different population, there were a variety of different definitions of exposure, outcome, and follow-up time, and the analyses were done differently. There are a few statistical tricks to deal with some of the differences, but these mostly replace one of the many clearly false assumptions with another likely false assumption.
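To make the averaging concrete, here is a minimal sketch of the textbook fixed-effect (inverse-variance) pooling calculation. The three relative risks and standard errors are invented purely for illustration; they are not taken from the Calgary papers or any real study, and this is not necessarily the exact weighting scheme those authors used. The function name is hypothetical.

```python
import math

# Hypothetical (relative risk, standard error of log relative risk) pairs.
# These numbers are made up for illustration only.
studies = [(0.70, 0.10), (0.85, 0.05), (0.60, 0.20)]

def pooled_relative_risk(studies):
    """Fixed-effect inverse-variance pooling on the log scale.

    Treats every study as an estimate of one shared true effect,
    which is exactly the "separated at birth" assumption.
    """
    # More precise studies (smaller standard error) get more weight.
    weights = [1.0 / se ** 2 for _, se in studies]
    log_rrs = [math.log(rr) for rr, _ in studies]
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return math.exp(pooled_log), pooled_se

rr, se = pooled_relative_risk(studies)
print(f"pooled relative risk: {rr:.2f} (SE of log RR: {se:.3f})")
```

Note that nothing in the calculation knows or cares whether the inputs came from comparable populations, exposure definitions, or follow-up periods; it produces a tidy single number either way.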
Thus, a study like the new one effectively says “let’s average together this result of a five year study of American men who consumed an average of 1.9 drinks per day with this other twenty year study of French men and women who consumed an average of 1.2 drinks per day with….” I suspect if they explained it in those terms rather than burying the essence of the methodology in statistical mumbo jumbo most readers would laugh at the concept. A useful type of meta-analysis, sometimes called a “comparative meta-analysis” or the less descriptive “analytic meta-analysis”, is one that attempts to learn from the differences between studies, rather than pretending there are none. E.g., there is a great deal of uncertainty about exactly how much drinking produces the optimal benefit, so a useful analysis would be to see if the existing studies, taken as a whole, could better inform that. Another useful comparison of results across studies that has been done is the one that put to rest the myth that this is about red wine rather than alcohol itself; in that case the comparison was to see if there was any substantial difference between the protective effects of different sources of alcohol.
A synthetic analysis inevitably hides all manner of problems, both with the individual studies and with the synthesis itself. In this case, in order to make the combined studies somewhat more uniform (that is, to make the “separated at birth” assumption a little bit closer to true) the authors restricted the analysis to study results where the comparison group was lifelong non-drinkers. But one of the persistent criticisms of the claim that alcohol is protective is that lifetime teetotalers in a free-drinking society are an unusual group that might just have extra disease risk due to being anti-social or following minority religious practices, or might refrain from drinking due to some known health problem. The hypothesis, then, is that it is not so much that moderate drinking helps, but that being a non-drinker means something is wrong with you. Most of the time this claim is put forth by nanny-state type activists who just do not want to admit the truth about alcohol (usually they omit the phrase “being a non-drinker means something is wrong with you”), but the hypothesis still cannot be dismissed out of hand. This is why many researchers pursuing this topic have compared the moderate regular drinkers to those more-likely-normal folk who drink occasionally rather than never.
Thus, the entire new study was done in a way that told us what we already knew and avoided even looking at the most important reason why this might be wrong. And it avoided trying to answer any of the unanswered comparative questions. It did provide an update to previous systematic reviews of the topic, a minor contribution, but did nothing that created any fundamentally new knowledge. You would never know any of that from reading the news, of course. In this particular case there is nothing wrong with the effect of the one day of hype that resulted from pretending we had learned something new, which was to get out a public health message that has long been known to the experts but is not widely understood in the population. However, praising bad practice because it had a good outcome in a particular case is a very dangerous game.
What in my mind is the biggest problem with the new research reports, however, is the bit that starts:
we can now examine the argument for causation based on Hill’s criteria
Those familiar with what I (and others expert on the subject) have written about the useful but universally mis-interpreted contributions of Austin Bradford Hill will immediately know why I respond to that with: Really?
Such a statement indicates a fundamental failure to understand how to think scientifically, a problem rather common in health science. It is not an error that is likely to be quoted in a newspaper, since it is too arcane for that medium, but some pundits did fall for it. I will write more about this tomorrow, and later in the series, but to start the point:
Over 45 years ago, Hill gave one of the greatest talks in the history of epidemiology, with many brilliant lessons. It also included a list of points (he called them “considerations”; many authors – like the recent ones – incorrectly refer to them as “criteria”) that are worth considering when trying to decide if an observed association between an exposure and disease is causal rather than confounding. These are also often referred to as “the Bradford Hill criteria”, which represents confusion about his surname. Such a common sense list of considerations was a valuable contribution to anyone who had no clue about how to do a scientific analysis to assess whether an association was causal. It is equally worth considering a separate point which he did not address in any depth: whether the association in the data might not represent a real association in the world that is either causal or confounding. I.e., it might have been caused by biases in the data gathering or analysis process. (If this seems a bit too technical to make any sense to you, do not worry because (a) I will address it again as the series continues (and had an example of some of it here) and (b) it presumably also makes no sense to anyone who would talk about applying “Hill’s criteria”.)
Hill’s list was one of many that have been proposed. It included some obvious valid points (make sure the ostensible cause occurred before the ostensible effect) and some slightly less obvious ones (consider whether there is a plausible pathway from the cause to effect; observe whether the greater the dose of exposure the more likely the outcome; observe similar results in different populations). It also contains some points that are as likely wrong as right (make sure the effect is specific, which does not work for the many exposures that cause a constellation of different diseases). It only obliquely touches on what is probably the best test for causation when it is in question, think like a scientist: Figure out what the proposed confounding would be; figure out what observational or experimental data would be different if the relationship were causal rather than the confounding; gather and check the necessary data.
But most important, Hill’s or others’ lists of considerations are not a checklist and cannot be used that way. The considerations are not definitive: it is easy to list examples where the “criteria are met” but we know the association is confounding and not causal. Moreover, there are no rules for determining whether one of the considerations “has been met”. Indeed, in almost every case it is possible to make an observation that could be interpreted as being in line with the consideration (e.g., the Calgary authors argued “the protective association of alcohol has been consistently observed in diverse patient populations and in both women and men”) and also an observation that could be interpreted toward the opposite (e.g., a few paragraphs after the above declaration, the authors wrote, “we observed significant heterogeneity across studies”).
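The coexistence of "consistently observed" and "significant heterogeneity" is easy to reproduce. The sketch below computes the usual heterogeneity statistics, Cochran's Q and I², for three invented study results that all point in the protective direction. The numbers are hypothetical, the function name is made up for this example, and this is not presented as the Calgary authors' actual calculation.

```python
import math

# Hypothetical (relative risk, standard error of log relative risk) pairs.
# All three show a "protective" effect (RR < 1); the numbers are invented.
studies = [(0.70, 0.10), (0.85, 0.05), (0.60, 0.20)]

def heterogeneity(studies):
    """Return Cochran's Q and the I^2 statistic for a set of studies."""
    weights = [1.0 / se ** 2 for _, se in studies]
    log_rrs = [math.log(rr) for rr, _ in studies]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(studies) - 1
    # I^2: share of variation beyond what chance alone would produce,
    # floored at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

q, i2 = heterogeneity(studies)
print(f"Q = {q:.2f}, I^2 = {i2:.0%}")
```

With inputs like these, someone could cite the uniformly protective direction as evidence of consistency, or cite the heterogeneity statistics as evidence against it, which is the point: the same data can be read either way.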
Another example from the Calgary paper is the declaration that the consideration of specificity was met because there was no protective effect against cancer, only against cardiovascular disease; they could just as easily have said it was not met because the results were observed for both coronary artery disease and stroke, or because moderate alcohol consumption also protects against gallstones. Obviously the authors just chose a way to look at specificity that supported the conclusions they wanted to draw. Finally, there is no method for converting one’s list of observations (“the cause definitely preceded the effect; there is good biological plausibility; there is moderate agreement across studies; etc.”) into the conclusion “and that is enough to conclude causation”. In other words, scientific conclusions require thinking, not recipes.
It has been shown that in general when someone tries to “apply” causal “criteria” they choose from the list of considerations, pick the ones that fit their goal (either claiming or denying that an association is causal), and apply them in idiosyncratic ways. This is certainly what was done in the present case. Indeed, it is the only thing that can be done, since there is neither a definitive list nor a systematic way to apply the entries on it. In other words, “we applied Hill’s criteria to confirm this was causal” is, at best, similar to saying “we looked at this inkblot and confirmed it is a picture of a butterfly”. At worst, it is simply blatant rhetoric disguised as science. Indeed, the last few times I have seen someone actually make an assertion about what “Hill’s criteria” show, it was in consulting reports which were written as the most overt kind of advocacy for a particular conclusion about causation.
Of course, the Calgary authors seem to have come to the right conclusion. There is little doubt that the relationship is causal based on all we know, though there remains the one source of doubt that teetotaling is caused by or has common cause with poor health. Funny how they did not mention that possibility.
News reporters, not surprisingly, naively reported the quantitative conclusions from the meta-analysis as if the “separated at birth” assumption were correct and the comparison to teetotalers were reasonable. More realistically, they had absolutely no idea that either of those was an issue they needed to worry about (and the authors apparently either shared this ignorance or were hiding their knowledge) and did not bother to ask anyone who might know better. Since most readers will only remember the qualitative conclusion, though, the resulting message was basically correct. Fortunately this was a case where the right answer was hard to miss.