Monthly Archives: February 2011

Unhealthful News 58 – Hierarchy of study types, "Hill criteria", and other anti-shibboleths

Over the last few months I have realized that a strategy I employ for separating good science from bad is to look for common oversimplifications that no real expert would present (at least without noting that they are simplifications).  These represent bits of advice that are better than believing the opposite and better than complete ignorance about what to believe, but I just realized what they are akin to.  They are analogous to the advice we give children who are learning to cross the street or to drive – something that is a good start on what to think about in those situations, but that would make us rather uncomfortable if repeated by an experienced adult as if it were still how she thinks about the challenge.

I am thinking about such things as the advice to look left, then right, then left again (or is it the other way around?) before crossing the street.  That is really good advice for a child who is just learning to cross the street and might only look in one direction if not advised otherwise.  But it is not necessary or useful to think in such mechanistic terms once you are skilled at recognizing traffic patterns as you approach the street, taking in what you need to know subconsciously.  Adults also know that it is not sufficient in some cases, like London or Bangalore, where I use the strategy of furiously looking at every bit of pavement that could fit a car, just to make sure one is not attacking from that direction.  A similar bit of advice is “slow down when it is snowing”, good advice to a new driver, and it remains true for an experienced driver.  But it would be a mistake for someone to interpret that as “all you have to know about driving in the snow is to slow down”.

Encountering a writer or researcher who believes that a randomized trial is always more informative than other sources of information (I have written a lot about this error in the UN series, which I will not repeat now) is like walking down the street with a forty-year-old who stops you at the corner and talks himself through “look left, then right….”  Yes, it is better than him walking straight into traffic, just as fixating on trial results is better than physicians saying “based on my professional experience, the answer is….”  The latter is the common non-scientific nonsense that forced medical educators to hammer the value of randomized trials into the heads of physicians, so they did not get hit by cars.  Or something like that – I am getting a bit lost in my metaphors.  Anyway, the point is that you would conclude that your forty-year-old companion was perhaps not up to the forty-year-old level of sophistication in his dealings with the world.

Other such errors peg the author at a different point in the spectrum of understanding.  Yesterday I pointed out that anyone who writes “we applied the Bradford Hill criteria”, or some equivalent, is sending the message that they do not really understand how to think scientifically and assess whether an observed association is causal.  They seem to recognize that it is necessary to think about how to interpret their results, but they just do not know how to do it.  They certainly should think about much of what was on Hill’s list, but if they think it can be used as a checklist, their understanding seems to be at the level of “all you need to know is to drive slower”.  That puts them a bit ahead of residents of the American sunbelt who do not seem to understand the “slower” bit, and have thousands of crashes when they get two centimeters of snow.  You have to start somewhere. 

Perhaps if it were forty-five years ago, when Hill wrote his list, their approach would be a bit more defensible.  As I put it in one of the papers I wrote about his ideas and his list of considerations about causation,

Hill’s list seems to have been a useful contribution to a young science that surely needed systematic thinking, but it long since should have been relegated to part of the historical foundation, as an early rough cut.

I would like to be able to say that those who make this mistake are solidly a step above those who think that there is some rigid hierarchy of study types, with experiments at the top.  However, the authors of the paper I discussed yesterday, the one that appealed to Hill’s “criteria”, also wrote, “Clearly, observational studies cannot establish causation.”  As I have previously explained, no study can prove causation, but any useful study contributes to establishing (or denying) it to some degree.  The glaringly obvious response is that observational studies of smoking and disease – those that were on everyone’s mind when Hill and some of his contemporaries wrote lists of considerations – clearly established causation.  (I love the “Clearly” they started that sentence with, because I know I am clearly guilty of overusing words like that.  But I certainly would like to think, of course, that I obviously only use them when making an indubitably true statement.)

More generally, it is always an error to claim that there is some rigid hierarchy of information, like the claims that a meta-analysis is more informative than its component parts.  As I wrote yesterday, not only are synthetic meta-analyses rather sketchy at best, but this particular one included a rather dubious narrowing of which results were considered.  The best study type to carry out to answer a question depends on what you want to know.  And assessing which already-existing source of information is most informative is more complicated still, since optimality of the study design has to be balanced against how close it comes to the question of interest and the quality of the study apart from its design.

When authors make an oversimplification that is akin to advice we give children, it is a good clue that they do not know they are in over their heads.  That is, I suspect that most people who repeat one of these errors not only do not know it is an error (obviously), but were not even of the mindset, as we all are at some point, of saying “uh oh, I have to say something about this, but it is really beyond my expertise, so I had better look up the right equation/background/whatever and try to be careful not to claim more than I can really learn by looking it up.”  Rather, I suspect they thought they really understood how to engage in scientific inference at a deep level, but they are actually so far from understanding that they do not even know they do not understand.  It is kind of like, “What do you mean, complicated?  Everyone knows how a car works; you just turn this key and it goes.”

These errors are a good clue that the authors thought they understood the rest of their analysis, but might have been just as over their heads there too.  I may not be able to recognize where else they were wrong or naive, either because I am not an expert on the subject matter or simply because they did not explain how they did the analysis, as is usually the case.  But the generic sign that they know only enough to be dangerous is there.  This is why I am engaging in anger management self-therapy about these errors, telling myself “when I read things like that, I should not feel like my head is exploding with frustration yet again; rather, I should thank the authors for generously letting me know that I should not take anything they say too seriously.”

If someone writes about a hierarchy of study designs or Bradford Hill criteria, it probably means they are following a recipe from a bad introductory epidemiology textbook or teacher, perhaps the only one they ever had.  This probably means that the rest of their methods follow a simplistic recipe as well.  That certainly does not mean that they did a bad study; the recipes exist because they are passable ways to do simple studies of simple topics, after all.  But if they are trying to do something more complicated than crank out a field study, like doing an analytic literature review or sorting through a scientific controversy, the recipe-followers are definitely in over their heads.

These errors serve as a shibboleth, or more precisely a shibboleth failure.  Anyone who makes one of those statements is volunteering that he cannot pronounce the tricky test word correctly (i.e., is not really expert in the language of scientific analysis and is just trying to fake it).  We cannot count on everyone to volunteer this signal, of course, and we cannot stop them at the river and quiz them.  This approach is not useful for typical health news reporting, where a reporter basically just transcribes a press release about a research study; such reporters do not even attempt to make such analyses and so cannot make the error.  But researchers and news-analysis authors (and people giving “expert witness” testimony in legal matters) volunteer information about their limited understanding often enough that we can make use of it.  What is more, though a shibboleth is normally thought of as a way to recognize whether someone is “one of us”, it can be used just as effectively to recognize when someone is pretending to have expertise even if you yourself do not have that expertise.  You can train your ear to recognize a few correct pronunciations even if you cannot lose your own accent.

Unhealthful News 57 – Alcohol consumption is good for heart attack but meta-analyses and "causal criteria" are bad for the health news

Most synthetic meta-analyses are parlor tricks, set-pieces that produce a flashy result but signify nothing.  For those not familiar, a synthetic meta-analysis (which is almost always just called “a meta-analysis” though this is misleading because there are more useful types of meta-analysis) combines the results of studies on a topic based on the fiction that they were results from a single big study and reports the results of this fiction.  Occasionally this is useful and appropriate, but usually it is a misleading exercise.  But like any good parlor trick (or grade school level science museum demonstration show), synthetic meta-analyses tend to impress people who do not understand them and, much worse, give the illusion of education when actually they may do more harm than good to real understanding.

A new analysis of alcohol consumption and cardiovascular disease, published as two journal articles (the first is the one relevant to this post) and widely over-hyped in the news (example), is a perfect example.  First, it should be noted that the authors, a group at the University of Calgary, came to a conclusion that we already knew beyond any serious doubt:  Moderate alcohol consumption is protective against heart disease to an impressive degree, and possibly protective against stroke to a minor degree.  As I noted in UN29, this has been well known for two decades.  So averaging together all of the many studies that tend to support the conclusion (and the very few that do not) tells us little we did not know already.

In theory it might help us quantify the effect.  That is, we might have studies that showed a variety of results – reducing risk by 10%, 20%, or 50% – so we were already sure there was a reduction, but the synthesis might let us consolidate them into a single estimate.  But this brings up one of the fatal flaws in the synthesis, the “separated at birth” assumption.  An analysis of this type effectively pretends that the studies being summarized were a single large study whose dataset was chopped up into pieces and analyzed by different research groups, so that the synthetic analysis is just putting it all back together.  Obviously this is fiction, since each study was done on a different population, used different definitions of exposure, outcome, and follow-up time, and analyzed the data differently.  There are a few statistical tricks to deal with some of the differences, but these mostly replace one of the many clearly false assumptions with another likely false assumption.
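To see what the “separated at birth” arithmetic actually amounts to, here is a minimal sketch of the standard fixed-effect, inverse-variance pooling that a synthetic meta-analysis of relative risks performs.  The study results in it are invented for illustration; they are not the Calgary data or results from any real study.

```python
import math

# Invented study results (relative risk and 95% CI for moderate drinking
# vs. non-drinking) -- purely illustrative, not taken from any real study.
studies = [
    {"rr": 0.70, "ci": (0.55, 0.90)},
    {"rr": 0.80, "ci": (0.60, 1.05)},
    {"rr": 0.65, "ci": (0.50, 0.85)},
]

z = 1.96  # normal quantile for a 95% interval
log_rrs, weights = [], []
for s in studies:
    lo, hi = s["ci"]
    se = (math.log(hi) - math.log(lo)) / (2 * z)  # back out the standard error
    log_rrs.append(math.log(s["rr"]))
    weights.append(1 / se ** 2)                   # inverse-variance weight

pooled = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print("pooled RR %.2f (95%% CI %.2f-%.2f)" % (
    math.exp(pooled),
    math.exp(pooled - z * pooled_se),
    math.exp(pooled + z * pooled_se)))
```

The weighted average treats the populations as interchangeable; nothing in the arithmetic knows or cares that the studies differed in who was studied, how exposure was defined, or how the analysis was done, which is exactly the objection.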

Thus, a study like the new one effectively says “let’s average together this result of a five-year study of American men who consumed an average of 1.9 drinks per day with this other twenty-year study of French men and women who consumed an average of 1.2 drinks per day with….”  I suspect that if they explained it in those terms, rather than burying the essence of the methodology in statistical mumbo jumbo, most readers would laugh at the concept.  A useful type of meta-analysis, sometimes called a “comparative meta-analysis” or the less descriptive “analytic meta-analysis”, is one that attempts to learn from the differences between studies, rather than pretending there are none.  E.g., there is a great deal of uncertainty about exactly how much drinking produces the optimal benefit, so a useful analysis would be to see if the existing studies, taken as a whole, could better inform that.  Another useful comparison of results across studies that has been done is the one that put to rest the myth that this is about red wine rather than alcohol itself; in that case the comparison was to see if there was any substantial difference between the protective effects of different sources of alcohol.
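For contrast with the synthetic pooling sketched above, here is an equally minimal sketch of the simplest kind of comparative question: pool two groups of studies separately (say, wine studies versus beer studies) and ask whether the pooled estimates differ by more than chance.  The groupings and numbers are hypothetical, chosen only to show the shape of the comparison, not to represent any real literature.

```python
import math

def pool(results):
    """Fixed-effect inverse-variance pooling of (log RR, standard error) pairs."""
    weights = [1 / se ** 2 for _, se in results]
    est = sum(w * lr for w, (lr, _) in zip(weights, results)) / sum(weights)
    return est, math.sqrt(1 / sum(weights))

# Hypothetical per-study results grouped by beverage type -- not real data.
wine = [(math.log(0.68), 0.12), (math.log(0.75), 0.15)]
beer = [(math.log(0.72), 0.10), (math.log(0.80), 0.14)]

wine_est, wine_se = pool(wine)
beer_est, beer_se = pool(beer)

# If the two groups really share one effect, this z statistic should usually
# land between about -2 and 2; a much larger value suggests a real difference.
z = (wine_est - beer_est) / math.sqrt(wine_se ** 2 + beer_se ** 2)
print("wine RR %.2f vs. beer RR %.2f, z = %.2f" %
      (math.exp(wine_est), math.exp(beer_est), z))
```

The particular statistic matters less than the shape of the question: the comparison treats the differences between studies as the thing to learn from, rather than as noise to be averaged away.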

A synthetic analysis inevitably hides all manner of problems, both with the individual studies and with the synthesis itself.  In this case, in order to make the combined studies somewhat more uniform (that is, to make the “separated at birth” assumption a little bit closer to true) the authors restricted the analysis to study results where the comparison group was lifelong non-drinkers.  But one of the persistent criticisms of the claim that alcohol is protective is that lifetime teetotalers in a free-drinking society are an unusual group that might just have extra disease risk due to being anti-social or following minority religious practices, or might refrain from drinking due to some known health problem.  The hypothesis, then, is that it is not so much that moderate drinking helps, but that being a non-drinker means something is wrong with you.  Most of the time this claim is put forth by nanny-state activists who just do not want to admit the truth about alcohol (usually they omit the phrase “being a non-drinker means something is wrong with you”), but the hypothesis still cannot be dismissed out of hand.  This is why many researchers pursuing this topic have compared the moderate regular drinkers to the more-likely-normal folks who drink occasionally, rather than to those who never drink.

Thus, the entire new study was done in a way that told us what we already knew and avoided even looking at the most important reason why this might be wrong.  And it avoided trying to answer any of the unanswered comparative questions.  It did provide an update to previous systematic reviews of the topic, a minor contribution, but did nothing that created any fundamentally new knowledge.  You would never know any of that from reading the news, of course.  In this particular case there is nothing wrong with the effect of the one day of hype that resulted from pretending we had learned something new, which was to get out a public health message that has long been known to the experts but is not widely understood in the population.  However, praising bad practice because it had a good outcome in a particular case is a very dangerous game.

The biggest problem with the new research reports, to my mind, is the bit that starts:

    we can now examine the argument for causation based on Hill’s criteria

Those familiar with what I (and other experts on the subject) have written about the useful but universally misinterpreted contributions of Austin Bradford Hill will immediately know why I respond to that with:  Really?

Such a statement indicates a fundamental failure to understand how to think scientifically, a problem rather common in health science.  It is not an error that is likely to be quoted in a newspaper, since it is too arcane for that medium, but some pundits did fall for it.  I will write more about this tomorrow, and later in the series, but to start the point:

Over 45 years ago, Hill gave one of the greatest talks in the history of epidemiology with many brilliant lessons.  It also included a list of points (he called them “considerations”; many authors – like the recent ones – incorrectly refer to them as “criteria”) that are worth considering when trying to decide if an observed association between an exposure and disease is causal rather than confounding.  These are also often referred to as “the Bradford Hill criteria”, which represents confusion about his surname.  Such a common-sense list of considerations was a valuable contribution to anyone who had no clue about how to do a scientific analysis to assess whether an association was causal.  It is equally worth considering a separate point, which he did not address in any depth: whether the association in the data might not represent a real association in the world that is either causal or confounding.  I.e., it might have been caused by biases in the data gathering or analysis process.  (If this seems a bit too technical to make any sense to you, do not worry because (a) I will address it again as the series continues (and had an example of some of it here) and (b) it presumably also makes no sense to anyone who would talk about applying “Hill’s criteria”.)

Hill’s list was one of many that have been proposed.  It included some obviously valid points (make sure the ostensible cause occurred before the ostensible effect) and some slightly less obvious ones (consider whether there is a plausible pathway from the cause to the effect; observe whether the greater the dose of exposure, the more likely the outcome; observe similar results in different populations).  It also included some points that are as likely wrong as right (make sure the effect is specific, which does not work for the many exposures that cause a constellation of different diseases).  It only obliquely touches on what is probably the best test for causation when it is in question: think like a scientist.  Figure out what the proposed confounding would be; figure out what observational or experimental data would be different if the relationship were causal rather than the confounding; gather and check the necessary data.

But most important, Hill’s or others’ lists of considerations are not a checklist and cannot be used that way.  The considerations are not definitive: it is easy to list examples where the “criteria are met” but we know the association is confounding and not causal.  Moreover, there are no rules for determining whether one of the considerations “has been met”.  Indeed, in almost every case it is possible to make an observation that could be interpreted as being in line with the consideration (e.g., the Calgary authors argued “the protective association of alcohol has been consistently observed in diverse patient populations and in both women and men”) and also an observation that could be interpreted toward the opposite (e.g., a few paragraphs after the above declaration, the authors wrote, “we observed significant heterogeneity across studies”). 

Another example from the Calgary paper is the declaration that the consideration of specificity was met because there was no protective effect against cancer, only against cardiovascular disease; they could just as easily have said it was not met because the results were observed for both coronary artery disease and stroke, or because moderate alcohol consumption also protects against gallstones.  Obviously the authors just chose a way to look at specificity that supported the conclusions they wanted to draw.  Finally, there is no method for converting one’s list of observations (“the cause definitely preceded the effect; there is good biological plausibility; there is moderate agreement across studies; etc.”) into the conclusion “and that is enough to conclude causation”.  In other words, scientific conclusions require thinking, not recipes.

It has been shown that, in general, when people try to “apply” causal “criteria” they pick the considerations that fit their goal (either claiming or denying that an association is causal) and apply them in idiosyncratic ways.  This is certainly what was done in the present case.  Indeed, it is the only thing that can be done, since there is neither a definitive list nor a systematic way to apply the entries on it.  In other words, “we applied Hill’s criteria to confirm this was causal” is, at best, similar to saying “we looked at this inkblot and confirmed it is a picture of a butterfly”.  At worst, it is simply blatant rhetoric disguised as science.  Indeed, the last few times I had seen someone actually make an assertion about what “Hill’s criteria” show, it was in consulting reports which were written as the most overt kind of advocacy for a particular conclusion about causation.

Of course, the Calgary authors seem to have come to the right conclusion.  There is little doubt that the relationship is causal based on all we know, though there remains the one source of doubt, that teetotaling is caused by or shares a common cause with poor health.  Funny how they did not mention that possibility.

News reporters, not surprisingly, naively reported the quantitative conclusions from the meta-analysis as if the “separated at birth” assumption were correct and the comparison to teetotalers were reasonable.  More realistically, they had absolutely no idea that either of those was an issue they needed to worry about (and the authors apparently either shared this ignorance or were hiding their knowledge) and did not bother to ask anyone who might know better.  Since most readers will only remember the qualitative conclusion, though, the resulting message was basically correct.  Fortunately this was a case where the right answer was hard to miss.

Unhealthful News 56 – Slumping toward feudalism and other economic observations

Today I noticed in the news that here in Pennsylvania, a state above average in providing social services, a program to provide health care for the poorest adults in the state has run out of money.  Meanwhile, half of Americans do not realize that the new Congress has not actually repealed the “Obamacare” law, a lame excuse for a health financing system but currently the only hope of a first step in the right direction (i.e., the direction of healthcare not being a luxury for the rich and/or bankrupting the country).  And, “As Mental Health Cuts Mount, Psychiatric Cases Fill Jails”.

Yes, I know that at the outset of Unhealthful News I said I would not report much on financing, and I do have three good epidemiology posts in mind but unfinished.  Actually, I suppose I am not really seriously posting about financing, since all I am doing is listing depressing news about it with no further analysis. 

It has proven to be a very difficult day to focus on the type of health news I usually analyze.  I am not just talking about Libya, which is horrifying but is the usual story of people trying to extract themselves from feudalism.  There is also the American people letting themselves be drawn back toward feudalism.  It is a feudalism that has a much higher level of health and material well-being than the visions of castles and peasantry that the word usually evokes, obviously, but feudalism just the same.  For those who do not follow U.S. politics, there is currently a showdown about whether we continue to have functional government employee unions, one of the few remaining bulwarks against growing oligarchy (since the U.S., unlike most Western countries, lacks a strong welfare state, labor unions play a particularly critical role in offering a backstop against wage serfdom).

The tragedy is not even so much that the oligarchs are threatening what is left of the middle class, but that what is left of the middle class is helping them do it.  In a story yesterday about my hometown, a professor from my old haunts put it quite depressingly:

Richard Freeman, an economist at Harvard, said he saw the hostility toward unions as a sign of decay in society. Some working-class people see so few possibilities for their lives that it is eroding the aspirational nature that has long been typical of Americans. 

“It shows a hopelessness,” he said. “It used to be, ‘You have something I don’t have; I’ll go to my employer to get it, too. Now I don’t see any chance of getting it. I don’t want to be the lowest one on the totem pole, so I don’t want you to have it either.’ ”

Of course, I know that no one comes to this blog to read yet another bit of random punditry about the economic situation.  So, I will move on to a brief observation about the economics of health-affecting behaviors:

Yesterday, Chris Snowdon, with the contributions of some of his readers in the comments, offered the insight that price hikes, in the form of taxes on cigarettes, are not entirely unlike prohibition in their effects, especially for the poor.  In particular, they increase the demand for smuggled, lower-priced (tax-evading) alternatives.  To take that one step further, economists generally think of prohibition as simply a large price hike, and you will be able to much better understand prohibitions if you think of them that way rather than the way they are typically reported, as if they were some qualitative change that suspends the laws of supply and demand.  Imposing a serious risk of legal penalties for owning or selling a good makes it very expensive, but this is no different from something just being so rare or hard to make that it is naturally expensive.  If demand is great enough then the price will be paid.  Of course, fewer people will buy it at the increased price, but quite often some still will do so.
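To put a rough number on that last point, here is a back-of-the-envelope sketch using a constant-elasticity demand curve.  The elasticity of -0.4 is an assumption chosen purely for illustration, not a measured value for any particular market, and constant elasticity certainly breaks down over a range this wide, but the qualitative point survives.

```python
# Toy constant-elasticity demand: quantity scales as (new_price / old_price) ** elasticity.
# The elasticity value is an assumption for illustration only.
elasticity = -0.4

for price_multiple in (1.5, 2.0, 5.0, 20.0):   # from a tax hike up to prohibition-level prices
    quantity = price_multiple ** elasticity     # fraction of original consumption that remains
    print("price x%-5.1f -> consumption falls to about %2.0f%% of its old level"
          % (price_multiple, quantity * 100))
```

Even a twenty-fold effective price increase leaves a substantial fraction of the original demand in this toy model, which is the point: prohibition does not suspend supply and demand, it just moves buyers and sellers along the same curves.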

Indeed, the equivalence of prohibition and price takes other forms too.  The effectiveness of a prohibition regime is typically measured by the street price of the good.  I.e., effectively enforced prohibition quite directly and literally means a higher purchase price, nothing more or less, and ineffective prohibition enforcement is evident in dropping prices, as has been the case with most street drugs over the last decade.  The fact that those who smuggle black market or divert gray market drugs are in jeopardy of arrest or violence means they charge more for their efforts than they would if the market were legal.  If their risks are low, the premium is less unless they can get monopoly pricing by restricting supply, creating a cartel through violence of their own.  If this sounds remarkably like the economics of any other good, that should not be surprising.

So why are we not always at the mercy of monopolies?  Simple economics tells us that in most cases if someone is making monopoly rents, then competitors will be attracted to the market.  One way to keep this from happening is the threat of violence from the government or organized crime, which incidentally are more similar than most people realize.  No, that is not a joke or some kind of ultra-libertarian slogan.  Governments effectively evolved from organized crime, which is little different from feudalism; it offers huge advantages for those in power, who take most of society’s surplus wealth, but offers just enough advantage for the rest of the population (compared to being at the mercy of invaders, non-organized crime, or even greater exploitation) that they do not rise up against it.

Anyway, the funny thing about cigarettes is that while large companies can make them very efficiently, if the price is raised high enough, then small operations become competitive and face little risk of legal or illegal violence.  This is evidenced by the story of do-it-yourself tobacco growing and cigarette making right in New York City.  It is a remarkably inefficient use of labor (as is smuggling).  But if changes to the efficient American economy that created the middle class in the 20th century force people into such inefficiency, growing crops in one’s back yard becomes the best option for many people.

Feudalism indeed.