Since the health news remains dominated by radiation risks from Japanese reactors, which I will probably write about again but am not yet ready to, today’s post is methodologic background. This is long and somewhat technical, but my goal is to explain it at a level that my regular readers can make sense of. On Sunday I alluded to occasions when controlling for confounding is not expected to make a study estimate better, and may even clearly make it worse. I have written about confounding here a few times, but thought I could summarize so I can refer to it (including in my post planned for tomorrow). Yes it is long, but it takes good epidemiology students weeks to learn this material, so I just could not condense it any further.
In newspaper stories about most epidemiologic studies, there are words like “the study controlled for the effects of age, race, smoking, and other variables”. I believe that most readers have a bit too much faith in what this means. I know that most people doing the studies have a bit too much faith. Basically this means that when analyzing the association between the exposure and disease of interest in the data, the analysts included some measure that they had of a few other exposure variables in an attempt to separate the effects of each exposure and thereby isolate the effect of the one of interest. (Note that “exposure” can include personal characteristics like age and sex, which is not the way the word is used in common language.)
Why do we do this? It is possible – indeed, almost inevitable – in an observational study that there will be characteristics that are different between the exposed (they have the exposure of primary interest for the particular analysis) and unexposed populations, apart from having the exposure itself. (To keep that sentence simpler, because it is quite important to understand, I left out a more complicated ending: “…apart from having the exposure itself or something caused by the exposure, or caused by the disease outcome that is affected by the exposure, or a few other more complicated relationships.” I hope that at least the first of these complications is clarified in what follows.) If such differences between the two groups – the differences that are not the exposure of interest itself (or something it causes etc.) – affect the disease outcome, then we have a problem because this will alter the estimated effect of the main exposure. If there is another characteristic that happens to be more common among exposed people, and it causes an increase in disease, then if we are not careful we might blame that increase on the exposure.
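To make that mechanism concrete, here is a small worked calculation (the numbers are invented for illustration, not from any real study): the exposure has no effect at all, but because it is more common among people with a risk-raising characteristic C, a naive comparison blames the exposure anyway.

```python
# Made-up numbers: C (say, poor living conditions) raises both the chance
# of being exposed and the disease risk; the exposure itself does nothing.
p_c = 0.5                   # P(C = 1) in the population
p_exp = {1: 0.8, 0: 0.2}    # P(exposed | C)
risk = {1: 0.3, 0: 0.1}     # P(disease | C) -- note: no exposure term at all

def group_risk(exposed):
    # each group is a mix of C = 1 and C = 0 people, in different proportions
    w1 = p_c * (p_exp[1] if exposed else 1 - p_exp[1])
    w0 = (1 - p_c) * (p_exp[0] if exposed else 1 - p_exp[0])
    return (w1 * risk[1] + w0 * risk[0]) / (w1 + w0)

crude_rr = group_risk(True) / group_risk(False)
print(round(crude_rr, 2))   # -> 1.86, though the true effect is exactly 1.0
```

The entire apparent relative risk of 1.86 comes from the exposed group containing more C = 1 people, which is exactly what “a difference between the groups other than the exposure itself” means.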
An important point to understand, which if you understand it will put you ahead of most people writing epidemiology, is that confounding is different from the confounders you hear about. Confounding exists, basically, if there is a difference in the probability of the outcome between the exposed and unexposed groups that is not caused by the exposure (notice the lack of mention of any other variables or pieces of data – you do not have to know what is different between the groups, let alone have a measure of it). Put another way, if everyone in the exposed group had been forced to be unexposed but otherwise left unchanged, they still would have had different outcomes from the group that actually was unexposed. For example, smokers have poorer health outcomes than nonsmokers, even apart from the effects of smoking, which means that most studies will blame smoking for not just its own effects, but some of the other differences between the groups.
A confounder is a variable that you think should reduce the confounding if you can “control for” its association with the outcome; this may mean that that characteristic is more common in one group (exposed or unexposed) and you think it caused the confounding (that is the easiest way to think about it), but it might just be associated with whatever actually caused the confounding. Thus, it has been pointed out that a better label for these variables is deconfounders or unconfounders, since we are hoping they will reduce confounding, regardless of whether they caused it. A good example is the ubiquitous “race/ethnicity” variable; race causes very few outcomes but it is sometimes a somewhat useful proxy for social class, living conditions, or other variables that might actually be causing confounding in a study.
Unfortunately, because of measurement error, random error, and some variables being poor proxies for what we would really like to measure (e.g., race is correlated with social class or wealth, but obviously far from perfectly), controlling for a confounder will almost never completely eliminate the confounding. This is why I said that studies of smoking will generally blame smoking for some of the confounding rather than just its own effects, without regard to whether we are actually “controlling for” other variables. A study should attempt to control for predicted confounders, but it will not be able to do it perfectly, and often will do so quite badly.
Example: Smokers tend to have other unhealthy behaviors, be less wealthy, and have psychological difficulties (which often are why they smoke). A study that includes variables for race, whether someone is employed, and their history of disease diagnoses will control for some of that. But it is obviously not nearly a perfect measure of the factors we are really worried about, and thus cannot control for all of the effects of those factors. Thus there will be what is known as “residual confounding”.
Residual confounding is often mentioned as if it were some odd problem that sometimes happens, but it basically always exists if there was confounding. For example, attempts to control for someone’s smoking when studying something else are generally imperfect. Imagine you are studying an exposure that you know is more likely the more someone smokes, so you try to control for the effects of smoking in a typical way, by categorizing people into: nonsmoker, smokes 1-10 per day, smokes 11-20 per day, smokes >20 per day. That helps, but there is still an association within each category: someone in the 11-20 category who has the exposure is more likely to be closer to 20 than to 11, so there is still some confounding even within the category.
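The same point can be shown with exact arithmetic rather than a real dataset (the dose-response numbers below are invented): exposure probability and disease risk both rise with cigarettes per day, the exposure itself does nothing, and “controlling” with the 11-20/day category shrinks the spurious risk ratio a lot, but not all the way to 1.

```python
# Invented dose-response numbers: P(exposed) and P(disease) both rise
# with cigarettes per day; the exposure itself has no effect on disease.
def rr(cigs_range):
    p = {c: 0.05 + 0.02 * c for c in cigs_range}   # P(exposed | cigs/day)
    r = {c: 0.01 * c for c in cigs_range}          # P(disease | cigs/day)
    risk_exposed = sum(p[c] * r[c] for c in cigs_range) / sum(p.values())
    risk_unexposed = (sum((1 - p[c]) * r[c] for c in cigs_range)
                      / sum(1 - p[c] for c in cigs_range))
    return risk_exposed / risk_unexposed

crude_rr = rr(range(1, 41))   # no control for smoking at all
band_rr = rr(range(11, 21))   # within the "11-20 per day" category
print(round(crude_rr, 2), round(band_rr, 3))  # -> 1.69 1.047
```

The category removes most of the confounding (1.69 down to about 1.05), but within the band the exposed still smoke a bit more than the unexposed, so a little of the spurious association survives.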
The simple bottom line is that if a study controls for confounding and finds that it matters, it is probably the case that the actual confounding is a bit worse than the estimated (and thus controlled for) confounding, and so some of it is still in the main effect estimate. This means that if, say, a study found an estimated relative risk of 2.5 without controlling for confounders, but adjusted to an estimated 1.7 when controlling for confounders, then (if we know nothing else other than what appears in this sentence) we should believe that the true value is something less than 1.7, because the control for confounding was likely an under-adjustment. Remarkably few people realize this.
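Here is a toy calculation of that under-adjustment (my numbers, chosen only for illustration): the true relative risk is exactly 1.0, the crude estimate is confounded upward, and adjusting for a proxy that matches the real confounder only 80% of the time removes just part of the gap.

```python
# Invented numbers: true RR = 1.0. C confounds; X is a noisy proxy for C
# (matches it 80% of the time), like race standing in for social class.
risk_c = {1: 0.3, 0: 0.1}   # P(disease | C); the exposure plays no role
p_e_c = {1: 0.8, 0: 0.2}    # P(exposed | C)

# joint probability of each (C, X, exposure) cell; C is 50/50
cells = {}
for c in (0, 1):
    for x in (0, 1):
        p_cx = 0.5 * (0.8 if x == c else 0.2)
        for e in (0, 1):
            cells[(c, x, e)] = p_cx * (p_e_c[c] if e else 1 - p_e_c[c])

def risk(keep):
    sel = {k: w for k, w in cells.items() if keep(k)}
    return sum(w * risk_c[k[0]] for k, w in sel.items()) / sum(sel.values())

crude_rr = risk(lambda k: k[2] == 1) / risk(lambda k: k[2] == 0)
# adjust for the proxy X: size-weighted average of the two stratum RRs
adj_rr = sum(
    0.5 * risk(lambda k, x=x: k[1] == x and k[2] == 1)
        / risk(lambda k, x=x: k[1] == x and k[2] == 0)
    for x in (0, 1)
)
print(round(crude_rr, 2), round(adj_rr, 2))  # -> 1.86 1.62
```

The adjustment moves the estimate from 1.86 to 1.62, but the truth is 1.0, so if all you saw were the two reported numbers, the right reaction would be to believe something below the adjusted figure.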
The other side of this is the set of cases where a variable should not be included in the calculation as an un-confounder because it does not un-confound. Not every variable that happens to exist in a dataset will tend to reduce confounding if it is included. To over-simplify a bit, but to cover most cases in an understandable fashion, you basically want to control for variables — and only those variables — that are associated with the exposure of interest and that have an effect on the disease (or are associated with some unmeasured variable that has such an effect), but not if they are caused by the exposure.
To take an obvious example, the last digit of someone’s phone number might be in the dataset, but obviously has no effect on anything. Yet if you include it in the statistical calculation it will have some effect on the result. Usually that result will be small, and in a perfect world (in particular, if we have a very large number of observations) it will almost certainly be negligible. But in most studies the number of observations is small enough that some variables like phone number digit – basically a series of random numbers – will affect the effect estimate.
If researchers are honest, the inclusion of a pointless variable which happens to alter the effect estimate would just be a matter of bad luck, and the probability of such bad luck is captured in the “we might have had bad luck” measure (i.e., the confidence interval). But sometimes dishonest researchers, or the not-much-different researchers who simply do not understand what they are doing, try all their variables to see which ones produce a “better” answer (typically researchers are hoping to find a higher risk estimate, though not always). They then choose to include any variables that move the answer in the “better” direction, creating bias that is not captured in the error statistics. That is, the answer will probably be in the “better” direction from the true value, but the confidence interval will not “know” this has been done; it will be calculated on the assumption that the researcher did not play these games, so it will be wrong too. This is an example of what I have labeled publication bias in situ. To return to the artificial example, if you were one of these dishonest researchers, you might notice that including the variable “is the last digit of the subject’s phone number a 1, yes or no?” has no effect, but you can keep trying and perhaps discover that adjusting for having last digit 7 or 9 has an effect in the “right” direction. So you put in the variable “is last digit 7 or greater” and get a result you like better.
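A quick simulation (a hypothetical setup with invented parameters, not any real study design) shows how this fishing works: the exposure truly does nothing, but trying 50 meaningless random covariates and keeping whichever pushes the adjusted estimate highest manufactures an inflated result.

```python
import random

random.seed(1)
N = 20_000
E = [random.random() < 0.5 for _ in range(N)]   # exposure: a pure coin flip
Y = [random.random() < 0.2 for _ in range(N)]   # disease: 20% risk regardless of E

def risk(people):
    people = list(people)
    return sum(Y[i] for i in people) / len(people)

crude_rr = risk(i for i in range(N) if E[i]) / risk(i for i in range(N) if not E[i])

best_rr = 0.0
for _ in range(50):                              # 50 junk "phone digit" covariates
    Z = [random.random() < 0.5 for _ in range(N)]
    adj = 0.0
    for z in (0, 1):
        stratum = [i for i in range(N) if Z[i] == z]
        rr_z = risk(i for i in stratum if E[i]) / risk(i for i in stratum if not E[i])
        adj += rr_z * len(stratum) / N           # size-weighted average of stratum RRs
    best_rr = max(best_rr, adj)                  # keep the most "favorable" model

print(round(crude_rr, 2), round(best_rr, 2))
```

The crude estimate sits near the true value of 1.0; the cherry-picked “adjusted” one will generally sit above it, and nothing in the reported confidence interval would reveal the shopping trip that produced it.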
You would probably not get away with that one, but if it was something only slightly less silly you probably would get away with it. You would not tell readers that you tried a lot of different models and picked the one you liked, and the reviewers would never know it either (they only see the same paper that readers eventually see). Readers and reviewers will be under the impression that you had a theory that the model you used best represented the real world, and that is why you chose it. This is a huge problem in epidemiologic publishing, and one of the many ways in which peer review fails. When I review a paper where this fishing around for a model seems likely, I always insist that the authors report a list of the different models they tried before settling on the one they reported; it is a very rare editor that bothers to pass on this request since it does not fit their goal of quickly deciding to reject or accept the paper (and as far as I can recall, the authors have never complied with the request, being embarrassed about the answer or simply knowing that if they are facing a demanding reviewer at one journal it is easier to just move onto another journal).
It should be evident that including variables that have no plausible effect on the outcome (or that are not plausibly related to the exposure status in certain ways – I left out that bit to keep it simpler) in your model: (a) is not going to correct for confounding; indeed, if it changes the estimate, the change is just as likely to move the result further from the truth as closer to it; and (b) can be used to intentionally bias a result. Thus it should be clear that such variables should never be included in the statistical analysis.
It is even worse to include certain other variables. The most obvious is something that is intermediate in the causal pathway between exposure and disease. Using the example of alcohol protecting against heart attack, which I have addressed in some previous posts, it appears that a benefit of alcohol is improving blood lipids (increasing good cholesterol and such), which reduces heart attack risk. But some studies – notably the ones that estimated there was little benefit – “controlled for” the effect of blood lipids. I remember realizing this was a problem when this literature was first coming out. I doubt that the authors were intentionally biasing their results, though it is possible. (I was in graduate school when I figured that out, without ever having been taught the theory. That moment of revelation may be responsible for my ongoing interest in the theory of controlling for confounding and the fact that it is usually done incorrectly and even dishonestly.) In the alcohol case, I think rather than intentionally trying to make the good effect of alcohol go away, it was just a case of the bad education in research methods that medical researchers usually have; they thought: “blood lipids affect heart attack risk and so we are supposed to control for it”. But as you now know, that is not the rule. (If someone did the same thing today, however, I would suspect that they were intentionally trying to bias their result to support a temperance agenda. Times have changed for the worse.)
A similar problem comes from “controlling for” something that is caused by the exposure. So if you are studying smoking and lung cancer, but control for lung function (which is affected by smoking), it will diminish the measured effect. In a case like that, you have controlled-away some of the actual effect by, basically, including two measures of the exposure (smoking, and smoking-caused lung problems) that then split the estimated effect between them. Unfortunately, things start to get a lot more complicated at this point. That is, matters change if that other factor not only causes the disease and is affected by the exposure of interest, but also is affected by some other characteristic that is different between the exposed and unexposed groups. Then it really gets annoying, and simple rules for what to include start yielding to more complicated rules and methods.
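A toy version of the lung function example (with made-up probabilities, and assuming for simplicity that the entire effect of smoking runs through the intermediate) shows how adjusting for an intermediate erases a real effect rather than removing confounding:

```python
# Made-up probabilities; all of smoking's effect flows through lung damage.
p_damage = {"smoker": 0.6, "nonsmoker": 0.1}   # P(lung damage | smoking status)
p_cancer = {True: 0.3, False: 0.05}            # P(cancer | lung damage)

def risk(group):
    pd = p_damage[group]
    return pd * p_cancer[True] + (1 - pd) * p_cancer[False]

crude_rr = risk("smoker") / risk("nonsmoker")  # the real causal effect: ~2.67
# Within either stratum of lung damage, smokers and nonsmokers have the
# same cancer risk, so the "lung-damage-adjusted" RR is exactly 1.0.
adjusted_rr = p_cancer[True] / p_cancer[True]
print(round(crude_rr, 2), adjusted_rr)  # -> 2.67 1.0
```

Here the crude estimate is the correct one, and the “adjusted” analysis reports no effect at all, because holding the intermediate fixed is precisely holding the mechanism fixed.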
But if most people writing in the field would merely get the over-simplified rules right it would be a great improvement. The currently popular method for figuring out what to control for involves drawing a little box-and-arrow diagram, mapping the various factors that cause other factors, and following some rules about which ones are confounders and should be controlled for (it is conceptually simple, that is – filling in the scientific reasoning can be quite a challenge; if it were not, everyone would do it correctly, after all). Students with good professors are being taught that. Unfortunately, what most people writing in the field think, and apparently what students in bad epidemiology programs (which is to say, most epidemiology programs) are still being taught today, is that something should be included as a “confounder” if including it substantially changes the effect estimate. (Again, that is the honest-but-clueless version; dishonest researchers can do worse by choosing to include those variables that change the result for the “better”.) Actually, that is not quite fair: I infer from the textbooks and typical practice that they are taught the simplified version of the right rules (the one that I presented here) to recite on a test, but then are taught that when they are actually working with the numbers they should just include anything that has a big effect, anything up to, and possibly including, having a 7 in one’s phone number.