This note corrects a pet peeve of mine (and, more importantly, a major fallacy about science), written down so that I can refer back to it when I need to. I think of it as akin to programming a keyboard macro to type your address with a single keystroke.
Different epidemiologic study designs, and their counterparts in other fields, have many advantages and disadvantages. Moreover, specific implementations can be better or worse than average for their type. I am setting all that aside here to make a single point:
Many people fall into the trap of parroting the claim that observational studies (i.e., studies where “exposures” – possible causes of the outcome of interest – are allowed to happen naturally, without intervention) can only show a correlation between the proposed cause and effect, without directly showing causation. This is technically right, but in a very misleading way, because the subtext is that experiments (aka intervention studies; i.e., studies where the ostensible causes are assigned by the researcher) do not also fit this description, when in fact they do. That is, no study design allows us to observe causation, so all quantitative studies are based on “mere” correlation. It is true that some studies make us much more comfortable drawing inferences about the causal relationship of interest, but there is no simplistic recipe for identifying which ones those are.
The key point is that we cannot observe causation. It is actually a bit difficult to even define it, and there are some competing definitions. But the most intuitive and most widely cited is the counterfactual definition, in which exposure E causes an outcome D if and only if D occurs when E occurs but not otherwise (all else remaining unchanged except for other factors that are caused by E and thus might be intermediate steps on the way from E to D). This is phrased in terms of a specific individual E and D; if it is ever true for an individual you can then say for the entire class of E and D (such as for each person in a population), “E sometimes causes D”, which is usually shortened to “E causes D”.
This is called counterfactual because for one precisely defined E (e.g., a condition of a particular person at a particular time) it is only possible to observe a world where E occurs or one where it does not, not both. So we cannot know that D occurred with E but not otherwise; we can only observe one or the other. The one we do not observe is the counterfactual (counter to the actual facts of the world) part. (Note that I am using the variable letters from Exposure and Disease, following standard epidemiology teaching, but the exact same principles apply to any science, and the translation should be obvious for any other social science.)
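The counterfactual definition can be made concrete with a small sketch. This is a toy model rather than anything from epidemiologic practice: the `Person` class and its field names are invented here for illustration. Each person carries both potential outcomes, but an observer only ever gets the one that matches the exposure that actually happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Toy individual holding both potential outcomes (hypothetical names)."""
    d_if_exposed: bool     # would D occur in the world where E occurs?
    d_if_unexposed: bool   # would D occur in the world where E does not occur?
    exposed: bool          # which of the two worlds actually happened

    def e_causes_d(self) -> bool:
        # Counterfactual definition: D occurs with E but not otherwise.
        return self.d_if_exposed and not self.d_if_unexposed

    def observe(self) -> tuple:
        # All a study can record: the actual exposure and the actual outcome.
        outcome = self.d_if_exposed if self.exposed else self.d_if_unexposed
        return (self.exposed, outcome)


# Two exposed people who both get the disease.
caused = Person(d_if_exposed=True, d_if_unexposed=False, exposed=True)
doomed = Person(d_if_exposed=True, d_if_unexposed=True, exposed=True)

print(caused.observe(), doomed.observe())        # what a study would record
print(caused.e_causes_d(), doomed.e_causes_d())  # what only the toy model knows
```

The two people produce the same observation, yet E caused D for only one of them. The information that distinguishes them lives entirely in the field that no study can record.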
To try to deal with the counterfactual conundrum, we try to observe something as similar as possible to the counterfactual states of E that did not occur. Typically this takes the form of observing similar people who have a different value for E, but can also be the same person at different times. The more similar the comparison observations, the more confident we can be that we are observing the manifestation of causation, all else equal, though this is never the same as observing causation itself. Some substitute observations are quite compelling: Someone suffering a head injury at about the same time he is in a car crash makes a pretty compelling case for causation; he can be compared to the countless minutes when someone does not suffer a head injury and is not in a car crash. It is still possible that it was mere coincidence or that the injury caused the crash or something else, but those seem incredibly unlikely. Some aspects of that “incredibly unlikely” can be quantified, but not all; some remain the art of scientific inference and defy recipes.
When we study something where our “subjects” are effectively exactly identical, we can do even better than the car crash example. If we impose one exposure on a million molecules, watching to see if an outcome occurs, and compare them to another million of the same molecules that are not exposed, it is almost as good as actually seeing the counterfactual version of the population. That is, we think of the second million molecules, a day later, as being an effectively perfect substitute for the first million in the counterfactual state of not having the exposure. But this is still a highly confident educated conclusion about sameness, and not the same as observing the mysterious construct that is “causation”. It is a case of observing correlation between the exposure and outcome and being quite confident of the explanation.
Trying to mimic this situation in epidemiology, we often turn to the randomized clinical trial (aka randomized controlled trial, RCT), a term adopted because some people want to obscure the proper descriptor: medical experimentation on people. In such studies, exposures are assigned to the experimental subjects, as they were with the molecules. The simplistic and incorrect interpretation of why we do such experiments is that this allows us to observe causation, whereas letting exposures occur naturally (an observational study) merely shows correlation. But as noted above, both show correlation and neither allows the direct observation of causation.
So what is the real advantage of RCT experiments?
In a purely observational study we have the problem that is called confounding (better labeled *systematic* confounding), which means that there are factors that are different (on average) between the people who have exposure E and those who do not, *other than the exposure itself or anything it causes*, and that these factors cause different outcomes with respect to D (e.g., they make D more likely). These differences might be mistakenly attributed to E. For example, smokers have worse health outcomes than nonsmokers for many diseases for reasons other than smoking; that is, health outcomes would differ between smokers and nonsmokers even if none of the “smokers” actually smoked. Since this difference holds for the entire population, the problem is not just a matter of sampling variation in the study (the type of error that is quantified by confidence intervals); it would still exist if you were able to include everyone in your study. Similarly, it is not caused by measurement error or any other goof. It is a real difference between the people who happen to have E and those who do not.
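A small simulation can illustrate why systematic confounding is not a sampling problem. The sketch below is a toy model under loud assumptions: the hidden trait, the function names, and every probability are made up for illustration. E has no effect on D at all, which mirrors the case where none of the “smokers” actually smoked.

```python
import random


def observational_population(n, seed=1):
    """Return (exposed, diseased) pairs in which E has no effect on D.

    A hidden trait makes people both more likely to have E and more likely
    to get D. All probabilities are made-up illustrative values.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        trait = rng.random() < 0.5
        exposed = rng.random() < (0.8 if trait else 0.2)   # trait drives E
        diseased = rng.random() < (0.3 if trait else 0.1)  # trait drives D; E is ignored
        people.append((exposed, diseased))
    return people


def risk_difference(people):
    """Risk of D among the exposed minus risk of D among the unexposed."""
    exposed = [d for e, d in people if e]
    unexposed = [d for e, d in people if not e]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


# Treat the simulated list as the entire population, not a sample from it.
population = observational_population(100_000)
print(round(risk_difference(population), 3))
```

Under these assumed probabilities the exposed group shows a clearly higher risk of D even though everyone is counted and E does nothing. Enlarging `n` narrows the random scatter around that gap but does not shrink the gap itself.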
[Note that confounding is often confused with the badly named “confounders”, which are probably best called “unconfounders”. These are characteristics that are measured (or could be) and thus are variables that can be “controlled for” to reduce the confounding. Confounding exists if the exposed and unexposed populations are different, in terms of propensity for the outcome apart from what the exposure causes, regardless of the identification of confounders. Confounding does not require or promise that we can identify a confounder (i.e., unconfounder) variable.]
So, in an experimental study we break the connection between E and any other factor by assigning E randomly. On average, then, confounding is eliminated (if everything is done according to the theoretically perfect rules, which is seldom quite the case). There actually still is confounding – by bad luck we might assign more people who just have a propensity toward D to one exposure group – but this confounding is no longer systematic. It is random and can therefore be captured using random error statistics (confidence intervals, p-values, etc.). Eliminating systematic confounding as an explanation for a correlation in your data is a great advantage, but it is not the same as going from not seeing causation to seeing it. You have merely ruled out one of the many possible explanations that compete with causation to explain the correlation in the data. Of course, this might be enough to make you extremely confident that you are seeing the manifestation of real causation, but you still never see the causation. Even so, pay attention to the phrasing there: “make you”. The conclusion about causation is still in the mind of the observer, not in the data.
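The difference between systematic and random confounding can be sketched with a toy simulation. As before, this is an illustration under stated assumptions, not a model of any real trial: the hidden trait, the function name, and all probabilities are invented. A hidden trait raises the risk of D, E has no effect, and E is assigned by coin flip so that it cannot track the trait.

```python
import random


def randomized_trial(n, rng):
    """Risk difference in one trial where E is assigned by coin flip.

    A hidden trait raises the risk of D and E itself does nothing.
    All probabilities are made-up illustrative values.
    """
    counts = {True: [0, 0], False: [0, 0]}  # exposed -> [cases, people]
    for _ in range(n):
        trait = rng.random() < 0.5
        exposed = rng.random() < 0.5                       # ignores the trait
        diseased = rng.random() < (0.3 if trait else 0.1)  # E has no effect
        counts[exposed][0] += diseased
        counts[exposed][1] += 1
    (c1, n1), (c0, n0) = counts[True], counts[False]
    if n1 == 0 or n0 == 0:
        raise ValueError("one arm is empty; use a larger n")
    return c1 / n1 - c0 / n0


rng = random.Random(2)
big_trial = randomized_trial(100_000, rng)
small_trials = [randomized_trial(200, rng) for _ in range(500)]

print(round(big_trial, 3))                                       # one very large trial
print(round(sum(small_trials) / len(small_trials), 3))           # average over small trials
print(round(min(small_trials), 3), round(max(small_trials), 3))  # range of single small trials
```

In this sketch the large trial and the average across small trials both sit near zero, while individual small trials stray in either direction by bad luck alone. That scatter is the random confounding that confidence intervals and p-values are designed to describe.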
With this in mind, it becomes possible to realistically discuss how some studies might have advantages over others in terms of causal inference. An experiment on people can eliminate systematic confounding as an explanation for observed correlations. On the other hand, it might not offer a very good measure of the phenomenon of interest. So there are tradeoffs, there is art rather than blind, naïve following of recipes, and there are no bright lines. In future analyses, I will present such points in context, referring back to this note as the “you never observe causation, only correlations with different degrees of support for causal inference” macro.