Monthly Archives: October 2010

Another foray into transport safety (car seats) and misleading statistics

This is apropos of nothing that I really work on, but that’s just me.  If I were not inclined to wander off and spend a few hours on curiosity projects like this, I am sure I would not do nearly as good a job on my real work.

Numerous times over the last week, I have run into the claim that three out of four automobile child restraint systems (car safety seats) are installed incorrectly.  If you run a web search on some combination of those words you can see how many times it appears.  It seems to be so accepted as a fact that it shows up in news feature headlines, sometimes even when the statistic does not appear in the story itself.  Sometimes the statistic is just asserted as fact or attributed to “surveys” (which is pretty funny; how exactly do you design a survey question that effectively measures whether someone is accidentally doing something incorrectly and does not realize it?).  I suspect this is enough for most readers to accept the assertion without hesitation.  Slightly more skeptical readers presumably are convinced when the statistic is attributed to the U.S. National Highway Traffic Safety Administration (NHTSA).

Anyone who stops to think about it, however, has to ask the same question they should ask about any number of other claims by authority figures: “How can they possibly know that?”

My guess was that it was, at best, based on a single study which probably did not really show what is being claimed, and I decided to check that hypothesis.  It turned out to be rather more difficult to trace than I expected.  NHTSA does indeed make the statement quite prominently on their web page, but without linking the claim directly to any scientific evidence.  I suspect this will not come as a shock to most readers of this blog, who are familiar with U.S. government agencies like the CDC and NIH putting out empirical claims (often misleading and sometimes out-and-out false) in the form of catchy illustrated propaganda text that would be the envy of any snake oil salesman.

After looking through several NHTSA pages and reports, it became fairly clear that it must trace back to one particular report: DOT HS 809 671, January 2004, “Misuse of Child Restraints”.  I was forced to conclude “fairly clear” because there was never once a clear reference from the 3/4 statistic to that report (it just seemed to show up as a reference when the statistic was claimed) and all the links to the report were 404 — the report had mysteriously disappeared from the NHTSA website.  NHTSA seems to base their knowledge about the topic on reports by one of their researchers (Lawrence E. Decina) over the last 15 years or so, including that missing report.  Fortunately I found a third-party archived copy from when the URL was live (many years ago), and confirmed that that study seemed to be the source of the statistic.

So what did the study actually show?  I have to admit I was a bit surprised that it seemed to be based on a fairly unbiased sample; the study population was probably a bit lower SES than the American average, but it was not, as I had expected, based on people who were self-selected as doubting their installation (i.e., it was not based on people who showed up to have their installation inspected by experts).  Instead, drivers were recruited as they entered various parking lots in 2002.  This contrasts with the bias in a strangely similar finding reported recently here in Pennsylvania, which was based on more recent data from people who actively sought help from the police to check their installations (a bizarre coincidence that they also found “three out of four” given that they were measuring something quite different).

However, there was still a major problem in terms of how the results were reported.  The NHTSA study’s definition of a faulty installation was rather liberal.  The vast majority (almost all) of the recorded mis-installations consisted of not getting the straps (those holding the kid in the seat or those anchoring the seat to the car) tight enough.  Presumably some of these problems were so bad that the system would not have held in the event of a crash.  But almost certainly most of them were still well within the range of functionality, just not up to the recommended standard (e.g., you can fit no more than one finger between the baby and the strap), which presumably includes a rather substantial margin for error (I doubt even an infant would slip out of the system through two fingers’ worth of slack).

In spite of this, NHTSA and others are clearly trying to portray the statistic as if it represented widespread consequential failure that needs aggressive attention.  They do this even as they also report on the wonderful improvements in instructions and ease of use for this equipment, and their educational successes, and thus the huge reduction in risk achieved.  Not surprisingly, this resembles the tactics of the anti-tobacco activists in government and elsewhere who want to claim both (a) things are still terrible, so you need to give us more money and (b) we have done a lot to improve things, so we are worthy of more money.  Their method for resolving the contradictions in these claims is to shamelessly make both of them and hope no one will notice.  (The contrast with the anti-tobacco activists, however, seems to be that NHTSA is right when they claim to have contributed to substantial progress over the last decade.)

NHTSA, the press, and others quoting the statistic also fall into the bizarre pattern, which I have written about before, of treating a quantitative social phenomenon as if it is somehow a physical science constant.  Even if the endlessly repeated figure were an accurate portrayal of the situation in 2002, things have inevitably changed (pretty clearly for the better) in the ensuing decade.  It would not be much different if they reported alarmist statistics about how few people wear seatbelts based on data from 1970.

However, I do want to give them props for not falling into the absurdity of reporting the 72.6% from the original study, implying that they have that level of precision.  NHTSA may be a bit alarmist, but at least they understand the concept of rounding and precision.

Note that in fairness to the actual NHTSA scientists, Decina et al., there is nothing in their line of research that suggests the authors are trying to create propaganda.  Perhaps it would have been useful if the 2002 research protocol had called for separating minor problems from those that made catastrophic failure likely, but that was a limitation of the study.  Readers more familiar with the anti-tobacco (anti-soda, etc. etc.) “research” in which the authors are clearly aggressively trying to write propaganda will notice the contrast.

An additional point on the topic:  If the statistic were really an accurate picture of frequent current important failure, it would represent a remarkable process failure or equipment design failure.  That is, if 3/4 of parents really used this equipment in a way that was destined to fail if needed then the equipment design was terrible and/or some kind of professional intervention was needed (e.g., since it is accepted that the equipment be required by law then formal instruction or sign-off by safety personnel or licensed installers should be required too, since the requirement would be 3/4 moot otherwise).  And yet it gets blamed on operator error and it is considered acceptable to just lecture the operators (parents) for installing the seats incorrectly.

But in this case or any other, if 3/4 of operators are doing the wrong thing, then it is not really their fault.  It is a design flaw.  Have you ever noticed those situations where, in some place where the public/customers interact with an installed system, there is a sign scrawled emphatically telling people what to do, such as “insert card HERE!!!!!” or “DO’NT Use THIS DooR”, directing them away from the obvious choice that everyone seems to try because it looks right?  This is usually accompanied by some nearby clerk who expresses exasperation about how *everybody* is so clueless that they cannot figure it out.  Somehow it does not seem to cross anyone’s mind that the hardware/layout or process is what needs fixing, not the skills and intuitions of the majority of the population.

[Note:  operator error is often referred to, rather strangely, as “human error”, as in “the crash was caused by human error”.  This seems to imply that the hardware and non-proximate decisions (systems) were designed by chimpanzees or maybe cows, which I will grant sometimes does seem to be the case.  Also, credit for this taxonomy of causes of accidents goes to sociologist Charles Perrow, who also noted that there are very few accidents that do not have multiple component causes (epidemiology talk, not his language) such that hardware/design, systems, and operator decisions all contributed.]

Vargas Llosa on smoking, quitting, and living

In honor of Mario Vargas Llosa’s Nobel Prize, I thought I would recall what I (and perhaps only I) think is the best thing he ever wrote.  It was an op-ed from the New York Times from 1 Sept 2000, entitled “A Languid Sort of Suicide”.

He begins by saying “since I stopped smoking 30 years ago, I have detested cigarettes and their manufacturers” and later describes smoking as an “unmitigated cataclysm”.  But having demonstrated self-awareness about his emotional attitude toward smoking, cigarettes, and manufacturers, he (unlike the obsessed anti-tobacco extremists who have cultivated such hatreds until nothing else seems to matter to them) goes on to make clear that this does not justify twisting all of society around his pique.  In particular, he objects to liability awards against cigarette companies.

He tells the story of starting to smoke, having it become a major part of his life, and then quitting and persuading others to quit, becoming a best-case scenario for ceasing and being delighted by the choice.  Yet he waxes quite eloquent about the appeal of smoking as a lifestyle, a reminder to those who pursue anti-tobacco extremism (who, of course, will pay no attention) and those of us who advocate THR (who try to listen).

The last three paragraphs make the column a timeless classic.  I reproduce them below, pushing the barriers of fair use, perhaps, but the column is difficult to find now:

“The obligation of the state, in a democratic society, is to make citizens aware that tobacco is harmful, so that they can decide with adequate knowledge whether to smoke.  This, indeed, is what is happening in most Western countries.  If a person in the United States, France, Spain or Italy smokes, it is not out of ignorance of what this means for health, but because he does not wish to know, or does not care.

“To commit suicide by degrees is a choice that ought to figure on the list of basic human rights.  This is the only possible approach if we wish to preserve the freedom of the individual, which must include the freedom to opt not only for what is beneficial to him, but also for what harms or injures.

“And so, though at first sight, the decision of juries to impose astronomical penalties on the tobacco companies may seem a progressive measure, it is not so.  What sort of freedom would it be that allowed us only to choose what is good for us?”

“…only shows correlation, not causation…”

This is a correction of a pet peeve of mine (and, more importantly, a major fallacy about science), written down so I can refer back to it when I need to. I am thinking of it as akin to programming a keyboard macro to type your address with a single keystroke.

There are a lot of advantages and disadvantages to the different epidemiologic study designs, and to similar designs in other fields. Moreover, specific implementations can also be worse or better than average. I am setting all that aside here to make a single point:

Many people fall into the trap of parroting the claim that observational studies (i.e., studies where “exposures” – possible causes of the outcome of interest – are allowed to happen naturally, without intervention) can only show a correlation between the proposed cause and effect, without directly showing causation. This is quite right, but in a very misleading way, because the subtext is that experiments (aka intervention studies; i.e., where the ostensible causes are assigned by the researcher) do not also fit this description, when in fact they do. That is, no study design allows us to observe causation, so all quantitative studies are based on “mere” correlation. It is true that some studies make us a lot more comfortable about inferring the causal relationship of interest, but there is no simplistic recipe for identifying which ones those are.

The key point is that we cannot observe causation. It is actually a bit difficult to even define it, and there are some competing definitions. But the most intuitive and most widely cited is the counterfactual definition, in which exposure E causes an outcome D if and only if D occurs when E occurs but not otherwise (all else remaining unchanged except for other factors that are caused by E and thus might be intermediate steps on the way from E to D). This is phrased in terms of a specific individual E and D; if it is ever true for an individual you can then say for the entire class of E and D (such as for each person in a population), “E sometimes causes D”, which is usually shortened to “E causes D”.

This is called counterfactual because for one precisely defined E (e.g., a condition of a particular person at a particular time) it is only possible to observe a world where E occurs or one where it does not, not both. So we cannot know that D occurred with E but not otherwise, only one or the other. The other is the counterfactual (counter to the actual facts of the world) part. (Note that I am using the variable letters from Exposure and Disease, following standard epidemiology teaching but the exact same principles apply to any science and should be obvious for any other social science.)
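The counterfactual definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` record, its field names, and the two example people are hypothetical, invented here to make the point concrete. The record holds both "worlds", which is exactly what no real study ever gets to see; `observe` returns only what could actually be recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding BOTH worlds; reality only ever reveals one."""
    d_if_exposed: bool     # would D occur in the world where E occurs?
    d_if_unexposed: bool   # would D occur in the world where E does not occur?
    exposed: bool          # which world actually happened


def e_causes_d(p: Person) -> bool:
    """Counterfactual definition: D occurs when E occurs but not otherwise."""
    return p.d_if_exposed and not p.d_if_unexposed


def observe(p: Person) -> tuple:
    """What a study can record: the exposure and the one outcome that happened."""
    outcome = p.d_if_exposed if p.exposed else p.d_if_unexposed
    return (p.exposed, outcome)


# Two made-up people with different causal truths.
a = Person(d_if_exposed=True, d_if_unexposed=False, exposed=True)  # E caused D
b = Person(d_if_exposed=True, d_if_unexposed=True, exposed=True)   # D would have occurred anyway

print(e_causes_d(a), e_causes_d(b))   # True False
print(observe(a) == observe(b))       # True
```

The two people differ in whether E caused D, yet the observable record (exposed, got D) is identical for both. The difference lives entirely in the field that describes the world that did not happen.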

To try to deal with the counterfactual conundrum, we try to observe something as similar as possible to the counterfactual states of E that did not occur. Typically this takes the form of observing similar people who have a different value for E, but can also be the same person at different times. The more similar the comparison observations, the more confident we can be that we are observing the manifestation of causation, all else equal, though this is never the same as observing causation itself. Some substitute observations are quite compelling: Someone suffering a head injury at about the same time he is in a car crash makes a pretty compelling case for causation; he can be compared to the countless minutes when someone does not suffer a head injury and is not in a car crash. It is still possible that it was mere coincidence or that the injury caused the crash or something else, but those seem incredibly unlikely. Some aspects of that “incredibly unlikely” can be quantified, but not all; some remain the art of scientific inference and defy recipes.

When we study something where our “subjects” are effectively exactly identical, we can do even better than the car crash example. If we impose one exposure on a million molecules, watching to see if an outcome occurs, and compare them to another million of the same molecules that are not exposed, it is almost as good as actually seeing the counterfactual version of the population. That is, we think of the second million of the molecules a day later as being an effectively perfect substitute for the first million in the counterfactual state of not having the exposure. But this is still a highly confident educated conclusion about sameness and not the same as observing the mysterious construct that is “causation”. It is a case of observing correlation between the exposure and outcome and being quite confident of the explanation.

Trying to mimic this situation in epidemiology, we often turn to the randomized clinical trial (aka randomized controlled trial, RCT), a term adopted because some people want to obscure the proper descriptor: medical experimentation on people. In such studies, exposures are assigned to the experimental subjects, as they were with the molecules. The simplistic and incorrect interpretation of why we do such experiments is that this allows us to observe causation, while letting exposures occur naturally (an observational study) merely shows correlation. But as noted above, both show correlation and neither allows the direct observation of causation.

So what is the real advantage of RCT experiments?

In a purely observational study we have the problem that is called confounding (better labeled *systematic* confounding), which means that there are factors that are different (on average) between the people who have exposure E and those who do not, *other than the exposure itself or anything it causes*, and these factors cause different outcomes with respect to D (e.g. they make D more likely). These differences might be mistakenly attributed to E. For example, smokers have worse health outcomes than nonsmokers for many diseases for reasons other than smoking; that is, health outcomes would differ between smokers and nonsmokers even if none of the “smokers” actually smoked. Since this difference is the case for the entire population the problem is not just a matter of sampling variation in the study (the type of error that is quantified by confidence intervals); it would still exist if you were able to include everyone in your study. Similarly, it is not caused by measurement error or any other goof. It is a real difference between the people who happen to have E and those who do not.
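A small worked example may make the "it would still exist if you were able to include everyone" point concrete. The sketch below is an assumption-laden illustration, not data from any study: the background factor `U`, all the shares and risks, and the function name `risk_of_d` are made up. By construction E has no effect on D whatsoever, and the calculation uses exact population shares rather than a sample, so there is no sampling variation at all.

```python
from fractions import Fraction as F

# Hypothetical whole population, described by exact shares rather than a sample.
# U is a background factor, not caused by E, that is more common among the exposed.
P_U = F(1, 2)                                      # share of the population with U
P_E_GIVEN_U = {True: F(3, 5), False: F(1, 5)}      # E is more common when U is present
P_D_GIVEN_U = {True: F(3, 10), False: F(1, 10)}    # D depends on U only; E does nothing


def risk_of_d(exposed: bool) -> F:
    """Risk of D among everyone in the population with the given exposure status."""
    cases = F(0)
    total = F(0)
    for u in (True, False):
        share_u = P_U if u else 1 - P_U
        share_e = P_E_GIVEN_U[u] if exposed else 1 - P_E_GIVEN_U[u]
        group = share_u * share_e          # share of population in this (U, E) cell
        total += group
        cases += group * P_D_GIVEN_U[u]    # the risk here ignores E entirely
    return cases / total


risk_exposed = risk_of_d(True)
risk_unexposed = risk_of_d(False)
print(risk_exposed, risk_unexposed, risk_exposed - risk_unexposed)
# prints: 1/4 1/6 1/12
```

The exposed have a risk of 1/4 and the unexposed 1/6, even though the exposure does nothing and everyone is counted. That gap is systematic confounding: a real difference between the two groups, not a statistical accident.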

[Note that confounding is often confused with the badly named “confounders”, which are probably best called “unconfounders”. These are characteristics that are measured (or could be) and thus are variables that can be “controlled for” to reduce the confounding. Confounding exists if the exposed and unexposed populations are different, in terms of propensity for the outcome apart from what the exposure causes, regardless of the identification of confounders. Confounding does not require or promise that we can identify a confounder (i.e., unconfounder) variable.]

So, in an experimental study we break the connection between E and any other factor by assigning E randomly. On average, then, confounding is eliminated (if everything is done according to the theoretical perfect rules, which is seldom quite the case). There actually still is confounding – by bad luck we might assign more people who just have a propensity toward D to one exposure group – but this confounding is no longer systematic. It is random and can therefore be captured using random error statistics (confidence intervals, p-values, etc.). Eliminating systematic confounding as an explanation for a correlation in your data is a great advantage, but it is not the same as going from not seeing causation to seeing it. You have merely ruled out one of the many possible explanations that compete with causation to explain the correlation in the data. Of course, this might be enough to make you extremely confident that you are seeing the manifestation of real causation, but you still never see the causation. But still, you should pay attention to the phrasing there: “make you”. The conclusion about causation is still in the mind of the observer, not in the data.
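The distinction between systematic and random confounding can be illustrated with a simulation. This is a minimal sketch under invented assumptions: the function name `run_trial`, the risks, the trial size, and the seed are all hypothetical, and E is given no effect on D at all. Each simulated trial assigns exposure by shuffling, and the whole experiment is repeated many times.

```python
import random


def run_trial(rng: random.Random, n: int = 2000) -> float:
    """One hypothetical experiment in which E has no effect on D at all."""
    # Each person gets D or not according to a background propensity unrelated to E.
    outcomes = []
    for _ in range(n):
        high_propensity = rng.random() < 0.5
        risk = 0.3 if high_propensity else 0.1    # made-up risks
        outcomes.append(rng.random() < risk)
    # Random assignment: shuffle, then call the first half "exposed".
    rng.shuffle(outcomes)
    half = n // 2
    exposed, unexposed = outcomes[:half], outcomes[half:]
    # Risk difference between the two arms.
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


rng = random.Random(2010)   # fixed seed so the run is repeatable
differences = [run_trial(rng) for _ in range(200)]

average_difference = sum(differences) / len(differences)
largest_difference = max(abs(d) for d in differences)
print(f"average risk difference over 200 trials: {average_difference:+.4f}")
print(f"largest single-trial risk difference:    {largest_difference:.4f}")
```

Across many repetitions the risk difference averages out close to zero, which is the sense in which randomization eliminates systematic confounding. Any single trial can still show a nonzero difference from bad luck in who landed in which arm, and that leftover is the part that confidence intervals and p-values are meant to describe.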

With this in mind, it becomes possible to realistically discuss how some studies might have advantages over others in terms of causal inference. An experiment on people can eliminate systematic confounding as an explanation for observed correlations. On the other hand, it might not offer a very good measure of the phenomenon of interest, so there are tradeoffs, art rather than blind, naïve following of recipes, and no bright lines. In future analyses, I will present such points in context, referring back to this note as the “you never observe causation, only correlations with different degrees of support for causal inference” macro.