Some clarification and followup. Yesterday I cited a blogger who took down a claim that driving is causing obesity, a claim based entirely on the “evidence” that both were increasing almost linearly with time. His points were entirely right and quite clever, but I felt that he had understated a key point that could aid understanding of statistical analysis more generally. The point is that it is meaningless to describe two series as correlated if it is impossible for them not to be highly correlated, as when both are constant or both are changing almost exactly linearly. It makes little more sense to say they are correlated than to say 1 and 2 are correlated.
But I think it is worth clarifying that driving rates really might cause some of the obesity; it is just that looking at the correlation is not the way to figure that out. The general point is this: There are right ways and wrong ways to seek evidence of a particular relationship.
In the driving-obesity case in particular, we would like to control for all of the other variables that are causing the time trend in obesity and then see whether anything is left to be explained by driving rates. This is not possible, however, so the next best thing is to remove the time trend from obesity by looking only at deviations from the trend (blips off of the trend line). Following this standard approach, we would then look to see if the ostensible cause, driving, explains the blips. In this case it clearly does not: driving is increasing linearly, so it has no blips of its own to match those in obesity. And since driving obviously does not explain all of the upward trend in obesity, this evidence suggests it does not explain any of it.
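To make the detrending idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two series are synthetic, the names `driving` and `obesity` are just labels, and the slopes and noise levels are arbitrary assumptions rather than real data. It also takes a slightly different route from the one described above, removing a straight-line trend from both series and then correlating the leftovers.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(31)  # hypothetical 31 years of annual observations

# Two synthetic series that share nothing except a steady upward trend.
driving = 10.0 + 0.30 * t + rng.normal(0, 0.2, t.size)
obesity = 15.0 + 0.50 * t + rng.normal(0, 0.4, t.size)

def detrend(y, t):
    """Return what is left of y after subtracting a least-squares straight line in t."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

# Correlation of the raw series: dominated by the shared time trend.
raw_r = np.corrcoef(driving, obesity)[0, 1]

# Correlation of the blips around each trend line.
resid_r = np.corrcoef(detrend(driving, t), detrend(obesity, t))[0, 1]

print(f"correlation of raw series:       {raw_r:.2f}")
print(f"correlation of detrended series: {resid_r:.2f}")
```

The raw correlation comes out close to 1 even though the two series were generated independently, which is the sense in which the correlation is uninformative. The detrended correlation is the number that could, in principle, tell us something, and with unrelated blips it should hover around zero.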
As an aside, there are different ways to model a trend and thus control for it. The simplest is to assume the trend is a straight line on the graph, extended through time, such that the variable “remembers” where it is supposed to be and, if it moves off the trend, tends to compensate and go back to it. Also reasonably simple is the “random walk” around the trend, in which any deviation from the trend creates a new point from which the trend resumes and there is no tendency to get back to the old trend line. The latter probably describes more variables accurately, and it is often used in economics; the former is what is almost always used in epidemiology, probably because most epidemiology courses do not teach the other.
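The difference between the two trend models can be seen in a short simulation. This is only a sketch under stated assumptions: the drift, the shock size, and the persistence parameter `rho` are arbitrary, and an AR(1) process is just one convenient way to represent a variable that "remembers" its trend line.

```python
import numpy as np

rng = np.random.default_rng(1)
n, drift = 200, 0.5          # hypothetical number of periods and trend slope
t = np.arange(n)
shocks = rng.normal(0, 1.0, n)

# Model 1: the variable "remembers" the line. Deviations decay back
# toward zero (an AR(1) process with rho < 1 around the trend).
rho = 0.5
dev = np.zeros(n)
for i in range(1, n):
    dev[i] = rho * dev[i - 1] + shocks[i]
trend_stationary = drift * t + dev

# Model 2: random walk with drift. Each shock permanently shifts the
# level, and the trend resumes from wherever the series now sits.
random_walk = drift * t + np.cumsum(shocks)

# Distance from the ORIGINAL trend line under each model.
dev_ts = trend_stationary - drift * t
dev_rw = random_walk - drift * t

print(f"largest deviation, remembers-the-line model: {np.abs(dev_ts).max():.1f}")
print(f"largest deviation, random-walk model:        {np.abs(dev_rw).max():.1f}")
```

Both series are built from the same shocks and the same slope. The first stays within a few units of the original line, while the second drifts away from it and never has any reason to come back, which is why fitting a single straight line and treating everything else as a blip can mislead if the variable actually behaves like the second model.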
Before I wander too far, there is a concrete news target for this. It was reported this week that:
Washington, a state that has long boasted one of the lowest smoking rates in the nation, has taken a sizable drop from its third-place ranking, tying with Maryland this year for 11th place.
Sounds huge. Except,
Currently, 15.2 percent of adults in the state smoke, up from 14.9 percent last year, according to numbers from the U.S. Centers for Disease Control and Prevention (CDC).
The numbers from the survey that produces those results wander quite a lot from year to year, due to random sampling error and other errors. (They also are substantially lower than other estimates for US smoking rates, but that is another story.) So the “change” reported there is better described as “no measurable change”.
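A rough back-of-the-envelope check shows why a 0.3-point change is not measurable. The sketch below uses the two rates quoted in the article, but the sample size is a made-up placeholder (`n_assumed`), and the formula treats each year as an independent simple random sample, which a real survey with a complex design is not. Accounting for the design would generally make the margin wider, not narrower.

```python
import math

def diff_with_margin(p1, p2, n1, n2, z=1.96):
    """Return (p2 - p1) and its approximate 95% margin of error,
    treating the two years as independent simple random samples."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p2 - p1, z * se

# Rates are from the article; the sample size is a hypothetical assumption.
n_assumed = 10_000
diff, margin = diff_with_margin(0.149, 0.152, n_assumed, n_assumed)

print(f"change: {diff * 100:+.1f} points, margin: +/- {margin * 100:.1f} points")
```

Even with an assumed 10,000 respondents per year, the margin of error on the year-to-year change is about one percentage point, several times larger than the reported change itself.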
But in that article and an editorial, the Seattle Times joined the bureaucrats whose budget is threatened in seeking to blame this outcome, at least partially, on cuts to the anti-tobacco budget:
…may be attributed to funding cuts to the state Tobacco Prevention and Control Program, which is aimed at reducing tobacco-related disease and death, state officials say. In the past two years, the prevention program has seen major cuts — almost 60 percent of its funding — with even deeper cuts looming.
The particulars of this are pretty funny:
And in 2009, the Legislature ended the state’s anti-smoking advertising campaign. Ads make a big difference, especially for teens and young adults who are influenced by plenty of pro-smoking ads paid for by tobacco companies peddling flavored cigarettes.
(For those who do not know, there are almost no pro-smoking ads in any influential media, and flavored cigarettes were never a major part of the market and have not been produced by any company with an advertising budget for over five years.)
Consider just one facet, the Tobacco Quit Line, a state-funded prevention program launched a decade ago. The phone service has provided expert advice and useful tools to some 150,000 people trying to kick the habit. But starting July 1, callers will no longer be able to get quit kits, over-the-phone help or nicotine replacement unless they are on Medicaid or have insurance.
(So in anticipation of losing the quit line, lots of people without Medicaid or insurance have started smoking??? Also, look at those numbers: “launched a decade ago” and “150,000 people” – that is about 15,000 people a year, a small percentage of the smokers in the state, getting advice. It seems unlikely that this made much of a difference, especially since I believe there are a few other ways to get information these days.)
Anyway, getting back to the point of the day, we could assess whether it really appears that the budget cuts are affecting smoking rates. I realize that the anti-smoking people have neither the skills for nor any interest in doing good science. But it is possible. Probably the most useful thing to do would be to wait a year and see if the measured rate ticks down again. In the context of the above brief bit about trend types, the real trend in smoking probably is closer to the random walk, but the measured rate has a tendency to bounce back from deviations from the trend because many deviations are study error.
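The claim that a measured rate bounces back even when the real rate does not can be illustrated with a small simulation. This is a sketch with invented numbers: the starting rate, the size of the real year-to-year movements, and the size of the survey noise are all assumptions, and the very long series is used only so the pattern is easy to see.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000  # far more periods than any real survey has, for a clear picture

# Hypothetical "real" rate: a random walk, so real changes are permanent.
true_rate = 20.0 + np.cumsum(rng.normal(0, 0.1, n))

# Hypothetical measured rate: the real rate plus independent survey error.
measured = true_rate + rng.normal(0, 0.3, n)

def lag1_corr_of_changes(x):
    """Correlation between each period-to-period change and the next one."""
    d = np.diff(x)
    return float(np.corrcoef(d[:-1], d[1:])[0, 1])

true_corr = lag1_corr_of_changes(true_rate)
measured_corr = lag1_corr_of_changes(measured)

print(f"real rate, change vs. next change:     {true_corr:+.2f}")
print(f"measured rate, change vs. next change: {measured_corr:+.2f}")
```

For the real rate, one year's change says almost nothing about the next. For the measured rate the relationship is clearly negative: an uptick tends to be followed by a downtick, because much of the uptick was survey error that does not carry over to the next year.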
For those who want to make an estimate now, they should look at whether budget changes coincided with smoking rate changes at other times, not just the latest politically convenient result. This is not perfect because there will be confounders, but it could be informative. I would bet that: most of the decrease in smoking was before 1998; the budget leapt up in 1998 and stayed high following that; there was some decrease in smoking around then, though much less than in, say, the early 1980s when the budget was very small; the smoking rate was flat for a decade despite the budget continuing to be high. If this is the case, then there is even less of a case to be made for their claim than there is for driving causing obesity. At least for the latter they really did track each other.
Finally, as another revisit to a previous post, recall how in UN163 I discussed the controversy over an apparently bad health economics study and wistfully imagined what it would be like if health science were held to such standards. Today, Krugman, who had commented at the outset, added:
So when the McKinsey alleged study made headlines, the firm was pressed to explain how the study was conducted. And it has refused to answer.
It’s hard to escape the conclusion that the study was embarrassingly bad — maybe it was a skewed sample, maybe the questions were leading, maybe there was no real data at all. Whatever.
The important thing is that this must not stand. You can’t enter the political debate with strong claims about what the evidence says, then refuse to produce that evidence.
Sigh. If only that were the standard. We know about as little about much of CDC’s key smoking data as we know about what McKinsey did in that study, and no one even complains.
Though I suppose maybe the grass is not entirely greener:
And it’s especially bad when the media give your claims lots of attention, while barely covering the furor over the refusal to explain where those claims come from.