Last week, Stephen Colbert had Bem on his show, thanks to the publicity surrounding the study, and I felt like Bem also did not understand important things about his research.
Colbert naturally played it for laughs (it was quite funny) by focusing on Bem’s use of pornography in his experiments. Bem could have explained this simply by saying, “we needed a way to create an intense experience for someone in an experimental setting, and for obvious reasons we could not introduce them to their future spouse or smack them in the head, so this was the easiest choice.” Instead, viewers probably came away with the impression that the whole scientific inquiry amounted to asking whether ESP works for porn.
Rather more important, though, was Bem’s bungled explanation of the statistics. Colbert challenged him with the observation that the experiments showed only 53% of the subjects making the correct choice (the “yes, there is ESP” result), compared to the 50% that would have occurred purely at random. This is a sensible question from an intelligent layperson, as we would expect from Colbert. It is one of those points where intelligent intuition lets us down and working through the actual math matters. Unfortunately, Bem went off on a bizarre digression about how 53% was the same as Obama’s share of the popular vote in the presidential election, plus a few other examples, trying to claim that it was not really a small number.
This was a totally incorrect explanation.
The right explanation is that when you are trying to show that a phenomenon exists at all (as opposed to not existing), even the slightest occurrence is interesting. So as long as your sample is large enough that 53% is very unlikely to occur by chance (for those who do not fully understand that point, I will likely cover it later in the series), you have shown that something seems to be happening. (Note: I am not sure how many observations Bem collected in his various experiments, but I am going to assume it was enough that 53% was unlikely to occur by chance.) If you have designed your experiment or observation very carefully (which I obviously cannot attest to in this case, but let’s assume it) so that the only plausible explanations are “the phenomenon of interest has been observed” or “this result was caused by unlucky random sampling”, then if the latter is statistically unlikely, you have supported the former.
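To make the sample-size point concrete, here is a minimal Python sketch of a standard one-sided exact binomial test. It assumes each trial is an independent 50/50 guess under the “no ESP” hypothesis and asks how likely it is to get at least 53% correct by luck alone. The trial counts are hypothetical, since I do not know Bem’s actual sample sizes, and the function name is my own.

```python
from math import comb


def one_sided_p_value(hits: int, trials: int) -> float:
    """Exact probability of getting at least `hits` correct out of `trials`
    when every guess is a fair 50/50 coin flip (the "no ESP" hypothesis)."""
    # Count every outcome with `hits` or more successes, then divide by the
    # 2**trials equally likely outcomes. Python integers are arbitrary
    # precision, so the count stays exact even for large samples.
    tail = sum(comb(trials, k) for k in range(hits, trials + 1))
    return tail / 2 ** trials


# Hypothetical sample sizes, all with the same 53% hit rate (not Bem's data).
for trials in (100, 1000, 10000):
    hits = round(0.53 * trials)
    p = one_sided_p_value(hits, trials)
    print(f"{trials:>6} trials, {hits} correct: p = {p:.2g}")
```

The same 53% goes from entirely unremarkable with 100 trials (roughly a 30% chance of happening by luck) to borderline with 1,000 to wildly improbable with 10,000. That is why the percentage by itself says nothing about whether chance is a plausible explanation.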
If “remembering the future” occurs only 6% of the time (i.e., of the 50% of people who would have guessed wrong if they were guessing totally at random, 6% actually get it right because of something that happens in the future, for a total of 53% right rather than the 50% from luck alone), that is a lot more than we would have guessed. (As I noted before, that does not mean his research alone demonstrates this extraordinary claim, but assuming there are no obvious flaws, it supports it.) Indeed, if he had a huge amount of data, even a result suggesting that ESP works only 1% of the time could be very unlikely to be explained by chance. There are probably plenty of places where his methods and results can be challenged, but Bem should have been able to explain that this percentage was not one of them, or else acknowledge that his study was, in fact, too small and that the 3-percentage-point excess could have been due to chance; either way, the popular vote comparison is very misleading.
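The arithmetic above can be sketched in a few lines of Python. It follows the interpretation in the paragraph: ESP “rescues” some fraction of the half of guesses that chance alone would get wrong. It also gives a rough count of trials at which the excess over 50% would just reach a conventional significance cutoff. The cutoff z = 1.96 and the normal approximation (standard error of a 50/50 rate is 0.5/√n) are my assumptions. This is a rough threshold, not a full power calculation, which would call for still more trials.

```python
from math import ceil


def hit_rate(esp_fraction: float) -> float:
    """Overall share of correct guesses if chance gets 50% right and ESP turns
    a fraction `esp_fraction` of the remaining 50% into correct guesses."""
    return 0.5 + 0.5 * esp_fraction


def trials_needed(esp_fraction: float, z: float = 1.96) -> int:
    """Rough number of trials for the excess over 50% to sit z standard errors
    above chance, using the normal approximation (standard error 0.5/sqrt(n))."""
    excess = hit_rate(esp_fraction) - 0.5
    return ceil((z * 0.5 / excess) ** 2)


for esp in (0.06, 0.01):
    print(f"ESP {esp:.0%} of the time: {hit_rate(esp):.1%} correct, "
          f"roughly {trials_needed(esp):,} trials to stand out from chance")
```

A 6% effect (53% correct) needs on the order of a thousand trials. A 1% effect (50.5% correct) needs tens of thousands, which is the sense in which “a huge amount of data” can make even a tiny effect hard to attribute to luck.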
Moreover, for something where quantity, not just the existence of the phenomenon, matters, 3% is actually quite small. Obama winning with 53% of the vote means that almost as many people preferred someone else as preferred him. Similarly – and this is important for understanding the health news – if someone claims to have discovered a 6% increase in the risk of a disease due to some exposure, it really does not matter much. For one thing, many more things can go wrong with health studies, so even if the sample size is quite large (and the result is “statistically significant”), this just means that random sampling is unlikely to explain the result; any number of other problems still might. The nice thing about a simple research goal like Bem’s is that most of those complications can be eliminated. But even if a health study were perfect, a result of 6% would likely have no practical implications. The existence of ESP is interesting no matter how small the effect; a small change in how often a particular health outcome occurs often does not matter much.
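A quick calculation shows why a 6% relative increase in risk often has little practical weight. The sketch below assumes a hypothetical baseline risk of 1 in 1,000 and a population of 100,000. Neither number comes from any particular study, and the function name is mine.

```python
def extra_cases(baseline_risk: float, relative_increase: float, population: int) -> float:
    """Additional cases expected among `population` people when an exposure
    raises `baseline_risk` by `relative_increase` (0.06 = a 6% relative increase)."""
    return baseline_risk * relative_increase * population


# Hypothetical numbers: a 1-in-1,000 baseline risk and 100,000 people.
baseline_risk = 1 / 1000
population = 100_000
unexposed = baseline_risk * population
extra = extra_cases(baseline_risk, 0.06, population)
print(f"{unexposed:.0f} cases without the exposure, about {unexposed + extra:.0f} with it")
```

Under these assumptions the “6% increase” means roughly 100 versus 106 cases per 100,000 people, a difference that could easily be swamped by the other study problems mentioned above.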
Yet health studies are typically reported in the news as if the mere existence of a risk, rather than its magnitude, were what matters. If only Bem had explained that magnitude matters for most questions, as Colbert implied, but that in a few rare areas, like his experiments, the mere existence of something is what is interesting. If only Bem could have anticipated the question and prepared a better answer. (sorry – couldn’t resist)