I am writing this mostly as a placeholder for some thoughts emerging from modeling work I am doing right now. I thought that some of my more technical readers would find it interesting and maybe some of you (or at least one — talking to you, Prof. I.B.) could help me think this through or maybe even identify where others have made the same observations.

The work that really put me on the map (presented to much acclaim in 1999, though I could not get it published in final form until 2003) offered a way to properly report the uncertainty around epidemiologic estimates. To explain, the only uncertainty around point estimates in epidemiology that was (and still almost always is) reported is the confidence interval. CIs provide a heuristic measure of about how much random sampling error there is in a result. But the reporting of CIs tends to obscure all the other non-random errors in the result for most readers (including most people who claim to be experts in the field). People see this error bar around the estimate and assume that it really represents how uncertain the estimate is, which it most certainly does not do. Thus, in some sense, the dutiful reporting of one measure of uncertainty serves as much to hide uncertainty as it does to report it.

What I did was propose a way to report an estimate of the impact of other types of errors (measurement error, sampling bias, etc.) in addition to the random sampling error. The method that I used to do the calculation for this model was Monte Carlo simulation. This was purely a calculation technique — MC is the easiest way to do complex calculations when you are working with uncertain inputs.

(For those who do not know, the method consists of taking a random draw from each uncertain input and calculating one model result, and then repeating that many thousands of times with different random draws to show the distribution of possible results based on the distribution of inputs. It is theoretically possible to calculate the same result directly using equations, but that is mind-bogglingly difficult, whereas MC is easy. It is basically equivalent to doing a calculation using a computer, or digging a hole with a backhoe, rather than doing it by hand: the MC simulation, computer, or digger is just a tool to make the job easier, not the essence of what is being done.)
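For readers who prefer code, here is a minimal sketch of that procedure in Python. It is not the model from my 1999 work. Every number in it is made up for illustration: an observed relative risk of 2.0 with random sampling error on the log scale, and a hypothetical multiplicative bias factor (standing in for measurement error, sampling bias, and so on) drawn from a uniform range. The function names are mine.

```python
import math
import random


def simulate_adjusted_rr(n_draws=20000, seed=1):
    """Draw each uncertain input once per iteration and compute one result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        # Random sampling error: log(RR) ~ Normal(log 2.0, 0.2). Hypothetical values.
        log_rr = rng.gauss(math.log(2.0), 0.2)
        # Non-random error: a bias factor between 0.8 and 1.5. Also hypothetical.
        bias = rng.uniform(0.8, 1.5)
        results.append(math.exp(log_rr) / bias)
    results.sort()
    return results


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, with p in [0, 1]."""
    idx = int(round(p * (len(sorted_values) - 1)))
    return sorted_values[idx]


draws = simulate_adjusted_rr()
low, mid, high = (percentile(draws, p) for p in (0.025, 0.5, 0.975))
print(f"median {mid:.2f}, 95% of simulated results between {low:.2f} and {high:.2f}")
```

The loop is the whole method. Everything that matters scientifically is in the choice of the input distributions, which the simulation itself cannot validate.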

Much to my annoyance, almost everyone (I can think of only one exception) who took these ideas and ran with them did two things that were utterly contrary to the spirit and goals of what I was presenting: 1. They treated the MC tool as if it were the important essence in itself, rather than properly treating it as just the method to get to a goal. 2. They started using the approach to replace one misleadingly precise claim (the epidemiologic point estimate that ignores the errors) with a more complicated misleadingly precise claim (that the rough distribution that can be calculated is a precise estimate of the results of uncertainty).

Fast forward to today, when computers are quick and cheap (it took my best computer 2.5 weeks to run the simulation that was the core of what I produced in 1999), and we see MC error calculations of various sorts in many analyses. But these all seem to serve mainly to impress naive readers with the fancy tools, and also to pretend to account for the uncertainty and thereby hide the real uncertainty.

I have started thinking of it as “Monte Carlo porn”.

So, for example, a model might ask what will happen to smoking rates over time when a predicted 6.3 percent reduction in smoking initiation caused by some anti-smoking policy filters through the population. The modelers then report “the uncertainty” by allowing the reduction to differ by +/-10% of the predicted value, run an MC simulation using random draws from that range, and report a simple summary of the distribution of results. This adds nothing of genuine scientific value. Anyone who is capable of understanding the modeling in the first place can figure out that if the predicted reduction is high by 10% then the difference in the medium-run impact between the reduction scenario and the baseline scenario is going to be off by about 10%. Maybe it will be a bit more and maybe a bit less, but that really does not matter.

But an unsophisticated reader (i.e., most everyone to whom the results are touted) is going to interpret that reported uncertainty as being a genuine measure of total uncertainty (just as the same people misinterpret the bounds of CIs as representing the range of possible values that could result from random error). Never mind that a perfectly plausible estimate of the effect of the policy is a 1% or even 0% reduction in smoking initiation. When the typical reader sees the reported overly-narrow range of uncertainty, they are tricked into believing that it is the real uncertainty (just as they are usually tricked, by the reporting of CIs, into believing that the only possible source of error is random sampling).
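The smoking example can be sketched with a toy model. This is an assumption-laden illustration, not any published model: the cohort structure, the baseline initiation rate of 25%, the 2% annual quit rate, and the 20-year horizon are all invented. The toy model is linear in the size of the reduction by construction, so the +/-10% input range maps straight onto a +/-10% output range, which is the point: the simulation tells you nothing you could not see by inspection.

```python
import random


def prevalence_after(years, reduction, base_init=0.25, quit_rate=0.02, n_cohorts=60):
    """Toy cohort model: share of current smokers `years` after the policy starts."""
    total = 0.0
    for age in range(n_cohorts):  # age = years since the cohort passed initiation age
        # Only cohorts that reached initiation age after the policy see the reduction.
        init = base_init * (1 - reduction) if age < years else base_init
        total += init * (1 - quit_rate) ** age
    return total / n_cohorts


def impact(reduction, years=20):
    """Drop in smoking prevalence relative to the no-policy baseline."""
    return prevalence_after(years, 0.0) - prevalence_after(years, reduction)


point = 0.063  # the predicted 6.3 percent reduction in initiation
rng = random.Random(0)
draws = sorted(impact(rng.uniform(0.9 * point, 1.1 * point)) for _ in range(10000))

central = impact(point)
rel_low = draws[0] / central
rel_high = draws[-1] / central
print(f"reported MC range, relative to central estimate: {rel_low:.3f} to {rel_high:.3f}")

# Plausible values that the reported range never considers.
for alt in (0.01, 0.0):
    print(f"if the true reduction is {alt:.0%}: {impact(alt) / central:.3f} of central")
```

The last two lines are the ones that matter. A true reduction of 1% or 0% puts the impact at a small fraction of the central estimate, or at nothing, and both sit far outside the range the simulation reports.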

So, basically, the current practice — some unknown portion of which actually traces back to my work that was about trying to fix the problem of failing to quantify uncertainty — serves to hide genuine uncertainty by making a mock presentation of uncertainty. So much for progress.

Several possibly helpful observations:

1. MC methods do not yield results that have a clear interpretation in the applications that I am most familiar with (measurement error), which makes it problematic, as you said, to make a fetish of them.
2. MC methods do not usually have the capacity to distinguish between poor and good “guesses” in relation to data. Consequently, some of the results may be rather misleading. Again, this is the reason to use them cautiously and interpret with appropriate humility. (My answer to this problem is to add a Bayesian element to MC, so that only guesses that are compatible with data and priors are accepted.)
3. MC methods seem to be very useful when exact or even approximate calculation by analytical means is not possible. I have seen the method applied essentially to verify simple well-known algebra, which is indeed a way to bamboozle a reader (MC porn indeed).
4. One way forward may be to help illustrate appropriate use of the results of MC calculations by providing more than a qualitative appraisal of results, i.e. to give additional guidance and examples of just how to use the range of values/distribution that MC simulation/sensitivity analysis yields.

Thanks for your observations and reflections…
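One possible reading of the Bayesian element mentioned in the second observation is plain rejection sampling: draw a guess from the prior, then keep it with probability proportional to how well it fits the data. The sketch below assumes that reading and may not match the commenter's actual method. The scenario is hypothetical: the uncertain input is the sensitivity of an exposure measurement, the prior is uniform on 0.5 to 1.0, and an invented validation substudy found 45 of 50 truly exposed people correctly classified.

```python
import random
import statistics


def posterior_sensitivity(k=45, n=50, n_draws=20000, seed=2):
    """Rejection sampling: keep prior draws in proportion to their binomial likelihood."""
    rng = random.Random(seed)
    s_hat = k / n  # the likelihood peaks here, and 0.9 lies inside the prior range
    max_lik = s_hat ** k * (1 - s_hat) ** (n - k)
    prior_draws, accepted = [], []
    for _ in range(n_draws):
        s = rng.uniform(0.5, 1.0)  # a "guess" from the prior (hypothetical range)
        prior_draws.append(s)
        lik = s ** k * (1 - s) ** (n - k)
        # Accept with probability lik / max_lik, so poor guesses are mostly discarded.
        if rng.random() < lik / max_lik:
            accepted.append(s)
    return prior_draws, accepted


prior_draws, accepted = posterior_sensitivity()
print(f"kept {len(accepted)} of {len(prior_draws)} guesses")
print(f"prior sd {statistics.stdev(prior_draws):.3f}, accepted sd {statistics.stdev(accepted):.3f}")
print(f"accepted mean {statistics.mean(accepted):.3f}")
```

The accepted draws can then feed the rest of an MC calculation in place of the raw prior draws, so that the reported spread reflects what the data actually allow rather than the analyst's initial range.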