Unhealthful News 149 – Understanding (some of) the ethics of trials and stopping rules, part 2

Yesterday I explained why clinical trials (aka randomized clinical trials, RCTs, medical experiments on people) almost always inflict harm on some of their subjects, as assessed based on current knowledge (which is, of course, the only way we can measure anything).  To clarify, this means that one group or another in the trial experiences harm in expected value terms.  “Expected value” means that it is true on average:  averaging across people, since some individuals might benefit while others suffer loss, and averaging across hypothetical repetitions of the world, since sometimes the luck of the draw causes an overall result that is very different from what would occur on average.

The critical ethical observation about this is that causing this harm is ok.  Some people have to suffer some loss – in this case by volunteering to be a study subject and getting assigned to the believed-inferior regimen – for the greater good.  In this case, the greater good is the knowledge that lets us choose/recommend a regimen for everyone in the future based on the additional knowledge we gained from the study.  There is nothing inherently unethical about causing some people harm for a greater good.  Sometimes that is unethical, certainly, but not always.  If we tried to impose an ethical rule that said we could never make some people worse off for the greater good (or even the much narrower variation, “…some identified people…”), a large fraction of human activity would grind to a halt.  Now it does turn out that it is always possible, when an action has a net gain for society, to compensate those who are being hurt so that everyone comes out ahead in expected value terms (for those with some economics, I am referring to potential Pareto improvement).  But it turns out that for most clinical trials, no such compensation is offered and, bizarrely, it is often considered “unethical” to provide it (another pseudo-ethical rule that some “health ethicists” subscribe to, and another story for another day:  they claim that it would be too coercive to offer someone decent compensation to be in a trial, which …um… explains why it is considered unethical coercion to pay people to work their jobs?)

However, though it is not necessarily unethical to take actions that hurt people, there is a good case to be made that it is per se unethical to hurt people but claim to not be doing so.  Thus, an argument could be made that invading Iraq was ethical even though it was devastating for the Iraqi people (I am not saying I believe that, I am just saying there is room to argue).  But when US government apologists claim that the invasion was ethical because it made the average Iraqi better off, they are conceding the situation is unethical:  Not only are they lying, but they are implicitly admitting that the invasion was unethical because their defense of it requires making a false claim.  Similarly, banning smoking in bars/pubs is the subject of legitimate ethical debate even though it clearly hurts smokers and the pub business.  But when supporters of the ban pretend that pubs have not suffered they are being unethical and are implying that they think the truth (“the bans are costly for the pubs in most places, but we feel the benefits are worth the cost”) would not be considered ethical or convincing.

So, it seems that those doing and justifying clinical trials are on rather shaky ethical ground based on their rhetoric alone, because they pretend that no one is being hurt.  This is simply false.  Their claim is that if we are doing the trial then we must not know which of the regimens being compared is better, so no one is being assigned to an inferior choice.  But as I explained yesterday, this is simply false in almost all cases – they are misrepresenting the inevitable uncertainty as being complete ignorance.  But it gets worse because, as is usually the case, once you take one nonsensical step, others follow from it (which you can interpret as either “one false assumption leads to bad conclusions via logical reasoning” or “trying to defend the indefensible usually requires more indefensible steps to patch over the mess you have made”).  The stopping rules, as they now exist, are one of those bad steps that follow.

But it occurs to me that I need to explain one more epistemic principle before making the final point, so I will do that today and add a “part 3” to the plan here (you need to read part 1 to know what I am talking about here, btw).  I hope that anyone who likes reading what I write will find this worthwhile.

Clinical trials are an example of the tradeoff between gathering more information about which choice is better and exploiting the information you have to make the apparent best choice.  Yesterday I pointed out that if an expert is making a decision about a health regimen (e.g., a treatment option) for himself or a close relative right now, he almost certainly would have a first choice.  This is a case of just exploiting current knowledge because there is no time to learn more, so the choice is whichever seems to be better right now, even if it only seems a little better and is quite uncertain.  But if we are worried not just about the next member of the target population, but the next thousand or million who could benefit from a treatment or health-improving action, it would be worth resolving the uncertainty some.  The best way to do that is to mix up what we are doing a bit.  That is, instead of just going with the apparently better regimen (which would provide some information – it would help narrow down exactly what the expected outcomes are for that regimen) we seek the additional information of clarifying the effects of the other regimen.

Aside – yes, sorry; it is hard for me to present complicated topics that have subtle subpoints without getting all David Foster Wallace-esque – I already use his sentence structure, after all.  For a lot of trials, one of the regimens represents the current common practice, being used for comparison to the new drug/intervention/whatever of interest.  This is a regimen that we actually already have a lot of data about, and for which more usually continues to accumulate.  Thus, you might say, we can just assign everyone to the other regimen, if it is believed to be better, and use the data about the old standard from other sources.  This is true, and it is yet another epistemic disgrace that we do not make better use of that information in evaluating the new regimen.  But there are big advantages to having the data come from the same study that examined the new regimen.  This is often attributed to the value of randomization and blinding, but the main benefits have to do with people in studies being different enough from average that it is tricky to compare them to the population average.  People in studies experience placebo effects and Hawthorne effects (effects of merely being studied, apart from receiving any intervention, which are often confused with placebo effects – ironically including in the study that generated the name “Hawthorne effect”), and are just plain different.  Thus, though we should make better use of data from outside the study, there is still great value in assigning some people to each of the regimens that is being studied.

The tradeoff between exploiting best-available information and paying the price to improve our information is called a “two-armed bandit problem” (or more generally, just a “bandit problem”), a metaphor based on the slot machine, which used to be a mechanical device with an arm that you pulled to spin real mechanical dials, thus earning the epithet, “one-armed bandit” (this was back before it became all digital and able to take your money as fast as you could push a button).  Imagine a slot machine with a choice of two arms you can pull, which almost certainly have different expected payoffs.  If you are only going to play once, you should obviously act on whatever information you have.  If you are going to play a handful of times, and you have good information about which pays off better, you should probably just stick with that one.  If you have no good information you could try something like alternating until one of them pays off, and then sticking with that one for the rest of your plays.  This strategy might well have you playing the poorer choice – winning is random, so the first win can easily come from the one that wins less – but you do not have much chance to learn any better.
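To put a rough number on how often “alternate until one pays off, then stick” locks you onto the poorer arm, here is a minimal sketch.  It is my illustration, not a standard result anyone should cite:  the win probabilities (0.3 and 0.2) are made up, the arms are assumed to pay off as simple win/lose draws, arm A is arbitrarily pulled first, and the formula ignores the cap on the number of plays.

```python
import random

def first_win_probabilities(p_a, p_b):
    """Chance that the first win comes from arm A or from arm B
    when alternating A, B, A, B, ... with no cap on the number of pulls."""
    both_lose = (1 - p_a) * (1 - p_b)          # a full A-then-B round with no win
    prob_a_first = p_a / (1 - both_lose)
    prob_b_first = (1 - p_a) * p_b / (1 - both_lose)
    return prob_a_first, prob_b_first

def simulate_locked_arm(p_a, p_b, rng, max_pulls=10_000):
    """Alternate arms until one wins; return 0 for arm A, 1 for arm B.
    Returns None if nothing wins within max_pulls (hypothetical safety cap)."""
    win_probs = (p_a, p_b)
    for pull in range(max_pulls):
        arm = pull % 2                          # even pulls are A, odd pulls are B
        if rng.random() < win_probs[arm]:
            return arm
    return None

if __name__ == "__main__":
    p_a, p_b = 0.3, 0.2                         # illustrative values only
    exact_a, exact_b = first_win_probabilities(p_a, p_b)
    rng = random.Random(0)
    runs = 20_000
    stuck_on_b = sum(simulate_locked_arm(p_a, p_b, rng) == 1 for _ in range(runs))
    print(f"exact chance of sticking with the poorer arm: {exact_b:.3f}")
    print(f"simulated chance: {stuck_on_b / runs:.3f}")
```

With these made-up numbers the poorer arm captures you roughly a third of the time, which is the sense in which the first win “can easily come from the one that wins less.”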

But imagine you planned to play a thousand times.  In that case, you would want to plan to play each of them some number of times to get a comparison.  If there is an apparent clear advantage for one of the choices, play it for the remainder of your plays (actually, if it starts to look like the test phase was a fluke because you are not winning as much in the later plays, you might reopen your inquiry – think of this as post-marketing surveillance).  On the other hand, if it still seems close, keep playing both of them some to improve your information.  The value of potential future information is that it might change your mind about which of the options is better (further information that confirms what you already believe has less practical value because it does not change your choice, though it does create a warm fuzzy feeling).  Now imagine an even more extreme case, where you can keep betting pennies for as long as you want, but eventually you have to bet the rest of your life’s savings on one spin.  In that case you would want to play many times – we are talking perhaps tens of thousands of times (let’s assume that the effort of playing does not matter) – to be extremely sure about which offers the better payoff.
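The thousand-play plan can be sketched as a simple “test phase, then commit” simulation.  This is a toy version under my own assumptions:  win/lose payoffs, a fixed test phase split evenly between the arms, ties broken in favor of the first arm, and no reopening of the inquiry later.  The function names and the example probabilities are hypothetical.

```python
import random

def explore_then_commit(win_probs, total_plays, test_plays_per_arm, rng):
    """Play each arm test_plays_per_arm times, then play the apparent
    winner for all remaining plays.  Returns (total_wins, chosen_arm)."""
    if 2 * test_plays_per_arm > total_plays:
        raise ValueError("test phase is longer than the total number of plays")
    wins = [0, 0]
    for arm in (0, 1):
        for _ in range(test_plays_per_arm):
            if rng.random() < win_probs[arm]:
                wins[arm] += 1
    # Assumption: a tie in the test phase goes to arm 0.
    chosen = 0 if wins[0] >= wins[1] else 1
    remaining = total_plays - 2 * test_plays_per_arm
    later_wins = sum(rng.random() < win_probs[chosen] for _ in range(remaining))
    return sum(wins) + later_wins, chosen

def average_wins(win_probs, total_plays, test_plays_per_arm, repeats=500, seed=0):
    """Average total wins over many hypothetical repetitions of the world."""
    rng = random.Random(seed)
    totals = [
        explore_then_commit(win_probs, total_plays, test_plays_per_arm, rng)[0]
        for _ in range(repeats)
    ]
    return sum(totals) / repeats

if __name__ == "__main__":
    probs = (0.30, 0.25)                        # illustrative values only
    for test_len in (5, 50, 250):
        avg = average_wins(probs, 1000, test_len)
        print(f"test plays per arm = {test_len:3d}: average wins = {avg:.1f}")
```

Running it with different test-phase lengths shows the tradeoff directly:  too short a test and you often commit to the wrong arm, too long and you waste plays on the poorer arm that you could have spent exploiting what you already knew.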

There actually is an exact mathematics to this, with a large literature and some well-worked problems.  It is the type of problem that a particular kind of math geek really likes to work out (guess who?).  The calculations hinge on your prior beliefs about probability distributions and Bayesian updating, two things that are well understood by many people, but not by those who design the rules for most (not all) clinical trials.
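For readers who have not seen Bayesian updating in this setting, here is a minimal sketch of one common textbook setup.  I am assuming win/lose outcomes and a Beta prior for each arm's win probability (my choice for illustration, not necessarily what any particular analysis would use), in which case updating is just adding the observed wins and losses to the prior's two parameters.

```python
import random

def update(prior, wins, losses):
    """Beta(a, b) prior plus observed wins/losses gives Beta(a + wins, b + losses)."""
    a, b = prior
    return (a + wins, b + losses)

def mean(belief):
    """Expected win probability under a Beta(a, b) belief."""
    a, b = belief
    return a / (a + b)

def prob_second_is_better(belief_1, belief_2, rng, draws=20_000):
    """Monte Carlo estimate of the probability that arm 2's true win rate
    exceeds arm 1's, given the current Beta beliefs about each."""
    hits = 0
    for _ in range(draws):
        if rng.betavariate(*belief_2) > rng.betavariate(*belief_1):
            hits += 1
    return hits / draws

if __name__ == "__main__":
    rng = random.Random(0)
    flat_prior = (1, 1)                          # "know nothing" prior, an assumption
    arm_1 = update(flat_prior, wins=12, losses=28)   # made-up data
    arm_2 = update(flat_prior, wins=9, losses=31)    # made-up data
    print(f"arm 1 mean: {mean(arm_1):.3f}, arm 2 mean: {mean(arm_2):.3f}")
    print(f"P(arm 2 is really better): {prob_second_is_better(arm_1, arm_2, rng):.3f}")
```

The last number is the one that matters for the bandit problem:  it is the chance that the arm you currently believe is worse is actually the better one, which is what further plays might reveal.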

Clinical trials are a bandit problem.  Each person in the study is a pull of the arm, just like everyone that comes after during the “exploit the information from the study to always play the best choice from now on” phase.  Many types of research are not like this because the study does not involve taking exactly the action that you want to eventually optimize, but clinical trials have this characteristic.

You may have seen emerging hints of the stopping rule.  The period of gathering more information in the bandit problem is, of course, the clinical trial period, while the exploitation of that knowledge is everyone else who is or will be part of the target population, now and into the future until some new development renders the regimen obsolete or reopens the question.  The stopping rule, then, is the point when we calculate that going further with the research phase has more costs (assigning some people to the inferior treatment) than benefits (the possibility of updating our understanding in a way that changes our mind about what is the better regimen).  It should also already be clear that the stopping rule should vary based on several different pieces of information.  Therein lies part (not all) of the ethical problem with existing stopping rules.
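To make the cost-versus-benefit comparison concrete, here is a deliberately crude sketch.  It is not how any actual trial's stopping rule is computed, and it is only one of many ways to set this up.  I am assuming win/lose outcomes, Beta beliefs about each regimen, and a one-step lookahead:  it asks only whether assigning one more subject to the believed-inferior regimen could flip our choice, and weighs that against the expected loss to that one subject.  Because it looks only one observation ahead, it will say “stop” in cases where a longer run of observations could still change our minds.  All names and numbers are hypothetical.

```python
def posterior_mean(belief):
    """Expected success rate under a Beta(a, b) belief."""
    a, b = belief
    return a / (a + b)

def worth_one_more_subject(better, worse, future_population):
    """One-step lookahead.  better and worse are Beta (a, b) beliefs for the
    believed-better and believed-inferior regimens.  Returns
    (keep_going, cost, benefit) in units of expected successes."""
    m_better = posterior_mean(better)
    m_worse = posterior_mean(worse)
    a, b = worse

    # Cost: expected loss to the one subject assigned to the inferior regimen.
    cost = m_better - m_worse

    # Beliefs about the inferior regimen after one more success or failure.
    mean_if_success = (a + 1) / (a + b + 1)
    mean_if_failure = a / (a + b + 1)

    # After seeing the result we pick whichever regimen then looks better.
    expected_best_after = (
        m_worse * max(m_better, mean_if_success)
        + (1 - m_worse) * max(m_better, mean_if_failure)
    )
    gain_per_person = expected_best_after - m_better

    # Benefit: that per-person gain applied to everyone treated afterward.
    benefit = future_population * gain_per_person
    return benefit > cost, cost, benefit

if __name__ == "__main__":
    # Little data on the believed-inferior regimen: one result could flip the choice.
    print(worth_one_more_subject(better=(11, 9), worse=(1, 1), future_population=1000))
    # Lots of data on both: one more result cannot change our mind.
    print(worth_one_more_subject(better=(60, 40), worse=(50, 50), future_population=1000))
```

Even this toy version shows why the answer has to depend on more than the study data alone:  the size of the future population and the strength of our prior beliefs both enter the calculation.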

I hope to pull these threads together in part 3 (either tomorrow, or later in the week if a news story occurs that I do not want to pass up).
