A Torpedo Aimed Straight at H.M.S. Randomista

It can’t be that often that the American Journal of Agricultural Economics publishes a genuinely spectacular result, but their most recent issue is the exception that proves the rule.

Under the unassuming title of “Behavioural Responses and the Impact of New Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania”, Erwin Bulte, Gonne Beekman, Salvatore Di Falco, Joseph Hella and Pan Lei have delivered probably the most serious challenge yet to the theology of the Church of the Randomized Controlled Trial.

For years, people have known that RCTs in Development Economics have a serious external validity problem: just because poor farmers in Kenya react in a certain way to, say, a given incentive to use fertilizer, that’s no reason to suppose that farmers in Guatemala, or Nepal, or anywhere else will.  As a result, “rigorous evidence” isn’t, as Lant Pritchett put it with his enviable knack for a snappy headline.

So long as that was the standard rap against RCTs, the randomistas just about got away with it via their claim to superior internal validity: they can make claims about cause and effect with a confidence other economists can only dream of.

And yet, there was always a discordant note in the RCT literature. The term itself, as well as its epistemic claims, are straightforwardly borrowed from medical research. But in medical research it’s not enough for a trial to be randomized and controlled. It generally has to be double-blind, as well, to aspire to “gold standard” status.

Of course, in most development settings, there’s no viable way to make trials double-blind: presumably, if you only pretend to give that Kenyan farmer a way to save for fertilizer, but you don’t really, the guy’s going to notice.

So the “double-blind” bit of the medical analogy is quietly dropped and swept under the epistemic rug, in hopes nobody will notice…or at least it was, until Bulte and his collaborators realized seeds have some interesting things in common with pills, and so if you can double-blind a medicine trial, you can probably double-blind a seed trial, too.

So they deviously ran an open RCT comparing traditional and improved cowpea seeds alongside a double-blind RCT testing the same thing. Their results are deeply troubling for Banerjee-and-Dufloites.

In the open RCT, Tanzanian cowpea farmers who knew they were getting improved seed easily outperformed farmers who knew they were getting traditional seed. But in the double-blind study, farmers who weren’t told whether their seed was improved performed just as well whether it was improved or traditional.

In fact, farmers who used traditional seed without knowing it did just as well as farmers who used improved seed, whether they knew it or not. Only farmers who knew the seed they were given wasn’t improved lagged behind in productivity.

This gap between the results of the open and the double-blind RCTs raises deeply troubling questions for the whole field. If, as Bulte et al. surmise, virtually the entire performance boost arises from knowing you’re participating in a trial, believing you may be using a better input, and working harder as a result, then all kinds of RCT results we’ve taken as valid come to look very shaky indeed.
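To see how an effort response alone can manufacture a “treatment effect” in an open trial, here’s a toy simulation. This is not the authors’ model, and every number in it (the baseline yield, the noise, the `slack_penalty`) is invented purely for illustration; it just encodes the mechanism Bulte et al. surmise — farmers who learn they got traditional seed slack off, while blinded farmers work equally hard in both arms:

```python
import random
import statistics

random.seed(42)

def simulate_yields(n, improved, open_trial,
                    true_gain=0.0, slack_penalty=0.5):
    """Toy model of one trial arm.

    true_gain     -- agronomic effect of improved seed (set to ~0 here,
                     following the paper's surmise that behaviour, not
                     the seed, drives the observed gap)
    slack_penalty -- how much output a farmer loses after learning she
                     got traditional seed (a made-up magnitude)
    """
    ys = []
    for _ in range(n):
        y = 1.0 + random.gauss(0.0, 0.1)   # baseline yield plus noise
        if improved:
            y += true_gain
        # In the open trial, farmers told "traditional" work less hard;
        # in the double-blind trial nobody knows, so effort is equal.
        if open_trial and not improved:
            y -= slack_penalty
        ys.append(y)
    return ys

n = 2000
open_effect = (statistics.mean(simulate_yields(n, True, True))
               - statistics.mean(simulate_yields(n, False, True)))
blind_effect = (statistics.mean(simulate_yields(n, True, False))
                - statistics.mean(simulate_yields(n, False, False)))

print(f"open RCT 'treatment effect':   {open_effect:.2f}")
print(f"double-blind treatment effect: {blind_effect:.2f}")
```

Even with a seed that does literally nothing, the open trial reports a sizeable “effect”, while the double-blind comparison correctly finds roughly zero: exactly the gap between the two arms of the Tanzania study.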

Of course, this is just one study, and so until it’s replicated we’d probably do best not to get too excited. The pre-publication version of the paper came under some heavy fire, which was perhaps to be expected.

It does strike me as problematic that, while the study was done on cowpea seeds, cowpeas were a secondary crop for all farmers involved, and the productivity gap between improved and traditional cowpea seeds isn’t comparable to the gap in maize or wheat. Replication is badly needed here, and preferably in a study dealing with a crop more central to farmers’ livelihood strategies.

Still, the study is an instant landmark: a gauntlet thrown down in front of the large and growing RCT-Industrial Complex. At the very least, it casts serious doubt on the automatic presumption of internal validity that has long attached to open RCTs. And without that presumption, what’s left, really?

12 thoughts on “A Torpedo Aimed Straight at H.M.S. Randomista”

  1. This proves the huff factor, and that research money on improved seed has been a waste of time. Millions spent at CIMMYT, just down the drain. The full title of the paper, and the key to it, is the last part: ‘and the impact of new agricultural technologies’. Is the agenda about new technologies?

    Were they trying to prove that the improved seed is not improved at all, rather than anything else? It would seem it proves that all that money on research is a waste of money. I remember they did a trial on CF (or someone from Wageningen) using cotton, not maize, which seemed a bit odd. Here they are using cowpeas, which seems very odd as well. Maybe they need a double-blind experiment on corn.

    Or were they trying to prove something about people? The behavioural side seems clear, as it seems truly random, but what good is that going to achieve in production? It just says that if people know they have bad seed (which is actually not bad seed at all, but as good as improved seed) then they won’t work, but if they knew it was just as good as improved then they would work. So, some went off in a huff because they thought they were getting the raw end of the trial.

    Anyway, it is called anthropology: some people go off in a huff. People do not all act the same in the same circumstances. Development has no anthropological basis. It tries to think everybody is the same.

  2. Wow, good for them for doing the double-blind experiment. It always did seem strange that Randomistas would claim that RCTs were just like clinical medical trials when they left out one of the most important aspects of the medical trial.

    I guess Randomistas could argue that in a sense when you do an intervention on one group of people and not another, it is at least partially blind (you don’t tell the control group that that other village is getting free vaccines or whatever it is).

    Anyway, I agree with you. Need to see a staple crop replicated like this. Initial hypothesis might be that people wouldn’t work less hard if it were their main source of food.

    1. Yeah, Twitter is being absurd about this. The blog is hosted by WordPress.com, so I’m 100.00% sure there’s no malware on it. I’ve filed a bunch of tickets, but what can I say? Twitter is no democracy!

  3. Wait, you wrote, “just because poor farmers in Kenya react in a certain way to, say, a given incentive to use fertilizer, that’s no reason to suppose that farmers in Guatemala, or Nepal or anywhere else will.” and then went on to say that this one study should influence how we think about other studies in other places on completely different kinds of interventions? In the same post?

  4. Okay, so, let me get this right… The internal validity of this study is not strong enough to draw the conclusion that the internal validity of non-double-blind RCTs is bankrupt.
    And in order to say that this is something that has relevance for other RCTs you must accept that this RCT in particular (and RCTs as a class) can have external validity.
    I love this paper and this article if only for the interesting tautologies it raises.

  5. Wow, what a vituperative post based on a poorly written paper. See Berk’s excellent post on details, such as, uhm, not counting farmers who didn’t harvest any cowpea, who, when counted (as they should, because they were treated!!), uhm, like completely nullifies their “result”. This feels like one of those memes: you CANNOT claim to prove randomistas wrong if you use bad econometrics. =)

    The main point seems to be that seeds and effort/land quality can be complements. Indeed, it makes sense that knowing for sure that the seed is high quality would make someone put in more effort. As Berk says, this is what’s relevant in the real world (we want to know if something is effective). Double-blind experiments in medicine study whether a drug is efficacious (holding everything else, including beliefs, fixed), and this is important but not the focus of most RCTs. Indeed, in the context of many studies efficacy is either already proven (deworming) or impossible to define (information campaign).

    Come to think about it, I’d love to see a double blind RCT on information campaigns… =)

    1. Not so fast. We need to think hard about why/when/whether double-blind (DB) is useful. Otherwise it’s just a buzzword.

      In medicine, we first need to know if a medical treatment works, *holding everything else fixed*. DB helps with that. Yay. It is also very important to know if the treatment works in the real world, when people may not take pills, or exercise less, etc., based on what they know about their medical treatment. DB does NOT help with that.

      In social sciences we are primarily (not exclusively!) interested in the total, real world, effect.
      For example: seat belts (or ABS brakes). We know from controlled studies that these technologies “work” and may save lives. In the real world, however, people may drive more dangerously when they know they have these safety measures, potentially eliminating the safety gain.
      So what’s the appropriate experiment? DB, or an open RCT, where you randomly assign seat belts to people’s cars (and they know it)? The latter design is essential to measure what would happen if seat belts became mandatory.

  6. Great to have found this site.

    I find the idea of “internal validity” laughable. It only exists in the social sciences, and only in those that have taken the positivistic road in order to appear more like the natural sciences. In so doing, they fail. “Internal validity” means “we have followed all our protocols, so the result should be reliable”, whereas in the natural sciences “validity” means just one thing: survives being smashed repeatedly against the hard rock of (external) reality.

    No doubt the cold fusionistas could have claimed “internal validity” for their results, if they’d known such a scam existed.

  7. I know I’m late to this party, but this study gets to the heart of current debates among at least some development scholars about RCTs.

    It’s fairly well known that there is a ‘project effect’ with development interventions. The simple fact that people know they are part of a project (or even better, a ‘pilot’ or ‘beacon’ project, like a Millennium Village) can affect behaviour. So I’m not that surprised by Bulte et al’s finding: knowing you’re part of a trial affects behaviour. This doesn’t obviate the potential value of new cowpea varieties, but it certainly makes it harder to draw simplistic conclusions. Decent impact evaluation needs to take that into account, and move beyond RCTs as the sole measure of impact. For instance, a current issue of ‘European J of Development Research’ (vol. 26, iss 1) has a whole set of papers questioning the rise of RCTs as some sort of ‘gold standard’.

    I have a slight interest here, through a study looking at the impacts of ‘conservation-linked payments’ to communities that bordered a National Park in Rwanda (Nyungwe). We monitored human impacts inside the Park beside communities receiving payments, as well as adjacent ‘control’ communities that did not get payments. Human impacts declined in BOTH treatment and control communities, with the simple fact that they were being observed in a trial a major reason for this. Obviously, this was not a ‘pure’ study, in that it was not blinded, but it echoes Bulte et al’s wider point that studies of people in the real world cannot simply isolate out ‘factors’: context affects behaviour in lots of interesting ways…
