Saturday 31 May 2014

Replication and methods sections


Wow, things got a bit shouty there, didn’t they? If you’re not up to speed: some people like replication, others don’t, and they don’t seem to get along very well. In an effort to drain all the interest from this topic, I thought it best to write a boring post about methods sections.

First things first: I like replication (who doesn’t!). I haven’t been involved in attempting direct replications of other people’s work (at least not until very recently), but I replicate my own findings as much as possible to persuade myself of the validity of my results. I’m broadly supportive of the recent ‘replication movement’, although I’m not sure how novel an idea it really is, given that plenty of areas in psychology have been replicating for many a year.

Here is the small point I want to make, and it relates to the issue of being able to replicate purely on the basis of the methods section of a paper from another lab. The pro-replicators often state that if methods sections are written appropriately, anyone should be able to perform a replication of the study. However, this strong statement seems to me a bit naïve. I have two reasons for thinking this:

1.  A good methods section should include ‘all the necessary details in order to perform the experiment again’; however, it should also exclude ‘any extraneous detail’. Without the latter, the methods section could read something along the lines of:

“Procedure – each participant was welcomed in the lobby of the ground floor of 17 Queen Square, London, UK. The experimenter shook their right hand and encouraged them to enter the lift. The experimenter pressed the button for the 2nd floor. Doors closed within 10s of this button press… The participant was asked to take a seat in front of the computer screen with both feet firmly on the floor such that their upper and lower legs formed a right angle with each other.”

You’re probably reading this thinking it’s absurd, and I agree. My (small) point is that it can sometimes be difficult to decide what is ‘necessary detail’ and what is ‘extraneous’. We make a judgement call based on our experience and knowledge of the literature. This will always be the case. The issue is that one person’s idea of ‘extraneous’ will sometimes differ from another’s – I wouldn’t dream of stating that the participant held a hot beverage before starting the experiment, but some think this is obviously relevant in certain situations.

2.  A well-written methods section doesn’t allow ‘anyone’ to replicate the experiment. If I gave a well-written methods section from a psychology journal to a historian of art, I wouldn’t expect them to be able to replicate the experiment (or, if they managed, they wouldn’t do it very well). This argument applies, to a lesser extent, within sub-divisions of psychology as well – I would expect a cognitive psychologist who studies memory to be able to replicate my experiments more thoroughly than a social psychologist could (and vice versa). If this weren’t the case, why would we think it useful to give students experience in running experiments, if not to ‘learn the technique’?

This isn’t to say that we can’t, and shouldn’t be able to, replicate based purely on the methods section of someone’s paper. It’s simply to say that writing a methods section is hard and requires some subjectivity about what to include. Also, replication is hard – it’s impossible to replicate exactly – so judgement calls have to be made about whether subtle differences between the original and the replication attempt matter.

Given all this, I heartily recommend talking to each other more (preferably in a civil manner). If I wanted to replicate someone’s experiment, I would email them and ask them as many questions as possible. They don’t have a right to be involved, or contribute, or have a say in the experiment I am running, but it would be foolish for me not to want to communicate with them. Equally, I would be honoured if someone thought my experiment was worthy of replication. I would of course be nervous – what if it doesn’t replicate!? – and worried – did I make a mistake previously!? – but all these things are natural consequences of being a human being with a vested interest in my own research. I hope I’d be grown-up enough to deal with those anxieties. Time will tell…


6 comments:

  1. Perfectly true.

    The idea that a methods section "must" be complete, especially for those outside the subspecialty, is foolish.

    Lots of purism. In fact, sometimes being "critical" gets as irrational and as ideological as other tribalisms.

  2. Aha! Finally, the debate becomes vaguely relevant to my day job. Thanks for raising the methods issues, Aidan. For me, this is the crux. Let's assume that we can, indeed, get accurate methods from the original paper or from questioning the authors. This puts us in a position to attempt a replication.

    Now, to me, the point isn't replication per se but replication of a finding, or "an effect" as you psych types refer to it, I believe. The point - and do please correct me if I am waaaay off piste - is to see if this effect is genuine, robust, applicable across a population, etc. In other words, this effect should obey certain rules and permit certain predictions to be tested, otherwise it's rather like observing one specific pattern in the clouds at one specific moment in time.

    So now let's look at the methods of the initial paper reporting the effect. Let's move the example to an fMRI experiment because that's my area. It is trivially easy for an fMRI experiment to make a false finding. The reasons for this are numerous, but one doozie is to use a sub-optimal acquisition. Let's say that in the original experiment the authors used a TE that is expected to yield poor signal in their primary area of interest, rendering that particular area of the brain susceptible (sic) to serious Type I *and* Type II errors unless very careful stats are used in the analysis.
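
    To put a rough number on that TE point, here's a toy sketch in Python (illustrative values only, not from any actual study): to first order, gradient-echo BOLD sensitivity goes as TE * exp(-TE/T2*), which peaks when the TE matches the local T2*, so a TE chosen for 'typical' cortex can give up a big chunk of sensitivity in a short-T2* region.

        # Toy sketch (illustrative numbers, not from any actual study):
        # to first order, gradient-echo BOLD sensitivity ~ TE * exp(-TE/T2*),
        # which is maximal when TE = T2*.
        import numpy as np

        def bold_sensitivity(te_ms, t2star_ms):
            """Relative BOLD contrast-to-noise, up to a constant scale factor."""
            return te_ms * np.exp(-te_ms / t2star_ms)

        te = 45.0  # a TE picked for 'typical' cortex, in ms
        for region, t2star in [("typical cortex", 50.0),
                               ("short-T2* region, e.g. orbitofrontal", 15.0)]:
            frac = bold_sensitivity(te, t2star) / bold_sensitivity(t2star, t2star)
            print(f"{region}: TE={te:.0f} ms gives {100 * frac:.0f}% of peak "
                  f"sensitivity (peak at TE={t2star:.0f} ms)")

    Run that and the 'typical cortex' line comes out near 100% while the short-T2* region sits at roughly 40% of its peak sensitivity – exactly the situation where sloppy stats will happily manufacture or bury an effect.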

    So now we have a putative effect reported for this brain region; call it BA 69 (in honor of watching too much porn). I read the original paper and I immediately have a methodological concern with the TE. I set about trying to determine whether BA 69 really exists, based on watching too much porn. I have a choice here, clearly. I either choose to replicate (via facsimile) the methods and attempt to show through statistics that the original finding was due to chance, arising out of the crappy signal in that brain region. Or - and this is the critical point - I elect to reduce the TE, perhaps also making other small changes to the acquisition in order to get the best signal from putative BA 69 that I can. Then I run my "replication" in an attempt to reproduce the putative effect, not to reproduce the original experiment. I fail miserably, although many of the subjects seem to enjoy the experiment and volunteer eagerly for the next study.

    I submit my failed replication attempt to a journal and it is duly published. What, dear psychologists, would be the response to this "failure to replicate?" Is this a different study entirely because I failed to copy the flawed original methods? Surely we're not suggesting that methods should be repeated verbatim if better approaches become apparent.

    Replies
    1. I think this relates to why you want to replicate a specific experiment.

      If the experiment in question is interesting theoretically (say it suggests a function for BA69 not previously suggested), then I don't see a problem with using a different TE – on the proviso that you have independent, objective data that your parameters increase the signal-to-noise in this region. I think this would be similar to an RT experiment where you replicate but use different software that you know measures RTs more accurately than the software used by the original researchers. You are just measuring your dependent variable more accurately than before.

      Another possibility, though, is that your interest is clinical. Say BOLD response correlations in BA69 with a particular task might be useful as a marker for disease onset. Here, if the parameters used by the original researchers were 'off the shelf' and not bespoke, you might want to know if the effect replicates with this vanilla setup. For instance, if you knew the 'off the shelf' setup was widely available in hospitals but your different setup wasn't, you would obviously want to ensure the effect replicates despite the methodological limitations inherent in the first experiment.

      Perhaps purists might disagree with my first contention and say that you should try to replicate as closely as humanly possible, but as I said in the original post, I don't think it is ever possible to perform an exact replication. Why would you not use a setup that you had independent evidence improved your signal-to-noise when measuring your dependent variable?

    2. Like Aidan says, I'd have thought it depends. Suppose that what you're trying to do is (strongly) demonstrate that the original finding was due to the TE. You'd probably first have to show that the effect replicates with the original TE, and then that it disappears with the new, better TE (assuming there's a strong case that it is, in fact, better). On the other hand, if you're not suggesting that the TE caused the effect, merely that it made it more likely that any effect was spurious, resolving the dependent measure with higher accuracy shouldn't be an issue. In any case, there are always potential differences across studies that people are quite happy to assume don't matter: nobody complains when you run an experiment on a different computer to the original. And it's clearly not a different study entirely, since all you've changed is the precision with which you measured the dependent variable.
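
      As a quick sanity check on that last point, here's a toy simulation (made-up numbers, plain one-sample t-tests, nothing fMRI-specific): when there's a real effect, a more precise measurement buys you power; when there's no effect, the false-positive rate stays at the nominal 5% regardless of the measurement noise.

          # Toy simulation: sharper measurement of the dependent variable
          # raises power for a true effect, but leaves the false-positive
          # rate at the nominal level under the null.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(0)
          n_subj, n_sims, alpha = 20, 5000, 0.05

          def rejection_rate(true_effect, meas_noise_sd):
              hits = 0
              for _ in range(n_sims):
                  # score = effect + between-subject variability + measurement noise
                  scores = (true_effect
                            + rng.normal(0.0, 1.0, n_subj)
                            + rng.normal(0.0, meas_noise_sd, n_subj))
                  hits += stats.ttest_1samp(scores, 0.0).pvalue < alpha
              return hits / n_sims

          for sd in (2.0, 0.5):  # noisy vs precise measurement
              print(f"measurement sd={sd}: power={rejection_rate(0.8, sd):.2f}, "
                    f"false-positive rate={rejection_rate(0.0, sd):.2f}")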

      I'm wondering, though, if this is somewhat different from a case where, for example, a rating scale is changed, because the responses of the participants may be somewhat dependent on the rating scale itself. I think it's entirely reasonable to assume that measuring RT or BOLD signal more accurately, for example, won't change the behaviour of the participants. But altering the items on a questionnaire *may* do: the values that participants report will be somewhat dependent on the values that are available on the scale, and the underlying constructs which are measured will be somewhat dependent on the wording of the items.

  3. Practical fMRI,

    Replication is not about doing a better study. It's about checking whether the original result was valid.

    Many "improvements" are changes which one cannot guarantee are 100% better. Even with "better" scanning some signals might suffer (depending on the method and stats etc.)


    Some changes are beyond doubt, like having more subjects…


    The thing that is lost on many replication enthusiasts is that once you hypothesize that you have "improved" the study, its validity *as a replication* has polemic value only, not scientific value.

    What is the point of running a replication if the results are not clear? It gets us into the hand-waving business.

    Replies
    1. "Replication is not about doing a better study. It's about checking whether the original result was valid."

      But these can be complementary. Let's suppose that the original study used a flawed control. Are you suggesting that the same flawed control be used in any replication attempts? And further, that if the results are the same then the conclusions drawn should be the same?

      "What is the point of making a replication if results are not clear?"

      To which I would respond: what is the point of doing a replication if the original experiment was flawed?
