Monday, 10 February 2014

In defence of ‘trends’



I don’t have a problem with people reporting ‘trends’. There, I’ve said it, so I can’t take it back. I see a lot of tweets highlighting research papers that report non-significant ‘trends’. For example, someone might write in their results section “our comparison of interest was marginally significant (p=.08)”. So how bad is it to say something like this instead of “our comparison of interest wasn’t significant (p=.08)”? My argument is that it depends on the circumstances and the exact wording. This isn’t to defend dodgy statistical practices, just to add a bit more nuance rather than vilify anyone who reports ‘trends’.

So, when is it definitely bad? When you want an effect to be there, so you bend over backwards to report one-tailed p-values, then describe a ‘marginal’ effect, and finally draw conclusions based on that ‘effect’. This undoubtedly describes a lot of cases and is clearly the wrong way to go about things. Perhaps more contentious, though: when is it OK to report a ‘trend’? Well, I would argue that as long as you are upfront about whether an effect is significant or not, and you are consistent in the way you report it, it is OK to draw attention to the fact that a certain contrast revealed a ‘trend’.
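To make the first move concrete: for a symmetric test statistic, the one-tailed p-value in the predicted direction is half the two-tailed one. A two-tailed p of about .08 therefore becomes about .04. This is a minimal sketch that assumes a z-test for simplicity; a real analysis would usually use a t distribution. The helper names `two_tailed_p` and `one_tailed_p` are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (assumption: a z-test rather than a t-test).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def two_tailed_p(z: float) -> float:
    """P(|Z| >= |z|) under the null hypothesis."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

def one_tailed_p(z: float) -> float:
    """P(Z >= z), i.e. the effect was predicted to be positive."""
    return 1 - STANDARD_NORMAL.cdf(z)

z = 1.75  # an arbitrary illustrative test statistic
print(f"two-tailed p = {two_tailed_p(z):.3f}")  # two-tailed p = 0.080
print(f"one-tailed p = {one_tailed_p(z):.3f}")  # one-tailed p = 0.040
```

The one-tailed value is only legitimate if the direction was specified before seeing the data. Switching to it afterwards is exactly the move described above.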

For example, say I have run 3 experiments. Each experiment has four conditions in a 2x2 factorial design [A1B1 A2B1 A1B2 A2B2]. In all three experiments I see a significant difference between A1 and A2 in condition B1, but not in B2. Great, all three experiments agree with each other! In Experiments 1-2, I also see a significant interaction between these two factors. In other words, the difference between A1 and A2 in condition B1 is greater than the difference between A1 and A2 in condition B2. Great, everyone likes an interaction. However, in Experiment 3, the interaction doesn’t quite reach significance (p=.06). In such a situation I don’t see an issue with saying “the interaction term didn’t reach significance (though we note a trend in line with Experiments 1-2)”*. Some might disagree, but in my view as long as you are upfront about what you are reporting then the reader is at liberty to decide for themselves whether to ‘believe’ your results or not.
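In a within-subjects version of this design, each quantity in the example reduces to a per-participant difference score:

- The effect of A in B1 is A1B1 - A2B1.
- The effect of A in B2 is A1B2 - A2B2.
- The interaction is the difference between those two.

This is a minimal sketch that assumes both factors are within-subjects and that there is one score per participant per cell. Rather than the usual ANOVA F-test, it uses an exact sign-flip permutation test, a swapped-in distribution-free alternative. The function names and the toy numbers are hypothetical and for illustration only.

```python
from itertools import product

# One row per participant, cells ordered (A1B1, A2B1, A1B2, A2B2).
Row = tuple[float, float, float, float]

def contrast_scores(rows: list[Row]) -> dict[str, list[float]]:
    """Per-participant difference scores for a 2x2 within-subjects design."""
    a_in_b1 = [a1b1 - a2b1 for a1b1, a2b1, _, _ in rows]  # simple effect of A in B1
    a_in_b2 = [a1b2 - a2b2 for _, _, a1b2, a2b2 in rows]  # simple effect of A in B2
    a_x_b = [d1 - d2 for d1, d2 in zip(a_in_b1, a_in_b2)]  # interaction contrast
    return {"A in B1": a_in_b1, "A in B2": a_in_b2, "A x B": a_x_b}

def sign_flip_p(scores: list[float]) -> float:
    """Exact two-tailed sign-flip permutation p-value for 'mean score is zero'.

    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    observed = abs(sum(scores)) / n
    hits = 0
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * x for s, x in zip(signs, scores))) / n
        if flipped >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / 2 ** n

toy_rows = [  # made-up numbers, for illustration only
    (5.1, 3.9, 4.2, 4.0), (4.8, 4.1, 4.5, 4.4), (5.5, 4.0, 4.1, 4.3),
    (5.0, 4.2, 4.6, 4.1), (4.9, 3.8, 4.0, 4.2), (5.3, 4.4, 4.4, 4.5),
]
for name, scores in contrast_scores(toy_rows).items():
    print(f"{name}: mean = {sum(scores) / len(scores):.2f}, p = {sign_flip_p(scores):.3f}")
```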
 
If you do disagree, the fact remains that the use of the word ‘trend’ is probably here to stay. So with that in mind, I’ve tried to come up with some suggestions that should hopefully bring clarity if people decide they do want to use the dirty word:

  1. State first and foremost that the effect is not significant – it isn’t, get over it.
  2. State clearly what you think a ‘trend’ is, ahead of time. For example, any two-tailed p-value between .05 and .08.
  3. Apply this criterion to all contrasts, whether or not you predicted they would be significant. If you think ‘trends’ are worth drawing attention to, this applies equally to effects you are less interested in or didn’t predict would be significant.
  4. Don’t draw conclusions from any ‘trends’ unless they are supported by further evidence – as in the example I outlined above.
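Taken together, the suggestions above amount to a fixed labelling rule applied to every contrast. This is a minimal sketch that uses the example window from point 2 (two-tailed p between .05 and .08), set ahead of time. A few choices here are hypothetical and made only for illustration:

- the names `TrendRule` and `report`;
- treating both edges of the window as inclusive;
- the contrast names and p-values in the usage lines.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrendRule:
    """A 'trend' window fixed before looking at the data (point 2)."""
    alpha: float = 0.05        # two-tailed significance threshold
    trend_upper: float = 0.08  # upper edge (inclusive) of the 'trend' window

    def label(self, p: float) -> str:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        if p < self.alpha:
            return "significant"
        if p <= self.trend_upper:
            # Point 1: state that it is not significant before calling it a trend.
            return "not significant (trend)"
        return "not significant"

def report(p_values: dict[str, float], rule: TrendRule) -> dict[str, str]:
    """Point 3: apply the same rule to every contrast, predicted or not."""
    return {name: rule.label(p) for name, p in p_values.items()}

# Hypothetical two-tailed p-values, one per contrast.
labels = report({"A in B1": 0.01, "A in B2": 0.40, "A x B": 0.06}, TrendRule())
print(labels)
# {'A in B1': 'significant', 'A in B2': 'not significant', 'A x B': 'not significant (trend)'}
```

Point 4 is deliberately not encoded. The label only flags a contrast; it does not license a conclusion.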

In this way, all you are doing is drawing attention to particular non-significant p-values in a consistent, unbiased manner. I’m guessing some will disagree with this approach; I’d be interested to hear people’s views either way.

* Actually, in this situation you could run a mixed ANOVA to compare the interaction term across experiments. If you find a significant interaction between A and B that doesn’t further interact with the between-subjects factor ‘Experiment’, then everyone is happy.
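The footnote's check can be computed directly. With two-level within-subject factors, each participant's A×B interaction reduces to one contrast score, (A1B1 - A2B1) - (A1B2 - A2B2). The A×B×Experiment term of the mixed ANOVA is then equivalent to a one-way between-subjects ANOVA on those scores across experiments. This is a minimal sketch that assumes that design and the usual equal-variance ANOVA. The function names are hypothetical, and only F and its degrees of freedom are returned. The p-value would come from an F distribution, for example via a statistics package.

```python
from statistics import mean

Row = tuple[float, float, float, float]  # (A1B1, A2B1, A1B2, A2B2) for one participant

def interaction_contrast(row: Row) -> float:
    """(A1B1 - A2B1) - (A1B2 - A2B2): one participant's A x B effect."""
    a1b1, a2b1, a1b2, a2b2 = row
    return (a1b1 - a2b1) - (a1b2 - a2b2)

def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Equal-variance one-way ANOVA; returns (F, df_between, df_within)."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    if df_within < 1:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def experiment_by_interaction_f(experiments: dict[str, list[Row]]) -> tuple[float, int, int]:
    """Does the A x B interaction contrast differ between experiments?"""
    groups = [[interaction_contrast(row) for row in rows] for rows in experiments.values()]
    return one_way_anova_f(groups)
```

A non-significant F here is consistent with the footnote's "everyone is happy" case. It does not on its own show that the experiments agree.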

3 comments:

  1. I agree, Aidan! Just because we once chose .05 (or .01 or .001) as the cut-off for "significant" doesn't mean .049 is truly much better than .051, right? It's just a boundary you set to make a binary choice (significant vs. non-significant), and I do agree that results close to this boundary are worth mentioning, with the appropriate caution of course (the same would hold for .049 findings, actually). If we're more lenient towards these types of findings, we might also get rid of the data massaging that turns a .051 into a .049 just so it can be called significant! (We know everyone is doing that.)

    1. Thanks Marlieke. I agree that if we're willing to bring attention to a p-value of .51 we should be equally willing to bring attention to .49 as an effect that 'just reached significance'. In general, as long as people are clear and don't obfuscate then I'm happy.

    2. Obviously I meant .051 and .049!
