Are Small Samples in Neuro Reliable? Some Thoughts about Power

One of the first issues that market researchers face when they start to think about neuro-physiological measurement is how to interpret the small sample sizes. Survey research is usually based on samples in the hundreds or even the thousands, but it is not uncommon for User Experience testing to be based on ten (10) people. In fact, some fMRI studies based on just five (5) respondents have been published in peer-reviewed journals.

This raises a very important question: Are such small samples reliable?

Small sample sizes are appropriate if the true effects being estimated are genuinely large enough to be reliably observed in such samples. However, as the Milgram example below shows, small studies are susceptible to inflated effect size estimates, which makes it difficult to be confident in a large effect when small studies are the only source of evidence for it.
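Why do small studies inflate effect sizes? A quick simulation makes the mechanism concrete. The sketch below is illustrative only and rests on assumptions of our own: a true standardized effect of 0.3, 20 respondents per study, normally distributed scores with a known SD of 1, and a simple z test at the usual 5% level. The function name `simulate_studies` is hypothetical. Averaged over every simulated study, the estimate lands close to the true 0.3. Averaged over only the studies that reach significance, it is at least 0.44, because with 20 people only the studies that happen to overshoot can clear the significance bar.

```python
import random
import statistics


def simulate_studies(true_effect=0.3, n=20, n_studies=10_000, seed=1):
    """Simulate many small one-sample studies of the same true effect.

    Illustrative assumptions: each respondent's score is normal with SD 1,
    the SD is treated as known, and a study counts as "significant" when
    its z statistic exceeds 1.96 (two-sided alpha = 0.05, positive side).
    Returns (mean of all estimates, mean of significant estimates,
    share of studies that were significant).
    """
    rng = random.Random(seed)
    se = 1 / n ** 0.5  # standard error of one study's mean
    all_estimates, significant = [], []
    for _ in range(n_studies):
        # One study: the average of n noisy respondent scores
        estimate = statistics.fmean(rng.gauss(true_effect, 1) for _ in range(n))
        all_estimates.append(estimate)
        if estimate / se > 1.96:
            significant.append(estimate)
    return (
        statistics.fmean(all_estimates),
        statistics.fmean(significant),
        len(significant) / n_studies,
    )


all_mean, sig_mean, share_sig = simulate_studies()
print("true effect:                   0.30")
print(f"mean estimate, all studies:    {all_mean:.2f}")
print(f"mean estimate, significant:    {sig_mean:.2f}")
print(f"share of studies significant:  {share_sig:.0%}")
```

If only the "significant" small studies get reported, the published record overstates the effect even though no individual study did anything wrong.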

This has become more of a problem in neuro studies as evidence has mounted that human brains vary more across individuals than originally thought, and that small-sample neuro studies lack the statistical power to produce reliable results. After analyzing existing neuroscience studies, some researchers have estimated that the typical study has only about a 21% chance of detecting a true effect.

Fortunately, there are new tools for dealing with this. In the past, power analyses were uncommon in neuro studies because of the large amount of data involved. fMRI data, for example, requires massive numbers of multiple comparisons across tens of thousands of correlated voxels. But recent advances in power calculation techniques and software are making power analyses much more accessible, even to non-statisticians.
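To see what a basic power calculation looks like, and why multiple comparisons make things harder, here is a rough sketch using only Python's standard library. It assumes a within-subject (paired) comparison with a standardized effect size d of 0.5, which we chose for illustration, and uses the normal approximation rather than an exact t-based calculation. The Bonferroni correction over 50,000 tests is likewise a hypothetical stand-in for a voxel-wise fMRI analysis; dedicated neuroimaging tools use less conservative corrections that account for the correlation between voxels.

```python
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample or paired test.

    Normal approximation: power ~= Phi(d * sqrt(n) - z_{1 - alpha/2}),
    ignoring the tiny chance of rejecting in the wrong direction.
    d is a standardized within-subject effect size (Cohen's d).
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    return std_normal.cdf(d * n ** 0.5 - critical)


# Hypothetical scenario: a moderate effect (d = 0.5).
print(f"n=10, one test:                {approx_power(0.5, 10):.2f}")
print(f"n=40, one test:                {approx_power(0.5, 40):.2f}")
# Same n=10 study, but the test is one of 50,000 voxels, Bonferroni-corrected.
print(f"n=10, Bonferroni over 50,000:  {approx_power(0.5, 10, 0.05 / 50_000):.4f}")
```

Under these assumptions, ten respondents detect a moderate effect only about a third of the time, and a strict correction for tens of thousands of comparisons pushes that close to zero.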

If you are interested in such things and haven’t watched the movie Experimenter, it might be worth your time. Experimenter is about Stanley Milgram’s obedience experiments of the 1960s, some of the most famous studies in psychology. Those experiments reported that 65% of people were willing to administer “painful” shocks to another person when instructed to do so by an administrator, even when the person at the controls received feedback that the person being shocked was in great pain. The movie is interesting for the story it tells about Milgram and his work, but it is relevant here because of what it says about small samples and statistical power.

Australian writer and psychologist Gina Perry has examined Milgram’s original materials at Yale in detail. She reported that rather than being one big experiment (of about a thousand people), it was a number (28) of small experiments, and that the different conditions produced “obedience” rates ranging from 0 to 100%. The commonly reported 65% obedience rate was based on just 40 subjects and was much higher than the mean across all the studies (40%), which is exactly the kind of inflated estimate a single small sample can produce. She also says that within each condition there seems to have been a lot of procedural variation, because the administrator often went off script and some subjects seem to have surmised that the shocks were not real.
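One way to see how much uncertainty sits behind a single 40-person result is to put a confidence interval around it. The sketch below assumes the 65% figure corresponds to 26 of 40 subjects and uses the Wilson score interval, one standard choice for a proportion (our choice, not Perry’s or Milgram’s). The function name `wilson_interval` is hypothetical.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * (p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5 / denom
    return center - half_width, center + half_width


# Assumed: 26 of 40 subjects "obeyed" (65%)
low, high = wilson_interval(26, 40)
print(f"65% obedience, n=40: 95% CI {low:.0%} to {high:.0%}")
```

Under these assumptions the interval runs from roughly 50% to 78%, a band nearly 30 percentage points wide, so a single 40-person condition pins down the “true” rate only loosely.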

Power analysis can help your team feel more confident about the reliability of small samples in neuro work. These samples will be smaller than in attitudinal survey work, but they can’t be too small. In our fEMG work, we generally find that samples of between 30 and 60 people have sufficient power to produce reliable results more than 80% of the time.
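For a rough sense of where a range like 30 to 60 people can come from, the sketch below inverts the usual power formula to find the sample size needed for 80% power at the 5% level. It assumes a within-subject (paired) comparison and uses standardized effect sizes of 0.4 to 0.6 that we picked for illustration; these are not figures from our fEMG studies. It also uses the normal approximation, which tends to understate the exact paired t-test sample size by a couple of people. The function name `required_n` is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(d, power=0.80, alpha=0.05):
    """Approximate respondents needed for a two-sided paired test.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    rounded up. d is an assumed standardized within-subject effect size.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(((z_alpha + z_power) / d) ** 2)


for d in (0.4, 0.5, 0.6):
    print(f"effect size d={d}: about {required_n(d)} respondents")
```

Under these assumptions, effects in the moderate range call for roughly 22 to 50 respondents, and smaller effects push the requirement up quickly, since the needed sample grows with 1/d².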


If you would like to read more about Power Analysis and why small samples undermine reliability in neuroscience, visit:

If you are interested in learning more about Gina Perry’s observations, here is one of many articles that recap aspects of her book:

Experimenter, available through Netflix:

