An evaluation of reproducibility and errors in published sample size calculations performed using G*Power

Robert Thibault*, Emmanuel Zavalis, Mario Malicki, Hugo Pedder

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal)

Abstract

Background. Published studies in the life and health sciences often employ sample sizes that are too small to detect realistic effect sizes. This shortcoming increases the rate of false positives and false negatives, giving rise to a potentially misleading scientific record. To address this shortcoming, many researchers now use point-and-click software to run sample size calculations.

Objective. We aimed to (1) estimate how many published articles report using the G*Power sample size calculation software; (2) assess whether these calculations are reproducible; (3) assess whether they are error-free; and (4) assess how often these calculations use G*Power's default option for mixed-design ANOVAs, which can be misleading and output sample sizes that are too small for a researcher's intended purpose.

Method. We randomly sampled open access articles from PubMed Central published between 2017 and 2022 and used a coding form to manually assess 95 sample size calculations for reproducibility and errors.

Results. We estimate that more than 48,000 articles published between 2017 and 2022 and indexed in PubMed Central or PubMed report using G*Power (i.e., 0.65% [95% CI: 0.62% to 0.67%] of articles). We could reproduce 2% (2/95) of the sample size calculations without making any assumptions, and likely reproduce another 28% (27/95) after making assumptions. Many calculations were not reported transparently enough to assess whether an error was present (75%; 71/95) or whether the sample size calculation was for a statistical test that appeared in the results section of the publication (48%; 46/95). Few articles that performed a calculation for a mixed-design ANOVA unambiguously selected the non-default option (8%; 3/36).

Conclusion. Published sample size calculations that use G*Power are not transparently reported and may not be well-informed. Given the popularity of software packages like G*Power, they present an intervention point to increase the prevalence of informative sample size calculations.
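To illustrate the kind of a-priori calculation G*Power performs, here is a minimal sketch (an assumption for illustration, not code from the article) for a two-sided independent-samples t-test. It uses the normal approximation to the noncentral t-distribution, which yields a slightly smaller n than G*Power's exact computation (63 vs. 64 per group for the inputs below).

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """A-priori sample size per group for a two-sided independent-samples
    t-test, using the normal approximation to the noncentral t."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# Medium effect (Cohen's d = 0.5), alpha = .05, power = .80
print(n_per_group(0.5))  # → 63 (G*Power's exact noncentral-t calculation gives 64)
```

The gap between the approximation and G*Power's exact answer is itself a reminder of why the article asks whether published calculations report their inputs transparently: without the effect size, alpha, power, and test type, the output n cannot be reproduced.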
Original language: English
Journal: medRxiv
DOIs
Publication status: Published - Jul 2024

