On the Importance of Modeling the Invisible World of Underlying Effect Sizes
Authors
Abstract
The headline findings from the Open Science Collaboration (2015), namely that 36% of original experiments replicated at p < .05 and that the overall replication effect sizes were half as large as the original effects, cannot be meaningfully interpreted without a formal model. A simple model-based approach might ask: what would the state of original science be, and what would replication results show, if original experiments tested true effects half the time (prior odds = 1), true effects had a medium effect size (Cohen’s δ = 0.50), and power to detect true effects was 50%? Assuming no questionable research practices, 91% of p < .05 findings in the original literature would be true positives. However, only 58% of original p < .05 findings would be expected to replicate using the Open Science Collaboration approach, and the replication effects overall would be only ~60% as large as the original effects. A minor variant of this model yields an expected replication rate of only 45%, with overall replication effect sizes dropping by half. If the state of original science is as grim as a non-model-based (i.e., intuitive) interpretation of the Open Science Collaboration data suggests, should it be this easy to largely account for those findings using a model in which 91% of statistically significant findings in the original science literature are true positives? Claims that the findings reported by the Open Science Collaboration indicate a replication crisis should not be based solely on intuition but should instead be accompanied by a specific model that supports that interpretation.
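The 91% figure follows from the stated assumptions by simple arithmetic. The sketch below is illustrative only: the function name `positive_predictive_value` and the two-sided α = .05 threshold are assumptions introduced here, not taken from the text, and the calculation covers only the true-positive rate, not the replication-rate or effect-size estimates.

```python
def positive_predictive_value(prior_odds: float, power: float, alpha: float) -> float:
    """Share of statistically significant findings that are true positives.

    prior_odds: odds that a tested effect is true (1 means half are true).
    power:      probability of p < alpha when the effect is true.
    alpha:      probability of p < alpha when the effect is null (assumed .05).
    """
    prior_true = prior_odds / (1.0 + prior_odds)   # P(effect is true)
    true_positives = prior_true * power            # true and significant
    false_positives = (1.0 - prior_true) * alpha   # null but significant
    return true_positives / (true_positives + false_positives)


# Assumptions from the abstract: prior odds = 1, power = 50%; alpha = .05 assumed.
ppv = positive_predictive_value(prior_odds=1.0, power=0.50, alpha=0.05)
print(f"{ppv:.3f}")  # prints 0.909
```

With half of tested effects true and 50% power, 25% of all experiments are significant true positives and 2.5% are significant false positives, so 0.25 / 0.275 ≈ 0.91 of significant findings are true positives.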