Friday, July 16, 2021

Welcome skepticism... Neither prediction markets nor surveys performed well in predicting outcomes for DARPA's Next Generation Social Science programme

Using prediction markets to predict the outcomes in the Defense Advanced Research Projects Agency's next-generation social science programme. Domenico Viganola, Grant Buckles, Yiling Chen, Pablo Diego-Rosell, Magnus Johannesson, Brian A. Nosek, Thomas Pfeiffer, Adam Siegel and Anna Dreber. Royal Society Open Science, July 14, 2021. https://doi.org/10.1098/rsos.181308

Abstract: There is evidence that prediction markets are useful tools to aggregate information on researchers' beliefs about scientific results including the outcome of replications. In this study, we use prediction markets to forecast the results of novel experimental designs that test established theories. We set up prediction markets for hypotheses tested in the Defense Advanced Research Projects Agency's (DARPA) Next Generation Social Science (NGS2) programme. Researchers were invited to bet on whether 22 hypotheses would be supported or not. We define support as a test result in the same direction as hypothesized, with a Bayes factor of at least 10 (i.e. a likelihood of the observed data being consistent with the tested hypothesis that is at least 10 times greater compared with the null hypothesis). In addition to betting on this binary outcome, we asked participants to bet on the expected effect size (in Cohen's d) for each hypothesis. Our goal was to recruit at least 50 participants who signed up to participate in these markets. While this was the case, only 39 participants ended up actually trading. Participants also completed a survey on both the binary result and the effect size. We find that neither prediction markets nor surveys performed well in predicting outcomes for NGS2.
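To make the support criterion concrete: a hypothesis counts as "supported" only when the observed effect goes in the hypothesized direction and the Bayes factor favoring the alternative over the null is at least 10. The paper's actual Bayesian analyses are more elaborate than this; the sketch below uses the rough BIC approximation to the Bayes factor for a simple two-group mean comparison, and all function names and data here are illustrative, not taken from the study.

```python
import numpy as np

def gaussian_log_lik(x, mu, sigma):
    """Log-likelihood of data x under Normal(mu, sigma)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (x - mu) ** 2 / (2 * sigma ** 2))

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * np.log(n_obs) - 2 * log_lik

def supported(group_a, group_b, bf_threshold=10.0):
    """Check a hypothetical hypothesis 'mean(A) > mean(B)': supported
    iff the effect is in the hypothesized direction AND the approximate
    Bayes factor BF10 is at least bf_threshold (10 in the paper)."""
    x = np.concatenate([group_a, group_b])
    n = len(x)
    # Null model: one common mean and one sd (2 free parameters)
    bic0 = bic(gaussian_log_lik(x, x.mean(), x.std()), 2, n)
    # Alternative model: separate means, pooled sd (3 free parameters)
    resid = np.concatenate([group_a - group_a.mean(),
                            group_b - group_b.mean()])
    s = resid.std()
    ll1 = (gaussian_log_lik(group_a, group_a.mean(), s)
           + gaussian_log_lik(group_b, group_b.mean(), s))
    bic1 = bic(ll1, 3, n)
    # BIC approximation to the Bayes factor (Wagenmakers-style)
    bf10 = np.exp((bic0 - bic1) / 2)
    d = (group_a.mean() - group_b.mean()) / s  # Cohen's d, pooled sd
    return bool(d > 0 and bf10 >= bf_threshold)
```

For example, two simulated groups separated by one pooled standard deviation (Cohen's d ≈ 1) easily clear the threshold, while the same data with the groups swapped fail the direction check regardless of the Bayes factor.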

4. Discussion

In this project, we find little evidence that researchers can predict the outcomes of the hypotheses tested in NGS2. Whether this is due to the relatively small sample of hypotheses (N = 22), of participants (N = 39), or to the type of hypotheses tested is unclear. Here, unlike in most previous work, participants predicted the results of Bayesian analyses; whether this contributed to the poor performance of the markets and surveys is also unclear. Another important difference from previous prediction-market studies on direct replications is that the original study's p-value is a strong predictor of replication outcomes, whereas no such information is available, by definition, for predicting the NGS2 outcomes, making this a more challenging task for forecasters. Given the previously observed success of experts in predicting novel outcomes with forecasting surveys (e.g. [20]), it may be that prediction markets function better than forecasting surveys only for replication outcomes; more work on this topic is needed for more definitive conclusions.
