Conference Agenda

Session: Open science
Time: Wednesday, 01/Oct/2025, 1:30pm - 3:00pm

Session Chair: Aaron Peikert
Location: Raum L 115


Presentations

Prediction is All You Need to Demand Open and Transparent Science

Aaron Peikert

Max Planck Institute for Human Development, Germany

Prediction is the foundation of the empirical sciences. This perspective offers radical simplicity and insight into the replication and theory crises—as well as a path forward. By tasking open science with refining and assessing prediction, we can draw on information theory as a rigorous mathematical framework to recast its terminology. Such rigorous, information-theoretic analysis cuts through vague, inconsistent, and at times confused interpretations of the mechanics of open science. For example, the exploration/confirmation dichotomy—where exploration is often seen as out of scope for open science—can be reinterpreted as a continuum that aligns naturally with open science practices. From this perspective, computational reproducibility and preregistration emerge not merely as patches to the replication crisis, but as indispensable tools for psychological research. Equipped with a clearer conceptual foundation, researchers are better positioned to apply open science practices effectively across a wider range of research contexts.
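For illustration only (not part of the talk), a minimal Python sketch of one standard way information theory scores predictions: the logarithmic score, i.e. the surprisal of the observed outcome under a probabilistic prediction. The probabilities and outcomes below are hypothetical.

import math

def log_score(predicted_prob: float, outcome: int) -> float:
    # Logarithmic (information-theoretic) score of a probabilistic
    # prediction for a binary outcome: the surprisal, in nats, of what
    # actually happened. Lower is better.
    p = predicted_prob if outcome == 1 else 1.0 - predicted_prob
    return -math.log(p)

# Hypothetical example: a prediction assigns probability 0.8 to a
# successful replication.
print(log_score(0.8, 1))  # about 0.22 nats of surprise if it succeeds
print(log_score(0.8, 0))  # about 1.61 nats if it fails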



Must have my walking stick 'cause it may rain: Exploring and exploiting rating scale restrictions via Bernoulli-formalization

Jens Hendrik Fünderich

LMU München, Germany

Rating scales come with quite peculiar restrictions: They have a lower limit, an upper limit, and only consist of a few integers. For variables collected on a rating scale, these characteristics produce a dependency between means and standard deviations. A mean score equal to the rating scale's maximum or minimum can only occur if there is exactly zero variability in responses. Similarly, zero variability is only found at integer means, and a non-integer mean inevitably comes with non-zero variability. The largest amount of variability can only be found at a mean in the middle of the scale. Forensic meta-science utilizes these restrictions, sometimes represented as umbrella plots, to detect reporting errors. But the restrictions and their implications have not been discussed outside of that field, despite the ubiquitous use of rating scales throughout psychological research. By rescaling moments of the Bernoulli distribution, we provide a formalization of the rating scale restrictions with which we illustrate the implications for power analyses and meta-analytical heterogeneity. Using openly available data from the Many Labs projects, I will demonstrate how the Bernoulli-formalization may be used, for example, for a priori power analyses and error detection. Further, I present re-analyses of multi-lab data that explore these implications in a multi-item context.
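For illustration only (not taken from the talk), a minimal Python sketch of the mean-SD dependency that such a Bernoulli rescaling formalizes: on an integer scale with limits lo and hi, the largest possible variance at a given mean is (mean - lo) * (hi - mean), attained by placing all responses on the two endpoints. The 7-point default scale and the function name max_sd are assumptions.

import math

def max_sd(mean: float, lo: int = 1, hi: int = 7) -> float:
    # Largest possible standard deviation of responses on an integer rating
    # scale [lo, hi] with the given mean. Rescaling a Bernoulli variable onto
    # the endpoints with p = (mean - lo) / (hi - lo) gives variance
    # p * (1 - p) * (hi - lo)**2 = (mean - lo) * (hi - mean).
    if not lo <= mean <= hi:
        raise ValueError("mean must lie within the scale limits")
    return math.sqrt((mean - lo) * (hi - mean))

# Variability is forced to zero at the scale limits and is largest
# at the midpoint of a hypothetical 7-point scale.
for m in (1.0, 2.5, 4.0, 7.0):
    print(f"mean {m}: largest possible SD {max_sd(m):.3f}")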



Simulation studies for methodological research: Status quo, problems, and potential solutions

Björn S. Siepe1, František Bartoš2, Samuel Pawel3

1University of Marburg, Germany; 2Department of Psychological Methods, University of Amsterdam; 3Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich

Simulation studies are an essential tool for methodological research in psychology. However, their quality can vary widely in terms of design, execution, and reporting. In this talk, we will present several meta-scientific projects on simulation studies. First, we will present the results of two literature reviews on current practices in simulation studies in psychology and other methodological fields. These indicate clear room for improvement, as the reporting of simulation studies often lacks crucial information and nuance. Based on these findings, we will suggest possible ways forward. The first is the idea of standardized protocols for simulation studies that also allow for their preregistration. As simulation research differs from other empirical research, we will discuss the potential and pitfalls of preregistration for simulation studies. As a second suggestion, we will outline the creation of harmonized synthetic benchmarking suites for simulation research. Inspired by benchmarking in machine learning, this idea involves the creation of a standardized and impartial set of simulated data across a range of conditions that can be used to compare statistical methods. In this way, knowledge creation in methodological research can become more cumulative, and the strengths and weaknesses of different methods can be understood in greater detail and nuance.
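For illustration only (not taken from the talk), a minimal Python sketch of the kind of fully crossed condition grid that a standardized simulation protocol or benchmarking suite might fix in advance; the factor names and levels are hypothetical.

from itertools import product

# Hypothetical simulation factors; every combination defines one condition
# under which competing statistical methods would be run and compared.
factors = {
    "n_per_group": [20, 50, 200],
    "effect_size": [0.0, 0.2, 0.5],
    "distribution": ["normal", "skewed"],
}

conditions = [dict(zip(factors, values)) for values in product(*factors.values())]

print(len(conditions), "conditions")  # 3 * 3 * 2 = 18
print(conditions[0])  # {'n_per_group': 20, 'effect_size': 0.0, 'distribution': 'normal'}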



Sequential Designs for Efficient Replications: A Systematic Evaluation

Dennis Kondzic1, Robert Miller2, Steffi Pohl1

1Freie Universität Berlin, Germany; 2Psychologische Hochschule Berlin, Germany

Recent work has emphasized the importance of replication studies. Measures for assessing replication success, such as the equivalence test for effect homogeneity between studies, usually require large sample sizes to ensure adequate power and error control. This limits their practical feasibility, particularly in areas where data collection is costly or resource-intensive.
We investigate how sequential testing procedures can be integrated into the design of replication studies in order to improve efficiency and reduce sample size.
We consider a number of frequentist sequential design methods, such as the Pocock and O'Brien-Fleming procedures, α-spending function approaches, and the sequential probability ratio test (SPRT), as well as Bayesian approaches such as sequential Bayesian analysis and sequential meta-analysis, and combine these with different measures of replication success.
In simulation studies, we evaluate the statistical properties of these procedures under different conditions, with a specific focus on error control and sample size efficiency, and compare the performance of sequential design methods to fixed-sample approaches.
We provide practical recommendations for implementing sequential designs in replication studies and highlight their potential to improve the efficiency of replication research.
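For illustration only (not taken from the talk), a minimal Python sketch of Wald's sequential probability ratio test, one of the frequentist procedures named above, applied to hypothetical normally distributed replication data; the effect size delta, the error rates, and the simulated data are assumptions.

import math
import random

def sprt(observations, delta=0.5, sd=1.0, alpha=0.05, beta=0.1):
    # Wald's SPRT of H0: mean = 0 against H1: mean = delta for independent
    # normal data with known sd. Stops as soon as the cumulative
    # log-likelihood ratio crosses one of Wald's approximate boundaries.
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    llr = 0.0
    for i, x in enumerate(observations, start=1):
        llr += (delta * x - delta**2 / 2) / sd**2  # per-observation log-likelihood ratio
        if llr >= upper:
            return "reject H0", i
        if llr <= lower:
            return "accept H0", i
    return "no decision", len(observations)

# Hypothetical replication data with a true standardized effect of 0.5.
random.seed(1)
data = [random.gauss(0.5, 1.0) for _ in range(400)]
print(sprt(data))  # typically stops after far fewer than 400 observations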