Session
Symposium: Versatility in the Application of Bayesian Statistics

Presentations
Versatility in the Application of Bayesian Statistics

Bayesian statistics has an advantage over frequentist approaches in many use cases, and advances in modern technology continue to make it easier to apply. Combined, these two developments have led to an increase in the use of Bayesian statistics. Nevertheless, Bayesian statistics is still a long way from being the preferred method of analysis. The current symposium aims to show how versatile Bayesian statistics is at the different steps of addressing a research question, and how this versatility can benefit the researcher. First, Robert Miller will present how Bayesian statistics can be leveraged when planning the sample size for a study; in his project, Bayesian statistics is used to incorporate prior knowledge when predicting study success. In the second talk, Tina Braun shows a simple way of conducting Bayesian multilevel modeling, demonstrating how a flat prior can serve as an entry point into Bayesian statistics. The third talk, given by Hannes Diemerling, focuses on more advanced uses of Bayesian statistics, employing it to optimize hyperparameters and to independently validate model accuracies; his aim in this project is eventually to be able to read emotions in everyday life through large language models. Lastly, Timo von Oertzen will focus on Bayesian null hypothesis testing; to provide some balance within the symposium, he will highlight existing issues in this area. Together, we aim to show that while Bayesian statistics comes with its own challenges and problems, it can help researchers address problems across a wide range of fields.

Presentations of the Symposium

Designing for Power: Managing Sample Size Amid Effect Size Uncertainty

Determining the sample size that optimizes the trade-off between study costs (including participant burden) and informativeness is a key challenge in replication studies.
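To make the planning problem concrete, here is a minimal sketch (an editorial illustration, not the talk's code) contrasting conditional power, which assumes the standardized effect size d is known exactly, with an assurance-style calculation that averages power over a prior on d. All numbers (n = 100 per group, d ~ Normal(0.4, 0.1)) are hypothetical.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_power(n, d, z_crit=1.959963984540054):
    """Approximate power of a two-sided, two-sample z-test with n per group,
    assuming the standardized effect size d is known exactly (alpha = 0.05)."""
    return normal_cdf(d * math.sqrt(n / 2.0) - z_crit)

def assurance(n, d_mean, d_sd, draws=20_000, seed=1):
    """Average the power over a prior on the effect size:
    Monte Carlo over d ~ Normal(d_mean, d_sd)."""
    rng = random.Random(seed)
    return sum(conditional_power(n, rng.gauss(d_mean, d_sd))
               for _ in range(draws)) / draws

cp = conditional_power(100, 0.4)             # power if d = 0.4 were certain
asr = assurance(100, d_mean=0.4, d_sd=0.1)   # power averaged over uncertainty in d
```

Because power is concave in d in this region, the assurance comes out below the conditional power at the prior mean: acknowledging effect-size uncertainty gives a more sober estimate of study success.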
A survey among members of the DGPs interest group "Open and Reproducible Research" (IGOR) explored the benefits and challenges of power analyses in this context. Preliminary results suggest that uncertainty arises primarily from the handling of imprecise effect size estimates. A contributing factor is the focus of statistical training on conditional power analysis, which insufficiently addresses this uncertainty. This presentation introduces Assurance as an alternative to traditional power analysis, providing a more robust estimate of study success despite varying effect sizes. Assurance quantifies the probability of detecting a true effect while integrating uncertainty about the effect size, yielding more reliable results. Additionally, the Bayesian Predictive Probability of Success (PPS) is discussed, which offers a probabilistic assessment of replication success by incorporating prior knowledge and updating this probability as new data arrive. PPS allows for flexible, dynamic predictions of study success. Together, these approaches offer new avenues for study planning, enabling better management of uncertainties and more informed decision-making in replication research.

Using Flat Priors as a Simple Means to Conduct Bayesian Multilevel Analysis

Multilevel modeling is becoming increasingly popular in the social sciences. The present study suggests computing Bayesian multilevel models with a flat prior instead of using the still common frequentist approach. The Bayesian approach allows for a straightforward interpretation of the results, while using flat priors means that applied researchers can keep using the same tools they likely already use to compute multilevel models. Additionally, Bayesian multilevel modeling with a flat prior still yields reliable results at small sample sizes, where the frequentist approach already underestimates standard errors.
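The core property behind the flat-prior approach can be shown in miniature (a sketch with made-up data, using a simple normal mean rather than a full multilevel model): with a flat prior, the posterior is proportional to the likelihood, so a grid approximation of the posterior mean coincides with the familiar frequentist estimate.

```python
import math

data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]   # toy observations (hypothetical values)
sigma = 1.0                              # residual SD assumed known, for simplicity

# Grid approximation of the posterior for the mean mu under a flat prior:
# posterior(mu) is proportional to prior(mu) * likelihood(mu) = 1 * likelihood(mu).
grid = [i / 1000.0 for i in range(0, 10_001)]                       # mu in [0, 10]
log_lik = [sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data) for mu in grid]
m = max(log_lik)                                                    # for stability
weights = [math.exp(ll - m) for ll in log_lik]                      # unnormalized posterior
total = sum(weights)
post_mean = sum(mu * w for mu, w in zip(grid, weights)) / total

sample_mean = sum(data) / len(data)      # the frequentist (maximum likelihood) estimate
# post_mean and sample_mean agree up to grid resolution,
# while the posterior itself still supports direct probability statements about mu.
```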
The present article also introduces an accuracy score that estimates whether, and if so how strongly, the estimated a posteriori probability is biased in a given study. For very small sample sizes, biases can occur in the suggested approach; the accuracy score can then be used to correct the a posteriori probability.

Reading Emotions from the Real World: Embeddings over Categories

Traditional approaches to emotion recognition often rely on basic categories such as “joy” or “anger.” While these labels offer simplicity, they are conceptually limited and fail to capture the complexity of genuine emotional experience. In this presentation, we introduce an embedding-based alternative that reframes emotion prediction as a high-dimensional regression problem. To overcome the lack of datasets containing spontaneous emotion, we created a new corpus of 0.5-second video clips from real (non-acted) therapy sessions. Master's students in psychology provided open-text descriptions of the emotions they perceived in these clips, phrasing their insights as if explaining them to a third person. These descriptions were embedded using a fine-tuned sentence transformer (Gbert), yielding dense vector representations of emotional perception. We then trained various neural network designs to predict these embeddings directly from video input, using cosine similarity as the loss function. Our approach bypasses the need for predefined labels, mapping visual information onto a nuanced, continuous emotion space. A key component of our methodology is the use of Bayesian optimization for hyperparameter tuning: instead of grid search, which scales poorly with increasing dimensionality, we employ Gaussian-process surrogate models that efficiently balance exploration and exploitation. We also applied Bayesian updating in the context of independent validation of model accuracies.
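Bayesian updating of a validation accuracy can be sketched with a conjugate Beta-Binomial model (an editorial illustration with hypothetical batch counts, not the project's code): start from a uniform Beta(1, 1) prior on the accuracy and update it with each independent validation batch.

```python
# Hypothetical validation batches: (number of correct predictions, batch size).
batches = [(43, 50), (88, 100), (27, 30)]

a, b = 1.0, 1.0          # Beta(1, 1): uniform prior on the model's accuracy
for correct, n in batches:
    a += correct         # conjugate update: add successes to a
    b += n - correct     # ... and failures to b

post_mean = a / (a + b)                           # posterior mean accuracy
post_var = a * b / ((a + b) ** 2 * (a + b + 1))   # posterior variance
```

Each batch tightens the posterior, so the accuracy estimate comes with an honest measure of its remaining uncertainty instead of a single point value.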
We will conclude this talk with an outlook on harnessing generative large language models to derive full-text emotional interpretations directly from raw video, paving the way toward more human-centered, descriptive emotion modeling.

Why We Should Abandon Tests with Reduced Dimensionality, Even if They are Bayesian

The field of psychology is leaving Null Hypothesis Significance Testing (NHST) behind, and that is a good development. However, Bayesian methods can be used to test null hypotheses as well, and some modern statistics courses describe Bayesian methods in exactly that way. Null Hypothesis Bayesian Testing (NHBT) is in fact very tempting, since it seems to keep the apparent advantage of not having to specify what a 'small' effect is, while still carrying the 'Bayesian' label. In this talk, we will define some criteria that we want inference methods to satisfy, and we will see from some examples that NHBT is not better than NHST, but even worse. We will see that there are better ways to conduct tests, and that, at the end of the day, they may even be easier than NHST.
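One well-known illustration of how point-null Bayesian testing can mislead is Lindley's paradox (an example chosen here by the editor; the talk's own examples may differ). For a normal mean with known sigma = 1, null H0: mu = 0, and alternative H1: mu ~ Normal(0, tau^2), the Bayes factor has a closed form, and the same data can reject H0 by NHST while the Bayes factor favors H0, with the verdict hinging on the arbitrary prior scale tau.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical scenario: large sample, moderately significant z statistic.
n, z, tau2 = 10_000, 2.5, 1.0    # sample size, observed z, prior variance under H1

p_value = 2.0 * (1.0 - normal_cdf(z))    # two-sided NHST p-value: rejects at 0.05

# Closed-form Bayes factor for H0 over H1 (normal mean, known sigma = 1):
# BF01 = sqrt(1 + n*tau2) * exp(-z^2/2 * n*tau2 / (1 + n*tau2))
shrink = n * tau2 / (1.0 + n * tau2)
bf01 = math.sqrt(1.0 + n * tau2) * math.exp(-0.5 * z**2 * shrink)
# Here p_value < 0.05 while bf01 > 1: the two point-null tests disagree,
# and bf01 grows without bound as tau2 increases, for the same data.
```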