Conference Agenda
Session: Poster

Presentations
01: A comparative analysis of modeling curvilinear covariate effects on structural IRTree parameters with discrete mixtures versus score-based partitioning
University of Mannheim, Germany
IRTree models improve the validity of trait estimates by accounting for response style (RS) effects. However, traditional IRTrees assume parameter homogeneity across the population, which can bias estimates when respondents weigh traits and RS differently. Two approaches address this issue: mixture modeling and score-based partitioning (SBP). Both capture heterogeneity in response strategies and allow researchers to relate covariates to subgroups to better understand variations in RS and trait use. SBP sorts individual score contributions along a covariate and recursively splits the sample based on statistically significant fluctuations, fitting different measurement models to resulting subgroups. Mixture models, in contrast, identify latent classes with distinct measurement models independently of covariates, although covariates can be linked to classes via logistic regression. The present study compares these two methods specifically for a curvilinear relationship between a covariate and subpopulations. Such effects can, for instance, be expected for response speededness or age. Mixture models risk biased covariate effect estimates due to the inherent assumptions of the logistic function of covariate effects, even though class enumeration remains unaffected, as covariates are not used for class identification. In contrast, SBP can flexibly approximate any form of covariate relationship through multiple splits but may overestimate the number of subgroups. Furthermore, its performance may depend heavily on the strength, type, and measurement quality of the covariates.
Using a comprehensive simulation study, we evaluate both methods regarding their ability to identify true subpopulations and model curvilinear covariate effects, and we discuss the methodological and practical implications of the results.

02: AIC versus BIC for the comparison of polynomial regression models
Universität Münster, Germany
Psychological research often formulates competing constellation hypotheses, that is, differing expectations about how the constellation of two predictors relates to a criterion variable. For example, it has often been hypothesized that people whose self-rated intelligence is similar to their actual intelligence (i.e., who have an accurate self-assessment) should have particularly high well-being (congruence hypothesis). An alternative hypothesis states instead that slightly overestimating one's own intelligence should lead to maximal well-being (optimal overestimation hypothesis). To compare such competing constellation hypotheses empirically, they can be translated into polynomial regression models, which are then fitted to data and compared using information-theoretic criteria. In the context of constellation hypotheses, the AIC has so far mostly been used as the criterion for model comparison. However, there are other criteria that also allow the comparison of (partly non-nested) polynomial regression models. The goal of my research is therefore to compare the commonly used AIC with the BIC, which is an obvious alternative criterion for model comparison. AIC and BIC differ in their theoretical and mathematical motivation, in the formula derived from it, and therefore possibly in their performance when comparing constellation hypotheses.
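As a rough illustration of how the two criteria can disagree, the following Python sketch computes AIC and BIC for two hypothetical regression models with normally distributed errors. The sample size, residual sums of squares, and parameter counts are made-up values chosen for illustration and are not taken from the study; the formulas are the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L.

```python
import math


def gaussian_loglik(rss, n):
    """Maximized log-likelihood of a linear model with normal errors,
    given its residual sum of squares (rss) and the sample size (n)."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


n = 200  # hypothetical sample size

# Hypothetical models: (residual sum of squares, number of estimated parameters)
models = {"simple": (180.0, 4), "complex": (172.0, 7)}

scores = {}
for name, (rss, k) in models.items():
    ll = gaussian_loglik(rss, n)
    scores[name] = {"AIC": aic(ll, k), "BIC": bic(ll, k, n)}

# Lower values indicate the preferred model under each criterion.
best_aic = min(scores, key=lambda m: scores[m]["AIC"])
best_bic = min(scores, key=lambda m: scores[m]["BIC"])
print("preferred by AIC:", best_aic)
print("preferred by BIC:", best_bic)
```

With these assumed values, AIC favors the more complex model while BIC favors the simpler one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7.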
To show these performance differences, I will present the results of a simulation study in which I compare AIC and BIC. I consider several data-generating models of differing complexity as well as varying sample sizes. In particular, I will interpret the results with regard to whether a first recommendation for using AIC or BIC to compare competing constellation hypotheses can be derived, in order to support applied researchers in choosing a criterion.

03: A unified parallel constraint satisfaction model for intertemporal and risky choice
1 University of Mannheim, Germany; 2 Technische Universität Dresden, Germany
Decisions involving delayed and uncertain outcomes are a fundamental part of everyday life. Psychological research has identified systematic anomalies in both intertemporal and risky choice that deviate from traditional economic rationales. However, no existing model comprehensively accounts for these anomalies across both domains. This study presents a unified parallel constraint satisfaction (PCS) model to bridge this gap. The PCS model simulates decision-making as an interactive process where competing options and their attributes are represented as interconnected nodes in a neural network. The model dynamically maximizes coherence by iteratively adjusting activation levels until a stable choice emerges. The results demonstrate that the model qualitatively reproduces key decision anomalies, including the magnitude effect, sign effect, common difference effect, common ratio effect, path dependence, and sequential presentation biases. Furthermore, the model generates novel predictions for combined intertemporal and risky choices. This work demonstrates that a single, dynamic framework can account for a wide range of empirically observed anomalies across both domains and offers theoretical insights into the relationship between risk and delay in decision making.
While further empirical validation is needed to assess the model’s quantitative accuracy, its ability to qualitatively reproduce key effects and generate testable predictions makes it a strong candidate for future research. Overall, the model provides a valuable step toward a unified, process-based account of decision-making under uncertainty and delay.

04: Benchmarking Julia against R for efficiency
Universität Kassel, Germany
In psychological methods research, R has been established as a programming language for performing Monte Carlo simulation studies. Parallel to R, the Julia programming language has increasingly emerged in recent years. Julia can be used to carry out simulation studies in a way similar to R, promising more efficiency and faster computation. In the context of increasingly complex statistical models, the question emerges whether Julia has a significant efficiency advantage over R, both when used for simulation studies and for data analysis in an experimental context. This study examines this by benchmarking Monte Carlo simulations in R and Julia. The goal is to find out whether and under which conditions one of the two programming languages promises efficiency advantages.

05: Comparing the performance of tree-based algorithms in meta-analysis
Universität Münster, Germany
When conducting a meta-analysis, researchers frequently aim to explore which variables systematically influence study outcomes. Usually, meta-regression is used for this purpose. While examining nonlinear patterns such as interactions is often of interest in meta-analysis, this is difficult to accomplish using traditional meta-regression as one would need to specify each possible interaction as a predictor. This comes with a great risk of overfitting. A promising alternative is to use meta-trees or meta-forests, which are adaptations of regression trees based on the CART algorithm.
While meta-trees are advantageous due to their interpretability, they are less stable and more prone to overfitting compared to meta-forests. A new look-ahead procedure was suggested for improving the stability of meta-trees, but has not been evaluated yet. Furthermore, it is unclear how to adequately account for residual heterogeneity when using meta-trees or -forests. We address these research gaps by conducting a simulation study in which we compare the performance of different types of meta-trees and -forests with respect to model and variable selection, predictive performance and the estimation of the between-study variance. We vary the number of studies, the sample size within the studies, the number of irrelevant predictors, the complexity of the population model and the distribution and intercorrelations of the predictors. The results we present should inform researchers about which model is best to use and what risks to be aware of when doing so.

06: Confirmatory psychological network analysis: Evaluating fit indices, similarity measures, and hypothesis tests
Goethe University Frankfurt, Germany
Network analysis is ever-growing in popularity among psychologists. Up until now, its applications have been predominantly exploratory and descriptive in nature. However, recognizing that psychological research must be based on substantive theory has raised the need to formulate and evaluate specific hypotheses about network structure and properties. While initial research has demonstrated the potential utility of goodness-of-fit indices used in structural equation modeling (SEM) for this purpose (Du et al., 2024), hypothesis-testing techniques available to applied researchers within the frequentist framework remain limited.
The present work aims to extend this emerging area of research by evaluating existing methods and measures for confirmatory testing of networks, ranging from similarity indices and SEM fit indices to classical sphericity tests and permutation-based methods. Using a simulation study, we evaluate how different network estimation methods, varying degrees of (dis)similarity between theoretical and empirical networks, number of variables (p), sample size (N), and multivariate non-normality affect these measures in terms of alpha and beta error rates, sensitivity, specificity, and statistical power, thus providing insights into their applicability and robustness across various data scenarios.

07: Deep Learning-Based Approaches for Continuous-Time Dynamical Systems
University of Zurich, Switzerland
This project presents a neural network architecture that combines state-space modeling with recurrent neural networks for prediction of irregular time series data. The proposed model addresses key challenges in time series analysis like variable measurement intervals, missing observations, and consistent predictions across different time scales. The model fuses state-space modeling using recurrent neural networks and numerical integration of ordinary differential equations and implements a measurement correction mechanism that ensures trajectory consistency when incorporating new observations into the latent state representation. The models use neural networks to learn temporal dependencies in time-series data, providing a flexible alternative to parametric state-space models (e.g., those in the ctsem R package), adaptively inferring system dynamics without strong parametric assumptions while accommodating complex, nonlinear dependencies. They still enable meaningful interpretation through impulse response functions and conditional predictions.
Interpolation techniques enhance prediction consistency and enable analysis of missing data points, broadening applicability to real-world scenarios with irregular measurements. Evaluation confirms the model's reliability across varying sampling frequencies, demonstrating robust performance in both dense and sparse data contexts across multiple scientific and industrial domains. Beyond prediction, these models contribute to interpretability by providing uncertainty estimates, confidence intervals, and dynamic response characteristics. Their ability to approximate probability distributions over trajectories improves risk assessment and decision-making in forecasting applications. This work bridges deep learning with continuous-time modeling, highlighting potential hybrid approaches that integrate strengths from both paradigms while facilitating comparison with structured approaches in psychological and biological research.

08: Detecting Atypical Response Patterns with the Latent Space Item Response Model
University of Mannheim, Germany
In the Rasch model, the total score of a test is a sufficient statistic for estimating the person parameter. This means that an individual’s person parameter is solely determined by their total number of correct responses, but not by which specific items were solved correctly. As a result, individuals can show response patterns that do not match the predictions of the Rasch model on the item level, e.g., solving the difficult items but failing to solve the easier items. Identifying such atypical response patterns by checking the goodness-of-fit of persons is critical for ensuring validity in psychological and educational assessment. Typically, infit and outfit statistics based on item residuals in the Rasch model, or standardized person-fit measures that compare observed response patterns to those expected under the Rasch model, are used to assess the goodness-of-fit of persons.
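The sufficiency property can be made concrete with a small Python sketch. It assumes five hypothetical items with known difficulties and compares two response patterns with the same total score: the maximum likelihood person estimate is identical, whereas the outfit statistic (the mean squared standardized residual) differs. All names and values are illustrative assumptions, not part of the study.

```python
import math


def p_correct(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ml_theta(responses, difficulties, lo=-10.0, hi=10.0):
    """ML person estimate: solve sum(x_i) = sum(P_i(theta)) by bisection.

    The score equation involves the responses only through their sum,
    which is why the total score is sufficient. Requires a total score
    strictly between 0 and the number of items.
    """
    total = sum(responses)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        expected = sum(p_correct(mid, b) for b in difficulties)
        if expected < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def outfit(responses, difficulties, theta):
    """Outfit mean square: average squared standardized residual over items."""
    z2 = []
    for x, b in zip(responses, difficulties):
        p = p_correct(theta, b)
        z2.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z2) / len(z2)


# Hypothetical item difficulties, ordered from easy to hard
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
typical = [1, 1, 1, 0, 0]   # solves the easy items, fails the hard ones
atypical = [0, 0, 1, 1, 1]  # fails the easy items, solves the hard ones

theta_typical = ml_theta(typical, difficulties)
theta_atypical = ml_theta(atypical, difficulties)
outfit_typical = outfit(typical, difficulties, theta_typical)
outfit_atypical = outfit(atypical, difficulties, theta_atypical)

print("theta:", theta_typical, theta_atypical)
print("outfit:", outfit_typical, outfit_atypical)
```

Both patterns receive the same person estimate, so only a person-fit measure such as outfit separates the atypical pattern from the typical one.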
The recently introduced Latent Space Item Response Model (LSIRM; Jeon et al., 2021) could offer a promising alternative approach for the detection of person misfit. The LSIRM extends the Rasch model by placing items and respondents in a latent metric space to capture unexplained interactions between them. Deviations in response patterns from those expected under the Rasch model may indicate the presence of such unexplained interactions. Consequently, the LSIRM should be capable of detecting person misfit. The aim of the current research is to evaluate, through simulation studies, whether the LSIRM can effectively identify atypical response patterns that deviate from the expectations under the Rasch model.

09: Divided we stand: A tutorial on using variability in theory and data analysis
1 FernUniversität in Hagen; 2 Universität Hamburg; 3 Durham University
This poster presents some tools for utilizing variability in research. In both society and academic research, there is an increasing emphasis on acknowledging psychological divergences from averages. Yet, techniques required to analyze variability as a subject of interest rather than a nuisance are seldom taught in study programs. Consequently, interested researchers may benefit from support when asking themselves one or more of these questions: (1) How can I enrich my theories with variability on the conceptual level? (2) How can I use variability to test theories that I otherwise could not distinguish from each other (as with some competing theories of dual-task performance)? (3) How do I test variability hypotheses, especially in the face of measurement error? To help with this, we created a tutorial and are developing online resources/references. Drawing from computational cognition research, we propose a theory-oriented workflow that enriches research in experimental/cognitive psychology with a variability perspective.
We take you through five steps from developing a formal theoretical model, to deriving predictions and testing those predictions. Each step’s implementation is illustrated with a fictional researcher’s journey. Alongside exemplary theoretical mechanisms, we explain how variance function and quantile regression can be used to test hypotheses about variability. The workflow is extensible and easily adapted to specific research contexts. This poster gives an overview of this workflow and its components, providing interested researchers with several potential entry points to this wide class of approaches and perspectives.

10: Handling Missing Data in Longitudinal Designs: An Evaluation of Single-Level and Multilevel Approaches to Multiple Imputation
Universität Hamburg, Germany
Longitudinal studies are often affected by missing data, which can arise, for instance, from participant non-response or dropout. Multiple imputation (MI) is commonly recommended for addressing missing data, with two prevalent strategies in longitudinal analyses: multilevel MI, which treats repeated measures as nested within participants and employs multilevel models to model variable relationships over time, and single-level MI, which treats repeated measures as separate variables and uses single-level models. Previous studies have shown that both single-level and multilevel MI can provide accurate results in balanced longitudinal designs, but they have mainly focused on relatively simple applications of latent curve modeling (LCM) in which the assumptions underlying multilevel MI were precisely met. In the present study, we aimed to evaluate single-level and multilevel MI in more general contexts by conducting two simulation studies. Study 1 focused on simple applications of single-indicator LCMs with varying degrees of residual autocorrelation.
Study 2 focused on applications of multiple-indicator LCMs, where we also considered variants of single-level MI using composite scores or dimension reduction techniques to simplify imputation models with many variables. The results indicated that multilevel MI can produce biased parameter estimates and decrease statistical power in model comparisons when residual autocorrelation is present. In contrast, single-level MI provided accurate results across the simulated conditions, making it a potentially more flexible approach for addressing missing data in longitudinal designs.

11: Modelling vigilance behaviour with stochastic processes
1 TU Braunschweig, Germany; 2 Friedrich-Schiller-Universität Jena
We illustrate the use of stochastic processes for modelling behavioural dynamics. More concretely, we apply a Poisson process model to human gaze ("scanning") behaviour, based on an earlier model developed by Pulliam (1973) in the context of vigilance. The model is applied to empirical data collected in three different contexts with different degrees of social stimulation in an urban university setting. We specify hypotheses based on the model, predicting differences in scanning rates and inter-scan intervals. We evaluate the fit of the model-implied distributions graphically using QQ-plots and then use generalised linear models to investigate differences in scan frequencies and inter-scan intervals between the three contexts. Results support the pattern predicted from theory. The presentation of the empirical results is accompanied by a discussion on the application and extension of previously developed models from behavioural ecology, such as Pulliam’s model used here, to human behaviour and implications for quantitative predictions. Further directions for follow-up studies that we discuss include the use of recent methods for analysing coordinated behaviour using auto- and cross-correlation functions in the context of Beta-Normal models (Ukrow et al., 2024).
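To illustrate the kind of process model described in this abstract, the sketch below simulates a homogeneous Poisson process in Python by drawing exponentially distributed inter-scan intervals. The scan rate, observation window, and function name are hypothetical and not taken from the study or from Pulliam (1973); the sketch only shows the basic property that a constant scan rate implies exponential intervals with mean 1/rate.

```python
import random


def simulate_scan_times(rate, duration, rng):
    """Event times of a homogeneous Poisson process on [0, duration].

    Waiting times between successive events are drawn from an
    exponential distribution with the given rate.
    """
    times = []
    t = rng.expovariate(rate)
    while t <= duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(1)   # fixed seed so the run is reproducible
rate = 0.5               # hypothetical scan rate (scans per second)
duration = 20000.0       # hypothetical observation window (seconds)

scan_times = simulate_scan_times(rate, duration, rng)
intervals = [later - earlier for earlier, later in zip(scan_times, scan_times[1:])]

mean_interval = sum(intervals) / len(intervals)
empirical_rate = len(scan_times) / duration

print("mean inter-scan interval:", round(mean_interval, 3))
print("empirical scan rate:", round(empirical_rate, 3))
```

Under these assumptions the mean interval should be close to 2 seconds and the empirical rate close to 0.5, which is the type of model-implied pattern that QQ-plots of observed intervals can be checked against.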
Thus, we highlight the relevance of substantive psychological knowledge about the functionality of behaviour for appropriately selecting advanced statistical models.

12: Multiple Imputation of Binary Outcomes in Small Samples – A Comparative Study of Predictive Mean Matching and Logistic Regression Imputation
University of Siegen, Germany
Multiple Imputation (MI) is a common method for addressing missing data under Missing At Random (MAR) mechanisms, but has its limitations when sample sizes are (very) small [1]. While there is some research supporting MI’s reliability in small samples for continuous data, binary outcomes have received limited attention so far. One study found MI to be effective for logistic regression modeling binary data, but performance declined with smaller samples and higher missingness. This study systematically compares two MI approaches, Predictive Mean Matching (PMM) and Bayesian Logistic Regression Imputation (BLR), for handling missing binary outcomes in small samples. Using Monte Carlo simulations (n = 1,000) and a 2-level factorial design, both methods were evaluated on estimate bias, computational efficiency, and algorithm convergence. Factors varied included missingness (10–60%), sample size (20–500), regression coefficient size (small/medium), and outcome imbalance (low/medium). Results show that both methods yield unbiased estimates for sample sizes above 200. However, bias increases in smaller samples, especially for PMM, which also showed larger standard errors and more frequent convergence issues under conditions of high missingness and outcome imbalance. Runtime was comparable across methods, with slight efficiency advantages for PMM. The results are discussed with regard to the applicability of both methods to provide practical guidance for researchers seeking to choose appropriate strategies to handle missing data in small samples.
[1] Kristian Kleinke. “Multiple Imputation by Predictive Mean Matching When Sample Size Is Small”. In: Methodology 16.4 (2018), pp. 3–15.

13: On the Potential of Latent Space Item Response Models to Accommodate and Detect Differential Item Functioning in IRT
Universität Mannheim, Germany
Latent Space Item Response Models (LSIRMs; Jeon et al., 2021) offer a novel approach to addressing misspecifications in IRT models by embedding unexplained person-item interactions as distances in a latent metric space. Among model misspecifications, Differential Item Functioning (DIF) is particularly concerning, as it indicates that item parameters are not equal across groups of respondents despite them having equal person parameter distributions. Usually, the presence of DIF is investigated by comparing item parameters across a priori specified person groups. In contrast, LSIRMs offer a promising approach for DIF analysis without this a priori specification. Moreover, LSIRMs should inherently accommodate DIF, ensuring unbiased person and item parameter estimates. Until now, LSIRMs' potential for DIF analysis has not been systematically investigated. Therefore, in the current project, a simulation study is conducted assessing LSIRMs' capability to accommodate and detect uniform and non-uniform DIF in dichotomous response data. The performance of four models in handling DIF is compared: a 2PL model, a 2PL LSIRM, the data-generating model, and the data-generating model equipped with an additional latent space. It is hypothesized that (1) the parameters are estimated without bias in the latter three models but with bias in the 2PL model, and (2) the items and persons affected by DIF are detectable in the visual representation of the latent space of the 2PL LSIRM. Encouraging findings would highlight LSIRMs as a beneficial method for DIF analysis, emphasizing both the theoretical and practical merits of this model class.
15: STARTistica: An R package for extended outputs to accompany introductory statistics courses
Bielefeld University, Germany
Introductory statistics courses in psychology and related fields are often highly challenging for students and associated with increased statistics anxiety. During self-study phases, students often struggle to independently verify whether they have specified the test as intended, which diminishes self-efficacy. Additionally, the reliance on external (AI-)tools for solving R-related issues may hinder students' ability to learn and intuitively apply the programming language. The poster introduces the R package STARTistica, which is based on evaluations of students’ needs and aims at increasing the self-efficacy among students when practicing the correct application of statistical test functions. The package provides two key features:
1. Verification of Statistical Procedures: Statistical test results saved as R objects may be passed to a verification function. This function provides an extended test output with additional information on test specifications, permissible inferences, and statistical assumptions, enabling students to identify and correct misspecifications.
2. Visualizations: The extended outputs of the verification functions are accompanied by visualizations of data characteristics, statistical assumptions, and test results, since manual (or AI-assisted) generation of such complex visualizations in R would exceed the required programming skills of students.
The package also provides a quick and efficient way for lecturers to show statistical test results alongside complementary information and visualizations. The poster showcases the key features of the package, illustrates applications for lecturers and students, and discusses methods to evaluate the benefits and limitations of the package’s contents.
16: Systematic evaluation of decision bias estimation in the drift diffusion model
University of Münster, Germany
Decision biases are of substantive interest for both cognitive scientists and neuroscientists and are often modelled within the framework of evidence accumulation models. In one of the most prominent evidence accumulation models, the drift diffusion model (DDM; Ratcliff, 1978), decision bias can manifest as a starting point or drift rate bias. Only the starting point bias, however, is a parameter of the standard DDM. To date, simulation studies have rarely examined the estimation performance of these multiple biases, nor have they addressed mis-specified bias in the model (e.g., estimating the standard bias in starting point when, in fact, a drift rate bias was present). In our simulation study, we aimed to close this gap by systematically examining the estimation performance of all bias parameters, as well as the effect of mis-specified bias, in the DDM. We adopted a hierarchical Bayesian framework, estimated all models in JAGS, and systematically evaluated their estimation across hierarchical levels. The simulation also allowed us to compare and further develop possible transformations of the DDM bias parameters, offering, in sum, a comprehensive evaluation of decision bias estimation with insights for the cognitive sciences.

17: Temporal-Difference Learning Maps Onto Response Times - A Reinforcement Learning-Diffusion Decision Model of Two-Stage Decision-Making
1 Heidelberg University, Germany; 2 Leiden University; 3 University of Amsterdam
Behavioral adaptation in probabilistic environments requires learning through trial and error. While reinforcement learning (RL) models can describe the temporal development of preferences through error-driven learning, they neglect mechanistic descriptions of single-trial decision-making.
On the other hand, sequential sampling models such as the diffusion decision model (DDM) allow for the mapping of state preferences onto single response times. We present a Bayesian hierarchical RL-DDM integrating temporal-difference (TD) learning to bridge these perspectives. Our implementation incorporates variants of TD learning, including SARSA and Q-learning models. We tested the model with data from N = 54 participants in a two-stage decision-making task. Participants exhibited learning over time, becoming both more accurate and faster in their choices. They also showed a difficulty effect, with faster and more accurate responses for easier choices, as reflected by greater subjective value differences between available options. Model comparison using predictive information criteria and posterior predictive checks demonstrated that, overall, participants seemed to employ on-policy learning through a SARSA learning model. Furthermore, the RL-DDM captured both the temporal dynamics of learning and the difficulty effect in decision-making. Our work represents an important extension of the RL-DDM into temporal-difference learning.

18: The Guessing Problem: Improving the Validity of a Knowledge Test - A Bayesian-Hierarchical Multinomial Processing Tree Approach
Universität Mannheim, Germany
A common way to reduce guessing in psychological achievement tests is to include an "I don't know" response option. However, this approach holds the risk that test-takers can systematically inflate their test score through correct guessing when they do not use this response option. This can lead to situations where people who guess a lot outperform people who guess less, even though the former know less. To disentangle the contributions of correct guessing and factual knowledge, we apply Bayesian hierarchical multinomial processing tree (MPT) modeling. The MPT modeling we use decomposes item processing into knowledge, honesty, and guessing parameters.
We test our model on an empirical dataset of a general knowledge test administered to a sample of N = 371 subjects who took the test as part of a career counseling setting. This study evaluates the usefulness of the MPT approach in contrast to conventional assessment strategies, such as Classical Test Theory, by correlating the estimated person parameters with external criteria, such as crystallized intelligence or performance in vocabulary tests. In addition, the model fit is evaluated. Finally, possible extensions as well as advantages and disadvantages of this approach are discussed.

19: Three-Step Estimation of Structural Equation Models with Diffusion Model Components
1 MLU Halle-Wittenberg, Germany; 2 Uni Duisburg-Essen; 3 TU Dortmund
Structural equation models with diffusion model components combine a path model that represents relations between latent constructs with measurement models that relate the latent constructs to responses and response times on tests on the basis of diffusion models. Fitting these models to data with maximum likelihood or Bayesian estimation is computationally intensive and difficult. In this paper, we investigate the performance of a simpler estimation approach that consists of three steps. In the first step, the measurement models are fit to the data. In the second step, the values of the latent traits are estimated. In the third step, the parameters of the path model are estimated using the estimated values of the latent traits as proxies for the true values. We investigate the performance of three-step estimation by means of a simulation study. In the study, we systematically vary the sample size and the number of items. We also compare different versions of the three-step estimator that differ in how the traits and the standard errors are estimated. The findings suggest that parameter recovery is good when traits are estimated with at least 48 items.
With fewer items, only those versions perform well that take account of the effect of short scales on trait estimation.

20: To Accumulate or Not to Accumulate? A Comparison of Two Approaches for Modeling Temporal Trends in Autoregressive SEMs
Universität Kassel, Germany
Two popular approaches for analyzing longitudinal data with small numbers of time points using Structural Equation Models (SEM) are the Autoregressive Latent Trajectory Model (ALT; Bollen & Curran, 2004) and the Latent Curve Model with Structured Residuals (LCM-SR; Curran et al., 2014). Both frameworks include trends and autoregressive (AR) effects, but differ in their specification: In the LCM-SR framework, the AR effects are specified between residuals, keeping trends and AR effects separate, whereas in the ALT framework, AR effects are specified at the observational level, resulting in an accumulating (non-linear) effect of the trend over time. Therefore, in the ALT model, the time-specific (within) and person-specific (between) components of change are merged, whereas in the LCM-SR, they are separated, leading to divergent interpretations of the parameters. We compared these two models in a simulation study in terms of model convergence and interpretability for different sizes of autoregressive effects, trends, numbers of time points and sample sizes. We aim to identify specific advantages and disadvantages of each modeling framework and to derive recommendations for their use.

21: Unmasking the Faker: Heterogeneous Perception of Social Desirability in Context of the Multidimensional Nominal Response Model
University of Mannheim, Germany
Self-report questionnaires are widely used in research and practice, yet they are vulnerable to response biases, which can distort the interpretation of results.
Biases like socially desirable responding (SDR) are particularly problematic in high-stakes situations, such as personnel selection, where individuals may engage in faking to present themselves more favourably. A promising approach to disentangle faking from substantive trait variance is the Multidimensional Nominal Response Model (MNRM). The MNRM models the probability of choosing a certain response category as a function of multiple latent dimensions. By allowing for varying effect patterns of social desirability on response categories, the MNRM reduces bias in substantive trait estimation. However, the model assumes equivalent effect patterns of social desirability across all test-takers, disregarding interpersonal heterogeneity in the perception of desirability in a given context. This assumption may limit the applicability of the model in real-world scenarios. Here, I use a simulation approach to examine how violations of this assumption affect the MNRM’s ability to correct substantive trait estimates (and other model parameters) for faking. I apply different manipulations of heterogeneous social desirability effect patterns to test the model’s ability to recover person parameters of the substantive traits, and the performance of model selection criteria. 22: Using features of dynamic networks to guide treatment selection and outcome prediction: The central role of uncertainty 1Department of Psychology, University of Marburg, Germany; 2Department of Psychometrics and Statistics, Faculty of Behavioural and Social Sciences, University of Groningen Multivariate time series models are commonly used in psychology to investigate person-specific associations between multiple variables. They are often represented and interpreted as dynamic network models, where features such as the centrality of nodes can potentially guide treatment selection and outcome prediction. 
Researchers typically rely on point estimates of specific network features while ignoring estimation uncertainty, which can lead to wrong inferences and over-optimistic claims. We introduce a one-step Bayesian approach to estimating multilevel vector autoregressive models (BmlVAR), which enables uncertainty quantification for person-specific network features and the regression of external outcomes on such features. In a preregistered simulation study, we compare the new model with several popular methods for network estimation. We also apply all methods to empirical data to highlight their differences. Our simulation results indicate that all methods show only mediocre performance in estimating different centrality measures in practically relevant settings. BmlVAR still outperforms the other methods in many simulation conditions, especially in terms of the power to detect associations with an outcome. However, estimating the model properly can be challenging with limited data. Overall, all methods require a lot of data or very large effects to produce reasonably accurate results. Thus, although centrality measures based on dynamic networks are widely used, most simulation settings suggest they are unlikely to work well for their current goals, such as guiding treatment selection and outcome prediction. We provide a new model that incorporates estimation uncertainty into the modelling process, thereby protecting against premature conclusions. 23: Why Did the Gender Gap in Life Satisfaction Among Adolescents in Leipzig Grow from 2010 to 2023? Making Sense of Models with marginaleffects 1Leipzig University, Germany; 2Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; 3Leuphana University of Lüneburg, Germany; 4Amt für Statistik und Wahlen, Stadt Leipzig, Germany The city of Leipzig in Germany conducts regular surveys of students aged 12 to 18. Following the surveys in 2010 and 2015, the 2020 wave was postponed to 2023 due to the COVID-19 pandemic. 
In this latest wave, the gender gap in life satisfaction had significantly widened—by approximately 0.3 standard deviations of the outcome scale. We explore why this relative decline occurred, probing several possible explanations, including scaling artifacts, shifts in the underlying population, and changes in assessment mode. We conclude that: (1) the widening gender gap is most pronounced among students with a migration background, likely due to major changes in the countries of (parental) origin; (2) among students without a migration background, part of the gap may stem from girls responding differently to tablet-based assessments; and (3) these factors alone cannot fully explain the widening gap in satisfaction with leisure time activities and friendships. Our analyses demonstrate the utility of the marginaleffects framework, which enables researchers to interrogate a wide range of statistical models in a consistent way to address targeted research questions. This approach can help applied researchers improve their modeling practices by reducing the need to fully understand the technical implications of modeling decisions—such as how the coding of categorical variables, the inclusion of interactions, or the use of ordinal models affect the meaning of individual coefficients—allowing them to focus instead on what they want to learn from their models. 24: The Early Prosocial Behaviour Questionnaire (EPBQ): Examining factorial, convergent, and discriminant validity through a multitrait-multirater approach Faculty of Psychology and Education Sciences, University of Porto, Portugal This study investigates the factorial, convergent, and discriminant validity of the Early Prosocial Behaviour Questionnaire (EPBQ; Giner-Torréns & Kärtner, 2017), a 10-item questionnaire designed to assess young children's prosocial behavior (PB) through caregivers' reports. 
In its original form, the EPBQ is structured around three underlying constructs reflecting distinct types of PB: helping, sharing, and comforting. After translating and adapting the Portuguese version of the EPBQ for both parents and teachers, we collected ratings of children's PB from mothers, fathers, and teachers in a community sample of 247 preschool-aged children (54% boys; M_age = 56.26 months; SD = 12.19). Confirmatory factor analyses supported the EPBQ's hypothesized three-factor structure, χ²(264) = 449.64, p < .001, RMSEA = 0.06 (90% CI [0.05, 0.07]), CFI = 0.96, TLI = 0.94, SRMR = 0.05. Moreover, partial scalar measurement invariance was established across mothers', fathers', and teachers' ratings. A CT-C(M-1) model (Eid et al., 2008), using teachers' reports as the reference method, was fitted to inspect convergent and discriminant validity, χ²(218) = 293.48, p < .001, RMSEA = 0.04 (90% CI [0.03, 0.05]), CFI = 0.98, TLI = 0.97, SRMR = 0.06. Despite relatively high item reliability, parents' ratings demonstrated low convergent validity relative to teachers' ratings. Mothers and fathers had a unique perspective on children's PB above and beyond their partial overlap with teacher reports. The correlations between latent traits indicated modest discriminant validity between traits representing distinct nuances of PB. | ||