Combining machine learning with mixed effects models
Chair(s): Björn S. Siepe (University of Marburg, Germany)
Machine learning techniques are increasingly used in psychology to deal with high-dimensional datasets. However, data in psychology are often hierarchically structured, a structure that many machine learning algorithms are not equipped to handle. This symposium consists of several talks that explore how machine learning approaches can be applied to, and adapted for, such hierarchical data.
The first two talks focus on solutions for cross-sectional multilevel settings with many predictors. Jansen discusses regularisation techniques in the context of meta-regression, where it is often unclear which of potentially many moderator variables are relevant. The talk includes a performance comparison of different approaches and a discussion of extensions to the multilevel setting. Hany investigates tree-based methods with random effects, which have seen increasing use in recent years. As the talk illustrates, their focus on predictive performance can lead to the erroneous inclusion of predictors and to false conclusions when these models are interpreted.
The final two talks of the symposium discuss modelling techniques for psychological time series data. Andriamiarana investigates Bayesian regularising priors in the context of longitudinal multilevel latent variable models. Simulation results and an empirical example show that certain regularisation priors may be preferable for such longitudinal models. Finally, Ernst combines long short-term memory networks, a deep learning technique that has achieved great success in time series forecasting, with random effects estimation. Results from several synthetic and real-data benchmarks show the potential and pitfalls of using this complex modelling approach for hierarchical data.
Presentations of the Symposium
Moderator selection in meta-analysis using lasso and elastic net regularization
Katrin Jansen, Steffen Nestler (Department of Psychology, University of Münster)
A central aim of meta-analysis is to assess how much effect sizes vary across studies and to determine whether specific study characteristics account for this variability. Typically, meta-regression approaches are used to identify potential moderator variables and to assess their impact on the size of effects. However, several challenges frequently arise when using these approaches: First, there is often insufficient knowledge about which study characteristics are truly relevant, while at the same time there is limited guidance on effective methods for model selection in meta-analysis. Second, the number of potential moderators is often large relative to the number of available studies, so that including all moderators in the same meta-regression model may compromise its stability. Third, correlations among moderator variables can cause multicollinearity, resulting in imprecise coefficient estimates and potential overfitting. Regularization techniques such as the lasso and the elastic net offer a promising framework for addressing these issues simultaneously. However, their use in meta-analysis has so far been limited, possibly due to a paucity of research on their application in this specific context. In this talk, we therefore demonstrate how these regularization techniques can be applied in meta-regression. Furthermore, we present the results of a simulation study comparing regularized meta-regression with traditional meta-regression and information-theoretic approaches in terms of moderator selection and parameter estimation. Finally, we discuss potential extensions to multilevel and multivariate meta-regression.
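To make the idea concrete, the following is a minimal sketch of inverse-variance weighted moderator selection with the elastic net on simulated data, using scikit-learn. It illustrates the general approach only, not the authors' implementation: the simulated moderators, sampling variances, and fixed penalty strength are assumptions, and a full analysis would additionally model residual heterogeneity and tune the penalty.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# inverse-variance weighted elastic net for moderator selection.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)

k, p = 40, 15                        # 40 studies, 15 candidate moderators
X = rng.normal(size=(k, p))          # study-level moderator values
v = rng.uniform(0.01, 0.1, size=k)   # known sampling variance of each effect size

beta = np.zeros(p)
beta[:2] = [0.4, -0.3]               # only two moderators are truly relevant
y = X @ beta + rng.normal(scale=np.sqrt(v))   # observed effect sizes

# Weight each study by the inverse of its sampling variance, as in classical
# meta-regression; the penalty strength would be tuned (e.g., by CV) in practice.
model = ElasticNet(alpha=0.05, l1_ratio=0.9)
model.fit(X, y, sample_weight=1.0 / v)

selected = np.flatnonzero(model.coef_)   # moderators with non-zero coefficients
print("selected moderators:", selected)
print("their coefficients:", model.coef_[selected])
```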
Tree-based methods for multilevel data: The influence of predictor levels on variable selection and prediction
Linus Hany, Mirka Henninger (University of Basel)
Machine learning methods, such as decision trees and random forests, allow researchers to investigate complex non-linear and interaction effects. These methods have therefore become valuable tools for exploring complex psychological processes. In recent years, first attempts have been made to extend machine learning methods to multilevel data, e.g., with Level-1 (e.g., students) and Level-2 (e.g., classes) units. While these adaptations often add random effects to the models, they typically do not address the level at which the predictor variables operate. We conducted a simulation study evaluating variable selection and the quality of prediction for a range of tree-based methods when the data have a multilevel structure. We assessed different tuning strategies and varied key parameters such as the ICC, the effect sizes of the predictors, and the sample size. Our simulation studies show that, for variable selection, the risk of erroneously selecting Level-2 predictor variables is substantial, especially when the intra-class correlation is high. Furthermore, we show that prediction can be affected when Level-2 predictors are falsely selected. When developing new decision tree methods, researchers predominantly aim at enhancing predictive capabilities, often at the cost of rigorously assessing inferential quality, such as the risk of drawing false-positive conclusions. We will discuss how the focus on prediction rather than inference can lead to false conclusions when tree-based methods are applied to multilevel data, and how these risks might be overcome.
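The core selection problem can be reproduced in a few lines. The sketch below is an assumed toy setup, not the study's simulation design: the outcome depends only on cluster membership, yet a pure-noise Level-2 predictor, being constant within clusters, lets a random forest memorize cluster means and thus receive inflated importance.

```python
# Minimal sketch (assumed toy setup): under a high ICC, a pure-noise Level-2
# predictor can absorb cluster differences and look spuriously important.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)

n_clusters, n_per = 30, 20
cluster = np.repeat(np.arange(n_clusters), n_per)

u = rng.normal(scale=1.0, size=n_clusters)    # random intercepts -> high ICC
x1 = rng.normal(size=n_clusters * n_per)      # Level-1 noise predictor
x2 = rng.normal(size=n_clusters)[cluster]     # Level-2 noise predictor (constant within cluster)

# Outcome depends ONLY on the cluster effect, not on x1 or x2.
y = u[cluster] + rng.normal(scale=0.5, size=n_clusters * n_per)

X = np.column_stack([x1, x2])
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Naive (cluster-blind) permutation importance: x2 often dominates because a
# single split on it isolates a whole cluster and fits its mean.
imp = permutation_importance(rf, X, y, n_repeats=20, random_state=0)
print("importance of x1 (Level-1 noise):", imp.importances_mean[0])
print("importance of x2 (Level-2 noise):", imp.importances_mean[1])
```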
Are Bayesian regularization methods a must for dynamic latent variable models?
Vivato V. Andriamiarana, Pascal Kilian, Holger Brandt, Augustin Kelava (Methods Center, Eberhard Karls University of Tübingen)
Due to the increased availability of intensive longitudinal data, researchers have been able to specify increasingly complex dynamic latent variable models. However, these models present challenges related to overfitting, hierarchical features, non-linearity, and sample size requirements. Further open questions concern the finite-sample performance of priors, including bias, accuracy, and Type-I error inflation. Bayesian estimation provides the flexibility to treat these issues simultaneously through the use of regularizing priors. In this talk, we compare several Bayesian regularizing priors (ridge, Bayesian lasso, adaptive spike-and-slab lasso, and regularized horseshoe). To this end, we introduce a multilevel dynamic latent variable model. We then conduct two simulation studies and a prior sensitivity analysis using empirical data. The results show that, compared to the other Bayesian regularization priors, the ridge prior provides sparse estimation while avoiding overshrinkage of relevant signals. In addition, we find that the lasso and heavy-tailed regularizing priors do not perform well compared to light-tailed priors for the logistic model. In the context of multilevel dynamic latent variable modeling, it is often attractive to diversify the choice of priors. Instead, we suggest prioritizing ridge priors without extreme shrinkage, which, as we show, handle the trade-off between informativeness and generality better than priors with high concentration around zero and/or heavy tails.
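For orientation, the compared shrinkage priors take roughly the following standard forms (generic parameterizations from the wider literature, e.g., Park & Casella, 2008, for the Bayesian lasso and Piironen & Vehtari, 2017, for the regularized horseshoe; the talk's exact specifications may differ):

```latex
% Standard forms of the compared shrinkage priors (generic parameterizations).
\begin{align*}
\text{Ridge:} \quad
  & \beta_j \sim \mathcal{N}\!\left(0, \tau^2\right) \\[2pt]
\text{Bayesian lasso:} \quad
  & \beta_j \sim \mathrm{Laplace}(0, b), \quad \text{equivalently} \quad
    \beta_j \mid \tau_j^2 \sim \mathcal{N}\!\left(0, \tau_j^2\right), \;
    \tau_j^2 \sim \mathrm{Exp}\!\left(\lambda^2 / 2\right) \\[2pt]
\text{Spike-and-slab lasso:} \quad
  & \beta_j \mid \gamma_j \sim \gamma_j \, \mathrm{Laplace}(0, b_1)
    + (1 - \gamma_j) \, \mathrm{Laplace}(0, b_0), \quad b_0 \ll b_1 \\[2pt]
\text{Regularized horseshoe:} \quad
  & \beta_j \mid \lambda_j, \tau, c \sim
    \mathcal{N}\!\left(0, \tau^2 \tilde{\lambda}_j^2\right), \quad
    \tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}, \quad
    \lambda_j \sim \mathrm{C}^{+}(0, 1)
\end{align*}
```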
Mixed effects LSTMs: Long short-term memory neural networks for hierarchical data
Anton Ernst, Daniel W. Heck, Björn S. Siepe (Department of Psychology, University of Marburg)
Machine learning models have become a popular choice for analyzing psychological data. One major limitation of nearly all of these models is their assumption that the data are independent and identically distributed (IID). This assumption generally does not hold for hierarchical data structures, such as time series data from multiple participants. Such data therefore pose a challenge to many machine learning models, including a commonly used recurrent neural network architecture, the long short-term memory network (LSTM; Hochreiter & Schmidhuber, 1997). We propose the mixedLSTM, a variant of the LSTM for hierarchical data based on the mixedML framework (Kilian, Ye, & Kelava, 2023), which captures non-linear effects at the group level and linear effects at the cluster level and does not require the data to be strictly IID. The mixedLSTM is tested in three benchmarks on simulated and real-life data and compared to a range of state-of-the-art models (vector autoregression, Sims, 1980; other LSTM variants; and gradient-boosted trees, Natekin & Knoll, 2013). Overall, the results indicate that including cluster-level effects is indeed beneficial. The benchmark on real-life data motivates future extensions towards non-linear random effects.
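To illustrate the general idea, the following PyTorch sketch adds a learned cluster-specific intercept to an LSTM forecaster. It is a crude stand-in for linear random effects at the cluster level, assumed for illustration only, and not the authors' mixedML-based architecture; all names and sizes are hypothetical.

```python
# Minimal sketch (illustrative, not the authors' implementation): an LSTM
# forecaster whose output is shifted by a learned cluster-specific intercept.
import torch
import torch.nn as nn

class MixedEffectsLSTM(nn.Module):
    def __init__(self, n_clusters, input_size=1, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)          # shared ("fixed") part
        self.intercepts = nn.Embedding(n_clusters, 1)  # per-cluster offsets

    def forward(self, x, cluster_id):
        # x: (batch, time, input_size); cluster_id: (batch,)
        out, _ = self.lstm(x)
        fixed = self.head(out[:, -1, :])               # one-step-ahead forecast
        return fixed + self.intercepts(cluster_id)     # shift by cluster offset

model = MixedEffectsLSTM(n_clusters=50)
x = torch.randn(8, 20, 1)             # 8 series, 20 time points each
ids = torch.randint(0, 50, (8,))      # cluster membership of each series
y_hat = model(x, ids)                 # forecasts, shape (8, 1)
# A genuinely mixed-effects treatment would additionally shrink the offsets
# toward zero, e.g. via an L2 penalty on self.intercepts.weight.
```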