| 
Level-specific reliability coefficients from the perspective of latent state-trait theory
Lennart Nacke, Axel Mayer
Bielefeld University, Germany
The growing popularity of the ecological momentary assessment (EMA) method in psychological research requires adequate statistical models for intensive longitudinal data, with latent state-trait models (LST models) and multilevel confirmatory factor analysis (ML-CFA) as two frequently applied alternatives. Within these models, considerable attention has been given to the computation of different types of reliability coefficients capturing the proportions of variance attributable to stable between-person differences, situation-specific aspects, and measurement error. In (indicator-specific) multistate-singletrait LST models, consistency, specificity, reliability, and method specificity are defined, whereas in ML-CFA, intra-class correlation coefficients (ICC) and reliability at Level 1 (within-subject reliability) and Level 2 (between-subject reliability) are considered. While LST models for EMA studies can also be specified in a multilevel framework, the formal relationship between commonly applied LST coefficients and coefficients derived from ML-CFA has not yet been explored. The current study closes this gap and demonstrates similarities and differences between these coefficients. In particular, we show that method specificity reflects between-subject reliability and that consistency corresponds to the ICC, and we highlight differences between specificity and within-subject reliability. The theoretical findings are illustrated using data from an experience sampling study on the within-person variability of narcissistic admiration and rivalry (Heyde et al., 2023). Our results are discussed in light of previous findings reporting very high between-subject reliability for methodologically homogeneous scales, and we argue that these estimates should be interpreted with caution.
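As a rough illustration of the variance-based LST coefficients named above, the following Python sketch assumes a single indicator whose variance splits additively into a trait component, an occasion-specific (state residual) component, and measurement error. The function name, argument names, and input values are hypothetical and chosen only for illustration; method specificity and the multilevel (Level 1 / Level 2) coefficients are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSTCoefficients:
    consistency: float   # share of variance due to the trait component
    specificity: float   # share of variance due to the occasion-specific component
    reliability: float   # share of variance not due to measurement error


def lst_coefficients(trait_var: float, state_resid_var: float, error_var: float) -> LSTCoefficients:
    """Compute consistency, specificity, and reliability for one indicator.

    Assumes (hypothetically) an additive decomposition
    Var(Y) = trait_var + state_resid_var + error_var,
    with any factor loadings already absorbed into the components.
    """
    if min(trait_var, state_resid_var, error_var) < 0:
        raise ValueError("variance components must be non-negative")
    total = trait_var + state_resid_var + error_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    consistency = trait_var / total
    specificity = state_resid_var / total
    # Reliability is the sum of the two systematic shares.
    return LSTCoefficients(consistency, specificity, consistency + specificity)


if __name__ == "__main__":
    # Made-up variance components, for illustration only.
    print(lst_coefficients(trait_var=0.5, state_resid_var=0.3, error_var=0.2))
```

In this simplified setting, consistency plus specificity equals reliability by construction; actual estimates would come from a fitted LST or ML-CFA model rather than from hand-entered values.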
 
 
Estimating trait negative emotion differentiation: How many measurement occasions and emotion items are needed?
Sabrina Ecker¹, Esther Ulitzsch², Tanja Lischetzke¹
¹RPTU University Kaiserslautern-Landau, Landau, Germany; ²University of Oslo, Oslo, Norway
Negative emotion differentiation (NED) – the extent to which individuals distinguish between negative emotional states in a fine-grained manner – is commonly assessed as a trait by calculating the intraclass correlation across momentary emotion ratings from intensive longitudinal data. However, there are currently no recommendations regarding the number of measurement occasions and the number of emotion items required to adequately assess NED, resulting in considerable variation in these two aspects across studies. The present research aimed to address this gap by examining the trustworthiness of individuals’ NED estimates across different numbers of occasions and emotion items. Intensive longitudinal data from an ambulatory assessment study with a relatively large number of occasions (100 per participant, 40 participants) served as a benchmark. We systematically manipulated the number of occasions and the number of emotion items post hoc by drawing subsets from the benchmark data. We then compared the NED estimates from these conditions to the NED estimates from the benchmark using both absolute indicators (estimation problems, between-person standard deviation, reliability) and relative indicators (correlation with the benchmark, difference in the estimates, root mean square error). The NED estimates showed high reliability (≥ .82) with at least 60 occasions per person, and they became stable and comparable to the benchmark in both item-number conditions (5 vs. 15) at around 50 to 60 occasions. The results suggest that the trustworthy and reliable estimation of trait NED requires a comparatively large number of occasions, challenging researchers to balance participant burden and trustworthy estimation.
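The general idea of computing a person-level intraclass correlation and recomputing it on a subset of occasions can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the abstract does not say which ICC variant was used, so a two-way consistency ICC for single ratings is assumed here, with occasions as rows and emotion items as columns. The function names and the toy data are hypothetical.

```python
import random


def icc_consistency(ratings):
    """Two-way consistency ICC (single ratings) for an occasions x items table.

    `ratings` is a list of rows, one row per occasion, one value per item.
    A higher ICC means the items move together across occasions, which is
    usually read as lower emotion differentiation.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 occasions and 2 items")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err
    if denom == 0:
        return float("nan")  # no variance at all: the ICC is not estimable
    return (ms_rows - ms_err) / denom


def subsample_icc(ratings, n_occasions, seed=0):
    """Recompute the ICC on a random subset of occasions (drawn without replacement)."""
    rng = random.Random(seed)
    return icc_consistency(rng.sample(ratings, n_occasions))


if __name__ == "__main__":
    # Toy data: three items that rise and fall together across ten occasions.
    toy = [[i, i + 1, i + 2] for i in range(10)]
    print(icc_consistency(toy), subsample_icc(toy, n_occasions=5))
```

Repeating `subsample_icc` over many seeds and subset sizes, and comparing the results with the full-data value, mirrors the kind of post hoc comparison described above in spirit; the specific indicators reported in the abstract (e.g., reliability, root mean square error) would need additional code.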
 
 
Dynamic Systems Approaches for Satisfaction and Affect Panel Data -- Some Complications
Michael Aristodemou, Charles C Driver
UZH, Switzerland
How do affect- and satisfaction-type variables relate to each other over long time spans of months and years? To what extent do between-person differences at shorter time scales become within-person differences at longer ones? Using comparable datasets from GESIS (bi-monthly) and SOEP (yearly), we will discuss some of the results of this continuous-time state space modelling work, as well as important considerations on the modelling of time and the link between measurements and process. For example, if people really integrate over the past month when asked 'how have you felt for the last month', how can we represent this aggregation process in our model? Or, when negative affect responses are clustered towards the bottom of the scale, how can we include this in a dynamic systems representation? Does this clustering arise from stable individual differences, or from properties of the measurement scale that also influence the detection of within-person change?
 
 
Comparing modeling strategies for intensive longitudinal data within the Latent State-Trait framework – an example using students’ fatigue
Denny Kerkhoff, Axel Mayer
Bielefeld University, Germany
The increasing popularity of ecological momentary assessment in psychological research has been accompanied by advancements in modeling strategies for intensive longitudinal data (ILD). Modeling ILD requires not only decisions on the underlying temporal process, such as autoregression and latent change, but also an assessment of the psychometric properties of the – typically short – scales. In addition to measurement invariance tests to evaluate the quality of the scales, the revised Latent State-Trait (LST-R) framework provides a complementary perspective on the psychometric properties by decomposing variance into trait, occasion, and error components, yielding consistency, specificity, and reliability coefficients. In light of the multitude of plausible modeling strategies for ILD, a comparison of approaches is warranted to assess the robustness of inferences regarding psychometric properties across the different analysis strategies. To this end, we compare classic LST models in wide data format with multilevel SEM approaches such as dynamic structural equation models, using data from 137 university students who reported momentary fatigue seven times per day over a two-week period on a three-item scale. Modeling strategies differ in their assumed temporal processes, e.g., variability vs. change. We examine the implications of each modeling strategy for assessing longitudinal measurement invariance and variance-decomposition-based coefficients (reliability, specificity, consistency). Preliminary analyses based on a five-day subset suggest that factor loadings remain relatively stable across modeling specifications, whereas item intercepts show greater sensitivity to assumptions about change. Since the variance decomposition is contingent on the statistical model, caution is warranted when interpreting reliability, consistency, and specificity across modeling strategies. |