Session
Differential Item Functioning and Parameter Instability

Presentations
Differential Item Functioning in Polytomous Diagnostic Classification Models: An Extension of the Sequential G-DINA Model

1Research Unit of Psychological Assessment, Faculty of Rehabilitation Sciences, TU Dortmund University, Dortmund, Germany; 2Department of English, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran; 3Department of Educational Psychology, University of Minnesota, Minneapolis, U.S.A.

In psychological assessment, differential item functioning (DIF) occurs when respondents from different groups (e.g., gender groups) respond differently to an item despite having the same underlying symptom profile. In the framework of diagnostic classification models (DCMs), an item is flagged for DIF if respondents from different groups with the same symptom profile have different probabilities of endorsing the item, suggesting that the item may be influenced by group-specific factors unrelated to the underlying attributes. Despite the growing interest in DCMs, little attention has been paid to DIF analyses using polytomous DCMs, such as the sequential Generalized Deterministic Inputs, Noisy “And” gate (sG-DINA) model. One major challenge is that existing R packages such as GDINA and CDM currently lack support for conducting DIF analyses within polytomous DCMs. To address this gap, we developed custom code extending the GDINA package that allows DIF detection for polytomous items and provides detailed information on response category thresholds. Using this extension, we analyzed responses from 50,831 German participants (both clinical and non-clinical) to the simplified version of the Beck Depression Inventory (BDI-S), a polytomous psychological screening tool, to investigate DIF across gender. The results of the Wald test identified DIF in 20 response categories across different items, indicating potential measurement inequivalence in the BDI-S.
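As a hedged illustration of the kind of Wald test used for such DIF screening (a generic sketch in Python, not the authors' actual R-based GDINA extension; all numbers are made up), the test compares group-specific item-parameter estimates via the squared Mahalanobis distance of their difference, referred to a chi-square distribution:

```python
import numpy as np
from scipy.stats import chi2

def wald_dif_test(beta_ref, beta_foc, cov_ref, cov_foc):
    """Wald test of H0: the reference and focal groups share the same
    item parameters (no DIF) for one item or response category.

    beta_* : group-specific item-parameter estimates (1-D arrays)
    cov_*  : estimated covariance matrices of those estimates
    """
    diff = np.asarray(beta_ref, dtype=float) - np.asarray(beta_foc, dtype=float)
    # Independent group samples: covariance of the difference is the sum
    cov = np.asarray(cov_ref, dtype=float) + np.asarray(cov_foc, dtype=float)
    stat = float(diff @ np.linalg.solve(cov, diff))  # squared Mahalanobis distance
    df = diff.size
    p_value = float(chi2.sf(stat, df))
    return stat, df, p_value

# Toy example: hypothetical parameter estimates for one response category
stat, df, p_value = wald_dif_test(
    beta_ref=[0.2, 0.7], beta_foc=[0.1, 0.9],
    cov_ref=np.diag([0.01, 0.02]), cov_foc=np.diag([0.01, 0.02]),
)
```

A small p-value would flag the category as exhibiting DIF; in practice the test is repeated over all items and categories with an appropriate multiplicity correction.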
These findings highlight the importance of evaluating DIF in polytomous DCMs, especially when they are applied in diverse populations, to ensure the validity and fairness of psychological assessments.

A Local Nonparametric Framework for Detecting DIF Along a Continuous Covariate Across Diverse IRT Models

Institut für Psychologie, Goethe-Universität Frankfurt am Main

Differential item functioning (DIF) threatens the validity of test score interpretations by biasing comparisons of person abilities, making its detection a central issue. Research has developed three main strategies for detecting DIF with respect to a continuous covariate: (1) model-agnostic procedures (e.g., the multiple-group approach) are compatible with various item response theory (IRT) models but rely on discretizing continuous covariates; (2) model-specific extensions (e.g., moderated factor analysis) allow for continuous covariates but are restricted to specific IRT models with predefined functional forms (e.g., quadratic trends) for the DIF parameters; and (3) model-agnostic tests (e.g., score-based tests) preserve covariate continuity and are not restricted to particular IRT models but merely flag the presence of DIF without describing how it changes across the range of the covariate. Inspired by local structural equation models, we propose a local nonparametric DIF detection framework that inherits the flexibility of all three strategies while avoiding their limitations. The framework integrates kernel-based local weighting with an overlapping, weighted multiple-group M-estimator. For statistical inference on DIF, a person-level cluster bootstrap is employed because of the overlapping samples. Using a preliminary simulation, we demonstrate that the proposed framework outperforms previous methods while revealing nonlinear DIF patterns along the covariate continuum.
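The core idea of kernel-based local weighting can be sketched as follows (a minimal illustration under simplified assumptions, not the proposed M-estimator: here only a kernel-weighted endorsement rate is tracked along the covariate; all names and data are hypothetical):

```python
import numpy as np

def local_weighted_endorsement(responses, covariate, focal_points, bandwidth):
    """Kernel-weighted local estimate of an item's endorsement rate at
    each focal value of a continuous covariate. Persons near a focal
    point receive high weight; weighting windows overlap by design."""
    responses = np.asarray(responses, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    estimates = []
    for x0 in focal_points:
        # Gaussian kernel weights centred at the focal covariate value
        w = np.exp(-0.5 * ((covariate - x0) / bandwidth) ** 2)
        estimates.append(float(np.sum(w * responses) / np.sum(w)))
    return np.array(estimates)

# Simulated nonlinear drift: endorsement probability rises with age
rng = np.random.default_rng(0)
age = rng.uniform(20, 60, size=2000)
p_true = 0.2 + 0.6 * (age - 20) / 40
y = rng.binomial(1, p_true)
curve = local_weighted_endorsement(y, age, focal_points=[25, 40, 55], bandwidth=5.0)
```

Because neighbouring focal points reuse overlapping subsets of persons, the resulting local estimates are dependent, which is why the abstract's framework resorts to a person-level cluster bootstrap for inference.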
Trend Estimation in Longitudinal Assessments: Comparing Concurrent Calibration, Item Parameter Drift Detection-Based Methods, Robust Linking, and Regularized Estimation

1IPN – Leibniz Institute for Science and Mathematics Education, Kiel, Germany; 2Centre for International Student Assessment (ZIB), Kiel, Germany

In longitudinal assessments, tests are frequently used to estimate trends. When item parameters lack invariance, comparisons across time points can be distorted, requiring appropriate statistical methods for accurate trend estimation. This talk compares trend estimates under the 2PL model with item parameter drift (IPD) across four linking approaches for two time points. First, two methods assume invariant item parameters: concurrent calibration jointly estimates item parameters across time points, while fixed item parameter calibration estimates them at one time point and fixes them at the other. Second, separate calibration of the two time points is followed by robust Haberman or robust Haebara linking via common items to place parameters on a common scale. Third, noninvariant items are detected using likelihood ratio tests or the root mean square deviation (RMSD) statistic with fixed or outlier-based cutoffs, and trend estimates are recomputed using only the items identified as invariant. Fourth, regularized estimation under a smooth Bayesian information criterion (SBIC) is applied, shrinking small or null IPD effects toward zero while estimating all others as nonzero. The simulation varied sample size, number of items, IPD effect size, the proportion of IPD items, balanced versus unbalanced IPD, and the average change in ability between time points. Bias and relative RMSE were evaluated for the mean and SD at the second time point. Results suggest that SBIC generally performed best, followed by Haberman linking with the L0 loss function.
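The RMSD statistic used in the detection-based approach can be sketched on a latent-trait grid (a minimal illustration; the item curves, grid, and the cutoff of 0.05, one fixed value discussed in the literature, are hypothetical choices, not the study's settings):

```python
import numpy as np

def rmsd_item_fit(p_observed, p_model, weights):
    """RMSD item-fit statistic: root of the weighted mean squared gap
    between (pseudo-)observed and model-implied response probabilities,
    evaluated over a grid of latent-trait values.

    weights : trait-distribution weights over the grid (normalized here)
    """
    p_observed = np.asarray(p_observed, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.sqrt(np.sum(w * (p_observed - p_model) ** 2)))

# Toy grid: an item whose observed curve has drifted relative to the model
theta = np.linspace(-4, 4, 41)
w = np.exp(-0.5 * theta**2)                 # standard-normal trait weights
p_model = 1 / (1 + np.exp(-(theta - 0.0)))  # logistic curve, difficulty 0.0
p_obs = 1 / (1 + np.exp(-(theta - 0.5)))    # drifted curve, difficulty 0.5
rmsd = rmsd_item_fit(p_obs, p_model, w)
flagged = bool(rmsd > 0.05)  # hypothetical fixed cutoff for noninvariance
```

Under this logic, items whose RMSD exceeds the cutoff are excluded before the trend is re-estimated; the choice of cutoff directly controls how aggressively drifting items are screened out.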
For the detection-based approach, commonly used RMSD cutoffs may be too lenient; stricter thresholds appear necessary to achieve satisfactory parameter estimates.

On the Meaning of Measurement Invariance in Social Relations – Confirmatory Factor Analysis for Relative Variance Parameters

Universität Konstanz, Germany

We present and illustrate meaningful ways to assess relative variance parameters (variance components) in multiple-indicator social relations – confirmatory factor analysis models for dyadic round-robin data in which different types of measurement invariance may hold. With simulation studies, we investigate under which conditions of sample size, true parameter values, and (mis-)specified invariance restrictions estimation issues as well as biased and inaccurate parameter estimates occur. Estimation issues are common in realistic data situations with low person-level variances and comparably few members per round-robin group. However, such issues can be effectively avoided by (falsely) imposing invariance restrictions across factor loadings without severely biasing the relative variances for the sum score and the reciprocity correlations. Implications and limitations are discussed.
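To make the notion of relative variance parameters concrete (a generic sketch with made-up numbers, not the authors' model), a social relations decomposition splits the total dyadic variance into actor, partner, and relationship components, and each relative variance is that component's share of the total:

```python
def relative_variances(var_actor, var_partner, var_relationship):
    """Share of total dyadic variance attributable to each social
    relations component (actor, partner, relationship effects)."""
    components = {"actor": var_actor, "partner": var_partner,
                  "relationship": var_relationship}
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}

# Hypothetical variance-component estimates for one round-robin indicator
shares = relative_variances(0.30, 0.10, 0.60)
```

In the multiple-indicator CFA setting of the abstract, these components are estimated per latent factor rather than computed from raw scores, and the invariance restrictions on factor loadings determine whether such shares are comparable across indicators.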