The Influence of Rapid Guessing on Group Comparisons with Plausible Values
Eva Zink, Jana Welling, Timo Gnambs
Leibniz Institute for Educational Trajectories, Germany
Accurate estimation of achievement gaps in large-scale assessments relies on valid and unbiased measures of competence. However, test-taking disengagement, particularly rapid guessing, can distort ability estimates and bias group comparisons. Existing approaches for mitigating the distorting effects of rapid guessing on competence estimates focus mainly on point estimates of ability, disregarding the fact that competence analyses are often based on plausible value estimates, which are corrected for measurement error.
This study aims to (a) determine the impact of rapid guessing on achievement gaps based on plausible values and (b) propose and evaluate approaches for accounting for rapid guessing in plausible value estimation. Four models were compared: (1) a baseline model that did not account for rapid guessing but included the group variable in the background model, (2) a person-level model that incorporated response time effort as an additional covariate in the background model, (3) a response-level model that filtered out all responses with item response times below a predetermined threshold, and (4) a combined model that merged the person-level and response-level approaches by filtering responses flagged as rapid guesses while including response time effort as a covariate in the background model.
Using both a simulation study and empirical data from the National Educational Panel Study, we evaluate these approaches and offer methodological recommendations for improving the estimation of achievement gaps in the presence of rapid guessing.
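As a minimal illustration of the person-level and response-level adjustments compared above, the sketch below flags rapid guesses against a predetermined response time threshold, derives a response time effort covariate, and filters flagged responses before scaling. The data frame, column names, and the 3-second threshold are illustrative assumptions, not part of the study.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format response data: one row per person-item pair,
# with the scored response and the item response time in seconds.
resp = pd.DataFrame({
    "person": [1, 1, 1, 2, 2, 2],
    "item":   ["i1", "i2", "i3", "i1", "i2", "i3"],
    "score":  [1, 0, 1, 0, 0, 1],
    "rt":     [34.2, 2.1, 41.0, 28.5, 3.0, 1.8],
})

# Predetermined rapid-guessing threshold (illustrative; in practice,
# thresholds are derived from the item response time distributions).
THRESHOLD = 3.0
resp["rapid_guess"] = resp["rt"] < THRESHOLD

# Person-level model: response time effort, the proportion of responses
# not flagged as rapid guesses, entered as a background-model covariate.
response_time_effort = 1 - resp.groupby("person")["rapid_guess"].mean()

# Response-level model: treat flagged responses as not administered
# before plausible values are drawn.
resp.loc[resp["rapid_guess"], "score"] = np.nan
```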
Validation of an accumulator model for persistence in cognitive tests
Sören Much1,4, Augustin Mutak2, Steffi Pohl3, Jochen Ranger4
1Universität Leipzig, Germany; 2University of Zagreb, Croatia; 3Freie Universität Berlin, Germany; 4Martin-Luther-Universität Halle, Germany
Interindividual variation in effort in low-stakes cognitive tests is challenging for psychometric modelling. Previous successful approaches rely on threshold-based or model-based identification strategies, which are inherently dichotomous. We present the results of a validation study of a process model that incorporates test-taking engagement as a continuous facet of each item response process. This accounts for the fact that test-takers can solve parts of an item and generate a response based on their current progress when they disengage.
The model is based on the Linear Ballistic Accumulator model (LBA; Brown & Heathcote, 2008), which describes a race of information accumulators for each response option towards a common response threshold. In our extension, we assume two accumulation processes that generate correct and incorrect responses and a third accumulator that represents a test-taker’s persistence. When the persistence accumulator wins the race, the response is determined by the state of the other two accumulators (correct, incorrect, or an omission). The parametrization includes person and item parameters.
In the preregistered validation study for this model, we collected online data from N = 1,244 participants who completed a matrix reasoning test. The dataset (Much & Mutak et al., 2025) includes, among other things, self-report and behavioral measures of test-taking effort. Relaxing the assumptions about the trait distributions to accommodate the heterogeneous sample yielded good model fit. Trait estimates of persistence correlate strongly with the behavioral effort measure, providing evidence for the validity of the interpretation of the model parameters.
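The following sketch simulates the three-accumulator race described above for a single item. The parameter values, the positivity constraint on drift rates, and the rule used to resolve a response when the persistence accumulator wins are assumptions made for illustration, not the authors' exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_lba_persistence(v_correct, v_incorrect, v_persist,
                             A=1.0, b=2.0, s=0.5, n_trials=10_000):
    """Simulate a three-accumulator LBA-type race for one item:
    a correct, an incorrect, and a persistence accumulator."""
    drifts = np.column_stack([
        rng.normal(v_correct, s, n_trials),
        rng.normal(v_incorrect, s, n_trials),
        rng.normal(v_persist, s, n_trials),
    ]).clip(min=1e-6)                      # crude positivity constraint
    starts = rng.uniform(0, A, size=(n_trials, 3))
    finish = (b - starts) / drifts         # linear ballistic finishing times

    rt = finish.min(axis=1)
    winner = finish.argmin(axis=1)         # 0 = correct, 1 = incorrect, 2 = persistence

    # When persistence wins, the emitted response depends on the state of
    # the two response accumulators at that moment (assumed rule: the one
    # with more accumulated evidence; omission if neither passed b / 2).
    evidence = starts[:, :2] + drifts[:, :2] * rt[:, None]
    resolved = np.where(evidence.max(axis=1) >= b / 2,
                        evidence.argmax(axis=1), 2)   # 2 codes an omission
    response = np.where(winner < 2, winner, resolved)
    return response, rt
```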
Incorporating Attention Check Items into a Mixture IRT Model: An Experimental Evaluation of the Detection of Careless Responding
Irina Uglanova1, Gabriel Nagy1, Esther Ulitzsch2,3
1Leibniz Institute for Science and Mathematics Education; 2Centre for Research on Equality in Education, University of Oslo; 3Centre for Educational Measurement, University of Oslo
Self-report surveys are often compromised by careless and insufficient effort responding (C/IER), wherein respondents provide answers without fully considering the item content. Detecting C/IER is critical for maintaining data quality. In substantive research, detection typically relies on attention check items, that is, items purposefully designed to identify inattentive respondents by prompting predictable responses. In contrast, advanced psychometric approaches employ mixture item response theory (IRT) models, classifying respondents as attentive or careless based solely on their responses to substantive items.
This study integrates these two streams of research by evaluating an extended mixture IRT model that incorporates attention check items with additional constraints tailored to C/IER behavior. We compared this extended model to the initial mixture IRT model (based solely on substantive item responses) using data collected through an experimental design. The experimentally manipulated conditions were designed to either provoke or prevent C/IER through variations in survey instructions and length.
The initial model identified 90.6% of respondents as attentive, while the extended model classified 86.2% as such. The two models demonstrated substantial agreement, with a classification correlation of .79. Both models detected a significantly greater proportion of C/IER responses in the C/IER-evoking condition compared to the C/IER-preventing condition, but no significant difference between the C/IER-preventing and baseline conditions. These findings suggest that the extended model applies a stricter classification criterion without reducing sensitivity to experimental manipulations.
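As a stylized sketch of the two-class mixture idea underlying both models, the function below evaluates one respondent's marginal log-likelihood at a fixed ability value. The uniform-responding careless class, the treatment of attention checks as items whose attentive-class probabilities concentrate on the instructed category, and all names are simplifying assumptions rather than the authors' exact specification (which, for instance, marginalizes over ability).

```python
import numpy as np

def person_loglik(responses, attentive_probs, n_categories, pi_attentive):
    """Marginal log-likelihood of one respondent under a two-class mixture:
    an attentive class following an IRT model (category probabilities at a
    fixed ability value supplied via `attentive_probs`) and a careless class
    responding uniformly at random over categories."""
    # attentive_probs: shape (n_items, n_categories); attention check items
    # enter as items whose attentive-class probabilities are concentrated
    # on the instructed category. `responses` holds integer category codes.
    idx = np.arange(len(responses))
    ll_attentive = np.log(attentive_probs[idx, responses]).sum()
    ll_careless = -len(responses) * np.log(n_categories)
    # Combine the two latent classes on the log scale (log-sum-exp).
    return np.logaddexp(np.log(pi_attentive) + ll_attentive,
                        np.log(1.0 - pi_attentive) + ll_careless)
```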
Incorporating Disengagement Indicators in Differential Effect Analysis: An Empirical Comparison of Modeling Strategies
Marie-Ann Sengewald1, Janne v. K. Torkildsen2, Jarl K. Kristensen2, Esther Ulitzsch2
1Leibniz Institute for Educational Trajectories, Germany; 2University of Oslo, Norway
Various methods have been proposed to utilize response time data from computer-based assessments to account for disengaged responses. In psychometric analyses, the primary goal is to improve the assessment design and enhance data quality. With the growing use of digital treatments, such as learning apps, response time data also provide insights into how participants interact with these apps, which can be valuable for evaluating their effectiveness. Key questions regarding the use of response time data for substantive analyses include how to effectively integrate psychometric and substantive analysis approaches, as well as the validity of disengagement indicators.
To address these questions, we provide new empirical insights based on a study by Torkildsen et al. (2022), who developed an app-based morphological training program and evaluated its effectiveness in a randomized controlled trial involving 717 second-grade students. Specifically, we (a) demonstrate the construction of indicators for baseline disengagement using different approaches and their integration into differential effect analysis, and (b) compare their benefits for evaluating the effectiveness of the treatment.
Our findings reveal that disengagement indicators provide valuable information not only for enhancing data quality but also for gaining a better understanding of who benefits most from the treatment. However, substantial differences in effect sizes emerge depending on the chosen approach for constructing disengagement indicators and the assumptions underlying the differential effect analysis. We discuss the implications and limitations of the presented modeling approaches, as well as directions for future methodological developments to effectively use response time data in evaluating digital treatments.
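As a minimal illustration of including a baseline disengagement indicator in a differential effect analysis, the sketch below regresses a post-test outcome on the treatment indicator, the disengagement indicator, and their interaction. The simulated data, variable names, coefficient values, and the simple linear moderation model are assumptions made for illustration and do not reproduce the analyses reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial data: randomized treatment indicator, post-test
# outcome, and a baseline disengagement indicator derived from response
# times (e.g., the proportion of rapid responses during the pre-test).
rng = np.random.default_rng(7)
n = 717
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "disengaged": rng.beta(2, 8, n),       # toy proportion of rapid responses
})
df["outcome"] = (0.4 * df["treatment"]
                 - 0.8 * df["disengaged"]
                 - 0.5 * df["treatment"] * df["disengaged"]
                 + rng.normal(0, 1, n))

# Differential effect analysis: the treatment-by-disengagement interaction
# captures how the treatment effect varies with baseline disengagement.
model = smf.ols("outcome ~ treatment * disengaged", data=df).fit()
print(model.summary())
```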