Conference Agenda

Overview and details of the sessions of this conference.
Session Overview
Session: Response Scales
Time: Wednesday, 01/Oct/2025, 9:00am - 10:30am
Session Chair: Tanja Kutscher
Location: Raum L 116

Presentations

When Preferences Matter: The Impact of Rating Scale Preferences on Survey Responses and Measurement Quality

Tanja Kutscher1, Michael Eid2

1Leibniz Institute for Educational Trajectories, Germany; 2Freie Universität Berlin, Germany

Rating scales are prone to individual response tendencies, which can compromise measurement quality, particularly when response formats do not align with respondents' preferences. Building on Tourangeau et al.'s (2000) cognitive response model, this study examines how the use of preferred versus non-preferred rating scales affects response behavior, measurement invariance, and correlations with external variables.

In a large-scale online study (N = 6,836) conducted via Amazon Mechanical Turk, two constructs, flourishing and job satisfaction, were assessed using three rating scales (4-, 6-, and 11-point). Participants were randomly assigned to one of the rating scales and subsequently indicated their preferred rating scale. This design enabled systematic comparisons between preference-congruent and incongruent groups.

Measurement invariance was tested using item response theory with multiple-group generalized partial credit models (GPCMs). In addition to the primary trait, category-specific preferences were estimated with an extended multidimensional partial credit model based on Bolt et al. (2014). Post-hoc analyses included principal component analyses and multinomial logistic regressions to explore dominant response patterns and differences between preference groups.
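
For readers unfamiliar with the GPCM, the minimal Python/NumPy sketch below computes its category probabilities; multiple-group invariance testing then amounts to constraining the item parameters across preference groups. The function name and parameter values are illustrative and not taken from the study.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities of the generalized partial credit model (GPCM).

    theta : float      latent trait value
    a     : float      item discrimination
    b     : (K,) array step (threshold) parameters for categories 1..K

    Returns the probabilities of the K+1 response categories 0..K.
    """
    # Cumulative sums of a * (theta - b_j); category 0 has an empty sum (= 0).
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    expz = np.exp(steps - steps.max())  # subtract the max for numerical stability
    return expz / expz.sum()

# Example: a 4-point item (categories 0..3) with three step parameters.
print(gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.0]))
```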

Results revealed partial measurement invariance across preference-congruent and incongruent groups, particularly at the metric and scalar levels. Systematic differences in response behavior emerged between groups that preferred a particular rating scale and those that did not. However, preferences had only limited effects on correlations with external variables.

Implications for the construction of psychological measures and for considering individual response format preferences in survey research are discussed.



Theoretical and Empirical Evidence For High Data Quality With Visual Analogue Scales

Tim Angelike1,2, Frederik Funke1, Ulf-Dietrich Reips1

1University of Konstanz, Germany; 2Heinrich Heine University Düsseldorf, Germany

In three independent studies, we investigated the relationship between the type of response scale (discrete vs. continuous) and the resulting formatting error, defined as the disparity between the true value and a given response.

In Study 1, we simulated responses from various bounded continuous distributions and transformed them into discrete response options to quantify the extent of error associated with discrete rating scales. A substantial interaction was observed between the number of response options and the actual distribution of true values. Furthermore, we examined how response styles that favor options at the center or at the extremes of the response scale affect formatting error. Such response styles generally increased formatting error and amplified the interaction between the number of response options and the distribution of true values.

Studies 2 and 3 were Web survey experiments in which respondents were randomly assigned either to visual analogue scales (VASs) or to 5-point scales (Study 2), or to VASs, 5-point, 7-point, or 9-point scales (Study 3). Both studies provide empirical evidence that VASs outperform discrete rating scales at the item level with regard to the standard error of measurement. Overall, VAS measurement is subject to significantly less error, resulting in narrower confidence intervals and greater statistical power and facilitating the detection of small effects that may not be observable with discrete rating scales.
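
The abstract does not specify which distributions were simulated; the sketch below illustrates the general logic of Study 1 under the assumption of Beta-distributed true values on [0, 1], snapping each value to the nearest point of a k-point scale and reporting the mean absolute formatting error.

```python
import numpy as np

rng = np.random.default_rng(1)

def formatting_error(n_options, alpha, beta, n=100_000):
    """Mean absolute formatting error when true values on [0, 1] are
    forced onto an n_options-point discrete scale.

    True values are drawn from a Beta(alpha, beta) distribution (an
    assumption for illustration); each response is snapped to the
    nearest of n_options equally spaced scale points, and the error
    is |true - response|.
    """
    true = rng.beta(alpha, beta, size=n)
    points = np.linspace(0.0, 1.0, n_options)  # discrete options on [0, 1]
    responses = points[np.argmin(np.abs(true[:, None] - points), axis=1)]
    return np.mean(np.abs(true - responses))

# Error shrinks as options increase, and depends on the shape of the
# true-value distribution (uniform vs. strongly skewed):
for k in (5, 7, 9):
    print(k, formatting_error(k, alpha=1, beta=1), formatting_error(k, alpha=5, beta=1))
```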



Detecting Response Styles in Slider Scales

Dominik Vollbracht1, Mirka Henninger2, Tanja Lischetzke1

1RPTU University Kaiserslautern-Landau, Germany; 2University of Basel, Switzerland

Slider scales (or visual analogue scales) are frequently used in psychological research as continuous alternatives to traditional Likert-type scales. While their psychometric properties have been examined, research on response styles (systematic biases affecting data validity and reliability) is limited. Common response styles include extreme response style, midpoint response style, and acquiescence response style.
Analyzing response styles in slider data is challenging due to the lack of appropriate statistical methods. To address this gap, we developed the Beta Item Response Model with Response Styles (BIRM-RS), an extension of the Beta Item Response Model. The BIRM-RS explicitly models extreme response style, midpoint response style, and acquiescence response style, enabling the detection and interpretation of response biases in slider scales and other bounded continuous data.
We are currently evaluating the BIRM-RS through simulations assessing its accuracy in recovering item and person parameters, including response style parameters, under varying conditions (such as sample size and parameter variability). Performance metrics include root mean square error, bias, and correlations between estimated and true parameters. An empirical application further demonstrates the practical usefulness of the BIRM-RS.
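
The exact BIRM-RS parameterization is not given in the abstract; as a point of reference, the sketch below assumes a common mean-precision form of the base Beta item response model for bounded responses in (0, 1), which the response-style extension would augment with additional person parameters for extreme, midpoint, and acquiescent responding.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import beta as beta_dist

def beta_irm_loglik(y, theta, b, phi):
    """Log-likelihood of a single slider response y in (0, 1) under a
    mean-precision Beta item response model (assumed parameterization).

    The expected response mu = logistic(theta - b) increases with the
    latent trait theta; phi governs response precision (larger phi
    means responses cluster more tightly around mu).
    """
    mu = expit(theta - b)
    return beta_dist.logpdf(y, mu * phi, (1.0 - mu) * phi)

# Example: a respondent with theta = 1 answering an item with location b = 0.
print(beta_irm_loglik(y=0.8, theta=1.0, b=0.0, phi=10.0))
```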



Measuring Ambivalence of Affective Normative Words Using Dual Range Sliders

Matthias Kloft

Philipps-University Marburg, Germany

Affective normative words like "fire" or "hospital" are used as stimuli in research on emotion and attention. The normative valence ratings for these words were initially collected using a bipolar rating scale from negative to positive. This comes with the challenge that neutral valence ratings at the center of the scale might mask ambivalence instead of indicating true neutrality. For example, the word "fire" can evoke both positive (bonfire) and negative (forest fire) associations for a respondent. Prior studies have investigated ambivalence by employing distinct rating scales for positive and negative valence. We propose an alternative method: interval ratings on a bipolar scale collected via the dual range slider response format, where respondents can set a lower and an upper bound for the valence from negative to positive. In a study involving psychology students, we demonstrate that interval ratings are effective in measuring the ambivalence of affective normative words, performing comparably to distinct unipolar scales.
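
To illustrate how such interval ratings can be scored (an assumed scoring convention, not necessarily the study's exact method), the interval midpoint can serve as the valence estimate and the interval width as the ambivalence estimate:

```python
def interval_scores(lower, upper):
    """Summarize a dual-range-slider valence rating on [-1, 1].

    The midpoint is taken as the valence estimate and the width as
    the ambivalence estimate (wider interval = more ambivalent).
    Illustrative scoring under assumed conventions.
    """
    return {"valence": (lower + upper) / 2, "ambivalence": upper - lower}

# "fire": both strongly negative and strongly positive associations,
# yielding a near-neutral valence but high ambivalence.
print(interval_scores(lower=-0.8, upper=0.7))
```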