Conference Agenda

Overview and details of the sessions of this conference. Select a date or location to show only the sessions held on that day or at that location, or select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session: Large Language Models
Time: Monday, 29/Sept/2025, 11:30am - 1:00pm

Session Chair: Rahel Franziska Geppert
Location: Raum L 113

Presentations

Agent-Based Model on a Quantum Computer

Rahel Franziska Geppert1,2, Hedwig Körfgen2, Sabine Tornow2

1Health and Medical University Erfurt, Germany; 2University of the Bundeswehr Munich, Germany

Agent-based modeling (ABM) has long been used to explore and predict patterns of societal polarization. Building on previous work, we have proposed an extension that introduces agent heterogeneity based on empathy levels, enhancing the model’s predictive power by allowing agents to react differently to opposing views. In this contribution, we compare two distinct implementations of this empathy-based polarization model. Version A is a traditional ABM developed using repast4py, simulating interactions through sequential time steps where agents evaluate and potentially change opinions based on predefined rules and neighbor states. Version B translates the same conceptual framework into a quantum computing context, where antecedents of opinion change are evaluated simultaneously via quantum state estimation. We discuss key differences in the formalization of these models. While Version A follows a classical rule-based paradigm with discrete agent state transitions, Version B reformulates these transitions using quantum superposition and entanglement to reflect the probabilistic nature of opinion shifts. Implementation-wise, Version A benefits from mature tooling and scalable computation on classical hardware, whereas Version B requires encoding agent states into qubits and using quantum gates to simulate interaction dynamics. Finally, we compare how outcomes are interpreted in both models. This comparative study highlights both the potential and current limitations of quantum approaches to modeling complex social dynamics.
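
The abstract leaves the concrete gate-level encoding of Version B open. As a minimal illustrative sketch, assuming Qiskit and a two-agent toy model (neither taken from the authors’ implementation), an agent’s binary opinion can be encoded in a qubit and a neighbor-dependent opinion shift expressed as a controlled rotation, so that measurement yields the probabilistic state transition that the classical model handles with rules:

```python
# Minimal illustrative sketch (not the authors' Version B): two agents,
# qubit 0 = agent A, qubit 1 = agent B; agent B's probability of changing
# its opinion is set by rotation angles, with an entangling controlled
# rotation for the influence of the disagreeing neighbor.
from math import asin, sqrt

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def opinion_shift_circuit(p_base: float, p_neighbor: float) -> QuantumCircuit:
    qc = QuantumCircuit(2)
    qc.x(0)                                   # agent A holds the opposing opinion
    qc.ry(2 * asin(sqrt(p_base)), 1)          # baseline tendency of agent B to shift
    qc.cry(2 * asin(sqrt(p_neighbor)), 0, 1)  # extra rotation when agent A disagrees
    return qc

# Probability that agent B ends up in the "changed opinion" state |1>
state = Statevector.from_instruction(opinion_shift_circuit(0.1, 0.4))
p_flip = sum(p for bits, p in state.probabilities_dict().items()
             if bits[0] == "1")               # qubit 1 is the leftmost bit
print(f"P(opinion change of agent B) = {p_flip:.3f}")
```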



From Text to Context: Conversational Large Language Models as Automated Data Extraction Tools

Isabel Mertins, Fridtjof Petersen, Laura F. Bringmann

Department of Psychometrics and Statistics, University of Groningen, Groningen, The Netherlands

Background: Recently, there has been a rise in qualitative information gathered with experience sampling methods (ESM). However, most researchers lack the appropriate tools to analyze these vast amounts of qualitative responses effectively. Conversational Large Language Models (LLMs) are a promising aid for reducing the burden of manual coding and for increasing consistency among coders. LLMs may assist in the manual coding process by automatically extracting relevant information from participants’ responses and by identifying missing information. In the current study, we provide a proof of concept that contextual information about daily activities can be extracted using LLMs.

Methods: We use data from an ESM study in which participants describe stressful daily events in free-text format. The gathered descriptions are manually coded for participants’ location, time, and company. The same event descriptions are used as input for local LLMs, i.e., “deepseek-r1:7b” and “gemma3:4b”. The LLMs are instructed using elaborate system prompts to extract participants’ location, time, and company from the event descriptions. Agreement between the manually coded information and that extracted by the LLMs is calculated.
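
As a hypothetical sketch of such an extraction step, assuming the ollama Python client and with the system prompt, field names, and example text invented for illustration (the study’s actual prompts are not given above), one call per event description might look roughly like this:

```python
# Hypothetical sketch of the extraction step (prompt and field names invented
# for illustration): a local LLM, called here through the ollama Python
# client, returns location, time, and company for one event description.
import json
import ollama

SYSTEM_PROMPT = (
    "You extract contextual information from short descriptions of stressful "
    "daily events. Return a JSON object with the keys 'location', 'time', and "
    "'company'. Use null for information that is not mentioned."
)

def extract_context(event_description: str, model: str = "gemma3:4b") -> dict:
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": event_description},
        ],
        format="json",  # request machine-readable output
    )
    return json.loads(response["message"]["content"])

# Example call; per-field agreement with the manual codes could then be
# computed, e.g. with a chance-corrected index such as Cohen's kappa.
print(extract_context("Had an argument with my supervisor at the office this morning."))
```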

Results: Analyses are currently being undertaken, and results will be presented at the conference.

Conclusion: Our results provide insights into how accurately LLMs can extract contextual information from qualitative data gathered with ESM. This may inform the utilization of LLMs as automated data extraction tools in ESM research.



Can AI Judge Like Humans? A Psychometric Framework

Aaron Petrasch

LMU Munich, Germany

The use of artificial intelligence (AI) in psychological research is growing rapidly. When AI is employed to complement or replace human raters, researchers must reconsider both the conceptualization of a “rater” or “judge” and the psychometric requirements it must fulfill. In this talk, I present a framework for treating AI systems as judges and argue that their validation must extend beyond “accuracy” metrics, such as mere criterion correlations. I introduce a set of psychometric techniques to compare AI and human judgments, including an extension of the Brunswikian Lens model to analyze the cues used to infer judgments. Drawing on empirical results from text‑based AI judgments, I demonstrate the similarities and differences between human and AI judges, and offer several recommendations for evaluating the circumstances under which AI can complement or replace human judges. I conclude that integrating AI as judges requires methodological standards no less rigorous than those applied to human raters, thereby ensuring that AI‑derived assessments are valid, reliable, and transparent.
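
As a minimal sketch of such a lens-model comparison, using simulated cues and judgments rather than the empirical data from the talk, human and AI judgments of the same targets can be regressed on the same cues so that cue utilization weights and achievement (judgment–criterion) correlations can be compared side by side:

```python
# Illustrative sketch (simulated data, not the talk's empirical results):
# human and AI judgments of the same targets are regressed on the same cues,
# so cue utilization weights and achievement correlations can be compared.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_targets, n_cues = 200, 4

cues = rng.normal(size=(n_targets, n_cues))                  # e.g. text-derived cues
criterion = cues @ np.array([0.5, 0.3, 0.0, 0.2]) + rng.normal(0, 0.5, n_targets)

# Simulated judgments: the humans and the AI weight the cues differently
human = cues @ np.array([0.6, 0.2, 0.1, 0.0]) + rng.normal(0, 0.6, n_targets)
ai = cues @ np.array([0.4, 0.4, 0.0, 0.3]) + rng.normal(0, 0.3, n_targets)

def lens_side(judgment):
    """Cue utilization weights (regression coefficients) and achievement r."""
    weights = LinearRegression().fit(cues, judgment).coef_
    achievement = np.corrcoef(judgment, criterion)[0, 1]
    return weights, achievement

for label, judgment in [("human", human), ("AI", ai)]:
    weights, r = lens_side(judgment)
    print(f"{label:5s} cue weights = {np.round(weights, 2)}, achievement r = {r:.2f}")
```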



Diagnostic evidence from graphical responses – using synthetic data for vision models

Sonja Hahn1, Leon Hammerla2, Ulf Kroehne1

1DIPF, Germany; 2Goethe University Frankfurt, Germany

Machine learning methods allow the development of classifiers for extracting diagnostic evidence from constructed text and graphical responses. Human-labeled training data can be expensive and hard to obtain and usually contains disagreements between raters; this is reflected in limited inter-rater reliabilities and in turn limits the performance of machine learning approaches trained on the labeled data. Detailed scoring rubrics that address both conceptual and realization variance may not only enhance inter-rater reliability but can also serve as a blueprint for generating artificial data for machine learning.

In the current contribution, we transfer a scheme known from the evaluation of text responses to graphical responses from a formative assessment. We show how classifiers can be trained with such synthetic training data, analogously to the traditional paradigm, and used for the evaluation of constructed graphical responses. Translating the evidence rules into the algorithmic generation of synthetic training data makes such data available in much greater quantities. We outline strengths as well as critical points, lessons learned, and directions for further research.
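
As a minimal sketch of this idea, assuming an invented evidence rule (“the sketched graph rises from left to right”) and Pillow for image generation, neither of which is taken from the actual assessment, a rubric rule can be turned into a generator of labeled synthetic responses:

```python
# Illustrative sketch (not the authors' pipeline): an invented evidence rule,
# "the sketched graph rises from left to right", is turned into a generator
# of labeled synthetic images that can stand in for human-labeled graphical
# responses when training a vision classifier.
import random
from PIL import Image, ImageDraw

def synthetic_response(rising: bool, size: int = 128) -> Image.Image:
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    y_left = random.randint(int(0.6 * size), int(0.9 * size))
    y_right = random.randint(int(0.1 * size), int(0.4 * size))
    if not rising:                          # violate the rule: swap the endpoints
        y_left, y_right = y_right, y_left
    # realization variance: jitter the line to mimic imprecise hand drawing
    points = [(x, y_left + (y_right - y_left) * x / size + random.randint(-3, 3))
              for x in range(0, size, 8)]
    draw.line(points, fill=0, width=2)
    return img

# A balanced synthetic training set with labels derived from the rule itself
dataset = [(synthetic_response(rising=bool(i % 2)), i % 2) for i in range(1000)]
```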



 