JavaScript is Disabled
Your browser's JavaScript functionality is disabled. It has to be enabled to use this function of ConfTool.
Here you can find information on how to enable JavaScript
If you have any problems, please contact the organizers at annualevent@dariah.eu.

Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions at that day or location. Please select a single session for detailed view (with abstracts and downloads if available).

Please note that all times are shown in the time zone of the conference. The current conference time is: 21st Apr 2026, 03:56:37pm CEST

Agenda Overview

Session

Topic: From Speech to Symbols: AI Systems Transforming Cultural Knowledge

Time:

Wednesday, 27/May/2026:

4:30pm - 6:00pm

Session Chair: German Rigau, University of the Basque Country

Location: Aula Bisconti

Università degli Studi Roma Tre – Dipartimento di Studi Umanistici, Via Ostiense, 234, 00146 Roma RM, Italy

Presentations

4:30pm - 4:45pm

The Silicon Scribe: Auditing the Algorithmic Colonization of Cultural Heritage in Large Language Models

Agnieszka Blanka Ziemińska

Pontifical University of John Paul II, Krakow, Poland

As Large Language Models (LLMs) increasingly serve as the primary interface for public engagement with cultural heritage, the infrastructure of memory is undergoing a profound transformation. Digital Humanities scholarship has frequently championed the generative potential of AI to foster new dialogues. However, an urgent need exists to audit the interpretative capacity of these systems to silence historical voices. This paper presents a novel computational methodology - High-Dimensional Vector Space Analysis - designed to quantify and visualize the "hermeneutical colonization" of non-Western and ancient concepts by commercially dominant models.

The study investigates how LLMs, trained predominantly on modern, Western, democratic corpora, structurally flatten the radical political vocabulary of the ancient world into compliant, modern categories. Using the Greek New Testament as a diagnostic case study for contested cultural heritage, we measured the "Semantic Displacement" of key political terms - specifically Basileia (Empire/Rule) and Ekklēsia (Assembly) - against modern semantic anchors.

Our findings reveal a statistically significant "Depoliticization Bias" within the vector geometry of state-of-the-art multilingual models (e.g., paraphrase-multilingual-mpnet-base-v2). Preliminary data indicates that the vector for Basileia aligns nearly twice as closely with the modern, spiritualized concept of "Kingdom" (0.59 similarity) than with its historical, political counterpart "Empire" (0.25 similarity). Similarly, Ekklēsia maps structurally to "Religious Institution" rather than "Democratic Assembly," effectively erasing the text’s original civic agency.

This phenomenon exceeds simple translation error; it constitutes a structural failure of digital infrastructure. If our "Infrastructures of Engagement" rely on these models, we risk automating a "monoculture of memory" where the past is rewritten to mirror the present status quo.

This paper proposes that true participatory engagement requires "Counter-Infrastructures": open-access, critical auditing tools that allow communities to visualize and contest the biases encoded in AI models. We outline the development of an open "Vector Atlas" - a tool enabling scholars and the public to "see" the distance between their cultural concepts and the model's representation of them. By making these hidden algorithmic displacements visible, we empower users to transform from passive consumers of AI content into critical auditors of AI memory.

This contribution aligns with the conference theme of "building resilient infrastructure" by demonstrating that resilience requires semantic sovereignty - the right of historical and cultural communities to define their own concepts against the flattening pressure of global algorithmic norms. We conclude by offering a framework for "Critical AI Hermeneutics" as a necessary literacy for digital citizenship in an age of automated interpretation.

4:45pm - 5:00pm

The Bigger Picture: A Participatory Pipeline for Re-imagining AI Imagery through Art-Science Collaboration

Emma Clarke¹, Helen Sheridan², Nic Flanagan³

¹ADAPT Centre, Dublin City University, Ireland; ²ADAPT Centre, TU Dublin, Ireland; ³Munster Technological University, Ireland

Current socio-technical imaginaries of Artificial Intelligence (AI) are rooted in speculative fiction - think humanoid robots and glowing brains. These depict AI as an abstract, futuristic phenomenon rather than a ubiquitous element of contemporary life. When the public imagines AI as sentient robots or superintelligence, they often don't recognise everyday AI applications like the recommendation algorithms curating their social media feeds (Leufer et al.; Cave and Dihal; Dihal and Duarte; Singler; Sartori and Bocca).
The Bigger Picture [1] is a participatory art-science initiative in Ireland that counters these narratives by shifting focus from "sci-fi" abstraction towards representations of AI grounded in everyday applications. Through participatory educational workshops (Hu), the project demystifies AI through collaborative inquiry and creative practice.
This paper presents a case study of The Bigger Picture through the lens of knowledge co-production (Jasanoff), examining how participants' experiential knowledge of AI shaped thematic frameworks and visual outputs, generating alternative ways of understanding emerging technologies that challenge established narratives.

The participatory pipeline comprises:
1. Participatory Educational Workshops function as co-production environments where participants explore AI through interactive activities focused on lived experiences. By introducing Explainable AI concepts (Ali et al.) alongside ethical issues such as algorithmic bias (Romele; Dubber et al.), workshops draw on participants' everyday AI experiences - from social media algorithms to voice assistants - as situated knowledge (Haraway) that challenges dominant visual stereotypes (Dihal and Duarte).

2. Thematic Translation: Workshop insights are synthesised into co-produced themes (e.g. AI is All Around Us, AI is Human, AI is Complex) emerging from participant deliberation rather than expert pre-determination. These inform a curated call for artistic submissions reflecting everyday AI rather than sci-fi tropes.

3. Exhibition and Digital Resources: The pipeline culminates in public exhibitions. Selected images are integrated into the Better Images of AI library [2] under a Creative Commons CC4.0 license.
The pipeline generated measurable impact: workshops engaged 79 participants (24 artists/technologists during Science Week [3]; 55 youth participants via Cruinniú na nÓg [4]), and yielded 20 exhibited works viewed by 1,913 visitors. Eleven images entered the Better Images of AI library and have since been reused in 90+ outlets including the Alan Turing Institute and international policy papers, demonstrating how local engagement can lead to open public resources.
The Bigger Picture demonstrates key insights about participatory infrastructure for emerging technologies. First, grounding technologies in everyday experience enabled participants to contribute situated knowledge that experts alone could not generate. Second, translating public insights into creative constraints shaped artistic outputs through collective inquiry. Third, CC licensing transformed ephemeral engagement into durable infrastructure: 90+ reuses suggest participatory projects can generate sustained public value when outputs are contributed to commons. This demonstrates how co-produced research conducted 'in conversation with' publics (Fitzpatrick) can inform engaged scholarship on emerging technologies.

[1] The Bigger Picture: https://thebiggerpictureai.com/
[2] Better Images of AI Image Library: https://betterimagesofai.org/images
[3] Science Week takes place nationwide annually in November Ireland: https://www.scienceweek.ie/
[4] Cruinniú na nÓg is a day of free creativity for young people that takes place nationwide annually in Ireland in June: https://cruinniu.gov.ie/

5:00pm - 5:15pm

Digital Humanities Hub: from silos to knowledge graphs and Neurosymbolic AI

Miquel Centelles Velilla, Núria Ferran-Ferrer

Universitat de Barcelona, Spain

This paper presents the Digital Humanities Hub at the University of Barcelona, an institutional initiative designed to consolidate dispersed research databases, strengthen public-facing digital infrastructures, and foster engaged scholarship. The Hub responds to a structural problem widely recognised in the humanities: while project-level datasets are abundant, specialised, and often publicly funded, they frequently remain fragmented, inconsistently documented, and at risk of long-term loss. Many originate from short-term projects, theses, or individual initiatives lacking sustainable maintenance, resulting in valuable knowledge being locked in silos and inaccessible to wider communities.

The Hub addresses this challenge by developing an interoperable, open, and socially oriented infrastructure grounded in three pillars: (1) standardisation and FAIR/LOD compliance, (2) knowledge-graph integration supported by neurosymbolic AI, and (3) participatory, human-centered design. Together, these elements aim to create an environment where datasets can be connected, explored, enhanced, and reused, enabling new research possibilities while expanding public engagement with humanities knowledge.

The first pillar focuses on harmonising data practices across the institution. Building on European standards and open data methodologies, Hub proposes a phased model for improving data quality and interoperability: inventory and localisation of existing resources; refinement and normalisation to address errors, inconsistencies, and divergent formats; and the adoption of FAIR and Linked Open Data principles. Through the use of ontologies, controlled vocabularies, and persistent identifiers, the infrastructure ensures semantic coherence and facilitates long-term accessibility via RDF- and SPARQL-based technologies. This systematic approach supports institutional goals by preventing the abandonment of smaller but culturally valuable datasets and by promoting sustainable, connected stewardship of humanities data.

The second pillar integrates knowledge graphs and neurosymbolic AI to enable reliable and transparent information retrieval. A multi-layered knowledge graph connects heterogeneous institutional databases with external resources such as Wikidata, supporting both human-readable exploration and machine reasoning. A Retrieval-Augmented Generation (RAG) layer, grounded in the curated graph, provides explainable generative outputs and mitigates hallucinations and biases typical of large language models. This hybrid approach transforms isolated datasets into interconnected semantic ecosystems, enabling more nuanced and relational analyses that reflect the complex structures of cultural and historical knowledge.

The third pillar centres on engaged, participatory methodologies. Researchers, students, librarians, Wikimedia contributors, and community stakeholders take part in co-design processes that shape the ontology, user interfaces, and interpretative layers of the graph. This human-centered dimension ensures accessibility, relevance, and ethical representation, especially important for datasets that document sensitive histories or historically marginalised groups. Participation is understood not only as a methodological choice but as a commitment to inclusive knowledge creation and responsible digital humanities practice.

Beyond technical innovation, the Hub foregrounds the institutional and societal value of digital research infrastructures. By adopting social-return-on-investment perspectives, it frames humanities data as a public good with significant cultural, educational, and civic impact. Transforming scattered datasets into a connected, participatory knowledge infrastructure demonstrates how universities can align open data practices, AI technologies, and public engagement to strengthen their role as democratic knowledge institutions.

This contribution argues that hybrid infrastructures are essential for advancing socially responsive, open, and resilient digital humanities.

5:15pm - 5:30pm

FrWhisper – An Open-Source, Verbatim Automatic Speech Recognition System for Older French Speakers

Hanno Müller¹, Annette Gerstenberg²

¹Hasso Plattner Institute, Germany; ²University Potsdam, Germany

Despite their usefulness, AI-supported automatic speech recognition (ASR) systems such as Whisper (Radford et al., 2022) often do not meet crucial academic standards of the humanities. First, Whisper-based transcripts reflect an idealised language, adapted to the standard written language through additions (e.g., of grammatically mandatory words) and omissions (e.g., of repetitions, interjections, and discourse markers, Lea et al., 2023). This increases the need for ASR systems producing non-idealised transcripts. Such ASR systems are currently available only for English (Wagner et al., 2024), reinforcing a language bias in the digital humanities. Second, many ASR systems exhibit a pronounced age bias (Fukuda et al., 2023), reflected by less accurate transcriptions for older speakers. Idealised language transcription and age bias impose severe limitations for the analysis of the procedural nature of spoken languages in the humanities and cultural heritage preservation, frequently relying on data from older speakers.

Against this background, this paper presents FrWhisper, an open-source ASR model for French that produces non-idealised transcripts and is specifically optimised for older speakers. FrWhisper is based on Whisper and was fine-tuned using two datasets from fieldwork corpora featuring spontaneous speech. Both datasets dominantly comprise speech from older speakers, from Orléans and its surroundings: The LangAge corpus (www.langage-corpora.org) consists of biographical interviews with mostly retired speakers (mean age = 80.1, SD = 9.3); the Enquête Sociolinguistique sur Orléans (eslo.huma-num.fr) is a large-scale interview corpus (mean age = 51.0, SD = 15.6).

Empirical evaluation shows that FrWhisper substantially outperforms Whisper, reducing Word Error Rate from 98.98% to 84.64% on a held-out validation set. Qualitative analyses further demonstrate that FrWhisper more reliably preserves linguistically meaningful features of spoken French, including interjections (e.g., euh), ne-deletion (e.g., c’est pas vs. ce n’est pas), and repetitions.

FrWhisper enables new forms of collaboration between academia, archives, libraries, and the public. It supports participatory research practices such as community-driven oral history projects, citizen-science annotation workflows, and public humanities initiatives that seek to make spoken cultural heritage accessible without erasing linguistic diversity. FrWhisper is openly available under the GNU General Public License v3.0, enabling reuse by researchers, memory institutions, and citizen-science initiatives. Its open documentation (Müller & Gerstenberg, 2025) further serves as a blueprint for developing verbatim ASR systems for other languages, aligning with DARIAH’s commitment to sustainable, ethical, and resilient research infrastructures.

References

Fukuda, M. et al. (2023). A new speech corpus of super-elderly Japanese for acoustic modeling. Computer Speech & Language, 77, 101424. https://doi.org/10.1016/j.csl.2022.101424

Lea, C. et al. (2023). From user perceptions to technical improvement: Enabling people who stutter to better use speech recognition. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–16). New York, NY: ACM.

Müller, Hanno & Gerstenberg, Annette. (2025): Transkription mit Fine-Tuned Whisper-Modellen. AI Service Centre. https://github.com/aihpi/pilotproject-FrWhisper

Radford, A. et al. (2022). Robust speech recognition via large-scale weak supervision. In Proceedings of the 39th International Conference on Machine Learning. https://doi.org/10.48550/arXiv.2212.04356

Wagner, L., Thallinger, B., & Zusag, M. (2024). CrisperWhisper: Accurate timestamps on verbatim speech transcriptions. arXiv preprint arXiv:2408.16589.

DARIAH Annual Event 2026

Rome, Italy. May 26–29, 2026

Conference Agenda