
DARIAH Annual Event 2025

The Past

Göttingen, Germany. June 17-20, 2025

 

Conference Agenda

Overview and details of the sessions of this conference.

Please note that all times are shown in the time zone of the conference (CEST).

 
 
Session Overview
Session: Demonstration Room
Time: Thursday, 19 June 2025, 11:30am – 5:30pm

Session Chair: Kim Ferguson, DANS
Location: Taberna (Alte Mensa venue)

Ground floor, Wilhelmsplatz 3, 37073 Göttingen, Germany

The schedule for extra demonstration sessions will be posted closer to the Annual Event.

Session Abstract

Each demonstration session will be 10 minutes in length.


Presentations

Teaching the Past with Future Tools: Digital Humanities in Historical Education

Mojca Šorn1, Neja Blaj Hribar1, Ana Cvek1, Ida Leonida Gnidovec1, Vojko Gorjanc1,3, Nataša Henig Miščič1, Tjaša Konovšek1, Tamara Logar1, Katja Meden1,2, Mihael Ojsteršek1, Kristina Pahor de Maiti Tekavčič1,3, Sergej Škofljanec1, Robert Vurušič1

1Institute of Contemporary History, Slovenia; 2Institut Jožef Stefan, Ljubljana, Slovenia; 3Faculty of Arts, University of Ljubljana, Slovenia

The Research Infrastructure of the Institute of Contemporary History (coordinating entity of DARIAH-SI) has long offered practical training in areas such as DH and library and information science.

In 2024, a new initiative was launched in cooperation with the Department of History at the Faculty of Arts, University of Ljubljana, aimed at providing a structured training programme for students. The result was a comprehensive course focusing on 19th-century research, combining in-depth lectures with practical training to strengthen students' historical research skills, in particular through digital methods.

The course was divided into four phases:

  1. Introduction to Research Infrastructure: students learnt about the research landscape in Slovenia, focusing on the Institute’s role, covering the entire data cycle, metadata processing and publication on the SIstory portal.

  2. Foundations of digital humanities: the next phase provided an overview of the field, covering its key definitions, methods and various outputs such as digital editions, databases, tools and corpora. Students were introduced to the digitisation pipeline, from raw content to structured corpora, along with basic XML and TEI encoding principles.

  3. Hands-on training: practical sessions reinforced the theoretical knowledge. Students worked with individual TEI XML files containing transcription errors, correcting them directly within the TEI framework (see the sketch after this list). Additionally, they were introduced to NoSketch Engine, where they followed search processes and explored linguistic data.

  4. Individual research projects: in the final phase, students applied their newly acquired skills to individual seminars.
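
As a rough illustration of the correction work in phase 3, the sketch below fixes a known erroneous reading in a TEI file; the file name and the specific error are hypothetical, and a real pass would also need to handle text in child elements and tails.

    # Minimal sketch of a TEI XML correction pass (phase 3 above), using only
    # the Python standard library. The file name and the erroneous reading are
    # hypothetical examples, not actual course data.
    import xml.etree.ElementTree as ET

    TEI = "{http://www.tei-c.org/ns/1.0}"

    tree = ET.parse("assembly_transcript.xml")      # hypothetical input file

    # Walk all <p> elements in the TEI namespace and correct the known error.
    for p in tree.getroot().iter(TEI + "p"):
        if p.text and "Krajnska" in p.text:         # hypothetical transcription error
            p.text = p.text.replace("Krajnska", "Kranjska")

    tree.write("assembly_transcript_corrected.xml",
               xml_declaration=True, encoding="utf-8")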

For their research seminar, students could choose from the following five modules:

  • Carniola Regional Assembly (corpus Kranjska 1.0): this module introduced students to the stenographic records of the Carniola Regional Assembly and allowed them to analyse the debates by tracking themes such as nationalism through key terms. The corpus-based analysis was to be contextualised through historical framing, secondary literature and newspaper sources.

  • Stenographic records of the First Yugoslavia (corpus yu1Parl 1.0 (1919-1939)): this module enabled research on specific topics and keywords within the parliamentary debates of the Assembly of the First Yugoslavia. Although this period extends beyond the 19th century (the course's focus), it allowed for comparative analysis with other parliamentary corpora.

  • Population censuses: this module allowed students to familiarise themselves with censuses and the SIstory transcription tool by transcribing and analysing a sample of the data. The module also covered the methodology of processing census data and its applicability in historical studies.

  • 19th- and 20th-century personalities: the module enabled students to study 19th- and 20th-century personalities associated with the Carniolan or Yugoslav assemblies (up to the 1920s) by analysing their work using archival sources, newspaper archives and relevant corpora for additional context.

  • Content addition to the History of Slovenia – SIstory portal: students processed the publications assigned to them through the entire data workflow, from acquisition to publication on the SIstory portal. They also analysed the publication in a broader historical context, using newspaper archives and relevant literature.

Each student worked with two mentors, one proficient in historical research methodology and the other in digital techniques. This ensured a well-rounded approach to theory and methodology, preparing students to integrate digital methods into historical research.



How to annotate those thousands of entities? Approaches to (semi-)automatic entity linking for scholarly editions.

Felix Helfer1, Thomas Eckart1, Uwe Kretschmer1, Johannes Korngiebel2, Martin Prell2, Margrit Glaser2

1Saxon Academy of Sciences and Humanities in Leipzig; 2Klassik Stiftung Weimar

Entity linking is the resolution of entity mentions to appropriate entries in a knowledge base. It represents a valuable, albeit laborious, enrichment for text-based data. Our poster presents work-in-progress experiments on annotating texts of the PROPYLÄEN project (a research project of the Saxon Academy of Sciences and Humanities in Leipzig in cooperation with the Klassik Stiftung Weimar / Goethe and Schiller Archive and the Academy of Sciences and Literature | Mainz) with linked entities, as well as the practical application of this enriched data in an entity-based, federated search engine.

The poster introduces the PROPYLÄEN project, which merges biographical data on Johann Wolfgang von Goethe (letters, diary entries and other testimonials) from four different sub-projects, enriches them and makes them available digitally in a fifth sub-project, most prominently on the project's research platform (https://goethe-biographica.de/). In the digital texts, mentions of people and places are to be linked to entries in the research database so:fie, preferably through automated or semi-automated processing. Initially, this will be tested on a subset of the data for which a register exists that documents all entities occurring in the text, but does not directly annotate them.

The poster discusses experiments at the SAW Leipzig on the linking process for this subset of the data. Entity linking can be divided into three sub-tasks. First, the detection of entity mentions in the text, which is often achieved via named entity recognition. Second, candidate search, i.e. the preselection of a subset of suitable candidate entries in the respective knowledge base to shrink the search space; in this case, however, the register already predefines the candidate sets. Third, candidate disambiguation, which for every entity mention ranks the candidate entries to find the most likely link. For this disambiguation, several research questions are explored in experiments or previewed: Does string-based matching (using edit distance or a similar metric) already give usable results? Can an embedding-based approach, drawing on additional information in the knowledge base, improve on it in a meaningful way? Can these embeddings be improved with further information, e.g. from external knowledge bases such as the Integrated Authority File / Gemeinsame Normdatei (GND) or Wikidata?
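
As a toy illustration of the string-based baseline raised in the first research question, the sketch below ranks a register's candidate entries for a mention by normalised string similarity, with difflib standing in for an edit-distance metric; the register IDs and labels are invented.

    # Candidate disambiguation by string similarity (a simple baseline).
    # Register IDs and labels below are invented for illustration.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        """Normalised string similarity in [0, 1], standing in for edit distance."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def disambiguate(mention: str, candidates: dict[str, str]) -> str:
        """Return the ID of the candidate whose label best matches the mention."""
        return max(candidates, key=lambda kb_id: similarity(mention, candidates[kb_id]))

    register = {
        "sofie:1234": "Johann Wolfgang von Goethe",
        "sofie:5678": "Ottilie von Goethe",
        "sofie:9012": "Walther Wolfgang von Goethe",
    }

    print(disambiguate("Joh. Wolfg. v. Goethe", register))   # -> sofie:1234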

These experiments are relevant beyond the context of the PROPYLÄEN project: especially for retro-digitised resources, registers of the relevant entities are often available, but without explicit annotations in the text itself. An application for automatically annotating such resources, possibly with human-in-the-loop quality assurance, could therefore help enrich them in a more resource-efficient manner.

Finally, a practical application of EL-annotated data is introduced: the integration of the annotated resource into EntityFCS, a federated, entity-based content search platform of the German research infrastructure consortium Text+. This allows users to query relevant text passages by entity IDs from the appropriate knowledge base, showcasing how such annotations can increase, among other things, the explorability and findability of a resource in the context of a research infrastructure.



Interdisciplinary Approaches in the Dariah.hub Poland e-infrastructure

Krzysztof Abramowski1, Aleksandra Nowak1, Marcin Heliński1, Bartosz Szymendera1, Tomasz Umerle2, Tomasz Parkoła1

1Poznan Supercomputing and Networking Center, Poland; 2The Institute of Literary Research of the Polish Academy of Sciences, Poland

Dariah.hub (2024-2025) aims to deliver a new, integrative platform for DARIAH-PL: the Interdisciplinary Research Platform. Through Dariah.hub, the Polish consortium is building on the infrastructure of complementary, distributed laboratories set up in the earlier Dariah.lab project (2021-2023) to deliver a new service: a platform that integrates state-of-the-art digital methods from various disciplines of the digital humanities. By uniting diverse disciplines, the platform expands research perspectives, encourages knowledge sharing, and fosters synergistic collaborations.

The platform is designed to facilitate interdisciplinary research by leveraging multidimensional data models, allowing for integration across spatial, temporal, and behavioral dimensions. This enables a more holistic approach to source material analysis, fostering connections between previously unlinked datasets and methodologies. Through advanced tools such as Optical Character Recognition (OCR), Handwritten Text Recognition (HTR), and Named Entity Recognition/Linking (NER/NEL), as well as a broader suite of research tools integrated at multiple levels, the platform empowers researchers to conduct cross-domain analyses with increased accuracy and speed.

A core feature of this infrastructure is its modular architecture, which streamlines the management of diverse data, including texts, images, and multimedia repositories relevant to multiple disciplines. The platform is integrated with multiple tools from the Dariah.lab toolkit, enabling cross-referencing of various sources and ensuring interoperability across disciplines. These tools are complemented by collaborative workspaces, where secure shared editing and transparent version control enable seamless multi-author engagement. Moreover, automated workflows and data pipelines facilitate interoperability between different tools, ensuring that researchers can connect diverse datasets without disciplinary or institutional limitations (see the sketch below). Open licensing frameworks support data sharing while reinforcing the interoperability standards crucial for large-scale, cross-institutional investigations. This ensures that research outputs remain accessible to a wide range of stakeholders, including academia, cultural institutions, and commercial entities interested in digital humanities applications.
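
Purely as an illustration of the workflow idea sketched above, the following toy pipeline chains two processing steps (OCR, then NER/NEL) over a data record; the step functions are hypothetical stand-ins, not actual Dariah.lab APIs.

    # Toy data pipeline: each step takes and returns a record dictionary.
    # The step bodies are placeholders, not real Dariah.lab tool calls.
    from typing import Callable

    Step = Callable[[dict], dict]

    def run_pipeline(record: dict, steps: list[Step]) -> dict:
        """Pass a record through each processing step in order."""
        for step in steps:
            record = step(record)
        return record

    def ocr_step(record: dict) -> dict:       # hypothetical OCR wrapper
        record["text"] = f"<recognised text of {record['scan']}>"
        return record

    def ner_step(record: dict) -> dict:       # hypothetical NER/NEL wrapper
        record["entities"] = ["<entities linked from text>"]
        return record

    result = run_pipeline({"scan": "page_001.tiff"}, [ocr_step, ner_step])
    print(result["entities"])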

At the heart of this endeavor is a dynamic feedback loop between data providers (e.g., archivists, cultural institutions, field researchers) and data consumers (e.g., historians, sociologists, anthropologists, archaeologists). This iterative exchange drives continuous enhancement of curated datasets while expanding the underlying knowledge graph. Drawing upon the project partners' long-standing tradition in high-performance computing, the platform's computational backbone manages resource-intensive tasks – from large corpus analyses to 3D reconstructions of archaeological sites – opening new lines of inquiry for scholars across an ever-wider range of fields.

By integrating advanced digital tools with established scholarly practices, the Interdisciplinary Research Platform paves the way for explorations that transcend conventional academic boundaries. Archaeologists and anthropologists alike can combine textual and spatial data to delve into cultural patterns, while sociologists utilize networked archives to contextualize both historical and contemporary social phenomena. Ultimately, this collaborative environment not only broadens the scope of digital scholarship, but also showcases how genuine interdisciplinary synergy can deepen and enrich humanities research at large.



Enhancing Digital Humanities Research in R: Accessing Finnish Cultural Heritage Data through the R Packages finna and finto

Akewak Jeba, Julia Matveeva, Leo Lahti

University of Turku, Finland

The integration and analysis of cultural heritage metadata are central to advancing research in the digital humanities. This demonstration presents two R packages, finna and finto, designed to facilitate seamless access to Finnish cultural heritage metadata and ontological resources, thereby enabling researchers to conduct comprehensive analyses within the R environment.

The finna package serves as an interface to the Finna API, which aggregates content from Finnish archives, libraries, and museums. It enables users to perform targeted searches, retrieve metadata, and analyse a diverse array of cultural artefacts. For instance, researchers can explore historical documents, images, and audio recordings pertinent to their studies, streamlining the data acquisition process. Complementing this, the finto package provides tools to access interoperable thesauri, ontologies, and classification schemes across various subject areas via the Finto service and Finto AI, its automated subject-indexing service. This functionality allows researchers to incorporate standardised vocabularies into their analyses, ensuring consistency and enhancing the interoperability of their research outputs.

Through this demonstration, attendees will gain insights into the capabilities of these packages, including practical examples of metadata retrieval and analysis. The session aims to showcase how finna and finto can be leveraged to enrich digital humanities research, particularly in projects involving Finnish cultural heritage materials. By integrating these tools into their workflows, researchers can enhance the depth and scope of their analyses, fostering new perspectives and discoveries in the study of the past.
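
For a flavour of the kind of search the finna package wraps, the sketch below queries the public Finna REST API directly from Python (endpoint and parameters as documented at api.finna.fi; the query term is only an example):

    # Query the public Finna search API and print the titles of the first hits.
    # The search term "Helsinki" is an arbitrary example.
    import requests

    resp = requests.get(
        "https://api.finna.fi/v1/search",
        params={
            "lookfor": "Helsinki",    # free-text search term
            "type": "AllFields",      # search across all indexed fields
            "limit": 5,               # number of records to return
        },
        timeout=30,
    )
    resp.raise_for_status()

    for record in resp.json().get("records", []):
        print(record.get("title"))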



Introducing the new DARIAH-Campus Content Management System

Vicky Garnett

DARIAH-EU, Ireland

Since its launch in 2019, DARIAH-Campus has grown into one of the prime destinations for reusable learning resources produced within the DARIAH ecosystem and beyond. The discovery platform now houses over 250 free, open, asynchronous training and learning resources, including our own 'captured event' format, covering a broad range of digital humanities-related topics such as Feminism in DH, automated text recognition (ATR), performing arts, and open science.

In its initial stages, contributions were made exclusively in Markdown using the git 'commit/push' method. This required prior knowledge of both Markdown syntax and the GitHub environment, or at the very least a steep learning curve to become comfortable with them. This became a barrier for many users and also led to delays in the publication of resources, as errors and edits were inevitable. In 2021, therefore, work began with a web developer at the ACDH-CH in Vienna on implementing a Content Management System (CMS) on top of GitHub, using Netlify CMS, with Vercel supporting deployments and previews. The CMS proved very popular with new and existing contributors, allowing them to see a near-real-time preview of their resource as they made edits and changes.

Nearly four years on, the needs of the community were no longer being met, as certain features are not possible within Netlify CMS. We therefore once again turned to our colleagues at the ACDH-CH in Vienna to develop a new content management system, this time using Keystatic. Keystatic is built with a modern, file-based approach that integrates seamlessly with Next.js and other modern web frameworks. It also provides a more flexible and extensible API, making it easier to customise and scale for different content management needs.

This demo will walk potential training content providers through the process of using the new content management system, demonstrating an example resource from the initial proposal stage through to final publication. Attendees will also be able to make a start on their own resources, or ask for guidance if they are already working on a resource and need assistance.



NewNa Segmentation App: An app to segment and dynamically interact with magazine pages

Tobias Johannes Kreten, Svend Thorbjörn Göke

Georg-August University Göttingen, Germany

This project, titled NewNa Segmentation App, evaluated the zero-shot capabilities of an existing Detectron2 model and developed an application to facilitate manual correction and expand training data to improve model performance. The automatic segmentation of advertisements in historical newspapers and magazines presents a challenge for optical layout recognition models trained on editorial newspaper pages. While models for this task are lacking, the Newspaper Navigator model (Lee et al. 2020; Lee and Weld 2020) offers a promising approach for detecting visual material in American historical newspapers. However, its effectiveness on other printed media, such as German cultural magazines from around 1900, remains uncertain. Well-segmented data is essential for further digital analyses, such as the use of multimodal models.

The Newspaper Navigator model was tested on 1,789 pages of the German cultural magazine “Die Jugend.” A manually annotated ground-truth dataset was created and converted into COCO format for consistency. The dataset was provided by our supervisor, Johanna Störiko, who annotated the data as part of her PhD dissertation at the Georg-August-Universität Göttingen, based on scans from Heidelberg University Library (https://doi.org/10.11588/diglit.3565). Initial results showed that while the model could detect advertisements, its accuracy was only around 63%, measured using Intersection over Union (IoU) with a 0.5 threshold, along with precision and recall, from which an F1-score was derived. This relatively low accuracy highlighted the need for a tool that enables efficient correction and annotation, reducing the effort required to generate high-quality training data. To address this, we developed an interactive segmentation application integrating the model with a MySQL database and providing an intuitive user interface. Figure 1 shows the database schema, which stores annotated advertisements and complete pages. Users can modify bounding boxes, delete incorrect predictions and add new segmentations.
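
For reference, the sketch below shows the two evaluation measures mentioned above: bounding-box IoU with the 0.5 match threshold, and an F1-score derived from precision and recall. The (x_min, y_min, x_max, y_max) box format is an assumption for illustration.

    # Intersection over Union for two axis-aligned boxes, plus the F1-score.
    def iou(box_a, box_b):
        """IoU of two (x_min, y_min, x_max, y_max) boxes."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    def f1(precision, recall):
        return 2 * precision * recall / (precision + recall)

    pred, truth = (100, 100, 300, 250), (110, 95, 310, 260)
    print(iou(pred, truth) >= 0.5)   # True: counts as a correct detection at 0.5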

To enhance usability, the application includes an interactive category legend and a toggle function between modes of operation. The Upload-Only mode lets users upload pages, segment them and download the results as structured JSON files and segmented images. The Database mode, designed for large-scale dataset creation, enables direct storage of segmented advertisements with metadata. This structured approach supports systematic data curation and model improvement. The evaluation showed that integrating machine-learning predictions into an annotation tool streamlines segmentation, even when model accuracy is suboptimal. By embedding automated suggestions in a user-friendly interface, the annotation workload is reduced while high-quality training data is generated under still human-driven interpretation.

This application demonstrates how pre-trained models can be adapted and reused in different research contexts. By integrating model predictions into an annotation tool, time can be saved while generating labelled data for future model training. Unlike many generic tools, this application is tailored to advertisement segmentation, making it highly optimised for its use case. At the same time, its flexible architecture allows adaptation to other image segmentation tasks, provided the outputs are structured in COCO format. This flexibility offers a promising avenue for further research in automated document analysis and the digital humanities. The NewNa Segmentation App is thus not just a technical innovation but a tool that helps address humanities research questions.



Towards interdisciplinary approaches to digital cultural heritage: GLAM Labs and data spaces

Alba Irollo1, Gustavo Candela2, Mahendra Mahey3, Katrine Hofmann4, Olga Holownia5, Nele Gabriëls6, Steven Claeyssens7

1Europeana Foundation, Netherlands; 2University of Alicante, Spain; 3Tallinn University, Estonia; 4Royal Danish Library, Denmark; 5International Internet Preservation Consortium, USA; 6KU Leuven Libraries, Belgium; 7National Library of the Netherlands, Netherlands

At the intersection of the cultural heritage and research sectors, the increasing availability of digital cultural heritage data has supported new ways of producing knowledge, conducting research, publishing results, and teaching in academic settings. The efforts put into digitising their collections and capturing born-digital archives have led GLAMs (Galleries, Libraries, Archives and Museums) and other organisations, such as universities, to focus on several factors. These include wider accessibility of the data and its potential for reuse, refining approaches to digital curation so that digital cultural heritage can support computational analysis, especially through digital humanities approaches. Emergent initiatives such as Collections as Data, FAIR, and the CARE principles for Indigenous data governance have emphasised the need for best practice when making data available on the basis of shared principles (Padilla, 2023; Carroll, 2020). Building on these principles, the International GLAM Labs Community has grown into a collaborative and interdisciplinary initiative aimed at promoting the publication, computational access and responsible use of data (Mahey, 2019; Candela, 2023; Candela, 2023b). It includes more than 90 institutions covering a wide diversity of domains, from GLAMs and beyond. In this context, collaborations with researchers and university support staff from different fields have become crucial in supporting the study of the past with a careful eye on the needs of the present and future.

- The European data space for cultural heritage fosters collaborations between the cultural heritage sector and academia, building on the experience of the Europeana Initiative and its long-standing partnerships with research infrastructures like DARIAH. These partnerships have led to joint efforts, especially in the digital humanities. A new trend deserves attention in this context: the increasing number of university courses focusing on digital cultural heritage across Europe. Acting as an observatory, the data space aims to foster exchanges around this topic, enhancing cooperation between academia and cultural heritage institutions in the educational paths of the next generation of (digital) curators.

- GLAM institutions are exploring new ways to make their content available in forms suitable for computational access. This section will introduce examples of the publication and reuse of digital collections by institutions such as the Royal Danish Library and KU Leuven, including a description of the challenges and opportunities that illustrate their approaches.

- The publication of digital collections suitable for computational use can be a difficult task for institutions. Best practices and guidelines for publishing such collections promote their adoption (Candela, 2023a). This section will introduce a selection of relevant and innovative initiatives that can be useful in this context, with examples of use based on a wide diversity of content, such as GLAM datasets and web archives.

- A final section will highlight examples of use and lessons learned from research with users of digital cultural heritage at the British Library, and from those providing services for the reuse of digital cultural heritage as data around the world in the GLAM Labs community.



 