Conference Agenda

Session

Poster Session

Time:

Thursday, 19/June/2025:

9:30am - 11:00am

Session Chair: Alexander Steckel, Göttingen State and University Library
Session Chair: Stefan Buddenbohm, Göttingen State and University Library

Location: Emmy-Noether Saal (Alte Mensa venue)

Ground floor, Wilhelmsplatz 3, 37073 Göttingen, Germany

Presentations

Workers’ Voices in the Digital Age: A Newspaper-Based Digital Collection on Portuguese Self-Management Movement

João Pedro Oliveira

Faculdade de Ciências Sociais e Humanas, Universidade NOVA de Lisboa, Portugal

Following the Portuguese Carnation Revolution (1974), workers autonomously organised in response to the severe economic and financial crises inherited from Estado Novo dictatorship. (Fontes & Cabreira, 2020) Without trade union support, they occupied and self-managed workplaces to improve living and working conditions. (Fontes & Cabreira, 2020)

This digital collection preserves news items from two historically significant Portuguese newspapers, Diário de Lisboa and Combate. Its primary objective is to foreground workers' arguments for workplace occupations while providing insights into their organisational structures, dynamics, and fluidity. By centring workers' perspectives, the collection contributes to a more nuanced understanding of their experiences. Informed by methodologies from various Digital Humanities Conferences, the project aligns with established best practices in the field. (Terrón Quintero et al., 2023)

The collection's development follows two key methodological approaches.[1] First, a critical analysis of newspapers extracts articles thematically relevant to the self-management movement. Second, to enhance accuracy and efficiency, PDF newspaper files are processed using the OCR tool, MasterPDF Editor, rendering documents searchable and facilitating keyword identification.[2] Initially, 316 news items from Diário de Lisboa were compiled into a dedicated spreadsheet[3], which then underwent refinement to ensure consistency and correct minor syntactical errors.[4] The final curation phase involved critically proofreading each item to maintain content quality available to the public.

This project highlights the notable lack of archival development concerning this movement. In the absence of a stable institution dedicated to preserving its memory, both academics and the public encounter significant challenges in accessing historical information. The project aims to facilitate scholarly and public engagement by developing the first fully dedicated digital collection on the Autogestão (Self-Management) movement in Portugal and centralising research within a unified digital platform.[5] By encouraging users to explore the collection and conduct further investigations across archival networks and research institutions, this initiative contributes to a broader understanding of the movement's historical significance. (Sinn, 2012)

Implementing the digital collection fosters engagement with academia and the wider public. By establishing a centralised platform, this project serves as a foundational resource for researchers, ensuring accessibility to primary materials. Additionally, future collaboration with archival institutions promotes research cooperation and advancing knowledge of self-management movements. Through this initiative, users are encouraged to critically examine the materials and contribute to expanding the historiographical discourse surrounding workplace occupations. By structuring and this collection, the project enhances both academic research and public awareness, reinforcing the importance of digital humanities in historical preservation.

[1] The methodological approaches are well defined within the “Development Stages of a Digital Collection”.

[2] To gain insight into the volume of extracted news items, consult the following URL: https://docs.google.com/spreadsheets/d/1SdDptcbeO2P5dl4Emz9-mXA-B8_HqnD4iKL5AVXFNDs/edit?usp=sharing

[3] To gain insight into the volume of extracted news items, consult the following URL: https://docs.google.com/spreadsheets/d/1C3YOPINsrzw1HmiwDfVuEHqoCkMBqVmWLlTU_CTYnKI/edit?usp=sharing

[4] To gain insight into the data refinement process, consult the image “Data Refinement through OpenRefine”.

[5] To understand how this digital collection will serve as a starting point in the academic research process, consult the image “Workers’ Voices in the Digital Age: A Newspaper-Based Collection on Portuguese Self-Management - A starting Point”.

Of Yak Shaving and Data Taming: Building an RDF ETL Pipeline for the CLSCor Graph

Lukas Plank, Katharina Wünsche

Austrian Academy of Sciences, Austria

The poster shall present our experiences and insights in building an RDF ETL Pipeline for the CLSCor Knowlegde Graph in the context in the CLS INFRA project. See the PDF abstract.

Shared History, Shared Data: Unlocking World War II Victim Databases for Public Engagement

Mojca Šorn¹, Marta Rendla¹, Andrej Pančur¹, Tamara Logar¹, Vid Klopčič², Matevž Pesek², Katja Meden^1,3

¹Institute of Contemporary History, Slovenia; ²Faculty of Computer Science, University of Ljubljana, Slovenia; ³Institut Jožef Stefan, Ljubljana, Slovenia

This contribution discusses the development of the research database entitled Victims Among the Population in the Territory of the Republic of Slovenia During and Immediately After the Second World War, developed within the SIstory portal. The collection is a systematic record of military and civilian persons who had the right of residence in the present-day Republic of Slovenia during the Second World War and the immediate post-war period (May 1940 – January 1946) and lost their lives due to wartime and (revolutionary) post-war violence or the consequences of war. Currently, there is data for more than 100,000 victims, representing 6,7% of the population at the time. Each victim’s identity is documented through personal data and information on the circumstances of death, comprising a total of 25 metadata fields.

The database is the result of research conducted by the Institute of Contemporary History between 1997 and 2012 as part of four major research projects. Originally, the database was designed to compile data from various historical sources while ensuring the accuracy and veracity of records through rigorous verification. However, due to the sensitive nature of the information and the ongoing war- and ideologically-charged discourse surrounding the WWII in Slovenia, only partial data was made publicly accessible in the early project phases. Specifically, details on the death of victims and status classification were collected but omitted from the publicly available records.

Recent legislative changes and the commitment to open research and public engagement prompted a shift toward greater accessibility. As a result, the database has been redesigned to not only provide unrestricted access to previously limited data, but also to enable public participation. The updated version now allows users to contribute additional information, comments and personal narratives within designated layers, promoting a more comprehensive and collaborative approach to historical documentation.

User registration is required via phone number and can then enrich the existing dataset either by correcting existing records (with support from relevant literature and sources if available) or by entering data about a new victim. The structured layers within the database ensure a clear distinction between data verified by the Institute of Contemporary History (original database) and contributions of individual users or affiliated institutions.

This project contributes to preserving historical memory and increasing transparency through open data and citizen participation. By enabling the public the opportunity to provide additional information and corrections, it promotes a more comprehensive and inclusive record. The transition to open access supports both scientific research and broader public engagement. Combining verified data with user contributions, the database provides a balanced approach to documenting a complex historical period while fostering collaboration in historical research.

Threads of the Past: Exploring Open Digital and Manually Extracted Data to Visualize Social Networks in María Lejárraga’s Legacy (1874–1974)

Dolores Romero López, Patricia García Sánchez-Migallón

Universidad Complutense de Madrid, Spain

From the perspective of content, this project examines María Lejárraga's
contributions to feminism, modernism, and theatre during Spain's Silver Age,
focusing on the social networks she engaged with and fostered from 1900 to 1936.
It highlights her ability to build connections in intellectual and social spaces despite
the constraints of her historical context. Lejárraga's role as a prominent yet often
overshadowed author due to her husband's influence, Gregorio Martínez Sierra, is a
central theme. The study concludes that her intellectual and social networks were
pivotal to her emancipation and the advancement of feminism in Spain.
This project offers a comparative analysis of manually extracted data from historical
printed sources and open digital datasets made available by national and
international institutions. By assessing both types of data, we explore their potential
contributions to a social network visualization project while uncovering critical
disparities in data reliability, completeness, and accessibility, emphasizing the
necessity of expert-curated data for rigorous humanities research.
In the specific case of this poster, the data and metadata provided by public
institutions regarding the life and literary production of the Martínez Sierra spouses
have been tested. These institutions include the Biblioteca Nacional de España, the
Residencia de Estudiantes, the Biblioteca Virtual Miguel de Cervantes, HathiTrust,
and the Revistas Culturales 2.0 Web. We also try to downlow data from Wikidata,
but the quantity of the data was insufficient. The aim was to verify that these
institutions do not provide the necessary open digital data to analyse socialization
networks. We were only able to download open data from the Biblioteca Digital
Mnemmosine, available on GitHub and Zenodo. Thanks to this open data and the
raw data from expert bibliography (see below), we have been able to create our own
database, which can serve as a foundation for future research on other authors of
the so-called Silver Age of Spanish literature (1900-1939).
The project also proposes a workflow that directly relies on European
infrastructures such as DARIAH. First, to find a useful social network visualization
tool, we searched in the SSHOC Open MarketPlace and found Gephi as a very
powerful option. On the DARIAH Campus, we found many training materials to learn
how to use the tool, and after data curation and visualization creation, the resulting
dataset was uploaded to the Zenodo repository and then linked back to the SSHOC
Open MarketPlace. In this way, we ensure the circularity of the research, the
application of FAIR principles, and our commitment to Open Science. In the future,
we will create a workflow in SSHOC with these best practices, thanks to the
feedback we hope to receive at the DARIAH Annual Event 2025 and publish the main
results in Open Access.

A Human- and Machine-Readable Thesaurus for the Conservation of Archaeological Heritage - Development, Technical Implementation and Application in digital space

Kristina Fischer, Lasse Mempel-Länger

Leibniz-Zentrum für Archäologie (LEIZA), Germany

Access to the latest research on examination methods, damage phenomena, conservation techniques and materials, as well as preventive measures is essential for conservators. The evolution of conservation ethics, utilised materials and techniques over time provides valuable insights into the challenges and decisions faced by previous generations of conservators. By studying historical conservation practices, we can critically assess their long-term effects, learn from past experiences, and refine current approaches. However, a major issue in leveraging historical conservation knowledge is the scattered and inconsistent nature of the data landscape. Conservation records are stored in different formats, distributed across various databases, and described using heterogeneous terminologies. Experts from different periods and disciplines have used distinct terms for similar concepts, creating inconsistencies that hinder data retrieval, comparison, and interdisciplinary collaboration. The lack of standardised terminology not only complicates historical analysis but also impedes the development of interoperable digital resources for archaeological conservation [1-4].

To address these challenges, the "Conservation and Restoration Thesaurus for Archaeological Cultural Heritage" was developed at the Leibniz-Zentrum für Archäologie (LEIZA) as part of NFDI4Objects [5]. One of the key objectives of this thesaurus is the systematic extraction of used terminology from digital and retro-digitised conservation reports from approximately 170 years of conservation history at LEIZA (formerly Römisch-Germanisches Zentralmuseum), supported by technical innovations like Natural Language Processing (NLP). By identifying hierarchical, equivalent, and associative relationships amongst the terms, the thesaurus enables a systematic connection between past and present knowledge, ensuring that past experiences remain accessible and reusable for future generations. This controlled vocabulary not only facilitates human communication by standardising terminology but also enables machine-readable, semantically structured data integration for the Semantic Web [4]. Based on the Simple Knowledge Organization System (SKOS) [6], the thesaurus ensures interoperability and follows the FAIR principles (Findable, Accessible, Interoperable, Reusable) [7]. The content classification also aligns with ISO and DIN standards [8, 9].

To support the creation of this vocabulary, a user-friendly web application was developed [10]. This application validates tabular vocabulary data against the SKOS-based schema, identifying errors like duplicate identifiers or incorrect hierarchies. It provides multiple visualisation options for hierarchical structures and enables collaborative content development through an interactive comment function. The validated data can be exported as an RDF turtle or JSON file and integrated into central thesaurus repositories such as DANTE [11] and SkoHub [12] or Linked Open Data platforms like Wikidata [13]. The collaboration between domain experts and software engineers ensured both user-friendliness for professionals and machine readability for digital applications.

At the DARIAH Annual Event 2025, the methodological and technical aspects of the thesaurus development will be presented in a poster session. The presentation will highlight the practical benefits of structured vocabularies, with a particular emphasis on how standardised terminology can help unlock conservation knowledge of the past. The official release of the Conservation Thesaurus version 1.0 is anticipated before the conference, showcasing how semantic technologies can contribute to preserving and enhancing knowledge, as well as overcoming long-standing challenges in communication and data integration within the field of tangible cultural heritage.

Finding Long-Term Solutions for GRETIL, a Large Indologist Corpus

David Herting¹, José Calvo Tello¹, Maximilian Mehner²

¹Georg-August-Universität Göttingen, SUB Göttingen; ²Philipps-Universität Marburg

Many digital pioneers in the humanities who started in the 1990s and 2000s are now struggling to keep up with the current digital world. Not only are expectations increasing, but many projects are finding it difficult to maintain their original functionalities. After 20 years or more since their inception, the difficulties are not only technological, but also a lack of funding, diminished enthusiasm and the fact that the original leaders are no longer active and some have passed away.

One example of this is GRETIL, a collection of digital texts developed between 2001 and 2020. This resource is the largest repository of machine-readable Sanskrit texts and includes texts in other Indian languages. The corpus remains popular with scholars for quick reference and text mining, and has been incorporated into several ground-breaking digital humanities projects in Indology.

Although GRETIL relied on TEI to encode its texts in the final phase of the project, the project found ad-hoc solutions for many other issues, such as its own website, its own conversion system to HTML and plain text, its own collection of secondary literature in PDF, and even its own OPAC. Not least due to its early development, at a time when most suitable e-texts were not encoded in Unicode, a major technological update was inevitable after its founder Reinhold Grünendahl retired in 2016.

In 2022, the Text+ consortium was launched as part of the German National Research Data Infrastructure (NFDI) initiative. The main objective of the consortia in this initiative is to ensure the long-term accessibility of research data, to integrate existing solutions and, in general, to improve the FAIR status of the resources. A user story by Buchholz suggested the integration of GRETIL into the Text+ portfolio. As part of the new developments of the TextGrid repository and the integration of existing corpora, we decided to publish the already converted TEI documents in this repository. We are also working on the transformation of HTML into TEI and on improving the quality of the metadata and thus its FAIR status, e.g. by using terms from the Authority control system of the German-speaking (GND). Other components of GRETIL will be published in other repositories (eDocs and DARIAH-DE Repository).

In its environment in TextGrid, GRETIL will offer new possibilities to comfortably search and compare all texts of the collection. The keywords and categories (languages, genre, religious affiliation) now standardized will be available as individual filters for the whole corpus, allowing a more flexible filtering and querying of the corpus.

Some aspects of the GRETIL will remain as they currently are. This means that the imbalances GRETIL exhibits in certain areas, e.g. the ratio of Sanskrit to Prakrit or Tibetan texts, will carry over. However, the new environment will make it easier for new projects to expand or enrich the text material in the future, thus affording the opportunity for further revitalisation of this text corpus.

From Folklore Collections to Digital Research Infrastructures: Expanding Access, Engagement, and Analysis

Sanita Reinsone¹, Line Esborg¹¹, Terry Gunnell¹³, Kati Kallio^2,3, Kyrre Kverndokk⁵, Sandis Laime⁹, Will Lamb⁷, Angun Sønnesyn Olsen⁵, Fredrik Skott⁶, Asta Skujytė-Razmienė⁸, Tim Tangherlini¹², Ida Tolgensbakk¹⁰, Mari Väina⁴, Viesturs Vēveris¹

¹University of Latvia (LV); ²Finnish Literature Society; ³University of Helsinki (FI); ⁴Estonian Literary Museum (EE); ⁵University of Bergen (NO); ⁶Institute for Language and Folklore (SE); ⁷University of Edinburgh (UK); ⁸independent scholar (LT); ⁹Institute of Literature, Folklore and Art of the University of Latvia (LV); ¹⁰Norwegian Museum of Cultural History (NO); ¹¹University of Oslo (NO); ¹²University of California (US); ¹³University of Iceland (IS)

Since the early 20th century, folklore archives have served as fundamental research infrastructures – systematically documenting oral traditions, narratives, and cultural expressions. Built through long-term collection efforts – often with public participation – these archives have been instrumental in the development of folklore studies, ethnology, and related disciplines. Despite evolving in response to disciplinary, methodological and institutional transformations, folklore archives remain vital and living repositories of cultural heritage, while the advent of digital technologies has profoundly expanded access, enhanced research potential, and reshaped their workflows and functions.

Over the past decade, large-scale digitization advancements have given rise to multifaceted digital platforms such as dúchas.ie (Ireland), Folke (Sweden) sok.folke.isof.se, garamantas.lv (Latvia), samla.no (Norway), sagnagrunnur.arnastofnun.is (Iceland), Danish Folklore Nexus scando.ist.berkeley.edu, Dutch Legend Database (Netherlands) verhalenbank.nl, kivike.kirmus.ee (Estonia), and tautosakos-rankrastynas.lt (Lithuania) exemplify this shift – providing structured access to historical and contemporary folklore materials.

Collaborative projects strengthen interoperability across linguistic and national boundaries. The ISEBEL project[3], an intelligent cross-collection search engine for belief legends, and FILTER [1], a research environment for analyzing poetic text variation, are pioneering initiatives enabling comparative folklore research across linguistic and national boundaries.

A defining feature of digital folklore archives is the integration of citizen science and crowdsourcing, enhancing accessibility while keeping folklore collections dynamic. Volunteers are involved in manuscript transcription, increasingly aided by AI-powered HTR tools, as seen in Ireland duchas.ie/en/meitheal, Latvia lfk100.garamantas.lv, and Sweden sok.folke.isof.se. Other platforms, such as minner.no (Norway) minner.no and kratt.folklore.ee (Estonia) kratt.folklore.ee, facilitate knowledge exchange. Some initiatives support artistic reinterpretation of archival content, such as “Sing with the Archives” (Latvia) dziedi.garamantas.lv and Lithuania’s “To Remember Me By” project.

A key advantage of folklore digitization is the ability to extract large-scale datasets, unlocking new possibilities for computational folkloristics. These datasets reveal structural patterns, narrative evolution, and cross-cultural connections while offering deeper insights into how folklore collections were formed and curated. E.g., FILTER project [1] exemplifies this potential by applying computational methods to extensive Finnish-Estonian folksong corpora, while the ISEBEL project [3] demonstrates how to enhance comparative folklore research through automated translations and metadata-driven analyses. The use of large language models in Scottish Gaelic storytelling [2] demonstrates how synthetic text generation can expand training data for speech recognition, further supporting low-resource language technologies and computational folklore studies.

This poster presents the development of digital folklore archives as emerging digital research infrastructures – emphasizing the critical need for cross-border and multilingual integration. Emerging large language models further accelerate this shift – enabling automated processing and translation of diverse folklore corpora, including under-resourced and endangered languages. These innovations enhance usability, support archival curation and public engagement, offering new opportunities for research and international collaboration.

References

[1] Janicki, M. et al.(2024). Developing a Digital Research Environment for Finnic Oral Poetry. BJMC, 12(4),535–547. https://doi.org/10.22364/bjmc.2024.12.4.15

[2] Lamb, W. et al.(2025) Synthesising a corpus of Gaelic traditional narrative with cross-lingual text expansion. Celtic Language Technology Conference 5,12–26. ACL Anthology.https://aclanthology.org/2025.cltw-1.2.pdf

[3] Meder, T. et al.(2023). The ISEBEL Project: Collecting international narrative heritage in a multilingual search engine. Fabula 64,1–2. https://doi.org/10.1515/fabula-2023-0006

LLM-based geospatial data extracting: A case study based on travel literature

Dolores Sáez, Pilar Escobar, Manuel Marco-Such

University of Alicante, Spain

Cultural heritage institutions, commonly known as Galleries, Libraries, Archives, and Museums (GLAM), are exploring new ways to provide a richer experience of accessing and exploiting their collections for humanities researchers, thus encouraging their reuse not only of metadata but also of content. Several initiatives, such as Labs, are based on the creative and innovative reuse of materials published by these institutions, complying with the FAIR principles (Findable, Accessible, Interoperable, and Reusable).

The visualisation of collections within cultural heritage institutions is evolving quickly, driven by the need to improve the user experience and leverage available digital collections. Nevertheless, significant challenges remain in order to achieve easier and more efficient access to digital resources.

Artificial intelligence (AI) has emerged as a highly powerful resource in cultural heritage. Traditional Named Entity Recognition (NER) processing methods can be substantially improved through AI advances. For example, the learning process has been simplified by reducing supervised training. The innovation of this method is the application of Large Language Models (LLM) to extract geospatial information from textual sources, thereby improving efficiency and expanding the scope of analysis in this field.

The primary objective of this study is to validate an automated system based on LLM, designed to extract geographical information from travel literary texts and subsequently render it graphically on an interactive map.

The main contributions of this work are: (a) extract the different places located in the text of the work which may be towns, cities, monuments, squares, or streets, (b) georeferencing travel literature text content through open data, and (c) visualisation on maps.

“The Atlas of the Holocaust Literature” - mapping the ghetto experience.

Kajetan Mojsak, Paweł Rams

The Institute of Literary Research, Polish Academy of Science (IBL PAN), Poland

“The Atlas of the Holocaust Literature – Warszawa/Łódź” created and developed in the Institute of Literary Research of the Polish Academy of Sciences in Warsaw by the Department of Digital Editions and Monographs, the Research Group for Holocaust Literature under direction of professor Jacek Leociak and the scholars from the Center of Jewish Research at the University of Łódź. The project fuses such fields as digitalization, documentary work, popularisation of knowledge about the history of the ghetto, Holocaust studies, as well as urban studies.

The aim of the project is to collect written sources, describing the experience of life in ghettos, with the focus on the spatio-temporal dynamics of the events. Its principle is to use the digital possibilities to narrate the story through the prism of topography, using the form of interactive digital maps. The exact moment in time (the stage of the Holocaust) and the space (author’s whereabouts) are interconnected and co-create a specific “space-time” of the texts, the topographic and chronological grid, which determines the type of experience, (and often - conditions of survival).

It is a truism to say that literary texts and memoirs are historical sources (Ankersmit 2001; Stefanowska, Sławiński 1978; White 2004). However, using both memoirs or diaries and literary works as sources of knowledge about the past poses many questions. The difficulty of working with this kind of texts as historical sources is perfectly illustrated by the work on the creation of the Atlas of Holocaust Literature.

Comparison between various testimonies created over the years (from testimonies written on a daily basis in the ghetto, to those written down decades later) sheds a light on the mechanisms of memory and their role in processing the past and creating memories.

To the problems with the mechanisms of memory and those of literary genres one should also add the issues of the emotional approach to the experienced events. This raises additional research questions for both the texts analysed and the way they are represented on the map: how to deal with emotional description of experience and transform it into the language of a map? How to incorporate this specific experience into the already existing structures of the project, where the texts developed so far are referential to the described space? We should also ask how to use this type of testimony in a digital project, whose narrative structures are sometimes not flexible enough, without losing their uniqueness.

eManuSkript: Developing Tools for Digital Manuscript Literacy

Jeremy Thompson, Mohamed Basuony

Institute for Digital Humanities, University of Göttingen, Germany

As in other areas of the historical humanities, the field of medieval studies has been rapidly transformed by a growing storehouse of digitized manuscripts. Innovative research tools have not merely facilitated the study of original medieval documents, but have also brought new research questions to light. Image enhancement on photographs or scans have exposed “invisible” features latent in digital data; non-invasive image captures can document written texts trapped in book linings, delicately wrapped around saints’ bones, or pasted and varnished inside of violins; script layers in palimpsests can be visually separated; manuscript fragments scattered globally across disparate repositories can be united in virtual ensembles where physical reconstitution is impossible. Although it may seem counterintuitive, this digital turn has been accompanied by a material turn, a profound reflection on the experiential and cognitive stakes of medieval media in their material reality.

In this context, the demand for manuscript literacy is arguably higher than ever––and indeed for a literacy surrounding the digital objects documenting medieval manuscripts. The eManuSkript project at the Institute for Digital Humanities, Göttingen, has received funding for two years from the Stiftung Innovation in der Hochschullehre in order to satisfy this demand. A collaborative undertaking between students, teachers, digital humanities scholars, and software programmers, the project is developing a suite of web-based apps and tutorials that will serve as teaching and study tools. This poster will present the project’s plans for building a first-stop portal for studying a medieval manuscript in light of the so-called auxiliary sciences: palaeography, codicology, and bookbinding. One tool will automatically detect the elements of a manuscript’s mise-en-page and allow users to extract selected visual elements to build a research corpus. A bookbinding tool will allow users to draw and describe sewing structures on a book spine. A complementary tool will enable users to build a bookbinding visually by translating SVG images of binding components alongside a guided explanation. All of these tools target students and scholars, and are being designed with both groups in mind.

In the context of this poster session, we propose to demonstrate test versions of two applications. The first is an image enhancement tool that manipulates UV/IR images or high-resolution scans to expose new visual data. The second application, a script analysis tool, enables users to measure letter strokes, angles, and distances in historical scripts and to generate cumulative statistical data about the script. With it, users can create descriptive profiles of individual letters, ligatures and abbreviations. It is aimed at advancing palaeographical studies in general and at facilitating precise descriptions of specific script samples. Both tools will help students to train their eyes and have already been tested in small classroom settings. Scholars should benefit, as well. Broader user feedback is critical at this phase since the project end date stands a year away. We hope for a fruitful exchange at DARIAH and for constructive feedback about interface usage and user-friendliness, the desirability of supplementary features, and other conceivable learning or research goals.

Rewriting the past: A multi-faceted approach to improve quality in the NAKALA repository

Nicolas Larrousse¹, Edward Gray², Julie Verleyen¹, Claire Carpentier¹, Michel Jacobson¹, Hélène Jouguet¹, Sara Tandar¹

¹IR* Huma-Num, CNRS, France; ²IR* Huma-Num & DARIAH ERIC

Huma-Num[1] is a French national infrastructure dedicated to SSH (Social Science and Humanities) research projects. In order to meet one of our primary missions, that is, to provide the SSH community with solutions to preserve research data, Huma-Num has developed over the last 10 years a repository named NAKALA[2]. The goal at the time was primarily to secure research data. In that respect, it has been a resounding success, with around 2 million files grouped in 800 000 deposits, and NAKALA has found its place in the national ecosystem[3].

However, the quality of the data and metadata of existing deposits is far from perfect, which is detrimental to the visibility and hence potential reusability of these datasets: but how can this broad objective of improving data and metadata quality be tackled? Given the large amount of data in NAKALA, it is clearly impossible to process each deposit individually; it was therefore decided to adopt a multi-faceted approach.

During the “Core Trust Seal[4]” certification process, HumaNum was compelled to review a number of features associated with NAKALA. Documentation was completely revised to align with best practices and help guide users to submit better quality data and metadata, with a particular focus placed on the aspects of data preparation before depositing[5]. To help users implement these recommendations during the process of data deposit, compliance checks were added, and autocompletion was added to several metadata fields to encourage the use of controlled vocabulary.

Three main levels of data curation were identified:

1) data whose quality is unreviewed by humans

2) data that are checked from a documentary point of view by humans

3) data already at level 2 that are also verified from a technical and archival point of view to be preserved over the long term using the platform of our partner CINES[6].

For levels 2 and 3, in order to check the data sets, a network of data stewards and experts was identified across France and a “moderation” workflow[7] was created. Researchers can now reach out to local specialists to help them improve quality and receive a quality label.

The various actions described above are designed to improve the quality of future deposits of data. The remaining question is how to handle previous deposits. A study has been launched to examine the overall quality using multiple types of criteria, ranging from the content of the title and the use of URIs in appropriate metadata to more complex cross-metadata queries. The main idea is to be able to determine indicators and thus build dashboards to have a continuous vision of the global content of NAKALA and also more specifically for users in order to encourage them to improve quality.

The poster will review the results obtained by implementing these different approaches to “rewrite” the past and how these decisions impact the future developments of the NAKALA repository and the evolution of user support.

[1] https://documentation.huma-num.fr/en/humanum-en/

[2] https://nakala.fr/

[3] https://recherche.data.gouv.fr/en/repositories

[4] https://www.coretrustseal.org/

[5] https://documentation.huma-num.fr/en/nakala-preparer-ses-donnees-en/

[6] https://www.cines.fr/archivage/

[7] https://documentation.huma-num.fr/en/nakala-workflow-en/

Building a FAIR Training Ecosystem for the Social Science and Humanities within the H2IOSC project

Alessia Spadi¹, Emiliano Degl'Innocenti¹, Lucia Francalanci¹, Francesca Frontini², Giulia Pedonese², Jana Striova³, Laura Benassi³, Antonina Chaban³, Alessia Scognamiglio⁴, Federico Boschetti², Pietro Restaneo⁵

¹Opera del Vocabolario Italiano, Consiglio Nazionale delle Ricerche; ²Istituto di Linguistica Computazionale “Antonio Zampolli”, Consiglio Nazionale delle Ricerche; ³Istituto Nazionale di Ottica, Consiglio Nazionale delle Ricerche; ⁴Istituto per la Storia del Pensiero Filosofico e Scientifico Moderno, Consiglio Nazionale delle Ricerche; ⁵Istituto per il Lessico Intellettuale Europeo e Storia delle Idee, Consiglio Nazionale delle Ricerche

The study of the past, in all its shapes, underwent a transformative evolution in the last decade given by the introduction of new technologies and digital tools to support research endeavors. Digital technologies can provide effective support in different fields in the Social Science and Humanities sector; however, the availability of these technologies does not guarantee their effective utilization. Scholars, students and researchers need training materials to engage with digital tools and methodologies to use the full potential of technology in their studies. In this proposal, the Training Environment developed within the H2IOSC project will be presented to show how it can support interdisciplinary training and continuous professional development in the Social Science and Humanities (SSH) sector.

The Humanities and Cultural Heritage Italian Open Science Cloud (H2IOSC) project aims to create a federated cluster of services and resources, developed by the Italian national nodes of four Research Infrastructures (RIs) that are part of the ESFRI (European Strategy Forum on Research Infrastructure) roadmap in the field of Social and Cultural Innovation: DARIAH.it; E-RIHS.it, CLARIN.it and OPERAS.it.

Within the H2IOSC project, the Work Package dedicated to Training, Capacity Building, Engagement aims to support research through knowledge transfer and the implementation of good practices in education. The involved RIs can share information, training and guidance initiatives that aim to promote knowledge of the products, services and opportunities offered by RIs to potential users. In the field of training, initiatives are often scattered across different platforms, not always described with specific metadata and not always accessible to the public. To address this issue, the H2IOSC training infrastructure has been developed to provide an integrated environment for accessible and reusable courses for both trainers and students, supported by the implementation of a common methodology for structuring educational materials according to the FAIR principles (Findable, Accessible, Interoperable, Reusable). The infrastructure consists of two platforms: the H2IOSC Training Environment, which is used for the delivery and use of courses by users, and the H2IOSC Training Library, a specific repository of training materials for trainers (https://www.h2iosc.cnr.it/training-infrastructure/).

The H2IOSC Training Environment, based on a design shared by the 4 infrastructures to offer a complete experience to both students and teachers belonging to the different research areas, is a learning management system designed to offer a highly interactive virtual learning environment.

The H2IOSC Training Library is a specific repository of training materials for trainers. It is dedicated to the FAIR deposit of modular teaching materials, allowing the assignment of Persistent Identifiers (PIDs), standard licenses and integrated version update management.

The H2IOSC Training Environment and Training Library platforms aim to empower trainers and researchers in the Social Sciences and Humanities (SSH) and beyond to successfully integrate digital tools and methodologies into their work. H2IOSC is building a federated ecosystem that provides essential training resources and supports the implementation of FAIR principles in training materials as digital objects.

Needles in Haystacks? The Text+ Registry as Finding Aid for Scholarly Editions and other Resources

Daniela Monika Schulz^4,5,6, Nils Geißler^1,2,6, Kilian Erasmus Hensen^1,6, Leon Fruth^3,6, Tobias Gradl^3,6

¹Cologne Center for eHumanities; ²Fachinformationsdienst Philosophie; ³Otto-Friedrich-Universität Bamberg; ⁴Herzog August Bibliothek Wolfenbüttel; ⁵Universität zu Köln; ⁶Text+

Whilst the importance of editions for research is undeniable, their discoverability can be challenging at best. This is due to various factors: scholarly editions are usually prepared within the framework of third-party-funded projects, whereby not only the type of editions, but also the funding requirements and the formats for disseminating vary to a great extent. Databases of funding bodies such as the German Research Foundation (DFG) do contain some information on these projects, but usually lack integration with external knowledge bases. Printed editions in Germany are generally recorded in library catalogues, but it is not trivial to find them there, as no subject term is commonly used to denote them. Digital editions are for the most part not included in library catalogues at all. While there are inventories of digital editions curated by individuals or small groups, all of these services have different scopes and therefore provide different (levels of) information. Hence, users have to manually query numerous systems if they want to find all existing editions and available resources.

Within the context of the German NFDI consortium Text+[1], an overarching Registry has been developed to overcome these impediments. The Text+ Registry serves as a unified system to catalogue, describe and connect different types of scholarly resources (lexical resources, collections, editions), but also services, repositories and other entities such as people and institutions. The added value of the Text+ Registry lies in its overarching approach and the layering of information from different sources to provide richer descriptions. It enables researchers to conduct quick and targeted queries for relevant data, in combination with other integrated tools such as the Federated Content Search (FCS), and also provides interfaces.

Within the domain of scholarly editing, benefits compared to existing systems and catalogues result from the joint recording of both printed and digital editions, as well as completed and ongoing projects, from the inclusion of the FAIR principles in the context of digital editing , and from the integration with the basic service nfdi.software via the software registry in order to make the technical genesis of an edition transparent. The Registry can therefore fulfil a wide range of scholarly requests, such as the identification of best practice examples, or data for direct re-use (e.g., for compiling a specialised corpus), finding relevant resources for teaching purposes, or increasing the findability and thus visibility of one’s own work.

This contribution presents the technologies and methods behind the Text+ Registry and discusses challenges and advantages. Its signature layering approach and the architectural design of the Text+ Registry are outlined using the domain of editions as an example.

[1] Text+ is a consortium of the National Research Data Infrastructure (NFDI) dedicated to the sustainable preparation, provision and preservation of text- and language-based research data. Since its launch in autumn 2021, various measures have been taken along the three data domains of lexical resources, digital collections and editions in order to contribute to these goals, and provide researchers with the best possible support in data creation and provision.

Percy Bysshe Shelley’s Influence on the British Suffrage Movement: An AI Multi-Agent system for Tracing intertextuality

Tess Dejaeghere¹, Mariaane Van Remoortel¹, Salva Ros², Julie Birkholz¹

¹GhentCDH, Ghent Center for Digital Humanities, Ghent University; ²CLARIAH-UNED, National Distance Education University

Scholars have long acknowledged the radical Romantic poet Percy Bysshe Shelley (1792–1822) as a significant source of inspiration for the women’s suffrage movements in early twentieth-century Britain. It is widely recognized, for instance, that the suffragette motto “Deeds, Not Words” was derived from his poem The Mask of Anarchy (1819), that Katie Gliddon clandestinely recorded her diary in a copy of Shelley’s works while imprisoned in Holloway, and that numerous prominent suffrage campaigners were profoundly influenced by Shelley’s revolutionary ideals and progressive conceptions of womanhood.^{^[1]}

Building upon previous computational studies and advancing the systematic analysis of Shelley’s role in the suffrage movement, we propose an agentic generative AI system designed to examine the intertextual relationships between Shelley’s works and four suffrage newspapers—Votes for Women, Suffragette, Vote, and Common Cause. This approach seeks to illuminate how Shelley’s poetic legacy was not merely passively received but actively reinterpreted within the discursive framework of the suffrage campaign.

This system of agents sketch a portrait of Shelley’s influence on the suffrage movement, weaving together explicit references, echoes, and evolving ideas.The Mentions Agent, guardian of names that shape literary history, traces direct citations and references. The Allusions Agent reads in the shadows of the texts, uncovering ideas that slip into texts unseen. The Thematic Influence Agent follows restless concepts as they migrate, transform, and persist across time. The Citations Agent listens for the clearest echoes, tracking repeated words embedded in new contexts. Lastly, the Paraphrase Agent detects the art of saying the same thing differently, capturing how meaning reshapes itself without vanishing.

In this innovative approach, we propose a methodology that integrates expert evaluation into the assessment of system performance. This intricate task surpasses mere machine-based error counting, demanding a more nuanced and intelligent analysis—one that only human expertise can provide. Moreover, the inclusion of experts in this process will not only refine the evaluation but also foster critical discussions on the broader implications of AI-driven research in the digital humanities. assessing the viability of agentic LLM-based systems in expediting historical research

This study also advances a methodological framework that embeds expert evaluation into the assessment of system performance, recognizing that the complexity of this task exceeds the capabilities of machine-based error counting alone. A more refined, intellectually discerning analysis based on human expertise is essential. Furthermore, integrating expert oversight enhances the evaluative process and stimulates critical discourse on the broader ramifications of AI-driven research in the digital humanities, particularly in assessing the efficacy of agentic LLM-based systems in accelerating historical inquiry.

^{^[1]} See, for example, Hilda Kean, “Public History and Popular Memory: Issues in the Commemoration of the British Militant Suffrage Campaign”, Women’s History Review, 14.3–4 (2005), p. 583–85; Anne Schwan, “‘Bless the Gods for My Pencils and Paper’: Katie Gliddon’s Prison Diary, Percy Bysshe Shelley and the Suffragettes at Holloway”, Women’s History Review, 22.1 (2013), pp. 152–54; Kate Flint, The Woman Reader, 1837–1914 (Oxford: OUP, 1993), p. 245.

A problematic afFAIR?! Planning for the future in long-term edition projects

Daniela Monika Schulz^1,2

¹Arbeitsstelle "Edition der fränkischen Herrschererlasse", Universität zu Köln; ²Herzog August Bibliothek Wolfenbüttel

Although the role of scholarly editions for historical research is largely undisputed, as they provide the fundament for academic investigations and discourse,[1] their preparation remains a very complex, expensive, and time-consuming endeavour. This complexity has increased even further in recent decades due to new (unforeseeable) developments and changing requirements. Traditional printed editions have long predominated and still remain popular, but the proportion of digital editions and hybrid formats is steadily increasing, as has the use of digital resources in general. (Porter 2012) Besides publishing an edition, the provision of data derived from such projects in a standardised form to make it usable for various scholarly questions, has become a eligibility criteria for funding in recent years. But what exactly is meant by ‚standardised‘ form in this case, and which measures need to be taken has not yet been fully defined. The FAIR principles (Wilkinson et al. 2016) serve as guidelines, but have been formulated in rather generic terms, hence their application in the respective (disciplinary) contexts must be carefully adapted. The critical reflection on their implementation in the field of (digital) editing has only just commenced (especially in the context of the German National Research Data Infrastructure, NFDI), and remains largely a desideratum to this day. (Gengnagel et al. 2023, Hegel et al. 2023)

The ‘Edition der fränkischen Herrschererlasse’ is a long-term project funded since 2014 as part of the Academies' Programme, in which a new edition of the so-called capitularies, which are among the central legal sources of the European Middle Ages, is being prepared. The project is conceived as a hybrid edition. While the individual texts are published in a printed historical-critical edition, the accompanying digital edition also provides transcriptions of the collections as they have been preserved in the manuscripts. As a long-term endeavour, the project faces a number of challenges. One specific challenge lies in the preparation of the data in accordance with the FAIR principles. The issue of sustainability was certainly considered from the outset, but since the project started back in 2014, concrete measures for ‘FAIRification’ were not part of the original work program. (Schulz et al. 2017) How can they now be integrated retrospectively and in the most resource-efficient way? What specific measures should be implemented? What services (e.g. in the NFDI context) are available and how sustainable are they? How can overarching connectivity to other projects and databases be established in order to create added value for research?

The project is currently addressing these and other questions in addition to its day-to-day editorial work. The contribution would like to present the current considerations and concepts for discussion, but also outline the existing open questions that many (ongoing) projects are facing.

[1] On the discussion of the importance of editions in historical studies and their distinctive features still fundamental is Arnold Esch, Der Umgang des Historikers mit seinen Quellen. Über die bleibende Notwendigkeit von Editionen, in: Schieffer, Rudolf; Gall, Lothar, Quelleneditionen und kein Ende? (Historische Zeitschrift. Beiheft 28), München 1999, S. 129-148.

Documentation of the Polish Literary Digital Culture - Quest in the Past

Beata Koper¹, Paulina Czwordon-Lis²

¹University of Opole, Poland; ²Institute of Literary Research of the PAS, Poland

The Polish Literary Bibliography (PBL)^{^[1]} is a continuously supplemented online database collecting information on Polish literature, theater, and film history. The iPBL project^{^[2]}, conducted within its framework, is a crucial initiative aiming at documenting online materials from the same subject area and attempting to perform a unique reconstruction of Polish literary digital culture and its history. In our poster, we would like to present the “cyber-archaeological” workflow used in the project and highlight the significant challenges of documentation of digital culture.

Digital media can hardly be called “new media” anymore: the once vibrant sites, phenomena, formats, and platforms are aging and disappearing - becoming history of utmost importance to preserve. Therefore, studying (documenting) the literary Internet requires an archaeological approach and knowledge of web history.

In Poland, there has not yet been an effort to systematically archive digital culture. There are also no relevant inventories, “maps,” or documentation that would allow starting bibliographic work in a systematic way. The poster will present different approaches to documentation: site-biographical and event-based^{^[3]}.

The list of sources for detailed elaboration in iPBL will eventually include 200 representative born-digital magazines, services, and blogs concerning literature, theater, and film, still active on the web. The oldest compiled records (articles, literary works, reviews) date back to 1999 (1 service - from 2000, 4 - from 2001, 3 - from 2002). A list of 4,000 internet addresses of other noteworthy literary, theatrical, and film websites on the Polish web will be supplemented.

The work on the selection of sources let us isolate certain aspects of Polish literary digital culture, which flourished in the particular phases of its development in the shape of peculiar archaeological layers: some digital spaces (social chat rooms, blogs) are buried under the emerging forms of digital activity, e.g., social media.

The phase of the pioneers of the web: when the Internet was “unfenced”, “pirate”, grassroots, open, “challenging the institutional order”^{^[4]}, time of experimentation with e-literature, hypertexts;
The phase of the Internet of communities: seamlessly overlapping with the previous phase, the development of “interpretive communities” and “writing communities”^{^[5]}: forums, vortals, collaborative analysis, and literary blogs as a space of direct contact between the writer and commenting audience, situating themselves in a network of websites connected through mutual links;
The current phase: the transfer of writers activity to social media^{^[6]}, the commercialization of the review blogosphere, the wave of expansion and professionalization of born-digital literary magazines, new formats: podcasts, video channels, the apogee of video broadcasting of literary events during the pandemic^{^[7]}.

In creating collections, compiling, and archiving in the iPBL digital culture project, we have encountered the following challenges:

reaching out to as many available (though not necessarily updated) websites as possible;
an ethical issue - selecting and describing sources: it may prove to be a gesture in the future to preserve certain sources and condemn those that have been discarded or slipped into oblivion;
maintaining a balance between amateur and professional circuits.

Scalable refinement of the Finnish national bibliography for large-scale statistical analysis

Julia Matveeva¹, Akewak Jeba¹, Veli-Matti Pynttäri², Kati Launis², Osma Suominen³, Leo Lahti¹

¹University of Turku, Finland; ²University of Eastern Finland; ³National Library of Finland

Statistical analyses of bibliographic metadata catalogs can provide quantitative insights into large scale trends and turning points in publishing patterns, enriching, and even challenging the prevailing views on the history of knowledge production (Lahti, 2019a). The use of bibliographic catalogs has become a well-established tool in literature history and helped to renew research methodology (Umerle, 2023). However, the efficient utilization of large-scale data collections as research material depends critically on our ability to critically evaluate data representativeness, completeness, quality and trustworthiness. Our earlier work has demonstrated how remarkable fractions of the bibliographic metadata curation and analysis process can be automated through dedicated bibliographic data science workflows (Lahti, 2019b, 2015; Tolonen, 2016, 2019).

This study presents further development of an open and scalable data science workflow to support literary research using the Finnish National Bibliography, Fennica. The scalability of the solutions varies by data type, and the refinement process must strike a balance between accuracy and scale. Our reproducible workflows emphasize transparency, consistency, and provenance as key elements of this process; we show how standardized refinement procedures and automated generation of versatile statistical summaries of the refined data can be used to monitor the curation process while supporting in-depth statistical analyses and modeling of publishing patterns over time and geography.

We present good practices and conceptual approaches for bibliographic data refinement and demonstrate how enriched national bibliographies can offer a data-rich perspective on Finland’s literary history. This study particularly focuses on Finland’s Grand Duchy era (1809–1917) literary analysis, contrasting manual and automated data extraction methods. The dataset, sourced from the National Library of Finland, records in Fennica from 1488 to the present day and numerous fields and subfields. We only approach around 50 fields to cater for our research needs. The workflow employs tailored methods to standardize key metadata fields, including author information, language, publisher, publication place, classification schemes, genre fields such as call number, UDC, control field and index term/genre, title, physical dimensions, and gender. These functions harmonize inconsistencies, remove ambiguities, and integrate supplementary information from external databases, ensuring high data fidelity.

The refined dataset reveals insights into Finnish publishing history, addressing gaps in metadata completeness and quality. Enrichment from external collections and complementary sources, including Finna, Kanto, and Finto, helps mitigate limitations such as missing author information, ambiguous publisher and publication place data, and the absence of gender classification. Additionally, UDC numbers were converted to words using Finto vocabulary via web scraping.

The results highlight the effectiveness of automated bibliographic data refinement in supporting large-scale research. Key outputs include a comprehensive bibliographic data science workflow, harmonized metadata dataset for research applications, and novel solutions for semi-automatic curation of national bibliographies. Informative data summaries facilitate quality control and bibliographic analysis while enabling focused studies on specific periods. The approach can be adapted for various temporal, geographic, and thematic analyses.

RECLAIMING THE PAST: DIGITAL HISTORY AND CULTURAL PRESERVATION OF KENYA’S INDIGENEOUS COMMUNITIES

Precious Joan Wapukha

Kibabii University Kenya,

African past has been presented diversely by various actors such as the; missionaries, colonial masters and African scholars. However the bottom-line is that the precolonial African cultural structures have been affected due to lack of proper documentation. Therefore this study seeks to unravel how utilization of digital technology can lead to cultural preservation and presents a platform for reclaiming the history Kenyan indigenous communities. As the world is rapidly moving towards a digitalized system, which poses a critical question on how will the indigenous communities preserve, reclaim and revitalize their rich diverse cultural heritage?. Which have long been marginalized or rendered invisible in historical narrative. Digital history is an inroad to the fulfillment of interdisciplinariness approach that enhances a modus operandi for documentation hence preserving the histories of indigenous population. Most of the indigenous communities in Kenya such as the Turkana and Samburu poses invaluable cultural heritage such as ritualism, songs, language, oral tradition and myth. These practices have been preserved through African oral tradition which has been passed from one generation to another. However in the contemporary period the cultural practices are at risk of being eroded due to facets such as urbanization, neo-colonialism, modernization and globalization. This study’s argument is that digital technology is a safe house for cultural heritage for indigenous communities. It further unravels how digital technology is pertinent in cultural preservation. This is attributed to variance of documentation platforms that ranges from mapping, Multimedia, text mining, and online repositories. Therefore the study underscores the vitality of understanding Kenya’s history in detail and amplification platform for the voices of values and perspectives of indigenous communities. This study is a step towards deconstruction process of the passive indigenous cultural heritage. Instead it offers a foundation for reconstructing cultural values and histories hence, preventing exploitation and distortion of cultural practices. This study will utilize cultural Heritage theory, which emphasizes that digitization of cultural heritage means reasserting community’s unique identity in the face of colonial history and globalization, which often erased or devalued indigenous cultures. The study will focus on two indigenous communities that is the Turkana and Samburu. The study will adopt qualitative approach that captures experiences in relation to digitalization of history for cultural preservation. Purposive and snowballing sampling techniques will be used in identifying respondents. Data will be collected through Focus Group Discussions and Interview schedules. This study will target the following groups; Council of Elders, Nyumba Kumi,community members, officials of Kenya National Archives and technological experts in Kenya National Library service. Qualitative data will be analyzed thematically capturing patterns and themes in relation digital history and cultural preservation of Kenya’s indigenous communities. Digitalization is significant in reclaiming and preservation of cultural practices that fosters identity, pride and a sense of ownership over indigenous community’s traditions.

Dariah.hub project (2024-2025): Advancing interdisciplinary collaboration in digital humanities

Marcin Heliński¹, Aleksandra Nowak¹, Tomasz Umerle², Krzysztof Abramowski¹, Bartosz Szymendera¹

¹Poznan Supercomputing and Networking Center, Poland; ²The Institute of Literary Research of the Polish Academy of Sciences, Poland

The Dariah.hub project (2024-2025) builds upon the earlier Dariah.lab initiative (2021-2023),
which aimed to create a network of distributed digital humanities laboratories. Unlike
Dariah.lab, which focused on addressing the diverse needs of researchers from various
domains, Dariah.hub is designed to enhance collaboration through a central Interdisciplinary
Research Platform. This new infrastructure is based on knowledge graph architecture,
allowing for the integration of various disciplines such as archaeology, musicology, and
sociology, while also enriching research objects.

Three levels of integration

A key aspect of the platform is its ability to integrate tools and services, including those
provided by DARIAH partners, at three different levels:

Aggregation of research objects from digital repositories, supplying the platform with resources and data for the knowledge graph.
Asynchronous processing of objects retrieved from the platform using external tools that enhance them with additional metadata and resources.
Interactive engagement with objects through direct execution of tools from the platform, allowing researchers to work with data in real-time.

Use cases: integrating partner tools

An example of integrating DARIAH partner tools is the use of the Archaeological Module to
document artifacts and archaeological sites. Once aggregated into the platform, these data
can then be analyzed using historical text analysis tools, potentially provided by other
partners specializing in this field.

The OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) results,
obtained through tools integrated with the platform, can be further processed by NER
(Named Entity Recognition) and NEL (Named Entity Linking) mechanisms, also potentially
provided by DARIAH partners. This enables automatic detection and linking of semantic
relationships, creating dynamic connections between entities, cultural contexts, and
sociological frameworks within the knowledge graph.

Additionally, the platform supports asynchronous data processing by external
interdisciplinary tools. For example, geospatial data provided by one partner could be used
to contextualize archaeological findings, which are then analyzed using tools from another
partner specializing in spatial analysis. This enables researchers to combine diverse
datasets and methodologies, leading to a more holistic approach to analyzing source
materials and uncovering previously unrecognized relationships.

Interoperability and collaboration

It is also worth noting that Dariah.hub integrates tools from the Dariah.lab suite, ensuring
interoperability across disciplines. The platform offers shared workspaces with secure,
collaborative editing and version control, facilitating multi-author cooperation. Automated
workflows and data pipelines ensure seamless interoperability between different tools,
eliminating disciplinary and institutional constraints. Additionally, open licenses support data
sharing and reinforce interoperability standards.

A dynamic feedback loop between data providers and users contributes to the continuous
improvement of curated datasets and the expanding knowledge graph. The platform’s
high-performance computing infrastructure enables resource-intensive tasks such as
large-scale text corpus analysis and 3D reconstructions of archaeological sites.

Conclusion: advancing digital humanities

The poster will highlight the Dariah.hub Interdisciplinary Research Platform’s role as a
significant step forward in integrating digital humanities resources and tools, fostering
interdisciplinary researchers collaboration, and enhancing the analysis of cultural and
scientific heritage present in the knowledge graph.

Swimming in a sea of data. Digital tools for the study of Ancient Mediterranean trade and society

Manel García Sánchez¹, Arnau Lario Devesa², Nina Mejuto García³, Oriol Morillas Samaniego⁴, Víctor Revilla Calvo⁵

¹University of Barcelona, Spain; ²University of Barcelona, Spain; ³University of Barcelona, Spain; ⁴University of Barcelona, Spain; ⁵University of Barcelona, Spain

Ancient societies, as any other human group, are amazingly complex and difficult to properly assess, especially when taking into account the impressive degree of cultural “globalisation” attained during the Roman empire. For this very reason, our research group focuses on several very relevant issues; by analysing inscriptions on amphorae—ancient containers used for goods like wine, fish preserves and olive oil—we can uncover vital information about ancient trade networks in the classical Greco-Roman world (5th c. BC – 3rd c. AD). Due to the scale of such endeavour, and the great amount of available data, the development of digital tools such as the CEIPAC Database of Amphora Stamps, housing tens of thousands of these inscriptions, becomes indispensable in order to identify patterns and track the movement of goods across time and space.

In addition to examining trade, the digital analysis of ancient texts and inscriptions has proven crucial for studying the representation and roles of certain marginalised sectors of society in antiquity, such as women. A digital archive compiling literary and epigraphic records of Greek and Roman women facilitates large-scale textual analysis, allowing for an exploration of gender dynamics in ancient societies. Through methods like text mining, it is possible to trace the evolution of women’s representation over time and across geographical boundaries, uncovering previously overlooked aspects of social and cultural history. This digital approach not only enhances understanding of gender roles but also demonstrates the value of textual analysis in revealing new insights into the social structures of ancient civilizations.

A key objective of these digital humanities projects is to broaden access to historical research beyond academic circles into the general public, which is the one that ultimately fund such initiatives. By creating user-friendly, open access platforms that make vast databases of ancient texts and inscriptions publicly available, such as the Roman Open Data project, these initiatives invite the broader public to engage directly with historical data. Interactive platforms that host archives of ancient trade records or texts related to women in antiquity, such as the “Gynaikes-Mulieres” website, encourage public involvement in the process of historical discovery, thus making research accessible and engaging for diverse audiences.

By integrating digital tools like spatial analysis, machine learning, and online archives, these projects aim to redefine how historical research is conducted and shared. Whether through the digital reconstruction of trade networks using Network Science, the analysis of gender roles giving agency to ancient women, or the creation of interactive platforms for public engagement, these efforts reflect the growing impact of digital humanities on the study and presentation of the past. They also underscore the potential of digital technologies to engage new audiences, enhance scholarly analysis, and open up new avenues for interdisciplinary collaboration, advancing both research and public engagement with the ancient world.

Systematic Research Data Management at the Göttingen Campus - Showcasing the National Research Data Infrastructure

Stefan Buddenbohm, Alexander Steckel, Lukas Weimer

Göttingen State and University Library, Germany

Göttingen: A Strong Player in the National Research Data Infrastructure (NFDI)

This poster introduces all NFDI consortia represented at the Göttingen Campus and showcases the broad disciplinary spectrum of research data management.

Various institutions of the Göttingen Campus are involved in 17 out of the 27 NFDI consortia. The strategy for creating local support structures is increasingly driven by the Göttingen State and University Library (SUB) and the GWDG, the datacenter for the Georg-August-Universität Göttingen and the Max Planck Gesellschaft.

What is the NFDI?

NFDI systematically indexes and networks valuable scientific data for the entire German scientific system and makes it available for sustainable use. Until now, such efforts to provide sustainable data access have mostly been pursued on a decentralised, project-related or temporary basis.

NFDI represents Germany as a mandated member of the European Open Science Cloud (EOSC). NFDI is also a member of the internationally active Research Data Alliance (RDA).

The NFDI consortia with Göttingen participation cover a broad range of disciplines, thereby emphasizing the campus' ambition to deliver outstanding achievements across a wide disciplinary spectrum. Notably, the involvement in all four funded consortia for the humanities and cultural sciences highlights Göttingen's long-standing tradition of expertise in the necessary infrastructure for these fields.

Base4NFDI (core services for the NFDI; all funded NFDI consortia from all three funding rounds are involved, with the University of Göttingen taking on responsibility in governance as a co-applicant)
DAPHNE4NFDI (Data from Photon and Neutron Experiments for the NFDI, involvement as co-applicant)
FAIRAgro (agroecosystem research; the Department of Crop Sciences at the University of Göttingen, involvement as participant)
FAIRmat (FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids, involvement as participant)
KonsortSWD (consortium for the Social, Behavioral, Educational and Economic Sciences, involvement as participant)
NFDI4Biodiversity (Biodiversity and Environmental Data, GWDG as co-applicant)
NFDI4BIOIMAGE (research data management for microscopy and bioimage analysis; the Excellence Cluster "Multiscale Bioimaging" at the University of Göttingen is involved as a participant)
NFDI4Chem (Chemistry Consortium in the NFDI, involvement as participant)
NFDI4Culture (Consortium for Research Data on Tangible and Intangible Cultural assets, involvement as participant)
NFDI4Earth (Adresses Digital Needs of Earth System Sciences, involvement as participant)
NFDI4Energy (interdisciplinary energy systems research; the Sociological Research Institute Göttingen (SOFI) e.V. is involved as a co-applicant)
NFDI4Health (National Research Data Infrastructure for Personal Health Data, Universitätsmedizin as co-applicant)
NFDI4Ing (National Research Data Infrastructure for Engineering Sciences, GWDG as participant)
NFDI4Memory (research data management for historical data; the Academy of Sciences in Göttingen, the SUB, and the GBV central office (VZG) are involved as participants)
NFDI4Objects (research data infrastructure for the material heritage of human history; the GBV central office (VZG) is involved as a co-applicant, and the SUB as a participant)
NFDIxCS (research data infrastructure for computer science; the GWDG is involved as a co-applicant)
Punch4NFDI (Consortium of Particle, Astro-, Astroparticle, Hadron and Nuclear Physics, university as co-applicant)
Text+ (Text- and Language-Based Research Data, university as co-applicant)

Increasing the discoverability of research services and resources through contextualization and community use cases in the SSH Open Marketplace

Stefan Buddenbohm², Edward J. Gray¹, Cristina Grisot³, Michael Kurzmeier¹

¹DARIAH-EU, Germany; ²Göttingen State and University Library; ³Swiss National Data and Service Center for the Humanities

Introduction

The Social Sciences and Humanities Open Marketplace (SSHOMP) is a discovery portal for Social Sciences and Humanities research communities. It showcases solutions and research practices for the research data life cycle and facilitates discoverability and findability of resources that are essential to enable sharing and re-use of workflows and methodologies. With a population of ~5000 items, the SSHOMP relies on community curation to ensure the catalogue remains up-to-date and useful for researchers. Curation routines, mixing automatic and manual tasks, are set up to ensure and continuously improve (meta)data quality.

Contextualization and use cases

Contextualization is one of the key pillars of the SSHOMP (Barbot et al. 2024). It is meant to provide a discovery portal for tools and services, while placing these tools and services in context via publications, training materials, datasets, and workflows. As such, these last four categories are indexed in the SSHOMP insofar as they can be placed in relation with tools and services. This is an objective we are pursuing, through the automatic creation of relations and their manual curation, and through encouraging authors to create relations to other items when they create new items. This poster shows how the SSHOMP facilitates diverse methods of studying the past via contextualization of resources, relying on three community use cases:

The integration of items created within the scope of the ATRIUM project
The integration of items created within Text+ and DARIAH-DE
The integration potential of items, including workflows, originating from a DARIAH WG dealing with historical data

ATRIUM Project

The network analysis in our poster shows how the SSHOMP provides insights into the use of tools, methods and standards in the DH research communities, and how it increases serendipity in the discovery of new methods and standards, by interlinking the resources and describing workflows. These relations demonstrate how inter-related this specific catalogue of tools is with the overall catalogue, and the broad impact that initiatives like ATRIUM can have on the community.

Text + and DARIAH-DE

Text+ along with the Society for Humanities and Cultural Research (GKFI), use the SSHOMP as an aggregator and delivery service to present their offerings. Much like ATRIUM, resources are tagged in the Marketplace with minimal metadata and harvested regularly via the API, allowing for the portals of Text+ and GKFI to display over 80 services on institutional websites with minimal effort - needing only to implement harvesting and display: creation and curation of resources are managed solely through the Marketplace, which is a huge benefit for both entities.

DARIAH WGs

Currently, DARIAH has four WG groups focusing on historical data: ARChitectural HEritage Thesaurus through Integrated digital Procedures and Open data (ARCHETIPO), Digital Practices for the Study of Urban Heritage (UDigiSH), Digital Numismatics and Women Writers in History. Through their cross-country and cross-disciplinary character, these WGs create unique resources, tools and knowledge about the past. The SSHOMP is a powerful tool for the dissemination of these resources, and for translating their knowledge and expertise into step-by-step, practical workflows.

Reconstructing urban transformations: Digital Humanities for the documentation of large-scale construction sites in historic cities

Sofia Darbesio

Politecnico di Torino, Italy

Large-scale architectural construction sites are a relevant part of cities' contemporary developments. They also produce a high amount of diversified data, tools, and information that is complex and challenging to access or communicate. However, especially in the case of historic cities, they embody inventive capabilities, innovation, and processes, representing relevant information for architectural and urban history. By applying Digital Humanities approaches, it is possible to explore new ways of documenting, studying and narrating inherent complexities such as decision-making processes, the interaction of interdisciplinary urban history actors, and the contextualisation of spatial-heritage relationships, while framing these dynamics in the context of the historic city and its past.

This research addresses the challenges of recording and preserving construction site documentation by producing a critically structured digital library of diverse (born-digital, digitised and non-digitised) data, metadata and resources. By intersecting historical and contemporary materials through ICTs, a new interactive multimedia solution can provide a dynamic virtual representation of the urban space and its past. By representing site processes through a spatialised digital reconstruction of phases and interactions, the system can document and interpret the evolution of architectural worksites in historic cities, offering transparency and insight into the spatial-cultural relationships that shaped the present identity of the urban space. Therefore, the research deals with producing a digital prototype to allow cross-referencing the current and past versions of the site-related materials, promoting the accessibility and sustainability of the collected information. This digital interface will make it possible to visually communicate the dynamics of urban development through different historical-critical narratives.

The chosen demonstrator is the Piazza Municipio metro station in Naples, a large-scale infrastructural and architectural worksite in the historic city centre that has taken over twenty years to complete. Designed in 2003 by the Portuguese Pritzker Prize-winning architects Álvaro Siza and Eduardo Souto de Moura, the architectural project dealt with multiple historical heritage segments. During the excavation phases, important archaeological evidence emerged, testifying to the overlapping of multiple historical layers throughout time due to the succession of many different cultures and social changes. These findings added complexity and depth to the worksite process, making it even richer and more multifaceted.

By addressing these dynamics through a Digital Humanities approach, the study contributes to the understanding of large-scale urban transformations and their relationship to urban history, providing new digital means to document and narrate the city and its past. In this context, this contribution focuses on the challenges of building a digital tool to reconstruct, make visible, map and narrate complex processes of a large-scale contemporary construction site as a spatial and conceptual node of the historic city at the crossroads of its past, thus leading to a more comprehensive understanding of its connection to urban history.

AI-Enabled Citizen Participation in Safeguarding Ukrainian Cultural Heritage: Ethical and Methodological Frameworks

Tugce Karatas¹, Sanita Reinsone², Marianna Ziku⁵, Uldis Zariņš², Katerina Zourou⁵, Pavlo Shydlovskyi³, Alba Irollo⁴

¹University of Luxembourg, Luxembourg; ²University of Latvia; ³Taras Shevchenko National University of Kyiv; ⁴Europeana; ⁵Web2Learn

The preservation of Ukrainian cultural heritage faces unprecedented threats due to ongoing geopolitical turmoil. Destruction, displacement, looting and loss of cultural artifacts require urgent and innovative responses to safeguard both tangible and intangible heritage. Artificial Intelligence (AI) presents new opportunities for digital preservation, documentation, and restoration, yet its application raises ethical and methodological challenges. This poster explores how AI, when combined with citizen participation, can be effectively and responsibly leveraged to protect Ukrainian cultural heritage, ensuring an ethical, sustainable, and community-driven approach to digital preservation.

The AISTER project, funded under the Erasmus+ KA2 programme, employs AI technologies alongside active citizen engagement to develop participatory models of cultural heritage safeguarding. It aims to advance AI-driven methodologies, including multilingual text recognition, AI-powered image analysis, and 3D reconstruction of cultural sites. These technologies facilitate the documentation, restoration, and digital preservation of endangered heritage assets while actively involving communities in the process. The project aligns with European Union regulations[1] and UNESCO guidelines[2], ensuring AI applications uphold ethical principles related to transparency, inclusivity, and sustainability. Responsible AI use is crucial to mitigate biases, prevent the misuse of sensitive cultural data, and support heritage professionals in navigating AI-driven decision-making processes. Beyond technological innovation, AISTER underscores the importance of community engagement and public awareness. The project fosters participatory approaches where university students, cultural heritage professionals, and citizens collaborate to identify risks, document artifacts, and develop AI-enhanced heritage preservation strategies. Through co-creation workshops, hackathons, and roundtables, AISTER promotes knowledge exchange and strengthens public involvement in heritage safeguarding. These activities help demystify AI, making it more accessible and fostering public trust in AI-assisted preservation efforts. The project’s outcomes will contribute to ongoing discussions on AI ethics, human-centered AI, and citizen participation in cultural heritage preservation. In addition to expert roundtables and open-access research publications, AISTER will organise an "AI for 3D Cultural Heritage of Ukraine" hackathon, community-driven AI workshops, and a structured policy framework for ethical AI use in the heritage sector. The AISTER Manifesto will further outline best practices for integrating AI in heritage protection, emphasising ethical considerations, public engagement, and long-term sustainability.

By fostering interdisciplinary collaboration between computer science, cultural heritage, and humanities and social sciences, AISTER showcases AI’s potential beyond documentation and analysis, serving as a catalyst for engagement, awareness, and collective responsibility in the preservation of cultural heritage. This poster aims to contribute to broader discussions at the DARIAH Annual Event, highlighting the intersection of AI, digital humanities, and participatory heritage preservation to ensure an ethical and sustainable approach to digital humanities research.

An Open Access database for Khmer Buddhism (Cambodia): enhancing iconography with Omeka-S

Juliette Lecorney¹, Lauriane Locatelli²

¹University of Strasbourg, France / DISTAM; ²INIST - CNRS

The aim of this poster is to present an iconographic database gathering Buddhist images from Ancient Cambodia. As a matter of fact, the purpose of my doctoral dissertation is to study the specific features and evolution of Buddhism in Ancient Cambodia (pre-angkorian and angkorian periods). In this context, iconography is one of the main sources. However, ancient Khmer Buddhist images are plentiful and scattered in collections around the world. Thus, one part of my doctoral research is to create a database to collect and centralise Buddhist images from Ancient Cambodia. The objective is also to encourage and simplify cross-referencing between sources, using mapping tools and links with epigraphic data. The aim of this iconographic database (containing statues in particular) is to promote and make available to the scientific community and the general public, iconography from the pre-Angkorian and Angkorian periods (from the earliest attestations to the reign of Jayavarman VII, 1181 - approx. 1218) in the Far East (Cambodia, Thailand, Laos and Vietnam).

This website and database is developed with support from INIST-CNRS and the DISTAM Consortium. This project is aligned with the objectives of Open Science and FAIR data. The database mostly featured photographs taken during my research missions in Cambodia, but also images from museums and institutions. This is a multidisciplinary approach, combining the history of religions, the study of written sources and iconography. This work is part of the Open Science approach through its compliance with FAIR principles and the digital humanities aspect, particularly as regards cartography with GeoNames.

Thus, one part of my poster will focus on the methodology, the technical aspects and the construction of the various tools (cartography, search tool, classification, etc.). The other part will focus on the benefits offered by the creation and the use of such a database in the field of Buddhist (and Khmer) studies. Thereby, the aim of this poster is to present this database and its tools specially designed for Buddhist studies. But it will also provide an opportunity to discuss on improvements and additions that can be made, as well as the limits of this project. Finally, this poster will also raise questions relating to image licences and the use of images by scholars.

Aspect Detection and Classification in Historical Travel Literature: A study on Prompting Strategies and on the Diachronic influence of Language on Generative AI Performance

Tess Dejaeghere¹, Salvador Ros², Julie Birkholz¹

¹GhentCDH, Ghent Center for Digital Humanities, Ghent Universityn; ²CLARIAH-UNED, National Distance Education University

In Digital Humanities (DH), the recognition, extraction, detection, and classification of aspects are fundamental for tasks such as entity linking and network visualization. Traditionally, these tasks have relied on rule-based methods or discriminative language models functioning as classifiers, (Dejaeghere and Singh 2024). However, the rapid development of Generative AI models, including LLMS as GPT-4, Gemini, Llama 3, and Claude, (Openai 2024; Google 2024; Anthropic 2023), presents a significant opportunity for advancing aspect task. These models enable users to engage with extensive training data through natural language instructions, reducing the need for manual feature engineering or large annotated datasets. The increasing accessibility of chat-based interfaces, (Amazon 2023). further lowers the technical barriers for researchers and practitioners in DH, facilitating experimentation with information extraction techniques.

However, scholars require clear guidelines to effectively utilize LLMs for information extraction. To achieve this, it is essential not only to conduct a systematic study of prompting techniques and their impact on extraction performance but also to analyze the diachronic effect of language on model accuracy, as this remains a critical research challenge. Therefore, it is necessary to answer these questions:

1.- Which prompting strategies most successfully detect and categorize aspects?

2.- Do language models perform better on texts from more recent centuries, where language is more standardized, or do they perform equally well on texts from earlier periods?

3.- Do language models predispose to perform better or worse when detecting different categories of aspects?

For this purpose, we examine the effectiveness of Generative AI models and prompting techniques for aspect recognition and classification in historical travel literature, focusing on English texts from the 18th, 19th, and 20th centuries (Dejaeghere and Singh 2024). Using an annotated dataset containing entities related to travelers' environments—such as fauna, flora, weather, locations, and organizations— we leverage different LLMS (e.g. LLama 3.2, GPT4o) to evaluate four prompting techniques, across different time periods and entity categories. A category-specific ablation study assessed the impact of both prompt design and linguistic variations across centuries on aspect detection performance. This approach provided deeper insights into the interaction between historical language variation and AI-based extraction techniques.

Statistical analysis, including Kruskal-Wallis and Mann-Whitney U tests, revealed that tailoring prompting strategies to the specific objectives of the aspect detection task is essential. Among the tested prompting techniques, few-shot prompting and chain-of-thought (CoT) prompting yielded the highest precision. However, these methods did not significantly outperform others, suggesting that the trade-offs between precision and recall should be carefully considered based on the task’s requirements. When maximizing the number of detected aspects, zero-shot and CoT prompting were more effective, though they required additional validation to ensure completeness. In addition, the study further highlights variability in aspect detection performance across linguistic periods, with texts from the 18th and 19th centuries outperforming those from the 20th century. This finding underscores the impact of training data alignment with historical linguistic characteristics. Future research will expand this analysis by testing other high-performing generative models, assessing their cross-linguistic and cross-model generalizability.

Identification of Coptic Dialects Using Supervised Machine Learning

Peter Missael

University of Göttingen, Germany

Dialect identification plays a crucial role in understanding the linguistic and cultural nuances of the Coptic language, the last stage of Ancient Egyptian (an Afro-Asiatic language). Despite its historical significance, there has been limited research in this area. Most machine learning models focus on only one or two dialects (cf. Smith and Hulden 2016; Zeldes and Schroeder 2016; Levine et al. 2024), as these two comprise the bulk of Coptic texts. This study presents a machine learning model for dialect identification of the Coptic language, addressing the existing gap in linguistic research. Using supervised machine learning in dialect identification for other languages showed promising results (cf. Doostmohammadi and Nassajian 2019; Jauhiainen et al. 2022; Vaidya and Kane 2023).

Various methods were evaluated, including Support Vector Machine (SVM), Random Forest Classifier, Multinomial Naïve Bayes (NB), Logistic Regression, and Recurrent Neural Network (RNN) with a Long Short-Term Memory (LSTM) layer. The best performing method was Multinomial NB with an F1-score of 0.92, while most methods achieved an F1-score of 0.91.

The dataset comprises texts from six (sub)dialects, written in Coptic Unicode. Preprocessing involved removing diacritics and punctuation marks, and splitting texts into sentences, each labeled by dialect. The distribution of sentences is shown in Figure 1.

Feature extraction was performed using TF-IDF 1- to 2-grams. Grid search cross-validation (cv=5) was used to identify the optimal parameters for each method. The imbalance in the dataset impacted the results, as shown in the confusion matrices (Figure 2; Figure 3), with the three most represented dialects being the most accurately identified.

However, there is room for improvement. The accuracy of identifying the underrepresented dialects can be improved through digitizing more texts from these dialects. The model can be fine-tuned to identify additional dialects and enhance the identification of existing ones. It can serve as an initial step in a Coptic NLP pipeline or in research to identify the most characteristic features (words) in each dialect, particularly in texts which show influences from various dialects.

Teaching late antique and byzantine illuminated manuscripts through digital humanities. A field report

Thorben Langer, Johanna Störiko

Georg-August-Universität Göttingen, Germany

The complex demands placed on teaching in the field of Digital Humanities become immediately apparent when considering the students enrolled in a typical course in Göttingen. For our practical exercise, "Digital Analysis of Illuminations in Late Antique and Byzantine Manuscripts," we had 24 students enrolled in the winter semester 2024/25. Of these, approximately half (13) were pursuing a Master's degree in Digital Humanities, while 8 students were studying Data Science or Computer Science, and a few came from Iranian Studies or History. Only a few students had prior experience with historical manuscripts, and even fewer had any familiarity with the Late Antique or Byzantine periods. The three manuscripts (Madrid Skylitzes, Ashburnham Pentateuch, and Mutinensis graecus 122) that we prepared as examples for the course were unknown to the students at the outset. Therefore, we faced the challenging task of not only teaching theoretical and methodological competencies in Digital Humanities to a group with such diverse skills and fields of study, but also of introducing them to the scholarly motivations of Late Antique and Byzantine archaeology and art history.

But how can the balance between technical processing and historical interpretation be achieved in practice in the classroom? In this experience report, we would like to share our observations on this matter. We will present the structure and content of our course and reflect on the difficulties and learning effects experienced by the students.

The core objective of the course was to teach students digital methods for analyzing illuminations in manuscripts. We aimed to introduce both established and experimental methods from Digital Image Science and Computer Vision (Image Captioning, SVG-Annotation, Image Clustering, Shape Analysis, Image Segmentation, Face Recognition). As a foundation, the students received an introduction to the various illuminations and a basic historical contextualization of the codices. Additionally, the students were encouraged to explore the codices on their own.

Following this, the digital work with the manuscripts began. We would particularly like to highlight the work on the Madrid Skylitzes. Initially, the students annotated all the heads of people in the illuminations using SVG polygons. This resulted in a dataset of over 2500 heads. We then statistically analyzed this dataset and performed a clustering of the heads based on image embeddings. It became apparent that the students not only needed practice and guidance in applying the tools, but also in fundamental image-scientific methods such as describing or categorizing images.

We therefore believe it is essential that these competencies are not taken for granted in DH teaching, but taught actively. Only then can the diverse backgrounds of the students be taken into account, and an exchange on equal terms be facilitated. This is particularly relevant for students of Computer Science or Data Science. Despite their great interest in the humanities, they often face significant challenges, while their expertise can greatly enrich the work of students with a humanities focus.

Enhancing Historical Learning Through Digital Tools: A Wikipedia-Based Teaching Innovation in Archaeology

Jordi Martín i Pons^1,2

¹Universitat de Barcelona; ²CEIPAC (Center for the Study of Provincial Interdependence in Classical Antiquity) University of Barcelona

Objectives and Theme

This project is part of an innovative teaching initiative in the Historical Sources course of the Archaeology degree at the University of Barcelona. In this course, students are invited to create a Wikipedia page, in open access. The proposal aims to leverage digital tools to enrich the teaching and learning of the past through free, open-access resources.

Teaching Innovation and Digital Humanities

The project’s primary objective is to promote a better understanding of the past by applying innovative digital methodologies. Students are at the center of learning and content creation, with the guidance of an instructor. The focus is not only on studying ancient cultures but also on utilizing digital technologies to enhance history teaching and transmission through accessible media. Using Wikipedia as a platform provides an effective way to present historical information, creating significant educational and social impact.

A key part of the project is analyzing and comparing the average length of articles in languages with many speakers (e.g., English, French, Russian) versus those with fewer speakers (e.g., Basque, Catalan, Finnish). The project assesses the impact in terms of visit numbers and article quality, using sources and internal quality assessments. This allows students to observe the impact of their article, its role in helping speakers of their language access information, and its connection with other Wikipedia entries tracked through metrics.

Benefits for Society and Academic Research

Through this activity, students learn not only about historical sources but also engage in creating digital knowledge for the academic community and the public. They are introduced to tools like Wikipedia and made aware of its mission (free and cooperative knowledge). Students also learn citation practices and how to use images correctly, enhancing their understanding of digital technologies and historical data analysis. This practice bridges the gap between digital tools and historical research, enabling students to create a digital narrative of the past and understand its impact.

Open Access and Minority Languages

Additionally, creating content in Catalan demonstrates support for minority languages in the digital space. The availability of historical information in languages like Catalan is crucial for ensuring their survival. This initiative addresses the need to preserve linguistic diversity in the digital age, a critical challenge today. The project has both academic value and social benefits, facilitating access to quality knowledge in a language with less global presence. This example can inspire other scholars to apply similar strategies to underrepresented languages, improving their visibility online in alignment with the European Charter for Regional or Minority Languages.

Academically, this activity promotes a methodology combining primary research, digitization, and the continuous creation of editable, accessible content. It fosters an active learning process that empowers students to contribute meaningfully to global digital knowledge creation.

This proposal stands out for linking traditional teaching with digital humanities. Through this, students not only gain expertise in analyzing historical sources but also contribute to creating and disseminating historical knowledge in the global digital space. The project has resulted in significant satisfaction, reflected in students’ demonstrated interest.

Cultural Data in Australian History: An Intimate Analytics Methodology

Rachel Fensham¹, Tyne Sumner², Nat Cutter¹

¹University of Melbourne, Australia; ²Australian National University

Cultural Data in Australian History: An Intimate Analytics Methodology

This poster begins by defining cultural data and identifies 5 Australian digital collections that curate cultural data; as such, they are knowledge structures that narrate the past, as Jurissi Parikka with his concept of media archaeology argued.[1] It examines these extensive curated databases for the performing arts, architecture and visual arts history (https://www.daao.org.au/;https://www.ausstage.edu.au/pages/browse/; https://qldarch.net/; https://www.womenaustralia.info/; https://researchdata.edu.au/circus-oz-living-archive-collection/939530), in order to find aggregated and comparative approaches to cultural analysis.

Cognisant of Australia’s settler-colonial history, the conceptual framing of the research infrastructure project, the Australian Cultural Data Engine (ACD-E, 2021-2023), developed a cross-walk architecture, that retained unique entitites while facilitating a more critical analysis, in ways that expand the logics of the originating research communities.[2] Accessing the affordances of interoperability– generated in the friction that exists between data custodians and data engineers – we identified rich deposits of historical insight about art, artists and cultural change.

With the case study of Know My Name, a major national exhibition as a discrete dataset, the project proposes an intimate analytics methodology that harnesses the database as well as a close historical reading of contexts that shape the data. Following, for instance, the question of careers in the arts, we examine event datapoints that accumulate over time, as well as consider the extent to which key markers of success impact on the trajectory of a career. Moreover, being able to aggregate diverse datapoints via mapping technologies, we identify how artistic networks and scenes were indicative of other longer-term social formations.

Recognising that the database is always incomplete, misleading, and sometimes violently empty, we exemplify at a granular level how the complexity of relations between data and place, dataset and person, database and narrative must be reconceived, particularly when indigenous knowledge traditions relating to place or naming conventions are articulated. When pursued with vigour, this approach, that we term intimate analytics, can enhance, trouble, and unsettle conventional approaches to cultural data research, and put it into dialogue with the existing structures, biases and conventions of knowing the past.

Subsequently, we argue that such born-digital cultural collections must grapple not only with their distinctive content but also the historical contexts embedded in the cultural data itself. As repatriation and decolonial curation efforts become increasingly prominent in Europe and North America, our physical location in the Global South represents an imperative and an opportunity to speak from the ‘periphery’ to the ‘centre’ of global cultural data infrastructures and digital heritage.

[1] Erkki Huhtamo and Jussi Parikka. Media. Archaeology: Approaches, Applications, Implications. Berkeley: University of California Press, 2011.

[2] Fensham, R., et al. ‘Towards a National Data Architecture for Cultural Collections: Designing the Australian Cultural Data Engine’. Digital Humanities Quarterly, vol. 18, no. 2, 2024.

Pervisum: a Tool for Digital Storytelling and Writing on the past in scholarly publications

Bulle Tuil Leonetti¹, Margaux Faure²

¹INVISU (CNRS/INHA, France); ²INHA (France)

Led by the InVisu research unit (CNRS) and the Digital Research Department of the French National Institute of Art History (INHA), the PerVisum project is funded by the National Fund for Open Science (COSO). It was developed in response to a growing demand among scholars and the GLAM sector for the ability to construct scientific narratives about the past. The challenge was to leverage the increasing availability of digital images in open access while maintaining digital sobriety, a goal that the use of IIIF technologies could facilitate.

Furthermore, images used in support of scholarly publications have not yet fully benefited from the evolution of the Web. Most of the time, we limit ourselves to simply providing access to images in regards of texts, thereby missing the opportunity for real integration into the research workflow.

With the Pervisum project and tool, relying on IIIF technologies, we can integrate functionalities such as annotation, in-depth exploration of images and persistent links to sources. For instance, when illustrating the evolution of an urban landscape, the work of a painter among his peers, or the coinage of a bygone empire, scholars can now employ a vast collection of curated images interwoven with text.

The Pervisum project explores the potential for publishing scientific demonstrations as IIIF manifests, rethinking the relationship between textual and visual content. The annotations made on IIIF images form the core of the scientific argumentation.

How does it work ?

The tool provides interfaces that enable users to construct demonstrations based on IIIF manifests published by heritage institutions. Users can select images and organise them according to their text plan, integrating them into the demonstration. The process involves annotating the images to support the argumentation. The writing of the demonstration is therefore based on the idea that the argumentation is contained in IIIF annotations, which are defined and ordered by the user according to the framework they wish to set up.

The objective of this poster is twofold: firstly, to present the tool currently under development, and secondly, to consider IIIF manifests as editorial objects that can be used both to disseminate enriched image corpora in publications and to offer a new article format.

A progress report of the Corpus Musicae Ottomanicae on the challenges of data modelling of historical Middle Eastern music manuscripts

Sven Gronemeyer^1,2

¹Max Weber Stiftung, Germany; ²La Trobe University Melbourne, Australia

The Corpus Musicae Ottomanicae (CMO) is a long-term research project focusing on nineteenth-century Ottoman music manuscripts and their critical edition. Many of these manuscripts are written in Hampartsum notation and later in Western staff notation. CMO has collected more than 10,000 musical sources and expressions and edited more than 550 pieces over the past nine years.

The musical pieces are historically and textually accurate transcriptions, reflecting the original sources as closely as possible. Similarly, the sung poetry follows the original arrangement and orthography of the sources as faithfully as possible. The musical transcriptions are fully digitized, considering the special notational requirements of Middle Eastern and Eastern Mediterranean music. The editions will be stored in a variety of formats in order to provide access to the edition in both graphical (human readable) and semantic (machine readable) form.

The latter aspect in particular is a challenging undertaking and will be the focus of this presentation. The data modelling for the editions was (and still is) in an area of conflict between the schemas of existing and established formats (namely TEI and MEI) and the editorial requirements. Here, particularly in the case of the MEI, one can see a certain Western bias in the use or definition of certain elements and schemas, based on the source materials on which they were modelled. CMO is an example of how creative use of the existing guidelines can not only meet the needs of the project, but can also stimulate discussions on how to adapt the guidelines and, in particular, open them up to perspectives from non-Western contexts.

While Western staff notation is strict and precise in terms of meters, pitches and durations, early Hampartsum notation is characterized by a minimal stock of durational signs with relative values, but differentiated pitch signs. The system later underwent a specification of durational values and a reduction in the number of pitch signs. In the 20th century, leading Turkish musicologists tried to create pitch systems that could represent Ottoman music in the most rational and efficient way. Transcription into Western staff notation is therefore always time-specific, and the original notation must be correlated with it. This was eventually achieved through the definition of custom symbols using the semiotic triangle, where the original notation could be correlated with both the Ottoman pitch names and the Western pitches. These relationships may change as a result of ongoing research, so a central reference file would be ideal. Yet, MEI does not allow for linking custom symbols in the appropriate part of the header – hopefully stipulating a change in a future guideline version.

This example shows that the maintenance of formats and standards can benefit from input from ongoing research projects. Equally, research projects can benefit from the dissemination and exchange of best practice. With this presentation, CMO aims not only to provide an overview of its efforts to make a historical music tradition available, but also to advocate for more networking to improve technical guidelines for digital scholarly editions.