2:00pm - 2:15pm
AI4LAM: A Collaborative Network for Reliable and Trustworthy Use of AI in Libraries, Archives, and Museums' Historical Collections
Ines Vodopivec
AI4LAM, National Library of Norway, Stanford University
AI4LAM's Activities: The AI for Libraries, Archives, and Museums (AI4LAM) community is an international, participatory network of more than 1,300 members dedicated to advancing the use of artificial intelligence within the cultural heritage sector. The community is at the forefront of developing and maintaining cutting-edge AI tools and services tailored to heritage institutions, improving access to, management of, and (re)use of digitised and born-digital content by supporting collaboration, innovation, and knowledge sharing in the field of AI for institutions worldwide.
Its mission is to foster a framework for organizing, sharing and elevating knowledge about and use of AI, and to advocate for reliable and trustworthy AI tools and services. The community's efforts are underpinned by the FAIR data principles, which are also firmly embedded in the heritage sector through licensing frameworks enabling fully open, interoperable and standardised metadata and rights statements that make the reuse possibilities for each item in digital collections clear.
DARIAH Collaboration: The extensive DARIAH network can play a crucial role in creating a common environment for dialogue in which global technical and theoretical AI developments are shared and adopted among LAM institutions across Europe and beyond. The AI4LAM community has already made significant strides in integrating AI technologies, with a range of tools and resources such as machine learning models for metadata extraction, image recognition systems for digitized collections, and natural language processing applications for cataloguing and archival processing of historical collections. Collaboration between the AI4LAM and DARIAH communities could moreover lead to innovative solutions that address the unique challenges faced by heritage institutions, especially in providing data for research on historical collections.
Use Cases of the Past: To further strengthen collaboration, this presentation will reveal the AI4LAM community's future strategic steps and showcase use cases of AI applied to historical materials, demonstrating the potential of AI in LAM institutions. By sharing experiences and insights, we can help shape the future of the cultural heritage sector and of research and educational stakeholders, and ensure that AI tools and resources meet the diverse needs of the coming decade. AI-driven solutions can enhance data management, improve access to digital collections, and streamline administrative processes. Such collaboration can also foster interdisciplinary research and innovation, leading to new discoveries and advances across fields. Integrating AI tools into LAM digital infrastructures gives researchers powerful means to analyse and interpret vast datasets, which not only enhances the quality and efficiency of research but also fosters a culture of openness and collaboration across disciplines.
In conclusion, the partnership between AI4LAM and the DARIAH community represents a unique opportunity to advance the use of AI in LAM. The slogan of AI4LAM, “Individually, we are slow and isolated; collectively, we can go faster and farther,” encapsulates the community's goal: to work together and build a more innovative, secure, and collaborative future.
2:15pm - 2:30pm
Archiving for the Future Past - Multimodality and AI - Challenges and Opportunities
Moa Johansson1, Vyacheslav Tykhonov2, Sophia Alexandersson1, Kim Ferguson2, James Hanlon3, Hella Hollander2, Jetze Touber2, Andrea Scharnhorst2, Nigel Osborne1
1ShareMusic & Performing Arts, 563 32 Gränna, Sweden; 2Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences, The Netherlands; 3X-System Ltd, Hampshire PO15 7FX, England, UK
This paper discusses how to enhance existing digital archival solutions with new AI-based approaches. We take as an example the creation of multimodal representations [1] of performing arts around a newly emerging repository hosted by ShareMusic, a Swedish Knowledge Centre for Artistic Development and Inclusion. Traces of performing arts are a prime example of embracing new technological challenges when it comes to archiving in the present for the coming past. [2] These traces usually represent complex digital objects, often combining text, image, video, 3D object representations and so on. [3] Capturing their multimodal features, as well as building multimodality (the use of various senses) into their retrieval, adds another layer of complexity to digital preservation. In this paper we present the different phases in the design of a repository fit for documentation of inclusive performing arts, with an interface providing inclusive access. [4]

Technologically, open source developments such as the Dataverse project [5] and tools that foster local implementation of mature archival solutions [6] form a solid foundation. The design process is guided by knowledge organisation workflows that involve human experts [7] in creating a knowledge base for arts and inclusion. [8] At the core of this paper we demonstrate how innovative local AI solutions (Ghostwriter [9]) can be used to enhance the annotation of datasets as well as to enhance their accessibility via various web interface frames (see Figure in pdf). In particular we zoom in on the roles of Monomodal Transformative AI (MTA) and Multimodal Cognitive AI (MCAI). The first (MTA) refers to a set of technologies that convert a single-source input into multiple accessible formats; for example, text can be transformed into audio or haptic representations, enabling broader accessibility for individuals with different needs. The second (MCAI) is a class of AI systems trained on multiple modalities to generate context-aware outputs by leveraging multimodal knowledge. These approaches are still at an early stage. We reflect on how they can be developed further alongside the expansion of multimodal data stores, which provide the necessary corpus for effective training.

On a metalevel, this paper discusses how such innovative explorations, carried out in the context of EC- and nationally funded projects (SSHOC.EU, MuseIT, SSHOC.nl), can be transferred to mature repository services. Content-wise, the emerging ShareMusic repository and the established Data Stations at DANS-KNAW share the fact that their collection material is heterogeneous by nature, spanning a spectrum from scientific documentation of humanities and arts scholarship to source material (of a multimodal nature). A further shared feature is that ‘data sets’ are often produced by smaller communities in academia and/or society, sometimes by vulnerable groups, and that the resulting traces can easily become ‘endangered’. In keeping with the expertise function of DARIAH, we exchange experiences on how to repurpose existing technological solutions and enable a division of labour via API service networks. In this way costly tailored niche applications can be avoided, and the sustainability of research infrastructures for the humanities can be enhanced.
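To make the MTA idea concrete, the following minimal sketch renders a textual annotation as audio, one example of a single-source-to-accessible-format transformation. It assumes the off-the-shelf pyttsx3 text-to-speech library purely for illustration; the repository's actual transformation services (for example within Ghostwriter) are not specified here, and the annotation text and file name are hypothetical.

```python
import pyttsx3  # assumed offline text-to-speech engine; illustrative, not the project stack

def text_to_audio(text: str, out_path: str) -> None:
    """One MTA-style transformation: render a textual annotation as speech."""
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)  # queue synthesis of the text to a file
    engine.runAndWait()                  # block until the audio has been written

# Hypothetical annotation from the performing-arts repository
text_to_audio("Documentation of an inclusive improvised duet, Gränna 2024.",
              "annotation.wav")
```

A haptic rendering would follow the same pattern, with the annotation routed to a device-specific output channel instead of an audio file.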
2:30pm - 2:45pm
Best practices in pre- and post-ATR for historical research
Monika Renate Barget1, Koen Hufkens2
1Maastricht University, Institute for European History Mainz; 2BlueGreen Labs
Based on discussions and hands-on experiments conducted with the participants of a “Bring Your Own Data Labs” workshop hosted in Mainz in February 2025, we would like to share the best practices we identified for pre- and post-ATR (automatic text recognition) processing in historical research. (Barget, 2025) As ATR technologies (for both print and handwritten text) are evolving quickly thanks to new opportunities in Machine Learning and Artificial Intelligence, new challenges arise especially for small teams or individual researchers who may lack the funding, infrastructure, expertise, or IT support to make optimal use of up-to-date tools. Moreover, existing workflows for layout and text recognition do not guarantee immediate success with historical documents in special formats or with badly preserved sources.

In our workshop in Mainz, we addressed challenges ranging from choosing the most suitable tool for one's own project to deciding what image manipulation pre-OCR and automated text cleaning post-OCR could do for researchers, depending on their research goals and work environments. (Garzón Rodríguez, 2024) The sample sources that participants brought to the workshop ranged from historical scientific records in table formats to letters, and sample documents also came in different languages. This gave us the opportunity to consider specific solutions for each case study as well as to collect general recommendations. One topic that we discussed in detail was using computer vision packages to better identify text areas in form-like documents. Koen Hufkens (2022) shared a model workflow based on his recent work with colonial climate records from the Belgian Congo. Another important topic was the use of AI (chatbots) for image manipulation and post-OCR text correction. Here, we compared LLM tests run by Florentina Armaselu (2024) on a small selection of French-language texts with German-language tests in the DigiKAR geohumanities project. (Barget, 2023) Balancing time investment, environmental concerns and questions of research reproducibility, we found that LLMs can be successfully used to build better regular expressions or to create controlled vocabularies from small text samples, while direct AI-based correction of large amounts of text seemed neither sustainable nor reliable. To the surprise of some participants, we also questioned whether (high-quality) OCR was always strictly necessary to answer a given research question, suggesting tools for qualitative research or image annotation software as possible alternatives.

In our paper, we would like to share our findings systematically with the larger research community and invite further discussion. We believe the topic is of considerable interest to many members of the DARIAH community, as it is also covered in the recently published DARIAH Campus training module “Automatic Text Recognition (ATR)”. (Chiffoleau and Ondraszek, 2025)
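As a concrete illustration of these two workflow stages, the sketch below pairs a simple pre-OCR image cleanup (grayscale conversion, contrast stretching and binarisation with Pillow) with a rule-based post-OCR normalisation of the kind an LLM can help draft from a small text sample. It is a minimal sketch under stated assumptions: the file name, threshold value and specific regular expressions are illustrative placeholders, not recommendations from the workshop.

```python
import re
from PIL import Image, ImageOps  # Pillow, assumed here for illustration

def preprocess_page(path: str, threshold: int = 160) -> Image.Image:
    """Pre-OCR: grayscale, stretch contrast, and binarise a scanned page."""
    gray = ImageOps.autocontrast(Image.open(path).convert("L"))
    return gray.point(lambda px: 255 if px > threshold else 0)

# Post-OCR: recurring-error fixes of the kind an LLM can help formulate
# as regular expressions from a small text sample (patterns are examples).
OCR_FIXES = [
    (re.compile(r"ſ"), "s"),                  # long s in early modern prints
    (re.compile(r"(\w+)-\n(\w+)"), r"\1\2"),  # rejoin words hyphenated at line breaks
]

def clean_text(raw: str) -> str:
    """Apply the rule set to raw OCR output."""
    for pattern, replacement in OCR_FIXES:
        raw = pattern.sub(replacement, raw)
    return raw
```

Keeping the rules in a reviewable list mirrors the workshop's reproducibility concern: each correction is explicit and can be audited, unlike an opaque end-to-end LLM rewrite of the text.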
2:45pm - 3:00pm
Content Analysis of Historical Datasets with Large Multi-modal Models
Tianyu Yang1, Abdallah Mohamed Abdallah Abdelnaby2, Daniel Kurzawe1
1Niedersächsische Staats- und Universitätsbibliothek Göttingen; 2Universität Göttingen, Germany
The digitization of historical documents is a critical step in unlocking their potential for digital research. The core process of digitization involves converting scanned images of documents into a textual format that is easier to index and retrieve. Typically, a scanned document page contains not only textual content but also graphics and illustrations. Therefore, in addition to the challenges of extracting textual content from historical writings through Optical Character Recognition (OCR), the annotation of graphics and illustrations is essential. This allows these elements and their semantic content to be discoverable and analyzable through registries.
The annotation of illustrations in historical documents is a labor-intensive task traditionally handled through manual cataloging, where experts describe visual elements and assign metadata by hand. More recently, specialized machine learning models have been developed to identify and classify certain types of images, such as printed illustrations or maps. However, these methods often require extensive training data and domain-specific fine-tuning.
To facilitate the transcription of the textual content in scanned documents, many OCR tools have been developed in recent years, e.g. PaddleOCR and EasyOCR. However, most of these tools are designed for modern documents. Historical records present greater challenges due to factors such as cursive handwriting styles, degraded text quality (e.g., faded ink or damaged paper), language change, and complex document layouts. As a result, transcribing historical documents with general OCR tools often fails to produce accurate or fluent results.
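For illustration, a minimal EasyOCR call on a scanned page looks as follows; the language code and file name are placeholders, and, as noted above, the stock recognition models target modern scripts rather than historical hands.

```python
import easyocr

# The first call downloads detection and recognition models for the chosen language
reader = easyocr.Reader(['de'])

# readtext returns (bounding box, text, confidence) triples per detected region
for bbox, text, confidence in reader.readtext('scanned_page.jpg'):
    print(f'{confidence:.2f}  {text}')
```

On degraded or handwritten historical material, the per-region confidence scores typically drop sharply, which is one practical way to see the limitation described above.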
Recently, the remarkable success of Large Language Models (LLMs) such as GPT-4, LLaMA and Vicuna has paved the way for the development of Large Multi-modal Models (LMMs), which combine pretrained visual models with LLMs to endow them with visual capabilities. Trained on large-scale image-caption datasets, LMMs demonstrate excellent zero-shot OCR and image captioning performance in the wild, which provides a valuable enhancement to digitization workflows by automating the generation of image descriptions, complementing existing OCR-based text extraction. Their integration allows for a more scalable and efficient digitization process, bridging the gap between manual expertise and fully automated image analysis.
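As a small open-model stand-in for the LMMs named above, the following sketch generates a zero-shot caption for a cropped illustration using BLIP via the Hugging Face transformers library; the model choice and file name are assumptions for illustration, not the systems evaluated in this work.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Open pretrained vision-language captioning model (stand-in for larger LMMs)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Hypothetical crop of an illustration from a digitized page
image = Image.open("illustration_crop.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```

In a digitization pipeline, such generated captions could be stored as candidate metadata alongside OCR output, to be reviewed rather than accepted automatically.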
In this abstract, we first demonstrate the OCR and image captioning capabilities of state-of-the-art LMMs on historical document datasets. Additionally, we provide an overview of general digitization workflows and propose a feasible approach to integrating LMMs, aiming to enhance both the efficiency of digitization and the discoverability of visual content in historical documents.