Session
Topic: LLMs in Action
Presentations
2:00pm - 2:15pm
AI4LAM: A Collaborative Network for Reliable and Trustworthy Use of AI in Libraries, Archives, and Museums' Historical Collections
AI4LAM, National Library of Norway, Stanford University

AI4LAM's Activities: The AI for Libraries, Archives, and Museums (AI4LAM) community is an international, participatory network of more than 1,300 members dedicated to advancing the use of artificial intelligence within the cultural heritage sector. The community is at the forefront of developing and maintaining cutting-edge AI tools and services tailored to heritage institutions, improving the access, management, and (re)use of digitised and born-digital content by supporting collaboration, innovation, and knowledge sharing among institutions worldwide. Its mission is to foster a framework for organising, sharing, and elevating knowledge about the use of AI, and to advocate for reliable and trustworthy AI tools and services. The community's efforts are underpinned by the FAIR data principles, which are also firmly established in the heritage sector through licensing frameworks that enable fully open, interoperable, and standardised metadata and rights statements, making the reuse possibilities for each item in a digital collection clear.

DARIAH Collaboration: The extensive DARIAH network can play a crucial role in developing a common dialogue environment in which global technical and theoretical AI developments can be shared and adopted among LAM institutions across Europe and beyond. The AI4LAM community has already made significant strides in integrating AI technologies. This includes a range of tools and resources, such as machine learning models for metadata extraction, image recognition systems for digitised collections, and natural language processing applications for the cataloguing and archival processing of historical collections.
Collaboration between AI4LAM and the DARIAH communities could lead to innovative solutions that address the unique challenges faced by heritage institutions, especially when providing data for research on historical collections.

Use Cases of the Past: To further strengthen collaboration, this presentation will present the AI4LAM community's future strategic steps and showcase use cases of AI applied to historical materials, demonstrating the potential of AI in LAM institutions. By sharing experiences and insights, we can help shape the future of the cultural heritage sector and of research and educational stakeholders, and ensure that AI tools and resources meet the diverse needs of the coming decade. AI-driven solutions can enhance data management, improve access to digital collections, and streamline administrative processes. The collaboration can also foster interdisciplinary research and innovation, leading to new discoveries and advances in various fields. The integration of AI tools into LAM digital infrastructures gives researchers powerful means to analyse and interpret vast datasets. This not only enhances the quality and efficiency of research but also fosters a culture of openness and collaboration across disciplines. In conclusion, the partnership between AI4LAM and the DARIAH community represents a unique opportunity to advance the use of AI in LAM. The AI4LAM slogan, "Individually, we are slow and isolated; collectively, we can go faster and farther," encapsulates the community's goal: to work together and build a more innovative, secure, and collaborative future.

2:15pm - 2:30pm
Archiving for the Future Past - Multimodality and AI - Challenges and Opportunities
1ShareMusic & Performing Arts, 563 32 Gränna, Sweden; 2Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences, The Netherlands; 3X-System Ltd, Hampshire, PO15 7FX, England, UK

This paper discusses how to enhance existing digital archival solutions with new AI-based approaches. As an example, we take the creation of multimodal representations [1] of performing arts around a newly emerging repository hosted by ShareMusic, a Swedish Knowledge Centre for Artistic Development and Inclusion.

2:30pm - 2:45pm
Best practices in pre- and post-ATR for historical research
1Maastricht University, Institute for European History Mainz; 2BlueGreen Labs

Based on discussions and hands-on experiments conducted with the participants of a "Bring Your Own Data Labs" workshop hosted in Mainz in February 2025, we would like to share the best practices we identified for pre- and post-ATR (automatic text recognition) processing in historical research (Barget, 2025). As ATR technologies (for both print and handwritten text) evolve quickly with new opportunities in machine learning and artificial intelligence, new challenges arise, especially for small teams or individual researchers who may lack the funding, infrastructure, expertise, or IT support to make optimal use of up-to-date tools. Moreover, existing workflows for layout and text recognition do not guarantee immediate success with historical documents in special formats or with badly preserved sources. In our workshop in Mainz, we addressed challenges ranging from choosing the most suitable tool for one's own project to deciding what pre-OCR image manipulation and post-OCR automated text cleaning could do for researchers, depending on their research goals and work environments (Garzón Rodríguez, 2024). The sample sources that participants brought to the workshop ranged from historical scientific records in table formats to letters, and the sample documents also came in different languages. This gave us the opportunity to consider specific solutions for each case study as well as to collect general recommendations. One topic that we discussed in detail was the use of computer vision packages for better identification of text areas in form-like documents. Koen Hufkens (2022) shared a model workflow based on his recent work with colonial climate records from the Belgian Congo. Another important topic was the use of AI (chatbots) for image manipulation and post-OCR text correction.
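Rule-based post-OCR cleaning of this kind can be sketched in a few lines. The patterns below (re-joining line-break hyphenation, normalising the historical long s, collapsing whitespace, and counting candidate vocabulary terms) are illustrative assumptions for this sketch, not the workshop's or any project's actual rules:

```python
# Minimal post-OCR cleaning sketch (standard library only).
# The specific rules are hypothetical examples of the kind of patterns
# an LLM might help a researcher draft and refine.
import re
from collections import Counter

def clean_ocr(text):
    # Re-join words hyphenated across line breaks: "Samm-\nlung" -> "Sammlung"
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
    # Normalise the historical long s (ſ) to a modern s
    text = text.replace("ſ", "s")
    # Collapse runs of whitespace left over from layout analysis
    return re.sub(r"\s+", " ", text).strip()

def vocabulary(texts, min_count=2):
    """Build a small candidate vocabulary from cleaned sample texts."""
    counts = Counter(w.lower() for t in texts for w in re.findall(r"\w+", t))
    return sorted(w for w, c in counts.items() if c >= min_count)

sample = "Die  Samm-\nlung der hiſtoriſchen   Karten"
print(clean_ocr(sample))  # -> "Die Sammlung der historischen Karten"
```

In this setup the LLM's role is confined to proposing and explaining the regular expressions, while the deterministic rules are applied to the full corpus, which keeps the correction step reproducible and cheap to rerun.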
Here, we compared LLM tests run by Florentina Armaselu (2024) on a small selection of French-language texts with German-language tests in the DigiKAR geohumanities project (Barget, 2023). Balancing time investment, environmental concerns, and questions of research reproducibility, we found that LLMs can be used successfully to build better regular expressions or to create controlled vocabularies from small text samples, while direct AI-based correction of large amounts of text seemed neither sustainable nor reliable. To the surprise of some participants, we also questioned whether (high-quality) OCR was always strictly necessary to answer their research questions, suggesting tools for qualitative research or image annotation software as possible alternatives. In our paper, we would like to share our findings systematically with the larger research community to invite further discussion. We believe the topic is of considerable interest to many members of the DARIAH community, as it is also covered in the recently published DARIAH-Campus training module "Automatic Text Recognition (ATR)" (Chiffoleau and Ondraszek, 2025).

2:45pm - 3:00pm
Content Analysis of Historical Datasets with Large Multi-modal Models
1Niedersächsische Staats- und Universitätsbibliothek Göttingen; 2Universität Göttingen, Germany

Digitizing historical documents is a critical step in unlocking their potential for digital research. The core process of digitization involves converting scanned images of documents into a textual format that is easier to index and retrieve. Typically, a scanned document page contains not only textual content but also graphics and illustrations. Therefore, in addition to the challenge of extracting textual content from historical writings through Optical Character Recognition (OCR), the annotation of graphics and illustrations is essential: it allows these elements and their semantic content to be discovered and analyzed through registries. The annotation of illustrations in historical documents is a labor-intensive task traditionally handled through manual cataloging, where experts describe visual elements and assign metadata by hand. More recently, specialized machine learning models have been developed to identify and classify certain types of images, such as printed illustrations or maps. However, these methods often require extensive training data and domain-specific fine-tuning. To facilitate the transcription of the textual content of scanned documents, many OCR tools have been developed in recent years, e.g., PaddleOCR and EasyOCR. However, most of these tools are designed for modern documents. Historical records present greater challenges due to factors such as cursive handwriting, degraded text quality (e.g., faded ink or damaged paper), language change, and complex document layouts. As a result, transcribing historical documents with general OCR tools often fails to produce accurate or fluent results.
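Degraded scans of this kind are commonly binarized before OCR to separate faded ink from yellowed paper. As a minimal standard-library illustration of one classical approach, the sketch below applies Otsu's method to a flat list of grayscale values; a production pipeline would use an image library such as OpenCV or scikit-image on real scan data:

```python
# Otsu's method: choose the grayscale threshold that maximizes the
# between-class variance of "ink" vs "paper" pixels. Standard library only;
# the sample values below are invented for illustration.

def otsu_threshold(pixels):
    """Return the 0-255 threshold maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg, weight_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        weight_bg += hist[t]          # pixels at or below the candidate threshold
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels):
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]

# Faded ink (~90) on yellowed paper (~200): Otsu separates the two modes.
page = [90, 95, 88, 92, 200, 205, 198, 210, 91, 199]
print(binarize(page))  # -> [0, 0, 0, 0, 255, 255, 255, 255, 0, 255]
```

On badly degraded material, a global threshold like this can fail where ink density varies across the page, which is one reason adaptive or learned pre-processing is often preferred for historical collections.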
Recently, the remarkable success of Large Language Models (LLMs) such as GPT-4, LLaMA, and Vicuna has paved the way for the development of Large Multi-modal Models (LMMs), which combine pretrained visual models with LLMs to give them visual capabilities. Trained on large-scale image-caption pair datasets, LMMs demonstrate excellent zero-shot OCR and image captioning performance in the wild, providing a valuable enhancement to digitization workflows by automating the generation of image descriptions and complementing existing OCR-based text extraction. Their integration allows for a scalable and more efficient digitization process, bridging the gap between manual expertise and fully automated image analysis. In this abstract, we first demonstrate the OCR and image captioning capabilities of state-of-the-art LMMs on historical document datasets. Additionally, we provide an overview of general digitization workflows and propose a feasible approach to integrating LMMs, aiming to enhance both the efficiency and the discoverability of visual content in historical documents.
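An integration of this kind can be pictured as a routing step: text regions of a page go to an OCR engine, illustrations go to an LMM captioner, and both results are merged into one page record. The sketch below is a hypothetical illustration of that structure, not the authors' implementation; run_ocr and caption_with_lmm are stand-ins for real model calls:

```python
# Hypothetical routing sketch for a digitization workflow that combines
# OCR transcription with LMM-generated image descriptions.
from dataclasses import dataclass, field

@dataclass
class Region:
    kind: str   # "text" or "illustration", e.g. from a layout-analysis step
    data: str   # placeholder for pixel data or a crop reference

@dataclass
class PageRecord:
    transcript: list = field(default_factory=list)
    captions: list = field(default_factory=list)

def run_ocr(region):           # stand-in for an ATR/OCR engine
    return f"OCR({region.data})"

def caption_with_lmm(region):  # stand-in for a zero-shot LMM caption
    return f"CAPTION({region.data})"

def digitize_page(regions):
    """Route each detected region to the appropriate model and merge results."""
    record = PageRecord()
    for r in regions:
        if r.kind == "text":
            record.transcript.append(run_ocr(r))
        else:
            record.captions.append(caption_with_lmm(r))
    return record

page = [Region("text", "col1"), Region("illustration", "map"), Region("text", "col2")]
rec = digitize_page(page)
print(rec.transcript, rec.captions)
```

Keeping transcripts and captions in one record per page is what makes the illustrations discoverable alongside the text, since both can then be indexed in the same registry.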