Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session Topic: Transforming Digital Methods
Time: Friday, 20/June/2025, 9:30am - 11:00am

Session Chair: Nanette Rissler-Pipka, Max Weber Foundation
Location: Hannah-Vogt Saal (Alte Mensa venue), ground floor, Wilhelmsplatz 3, 37073 Göttingen, Germany

Presentations
9:30am - 9:45am

The Digital Transformation of Maya Hieroglyphic Research

Christian Prager

University of Bonn, Germany

The classification and analysis of hieroglyphic writing systems present methodological challenges within Digital Humanities. Focusing on the Classic Maya script, this paper examines the limitations of traditional iconographic approaches in addressing the graphic and semantic complexity of Maya hieroglyphs. Established classification systems, such as J. Eric S. Thompson’s catalog, remain fundamental references but exhibit methodological constraints due to overlaps between iconographic and semantic criteria and the static nature of printed catalogs, which hinder updates and integration of new discoveries.

To address these challenges, the "Text Database and Dictionary of Classic Mayan" project refines and extends Thompson’s classification system through digital methodologies. A key innovation is the systematic digital documentation and encoding of palaeographic variants, particularly anthropomorphic and zoomorphic glyphs, historically underrepresented in classification efforts. By employing a numeric coding system independent of iconographic descriptions, this initiative provides a flexible framework, mitigating limitations of static classifications and enabling more precise analyses of Maya hieroglyphs' formal, semantic, and functional dimensions.
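
By way of illustration only, a numeric sign code with attached palaeographic variants could be modelled roughly as in the following sketch; the identifiers, field names and example values are placeholders rather than the project's actual data model.

```python
# Hypothetical sketch: a numeric sign code decoupled from iconographic labels,
# with palaeographic (e.g. anthropomorphic or zoomorphic) variants attached.
from dataclasses import dataclass, field


@dataclass
class GraphVariant:
    variant_id: str    # placeholder identifier for a graph variant
    description: str   # controlled-vocabulary description, not the classification key
    attested_in: list[str] = field(default_factory=list)  # sigla of texts where the variant occurs


@dataclass
class Sign:
    code: int          # numeric code, independent of iconography
    variants: list[GraphVariant] = field(default_factory=list)

    def add_variant(self, variant: GraphVariant) -> None:
        """Register a newly documented variant without touching the numeric code itself."""
        self.variants.append(variant)


# The catalogue can grow as new glyph forms are documented.
sign = Sign(code=32)
sign.add_variant(GraphVariant("32bv", "zoomorphic head variant", ["(example text siglum)"]))
```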

Another central aspect is the implementation of controlled vocabularies for consistent iconographic descriptions within digital research environments. This system supports structured analyses based on both external morphological traits and internal semantic properties. Additionally, the digital catalog framework facilitates the integration of newly identified glyphs, while digital concordances enable transparent comparisons with earlier classification systems. Researchers can systematically evaluate historical cataloging efforts in relation to contemporary findings, refining methodological perspectives.
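
At its simplest, a digital concordance of the kind described is a two-way lookup between historical catalogue numbers and the new numeric codes; the entries below are invented placeholders, not actual concordance data.

```python
# Hypothetical sketch: concordance between historical catalogue numbers and
# project-internal numeric codes. All entries are invented placeholders.
concordance = {
    "T0001": [32],         # one historical entry maps onto one modern code ...
    "T0002": [168, 1026],  # ... or is split across several re-analysed signs
}


def modern_codes(historical_number: str) -> list[int]:
    """Modern numeric codes recorded for a historical catalogue entry."""
    return concordance.get(historical_number, [])


def historical_entries(code: int) -> list[str]:
    """Reverse lookup: which historical entries subsume a given modern code?"""
    return [number for number, codes in concordance.items() if code in codes]
```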

A crucial feature of the project is its integration of TEI (Text Encoding Initiative) and XML standards for encoding textual data. The project's research platform serves as an interface between RDF-based data structures and TEI-encoded textual content, enabling seamless retrieval and visualization of hieroglyphic information. Texts stored in TEI format are dynamically incorporated into the research portal, ensuring that structured textual data can be efficiently linked with broader semantic web technologies. This interoperability facilitates data exchange between digital resources, creating interconnected research workflows.
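
The following minimal sketch shows how glyph references in a TEI fragment might be collected for linking with RDF resources; it relies on TEI's standard <g ref="..."> mechanism, while the project's actual markup, identifiers and triple store are not reproduced here.

```python
# Minimal sketch: collect glyph references from a TEI fragment so they can be
# joined with RDF resources (e.g. via a SPARQL query). Identifiers are placeholders.
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

sample = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <ab><g ref="#sign32"/><g ref="#sign168"/></ab>
  </body></text>
</TEI>
"""


def glyph_refs(tei_xml: str) -> list[str]:
    """Return the sign identifiers referenced by <g> elements in a TEI fragment."""
    root = ET.fromstring(tei_xml)
    return [g.attrib["ref"].lstrip("#") for g in root.findall(".//tei:g", TEI_NS)]


print(glyph_refs(sample))  # ['sign32', 'sign168']
```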

Through case studies of recent discoveries, this presentation demonstrates how digital methodologies address classification constraints while maintaining continuity with established frameworks. Examples illustrate the benefits of a digital approach in classifying and analyzing variant glyphs, offering deeper insights into their linguistic, cultural, and functional contexts. Additionally, the study highlights digital interoperability’s role in fostering collaborative research and enhancing accessibility to Maya hieroglyphic studies.

Beyond Maya epigraphy, these methodological advances provide a model for the digital analysis of other complex historical writing systems facing similar challenges. By situating the discussion within broader Digital Humanities debates, this paper encourages interdisciplinary collaboration and underscores the necessity of robust digital infrastructures for ongoing research. Ultimately, this study demonstrates how digital innovation enhances the analysis and dissemination of historical scripts, fostering new perspectives in epigraphy and cultural heritage studies.



9:45am - 10:00am

Valorizing Past Art Historical Research with LLMs for European Cultural Heritage: A Case Study of the Corpus Rubenianum

Arnoud Wils

Maastricht University, The Netherlands

The vast body of past art historical research, including encyclopaedias, monographs and art catalogues, represents a wealth of meticulous bibliographic and scholarly research. However, a significant portion of this valuable material exists only as scanned images or unstructured OCRed PDF documents, posing a challenge to contemporary researchers seeking to fully exploit these findings. This limitation hinders the seamless integration of established knowledge with newer discoveries and state-of-the-art digital analysis methods.

This paper explores an approach to bridge this gap by demonstrating the potential of Large Language Models (LLMs) to extract structured information from these historical resources.

To illustrate this potential, our research focuses on the Corpus Rubenianum, an impressive catalogue of over 40 books meticulously documenting the work of Peter Paul Rubens. This monumental work, based on the lifelong research of Ludwig Burchard, is universally recognised as the definitive resource on Rubens. Each volume is written by a leading scholar and aims to embody all that is currently known about the artist's oeuvre, with over 2,500 compositions and 10,000 works of art listed, based on Burchard's extensive documentation.

Through the use of LLMs, we aim to extract key structured data from a selection of the volumes of the corpus available only in PDF, including but not limited to the following (a minimal extraction sketch is given after this list):

  • structured bibliographic references;
  • a structured list of provenances for each artwork;
  • a structured list of Greek mythological figures depicted in each painting (iconography).
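
The sketch below illustrates one way such an extraction step could be set up; the model choice, prompt and JSON schema are assumptions made for illustration, not the pipeline actually used in this project.

```python
# Hypothetical sketch: asking an LLM to return one catalogue entry as JSON.
# Model name, prompt and schema are illustrative assumptions.
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA_HINT = (
    "Return JSON with keys: 'bibliography' (list of reference strings), "
    "'provenance' (list of objects with owner, place, from_year, to_year), "
    "'figures' (list of mythological figures depicted)."
)


def extract_entry(ocr_text: str) -> dict:
    """Turn the OCRed text of a single catalogue entry into structured data."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "You extract structured data from art-historical catalogue entries."},
            {"role": "user", "content": SCHEMA_HINT + "\n\n" + ocr_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```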

The second part of this paper will demonstrate how this extracted structured data could be effectively valorised through various digital methods, including but not limited to:

  • enriching current bibliographic information with structured extracted bibliographic references from PDF sources;
  • creating an index of characters depicted on the paintings with pointers to the respective artworks;
  • visualising structured provenance information for specific artworks in an interactive timeline or network graph of the individuals or institutions that owned an artwork during a particular period (a minimal graph-building sketch follows this list);
  • illustrating the potential of feeding the structured iconographic descriptions into a vision transformer model to develop an augmented reality layer that displays iconographic information alongside the artwork, allowing a wider audience to 'read' the artwork.
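
As a rough illustration of the provenance-graph idea, successive owners extracted for an artwork can be turned into a directed graph; the field names follow the hypothetical schema sketched above and the records are invented.

```python
# Rough sketch: build a provenance network from extracted ownership records.
# The records below are invented and only illustrate the shape of the data.
import networkx as nx

provenance = [
    {"artwork": "Example composition", "owner": "Collector A", "from_year": 1700, "to_year": 1750},
    {"artwork": "Example composition", "owner": "Institution B", "from_year": 1750, "to_year": 1900},
]

G = nx.DiGraph()
records = sorted(provenance, key=lambda record: record["from_year"])
for earlier, later in zip(records, records[1:]):
    # Each edge marks the transfer of the artwork from one owner to the next.
    G.add_edge(earlier["owner"], later["owner"],
               artwork=later["artwork"], year=later["from_year"])

print(list(G.edges(data=True)))
# [('Collector A', 'Institution B', {'artwork': 'Example composition', 'year': 1750})]
```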

A prototype demonstrating these LLM extraction capabilities was recently presented and awarded second place at the ai4culture hackathon (in partnership with europeana.eu - February 2025).

By demonstrating the extraction of structured information from a significant historical resource such as the Corpus Rubenianum, and its subsequent digital manipulation and visualisation, this paper will highlight the transformative potential of these technologies to unlock data from valuable past research that is only available in unstructured PDF-like formats. This approach can facilitate more comprehensive research and enhance public engagement through digital interfaces.