Session
Topic: Let's Talk Infrastructure

Presentations
2:00pm - 2:15pm
How not to reinvent the wheel – workflows as a leverage from the past to the future
Digital Research Infrastructure for the Arts and Humanities (DARIAH); Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH), Austrian Academy of Sciences (OEAW)

In recent years, a significant range of digital resources and methodologies has been developed in the Arts & Humanities. By reusing these resources, researchers can minimize redundancy and foster greater collaboration. However, challenges arise when methods are difficult to adapt or reflect biased perspectives. A key solution to these challenges lies in the thorough documentation of research choices, which ensures reproducibility and allows future generations to build on prior work. This concept is especially vital in workflows, which serve as a critical means of recording and reproducing the 'past of research'. The workflow descriptions featured in the SSH Open Marketplace (SSHOMP) play a central role in capturing this essential documentation, and they form the primary focus of this paper.

We intend to examine the impact of various workflow paradigms on research reproducibility. Workflow typologies extend across a spectrum, ranging from text-based descriptive methodologies as employed in the SSHOMP to fully executable code-based frameworks, with intermediary hybrid forms – such as those employed in the Journal of Digital History – integrating features of both approaches. Although workflows serve as invaluable instruments for structuring research methodologies, their efficacy in ensuring research documentation and methodological reproducibility is contingent upon both their type and the contextual environment in which they operate.

Descriptive, text-based workflows, as described by Barbot et al., afford greater structural and conceptual flexibility, functioning as high-level expositions rather than direct computational scripts. They facilitate the articulation of abstract methodological frameworks and offer enhanced accessibility to both authors and readers, while minimizing the technical overhead associated with their maintenance, thus ensuring their long-term viability. However, they require more rigorous editorial oversight to maintain their scholarly integrity and usability. Conversely, code-based workflows provide a structured and automated approach to research reproducibility by leveraging computational scripts to execute analytical procedures. They enable precision, scalability, and automation, mitigating the risks of human error inherent in manual documentation. Moreover, they can facilitate seamless integration with version control systems, enhancing transparency and collaboration across research teams. However, code-based workflows require domain-specific technical expertise and are susceptible to software dependencies, which may impede their long-term accessibility and interoperability (Clavert et al.). As workflows increase in executability, they become more enmeshed with specific software ecosystems, heightening their risk of deprecation as platforms evolve or become obsolete. This introduces a paradox: while greater executability enhances methodological rigor and repeatability, it may prevent the reconstruction of past research due to shifting technological landscapes.

This paper will systematically categorize different workflow types, with a particular emphasis on those utilized within DARIAH and created in the context of the ATRIUM project, to assess their suitability for documenting research processes.
By analyzing how workflows function within scholarly infrastructures, it will offer insights into best practices for ensuring their longevity and accessibility. Furthermore, the study will provide recommendations for designing workflows that balance methodological rigor with sustainability, thereby making them more effective as tools for preserving and interpreting the past of research.
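To make the contrast between the two paradigms concrete, the following minimal Python sketch, whose step names and logic are hypothetical rather than drawn from the paper, expresses the same toy method twice: once as a human-readable outline comparable to a text-based SSHOMP description, and once as executable code that is precise and repeatable but tied to a specific software ecosystem.

```python
# A minimal sketch contrasting the two workflow paradigms discussed in the
# abstract above. The step names and logic are hypothetical illustrations.

from collections import Counter

# Descriptive layer: a human-readable outline of the method, comparable in
# spirit to a text-based SSHOMP workflow description. Flexible and easy to
# maintain, but nothing enforces that it matches what was actually run.
DESCRIPTIVE_WORKFLOW = [
    "1. Collect source texts from the archive",
    "2. Normalise spelling and tokenise",
    "3. Count term frequencies and export a table",
]

# Executable layer: the same method expressed as code. Precise and repeatable,
# but now tied to a specific software ecosystem (a Python version and its
# standard library) whose evolution can eventually break reproducibility.
def normalise(text):
    """Step 2: lowercase and tokenise on whitespace."""
    return text.lower().split()

def term_frequencies(texts):
    """Step 3: aggregate term counts across all input texts."""
    counts = Counter()
    for text in texts:
        counts.update(normalise(text))
    return counts

if __name__ == "__main__":
    corpus = ["The past of research", "research on the past"]  # stand-in for step 1
    print(term_frequencies(corpus).most_common(3))
```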
2:15pm - 2:30pm
Exploring the past with the AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) multilingual research tool
University of Szeged, Hungary

The objective of this paper is to introduce the AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) multilingual research tool, which enables researchers to critically analyse bibliographic data and texts at scale with the help of data-driven methods supported by Natural Language Processing techniques. This exploratory tool offers a range of dynamic text and data mining tasks and provides interactive parameter tuning and control from the preprocessing to the analytical stages. The analysis and visualization tools facilitate both close and distant reading of texts and bibliographic data.

Compared to other similar tools, the unique features of the AVOBMAT toolkit are: (i) the use of transformer language models on a scalable, cloud-based infrastructure that allows researchers to preprocess and analyse texts and metadata at scale; (ii) the combination of metadata and textual analysis in one integrated, interactive and user-friendly web application, enabling users to ask complex research questions; (iii) the analysis and enrichment of texts and metadata in 16 languages; (iv) at the preprocessing phase, texts can be cleaned and each analytical tool can be individually configured using a total of 19 parameters; (v) users can test, validate, and save the configuration settings; (vi) private databases can be made public.

The user can search and filter the metadata and texts in faceted, advanced and command-line modes and perform all the subsequent analyses on the filtered dataset. AVOBMAT offers the following analytical functions: (i) metadata analysis (line, area, bar, pie and network analyses); (ii) lexical diversity analyses; (iii) n-gram viewer; (iv) topic modelling; (v) frequency analyses (keyword context, significant text analysis); (vi) named entity recognition, disambiguation and linking (Wikidata, ISNI, VIAF); (vii) part-of-speech tagging; (viii) keyword-in-context.

The reproducibility and transparency of experiments and results are enhanced by the ability to import and export the parameter settings as templates or JSON files. Users can create templates for the preprocessing and analytical functions on the graphical interface. The tabular statistical data and visualizations of the performed analyses can be exported in PNG or various CSV formats.

AVOBMAT helps users explore large historical and literary collections, uncover novel insights into historical events and trends, and unveil overlooked connections, themes and patterns. As sample databases, we have preprocessed the DraCor dramas (3793 in number) and ELTeC novels (1505) in 13 languages. We enriched the metadata of these collections and corrected inaccuracies. For example, to the DraCor collection we added, among other things, the list of characters with their gender annotations, the authors’ gender, and their age at the time of writing. The beta version of AVOBMAT will be available via the infrastructure of the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG).
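The template mechanism described above can be made concrete with a short, hedged sketch of exporting and re-importing parameter settings as JSON; the field names below are hypothetical, as the abstract does not specify AVOBMAT's actual template schema.

```python
# A hedged sketch of exporting and re-importing analysis settings as JSON,
# in the spirit of AVOBMAT's parameter templates. The field names are
# hypothetical illustrations, not AVOBMAT's documented schema.
import json

template = {
    "preprocessing": {
        "language": "en",
        "lowercase": True,
        "remove_stopwords": True,
    },
    "topic_modelling": {
        "num_topics": 20,
        "iterations": 500,
    },
}

# Export the settings so an experiment can be shared and rerun later.
with open("avobmat_template.json", "w", encoding="utf-8") as f:
    json.dump(template, f, indent=2)

# Re-import the settings to repeat the analysis with identical parameters.
with open("avobmat_template.json", encoding="utf-8") as f:
    restored = json.load(f)

assert restored == template  # identical settings support reproducible runs
```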
2:30pm - 2:45pm
Open Infrastructures and the Scholarly Construction of the Past
Institute of Literary Research of the Polish Academy of Sciences, Poland

This presentation argues that research infrastructures (RIs) actively shape our understanding of the past, constructing representations, not mere reflections, of historical reality. Emphasising the importance of open data and infrastructures, it highlights how openness enables data pooling for comprehensive representations and ensures scholarly control, preventing commercial "black boxes." Case studies, including the Polish Literary Bibliography (PBL) and the Corpus of Literary Discourse (KDL) 1820-2020, demonstrate how methodological choices within RIs influence historical models. The PBL reveals challenges such as incomplete coverage and rigid definitions, while the KDL faces issues with representativeness and digitisation. These examples illustrate that RIs are constructed representations that demand critical awareness. The presentation concludes by advocating for open infrastructures as essential for equitable and comprehensive historical representations.
2:45pm - 3:02pm
Towards interdisciplinary approaches to digital cultural heritage: GLAM Labs and data spaces
Europeana Foundation, Netherlands; University of Alicante, Spain; Tallinn University, Estonia; Royal Danish Library, Denmark; International Internet Preservation Consortium, USA; KU Leuven Libraries, Belgium; National Library of the Netherlands, Netherlands

At the intersection between the cultural heritage and research sectors, the increasing availability of digital cultural heritage data has supported new ways of producing knowledge, conducting research, publishing results, and teaching in academic settings. The efforts that GLAMs (Galleries, Libraries, Archives and Museums) and other organisations such as universities have put into digitising their collections and capturing born-digital archives have led them to focus on several factors. These include the wider accessibility of the data and its potential for reuse, refining approaches to digital curation so that digital cultural heritage can support computational analysis, especially through Digital Humanities approaches. Emergent initiatives, such as Collections as Data, FAIR and the CARE principles for Indigenous data governance, have emphasized the need for best practice when making data available based on sets of principles (Padilla, 2023; Carroll, 2020). Moving from these principles, the International GLAM Labs Community has grown as a collaborative and interdisciplinary initiative aimed at promoting the publication, computational access and responsible use of data (Mahey, 2019; Candela, 2023; Candela, 2023b). It includes more than 90 institutions covering a wide diversity of domains from GLAMs and beyond. In this context, collaborations with researchers and university support staff from different fields have become crucial factors in supporting the study of the past with a careful eye on the needs of the present and future.

- The European data space for cultural heritage fosters collaborations between the cultural heritage sector and academia, building on the experience of the Europeana Initiative and its long-standing partnerships with research infrastructures like DARIAH. These partnerships have led to joint efforts, especially in digital humanities. A new trend deserves attention in this context: the increasing number of university courses focusing on digital cultural heritage across Europe. Acting as an observatory, the data space aims to foster exchanges around this topic, enhancing cooperation between academia and cultural heritage institutions in the educational paths of the next generation of (digital) curators.
- GLAM institutions are exploring new ways to make their content available in forms suitable for computational access. This section will introduce examples of the publication and reuse of digital collections published by relevant institutions such as the Royal Danish Library and KU Leuven, including a description of the challenges and opportunities illustrating their approach.
- The publication of digital collections suitable for computational use can be a difficult task for institutions. Best practices and guidelines for publishing such collections promote their adoption (Candela, 2023a). This section will introduce a selection of relevant and innovative initiatives that can be useful in this context. It will also provide examples of use based on a wide diversity of content, such as GLAM datasets and Web Archives (a minimal access sketch follows this abstract).
- The final section will highlight examples of use and lessons learned from research with users of digital cultural heritage at the British Library, and from those providing services for the reuse of digital cultural heritage as data around the world in the GLAM Labs community.
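To ground the idea of computational access running through this abstract, the following is a minimal, hedged sketch of querying a digital collection programmatically, assuming the public Europeana Search API; the endpoint, parameter names and response fields reflect Europeana's public documentation and should be checked against the current API reference.

```python
# A minimal, hedged sketch of computational access to a digital collection,
# using the public Europeana Search API. Endpoint and parameter names are
# assumptions to verify against the current API documentation; the key
# below is a placeholder.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: a free key can be requested from Europeana
BASE_URL = "https://api.europeana.eu/record/v2/search.json"

params = urllib.parse.urlencode({
    "wskey": API_KEY,       # authentication key
    "query": "manuscript",  # example search term
    "rows": 5,              # number of records to retrieve
})

with urllib.request.urlopen(f"{BASE_URL}?{params}") as response:
    results = json.load(response)

# Each returned record carries descriptive metadata; print a title per record.
for item in results.get("items", []):
    print(item.get("title", ["<untitled>"])[0])
```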