Session
Topic: Let's Talk Infrastructure

Presentations
2:00pm - 2:15pm
How not to reinvent the wheel – workflows as a leverage from the past to the future
Digital Research Infrastructure for the Arts and Humanities (DARIAH); Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH), Austrian Academy of Sciences (OEAW)

In recent years, a significant range of digital resources and methodologies has been developed in the Arts & Humanities. By reusing these resources, researchers can minimize redundancy and foster greater collaboration. However, challenges arise when methods are difficult to adapt or reflect biased perspectives. A key solution to these challenges lies in the thorough documentation of research choices, which ensures reproducibility and allows future generations to build on prior work. This concept is especially vital in workflows, which serve as a critical means of recording and reproducing the 'past of research'. The workflow descriptions featured in the SSH Open Marketplace (SSHOMP) play a central role in capturing this essential documentation, and they form the primary focus of this paper.

We intend to examine the impact of various workflow paradigms on research reproducibility. Workflow typologies extend across a spectrum, ranging from text-based descriptive methodologies as employed in the SSHOMP to fully executable code-based frameworks, with intermediary hybrid forms – such as those employed in the Journal of Digital History – integrating features of both approaches. Although workflows serve as invaluable instruments for structuring research methodologies, their efficacy in ensuring research documentation and methodological reproducibility is contingent upon both their type and the contextual environment in which they operate.

Descriptive, text-based workflows, as described by Barbot et al., afford greater structural and conceptual flexibility, functioning as high-level expositions rather than direct computational scripts. They facilitate the articulation of abstract methodological frameworks and offer enhanced accessibility to both authors and readers, while minimizing the technical overhead associated with their maintenance, thus ensuring their long-term viability. However, they require more rigorous editorial oversight to maintain their scholarly integrity and usability. Conversely, code-based workflows provide a structured and automated approach to research reproducibility by leveraging computational scripts to execute analytical procedures. They enable precision, scalability, and automation, mitigating the risks of human error inherent in manual documentation. Moreover, they can facilitate seamless integration with version control systems, enhancing transparency and collaboration across research teams. However, code-based workflows require domain-specific technical expertise and are susceptible to software dependencies, which may impede their long-term accessibility and interoperability (Clavert et al.). As workflows increase in executability, they become more enmeshed with specific software ecosystems, heightening their risk of deprecation as platforms evolve or become obsolete. This introduces a paradox: while greater executability enhances methodological rigor and repeatability, it may prevent the reconstruction of past research due to shifting technological landscapes.

This paper will systematically categorize different workflow types, with a particular emphasis on those utilized within DARIAH and created in the context of the ATRIUM project, to assess their suitability for documenting research processes.
By analyzing how workflows function within scholarly infrastructures, it will offer insights into best practices for ensuring their longevity and accessibility. Furthermore, the study will provide recommendations for designing workflows that balance methodological rigor with sustainability, thereby making them more effective as tools for preserving and interpreting the past of research.
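To make the contrast between the two paradigms concrete, the following minimal Python sketch, whose step names and logic are hypothetical rather than drawn from the paper, expresses the same toy method twice: once as a human-readable outline comparable to a text-based SSHOMP description, and once as executable code that is precise and repeatable but tied to a specific software ecosystem.

```python
# A minimal sketch contrasting the two workflow paradigms discussed in the
# abstract above. The step names and logic are hypothetical illustrations.

from collections import Counter

# Descriptive layer: a human-readable outline of the method, comparable in
# spirit to a text-based SSHOMP workflow description. Flexible and easy to
# maintain, but nothing enforces that it matches what was actually run.
DESCRIPTIVE_WORKFLOW = [
    "1. Collect source texts from the archive",
    "2. Normalise spelling and tokenise",
    "3. Count term frequencies and export a table",
]

# Executable layer: the same method expressed as code. Precise and repeatable,
# but now tied to a specific software ecosystem (a Python version and its
# standard library) whose evolution can eventually break reproducibility.
def normalise(text):
    """Step 2: lowercase and tokenise on whitespace."""
    return text.lower().split()

def term_frequencies(texts):
    """Step 3: aggregate term counts across all input texts."""
    counts = Counter()
    for text in texts:
        counts.update(normalise(text))
    return counts

if __name__ == "__main__":
    corpus = ["The past of research", "research on the past"]  # stand-in for step 1
    print(term_frequencies(corpus).most_common(3))
```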
2:15pm - 2:30pm
Exploring the past with the AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) multilingual research tool
University of Szeged, Hungary

The objective of this paper is to introduce the AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) multilingual research tool, which enables researchers to critically analyse bibliographic data and texts at scale with the help of data-driven methods supported by Natural Language Processing techniques. This exploratory tool offers a range of dynamic text and data mining tasks and provides interactive parameter tuning and control from the preprocessing to the analytical stages. The analysis and visualization tools facilitate both close and distant reading of texts and bibliographic data.

Compared to other similar tools, the unique features of the AVOBMAT toolkit are: (i) the use of transformer language models on a scalable, cloud-based infrastructure that allows researchers to preprocess and analyse texts and metadata at scale; (ii) the combination of metadata and textual analysis in one integrated, interactive and user-friendly web application, enabling users to ask complex research questions; (iii) the analysis and enrichment of texts and metadata in 16 languages; (iv) at the preprocessing phase, texts can be cleaned and each analytical tool can be individually configured using a total of 19 parameters; (v) users can test, validate, and save the configuration settings; (vi) private databases can be made public.

The user can search and filter the metadata and texts in faceted, advanced and command-line modes and perform all the subsequent analyses on the filtered dataset. AVOBMAT offers the following analytical functions: (i) metadata analysis (line, area, bar, pie and network analyses); (ii) lexical diversity analyses; (iii) n-gram viewer; (iv) topic modelling; (v) frequency analyses (keyword context, significant text analysis); (vi) named entity recognition, disambiguation and linking (Wikidata, ISNI, VIAF); (vii) part-of-speech tagging; (viii) keyword-in-context.

The reproducibility and transparency of experiments and results are enhanced by the ability to import and export the parameter settings as templates or JSON files. Users can create templates for the preprocessing and analytical functions on the graphical interface. The tabular statistical data and visualizations of the performed analyses can be exported in PNG or various CSV formats.

AVOBMAT helps users explore large historical and literary collections, uncover novel insights into historical events and trends, and unveil overlooked connections, themes and patterns. As sample databases, we have preprocessed the DraCor dramas (3793 in number) and ELTeC novels (1505) in 13 languages. We enriched the metadata of these collections and corrected inaccuracies. For example, to the DraCor collection we added, among other things, the list of characters with their gender annotations, the authors’ gender, and their age at the time of writing. The beta version of AVOBMAT will be available via the infrastructure of the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG).
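The template mechanism described above can be made concrete with a short, hedged sketch of exporting and re-importing parameter settings as JSON; the field names below are hypothetical, as the abstract does not specify AVOBMAT's actual template schema.

```python
# A hedged sketch of exporting and re-importing analysis settings as JSON,
# in the spirit of AVOBMAT's parameter templates. The field names are
# hypothetical illustrations, not AVOBMAT's documented schema.
import json

template = {
    "preprocessing": {
        "language": "en",
        "lowercase": True,
        "remove_stopwords": True,
    },
    "topic_modelling": {
        "num_topics": 20,
        "iterations": 500,
    },
}

# Export the settings so an experiment can be shared and rerun later.
with open("avobmat_template.json", "w", encoding="utf-8") as f:
    json.dump(template, f, indent=2)

# Re-import the settings to repeat the analysis with identical parameters.
with open("avobmat_template.json", encoding="utf-8") as f:
    restored = json.load(f)

assert restored == template  # identical settings support reproducible runs
```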
2:30pm - 2:45pm
Open Infrastructures and the Scholarly Construction of the Past
Institute of Literary Research of the Polish Academy of Sciences, Poland

This presentation argues that research infrastructures (RIs) actively shape our understanding of the past, constructing representations, not mere reflections, of historical reality. Emphasising the importance of open data and infrastructures, it highlights how openness enables data pooling for comprehensive representations and ensures scholarly control, preventing commercial "black boxes." Case studies, including the Polish Literary Bibliography (PBL) and the Corpus of Literary Discourse (KDL) 1820-2020, demonstrate how methodological choices within RIs influence historical models. The PBL reveals challenges such as incomplete coverage and rigid definitions, while the KDL faces issues with representativeness and digitisation. These examples illustrate that RIs are constructed representations that demand critical awareness. The presentation concludes by advocating for open infrastructures as essential for equitable and comprehensive historical representations.
2:45pm - 3:02pm
Towards interdisciplinary approaches to digital cultural heritage: GLAM Labs and data spaces
Europeana Foundation, Netherlands; University of Alicante, Spain; Tallinn University, Estonia; Royal Danish Library, Denmark; International Internet Preservation Consortium, USA; KU Leuven Libraries, Belgium; National Library of the Netherlands, Netherlands

At the intersection between the cultural heritage and research sectors, the increasing availability of digital cultural heritage data has supported new ways of producing knowledge, conducting research, publishing results, and teaching in academic settings. The efforts that GLAMs (Galleries, Libraries, Archives and Museums) and other organisations such as universities have put into digitising their collections and capturing born-digital archives have led them to focus on several factors. These include the wider accessibility of the data and its potential for reuse, refining approaches to digital curation so that digital cultural heritage can support computational analysis, especially through Digital Humanities approaches. Emergent initiatives, such as Collections as Data, FAIR and the CARE principles for Indigenous data governance, have emphasized the need for best practice when making data available based on sets of principles (Padilla, 2023; Carroll, 2020). Moving from these principles, the International GLAM Labs Community has grown as a collaborative and interdisciplinary initiative aimed at promoting the publication, computational access and responsible use of data (Mahey, 2019; Candela, 2023; Candela, 2023b). It includes more than 90 institutions covering a wide diversity of domains from GLAMs and beyond. In this context, collaborations with researchers and university support staff from different fields have become crucial factors in supporting the study of the past with a careful eye on the needs of the present and future.

- The European data space for cultural heritage fosters collaborations between the cultural heritage sector and academia, building on the experience of the Europeana Initiative and its long-standing partnerships with research infrastructures like DARIAH. These partnerships have led to joint efforts, especially in digital humanities. A new trend deserves attention in this context: the increasing number of university courses focusing on digital cultural heritage across Europe. Acting as an observatory, the data space aims to foster exchanges around this topic, enhancing cooperation between academia and cultural heritage institutions in the educational paths of the next generation of (digital) curators.
- GLAM institutions are exploring new ways to make their content available in forms suitable for computational access. This section will introduce examples of the publication and reuse of digital collections published by relevant institutions such as the Royal Danish Library and KU Leuven, including a description of the challenges and opportunities illustrating their approach.
- The publication of digital collections suitable for computational use can be a difficult task for institutions. Best practices and guidelines for publishing such collections promote their adoption (Candela, 2023a). This section will introduce a selection of relevant and innovative initiatives that can be useful in this context. It will also provide examples of use based on a wide diversity of content, such as GLAM datasets and Web Archives (a minimal access sketch follows this abstract).
- The final section will highlight examples of use and lessons learned from research with users of digital cultural heritage at the British Library, and from those providing services for the reuse of digital cultural heritage as data around the world in the GLAM Labs community.
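To ground the idea of computational access running through this abstract, the following is a minimal, hedged sketch of querying a digital collection programmatically, assuming the public Europeana Search API; the endpoint, parameter names and response fields reflect Europeana's public documentation and should be checked against the current API reference.

```python
# A minimal, hedged sketch of computational access to a digital collection,
# using the public Europeana Search API. Endpoint and parameter names are
# assumptions to verify against the current API documentation; the key
# below is a placeholder.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: a free key can be requested from Europeana
BASE_URL = "https://api.europeana.eu/record/v2/search.json"

params = urllib.parse.urlencode({
    "wskey": API_KEY,       # authentication key
    "query": "manuscript",  # example search term
    "rows": 5,              # number of records to retrieve
})

with urllib.request.urlopen(f"{BASE_URL}?{params}") as response:
    results = json.load(response)

# Each returned record carries descriptive metadata; print a title per record.
for item in results.get("items", []):
    print(item.get("title", ["<untitled>"])[0])
```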