2023 CSDH/SCHN Annual Conference

May 29th - 31st, 2023 | York University, Toronto

Conference Agenda

Session Overview

Session

Session 13: Data

Time: Tuesday, 30/May/2023, 10:30am - 12:00pm

Session Chair: Markus Reisenleitner

Location: Ross Building S507

Presentations

Representational Data: a case study

Bordalejo, Barbara; O'Donnell, Daniel; Woods, Nathan

University of Lethbridge, Canada

A significant number of (largely non-digital) Humanists resist the idea that they “have” data. This translates into critical scepticism towards the role of data in humanities research and a concern about losing the essence of what characterizes humanistic objects and their treatment (see Marche 2012; Sinykin 2021). This scepticism reflects a failure to recognise and understand the implications of a fundamental use of data in the Humanities, which we call “representational data.”

“Representational data” — the collection, analysis, and especially dissemination of cultural materials in the form of mediated research objects such as scholarly editions, curated museum or gallery catalogues, facsimiles and models — were not easily processed using the early systems of humanities computing. For this reason, the work of these early computational projects was often quite distinct from its analogue counterpart.

Much of the resistance to data in the humanities comes from an intuitive and largely unarticulated sense among analogue researchers that this primary use case has been overlooked, i.e. that debates about the definition of “data” ignore or de-emphasise how such data have been used in the humanities. Here, we examine how the use of “representational data” illuminates some of the issues involved in both the resistance to and the adoption of data in humanities scholarship.

Analogue humanists speak of “sources.” “Primary sources” are texts, objects, and artefacts they study; “secondary sources,” the work of others with whom they engage. Research objects such as editions of historical texts or models of artefacts can be both “primary” and “secondary”: proxies or representatives of the original objects and works of interpretation and analysis that can be engaged with by others in their own right depending on the use given to them at a particular time.

In the 1950s, computers understood data as “given things” to be processed. Busa’s Index Thomisticus was an ideal early application precisely because its textual nature and its end use took full advantage of the computer’s capacity to process information.

Johanna Drucker’s influential suggestion that Humanists don’t have data (“given”) but rather capta (“taken”) separates what is recorded (data) from what is constructed (capta) (Drucker 2011). Although Father Busa was passively using data, the spirit of his work was not that of constructing an interpretation but of building tools to allow the navigation of Aquinas’ works. Computation historically forced scholars to talk about data in ways that seemed alien to analogue Humanists.

We conclude, based on the case of representational data, that the way analogue humanists think has not been fully understood by research data management specialists or infrastructure developers, whose practices have been developed almost entirely with a different understanding, in which “data” are things to be counted rather than represented and which are generated through experiment, observation and measurement. This explains the poor support such infrastructure provides for humanities research objects that work with representational data. It provides an agenda for a Humanities-informed approach to research infrastructure that can address the resistance to data that is still widely felt among Humanities researchers.

Reimagining the Data Problem in the Humanities: Data Type Versus Use-Case

Woods, Nathan D.; Bordalejo, Barbara; O'Donnell, Daniel Paul

Humanities Innovation Lab, University of Lethbridge, Canada

That the humanities has “a data problem” is now a common refrain amongst many communities. Humanists often argue that humanities data is a problem because they don’t have or work with data (Borgman, 201. Librarians and information professionals, by contrast, believe that humanists have data, but assume they don’t realize it, meaning that the problem is that they must be trained to appropriately recognize and work with data (Flanders and Muñoz, 2012; Ikeshoji-Orlati, Caton, and Stringer-Hye, 2018). Digital humanists know that they have data but believe that their data are special and that these data require special strategies and techniques as a result (Drucker, 2011; Schöch, 2013). Each premise informs a mélange of assumptions, advice and best practices that comprise the emerging literature on research data management (RDM) in the humanities (Gualandi et al., 2022; Thoegersen, 2018).

We argue that this focus on the discovery and definition of what is “special” about humanities data is a mistake. Humanities data are not special because of what they are, but rather because of how they are used (Borgman, 2017; Leonelli, 2015), and hence how data are designed and structured by systems to meet particular ends. Data are data whether they are produced and used by scientists or humanists. The “problem” with humanities data lies in the use-case, or the system requirements of the scholarly tool or infrastructure that shapes data for particular purposes.

Our argument draws on evidence and analysis derived from a series of comparative case studies exploring the development of scholar-led, data-intensive projects over time. We examine how humanists conceptualize data as they build, navigate, and utilize research infrastructure for scholarly purposes. Originating in software engineering, use-case modeling (Jacobson et al. 1992) is a means of specifying, validating, and eliciting system requirements. Models describe, communicate, and facilitate all the ways a user interacting with a system or product may work to realize a desired end. We highlight how these models mediate between user-agency, the purpose of a scholarly project, and the “infrastructural work” necessary to meet a project's goals.

In fields where use-case modeling is less explicit, as is the case in the Humanities, humanists have worked far more improvisationally (Ciula, 2022), experimenting and innovating by designing and building infrastructure that has specific requirements, but often without clear requirements modeling. As a result, humanists often create custom information systems, data infrastructures, or tools and interfaces developed by researchers for the collection, analysis, and presentation of their own data. The problem in humanities RDM is not that humanists don’t have a common understanding of what data are; it is that they don’t recognize the degree to which they are using these data for common ends or in generalizable ways.

In conclusion, we point to some ways to shift the conversation away from “the problem of humanities data” and towards developing and interfacing with scholarly use-cases, that is, the scenarios and problem-sets scholars are concerned with, and towards engaging with the custom infrastructural strategies they have developed to address them.

Long Literary Covid: Archive of the Digital Present (ADP) and Reflections on the Meaning of Data About Pandemic Literary Events

Camlot, Jason; Wiener, Salena

Concordia, Canada

"Archive of the Digital Present for Online Literary Performance in Canada (COVID-19 Pandemic Period)" is a research and development project that arose out of the need to address foundational, practical and theoretical research questions about the impact of the COVID-19 pandemic, and attendant social disruptions and restrictions, on literary communities in Canada through the collection of information about organized literary events as they occurred from March 2020 to March 2022.

Our paper will first present some of the design and development work pursued in building a searchable, open access database and directory – The Archive of the Digital Present (ADP) – to allow scholars, literary practitioners, and the public to gain knowledge about the nature and significance of events that occurred (mostly online) during the pandemic period, through the collection and structuring of metadata, and limited additional assets.

Our discussion will then focus on the work in data collection and structuring we have pursued to bring content to the directory site. The ADP project necessarily began with questions about the data we were seeking to collect. In February 2021 we performed a preliminary analysis of online and social media postings for listings of literary events hosted in Canada. This revealed 77 discrete organizers of over one thousand (1,011, to be exact) literary events between 20 March 2020 and 31 December 2020. This list served as the starting point for an expanded catalogue of events, and for team discussions about the nature and number of metadata fields we would use. We proceeded by adapting extant categories of the SpokenWeb metadata schema, which was designed for the description of historical literary audio recordings. This allowed us to repurpose the backend of the Swallow Metadata Ingest System (Swallow), built for metadata management of historical research collections, through the development of a crosswalk that best serves the goals of data collection for ADP. Data fields we have shaped for this project include categories related to Title, Creator/Contributor, Language, Production Context, Genre, Duration, Date, Location, Online Platform, and Contents, among others.

As we now have a live site, even as data continues to be added, our presentation will recount our ongoing methods of discovering events to be included in the ADP database, explain the rationale for our selection of metadata categories and our approach to structuring those fields, rehearse some of the philosophical and ontological questions that have arisen in the process of abstracting the complex and mediated literary activities of the pandemic period into categories of searchable data, and end with reflections on the relationship between quantitative data and the qualitative data we are now collecting in the form of interviews with the organizers of the events we have catalogued. Drawing upon our experience of data collection from a diverse range of literary organizations and communities, our paper concludes with an argument for the value of thinking about quantitative and qualitative data as functioning productively in an ongoing dialectic of data curation, presentation, and community consultation, and with suggestions for methods of realizing such an approach.