Conference Agenda

Session
Libraries and Digital Resources
Time:
Wednesday, 18/Mar/2020:
2:10pm - 3:40pm

Session Chair: Koraljka Golub
Location: Hall A

Presentations
Short Paper (10+5min)

Online Participatory Memory Work: Understanding the Potential Roles of Online Mnemonic Communities in Building the Collections of Public Memory Institutions

Ina-Maria Jansson, Olle Sköld

Department of ALM, Uppsala University, Sweden

[Please note that the reference list is supplied in the PDF version of this summary]

1. Introduction

A key task for humanist scholarship is to continuously interrogate the workings of human culture, community, and the many iterations and permutations in and by which they exist. This task has been taken on by digital humanists as well, often with sights set on better understanding the impact of information technology and networked communication on present-day human affairs (e.g., Kirschenbaum, 2008). This paper shares this ethos, and argues for the importance of understanding digital information infrastructures and platforms when seeking to include the collective memories of diverse groups in the collections of public memory institutions (e.g., archives, libraries, museums). The paper presumes that the participatory memory work of online communities is not an isolated process but an element of digital ecologies. Purposeful documentation and preservation of those ecologies is essential for contextualizing and understanding the outcome of community memory work.

Technical changes in how people communicate create ripple effects that extend through a wide range of human endeavors and processes, including the shaping, communication, and re-formulation of collective memory (Hoskins, 2009). Collective memories are thus dependent on, and carried by, technological and structural frameworks (for example common entities like standards) that reconstruct shared values and concepts (Bowker, 2005). Just as a tool shapes the object of its creation, so do technologies leave their imprint on the information that is communicated through them. The increasing use and complexity of online digital platforms for communication within and between communities is here defined as such a technical innovation that shapes memory practices.

Research has shown that networked forums constitute important arenas for minority communities, whether in the social, cultural, gender, medical, or socio-economic sense (Af Segerstad & Kasperowski, 2014; Boyd, 2010; Marwick & Ellison, 2012; Wagner, 2018). In these online spaces, communities engage in many different memory-making practices for a variety of purposes and intents (e.g., Sköld, 2015, 2017). Such community memory work also plays an important role as social support for its members. It also increases a sense of identification with, and inclusion in, the community itself (Assmann & Czaplicka, 1995), as shared memories of a community can be used to socialize new members into a group (often termed ‘mnemonic socialization’) in order for them to identify with the group’s past (Misztal, 2003). It is clear that digital community platforms constitute an essential part of digital existence.

Here, an insight emerges with regard to participatory memory work in the memory institution sector. The ability to support an inclusive and diverse public memory rests on key developments in the digital present being competently grasped, collected, and integrated into the collections of public memory institutions. Such an ambition can only be realized if it also includes the massively productive memory work communally conducted by online communities in the different spaces and services of the Internet. The memory work of online communities is, however, an understudied topic, and the opportunities and pitfalls present in the important endeavor to include the infrastructural complexity of shared memories of communities in the collections of public memory institutions are poorly understood (e.g., McDonough et al., 2010; Sköld, 2018a; Winget, 2011).

2. Aims, materials, and methods

The aim of this study is to explore the information infrastructural premises for the memory work of online communities and how public memory institutions can succeed better in their efforts to create diverse and inclusive collections by recognizing and supporting those premises.

The study is based on two case studies of memory work in online MMORPG videogame communities. Videogame communities offer an interesting case in relation to the aim of the study for several reasons. Firstly, videogames and videogaming are landmark features of digital culture today. Videogames and videogaming have impacted many arenas of contemporary life. Examples include technology development and adoption (Swalwell, 2007), management and organizational thought (Deterding et al., 2011), and the everyday interactions of many people by becoming sites of meaningful play and social interplay (Pearce, 2009), storytelling (Albrechtslund, 2010), learning (Barr, 2014), and knowledge production (Sköld, 2017). Secondly, and owing mainly to the ubiquity of the videogame phenomenon, videogame communities showcase many of the key issues and considerations that confront memory institutions aiming to build bridges between online-community memory work and institutional practices. Examples include ethical issues, legal, economic, and ownership issues, and patterns of power relations and memory-making practices that commonly occur in the broader online space.

The aim of the study is met in two steps (RQ-1 and RQ-2), and is guided by a basic tenet of preservational work: successful curation rests to a significant extent on sufficient knowledge of the material in focus (Mortensen, 1999; Kirschenbaum, 2008).

─ RQ-1. How are online communities conducting memory work, and what are the characteristics of the materials they produce as a result of this work?

─ RQ-2. What are the potential results, pitfalls, and opportunities of efforts seeking to integrate the memory work of online communities into the collections of public memory institutions?

The materials of the first case study consist of 40 World of Warcraft blogs collected in 2011 (Sköld, 2011). The second case is a study of 140 discussion threads (containing texts, images, videos, and audio) posted on a City of Heroes (CoH) discussion forum between 2012 and 2013 (Sköld, 2015). RQ-1 is answered by reporting on the WoW and CoH communities’ practices of memory work, and typological analysis of the materials they produce. RQ-2 is met with guidance from theory of information infrastructures, the concept of institutionalization, and the results of previous research on videogame preservation (see e.g., Sköld, 2018b; Winget, 2011 for overviews).

3. Theoretical framing and discussion

The concept of information infrastructures makes visible the otherwise often transparent foundations for information and communication (Star & Ruhleder, 1996). It is employed in this study to distinguish between different information spheres and to understand the conditions and the challenges that have to be overcome when including material produced by online communities in the collections of memory institutions. It is used to highlight the differences in settings and practices between the community sphere and the institutional sphere. Furthermore, this bridging of information spheres is discussed in terms of institutionalization, which denotes the process by which material created within the online community becomes part of the institutional collections of archives, libraries, or museums.

Among the many challenges and concerns for the institutionalization of online-community memory work, this paper argues that the organizational paradigms usually found in public memory institutions are among the most critical. For example, the multi-medial characteristics of online-community communicative memory may create difficulties for memory institutions whose collection management and mediation practices are centred on mono-medial materials. The benefit that the institutionalization of (the often very diverse) online-community memory work offers to memory institutions seeking to support inclusive memory politics, however, makes it worthwhile to strive to overcome such hindrances.

The relevance of this paper extends beyond the issue of how and why online communities can and should be represented in the collections of memory institutions. It discusses the connectivities that can be built across communal and institutional practices of memory work and illustrates more broadly what challenges have to be met in order for other areas of contemporary digital culture and communication, like social media content, to potentially become a part of the cultural memories of our societies.



Short Paper (10+5min)

Detecting Social Structures Using Library Loan Data

Olli Nurmi1, Kati Launis2, Erkki Sevänen3

1VTT, Finland; 2University of Turku, Finland; 3University of Eastern Finland, Finland

Finland is a country with high PISA rankings and a well-functioning, publicly funded, free-of-charge library system. About 80% of the Finnish population use public libraries regularly, and during the last two decades 35-50% of Finnish people have borrowed something (books, journals, films, CDs) from the public libraries at least once a year. Against this background, it can be stated that Finland has an active reading culture. However, radical changes in time use, digitization, and attitudes towards reading have influenced our reading habits substantially.

In this article, we study the current Finnish reading culture by analysing the loan data collected by Vantaa City Library in Finland’s metropolitan area. In earlier studies of the Finnish readership, methods such as interviews and queries have been widely used (see, for example, Eskola 1979). Since then, attempts to introduce quantitative methods into the study of literary culture have been hampered by the lack of suitable data. The situation has changed radically with the rise of the digital humanities: nowadays big data (e.g. the library loan data used in this article) constitutes a significant resource for understanding literary culture from a new and wider perspective. The integration of large “born-digital” material, new computational methods and literary-sociological research questions opens up the possibility of finding new knowledge within the qualitative approach in the humanities.

Our method is to apply social network analysis to the data concerning public libraries’ loan activities in the Helsinki metropolitan area. Firstly, we draw a co-occurrence network based on the paired presence of books within a specified loan cart. We then apply the modularity maximization method to detect book clusters. Visual representations of book clusters are drawn to reveal associated cultural and literary phenomena. This paper shows that current Finnish reading culture is heterogeneous and consists of several subclusters. It also shows that library users favour the newest literature and typically borrow multiple books of the same series and by the same writer. Our methodological contribution is to demonstrate how social network analysis and clustering techniques can be applied to library loan data to characterize reading culture.

Introduction

Starting from previous work in the field of digital studies of cultural trends through quantitative analysis of digitized texts (Michel et al., 2010), we use social network analysis as a method for detecting changes in book reading culture and identifying reading subcultures. In literary research, social network analysis and community detection have been popular methods for visualizing certain structural features of a text or a corpus. A common usage is the visualization of relationships between texts based on the similarities of their textual contents, and of relationships between textual entities such as words (Jänicke, Franzini, Cheema, & Scheuermann, 2015).

In this article, we use the visualization to disclose relationships between the books in the library collection. Firstly, we draw a co-occurrence network based on the paired presence of books within a specified loan cart. We then apply the modularity maximization method to detect book clusters. Visual representations of book clusters are drawn to reveal associated cultural and literary phenomena. This paper shows that current reading culture is a heterogeneous cultural phenomenon consisting of several different sub clusters. The position of national classics (such as Väinö Linna), popular among Finnish readership some decades ago, has radically weakened.
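The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the loan carts, the book identifiers and the function names `cooccurrence_edges` and `modularity` are all hypothetical. The second function only scores a given partition with Newman's modularity Q; a modularity maximization algorithm (the abstract does not say which one was used) would search for the partition with the highest score.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loan carts: each set holds the books borrowed together.
loan_carts = [
    {"A1", "A2", "A3"},
    {"A1", "A2"},
    {"A2", "A3"},
    {"B1", "B2"},
    {"B1", "B2", "A3"},
]


def cooccurrence_edges(carts):
    """Count how often each unordered pair of books shares a loan cart."""
    edges = Counter()
    for cart in carts:
        for u, v in combinations(sorted(cart), 2):
            edges[(u, v)] += 1
    return edges


def modularity(edges, community_of):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    total = sum(edges.values())      # total edge weight m
    strength = Counter()             # weighted degree of each book
    internal = Counter()             # edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    degree_sum = Counter()           # summed strength per community
    for node, k in strength.items():
        degree_sum[community_of[node]] += k
    return sum(
        internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
        for c in degree_sum
    )


edges = cooccurrence_edges(loan_carts)
# Candidate partition: group books by the first letter of their identifier.
by_series = {book: book[0] for cart in loan_carts for book in cart}
one_cluster = {book: "all" for book in by_series}
q_series = modularity(edges, by_series)
q_single = modularity(edges, one_cluster)
```

For this toy data the series-based partition scores Q = 2/9, while putting every book into one cluster scores 0, so the series-based grouping is the better description of the network.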

Data source

The largest public library network in Finland is the Helsinki Metropolitan Area Library network (Helmet), consisting of the city libraries of Helsinki, Espoo, Kauniainen, and Vantaa. In this work, we had access to anonymized Vantaa City Library loan data. The Helmet collection, consisting of 3.4 million items, is available to Vantaa City Library users through this network. Our data sample includes all the loan interactions of Vantaa City Library users between 20 July 2016 and 22 October 2017, comprising about 1.5 million records.

We build our understanding on the library loan data because it gives an accurate, up-to-date and much wider picture than interviewing a limited number of book readers. This work provides a reliable evidence base for decision-making and for the development of effective policies in libraries.

Results

The analysis shows that library users typically borrow multiple books of the same series and by the same writer: four of the six largest clusters are formed around contemporary female authors writing entertaining fiction in series and under a pseudonym. This can be explained by the increased use of branding, where a set of marketing and communication methods is applied to distinguish the author from competitors, aiming to create a lasting impression in the minds of the readers. An author brand is, in essence, a promise to its readers that includes emotional benefits. When readers are familiar with an author’s brand, they tend to favour it over competing ones.

The type of analysis used in this article can also facilitate new ways to create book recommendations or to place books in libraries. The books can, for example, be shelved in clusters, which may then be sorted alphabetically. This makes it easier for library users to shift smoothly from one cluster to another when searching for and selecting new books. In addition, book series should be marked to enable readers to locate them easily.

The analysis may also help obtain ‘market intelligence’ for a better understanding of the performance and evolution of different book genres and subcultures. Several algorithms can be used to calculate the importance of any given node in a network. In the case of libraries, we can use these algorithms to identify books with influence over the whole network. By promoting these influential books, librarians could increase their effect on the reading culture.
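As one possible illustration of such a node-importance measure, the sketch below runs a plain power-iteration PageRank on a weighted, undirected co-occurrence network. The choice of PageRank, the book identifiers and the edge weights are assumptions made for the example; the abstract does not name the algorithms the authors have in mind.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted undirected graph.

    edges maps an unordered pair (u, v) to a positive co-occurrence count.
    """
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node keeps a baseline share, then receives rank from neighbours.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            out_weight = sum(neighbours[u].values())
            for v, w in neighbours[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight
        rank = new_rank
    return rank


# Hypothetical co-occurrence counts between four books.
toy_edges = {
    ("hub", "a"): 3,
    ("hub", "b"): 2,
    ("hub", "c"): 1,
    ("a", "b"): 1,
}
ranks = pagerank(toy_edges)
most_influential = max(ranks, key=ranks.get)
```

Here the book that co-occurs most heavily with the others ("hub") receives the highest score; simpler measures such as weighted degree would rank this toy network the same way.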

The library collection consists of tens of thousands of books, and no one is able to read through them all to get the "whole picture" of the literature available to borrowers and of the reading culture based on it. Distant reading (Moretti 2000) of contemporary reading culture, based on large-scale, digitized, daily loan data covering 1.5 years, is the method that makes this kind of overview possible. Using data analytics methods and social network analysis, we can focus on a manageable piece of information and enable literary scholars to make surprising discoveries, generate new hypotheses or suggest further research.



Short Paper (10+5min)

Automatic Morphological Annotation of Ego-Documents: Evaluating Automatically Disambiguated Annotation of Estonian Semper-Barbarus Correspondence Corpus

Olga Gerassimenko1, Kadri Vider1, Neeme Kahusk1, Marin Laak2, Kaarel Veskis2

1University of Tartu, Center of Estonian Language Resources, Estonia; 2Estonian Literary Museum, Estonia

The digitization of cultural heritage is massive in Estonia: the national programme of mass digitisation started in 2018, and the creation of digital heritage resources has been made a priority for Estonian memory institutions (Viires, Laak 2018). Yet the majority of the digitised literary data is captured and used in raw format: at best, the digitised source is transformed into plain text that is searchable for strings. The digital resources are mostly used by humanities scholars in the same way as published texts: digital texts are read at length and analysed qualitatively. Quantitative methods, even as simple as word frequency analysis, are not possible for unannotated texts. Morphological annotation and disambiguation are an undisputed necessity for digitized data, especially considering the rich morphology of Estonian and the great number of homoforms. Large amounts of data need to be parsed and disambiguated automatically, which implies some error rate but still makes corpus search, data analysis and data mining much more efficient.

There are many challenges for the morphological tagging of older cultural heritage sources (especially non-edited ego-documents such as letters and diaries). The automatic morphological parser and disambiguator of Estonian, ESTMORF, was created for contemporary Estonian and trained on texts from the second half of the 20th century, proving to be 99% efficient on contemporary published texts (Kaalep, Vaino 2001). The efficiency of the parser has been tested on less normative text types such as chatroom texts (Kaalep, Muischnek 2011), but ego-documents offer some specific complications: sentences are lengthy and syntactically complicated, and letters and diary entries may also include ad hoc abbreviations, unmarked switching to other languages, and idiosyncratic orthography and punctuation. Is the automatic morphological annotation of such texts reliable enough for decent corpus research and for comparison of the target sources with other corpora?

We are exploring this question using the Correspondence Corpus of the Estonian avant-garde poets and writers Johannes Semper and Johannes Barbarus from 1911-1940 (Laak et al. 2019). This is a unique and multidimensional collection of private letters, the hand-written originals of which are held at the Estonian Cultural History Archives of the Estonian Literary Museum in Tartu. The correspondence consists of 670 letters with more than 1,100 pages and more than 310 890 tokens (249 970 words). The range of subjects touched upon in the letters is extremely wide: Semper and Barbarus, as friends and colleagues, discuss all events in Estonian cultural life, organize the publication of their books and discuss the problems of contemporary literary and political life and even economics in Estonia and in other countries. The letters had already been transcribed into typewritten and then into electronic format; morphological categories had to be automatically annotated and disambiguated in them, and metadata had to be described manually, to transform the letters into a machine-readable format. The corpus is openly accessible through the KORP query system and is currently being used by literary scholars for textual search.

In order to evaluate the quality of the morphological analysis of the Semper-Barbarus Correspondence Corpus, we are manually checking certain excerpts of the output and computing the error rate in general and the error rate for Estonian text only (there are lengthy foreign-language excerpts in the Semper-Barbarus correspondence). We are going to compare this error rate to that of texts of the same time period (1920-1930) from the Estonian Literary Criticism Corpus, which contains published publicistic texts, and to the previous work of Liba and Veskis (2008) on Estonian automatic tagger evaluation.
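The two error rates mentioned above can be computed along the following lines. This is a hedged sketch only: the `Token` structure, the one-letter tags and the six sample tokens are hypothetical and do not reproduce the ESTMORF tagset or the project's checking workflow.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    auto_tag: str      # tag chosen by the automatic disambiguator
    gold_tag: str      # tag assigned by the human checker
    is_estonian: bool  # False for foreign-language excerpts


def error_rate(tokens):
    """Share of tokens whose automatic tag differs from the manual one."""
    if not tokens:
        return 0.0
    errors = sum(1 for t in tokens if t.auto_tag != t.gold_tag)
    return errors / len(tokens)


# Hypothetical manually checked excerpt with a short foreign-language insert.
sample = [
    Token("kiri", "N", "N", True),
    Token("tuli", "N", "V", True),    # homoform resolved incorrectly
    Token("on", "V", "V", True),
    Token("eile", "D", "D", True),
    Token("mon", "N", "X", False),    # foreign word tagged as Estonian
    Token("cher", "V", "X", False),
]

overall = error_rate(sample)
estonian_only = error_rate([t for t in sample if t.is_estonian])
```

In this invented excerpt the overall error rate is 0.5 and the Estonian-only rate is 0.25, which illustrates why reporting both figures matters when foreign-language passages are frequent.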

The results of the study are going to be used to propose systematic modifications of the morphological parser by manually adding words to the parser lexicon. The reliably annotated corpus can be used for quantitative research on phenomena mentioned in the texts: for instance, we can evaluate the relative frequency of words related to politics in the various decades of the correspondence. With a reliable automatic morphological annotation, we can annotate the texts syntactically and semantically and, in the longer term, apply sentiment analysis to see whether the affective polarity of the texts changes with time.
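The relative-frequency example mentioned above could be computed as follows once each token carries a lemma and a letter date. The input format, the function name and the small list of politics-related lemmas are assumptions made for the illustration; they are not taken from the corpus.

```python
from collections import Counter


def relative_frequency_by_decade(tokens, target_lemmas):
    """Return, per decade, the share of tokens whose lemma is in target_lemmas.

    tokens is an iterable of (year, lemma) pairs.
    """
    totals = Counter()
    hits = Counter()
    for year, lemma in tokens:
        decade = year // 10 * 10
        totals[decade] += 1
        if lemma in target_lemmas:
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


# Hypothetical lemma list and lemmatized tokens.
politics = {"poliitika", "valitsus", "riik"}
lemmatized = [
    (1915, "raamat"), (1918, "riik"), (1919, "luule"), (1919, "sõber"),
    (1925, "valitsus"), (1927, "poliitika"), (1928, "raamat"), (1929, "luule"),
]
frequencies = relative_frequency_by_decade(lemmatized, politics)
```

Counting lemmas rather than surface forms is what makes the morphological annotation necessary here: without it, the many inflected forms of each Estonian word would be counted separately.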

References

Barbarus-Semper Correspondence Corpus, https://doi.org/10.15155/9-00-0000-0000-0000-00190L, last accessed 2019-09-14.

Estonian Literary Criticism Corpus, https://doi.org/10.15155/9-00-0000-0000-0000-00193L, last accessed 2019-09-14.

Kaalep, Heiki-Jaan and Tarmo Vaino. 2001. Complete morphological analysis in the linguist’s toolbox. In Proceedings of Congressus Nonus Internationalis Fenno-Ugristarum, Pars V. 9–16.

Kaalep, H.-J.; Muischnek, K. (2011). Morphological analysis of a non-standard language variety. Proceedings of the 18th Nordic Conference of Computational Linguistics: NODALIDA 18, Riia, Läti, 11-13. mai 2011. Ed. Bolette Sandford Pedersen, Gunta Nešpore, Inguna Skadina. Riia, Läti, 130−137. (NEALT Proceedings Series; 11).

Laak, Marin; Veskis, Kaarel; Gerassimenko, Olga; Kahusk, Neeme; Vider, Kadri (2019). Literary Studies Meet Corpus Linguistics: Estonian Pilot Project of Private Letters in KORP. DHN 2019 Digital Humanities in the Nordic Countries, 2364: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference Copenhagen, Denmark, March 5-8, 2019. Ed. Costanza Navarretta, Manex Agirrezabal, Bente Maegaard. Copenhagen, Denmark: University of Copenhagen, Faculty of Humanities, 283−294.

Viires, P., Laak, M.: Digital humanities meet literary studies: Challenges facing Estonian scholarship. In: Mäkelä, E., Tolonen, M., Tuominen, J. (eds.) DHN Helsinki 2018. Book of Abstracts (2018), https://www.helsinki.fi/sites/default/files/atoms/files/dhn2018-book-of-abstracts.pdf, last accessed 2019-09-14.

Veskis, K., Liba, E.: Automatic tagger evaluation. NLP course assignment report (2008), https://entu.keeleressursid.ee/public-document/entity-7052, last accessed 2019-09-14.

Acknowledgements

Research supported by the institutional research grant "Formal and Informal Networks of Literature, Based on Sources of Cultural History" (IRG22-2, Estonian Ministry of Education and Research), related to the Centre of Excellence in Estonian Studies (CEES) and by the programme ASTRA (2014-2020.4.01.16-0026) via the European Regional Development Fund (TK145).

Development of KORP and adding corpora in Estonian is supported by the ERDF project "Federated Content Search for the Center of Estonian Language Resources" (2014-2020.4.01.16-0134) under the activity "Support for Research Infrastructures of National Importance, Roadmap".



Long Paper (20+10min)

Hearth Tax Digital: New Narratives on Restoration England

Andrew Wareham1, Jakob Sonnberger2, Theresa Dellinger2, Georg Vogeler2

1Roehampton University, UK; 2Graz University, Austria

The Restoration hearth tax was the first Parliamentary tax to impose a direct levy upon householders in Britain and Ireland without unleashing major political unrest and/or regime change (unlike, e.g., the Poll Taxes of the late 1370s). Because of its success at the political level, there are a remarkable number of extant records in national and local archives on taxpayers, locations, numbers of hearths, and whether they were charged/uncharged (assessments) or paid/did not pay (returns). These data present economic historians with a substantial opportunity to provide a new understanding of social and economic conditions in late 17th-century Britain. The first part of the paper will discuss why it is useful to have hearth tax records in a digital format, and the second part will present some preliminary research findings from Hearth Tax Digital. This will not only use GIS to assess distributions of population and wealth in diachronic and national contexts, but also draw upon extraneous data on occupations, rank and gender.

Since 2000 there have also been important developments in digital transcription and archiving. On ScotlandsPlaces (National Records of Scotland (NRS) website), all the hearth tax returns arising from the 1691 collection can be searched; and on British History Online (BHO) there is an Access database of the 1666 Lady Day return for London and Middlesex. Between 2010 and 2015 Hearth Tax Online (HTO) made PDFs, reprinted from hard-copy hearth tax editions, available electronically. Each of these methods has distinct advantages and disadvantages, dependent upon the aims of BHO, HTO and NRS. BHO maximizes users’ ability to manipulate the data, but does not enable users to read the 1666 return in its original order. ScotlandsPlaces is at the opposite end of the spectrum, taking careful note of manuscript marks, but with limited facility to manipulate the data; and HTO was best used in tandem with the hard-copy editions from which the printed transcripts were taken.

Hearth Tax Digital, arising from a partnership between the British Academy Hearth Tax Project/Centre for Hearth Tax Research at the University of Roehampton and the Centre for Information Modelling at the University of Graz, uses the methods of an assertive digital edition to achieve five aims:

1. digital archiving and long-term preservation of hearth tax records

2. access to the digital transcripts in the original order in which they were written

3. manipulation of the statistical data synchronically/county based and diachronically/nationally

4. depiction and research enquiries on population/wealth distribution in GIS

5. searching based upon extraneous data on social conditions/rank/occupations etc. with standard data

The new website is hosted by the FEDORA-based, OAIS-compliant humanities digital archive infrastructure of Graz University (GAMS), a repository both for long-term archiving and publication of digital humanities resources. Hearth Tax Digital, essentially, is built upon two types of digital sources.

Firstly, for some regions we have been granted access to transcripts of the original records, which were produced for the print editions published by the British Academy Hearth Tax Project and the British Record Society. These transcripts are further encoded in XML, following the guidelines of the Text Encoding Initiative (TEI). Additionally, taking the ‘assertive edition’ approach, distinct semantic units are labeled using the ana-attribute. During the ingest process, a ‘toRDF’ stylesheet makes use of those labels, creating a graph database from the transcripts.
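The idea of turning ana-labelled TEI elements into graph statements can be illustrated with a short Python stand-in. Everything specific here is an assumption: the TEI fragment, the label names (`#taxpayer`, `#name`, `#hearths`), the invented persons and the function `tei_to_triples` are hypothetical, and the project's own ‘toRDF’ stylesheet and schema are not reproduced.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical TEI fragment; names and labels are invented for illustration.
tei_snippet = """
<list xmlns="http://www.tei-c.org/ns/1.0">
  <item xml:id="entry1" ana="#taxpayer">
    <persName ana="#name">John Smith</persName>
    <num ana="#hearths" value="3">3</num>
  </item>
  <item xml:id="entry2" ana="#taxpayer">
    <persName ana="#name">Mary Brown</persName>
    <num ana="#hearths" value="1">1</num>
  </item>
</list>
"""


def tei_to_triples(xml_text):
    """Turn ana-labelled elements into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for item in root.iter(TEI_NS + "item"):
        subject = item.get(XML_ID)
        if subject is None:
            continue
        item_label = item.get("ana")
        if item_label:
            triples.append((subject, "type", item_label.lstrip("#")))
        for child in item:
            label = child.get("ana")
            if label:
                # Prefer a normalized value attribute over the written text.
                value = child.get("value") or (child.text or "").strip()
                triples.append((subject, label.lstrip("#"), value))
    return triples


triples = tei_to_triples(tei_snippet)
```

Each labelled element becomes one statement about its entry, so the transcript stays readable in its original order while the same information also becomes queryable as a graph.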

For other regions, only lists of taxpayers are available, lacking any contextual information or the initial order given in the original documents (‘Returns in database format’). In this case, the data, usually given in database files or spreadsheets, are directly transformed to RDF/XML and joined with the graph data arising from the transcripts in our triple store, forming one single semantic database.

Notably, all these processes, once they have been set up for the project, automatically apply to all upcoming further data ingested to the repository following our schema, providing HTML and spreadsheet representations for both the transcripts and the ‘Returns in database format’, as well as adding the extracted semantic information to the database.

According to the aims of the project, it can be said that:

1. The GAMS repository, certified according to the criteria of the ‘Data Seal of Approval’ as a trusted repository, guarantees long-term preservation and archiving of all records in scope. Additionally, users may easily access and download the source data (TEI/XML, RDF) of all documents.

2. The visual representation of the digital transcripts is kept as close to the original transcripts as possible, maintaining the initial order and spelling, retaining all conveyed information and trying to reconstruct the original layout (e.g. columns) of the documents. But, as the aim of a digital edition goes beyond the mere digital reproduction of the print edition, all additional information, such as regularizations, editorial notes and geographical hierarchies, has been marked up and visualized by optical highlighting and tooltips.

3. We are also able to deliver any kind of statistical information on our data just by formulating suitable database requests.

4. By adding the geographical information on county/parish boundaries (GML, Shapefiles) provided for the print editions to our database, we can visualize almost every statistic projected onto various background maps (e.g. Open Street Map). The ranges and parameters for this can be manipulated by the users, offering a vast playground for research beyond the standard parameters.

5. The database provides both a full-text search for any terms occurring anywhere in the transcripts, as well as a structured search based on categories like number of hearths, personal names etc.
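The kinds of request described in points 3 and 5 can be mimicked on a small in-memory table. This is only a sketch under stated assumptions: the entries, field names and the functions `hearths_per_parish` and `search` are hypothetical, and the real resource answers such requests from its semantic database rather than from Python lists.

```python
from collections import defaultdict

# Hypothetical taxation entries; persons and parishes are invented.
entries = [
    {"name": "John Smith", "parish": "Parish A", "hearths": 3},
    {"name": "Mary Brown", "parish": "Parish A", "hearths": 1},
    {"name": "Thomas Smith", "parish": "Parish B", "hearths": 5},
]


def hearths_per_parish(rows):
    """Aggregate household counts, hearth totals and means per parish."""
    totals = defaultdict(lambda: {"households": 0, "hearths": 0})
    for row in rows:
        totals[row["parish"]]["households"] += 1
        totals[row["parish"]]["hearths"] += row["hearths"]
    return {
        parish: {**t, "mean": t["hearths"] / t["households"]}
        for parish, t in sorted(totals.items())
    }


def search(rows, text="", min_hearths=0):
    """Combine a case-insensitive full-text match with a structured filter."""
    needle = text.lower()
    return [
        row for row in rows
        if row["hearths"] >= min_hearths
        and any(needle in str(value).lower() for value in row.values())
    ]


stats = hearths_per_parish(entries)
smiths = search(entries, text="smith")
large_households = search(entries, min_hearths=3)
```

Because both functions work on the same records, a statistic such as the mean number of hearths per parish can be recomputed for any subset returned by a search.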

Currently (August 2019), Hearth Tax Digital holds more than 142,000 taxation entries, with a further 46,000 in publication.

Hearth Tax Digital means that for the first time it is possible to study the hearth tax in a national context, moving across county boundaries and returns between the mid 1660s and early 1670s. This paper will set out both the methods which have been used in developing this digital resource, and some preliminary findings on social and economic conditions in the Restoration age.

Select Bibliography

Johannes Stigler & Elisabeth Steiner: GAMS – An infrastructure for the long-term preservation and publication of research data from the Humanities. In: Vereinigung Oesterreichischer Bibliothekarinnen und Bibliothekare. Mitteilungen. 71,1. 2018. 207-216. doi:10.31263/voebm.v71i1.1992.

Vogeler, Georg: The ‘assertive edition’ : On the consequences of digital methods in scholarly editing for historians. In: International Journal of Digital Humanities. 1,2. 2019. 309-322. doi:10.1007/s42803-019-00025-5

Wareham, Andrew ‘The unpopularity of the hearth tax and the social geography of London in 1666’. In: Economic History Review, 70 (2017) pp. 452-82.



Short Paper (10+5min)

In Quest of Transition Books

Denis Kotkov1, Kati Launis2, Mats Neovius1

1Åbo Akademi, Finland; 2University of Turku, Finland

Literature read by a person not only reflects, but also affects that person. In fact, certain books (transition books) might trigger the process of becoming interested in literature for grown-ups and therefore of mentally becoming a grown-up. In this paper, we detect books that are likely to be transition books or transition book candidates based on a loan dataset provided to us by Vantaa City Library. With four methods applied to this dataset, we show which books are likely to be candidates and why. We found the following candidate books: Tähtiin kirjoitettu virhe by John Green, Punainen kuin veri by Salla Simukka and Luukaupunki by Cassandra Clare. Our findings also indicate a few other books that are less likely, but still good, candidates for transition books.