SUPPORTING RESEARCH USE OF WEB ARCHIVES: A ‘LABS’ APPROACH
The use of the archived web as an object of research remains at the fringes of (digital) humanities research (Winters, 2017). While a number of surveys and studies have identified common challenges and researchers’ requirements (see e.g. Costa & Silva, 2010; Costea, 2018; Riley & Crookston, 2015; Stirling, Chevallier, & Illien, 2012), the conclusion that “there is still a gap between the potential community of researchers who have good reason to engage with creating, using, analysing and sharing web archives, and the actual (generally still small) community of researchers currently doing so” (Dougherty et al., 2010, p. 5) largely holds true. In this paper we argue that Library Labs, a growing network of experimental environments that provide data-level access to digitised and born-digital collections, can help bridge that gap.
Research use of web archives
Although many researchers in the humanities and social sciences have yet to begin exploring web archives, some projects have already investigated their potential. Particular examples include Mapping the Danish Web (Brügger & Laursen, 2018; Brügger, Laursen, & Nielsen, 2019), the Big UK Domain Data for the Arts and Humanities (BUDDAH) project (Hockx-Yu, 2011; Winters, 2015), text-mining projects such as Néonaute (Cartier, Stirling, & Aubry, 2018) and Semantic Change Detection (McGillivray & Basile, 2018), and the research being undertaken by members of the RESAW network (Research Infrastructure for the Study of Archived Web Materials) and PROMISE (PReserving Online Multiple Information: towards a Belgian StratEgy) (Geeraert, Michel, & Vlassenroot, 2018; Vlassenroot et al., 2019). The Internet Archive Research Services have provided important use cases that extend beyond national domains, while the Archives Unleashed Project has focused on developing a toolkit, a cloud service to work with WARC files, and a community around its regular datathons.
Access and labs as “incubators for research”
As a result of legal restrictions, many web archives remain accessible only through dedicated computers inside (national) libraries. Additionally, managing archived web resources as large, complex and messy datasets requires a relatively advanced level of digital literacy, which not all humanities researchers have at their fingertips. In this paper, we consider whether the concept of ‘library labs’, as pioneered by organisations such as the British Library and, more recently, exemplified through the international Building Library Labs network (Chambers et al., 2019), could a) be an ideal incubator for increasing access to archived web resources, for example within national library buildings themselves, and b) increase the take-up and usage of web archives in the humanities and social sciences research community by including them as one of the many available resources alongside, e.g., digitised newspapers. We also examine case studies from national and university libraries that have experimented with offering datasets from their web archives as part of labs or research services (e.g. Library of Congress, Royal Danish Library, Austrian National Library and British Library). Finally, we introduce the recently established Research Working Group of the International Internet Preservation Consortium (IIPC), which could be an ideal framework for fostering research use of archived web material. The group seeks to a) promote the use of web archives and IIPC collections among researchers, b) share information about web archiving research projects at IIPC member organisations, including workflows and lessons learnt, and c) facilitate ways for dissemination and discussion of use cases.
Brügger, N., & Laursen, D. (2018). Historical Studies of National Web Domains. In N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History (1st ed., pp. 413-427). London: SAGE Publications.
Brügger, N., Laursen, D., & Nielsen, J. (2019). Establishing a corpus of the archived web: the case of the Danish web from 2005 to 2015. In N. Brügger & D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains (pp. 124-142). Abingdon: Routledge.
Cartier, E., Stirling, P., & Aubry, S. (2018). Néonaute: mining web archives for linguistic analysis. Paper presented at the IIPC Web Archiving Conference, Wellington.
Chambers, S., Mahey, M., Gasser, K., Dobreva-McPherson, M., Kokegei, K., Potter, A., Ferriter, M., & Osman, R. (2019). Growing an international Cultural Heritage Labs community. Retrieved from http://doi.org/10.5281/zenodo.3271382
Costa, M., & Silva, M. J. (2010). Understanding the Information Needs of Web Archive Users. Retrieved from http://xldb.di.fc.ul.pt/xldb/publications/costa2010understandingneeds_document.pdf
Costea, M.-D. (2018). Report on the Scholarly Use of Web Archives. Retrieved from http://netlab.dk/wp-content/uploads/2018/02/Costea_Report_on_the_Scholarly_Use_of_Web_Archives.pdf
Dougherty, M., Meyer, E. T., McCarthy Madsen, C., van den Heuvel, C., Thomas, A., & Wyatt, S. (2010). Researcher Engagement with Web Archives: State of the Art. Retrieved from https://ssrn.com/abstract=1714997
Geeraert, F., Michel, A., & Vlassenroot, E. (2018). Critical reflections on unlocking web archives for humanities research. Paper presented at the 5th DH Benelux Conference.
Hockx-Yu, H. (2011). Up close and personal - Researchers and the UK Web Archive Project. Paper presented at the IIPC Web Archiving Conference, The Hague. https://web.archive.org/web/20120501064731/http:/netpreserve.org/events/Hague/Presentations/Out%20of%20the%20Box/Researchers_HockxYu.pdf
McGillivray, B., & Basile, P. (2018). Exploiting the Web for Semantic Change Detection. Paper presented at the 21st International Conference, DS 2018, Limassol, Cyprus.
Riley, H., & Crookston, M. (2015). Awareness and Use of the New Zealand Web Archive: A Survey of New Zealand Academics. Retrieved from https://natlib.govt.nz/files/webarchive/nzwebarchive-awarenessanduse.pdf
Stirling, P., Chevallier, P., & Illien, G. (2012). Web Archives for Researchers: Representations, Expectations and Potential Uses. D-Lib Magazine, 18(3/4). doi:10.1045/march2012-stirling
Vlassenroot, E., Chambers, S., Di Pretoro, E., Geeraert, F., Haesendonck, G., Michel, A., & Mechant, P. (2019). Web archives as a data resource for digital scholars. International Journal of Digital Humanities, 1(1), 85-111. doi:10.1007/s42803-019-00007-7
Winters, J. (2015). Big UK Domain Data for the Arts and Humanities. Paper presented at the IIPC Web Archiving Conference, Stanford. https://web.archive.org/web/20170315123348/http:/netpreserve.org/sites/default/files/attachments/2015_IIPC-GA_Slides_07_Winters.ppt
Winters, J. (2017). Coda: Web archives for humanities research: some reflections. In N. Brügger & R. Schroeder (Eds.), The Web as History: Using Web Archives to Understand the Past and Present (pp. 238-248). London: UCL Press.
 Further information about the BUDDAH project is available at https://buddah.projects.history.ac.uk
 Further information about the RESAW project is available at www.resaw.eu
 Further information about the PROMISE project is available at https://promise.hypotheses.org
 Further information about the Archives Unleashed project is available at https://archivesunleashed.org
 Further information about the Building Library Labs network is available at https://blogs.bl.uk/digital-scholarship/2018/09/building-library-labs-around-the-world.html
 Further information about the IIPC Research Working Group is available at http://netpreserve.org/about-us/working-groups/research-working-group/