Conference Agenda

Crowdsourcing, Audiovisual Heritage
Friday, 20/Mar/2020:
1:50pm - 3:20pm

Session Chair: Ieva Tihovska
Location: Hall B

Short Paper (10+5min)

Crowdsourcing Metadata for Audiovisual Cultural Heritage: Finnish Full-Length Films, 1946–1985

Hannu Salmi1,4, Kimmo Laine2,3, Tommi Römpötti2, Noora Kallioniemi1, Elina Karvo1

1University of Turku, Department of Cultural History, Finland; 2University of Turku, Department of Media Studies, Finland; 3University of Oulu, Department of Art Studies and Anthropology, Finland; 4Turku Group for Digital History, Finland

This paper is based on a crowdsourcing project which was realised at the School of History, Culture and Arts Studies of the University of Turku between the years 2013–2018. The idea was to develop a format through which long-term crowdsourcing could be integrated into the humanities curriculum. The project was realised in close cooperation with the National Audiovisual Institute (KAVI) in Finland. The aim was to help KAVI in developing its open database for Finnish cinema, Elonet, by engaging both graduate and postgraduate students in producing keywords, genre characterisations, plot summaries and other relevant fields of information for Finnish cinema. In total, the project produced metadata for 572 full-length films, both fiction films and long documentaries that had their theatre release between the years 1946 and 1985. The amount is substantial considering that, to date, around 1,600 full-length films have been released in Finland. At the same time, it produced a successful model for drawing on crowdsourcing in the classroom.

Long Paper (20+10min)

Emotion Preservation in Translation: Evaluating Datasets for Annotation Projection

Kaisla Kajava1, Emily Öhman1, Hui Piao2, Jörg Tiedemann1

1University of Helsinki, Finland; 2University of Tokyo, Japan

This paper is a pilot study that aims to explore the viability of annotation projection from one language to another as well as to evaluate the multilingual data set we have created for emotion analysis. We study different language pairs based on parallel corpora for sentiment and emotion annotations and explore annotator agreement. We show that the source data is a possible source for reliable L1 data to be used in annotation projection from high-resource languages, such as English, into low-resource languages and that this is a reliable way of creating data sets for fine-grained sentiment analysis and emotion detection.

Short Paper (10+5min)

Towards an Analysis of Gender in Video Game Culture: Exploring Gender-Specific Vocabulary in Video Game Magazines

Thomas Schmidt, Isabella Engl, Juliane Herzog, Lisa Judisch

Media Informatics Group, University of Regensburg, Germany

We present preliminary results of a project examining the role and usage of gender-specific vocabulary in a corpus of video game magazines. The corpus consists of three popular video game magazines with 634 issues from the 1980s until 2011 and was gathered via OCR scans of the platform. We report on the distribution and progression of gender-specific words by using word lists of the LIWC for the categories "male" and "female". We can indeed show that words of the type male are considerably more frequent than words of the type female, with a slight increase of female words during 2006–2010. This is in line with the overall development of gaming culture throughout these years and with previous research in the humanities. Furthermore, we analyzed how the usage of negatively connoted words for female depictions (e.g. chick, slut) has evolved and identified a constant increase throughout the years, reaching its climax around 2001–2005, a timespan that coincides with the release and popularity of games encompassing rather sexist concepts. We discuss the limitations of our explorations and report on plans to further investigate the role of gender in gaming culture.

Long Paper (20+10min)

Digital Analysis and Machine View on Latvian National Catalogue of Museum Collections

Maija Spurina

Latvian Academy of Culture, Latvia

The Latvian National Catalogue of Museum Collections (Latvijas Nacionālais Muzeju Krājuma kopkatalogs) is an online national digital database of museum collections that currently contains information on more than 1.2 million objects stored in 129 state-certified museums. For the past fifteen years, enormous effort and financial resources have been invested to create this single digital gateway to the cultural heritage stored in Latvian museums. The database was centrally designed and is administered top-down by a government agency, the Culture Information System Center. The data has been input bottom-up by each museum. The resulting database can be seen as a digital representation of the national cultural heritage preserved in Latvian museums. In my presentation I will show preliminary results of my analysis of the publicly available part of the database using OpenRefine and a machine learning algorithm for object recognition (Python). I will discuss the challenges faced in the process of analysis due to the design of the database, as well as point out the possibilities and limits of using machine learning algorithms for the analysis of museum collections.