Corpus methods and multimodal data: A new approach

William Dance

LAEL, Lancaster University

Within corpus linguistics, multimodality is a subject which is often overlooked. While several projects have tackled multimodal interactional data, such as the French-language RECOLA corpus of remote collaborative interactions (Ringeval et al., 2013) and the REPERE corpus of annotated television broadcasts (Giraudel et al., 2012), corpus linguistic approaches generally struggle when faced with extra-textual content such as images. Until now, the only viable way of including such content in a corpus has been manual image annotation, an approach that runs into two overarching issues.

First, as Fanelli et al. (2010) note, the visual modality is the most labour-intensive form of multimodal corpus annotation when it is carried out 'by hand' using traditional methods. Second, multimodal corpora are often limited in scope and remain "domain specific, mono-lingual [...] and/or of a specialist nature" (Knight, 2011, p. 397) in order to reduce variables and make corpus construction less complex. The approach presented here addresses both issues: it automates the annotation process and consequently widens the scope of analysis to millions of images.

To interpret images, we utilise the Google Cloud Vision API, a service that annotates images automatically using machine learning algorithms. Of its annotation types, we use two in particular: 'label annotation' and 'web detection'. The former provides general content labels ("speaker"; "public"; "television"), while the latter provides specific, web-derived labels ("Hillary Clinton"; "Presidential Election"; "DNC").
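
As a rough illustration of how these two annotation types can be requested, the sketch below uses the Python client for Cloud Vision (recent versions of google-cloud-vision); the file name and the printing of raw confidence scores are placeholders rather than part of the study's actual pipeline.

    # Minimal sketch: requesting 'label annotation' and 'web detection'
    # for a single image via the google-cloud-vision Python client.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()  # assumes API credentials are already configured

    with open("tweet_image.jpg", "rb") as f:  # hypothetical file name
        image = vision.Image(content=f.read())

    # Label annotation: general content labels ("speaker", "public", "television")
    for label in client.label_detection(image=image).label_annotations:
        print(label.description, round(label.score, 2))

    # Web detection: specific, web-derived entities ("Hillary Clinton", "DNC")
    for entity in client.web_detection(image=image).web_detection.web_entities:
        print(entity.description, round(entity.score, 2))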

This approach was created in response to Twitter's recently released elections integrity dataset. In October 2018, Twitter released a large trove of data containing all communications from and between accounts believed to be connected to the Russian organisation known as the Internet Research Agency (IRA). The dataset (hereafter 'T-IRA') contains over 9 million tweets and 1.7 million images, GIFs and videos, a scale that renders traditional corpus linguistic and multimodal methods ineffective. This necessitated a new form of combined visual and textual analysis that can efficiently encode images and text to create fully integrated multimodal corpora.
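
A minimal sketch of what such encoding could look like is given below: each tweet's text is stored alongside the Vision annotations of its attached image, producing one searchable multimodal record per tweet. The column names ("tweetid", "tweet_text"), file names and the empty annotation lists are illustrative assumptions, not the released dataset's schema or the study's pipeline.

    # Sketch: merging tweet text with image annotations into one corpus record.
    import csv
    import json

    def build_record(row, labels, entities):
        """Combine a tweet's text with the labels/entities of its attached image."""
        return {
            "tweetid": row["tweetid"],      # assumed column name
            "text": row["tweet_text"],      # assumed column name
            "image_labels": labels,         # e.g. ["speaker", "public", "television"]
            "web_entities": entities,       # e.g. ["Hillary Clinton", "DNC"]
        }

    with open("ira_tweets_sample.csv", newline="", encoding="utf-8") as f:
        # Annotations would come from the Vision step sketched above;
        # empty lists are placeholders here.
        records = [build_record(row, labels=[], entities=[]) for row in csv.DictReader(f)]

    with open("multimodal_corpus.jsonl", "w", encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(rec, ensure_ascii=False) + "\n")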

This talk will comprise two main discussions. The first will be a demonstration of the method, reflecting on its reliability, consistency and other methodological implications. The second will discuss results from a pilot study of the T-IRA dataset, drawing on critical approaches to multimodal discourse analysis (Kress, 2011) to shed light on how hostile state information operations are carried out on social media.

References:

Fanelli, G., Gall, J., Romsdorfer, H., Weise, T., & van Gool, L. (2010). 3D vision technology for capturing multimodal corpora: Chances and challenges. In Proceedings of the LREC Workshop on Multimodal Corpora (pp. 70-73). Valletta: European Language Resources Association (ELRA).

Giraudel, A., Carré, M., Mapelli, V., Kahn, J., Galibert, O., & Quintard, L. (2012). The REPERE corpus: A multimodal corpus for person recognition. In Proceedings of LREC 2012 (pp. 1102-1107). European Language Resources Association (ELRA).

Knight, D. (2011). The future of multimodal corpora. Revista Brasileira de Linguística Aplicada, 11(2), 391-415.

Kress, G. (2011). Multimodal discourse analysis. In J. P. Gee, & M. Handford, The Routledge Handbook of Discourse Analysis (pp. 35-50). Abingdon: Routledge.

Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG) (pp. 1-8). IEEE.

Week 21 2018/2019

Thursday 28th March 2019
3:00-4:00pm

Management School LT 12