Visualizing Dispersion: a new tool for Corpus Linguistics software

Andressa Gomide

CASS, Lancaster University

Measuring how words are distributed across a corpus is an important yet underused technique in Corpus Linguistics (CL). Although recent work has highlighted the relevance of dispersion measures (DM) for corpus analysis (e.g. Biber et al. 2016), they are still rarely used in research (Gries 2008).
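
As a rough illustration of what a dispersion measure computes, the sketch below implements DP (deviation of proportions), one of the measures discussed in Gries (2008). It is not taken from the tool presented in this talk; the function name and the toy counts are hypothetical and chosen only for the example.

```python
def deviation_of_proportions(word_counts, part_sizes):
    """Return DP for one word (hypothetical helper, for illustration only).

    word_counts[i] is the word's frequency in corpus part i;
    part_sizes[i] is the size of part i in tokens.
    DP is close to 0 when the word is spread in proportion to part size
    and approaches 1 when it is concentrated in a small part of the corpus.
    """
    if len(word_counts) != len(part_sizes):
        raise ValueError("word_counts and part_sizes must have the same length")
    total_count = sum(word_counts)
    total_size = sum(part_sizes)
    if total_count == 0 or total_size == 0:
        raise ValueError("the word must occur at least once in a non-empty corpus")
    # Half the summed absolute difference between the observed share of the
    # word in each part and the share expected from the part's size.
    return 0.5 * sum(
        abs(count / total_count - size / total_size)
        for count, size in zip(word_counts, part_sizes)
    )


# Toy data (assumed): four equally sized corpus parts.
even_dp = deviation_of_proportions([10, 10, 10, 10], [1000, 1000, 1000, 1000])
clumped_dp = deviation_of_proportions([40, 0, 0, 0], [1000, 1000, 1000, 1000])
```

Both toy words have the same overall frequency (40), but the second occurs in only one part, so its DP is much higher than that of the evenly spread word.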

This talk presents the development and implementation of a CL tool designed to help users understand dispersion measures and apply them in research. This tool was created as part of a project which aims to enhance the experience of users with CL software through the creation of graphical data visualisation.

In this presentation, I will outline the steps taken to develop the tool and give a software demonstration.

The development of the tool consisted of three steps: (a) identifying the target audience and understanding their needs; (b) developing and implementing the visualisation; and (c) conducting a user assessment of the newly developed tool. User needs were assessed via (i) a review of papers reporting corpus-based methods and (ii) a contextual design approach (Beyer and Holtzblatt 1998), which allowed observation of how users interact with CL software in their own environment. Key requirements for a successful data visualisation, such as functionality, aesthetics and accuracy (Cairo 2016), were also considered. The new functionality was implemented in CQPweb (Hardie 2012), an open-source tool for corpus linguistic analysis. Finally, a user assessment was conducted so that the system could be adjusted to fit users' needs more closely.

Towards the end of this presentation, I will demonstrate the dispersion tool using the Sydney Corpus of Television Dialogue, a recently launched corpus by Bednarek (2018).

References

Bednarek, M. (2018). Language and Television Series: A Linguistic Approach to TV Dialogue. Cambridge: Cambridge University Press.

Beyer, H. & Holtzblatt, K. (1998). Contextual Design: Defining Customer-Centered Systems. San Francisco: Morgan Kaufmann. ISBN 1-55860-411-1

Biber, D., Reppen, R., Schnur, E., & Ghanem, R. (2016). On the (non)utility of Juilland's D to measure lexical dispersion in large corpora. International Journal of Corpus Linguistics, 21(4), 439-464. https://doi.org/10.1075/ijcl.21.4.01bib

Cairo, A. (2016). The truthful art: data, charts, and maps for communication. New Riders.

Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403-437. http://doi.org/10.1075/ijcl.13.4.02gri

Hardie, A. (2012). CQPweb — combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3), 380-409. http://doi.org/10.1075/ijcl.17.3.04har

Week 23 2018/2019

Thursday 13th June 2019
4:30-5:30pm

Fylde LT 2