Automated prototypical text detection for corpus and critical discourse studies using KeyAnt

Laurence Anthony¹ & Paul Baker²

¹Faculty of Science and Engineering, Waseda University, Japan. ²LAEL, Lancaster University

Corpus-based researchers and traditional qualitative researchers, such as critical discourse analysts, often need to select prototypical texts for close reading: texts that contain the language features of interest found in a much larger body of work. When making these selections, the researcher can be criticised for 'cherry-picking' texts that illustrate a preconceived idea or point. In this presentation, we describe a principled way of selecting texts for close reading, based on ranking texts by the number of keywords they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called KeyAnt that analyses a corpus of texts, generates a ranked list of keywords based on statistical significance and effect size, and then orders the texts by the number of keywords in them. We will present the theoretical background to this work, introduce the KeyAnt tool, and give a detailed KeyAnt analysis of various corpora to illustrate the tool's ability to automatically detect both prototypical and outlier texts in a corpus.
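
The ranking idea described in the abstract can be illustrated with a minimal Python sketch. This is not KeyAnt's implementation. It assumes that keywords are found by comparing the target corpus against a separate reference corpus, that statistical significance is measured with the log-likelihood statistic (3.84 is the critical value for p < 0.05 with one degree of freedom), that effect size is measured with a smoothed log ratio, and that a text's score is the number of distinct keywords it contains. The function names, parameters and thresholds are all hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word seen a times in a target corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def find_keywords(target_texts, reference_texts, min_ll=3.84, min_log_ratio=1.0):
    """Return the words that pass both a significance threshold and an
    effect-size threshold (both thresholds are assumptions)."""
    target = Counter(w for text in target_texts for w in text)
    reference = Counter(w for text in reference_texts for w in text)
    c, d = sum(target.values()), sum(reference.values())
    keys = set()
    for word, a in target.items():
        b = reference.get(word, 0)
        # Log ratio of relative frequencies; 0.5 stands in for a zero
        # reference count so the ratio stays defined (an assumption).
        log_ratio = math.log2((a / c) / ((b if b else 0.5) / d))
        if log_likelihood(a, b, c, d) >= min_ll and log_ratio >= min_log_ratio:
            keys.add(word)
    return keys


def rank_texts(target_texts, keys):
    """Return (text_index, keyword_count) pairs, highest count first.
    The count is the number of distinct keywords in the text."""
    scores = [(i, len(set(text) & keys)) for i, text in enumerate(target_texts)]
    return sorted(scores, key=lambda s: (-s[1], s[0]))
```

Each text is passed in as a list of tokens. Under these assumptions, texts at the top of the ranking are candidates for prototypical texts, and texts at the bottom are candidates for outliers.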

Week 11 2014/2015

Thursday 15th January 2015

Management School LT9