Wmatrix corpus analysis and comparison tool
Wmatrix is a software tool for corpus analysis and comparison. It provides
a web interface to the English USAS and
CLAWS corpus annotation tools, and
standard corpus linguistic methodologies such as frequency lists and
concordances. It also extends the keywords method to key grammatical
categories and key semantic domains.
Wmatrix allows the user to run these tools via a web browser such as
Chrome or Firefox,
and so will run on any computer (Mac, Windows, Linux) with a web browser and
a network connection.
Wmatrix was initially developed by Paul Rayson,
extended and applied to corpus linguistics during PhD work,
and is still being updated regularly. Earlier versions were available for Unix via
terminal-based command line access (tmatrix) and Unix via Xwindows (Xmatrix),
but these only offered retrieval of text pre-annotated with USAS and CLAWS.
Sections in this introduction to Wmatrix:
screencasts (short video introductions),
acknowledgements and references for Wmatrix,
and example applications and publications.
Tutorial for Wmatrix: with step-by-step instructions using a case study on how
to compare Liberal Democrat and Labour Party Manifestos for the 2005 UK General Election
(updated May 2022).
Further examples of the application to the 2010 general election manifestos can be seen
on Paul's blog.
The plain text versions of the 2010 UK election manifestos can be downloaded for
use in your favourite text analysis software (with thanks to Martin Wynne for editing two of the files).
TEI encoded versions of the 2010 election manifestos are now available (with thanks to Lou Burnard).
A similar application has also been carried out on the 2015
General Election manifestos, with downloadable versions
of the documents from seven main parties.
One version of Wmatrix is currently live for public use: Wmatrix5.
Usernames for Wmatrix are free to members and alumni of Lancaster University for non-commercial research.
Please apply on Wmatrix5 using your Lancaster email address, or, if you no longer have access to a Lancaster address as an alumnus or alumna, please contact
Accounts on Wmatrix5 are freely available for UK government and academic researchers in countries on the OECD DAC list
of ODA recipients (https://www.oecd.org/), and these accounts will stay free beyond the
current one month trial period.
Please apply on Wmatrix5 using your organisational email address.
Usernames for non-commercial research and teaching (e.g. by non-Lancaster academics and students):
A free one-month trial is available for individual academic users.
Please apply on Wmatrix5 using your organisational email address
to set up a username and password.
Once the one-month trial has expired, usernames are available for £50 per username per year
from the online secure order page run by Lancaster University.
Multiple usernames (or years) may be purchased at a reduced cost e.g. for teaching purposes.
Please contact Paul for details.
Further development, support, and external availability of Wmatrix currently depends on licensing its use.
Introduction to Wmatrix
Wmatrix users can upload their own corpus data to the system,
so that it can be automatically
annotated and viewed within the web browser.
Each file is stored in a folder (equivalent to a folder in Windows
or directory on Unix).
Input format guidelines
The analysis may be improved with some pre-editing of the input text,
although pre-editing is not normally required. There are guidelines
provided for texts to be tagged by CLAWS. Most important is the replacement
of less-than (<) and greater-than (>) characters by the corresponding SGML entity
references (&lt;) and (&gt;) respectively.
The text may contain well-formed HTML, SGML or XML tags. If the text
contains less-than or greater-than symbols in formulae, for example,
then CLAWS may mistake large quantities of the following text for SGML tags,
or fail to POS tag the file.
The guidelines mention start and end text markers, but these are not required
since they are inserted for you by Wmatrix.
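The entity replacement described above can be sketched in a few lines of Python. This is only an illustration, assuming the input is plain text with no intended HTML, SGML or XML tags, so that every angle bracket should be escaped; the function name is hypothetical and is not part of Wmatrix or CLAWS.

```python
def escape_angle_brackets(text):
    """Replace literal < and > with SGML entity references.

    Assumption: the text contains no markup that should be kept,
    so every angle bracket is treated as a literal character.
    """
    return text.replace("<", "&lt;").replace(">", "&gt;")


sample = "if a < b and b > c"
print(escape_angle_brackets(sample))
```

Text that deliberately contains well-formed tags would need a more careful approach, since this sketch would escape those tags as well.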
Wmatrix users can upload their file and complete the
automatic tagging process by clicking on the tag
wizard. Once the file has been uploaded to the web server, it is POS tagged by CLAWS
and semantically tagged by USAS.
This process can be carried out step by step starting
with the 'load file without tagging' option in the advanced interface.
As a shortcut you can simply upload frequency profiles
if you have them.
The format for a frequency list is a very simple two column format
with a total line at the head of the file. You can
see an example of this. The column widths are not
My Tag Wizard
My Tag Wizard is a variant of the tag wizard which allows you to
override or extend the system dictionaries for your own data. There are
two main uses. First, you can override the current most likely tag for any
word or MWE. Second, you can extend the dictionaries in terms of coverage
of vocabulary and tagset. For example, you can create a new tag by
listing the words and MWEs that you wish to be tagged with it.
By clicking on the folder name, the user can see its contents.
Following the application
of the tag wizard, the folder contains the original text, POS and semantically tagged
versions of that text, and a set of frequency profiles.
Simple and advanced interfaces
The user can toggle between simple and advanced interfaces in Wmatrix.
The advanced interface offers more options and more control over the data.
From the folder view, the user can click on a frequency list to see the
most frequent items in their corpus.
Frequency lists are available for words in the simple interface, and in the advanced interface
for POS tags and semantic tags.
The lists can be sorted alphabetically or by frequency.
From the frequency list view, the user can click on 'concordance' and see standard
concordances. These can show the usual word based concordance as well as
all occurrences for words in one POS or semantic category.
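For readers unfamiliar with concordances, a word-based key-word-in-context (KWIC) listing can be sketched as follows. This is a generic illustration, not Wmatrix's implementation: it assumes the text is already split into tokens, and the function name and the `width` parameter are hypothetical.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node word, right context) for each hit.

    Assumption: matching is case-insensitive and context is measured
    in tokens rather than characters.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "the cat sat on the mat".split()
for left, word, right in kwic(tokens, "the", width=2):
    print(f"{left:>10} | {word} | {right}")
```

A concordance for a POS or semantic category would match on the tag attached to each token instead of on the word form.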
Key words, key POS and key domains: comparison of frequency lists
From the folder view, the user can click on compare frequency list to
perform a comparison of the frequency list for their corpus against another larger
normative corpus such as the BNC sampler, or against another of their own texts
(once that text has been loaded into Wmatrix). This comparison can be carried out
at the word level to see keywords, or at the POS (in the advanced interface), or at the
semantic level (to see key concepts or domains). The log-likelihood statistic is employed by
Wmatrix. For more details, see the log-likelihood calculator.
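The comparison can be illustrated with a short Python sketch. It assumes the two-corpus log-likelihood formulation commonly used for keyness, in which observed frequencies are compared with the frequencies expected from the relative corpus sizes. The function and parameter names are hypothetical, and this is not Wmatrix's own code.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Return (LL, sign) for one item compared across two corpora.

    freq1, freq2: frequency of the item in corpus 1 and corpus 2
    size1, size2: total number of tokens in each corpus
    """
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero frequency contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    # '+' marks relative overuse in corpus 1, '-' relative underuse
    sign = "+" if freq1 / size1 >= freq2 / size2 else "-"
    return ll, sign


ll, sign = log_likelihood(20, 1000, 10, 1000)
print(f"{ll:.2f} {sign}")  # 3.40 +
```

In this toy example the item is relatively overused in the first corpus, but an LL of about 3.4 is below the 6.63 cut-off, so the difference would not count as significant at the 99% level.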
In the simple interface, word and tag clouds are shown
which visualise the more significant differences in the larger font sizes.
In the advanced interface more detailed frequency information is
also displayed in table form.
The key comparison shows the most significant key items
towards the top of the list, since the result is sorted on the LL
(log-likelihood) field, which shows how significant the difference is.
You should look only at items with a '+' code, since this shows overuse
in your text as compared to the standard English corpora. For
statistical significance, look at items with an LL value
over about 7, since 6.63 is the cut-off for 99% confidence.
N-grams and c-grams
Recurrent sequences of words are called n-grams in Wmatrix. These are similar
to clusters in WordSmith and lexical bundles in Biber's work. You can calculate
n-grams of length 2 to 5 for each text. Collapsed-grams (or c-grams) are
a merged version of these lists. They show you which 2-grams are subsets of
3-grams, which 3-grams are subsets of 4-grams, and so on. The resulting c-gram
list is a tree structure with the longest n-grams on the left and
shortest n-grams on the right.
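As a rough illustration, recurrent n-grams of length 2 to 5 can be counted as in the Python sketch below. Several details are assumptions rather than Wmatrix's behaviour: whitespace tokenisation, treating "recurrent" as occurring at least twice, and all function names. The `is_contained` helper shows the subset relation on which c-grams are built; it does not reproduce the c-gram tree itself.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(tokens, min_n=2, max_n=5, min_freq=2):
    """Map each length n to the n-grams occurring at least min_freq times."""
    result = {}
    for n in range(min_n, max_n + 1):
        counts = Counter(ngrams(tokens, n))
        result[n] = {gram: f for gram, f in counts.items() if f >= min_freq}
    return result


def is_contained(shorter, longer):
    """True if the shorter n-gram occurs contiguously inside the longer one."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


tokens = "the cat sat on the mat and the cat sat on the rug".split()
recurrent = recurrent_ngrams(tokens)
for n, grams in recurrent.items():
    print(n, sorted(grams))
```

Here the 2-gram ("cat", "sat") is contained in the 3-gram ("the", "cat", "sat"), which is the kind of relationship a c-gram list collapses into a single branch.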
Collocations in Wmatrix are pairs of words that occur together more often than would be expected
due to chance. There is a choice of 11 different statistics that can be used to calculate the
strength of association between the two words.
For further details about these statistics, see the following paper:
Piao, S. (2002) Word alignment in English-Chinese parallel corpora.
Literary and linguistic computing, 17 (2), 207-230.
The collocation feature was introduced in September 2009 and is currently in beta testing.
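To give a flavour of what an association statistic measures, the Python sketch below scores adjacent word pairs with pointwise mutual information (PMI). PMI is one widely used measure; this page does not list the 11 statistics, so its inclusion here is an assumption, as are the restriction to adjacent pairs and the function name.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Score each adjacent word pair by pointwise mutual information (base 2).

    Probabilities are estimated as frequency divided by the number of
    tokens, a simplification that is adequate for illustration.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), f12 in pair_freq.items():
        scores[(w1, w2)] = math.log2(f12 * n / (word_freq[w1] * word_freq[w2]))
    return scores


tokens = "strong tea and strong coffee but weak tea".split()
for pair, score in sorted(pmi_scores(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score means the pair occurs together more often than the individual word frequencies would predict by chance.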
This section shows short video introductions to the Wmatrix software.
Further videos will be appearing soon.
Acknowledgements and references:
Wmatrix was initially developed within the REVERE
(REVerse Engineering of Requirements) project
funded by the EPSRC, project number
Lancaster University Proof of concept funding in July 2006
provided support for a new server and continued software development.
In December 2006, further interface design using XHTML/CSS was carried out by
Andrew Foote (InfoLab21 Knowledge Business Centre) funded under support from
the European Regional Development Fund. Through a Lancaster University small grant
(Towards an Online Conceptual Database of the Latin Vulgate Bible)
a 'reader' interface is being developed for pre-tagged corpora.
Why the name, Wmatrix? Originally, I wrote a piece of software called Matrix which presented
tables of frequency information from corpora, hence the name is
partially derived from mathematical 'matrices'. This was Unix terminal
based using 'curses'. I then wrote an X-windows version with a
graphical user interface and named it Xmatrix. The web based version
came next, hence Wmatrix. I also have a Java API to the website called
Jmatrix. There's a note in my PhD saying that it has nothing to do with
any films featuring Keanu Reeves, but if you're a Doctor Who fan like
me, you may recognise another meaning of the
The collocation feature in Wmatrix uses software derived from
MLCT developed by
The C-grams feature uses software developed by Andrew Stone.
Thanks are due to Steve Wattam
who ported the semantic tagger, frequency
profiling and concordance software to Linux from Solaris.
Please reference Wmatrix as one of the following:
Rayson, P. (2008).
From key words to key semantic domains.
International Journal of Corpus Linguistics.
13:4 pp. 519-549.
Rayson, P. (2009) Wmatrix: a web-based corpus processing environment,
Computing Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/
Rayson, P. (2003).
Matrix: A statistical method and software tool for linguistic analysis through
Ph.D. thesis, Lancaster University.
All icons and emojis designed by OpenMoji - the open-source emoji and icon project. License: CC BY-SA 4.0
Publications and applications using Wmatrix:
Wmatrix has been applied to numerous issues including:
Aspect oriented requirements engineering,
Impact analysis of academic research,
Frequency profile comparison of written and spoken English,
Political science research,
Training chatbots: comparison of human-human and human-machine dialogues,
Key word analysis,
Key word-class analysis for EAP,
Key domain analysis,
Comparison of political party manifestos,
Metaphors in political discourse,
Analysis of online language,
E-learning materials development,
Computer content analysis: analysis of interview transcripts,
Entrepreneurship studies and knowledge transfer.
Abu Shawar, Bayan; Atwell, Eric. Using dialogue corpora to train a chatbot.
In Archer, D, Rayson, P, Wilson, A & McEnery, T (editors)
Proceedings of CL2003: International Conference on Corpus Linguistics,
pp. 681-690 Lancaster University. 2003.
- Bandar Al-Hejin (2014)
Covering Muslim women: Semantic macrostructures in BBC News.
Discourse & Communication.
Archer, D., Culpeper, J. and Rayson, P. (2005)
Love - a familiar or a devil? An exploration of key domains in Shakespeare's
Comedies and Tragedies.
Presented at the AHRC ICT Methods Network Expert Seminar on Linguistics.
Lancaster University, 8 September 2005.
- Archer, D. and Malory, B. (2017) Tracing facework over time using semi-automated methods.
International Journal of Corpus Linguistics, Volume 22, Number 1, 2017, pp. 27-56.
- Giuseppina Balossi (2014)
A Corpus Linguistic Approach to Literary Language and Characterization
Virginia Woolf's The Waves.
Balossi, G. (2020). Key Pronouns through Wmatrix in a Novel of Formation: Conrad's The Shadow-Line.
Umanistica Digitale, 5(9), 79-96. DOI: 10.6092/issn.2532-8816/10542
Beigman Klebanov, B., Diermeier, D., and Beigman, E. 2008.
Automatic annotation of semantic fields for political science research.
Journal of Language Technology and Politics 5(1):95-120.
Bianchi F. (2016). "Subtitling Jane Austen: Pride & Prejudice by Joe
Wright". In Colomba C. (ed.), Pride and Prejudice: A Bicentennial
Bricolage, ALL, Forum, Udine, pp. 253-265.
Bianchi F. (2017). The social tricks of advertising. Discourse
strategies of English-speaking tour operators on Facebook. Iperstoria 10,
Bianchi F. (2017). Strategie promozionali degli operatori del lusso in
Facebook, Lingue e Linguaggi, 20, Special Issue, edited by M.G. Guido,
"Strategie di comunicazione dei prodotti di lusso attraverso l'inglese
come 'lingua franca' internazionale. Sostenibilita ed emozioni come leve
strategiche per lo sviluppo del 'Made in Puglia'", pp. 239-271.
- Borza N. (2021) The Discursive Representation of Violence in the Context of the Migration Crisis in Europe: A CDA Case Study on the Discursive Support of Non-violence in the Media Reporting on the Chemnitz Events.
In: Anesa P., Fragonara A. (eds) Discourse Processes between Reason and Emotion.
Postdisciplinary Studies in Discourse. Palgrave Macmillan, Cham.
Breeze, R. (2018) Imagining the people in UKIP and Labour.
In Hidalgo Tenorio, E., Benitez-Castro, M. A., de Cesare, F. (eds). Populist Discourse: A Methodological Synergy. London: Routledge (pp. 120-135).
Breeze, R. (2019) Emotion in politics: Affective-discursive practices in UKIP and Labour. Discourse & Society 30 (1), 24-43.
Calvo Maturana, Ma del Coral. 2012. Maternidad y voces poéticas en
'The Adoption Papers' de Jackie Kay: un estudio de estilistica de
corpus. [Motherhood and poetic voices in
'The Adoption Papers' by Jackie Kay: a corpus stylistics study] PhD. Granada: Universidad de Granada.
Calzada Pérez, Maria. 2010. "Learning from Obama and Clinton: Using
individuals' corpora in the language classroom". Moreno Jaen et al.
(eds) Exploring New Paths in Language Pedagogy, London: Equinox. p.
- Caimotto, M. Cristina (2020).
Discourses of Cycling, Road Users and Sustainability: An Ecolinguistic Investigation.
Palgrave Macmillan. (see chapter 5)
- Capriello A, Mason P, Davis B, Crotts J. 2013. Farm tourism experiences in travel reviews. A cross-comparison of three alternative methods for data analysis. Journal of Business Research, 66: 778-785
Castaneda, A., & Lopez de D'Amico, R. 2012
PODER Y LENGUAJE EN BRUISED HIBISCUS, DE ELIZABETH NUNEZ: ANÁLISIS LITERARIO A
TRAVÉS DE LA HERRAMIENTA INFORMÁTICA WMATRIX.
[Power and Language in
Elizabeth Nunez's Bruised Hibiscus: a literary analysis through the use of
the software tool Wmatrix]
Tonos Digital [Online] 22:0.
Available at http://www.tonosdigital.es/ojs/index.php/tonos/article/view/736/512
- Castañeda, R. R. (2015).
Land Acquisition and the Semantic Context of Land within the Normative Construction of "Modern Development". In E. Osabuohien (Ed.), Handbook of Research on In-Country Determinants and Implications of Foreign Land Acquisitions (pp. 63-82). Hershey, PA: Business Science Reference.
- Chandra, Y. (2016) A rhetoric-orientation view of social entrepreneurship. Social Enterprise, 12:2, 161-200.
- Cheng, Le and Cheng Chen. (2019).
The Construction of Relational Frame Model in Chinese President Xi Jinping's Foreign Visit Speeches, Text & Talk 2:149-170.
- Christos Charitonidis, Awais Rashid, Paul J. Taylor (2017)
Predicting Collective Action from Micro-Blog Data.
In J. Kawash et al. (eds.), Prediction and Inference from Social Networks and Social
Media, Lecture Notes in Social Networks,
Jonathan Charteris-Black and Clive Seale. (2010).
Gender and the language of illness.
Basingstoke: Palgrave Macmillan.
- Charteris-Black, J., & Seale, C. (2013). Men and emotion talk: Evidence from the experience of illness. Gender And Language, 1(1). Retrieved 1 May, 2013, from
Chitchyan, R., Sampaio, A., Rashid, A. and Rayson, P. (2006).
Evaluating EA-Miner: Are Early Aspect Mining Techniques Effective?
In proceedings of Towards Evaluation of Aspect Mining (TEAM 2006).
Workshop Co-located with ECOOP 2006, European Conference on Object-Oriented Programming, 20th edition,
July 3-7, Nantes, France, pp. 5-8.
Da Silva AL, Dennick R. Corpus analysis of problem based learning transcripts:
an exploratory study. Medical education. 2010;44(3):280-8.
- Da Silva AL, Dennick R. 2009 CORPORA ANALYSIS OF PROBLEM-BASED LEARNING
TRANSCRIPTS. In ASME Annual Scientific Meeting 2009. Edinburgh, UK
- Da Silva AL, Dennick R. 2009 - PBL - "it's all talk".
Corpora Analysis of
Problem Based Learning transcripts. In Association for Medical Education in
Europe (AMEE) conference 2009. Malaga, Spain
- Da Silva AL, Dennick R. 2010 - Applying corpora research methods to the
study of Language and Clinical Reasoning in a Problem Based Learning
Curricula. In Promoting Excellence in Healthcare Educational Research - A
Multiprofessional Conference. Law and Social Sciences Building University of
- Da Silva AL, Dennick R 2010 EVALUATING PROBLEM-BASED LEARNING TRANSCRIPTS
USING CORPUS ANALYSIS: DO MEN AND MACHINES AGREE?. In 14th Ottawa Conference.
Miami, Florida, US
- Da Silva AL, Dennick R 2010 Corpus Analysis of
Problem-Based Learning Transcripts: A new method to look into PBL. In
Researching Medical Education. London, UK
- Da Silva, Wharrad & Pitt., 2011. Interprofessional Learning Sets:
Exploratory analysis of online students discussions (Poster). In NET
Conference, 2011. Cambridge, UK.
- Da Silva, & Pitt., 2011. More than words: Analysis of students'
Interprofessional online discussions. In EIPEN 2011. Ghent, Belgium.
- Davis B, Pope C, Mason P, Magwood G, Jenkins C. 2011. 'It's a wild thing, waiting to get me': Stance analysis of African Americans with diabetes. Diabetes Educator, 409-418
- Davis B, Maclagan M. 2013. Talking with Maureen: Pauses, extenders, and formulaic language in small stories and canonical narratives by a woman with dementia. In R. Schrauf and N Mueller, eds. Dialogue and dementia: Cognitive and communicative engagement. NY: Psychology Press
- Davis B, Mason P. 2013. Computer-aided identification of stance shifts and semantic themes in electronic discourse analysis. In H. Lim & F. Sudweeks, eds, Innovative Methods and Technologies for Electronic Discourse Analysis. Hershey: ICI.
- Debras, C. and L'Hôte, E. (2015)
Framing, metaphor and dialogue:
A multimodal approach to party conference speeches.
Metaphor and the Social World 5:2 (2015), 177-204.
Marilyn Deegan, Harold Short, Dawn Archer, Paul Baker,
Tony McEnery, Paul Rayson (2004)
Computational Linguistics Meets Metadata, or the Automatic Extraction of
Key Words from Full Text Content.
Vol. 8, No. 2.
- Demjén, Z. (2011) The role of second person narration in representing
mental states in Sylvia Plath's Smith Journal.
Journal of Literary Semantics. 40(1), pp. 1-22.
Doherty, N., Lockett, N., Rayson, P. and Riley, S. (2006).
Electronic-CRM: a simple sales tool or facilitator of relationship
marketing? 29th Institute for Small Business & Entrepreneurship
Conference. International Entrepreneurship - from local to global
enterprise creation and development. 31 October - 2 November 2006,
- Escobar, W. (2015). Language configurations in the spoken production of Colombian EFL university students.
Colomb. Appl. Linguist. J., 17(1), pp. 114-129
- FBI Law Enforcement Bulletin (July 2012)
The Language of Psychopaths: New Findings and Implications for Law Enforcement.
By Michael Woodworth, Ph.D.; Jeffrey Hancock, Ph.D.; Stephen Porter, Ph.D.; Robert Hare, Ph.D.; Matt Logan, Ph.D.; Mary Ellen O'Toole, Ph.D.; and Sharon Smith, Ph.D.
Gabrielatos, C. and McEnery, T. (2005). Epistemic modality in MA dissertations.
In Fuertes Olivera, P.A. (ed.) Lengua y Sociedad: Investigaciones recientes en
linguistica aplicada. Linguistica y Filologia no. 61. Valladolid: Universidad de Valladolid, pp. 311-331.
Gacitua, R., Sawyer, P., Rayson, P. (2008). A flexible framework to
experiment with ontology learning techniques. In Knowledge-Based
Systems, 21, 3, April 2008, pp. 192-199. DOI:
- Jeffrey T. Hancock, Michael T. Woodworth and Stephen Porter (2013)
Hungry like the wolf: A word-pattern analysis of the language of psychopaths.
Legal and Criminological Psychology.
Volume 18, Issue 1, pages 102-114.
- He, J. (2019) Two-layer reading positions in comments on online news discourse about China.
Discourse & Communication.
- Hidalgo-Downing, Laura and Yasra Hanawi (2017) Bush's and Obama's addresses
to the Arab World: recontextualizing stance in political discourse. In
Karin Aijmer & Diana Lewis (eds.) The Yearbook of Corpus Linguistics
and Pragmatics. Special Issue on 'Contrastive Analysis of
Discourse-pragmatic Aspects of Linguistic Genres'.
- Hidalgo-Downing, Laura (2014) The role of negative-modal synergies in
Charles Darwin's The Origin of Species. In Geoff Thompson and Laura
Alba Juez (eds.) Evaluation in Discourse. John Benjamins. pp. 259-279.
- Hidalgo-Tenorio E. (2009) The Metaphorical Construction of Ireland.
In: Ahrens K. (eds) Politics, Gender and Conceptual Metaphors. Palgrave Macmillan, London.
Yufang Ho. (2007) Investigating the key concept differences between the two
editions of John Fowles's The Magus - a corpus semantic approach. The
27th International Conference of the Poetics and Linguistics Association
(PALA), Kansai Gaidai University, Hirakata, Osaka, Japan, 31 July - 4
- Hou, Z. (2019) Using semantic tagging to examine the American Dream and the Chinese Dream. Semiotica (227), pp. 145-168.
- Hu, C. (2015) Using Wmatrix to Explore Discourse of Economic Growth.
English Language Teaching, Vol. 8, No. 9.
Xin Huang (2003) A Computer-aided Diachronic Content Analysis of Twentieth Century
Political Discourse in China. MA dissertation in Language Studies, Lancaster University.
- Irwin, P.M. (2015). The development of resilience in two cohorts of older, single women, living on their own, in a small rural town in Australia. (Unpublished doctoral dissertation). University of Oxford, Oxford, UK.
- Isaacs T, Murdoch J, Demjén Z, Stevenson F. (2020)
Examining the language demands of informed consent documents in patient recruitment to cancer trials using tools from corpus and computational linguistics. Health.
Jones, M., Rayson, P. and Leech, G. (2004)
Key category analysis of a spoken corpus for EAP.
Presented at The 2nd Inter-Varietal Applied Corpus Studies
International Conference on "Analyzing Discourse in Context"
The Graduate School of Education, Queen's University, Belfast, Northern
Ireland, 25 - 26 June, 2004.
- Kheovichai, B. (2015). Metaphorical scenarios in business science discourse. Iberica, 29, 155-178. Available from http://www.aelfe.org/documents/09_IBERICA_29.pdf
Emilie L'Hôte and Maarten Lemmens
(2009) Reframing treason: metaphors of change and progress in new Labour discourse.
CogniTextes, Volume 3, http://cognitextes.revues.org/index248.html
Leech, G., Rayson, P., and Wilson, A. (2001).
Word Frequencies in Written and Spoken English: based on the British National Corpus.
(see the companion website for more details)
Leech, G. (2013) Virginia Woolf meets Wmatrix. Etudes de Stylistique Anglaise No. 4, pp. 15-26.
Leedham, M., Lillis, T. & Twiner, A. (2020). Exploring the core 'preoccupation' of social work writing: A corpus-assisted discourse study. Journal of Corpora and Discourse Studies. 3. Pp.1-26. https://jcads.cardiffuniversitypress.org/articles/abstract/26/
Leedham, M. (2020). 'Social workers dismissed concerns': A corpus-assisted discourse study of the portrayal of a profession in UK newspapers. In: Corpus Assisted Discourse Studies (CADS) Conference, 17-19 June 2020 (online). University of Sussex.
Leedham, M.; Lillis, T. and Twiner, A. (2019). Exploring the core 'preoccupation' of social work writing: A corpus-assisted discourse study. In: International Corpus Linguistics Conference, 23-26 July 2019. Cardiff University.
- Lin, Y-L. (2015) Contrastive analysis of adolescent learner interlanguage in asynchronous online communication: A keyness approach. System. Volume 55, December 2015, Pages 53-62.
- Lin, Y-L. (2017)
Keywords, semantic domains and intercultural competence in the British and Taiwanese Teenage Intercultural Communication Corpus.
Corpora, Volume 12 Issue 2, Page 279-305.
- López-Rodríguez, C. I. (2022). Emotion at the end of life: Semantic annotation and key domains in a pilot study audiovisual corpus.
Lingua, 277, 103401. DOI: 10.1016/j.lingua.2022.103401
- Lord V, Davis B, Mason P. 2008. Stance-shifting in language used by sex offenders. Psychology, Crime & Law 14, 357-379.
- MacArthur, F., Krennmayr, T. and Littlemore, J. (2015). How basic is UNDERSTANDING IS SEEING when reasoning about knowledge? Asymmetric uses of SIGHT metaphors in office hours' consultations in English as academic lingua franca. Metaphor and Symbol 30 (3): 184-217.
- Maclagan M, Davis B, Lunsford R. 2008. Fixed expressions, extenders and metonymy in the speech of people with Alzheimer's disease. In Phraseology: an interdisciplinary perspective, eds. S. Granger & F. Meunier. Amsterdam & NY: John Benjamins,
- Patrick Maiwald (2011).
Exploring a Corpus of George MacDonald's Fiction.
North Wind: Journal of George MacDonald Studies 30: 50-84.
- Markowitz DM, Hancock JT (2014) Linguistic Traces of a Scientific
Fraud: The Case of Diederik Stapel. PLoS ONE 9(8): e105937.
McIntyre, D. and Walker, B. (2010) 'How can corpora be used to explore the
language of poetry and drama?' in McCarthy, M. and O'Keeffe, A. (eds)
The Routledge Handbook of Corpus Linguistics. Abingdon: Routledge.
Afida Mohamad Ali (2007). Semantic fields of problem in business English:
Malaysian and British journalistic business texts.
Corpora, 2, 2, pp. 211-239.
- Akira Murakami, Paul Thompson, Susan Hunston and Dominik Vajn (2017)
‘What is this corpus about?’: using topic modelling to explore a specialised corpus.
Corpora, Volume 12 Issue 2, Page 243-277.
Murphy, S. (2007). Now I am alone: A corpus stylistic approach to Shakespearian soliloquies.
Papers from the Lancaster University Postgraduate Conference in
Linguistics & Language Teaching, Vol. 1. Papers from LAEL PG 2006
Edited by Costas Gabrielatos, Richard Slessor & J.W. Unger.
- Manvender Kaur Sarjit Singh, N. H. D. (2020). Automated Detecting of Key concept of Kurdish National Identity for Discourse-Historical Approach (DHA). International Journal of Advanced Science and Technology, 29(3s), 404 - 418. Retrieved from
Nakano, T. and Koyama, Y. (2005).
e-Learning Materials Development Based on Abstract Analysis Using Web Tools.
Knowledge-Based Intelligent Information and Engineering Systems.
9th International Conference, KES 2005, Melbourne, Australia, September 14-16, 2005, Proceedings, Part I,
LNCS 3681, Springer, pp. 794-800. DOI 10.1007/11552413_113
Newman, J. and K. Geeraert. 2014. TIME in a semantically annotated corpus of Canadian English. In B. Lewandowska-Tomaszczyk and K. Kosecki (eds.), Time and Temporality in Language and Human Experience, pp. 241-262. Lodz Studies in Language 32. Frankfurt a. M.: Peter Lang.
O'Halloran, K.A. (2010) 'Critical reading of a text through its electronic
supplement', Digital Culture and Education, 2(2): 210-229.
O'Halloran, K.A. (2011a) 'Limitations of the logico-rhetorical module:
Inconsistency in argument, online discussion forums and Electronic Deconstruction',
Discourse Studies, 13(6): 797-806.
O'Halloran, K.A. (2011b) 'Investigating Argumentation in Reading Groups:
Combining Manual Qualitative Coding and Automated Corpus Analysis Tools',
Applied Linguistics 32(2): 172-196.
O'Halloran, K. (2012)
Deleuze, Guattari and the use of web-based corpora for facilitating critical analysis of public sphere arguments.
Discourse, Context & Media.
Volume 2, Issue 1, March 2013, Pages 40-51, ISSN 2211-6958, 10.1016/j.dcm.2012.12.001.
O'Halloran, K.A. (2014)
Deconstructing arguments via digital mining of online comments.
Literary and Linguistic Computing, DOI: 10.1093/llc/fqu034
O'Halloran, K.A. (2017) Posthumanism and Deconstructing Arguments: Corpora and Digitally-driven Critical Analysis, London: Routledge.
O'Halloran, K.A. (2019) A posthumanist pedagogy using digital text analysis to enhance critical thinking in higher education.
Digital Scholarship in the Humanities.
Vincent B.Y. Ooi, Peter K.W. Tan & Andy K.L. Chiang (2007)
Analyzing personal weblogs in Singapore English: the Wmatrix approach.
Studies in Variation, Contacts and Change in English.
Volume 2. Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki.
Vincent B.Y. Ooi (2008) The lexis of electronic gaming on the Web: a Sinclairian approach, International Journal of Lexicography, 21 (3), 311-323.
- Parkinson, C. and Howorth, C. (2008) 'The language of social entrepreneurs', Entrepreneurship and Regional Development, 20(3): 285-309.
Magali Paquot, Sylviane Granger, Paul Rayson and Cédrick Fairon (2004)
Extraction of multi-word units from EFL and native English corpora:
The phraseology of the verb 'make'.
Europhras, European Society of Phraseology,
26-29 August 2004, Basel, Switzerland.
- Pérez-Paredes, P. (2017). A Keyword Analysis of the 2015 UK Higher Education Green Paper and the Twitter Debate. In Power, persuasion and manipulation in specialised genres: providing keys to the rhetoric of professional communities. Bern: Peter Lang.
- Pérez-Paredes, P. & Díez-Bedmar, B. (2018) Researching learner language through POS Keyword and syntactic complexity analyses. In S. Götz and J. Mukherjee (eds.) Learner Corpora and Language Teaching. Studies in Corpus Linguistics Series. Amsterdam: John Benjamins.
- Potts, A. (2015). Filtering the Flood: Semantic Tagging as a Method of Identifying Salient Discourse Topics in a Large Corpus of Hurricane Katrina Reportage.
In Paul Baker and Tony McEnery (eds.)
Corpora and Discourse Studies, pp. 285-304.
- Potts, A. and Baker, P. (2013) Does semantic tagging identify cultural change in British and American English?,
International Journal of Corpus Linguistics 17(3): 295-324.
- Potts, A. and Kjær, A.L. (2015)
Constructing Achievement in the International Criminal Tribunal for the Former Yugoslavia (ICTY): A Corpus-Based Critical Discourse Analysis.
International Journal for the Semiotics of Law.
- Amanda Potts, Monika Bednarek, Helen Caple (2015)
How can computer-based methods help researchers to investigate news values in large datasets? A corpus linguistic study of the construction of newsworthiness in the reporting on Hurricane Katrina.
Discourse & Communication,
Vol 9, Issue 2, pp. 149 - 172.
Paul Rayson (2004).
Keywords are not enough.
Invited talk for JAECS (Japan Association for English Corpus Studies)
at Chuo University, Tokyo, Japan, 27th November 2004.
Rayson, P. and Smith, N. (2006)
The key domain method for the study of language varieties.
The Third Inter-Varietal Applied Corpus Studies (IVACS) group International Conference on
"LANGUAGE AT THE INTERFACE".
University of Nottingham, UK, 23-24 June 2006.
Sawyer, P., Rayson, P. and Cosh, K. (2005)
Shallow Knowledge as an Aid to Deep Understanding in Early Phase Requirements Engineering.
IEEE Transactions on Software Engineering. Volume 31, number 11, November, 2005, pp. 969 - 981.
- Sera, H. (2013). Depictions of emotions in Snow Country: A semantic analysis. Presented at PALA 2013, Heidelberg.
- Sera, H. (2012). Dickens' 'The Signal-Man' and Poe's 'The Fall of the House of
Usher': How did they describe terror?
Presented at PALA 2012, Malta.
- Shapero, J. J. (2011). The Language of Suicide Notes. Unpublished Thesis. The University of Birmingham.
- Shapero, J. J. & Blackwell, Susan A. (2012) "'There are letters for you all on the sideboard': what can linguists learn from multiple suicide-note writers?" p.225-244. In Samuel Tomblin, Nicci MacLeod, Rui Sousa-Silva and Malcolm Coulthard (Eds.) Proceedings of The International Association of Forensic Linguists' Tenth Biennial Conference. Centre for Forensic Linguistics, Aston University, U.K.
[ISBN: 978 1 85449 432 0]
- Song, Y., Lee, CC., Huang, Z. (2019). The news prism of nationalism versus globalism: How does the US, UK and Chinese elite press cover 'China's rise'?. Journalism, 1-20.
First published online on May 8, 2019. https://doi.org/10.1177/1464884919847143
- Emily C. Soriano (2014) A corpus linguistics approach to exploring interpersonal processes in couple-focused therapy for problematic alcohol use.
Thesis for Master of Experimental Psychology, University of Arizona.
- Emily C. Soriano, Kelly E. Rentscher, Michael J. Rohrbaugh and Matthias R. Mehl
A Semantic Corpus Comparison Analysis of Couple-Focused Interventions for Problematic Alcohol Use.
Clinical Psychology and Psychotherapy.
- M Stubbs (2014) Patterns of emotive lexis and discourse organization
in short stories by James Joyce. In P Blumenthal et al eds. Les
émotions dans le discours. Emotions in Discourse. Frankfurt/Main:
Peter Lang. 237-53.
Francois Taiani, Paul Grace, Geoff Coulson and Gordon Blair (2008)
Past and future of reflective middleware: Towards a corpus-based
The 7th Workshop On Adaptive And Reflective Middleware (ARM'08)
December 1st 2008,
Leuven, Belgium, collocated with Middleware 2008.
Tan, Yesheng. A Corpus-based Cognitive Study of the "Rustic Literariness" of Translated Chinese Fiction:
Focusing on Sinologist Translators' Works in the Last Four Decades.
In Riccardo Moratto (ed.) Diverse Voices in Chinese Translation and Interpreting, Springer, 2021.
Trotta, J. (2019). What can a corpus tell us about apocalyptic/dystopian texts?
In J. Trotta, Z. Filipovic, & H. Sadri (eds.), Broken mirrors: Representations of Apocalypses and Dystopias in Popular Culture. London: Routledge, pp. 179-201.
- Van de Putte, Thomas. (2017)
European citizenship policy between building collectives and appealing to individuals: A study of person deixis.
Flubacher, Mi-Chia; Diederich, Catherine; Dankel, Philip - Bulletin VALS-ASLA, 2016, vol. 104, p. 105-123. Swiss Association of Applied Linguistics (Vereinigung für angewandte Linguistik in der Schweiz VALS-ASLA)
Walker, B. (2010) Wmatrix, key-concepts and the narrators in Julian Barnes'
Talking It Over. In Busse, B. and McIntyre, D. (eds.)
Language and Style, pp. 364-387.
- Walker, B. (2012). Character and Characterisation in Julian Barnes' Talking It Over:
A Corpus Stylistic Analysis. PhD Thesis, Lancaster University.
Walkerdine, J. and Rayson, P. (2004)
P2P-4-DL: Digital Library over Peer-to-Peer.
In Caronni G., Weiler N., Shahmehri N. (eds.)
Proceedings of Fourth IEEE International Conference on Peer-to-Peer Computing
25-27 August 2004, Zurich, Switzerland.
IEEE Computer Society Press, pp. 264-265. ISBN 0-7695-2156-8.
- Rebecca Willis (2017) Taming the Climate? Corpus analysis of
politicians' speech on climate change, Environmental Politics, 26:2, 212-231.
- Wong, I., Ou, J. and Wilson, A. (2021) Evolution of hoteliers' organizational crisis communication in the time of mega disruption. Tourism Management.
- Xin, Jing and Matheson, Donald (2015) The Chinese writer as empty signifier: A corpus-based analysis of the English-language reporting of the 2012 Nobel Prize in Literature. Chinese Journal of Communication 8 (3): 289-305.
A number of papers were presented at the PALA 2007 conference
(29-30 July 2007, Kansai Gaidai University, Osaka, Japan)
including those by Geoffrey Leech, Yu-fang Ho, Dan McIntyre, Haruko Sera, Brian Walker.
Mick Short and Brian Walker also ran a Workshop: Using Wmatrix to compare scenes from Harold Pinter's Betrayal.
See the book of abstracts on the conference website for more details.
InfoLab21 Knowledge Transfer Study Report and the
Knowledge Transfer Research Project