Previous seminars

Seminars from previous years are still being added; in the meantime, the archive remains available on the old website.



Thursday 6th February 2020


Charles Carter A15

Understanding caregivers' experiences of supporting suicidal relatives with psychosis or bipolar disorder: a qualitative analysis of an online peer support forum

Paul Marshall

Spectrum Centre for Mental Health Research, Lancaster University

  • Abstract

Over recent decades, significant research attention has focused on understanding the experiences of those who support or care for people with psychosis or bipolar disorder. Over the same period, research has consistently shown that those experiencing bipolar disorder, psychosis, and related diagnoses such as schizophrenia, are at a much greater risk of suicidal ideation, attempting suicide, or dying by suicide than the general population. Despite this, few studies to date have investigated caregivers' experiences of providing support to people with psychosis or bipolar disorder during periods of increased suicide risk.

A qualitative thematic analysis of data derived from an online peer support forum was undertaken to address this previously understudied topic. The forum was established by researchers at Lancaster University as part of a randomised controlled trial to evaluate an intervention for relatives of people with psychosis and bipolar disorder - the Relatives Education and Coping Toolkit. This seminar will present the initial findings of this analysis in addition to reviewing some of the challenges and advantages of using online forum data in qualitative research. There will also be an opportunity to discuss how corpus methods could be applied to further investigate this dataset.

Week 12

Thursday 23rd January 2020


Charles Carter A18

Internal validity in learner corpus research

Pascual Pérez-Paredes

Universidad de Murcia, RSLE University of Cambridge

  • Abstract

This talk will report findings from two case studies where different corpora have been used to investigate the use of stance adverbs in spoken communication. All the corpora discussed were collected using the same design criteria. The focus of this discussion is on the range of inferences that can be drawn from the available data (Gray, 2017) and ultimately on the nature of learner corpus research (LCR), both ontologically and epistemologically.

The first case study will probe into the use of native speaker corpora (Aguado-Jiménez et al., 2012), while the second will focus on the analyses of different English L2 (learner) corpora (Pérez-Paredes & Bueno, 2019). I will discuss the implications of using triangulation techniques (Baker & Egbert, 2016; Flick, 2018) in LCR and how researchers may benefit from increased criticality in their research designs.

Keywords: corpus linguistics, stance adverbs, data triangulation, research validity


Aguado-Jiménez, P., Pérez-Paredes, P. & Sánchez, P. 2012. Exploring the use of multidimensional analysis of learner language to promote register awareness, System 40(1), 90-103.

Baker, P. & Egbert, J. (eds). 2016. Triangulating methodological approaches in corpus linguistic research. London: Routledge.

Flick, U. 2018. Doing triangulation and mixed methods. London: Sage.

Gray, D. 2017. Doing research in the real world. 4th Edition. London: Sage.

Marchi, A. & Taylor, C. 2009. If on a winter's night two researchers...: a challenge to assumptions of soundness of interpretation. Critical Approaches to Discourse Analysis across Disciplines: CADAAD, 3(1), 1-20.

Pérez-Paredes, P. & Bueno, C. 2019. A corpus-driven analysis of certainty stance adverbs: obviously, really and actually in spoken native and learner English. Journal of Pragmatics, 140, 22-3

Week 11

Thursday 16th January 2020


Charles Carter A18

Corpus linguistics and clinical psychology: examining the psychosis continuum

Luke Collins (1) & Elena Semino (2)

(1) CASS, Lancaster University; (2) LAEL, Lancaster University

  • Abstract

We present our work with the 'Hearing the Voice' project, a study exploring experiences of Auditory Verbal Hallucinations (AVHs), or voices that others cannot hear. AVHs are experienced by a large proportion of individuals with a psychiatric diagnosis (such as schizophrenia or bipolar disorder) and approximately 1% of people with no psychiatric diagnosis (Kråkvik et al., 2015). Researchers have investigated similarities/differences across 'clinical' and 'non-clinical' populations (i.e. those who seek clinical support for their experiences and those who do not) and proposed a 'continuum' model for those experiences. Our corpus linguistic approach offers a novel contribution to debates in clinical psychology around the validity of the 'psychosis continuum' model.

We analysed semi-structured interviews with 67 'voice-hearers': 27 self-identified 'Spiritualists' (non-clinical) and 40 individuals registered with Early Intervention in Psychosis services (clinical) to consider what evidence there is for a 'continuum' with respect to their reports. We conducted a keyness analysis at the level of semantic domains, using the USAS tagger (Rayson, 2008). From the list of key semantic domains, we identified four major themes through which to investigate the (dis)similarity of aspects of the voice-hearing experience across our two cohorts: Affect; Control; Meaning-making; and Sensory input. These themes corresponded with aspects of the voice-hearing experience identified by psychologists as points of similarity/difference between clinical and non-clinical populations (Baumeister et al., 2017).

We found that there is evidence for continuity between the reports of clinical and non-clinical participants, though in some circumstances there are also grounds for considering sub-categories of the clinical population. Our analysis thereby offers a means through which to critically assess the validity of the 'continuum' model and consider its implications for clinical treatment.
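As a rough illustration of what a keyness comparison between two cohorts involves, the sketch below computes the log-likelihood statistic that is widely used for keyness in corpus linguistics. It is not the project's own code: the abstract does not state which statistic was used, the function names are hypothetical, and the counts are invented toy values rather than figures from the study.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness score for one item (e.g. a semantic tag)
    observed freq_a times in corpus A and freq_b times in corpus B."""
    total = size_a + size_b
    combined = freq_a + freq_b
    # Expected frequencies if the item were equally likely in both corpora.
    expected_a = size_a * combined / total
    expected_b = size_b * combined / total
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def rank_by_keyness(counts_a, size_a, counts_b, size_b):
    """Return items ordered from most to least key."""
    items = set(counts_a) | set(counts_b)
    scored = [
        (log_likelihood(counts_a.get(i, 0), size_a, counts_b.get(i, 0), size_b), i)
        for i in items
    ]
    return [item for _, item in sorted(scored, reverse=True)]


# Invented toy counts for two hypothetical semantic tags (NOT study data).
cohort_a = {"domain_x": 50, "domain_y": 40}
cohort_b = {"domain_x": 10, "domain_y": 40}
ranking = rank_by_keyness(cohort_a, 1000, cohort_b, 1000)
```

In this toy case `domain_x` ranks first because its relative frequency differs between the two cohorts, while `domain_y` occurs at the same rate in both and scores zero.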


Baumeister, D., Sedgwick, O., Howes, O. and Peters, E. (2017) Auditory verbal hallucinations and continuum models of psychosis: A systematic review of the healthy voice-hearer literature. Clinical Psychology Review 51: 125-41.

Kråkvik, B., Larøi, F., Kalhovde, A. M., Hugdahl, K., Kompus, K., Salvesen, Ø., Stiles, T. C. and Vedul-Kjelsås, E. (2015) Prevalence of auditory verbal hallucinations in a general population: a group comparison study. Scandinavian Journal of Psychology 56: 508-15.

Rayson, P. (2008) From key words to key semantic domains. International Journal of Corpus Linguistics 13(4): 519-549.

Week 10

Thursday 12th December 2019


Management school LT5

Weird and non-WEIRD: Introducing the Corpus of Indonesian Sign Language (BISINDO)

Nick Palfreyman

University of Central Lancashire

  • Abstract

We have entered the age of the sign language corpus, with several comprehensive corpora already available, including for Australian Sign Language, NGT (Sign Language of the Netherlands) and British Sign Language. However, there is currently a noticeable bias towards the sign languages of WEIRD (Western, educated, industrialized, rich, democratic) countries. This presentation introduces the BISINDO Corpus, which features over 45,000 tokens from spontaneous conversation between 131 participants using Indonesian Sign Language. For this corpus, data were collected between 2010 and 2017 from six Indonesian cities/islands. I begin by discussing some of the challenges in compiling the BISINDO Corpus, including, in some cases, finding deaf sign language users in the field. Other challenges are not particular to sign language research and seem to be faced by corpus linguists in many non-WEIRD societies, especially around ethics. I then move on to look at two examples of how the corpus can shed light on processes of language change in BISINDO. First, I look at the grammatical domain of negation, and second at signs based on BISINDO's two manual alphabets.

Week 9

Thursday 5th December 2019


B78 (DSI Space) InfoLab21

Extractive Summarisation for Scientific Articles; making them more discoverable

Daniel Kershaw


  • Abstract
At Elsevier, a lot of effort is focussed on content discovery for users, allowing them to find the most relevant articles for their research. This, at its core, blurs the boundaries of search and recommendation, as we are both pushing content to the user and allowing them to search the world's largest catalogue of scientific research. Apart from using the content as is, we can make new content more discoverable with the help of authors at submission time, for example by getting them to write an executive summary of their paper. However, doing this at submission time means that this additional information is not available for older content. This raises the question of how we can utilise the author's input on new content to create the same feature retrospectively for the whole Elsevier corpus. Focusing on one use case, we discuss how an extractive summarization model, trained on the user-submitted summaries, is used to retrospectively generate executive summaries for articles in the catalogue. Further, we show how extractive summarization is used to highlight the salient points (methods, results and findings) within research articles across the complete corpus. This helps users to identify whether an article is of particular interest to them. As a logical next step, we investigate how these extractions can be used to make research papers more discoverable by connecting them to other papers which share similar findings, methods or conclusions.

In this talk we start from the beginning, understanding what users want from summarization systems. We discuss how the proposed use cases were developed and how these tie into the discovery of new content. We then look in more technical detail at what data is available and which deep learning methods can be utilised to implement such a system.

Finally, while we are working toward taking this extractive summarization system into production, we need to understand the quality of what is being produced before going live. We discuss how internal annotators were used to confirm the quality of the summaries. The monitoring of quality does not stop there, however: we continually monitor user interaction with the extractive summaries as a proxy for quality and satisfaction.

Joint UCREL and DSG talk

Week 6

Thursday 14th November 2019


Management school LT5

Acronyms as an Integral Part of Multi-Word Term Recognition - A Token of Appreciation

Irena Spasic

University of Cardiff

  • Abstract
The increasing amount of textual information requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. Dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and a high degree of term variation. Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. FlexiTerm is an unsupervised method for recognition of multi-word terms from a domain-specific corpus. It uses regular expressions to constrain the search space based on term formation patterns and then processes them statistically to identify the largest frequently occurring bags of words and the corresponding terms. FlexiTerm uses a range of methods to normalize three types of term variation: orthographic, morphological, and syntactic. Acronyms, which represent a highly productive type of term variation, were not originally supported. In this talk, we describe how the functionality of FlexiTerm has been extended to recognize acronyms and incorporate them into the term conflation process. We evaluated the effects of term conflation in the context of information retrieval as one of its most prominent applications. On average, relative recall increased by 32 points, whereas index compression factor increased by 7 percentage points. Therefore, the evidence suggests that integration of acronyms provides a non-trivial improvement in term conflation.
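The idea of conflating term variants into a single bag-of-words representative can be sketched in a few lines. The example below is an illustrative toy, not FlexiTerm's implementation: the stop-word list, the crude plural stripping, the acronym table, and all function names are assumptions made purely for the demonstration.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stop-word list (not taken from FlexiTerm).
STOPWORDS = {"of", "the", "in", "for", "and"}


def crude_stem(token):
    """Very rough morphological normalisation: strip a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalise(term):
    """Map a term to an order-independent bag of normalised tokens.
    Lowercasing and splitting on punctuation handle orthographic variation,
    stemming handles morphological variation, and using a set rather than a
    sequence handles syntactic (word order) variation."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return frozenset(crude_stem(t) for t in tokens if t not in STOPWORDS)


def conflate(terms, acronyms=None):
    """Group term variants by their normalised representative.
    `acronyms` is a hypothetical mapping from lowercase acronym to its
    expanded form, assumed to have been recognised in an earlier step."""
    acronyms = acronyms or {}
    groups = defaultdict(list)
    for term in terms:
        expanded = acronyms.get(term.lower(), term)
        groups[normalise(expanded)].append(term)
    return dict(groups)


variants = ["red blood cells", "Red blood cell", "RBC", "cells of the blood"]
groups = conflate(variants, acronyms={"rbc": "red blood cell"})
```

Here the three variants of "red blood cell", including the acronym, fall into one group, while "cells of the blood" forms a separate group because it lacks the token "red".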

Joint UCREL and DSG talk

Week 4

Thursday 31st October 2019


Charles Carter A18

Representations of health conditions in the UK and US press: A corpus linguistic approach

Ewan Hannaford

University of Glasgow

  • Abstract

Mental and physical illness have traditionally been seen as distinct categories, but this division is now recognised by medical experts and health professionals as largely unhelpful and inaccurate. Medical experts now often encourage a more holistic approach to healthcare, whereby mental health conditions are treated as just another type of illness (RCP Report, 2010; Kolappa et al., 2013). However, amongst the general public, mental illness remains highly stigmatised and viewed as distinct from physical illness (Kendell, 2001; Pescosolido et al., 2010). Media representations have a large impact on public perceptions and significantly influence the prevalence of stigmas and stereotypes surrounding both traditionally physical and traditionally mental disorders (Wahl, 1995; Stuart, 2006; Young, Norman, & Humphreys, 2008). Differences in media coverage of such illnesses may therefore be contributing to the persistence of a societal distinction between 'physical' and 'mental' illness.

My research is investigating this using two regional corpora of UK and US press coverage, each spanning over 20 years and covering a range of disorders across the traditional physical/mental health spectrum. Through corpus linguistic analysis of these datasets, my work aims to uncover differences and similarities in the themes, topics, and attitudes present in different health condition discourses, and to identify the potential causes of these features. My talk will provide a brief background on my work and previous research into press representations of health conditions, before discussing my methodological approach and presenting some preliminary results from a recent pilot study I conducted. The implications of these pilot findings for my full study will then be explored, with a current update on the status of my research.


American Psychiatric Association. (1994). Diagnostic and Statistical Manual of Mental Disorders (4th Ed.). Washington, DC: American Psychiatric Association.

Kendell, R. (2001). The distinction between mental and physical illness. British Journal of Psychiatry 178. 490-493.

Kolappa, K., Henderson, D., & Kishore, S. (2013). No physical health without mental health: Lessons unlearned? Bulletin of the World Health Organisation 91:3. 3-3A.

Pescosolido, B., Martin, J., Long, J., Medina, T., Phelan, J., & Link, B. (2010). "A disease like any other"? A decade of change in public reactions to schizophrenia, depression, and alcohol dependence. American Journal of Psychiatry 167. 1321-1330.

Royal College of Psychiatrists. (2010). No Health Without Mental Health: The Supporting Evidence. RCP Report. Available from:

Stuart, H. (2006). Media portrayal of mental illness and its treatments: What effect does it have on people with mental illness? CNS Drugs 20:2. 99-106.

Wahl, O. (1995). Media Madness: Public Images of Mental Illness. New Brunswick, NJ: Rutgers University Press.

Young, M., Norman, G., & Humphreys, K. (2008). Medicine in the popular press: The influence of the media on perceptions of disease. PloS ONE 3:10. E3552. Available from:

Week 3

Thursday 24th October 2019


Charles Carter A18

Social Networks in Early Modern English Comedies

Jakob Ladegaard & Ross Deans Kristensen-McLachlan

Aarhus University

  • Abstract

Social network analysis is used in sociological and sociolinguistic research to study patterns of verbal interaction between members of a community. This method has rarely been applied to literary texts at scale, but in this talk I present a work in progress that attempts to use computationally assisted social network analysis on a corpus of around 20 dramatic texts: so-called prodigal son comedies written in English between 1590 and 1640. Literary criticism of these plays often focuses on the relationship between prodigal sons and their father figures but pays relatively little attention to the social networks they are part of. However, we believe these networks are important not only for the dramatic plots, but also for what the plays might tell us about social and economic questions of the time. We therefore wanted to study the plays' social networks. This can be done in terms of the plays' overall network metrics, which might reveal structural changes in this dramatic subgenre over time, but mainly we were interested in comparing characters with specific traits across plays. This was done by constructing networks with characters as nodes and their verbal exchanges as links. We extracted overall network metrics for all plays as well as count measures (line and word counts) and metric measures (degree and centrality measures) for all characters in all the texts. We then ranked the characters in each play according to their scores on these measures. This allowed us to compare the roles of different types of characters across plays, in particular the prodigal sons, their father figures and the minor characters who in some cases mediate their relation. The talk will present some preliminary results of these comparisons and end with a discussion of the possibility of combining this social network approach with other, more stylistically oriented corpus-based approaches to these texts.
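The network construction described above (characters as nodes, verbal exchanges as links, then ranking by a centrality measure) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the project's pipeline: the function names are hypothetical, the network is treated as undirected, degree centrality stands in for the wider set of measures mentioned, and the cast and exchanges are invented.

```python
from collections import defaultdict


def build_network(exchanges):
    """Build an undirected, weighted network from (speaker, addressee) pairs.
    The weight of a link is the number of exchanges between two characters."""
    graph = defaultdict(lambda: defaultdict(int))
    for speaker, addressee in exchanges:
        if speaker == addressee:
            continue  # ignore soliloquies in this sketch
        graph[speaker][addressee] += 1
        graph[addressee][speaker] += 1
    return graph


def degree_centrality(graph):
    """Share of the other characters each character exchanges words with."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def rank_characters(scores):
    """Characters from highest to lowest score; ties broken alphabetically."""
    return sorted(scores, key=lambda name: (-scores[name], name))


# Invented exchanges for a hypothetical play (not drawn from the corpus).
exchanges = [
    ("Son", "Father"), ("Father", "Son"),
    ("Son", "Usurer"), ("Usurer", "Son"),
    ("Son", "Servant"), ("Servant", "Father"),
]
network = build_network(exchanges)
centrality = degree_centrality(network)
ranking = rank_characters(centrality)
```

Ranking within each play, rather than comparing raw scores, is what makes characters from plays of different sizes comparable in a setup like this.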

The work presented here was done in collaboration with Ross Deans Kristensen-McLachlan, Aarhus University.


Jakob Ladegaard is Associate Professor in Comparative Literature, Aarhus University. His research is primarily concerned with the relations between modern literature, politics and economy. He is the PI of the research project: 'Unearned Wealth - A Literary History of Inheritance, 1600-2015', 2017-2021. The project uses digital methods to study English and French literary representations of inheritance. Recent publications include Context in Literary and Cultural Studies (ed. with J.G. Nielsen), UCL Press, 2019.

Week 2

Thursday 17th October 2019


Charles Carter A18

Detecting Meaningful Multi-word Expressions in Political Text

Ken Benoit

London School of Economics and Political Science

  • Abstract
The rapid growth of applications treating text as data has transformed our ability to gain insight into important political phenomena. Almost universal among existing approaches is the adoption of the bag of words approach, counting each word as a feature without regard to grammar or order. This approach remains extremely useful despite being an obviously inaccurate model of how observed words are generated in natural language. Many politically meaningful textual features, however, occur not as unigram words but rather as pairs of words or phrases, especially in language relating to policy, political economy, and law. Here we present a hybrid model for detecting these associated words, known as collocations. Using a combination of statistical detection, human judgement, and machine learning, we extract and validate a dictionary of meaningful collocations from three large corpora totalling over 1 billion words, drawn from political manifestos and legislative floor debates. We then examine how the word scores of phrases in a text model compare to the scores of their component terms.
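To make the statistical detection step concrete, the sketch below scores adjacent word pairs by pointwise mutual information (PMI), one common association measure for collocations. The abstract does not say which measure the authors use, so PMI is an assumption here, as are the function name, the frequency threshold, and the toy sentence; the human judgement and machine learning stages are not represented.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score each adjacent word pair by pointwise mutual information.
    Pairs seen fewer than min_count times are skipped, since PMI is
    unreliable for rare events."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_first = unigrams[first] / n_unigrams
        p_second = unigrams[second] / n_unigrams
        # PMI compares the observed pair probability with what independence predicts.
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Invented toy text; a real application would use a large corpus.
text = ("the prime minister said the prime minister will cut "
        "the income tax and the income tax will rise")
scores = bigram_pmi(text.split())
```

On this toy input, "prime minister" and "income tax" score higher than "the prime" and "the income", because "the" is frequent overall and so co-occurs with many words by chance.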

Week 1

Thursday 10th October 2019


Charles Carter A18

Verbs in specialized language: the case of the knowledge base EcoLexicon

Míriam Buendía-Castro

University of Granada (Spain)

  • Abstract
This research presents EcoLexicon, a multilingual terminological knowledge base on the environment developed at the University of Granada, which contains over 3,500 concepts and over 20,000 terms in English, Spanish, German, French, Russian, and Modern Greek. This talk focuses on how verb phraseological information is encoded in EcoLexicon. As is well known, verbs are an extremely important part of language; however, very few specialized knowledge resources include them. It is our assertion that verbs and their potential arguments can be classified and structured in a set of conceptual-semantic categories typical of a given specialized domain. In this context, when semantic roles and macroroles are specified as well as the resulting phrase structure, it is then possible to establish templates that represent this meaning for entire frames. In this regard, within the context of a specialized knowledge domain, the range of verbs generally associated with potential arguments can be predicted within the frame of a specialized event.