Previous seminars

Seminars from previous years are still being added; in the meantime, the archive remains available on the old website.

Academic year:

2016/2017

Week 28

Thursday 15th June 2017

3:00-4:00pm

Furness LT 2

What can we learn from Big Social Data?

Walid Magdy

The University of Edinburgh

  • Abstract

Social media is becoming a hub for most internet users to communicate, share their thoughts, report news, and express themselves. A large body of research has started to utilize the huge amounts of data from social media in many different applications. In this talk, three example fields of research with social media data are presented. First, the task of information retrieval is discussed, where the main objective is to retrieve the most relevant information from social media effectively and efficiently. Secondly, data mining applications for social media are presented and demoed, for tasks such as text classification and sentiment analysis. Thirdly, the use of social media for computational social science studies is presented, with some focus on political issues. A few examples are given, including: studying the antecedents of ISIS support on social media, public response towards Muslims after the Paris attacks 2016, and finally, the nature and dynamics of social media during the 2016 US Presidential Election. Most of the work in this presentation has been featured in the popular press, including CNN, the Washington Post, the BBC, the Independent, Aljazeera, the Daily Mail, and the Mirror.

Week 27

Thursday 8th June 2017

3:00-4:00pm

Furness LT 2

Starting with verbs: complementary methods for the corpus-assisted discourse analysis of HEAR.

Alison Sealey & Emma Franklin

LAEL, Lancaster University

  • Abstract

We are both interested in the ways nonhuman animals are represented in language. One research theme concerns patterns in how discourse represents animals' perceptive and communicative capacities. A core verb relevant to both is HEAR, and we use corpus evidence to help answer the following questions: What are humans reported to 'hear', in general and in relation to sounds made by animals? What are animals reported to 'hear'?

In our presentation, we will explain the different corpora, corpus tools and analytical approaches we have used to generate answers to these questions. These include familiar reference corpora (the BNC, enTenTen) and methods and tools (concordancing with AntConc and SketchEngine), as well as a topic-specific corpus compiled for the project 'People', 'Products', 'Pests' and 'Pets'*, and the Corpus Pattern Analysis method*, originally developed for lexicography but adapted here for discourse analysis.

*'People', 'Products', 'Pests' and 'Pets': The Discursive Representation of Animals. http://animaldiscourse.wordpress.com/; *Hanks, P. (2004). Corpus pattern analysis. In Euralex Proceedings (Vol. 1, pp. 87-98).
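As a minimal illustration of the kind of concordance evidence mentioned above (a sketch only; AntConc and SketchEngine are the actual tools used, and the sample tokens here are invented):

```python
def kwic(tokens, node, width=3):
    """Key Word In Context lines for a node word (e.g. forms of HEAR)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

text = "the dog could hear the birds before we could hear anything".split()
for line in kwic(text, "hear"):
    print(line)   # → "the dog could [hear] the birds before", etc.
```

Real concordancers add sorting, lemmatisation and frequency counts on top of this core lookup.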

Week 26

Thursday 1st June 2017

3:00-4:00pm

Furness LT 2

What do British newspapers mean when they talk about droughts? Examining 25 years of discourse, 1990-2014

Carmen Dayrell

CASS, Lancaster University

  • Abstract

This study combines Critical Discourse Analysis with Corpus Linguistics and Geographical Information Systems (GIS) to explore British media discourses surrounding drought events that took place in Britain between 1990 and 2014. In this talk, I will discuss the methodological challenges involved in compiling a newspaper corpus on drought events in Britain specifically. It required dealing with the frequent metaphorical use of the word "drought", which posed a problem for the automatic collection of texts. Another challenge was the fact that newspapers do not discuss drought within the context of Britain only, so identifying where the drought happened was a key issue in the analysis. I will conclude by presenting my initial results and exploring public discourses surrounding drought and attitudes to water shortage.

Week 24

Thursday 18th May 2017

3:00-5:00pm

Hannaford Lab

Wmatrix corpus annotation and comparison tool: bring your own corpus!

Paul Rayson

SCC, Lancaster University

  • Abstract

Wmatrix is a web-based corpus annotation and retrieval tool which currently supports the analysis of small to medium-sized English corpora and brings together techniques from corpus linguistics and natural language processing. In this two-hour hands-on workshop in the lab you will be guided through the main functions of Wmatrix, which combine the keyness approach with corpus annotation to allow stylistic and conceptual analysis of texts loaded through the web interface. Wmatrix has been used for numerous studies in linguistics and the social sciences (e.g. metaphor in end-of-life care, Shakespeare's comedies and tragedies, argumentation in reading groups, cultural change, formulaic language, the language of psychopaths, political discourse analysis), medical education (e.g. problem-based learning transcripts), computer science (e.g. mining of early aspects and requirements in software engineering texts), and management and financial studies (e.g. entrepreneurship, financial narratives). You will also hear about the USAS multilingual semantic tagger, which will be incorporated into the next version of Wmatrix to extend the system to new languages. Please bring along your own corpus data in plain text or XML-encoded form to test the system.

Week 23

Thursday 11th May 2017

3:00-4:00pm

Furness LT 2

Discourse-Semantic Changes of "Risk" in the New York Times, 1987-2014

Jens Zinn

CASS, Lancaster University

  • Abstract

The notion of risk has become pervasive in societal discourses and scholarly debate. From early work on risk and culture to the risk society, from governmentality theorists to modern systems theory, all have built their work around the notion of risk and implicitly or explicitly refer to linguistic changes. Though this body of literature offers different explanations for the shift towards risk and its connection to social change, to date there has been no attempt to empirically examine their relative ability to explain this change in the communication of possible harm to advance theorizing.

The presentation will report on a study of the discourse-semantic shift towards risk, utilising a corpus-based investigation of risk words in a number of US newspapers from 1987 to 2014. The study supports Mary Douglas's claim that the meaning of risk is shifting towards the negative end. There is also good evidence that risk is an increasingly common experience but one characterised by decreasing individual control. Decreasing agency in risk processes supports the assumption that the individualisation of risk in the news is accompanied by the scandal of not being in control. Generalised worries about risk are more common. There is a tendency for average people (e.g. men, women or children) to be reported as vulnerable while powerful people are presented as risk takers. In contrast to Beck's theorizing, the study shows the importance of risk in the health area. The risk society might be characterised much more by concerns about health issues such as civilisation illnesses than by new mega-risks.

The research shows how corpus-based approaches can be used to test and develop sociological hypotheses on historical change in the realm of risk.

Week 20

Thursday 23rd March 2017

3:00-4:00pm

Management school LT9

The Encyclopaedia of Shakespeare's Language Project and Corpus Methods

Jonathan Culpeper

LAEL, Lancaster University

  • Abstract

This presentation revolves around work taking place as part of the Encyclopaedia of Shakespeare's Language Project (http://wp.lancs.ac.uk/shakespearelang/). I will not be presenting one study in all its detail, but several studies - pilot studies, spin-off studies and in-progress studies. These will include discussion of neologisms, lexical words and grammatical words, multi-word units and the language of emotion. As I go along, I will reflect on methodological challenges and solutions (or partial solutions!). I will also give glimpses of a range of results.

Week 19

Thursday 16th March 2017

3:00-4:00pm

Management school LT9

Evaluation metrics matter: predicting sentiment from financial news headlines

Andrew Moore

SCC, Lancaster University

  • Abstract

This year's SemEval Task 5 Track 2 competition was to predict sentiment with respect to a company in financial news headlines. We came 5th out of 45 competitors using a Bi-Directional Long Short-Term Memory (LSTM) deep learning model. We describe this approach, how we implemented it in practice, and the other methods we attempted. We also describe the experience of participating in the event, and argue that the evaluation metric used to assess the models should reflect the real-world problem being solved. We hope this talk will interest anyone wanting to see how easy it now is to implement deep learning models, and how we can learn from others' mistakes when setting up an evaluation task for a real-world problem.
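To illustrate why the choice of evaluation metric matters for a continuous sentiment-scoring task like this one (a toy sketch with invented scores, not the task's official scorer): cosine similarity rewards predictions that get the relative ordering of scores right even when the magnitudes are badly off, while mean squared error penalises the scale error.

```python
import math

def cosine(a, b):
    """Cosine similarity between two score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mse(a, b):
    """Mean squared error between gold and predicted scores."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

gold = [0.8, 0.4, -0.6]      # annotated sentiment scores in [-1, 1]
scaled = [0.4, 0.2, -0.3]    # right direction, half the magnitude
print(cosine(gold, scaled))  # → 1.0: cosine ignores the scale error entirely
print(mse(gold, scaled))     # non-zero: MSE penalises it
```

Which behaviour is "correct" depends on whether downstream users care about the magnitude of the sentiment or only its direction.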

Week 18

Thursday 9th March 2017

3:00-4:00pm

Management school LT9

Transgender people in the British press: A corpus-based discourse analysis

Angela Zottola

University of Napoli Federico II

  • Abstract

The research that will be presented focuses on the representation of transgender people in the British press, through a comparison between two sub-corpora representative of the popular press and the quality press (Jucker 1992).

Gender variant, non-binary and queer identities are nowadays a topic of discussion in many different domains, and language, due to its social function, has been playing a seminal role in the shaping and negotiating of these identities. The existing binary and heteronormative linguistic categories, generally used in defining gender, are conflicting with gender diversity and gender fluidity, possibly leading to the creation of new hybrid, inclusive, non-discriminating discourses that comprise social, cultural, and legal issues.

Against this backdrop, the press works as one of the most active sources in the creation of these discourses surrounding gender non-conforming people. Therefore, in the framework of Corpus-based Discourse Analysis (Baker, Gabrielatos and McEnery 2013), this investigation will focus on the linguistic choices traceable in the corpora that convey a given representation of the transgender community as a social subject, highlighting the ideologies underlying this specific discourse, as well as the media's stance on transgender people in the UK.

Week 17

Thursday 2nd March 2017

3:00-4:00pm

Management school LT9

Constructing resources for the identification of native language features online: the process, possibilities and challenges

Sheryl Prentice

LAEL, Lancaster University

  • Abstract

This presentation outlines the construction of five parallel corpora written in English online by native speakers of five target languages. The corpora have been constructed as part of the Native Language Influence Detection (NLID) project. Their purpose is to aid the automatic identification of native influences on individuals' online communications in English. The presentation will outline the collection strategies developed to obtain this data, as well as the results of a quality review into its content. Particular sources and queries are found to be more productive than others in gathering data of this nature. Based on initial content investigations, the collection appears to be largely authentic and free from interference. The difficulties of matching such data to a sampling frame due to missing meta-data are also discussed. This presentation comes at a point at which the project team are investigating potential solutions to the problems of missing meta-data and further improving data accuracy, some of which will be outlined as part of this presentation. With this in mind, additional suggestions from audience members are particularly welcome.

Week 16

Thursday 23rd February 2017

3:00-4:00pm

Management school LT9

Collocation statistics in OCR data: are they reliable?

Amelia Joulain-Jay

Department of History; CASS

  • Abstract

The increasing availability of digitized texts is good news for scholars, including those in the Humanities and Social Sciences who are interested in analysing large quantities of text in both quantitative and qualitative ways. Unfortunately, these texts are often digitized using Optical Character Recognition (OCR) software, with variable success. This is especially an issue for historical texts, which often attract lower-quality output. In this presentation, I explore the impact of OCR errors on two common collocation statistics (Mutual Information and Log Likelihood), comparing statistics generated from a set of matching samples: one hand-corrected ('gold') sample, an uncorrected sample, and an automatically corrected sample. These matching samples are excerpts from the British Library's 19th-century British Newspapers (part 1) collection. I find a clear effect of OCR errors, especially on larger collocation spans, and describe this effect in terms of differences between the values of the statistics generated in the various samples, the rankings of the statistics in the various samples, and the rates of false positives and false negatives in the uncorrected and automatically corrected samples when compared with the gold sample.
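Both statistics compared here can be computed from a 2x2 contingency table of co-occurrence counts. A minimal sketch (the counts below are invented for illustration; corpus tools compute them from actual node and collocate frequencies):

```python
import math

def expected(o11, o12, o21, o22):
    """Expected counts under independence for a 2x2 contingency table."""
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    return (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)

def mutual_information(o11, o12, o21, o22):
    """Pointwise MI: log2 of observed over expected co-occurrence."""
    e11 = expected(o11, o12, o21, o22)[0]
    return math.log2(o11 / e11)

def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood ratio (Dunning 1993) over the same table."""
    obs = (o11, o12, o21, o22)
    exp = expected(*obs)
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

# o11 = node+collocate together, o12 = node without collocate,
# o21 = collocate without node, o22 = neither (invented counts)
print(mutual_information(30, 970, 70, 98930))
print(log_likelihood(30, 970, 70, 98930))
```

Because MI divides by an expected count derived from marginal frequencies, a handful of OCR-corrupted tokens with tiny frequencies can inflate MI dramatically, whereas LL is more sensitive to the absolute size of the evidence; this difference is part of why the two statistics react differently to OCR noise.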

Week 15

Thursday 16th February 2017

3:00-4:00pm

Bowland North Seminar Room 26

Ideological functions of metonymy: The metonymic location names in Chinese and American political discourses

Min Chen

LAEL, Lancaster University

  • Abstract

The talk examines the ideological functions of metonymy by exploring metonymic location names in Chinese and American political discourses. Based on evidence extracted from self-built corpora of news articles on Sino-US relations, with the aid of the USAS Semantic Tagger and a concordance tool, the study investigates the ideological motivation underlying the use of PLACE metonymies. Following MIP (Zhang et al., 2011), it identifies the (sub)models of PLACE metonymy, namely PLACE FOR INSTITUTION, PLACE FOR INHABITANT, PLACE FOR POWER and PLACE FOR PRODUCT. A statistical analysis of these metonymies, based on the chi-square test across conceptual and discursive parameters, presents their distributional features with finer granularity, along with the commonalities and differences in the coverage of the same metonymies in the two discourse communities. The findings reveal that the metonymies, while operating on an embodied basis, are tied up with and reinforce the hidden ideologies of the discourse communities; they thus become carriers of the political stance, disposition and evaluative partiality of the two media groups, implicitly exercising a manipulative check on their audiences.
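A chi-square test of the kind mentioned can be sketched as follows (the 2x4 layout mirrors the four PLACE metonymy submodels, but the counts are invented for illustration; the study's actual figures are not reproduced here):

```python
def chi_square(observed_a, observed_b):
    """Chi-square statistic for a 2xK table of metonymy counts
    in two discourse communities (K = number of submodels)."""
    total = sum(observed_a) + sum(observed_b)
    chi2 = 0.0
    for a, b in zip(observed_a, observed_b):
        col_total = a + b
        ea = col_total * sum(observed_a) / total  # expected count, row A
        eb = col_total * sum(observed_b) / total  # expected count, row B
        chi2 += (a - ea) ** 2 / ea + (b - eb) ** 2 / eb
    return chi2

# invented counts of PLACE FOR INSTITUTION / INHABITANT / POWER / PRODUCT
chinese = [120, 45, 60, 15]
american = [90, 70, 40, 20]
print(chi_square(chinese, american))
```

With K = 4 categories the test has 3 degrees of freedom, so a statistic above the critical value 7.815 indicates a significant distributional difference at the 0.05 level.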

The talk is based on collaborative research with Yishan Zhou, a postgraduate at the College of Foreign Languages, University of Shanghai for Science and Technology (USST), China.

Week 14

Monday 6th February 2017

1:00-2:00pm

County South B89

Why privacy makes privacy research hard

Matt Edwards & Steve Wattam

SCC, Lancaster University

  • Abstract

Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections. Yet research in this area is hampered by the lack of access to ground-truth data linking the identities of profiles from different networks. Almost all data sources previously used by researchers are no longer available, and historic datasets are both of decreasing relevance to the modern social networking landscape and ethically troublesome regarding the preservation and publication of personal data. We present and evaluate a method which provides researchers in identity resolution with easy access to a realistically-challenging labelled dataset of online profiles, drawing on four of the currently largest and most influential online social networks. We validate the comparability of samples drawn through this method and discuss the implications of this mechanism for researchers and potential alternatives and extensions.

This is a joint talk held in conjunction with The FORGE (The Forensic Linguistics Research Group)

Week 13

Thursday 2nd February 2017

3:00-4:00pm

Bowland North Seminar Room 26

Automatic detection of Spanish and Japanese modal markers and presence in spoken corpora

Carlos Herrero-Zorita

Autonomous University of Madrid

  • Abstract

The main aim of this study is to automatically find and classify elements that signal modality in Spanish and Japanese sentences, taking into account theoretical and empirical information. In an effort to join different disciplines such as typology, logic, corpus linguistics and computational linguistics, we aim to answer three main questions: (1) What is the best definition and classification of modality for cross-linguistic computational work? (2) How is modality used in spoken Spanish and Japanese, and how are modal markers modified in discourse? (3) How can we formalise this information into a program that can annotate modals automatically in new texts?

The result is a rule-based program that outputs an XML file with markers annotated and classified equally in both languages. Modality is seen from the logic perspective as a semantic feature that adds necessity or possibility meanings to the predicate of the sentence using a series of auxiliaries. The corpus shows how these auxiliaries can be affected by negation, ellipsis, syntactic separation and ambiguity, which need to be detected by the program for the sake of precision and recall.

The corpus study also provides information about modality usage, and reveals that its frequency correlates with the type of interaction, possibly owing to social constraints. Monologues show similar results in both languages, as do the non-linguistic factors of speaker sex and age. Dialogues, on the other hand, show a completely different picture, with a predominance of necessity in Spanish, and possibility slightly higher in Japanese.

Week 12

Thursday 26th January 2017

3:00-4:00pm

Bowland North Seminar Room 26

Reading together, quoting aloud and literary education

John Gordon

University of East Anglia

  • Abstract

How do we talk about the books we read together? How do teachers guide reading of study texts in schools? This seminar reports on the continuing British Academy-funded project Literature's Lasting Impression which investigates shared reading of novels and reading aloud in primary schools, secondary schools, universities and public reading groups. In particular, it will attend to teachers' action of quoting study texts aloud during collective reading activity in primary and secondary classrooms. What functions does this appear to serve? Informed by Conversation Analysis, the presentation also extends exploration of quoting aloud as distinct from quotation in writing, which I have termed echo in earlier work investigating pupils' responses to poetry. Drawing on my role as a teacher educator in the field of Secondary English, I will also reflect on methodological issues and the role of empirical research in teacher education and the pedagogy of literary reading. How can transcripts of classroom interaction be used to refine and improve teacher education, and what is the potential of a corpus dedicated to this distinctive form of spoken language?

Week 11

Thursday 19th January 2017

3:00-4:00pm

Management school LT9

Word order in the recent history of English: syntax and processing on the move

Javier Pérez-Guerra

University of Vigo, Spain

  • Abstract

This talk examines the forces that trigger two word-order designs in English: (i) object-verb sentences (*?The teacher the student hit) and (ii) adjunct-complement vs complement-adjunct constructions (He taught yesterday Maths vs He taught Maths yesterday). The study focuses both on the diachronic tendencies observed in the data in Middle English, Early Modern and Late Modern English, and on their synchronic design in Present-Day English. The approach is corpus-based (or even corpus-driven) and the data, representing different periods and text types, are taken from a number of corpora (the Penn-Helsinki Parsed Corpus of Middle English, the Penn-Helsinki Parsed Corpus of Early Modern English, the Penn Parsed Corpus of Modern British English and the British National Corpus, among others). The aim of this talk is to look at the consequences that the placement of major constituents (e.g. complements) has for the parsing of the phrases in which they occur. I examine whether the data are in keeping with determinants of word order like complements-first (complement plus adjunct) and end-weight in the periods under investigation. Some statistical analyses will help determine the explanatory power of such determinants.

Week 9

Thursday 8th December 2016

3:00-4:00pm

Charles Carter A16

The Hungarian Route to the European Union: A Corpus-Assisted Study

Elena Valvason

University of Pavia

  • Abstract

On 12th April 2003, 83.6% of Hungarians voted in support of Hungary joining the European Union. This decisive result followed a massive parliamentary discussion about the issue and guaranteed Hungary access to the EU. But what was the attitude of Hungarian MPs towards the European Union? And how was Hungarian identity shaped in discourses about EU membership? In this talk I will present the preliminary results of a corpus-assisted study of Hungarian parliamentary speeches delivered between 1998 and 2003. After a brief historical introduction I will first outline the methodological approach I adopted to sketch attitudes and identities by means of collocation analysis. I will then describe the data I employed, namely the self-collected HUNPOL corpus. Finally, using the GraphColl software, I will show how semantic and discourse prosody can highlight the Hungarian politicians' stance regarding the European Union and the status they posit for themselves in a (possibly) new political dimension.

Week 8

Thursday 1st December 2016

3:00-4:00pm

Charles Carter A15

SAMS: Data and Text Mining for Early Detection of Alzheimer's Disease

Christopher Bull

SCC, Lancaster University

  • Abstract

The SAMS project (Software Architecture for Mental health Self-management) is investigating whether monitoring data from everyday computer-use activity can be used to effectively detect subtle signs of cognitive impairment that may indicate the early stages of Alzheimer's disease.

In this talk I will discuss the SAMS project, the collection of data and text from participants, and our approach to mining the text to infer cognitive health. During the SAMS project, bespoke software is used to collect data and text from participants (installed on the participants' home PCs). The collection software passively and unobtrusively collects many forms of data and text from the participants' PCs (inc. typed email and document text), which is securely logged, and later transferred to our server for analysis. The analysis consists of various data and text mining techniques that attempt to map trends and patterns in the data to clinical indicators of Alzheimer's Disease, e.g. working memory, motor control.

Tools usage within the SAMS project will also be discussed, including the development of the bespoke collection and analysis software, as well as existing tools that are re-used (Part of Speech Tagger, Semantic Tagger).

Week 7

Thursday 24th November 2016

3:00-4:00pm

Charles Carter A15

The Multilingual Semantic Annotation System

Scott Piao

SCC, Lancaster University

  • Abstract

In this talk, I will present ongoing work on the development of the UCREL multilingual semantic annotation system. Over the past years, the original UCREL English semantic tagger has been adjusted and extended to cover more and more languages, including Finnish, Italian, Portuguese, Chinese, Spanish and French. Currently, a major project, CorCenCC, is underway, in which a Welsh semantic tagger is under development in collaboration with Welsh project partners. This tool is useful for various kinds of corpus-based research, such as cross-lingual studies. I'll discuss the linguistic resources involved in the development of the tool, and introduce a GUI tool which links to the multilingual tagger web services and helps researchers process corpus data conveniently.

Week 6

Thursday 17th November 2016

3:00-4:00pm

Charles Carter A15

How to use and read 25,000 texts from 1470-1700: an update from Visualising English Print

Heather Froehlich

University of Strathclyde

  • Abstract

This talk will present an overview of newly available resources for engaging with the Text Creation Partnership texts from the Mellon-funded Visualising English Print project (University of Wisconsin-Madison, the University of Strathclyde and the Folger Shakespeare Library). She will discuss some of the curation and standardisation principles guiding the project and how we envision scholars will use our resources, and present a case study of how to use those resources to conduct an analysis of Early Modern scientific writing.

Week 5

Thursday 10th November 2016

3:00-4:00pm

Charles Carter A16

'We are one family and serve [the] same god': Christians and Muslims' discourse about religion in south-west Nigeria

Dr Clyde Ancarno1 & Dr Insa Nolte2

1King's College London  2University of Birmingham

  • Abstract

Our paper will explore the ways in which religious identities are negotiated in a setting characterised by religious diversity and proximity: Yorubaland in South West Nigeria. We will explore how interreligious relationships are discursively constructed in extensive survey data (2,819 respondents in total) collected as part of an anthropological project focussing on the coexistence of Islam, Christianity and traditional practice in Yoruba-speaking parts of southwest Nigeria: 'Knowing each other: Everyday religious encounters, social identities and tolerance in southwest Nigeria'. Corpus tools and techniques will be used to examine the 1,535 questionnaires completed in English, particularly the answers to open-ended questions (our corpus). The premise is that by exploring the discursive choices made by Christian and Muslim respondents in this corpus, we can gain insights into Yoruba Muslims' and Christians' perceptions of themselves and each other and their experiences of inter-religious encounters.

Owing to the focus of the paper, two sub-corpora of the above-mentioned corpus were compiled: one with all answers by respondents of Muslim faith and another with all answers by respondents of Christian faith. We will use four-grams for each of these corpora to show how corpus-assisted investigations into phraseology have helped us gain insights into the data which traditional anthropological methods alone would not have allowed. Our findings will concern, for example, the specific boundaries our Christian and Muslim respondents draw around their religious behaviour and their shared understanding of religion.
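Four-gram extraction of the kind described can be sketched in a few lines (the sample sentence echoes the talk title; the real analysis runs over the full sub-corpora):

```python
from collections import Counter

def ngrams(tokens, n=4):
    """All contiguous n-token sequences (here, four-grams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

answer = "we are one family and serve the same god".split()
counts = Counter(ngrams(answer, 4))
for gram, freq in counts.most_common(3):
    print(" ".join(gram), freq)
```

Ranking four-grams by frequency in each sub-corpus, then comparing the two rankings, is one simple way to surface the phraseological patterns each group of respondents favours.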

Week 4

Thursday 3rd November 2016

3:00-4:00pm

Management school LT9

A computational stylistic comparison between English used on Chinese governmental websites and English used on US and UK governmental websites

Jiyaue Wang

LAEL, Lancaster University

  • Abstract

English texts on Chinese governmental websites are often criticised for being 'Chinglish' or 'lifeless'. This project investigates how the English versions of Chinese governmental websites can improve their stylistic quality. The project is a computational stylistic comparison between English texts on Chinese governmental websites and English texts on UK and US governmental websites. The approach is corpus-based and employs Biber's (1988) multidimensional analysis. A corpus (comprising two subcorpora) of websites had previously been downloaded using wget in mirror mode. Perl scripts were used to extract the text content from the web pages into a plain-text file for each website, and word frequency lists and trigrams have also been extracted. Keyword lists for the two subcorpora have been generated against a COCA word frequency list. Several issues remain to be dealt with before further analysis can be conducted, including: whether it is possible to separate 'real content' from purely repetitive page furniture (such as menus, navigation and copyright notices); what the alternatives to manual annotation are when it is not a practical option given the massive size of the corpus; and how to identify which features to consider to make the comparison more meaningful.
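The text-extraction step can be illustrated with a small sketch using Python's standard-library parser (a stand-in for the project's actual Perl scripts; the skip list and sample page are invented):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style and (crudely) page furniture."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting level inside skipped elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

page = "<html><nav>Home | About</nav><p>Press release text.</p></html>"
p = TextExtractor()
p.feed(page)
print(" ".join(p.chunks))   # → "Press release text."
```

Filtering by tag only goes so far; separating genuinely repeated boilerplate from real content usually also needs cross-page frequency comparison, which is exactly the open issue the abstract raises.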

Week 2

Thursday 20th October 2016

3:00-4:00pm

George Fox LT05

Tweet of the Art: an introduction to collecting, filtering, and exporting Twitter corpora with FireAnt

Claire Hardaker

LAEL, Lancaster University

  • Abstract

This talk will provide a basic introduction to FireAnt, a freeware tool that I have co-developed with Laurence Anthony (Waseda University). FireAnt offers three main utilities: live collection of real-time tweets; the ability to filter that (and many other kinds of) data on user-defined parameters; and the ability to export the data in formats suitable for corpus tools, network graphing, time-series analysis, and so forth. I'll demonstrate each of these steps in turn, and provide suggestions along the way for the types of investigation that FireAnt has been, and can be, used for. There will be time at the end for questions.

Week 1

Thursday 13th October 2016

3:00-4:00pm

Management school LT9

Corpus and software resources available at Lancaster

Andrew Hardie1 & Paul Rayson2

1CASS, Lancaster University  2SCC, Lancaster University

  • Abstract

This talk will provide a brief introduction to the UCREL research centre, and an overview of the corpus resources, software tools and infrastructure that are available to corpus linguistics and NLP researchers at Lancaster University. The talk will cover corpora of English and non-English varieties, and there will be brief descriptions of annotation, retrieval and other software. Two web-based systems (CQPweb and Wmatrix) will be briefly demonstrated. CQPweb is a corpus retrieval and analysis tool which provides fast access to a range of very large standard corpora. Wmatrix, on the other hand, allows you to upload your own English corpora; it carries out tagging and provides key word and key domain analysis, plus frequency lists and concordancing.