Previous seminars

Seminars from previous years are still being added. In the meantime, the full archive remains available on the old website.

Academic year:

2016/2017

Week 8

Thursday 1st December 2016

3:00-4:00pm

Charles Carter A15

SAMS: Data and Text Mining for Early Detection of Alzheimer's Disease

Christopher Bull

SCC, Lancaster University

  • Abstract

The SAMS project (Software Architecture for Mental health Self-management) is investigating whether monitoring data from everyday computer-use activity can be used to effectively detect subtle signs of cognitive impairment that may indicate the early stages of Alzheimer's disease.

In this talk I will discuss the SAMS project, the collection of data and text from participants, and our approach to mining the text to infer cognitive health. During the SAMS project, bespoke software installed on the participants' home PCs is used to collect data and text. The collection software passively and unobtrusively collects many forms of data and text from the participants' PCs (including typed email and document text), which are securely logged and later transferred to our server for analysis. The analysis consists of various data and text mining techniques that attempt to map trends and patterns in the data to clinical indicators of Alzheimer's disease, e.g. working memory and motor control.

Tool usage within the SAMS project will also be discussed, including the development of the bespoke collection and analysis software, as well as existing tools that are re-used (Part of Speech Tagger, Semantic Tagger).

Week 7

Thursday 24th November 2016

3:00-4:00pm

Charles Carter A15

The Multilingual Semantic Annotation System

Scott Piao

SCC, Lancaster University

  • Abstract

In this talk, I will present ongoing work on the development of the UCREL multilingual semantic annotation system. Over the past years, the original UCREL English semantic tagger has been adjusted and extended to cover more and more languages, including Finnish, Italian, Portuguese, Chinese, Spanish and French. Currently, a major project, CorCenCC, is underway in which a Welsh semantic tagger is being developed in collaboration with Welsh project partners. This tool is useful for various kinds of corpus-based research, such as cross-lingual studies. I'll discuss the linguistic resources involved in the development of the tool, and introduce a GUI tool which links the multilingual tagger web services and helps researchers to process corpus data conveniently.

Week 6

Thursday 17th November 2016

3:00-4:00pm

Charles Carter A15

How to use and read 25,000 texts from 1470-1700: an update from Visualising English Print

Heather Froehlich

University of Strathclyde

  • Abstract

This talk will present an overview of newly available resources from the Mellon-funded Visualising English Print project (University of Wisconsin-Madison, the University of Strathclyde and the Folger Shakespeare Library) for engaging with the Text Creation Partnership texts. She will discuss some of the curation and standardisation principles guiding the project, how we envision scholars will use our resources, and present a case study of how to use our resources to conduct an analysis of Early Modern scientific writing.

Week 5

Thursday 10th November 2016

3:00-4:00pm

Charles Carter A16

'We are one family and serve [the] same god': Christians and Muslims' discourse about religion in south-west Nigeria

Dr Clyde Ancarno¹ & Dr Insa Nolte²

¹King's College London  ²University of Birmingham

  • Abstract

Our paper will explore the ways in which religious identities are negotiated in a setting characterised by religious diversity and proximity: Yorubaland in south-west Nigeria. We will explore how interreligious relationships are discursively constructed in extensive survey data (2,819 respondents in total) collected as part of an anthropological project focussing on the coexistence of Islam, Christianity and traditional practice in Yoruba-speaking parts of south-west Nigeria: 'Knowing each other: Everyday religious encounters, social identities and tolerance in southwest Nigeria'. Corpus tools and techniques will be used to examine the 1,535 questionnaires completed in English, particularly the answers to open-ended questions (our corpus). The premise is that by exploring discursive choices made by Christian and Muslim respondents in this corpus, we can gain insights into Yoruba Muslims' and Christians' perceptions of themselves and each other and their experiences of inter-religious encounters.

Owing to the focus of the paper, two sub-corpora of the above-mentioned corpus were compiled: one with all answers by respondents of Muslim faith and another with all answers by respondents of Christian faith. We will use four-grams for each of these corpora to show how corpus-assisted investigations into phraseology have helped us gain insights into the data which traditional anthropological methods alone would not have allowed. Our findings will concern, for example, the specific boundaries our Christian and Muslim respondents draw around their religious behaviour and their shared understanding of religion.
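As a rough illustration of the four-gram step described above, the sketch below counts four-word sequences separately for two sub-corpora and lists the ones they share. It is not the authors' tooling: the tokeniser, the function names and the two sample answers are all assumptions invented for this example, and no project data is used.

```python
import re
from collections import Counter


def four_grams(text):
    """Count four-word sequences in one lower-cased answer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:], tokens[2:], tokens[3:]))


def subcorpus_four_grams(answers):
    """Sum four-gram counts over a list of answers.

    Each answer is counted on its own, so no four-gram spans two answers.
    """
    counts = Counter()
    for answer in answers:
        counts.update(four_grams(answer))
    return counts


# Hypothetical answers, invented for illustration only (not survey data).
muslim_answers = ["we are one family and we live together in peace"]
christian_answers = ["we are one family and serve the same god"]

muslim = subcorpus_four_grams(muslim_answers)
christian = subcorpus_four_grams(christian_answers)

# Four-grams that occur in both sub-corpora.
shared = set(muslim) & set(christian)
for gram in sorted(shared):
    print(" ".join(gram), muslim[gram], christian[gram])
```

In a real study the shared and sub-corpus-specific four-grams would then be read in context through concordance lines rather than interpreted from counts alone.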

Week 4

Thursday 3rd November 2016

3:00-4:00pm

Management School LT9

A computational stylistic comparison between English used on Chinese governmental websites and English used on US and UK governmental websites

Jiyaue Wang

LAEL, Lancaster University

  • Abstract

English texts on Chinese governmental websites are often criticised for being 'Chinglish' or 'lifeless'. This project investigates how English versions of Chinese governmental websites can improve their stylistic quality. The project is a computational stylistic comparison between English texts on Chinese governmental websites and English texts on UK and US governmental websites. The approach is corpus-based and employs Biber's (1988) multidimensional analysis. A corpus of websites, comprising two subcorpora, had previously been downloaded by mirroring the sites with wget -m. Perl scripts were used to extract text content from web pages to form a txt file for each website, and word frequency lists and trigrams have also been extracted. Keyword lists for the two subcorpora have been generated based on a COCA word frequency list. Several issues remain to be dealt with before further analysis can be conducted, including: whether it is possible to separate 'real content' from purely repetitive content (such as menus, navigation and copyright notices) when data comes from web pages; the alternatives to manual annotation when this is not a practical option given the massive size of the corpus; and how to identify which features to consider to make the comparison more significant.
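The frequency-list, trigram and keyword steps mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the project itself used Perl scripts, whereas this sketch is in Python; the abstract does not say which keyness statistic was used, so the two-corpus log-likelihood measure here is an assumed choice; and the sample text and reference counts are invented stand-ins, not COCA figures.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split lower-cased text into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def trigrams(tokens):
    """Count three-word sequences."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness (assumed statistic, not named in the abstract).

    a, b: frequency of the word in the study and reference corpus.
    c, d: total token counts of the study and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_counts, ref_counts):
    """Rank words that are relatively more frequent in the study corpus."""
    c = sum(study_counts.values())
    d = sum(ref_counts.values())
    scores = {}
    for word, a in study_counts.items():
        b = ref_counts.get(word, 0)
        if a / c > b / d:  # keep overused words only
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented sample text and reference counts, for illustration only.
sample = ("the government will promote development "
          "the government will strengthen development")
tokens = tokenize(sample)
word_freq = Counter(tokens)
reference = Counter({"the": 50, "government": 2, "will": 10,
                     "development": 1, "cat": 37})

print(word_freq.most_common(3))
print(trigrams(tokens).most_common(1))
print(keywords(word_freq, reference)[:3])
```

Filtering on relative frequency before ranking keeps the list to words that are overused in the study corpus. Underused words could be listed separately with the same statistic.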

Week 2

Thursday 20th October 2016

3:00-4:00pm

George Fox LT05

Tweet of the Art: an introduction to collecting, filtering, and exporting Twitter corpora with FireAnt

Claire Hardaker

LAEL, Lancaster University

  • Abstract

This talk will provide a basic introduction to FireAnt, a freeware tool that I have co-developed with Laurence Anthony (Waseda University). FireAnt offers three main utilities: the live collection of real-time tweets; the ability to filter that (and many other kinds of) data based on user-defined parameters; and the ability to export the data in formats suitable for corpus tools, network graphing, time-series analysis, and so forth. I'll demonstrate each of these steps in turn, and provide suggestions along the way for possible types of investigation that FireAnt has been and can be used for. There will be time at the end for questions.
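For readers who want a feel for the filter-and-export idea outside a graphical tool, here is a minimal sketch. It does not use or reproduce FireAnt, which is a desktop application. It assumes tweets are stored one JSON object per line with "id", "lang" and "text" fields, and the function names and sample records are hypothetical.

```python
import csv
import io
import json


def filter_tweets(lines, keyword=None, lang=None):
    """Yield tweet dicts from JSON-lines input that match the criteria."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if lang is not None and tweet.get("lang") != lang:
            continue
        if keyword is not None and keyword.lower() not in tweet.get("text", "").lower():
            continue
        yield tweet


def export_csv(tweets, out, fields=("id", "lang", "text")):
    """Write the selected fields of each tweet as CSV rows."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)


# Invented sample records, for illustration only.
raw = [
    '{"id": 1, "lang": "en", "text": "Corpus linguistics is fun"}',
    '{"id": 2, "lang": "fr", "text": "Bonjour corpus"}',
    '{"id": 3, "lang": "en", "text": "Nothing relevant"}',
]

matches = list(filter_tweets(raw, keyword="corpus", lang="en"))
buffer = io.StringIO()
export_csv(matches, buffer)
print(buffer.getvalue())
```

Writing to any file-like object means the same function can export to a file on disk by passing an open file handle instead of the in-memory buffer.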

Week 1

Thursday 13th October 2016

3:00-4:00pm

Management School LT9

Corpus and software resources available at Lancaster

Andrew Hardie¹ & Paul Rayson²

¹CASS, Lancaster University  ²SCC, Lancaster University

  • Abstract

This talk will provide a brief introduction to the UCREL research centre, and an overview of the corpus resources, software tools and infrastructure that are available for corpus linguistics and NLP researchers at Lancaster University. The talk will cover corpora of English and non-English varieties, and there will be brief descriptions of annotation, retrieval and other software. Two web-based systems (CQPweb and Wmatrix) will be briefly demonstrated. CQPweb is a corpus retrieval and analysis tool which provides fast access to a range of very large standard corpora. Wmatrix, on the other hand, allows you to upload your own English corpora, carries out tagging, and provides key word and key domain analysis, plus frequency lists and concordancing.