Previous seminars

Seminars from previous years are still being added; the full archive remains available on the old website.

Academic year: 2016/2017

Week 15

Thursday 16th February 2017

3:00-4:00pm

Bowland North Seminar Room 26

Ideological functions of metonymy: The metonymic location names in Chinese and American political discourses

Min Chen

LAEL, Lancaster University

  • Abstract

The talk examines the ideological functions of metonymy by exploring metonymic location names in Chinese and American political discourse. Based on evidence extracted, with the aid of the USAS Semantic Tagger and a concordance tool, from self-built corpora of news articles on Sino-US relations, the study investigates the ideological motivation underlying the use of PLACE metonymies. Following MIP (Zhang et al., 2011), it identifies and classifies the (sub)models of PLACE metonymy, namely PLACE FOR INSTITUTION, PLACE FOR INHABITANT, PLACE FOR POWER and PLACE FOR PRODUCT. A chi-square-based statistical analysis of these metonymies along conceptual and discursive parameters presents their distributional features at a finer granularity, showing both the commonalities and the differences in how the same metonymies are used across the two discourse communities. The findings reveal that the metonymies, while operating on an embodied basis, are tied up with and reinforce the hidden ideologies of their discourse communities; they thus become carriers of the political stance, disposition and evaluative partiality of the two media groups, implicitly exercising a manipulative influence on their audiences.
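
A minimal sketch of the kind of chi-square comparison described above, assuming invented counts for the four PLACE (sub)models in the two corpora:

```python
# Minimal sketch: chi-square test of (sub)model distribution across the two
# discourse communities. The counts below are invented purely for illustration;
# the real study derives them from the self-built Chinese and American corpora.
from scipy.stats import chi2_contingency

# Rows: PLACE metonymy (sub)models; columns: Chinese vs. American news corpora.
observed = [
    [120, 95],   # PLACE FOR INSTITUTION
    [60, 80],    # PLACE FOR INHABITANT
    [40, 25],    # PLACE FOR POWER
    [15, 30],    # PLACE FOR PRODUCT
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```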

The talk is based on collaborative research with Yishan Zhou, a postgraduate at the College of Foreign Languages, University of Shanghai for Science and Technology (USST), China.

Week 14

Monday 6th February 2017

1:00-2:00pm

County South B89

Why privacy makes privacy research hard

Matt Edwards & Steve Wattam

SCC, Lancaster University

  • Abstract

Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections. Yet research in this area is hampered by the lack of access to ground-truth data linking the identities of profiles from different networks. Almost all data sources previously used by researchers are no longer available, and historic datasets are both of decreasing relevance to the modern social networking landscape and ethically troublesome regarding the preservation and publication of personal data. We present and evaluate a method which provides researchers in identity resolution with easy access to a realistically challenging labelled dataset of online profiles, drawing on four of the currently largest and most influential online social networks. We validate the comparability of samples drawn through this method and discuss the implications of this mechanism for researchers, as well as potential alternatives and extensions.

This is a joint talk held in conjunction with The FORGE (the Forensic Linguistics Research Group).

Week 13

Thursday 2nd February 2017

3:00-4:00pm

Bowland North Seminar Room 26

Automatic detection of Spanish and Japanese modal markers and presence in spoken corpora

Carlos Herrero-Zorita

Autonomous University of Madrid

  • Abstract

The main aim of this study is to automatically find and classify elements that signal modality in Spanish and Japanese sentences, taking into account theoretical and empirical information. In an effort to bring together different disciplines such as typology, logic, corpus linguistics and computational linguistics, we aim to answer three main questions: (1) What is the best definition and classification of modality for cross-linguistic computational work? (2) How is modality used in spoken Spanish and Japanese, and how are modal markers modified in discourse? (3) How can we formalise this information into a program that can automatically annotate modals in new texts?

The result is a rule-based program that outputs an XML file in which markers are annotated and classified identically in both languages. Modality is viewed from the perspective of logic as a semantic feature that adds necessity or possibility meanings to the predicate of the sentence via a series of auxiliaries. The corpus shows how these auxiliaries can be affected by negation, ellipsis, syntactic separation and ambiguity, all of which the program needs to detect for the sake of precision and recall.
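
A toy sketch of the general rule-based approach described above (not the author's program), assuming a hypothetical lexicon of Spanish modal auxiliaries and writing the annotation as XML:

```python
# Toy rule-based annotator (not the author's system): mark Spanish modal
# auxiliaries, classify them as necessity or possibility, and emit XML.
import re
import xml.etree.ElementTree as ET

# Hypothetical marker lexicon; the real system also handles negation,
# ellipsis, syntactic separation and ambiguity.
MARKERS = {
    "puede": "possibility", "pueden": "possibility", "podría": "possibility",
    "debe": "necessity", "deben": "necessity", "debes": "necessity",
}

def annotate(sentence):
    sent = ET.Element("sentence")
    for token in re.findall(r"\w+|[^\w\s]", sentence):
        tag = "modal" if token.lower() in MARKERS else "w"
        node = ET.SubElement(sent, tag)
        if tag == "modal":
            node.set("type", MARKERS[token.lower()])
        node.text = token
    return sent

root = ET.Element("corpus")
root.append(annotate("El tren puede llegar tarde."))
root.append(annotate("Debes entregar el informe mañana."))
print(ET.tostring(root, encoding="unicode"))
```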

The corpus study also provides information about modality usage, revealing that its frequency is correlated with the type of interaction, possibly as a result of social constraints. Monologues yield similar results in both languages, as do the non-linguistic factors of speaker sex and age. Dialogues, on the other hand, show a completely different picture: Spanish displays a predominance of necessity, whereas in Japanese possibility is slightly more frequent.

Week 12

Thursday 26th January 2017

3:00-4:00pm

Bowland North Seminar Room 26

Reading together, quoting aloud and literary education

John Gordon

University of East Anglia

  • Abstract

How do we talk about the books we read together? How do teachers guide reading of study texts in schools? This seminar reports on the continuing British Academy-funded project Literature's Lasting Impression, which investigates shared reading of novels and reading aloud in primary schools, secondary schools, universities and public reading groups. In particular, it will attend to teachers' practice of quoting study texts aloud during collective reading activity in primary and secondary classrooms. What functions does this appear to serve? Informed by Conversation Analysis, the presentation also extends the exploration of quoting aloud as distinct from quotation in writing, which I have termed echo in earlier work investigating pupils' responses to poetry. Drawing on my role as a teacher educator in the field of Secondary English, I will also reflect on methodological issues and the role of empirical research in teacher education and the pedagogy of literary reading. How can transcripts of classroom interaction be used to refine and improve teacher education, and what is the potential of a corpus dedicated to this distinctive form of spoken language?

Week 11

Thursday 19th January 2017

3:00-4:00pm

Management school LT9

Word order in the recent history of English: syntax and processing on the move

Javier Pérez-Guerra

University of Vigo, Spain

  • Abstract

This talk examines the forces that trigger two word-order designs in English: (i) object-verb sentences (*?The teacher the student hit) and (ii) adjunct-complement vs complement-adjunct constructions (He taught yesterday Maths vs He taught Maths yesterday). The study focuses both on the diachronic tendencies observed in data from Middle English, Early Modern English and Late Modern English, and on their synchronic design in Present-Day English. The approach is corpus-based (or even corpus-driven) and the data, representing different periods and text types, are taken from a number of corpora (the Penn-Helsinki Parsed Corpus of Middle English, the Penn-Helsinki Parsed Corpus of Early Modern English, the Penn Parsed Corpus of Modern British English and the British National Corpus, among others). The aim of this talk is to look at the consequences that the placement of major constituents (e.g. complements) has for the parsing of the phrases in which they occur. I examine whether the data are in keeping with determinants of word order like complements-first (complement plus adjunct) and end-weight in the periods under investigation. Some statistical analyses will help determine the explanatory power of such determinants.
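
A minimal sketch of how an end-weight check might look, assuming invented post-verbal constituents rather than the parsed corpus data used in the study:

```python
# Toy end-weight check: does the shorter of two post-verbal constituents tend
# to precede the longer one? The examples are invented; in the study this
# information comes from the parsed historical corpora.
examples = [
    ("taught", ["Maths"], ["yesterday"]),
    ("taught", ["yesterday"], ["Maths", "to", "the", "whole", "class"]),
]

for verb, first, second in examples:
    respects_end_weight = len(first) <= len(second)
    print(f"{verb} {' '.join(first)} | {' '.join(second)}"
          f" -> end-weight respected: {respects_end_weight}")
```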

Week 9

Thursday 8th December 2016

3:00-4:00pm

Charles Carter A16

The Hungarian Route to the European Union: A Corpus-Assisted Study

Elena Valvason

University of Pavia

  • Abstract

On 12th April 2003, 83.6% of Hungarians voted in support of Hungary joining the European Union. This decisive result followed a massive parliamentary discussion about the issue and guaranteed Hungary access to the EU. But what was the attitude of Hungarian MPs towards the European Union? And how was Hungarian identity shaped in discourses about EU membership? In this talk I will present the preliminary results of a corpus-assisted study of Hungarian parliamentary speeches delivered between 1998 and 2003. After a brief historical introduction I will first outline the methodological approach I adopted to sketch attitudes and identities by means of collocation analysis. I will then describe the data I employed, namely the self-collected HUNPOL corpus. Finally, using the GraphColl software, I will show how semantic and discourse prosody can highlight the Hungarian politicians' stance regarding the European Union and the status they posit for themselves in a (possibly) new political dimension.
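
A minimal sketch of span-based collocation scoring of the kind GraphColl's collocation networks rest on, using an invented fragment rather than the HUNPOL corpus:

```python
# Minimal sketch of span-based collocation scoring (mutual information) for a
# node word. The tiny "corpus" below is invented purely for illustration.
import math
from collections import Counter

tokens = ("hungary joins the european union and the union welcomes hungary "
          "into the european family").split()
node, span = "union", 2

freq = Counter(tokens)
cooc = Counter()
for i, tok in enumerate(tokens):
    if tok == node:
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        cooc.update(window)

N = len(tokens)
for collocate, observed in cooc.most_common():
    expected = freq[node] * freq[collocate] * (2 * span) / N
    mi = math.log2(observed / expected)
    print(f"{collocate:10s} O={observed} E={expected:.2f} MI={mi:.2f}")
```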

Week 8

Thursday 1st December 2016

3:00-4:00pm

Charles Carter A15

SAMS: Data and Text Mining for Early Detection of Alzheimer's Disease

Christopher Bull

SCC, Lancaster University

  • Abstract

The SAMS project (Software Architecture for Mental health Self-management) is investigating whether monitoring data from everyday computer-use activity can be used to effectively detect subtle signs of cognitive impairment that may indicate the early stages of Alzheimer's disease.

In this talk I will discuss the SAMS project, the collection of data and text from participants, and our approach to mining the text to infer cognitive health. During the SAMS project, bespoke software installed on participants' home PCs is used to collect data and text. The collection software passively and unobtrusively gathers many forms of data and text from the participants' PCs (including typed email and document text), which is securely logged and later transferred to our server for analysis. The analysis consists of various data and text mining techniques that attempt to map trends and patterns in the data onto clinical indicators of Alzheimer's disease, e.g. working memory and motor control.

Tool usage within the SAMS project will also be discussed, including the development of the bespoke collection and analysis software, as well as existing tools that are being re-used (the Part-of-Speech Tagger and the Semantic Tagger).

Week 7

Thursday 24th November 2016

3:00-4:00pm

Charles Carter A15

The Multilingual Semantic Annotation System

Scott Piao

SCC, Lancaster University

  • Abstract

In this talk, I will present ongoing work on the development of the UCREL multilingual semantic annotation system. Over recent years, the original UCREL English semantic tagger has been adjusted and extended to cover an increasing number of languages, including Finnish, Italian, Portuguese, Chinese, Spanish and French. Currently, a major project, CorCenCC, is underway in which a Welsh semantic tagger is being developed in collaboration with Welsh project partners. The tool is useful for various kinds of corpus-based research, such as cross-lingual studies. I will discuss the linguistic resources involved in its development and introduce a GUI tool that links to the multilingual tagger web services and helps researchers process corpus data conveniently.

Week 6

Thursday 17th November 2016

3:00-4:00pm

Charles Carter A15

How to use and read 25,000 texts from 1470-1700: an update from Visualising English Print

Heather Froehlich

University of Strathclyde

  • Abstract

This talk will present an overview of newly available resources for engaging with the Text Creation Partnership texts from the Mellon-funded Visualising English Print project (University of Wisconsin-Madison, the University of Strathclyde and the Folger Shakespeare Library). She will discuss some of the curation and standardisation principles guiding the project and how the team envisions scholars using its resources, and will present a case study of how those resources can be used to analyse Early Modern scientific writing.

Week 5

Thursday 10th November 2016

3:00-4:00pm

Charles Carter A16

'We are one family and serve [the] same god': Christians and Muslims' discourse about religion in south-west Nigeria

Dr Clyde Ancarno1 & Dr Insa Nolte2

1King's College London  2University of Birmingham

  • Abstract

Our paper will explore the ways in which religious identities are negotiated in a setting characterised by religious diversity and proximity: Yorubaland in south-west Nigeria. We will explore how interreligious relationships are discursively constructed in extensive survey data (2,819 respondents in total) collected as part of an anthropological project focussing on the coexistence of Islam, Christianity and traditional practice in Yoruba-speaking parts of southwest Nigeria: 'Knowing each other: Everyday religious encounters, social identities and tolerance in southwest Nigeria'. Corpus tools and techniques will be used to examine the 1,535 questionnaires completed in English, particularly the answers to open-ended questions (our corpus). The premise is that by exploring the discursive choices made by Christian and Muslim respondents in this corpus, we can gain insights into Yoruba Muslims' and Christians' perceptions of themselves and each other, and into their experiences of inter-religious encounters.

Owing to the focus of the paper, two sub-corpora of the above-mentioned corpus were compiled: one with all answers by respondents of Muslim faith and another with all answers by respondents of Christian faith. We will use four-grams for each of these sub-corpora to show how corpus-assisted investigations into phraseology have helped us gain insights into the data which traditional anthropological methods alone would not have allowed. Our findings will concern, for example, the specific boundaries our Christian and Muslim respondents draw around their religious behaviour and their shared understanding of religion.
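
A minimal sketch of four-gram extraction of the kind described above, using an invented sample rather than the survey sub-corpora:

```python
# Sketch of 4-gram extraction for a sub-corpus; the sample text is invented.
from collections import Counter

def four_grams(text):
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3))

sample = "we are one family and serve the same god because we are one family"
for gram, freq in four_grams(sample).most_common(3):
    print(freq, " ".join(gram))
```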

Week 4

Thursday 3rd November 2016

3:00-4:00pm

Management school LT9

A computational stylistic comparison between English used on Chinese governmental websites and English used on US and UK governmental websites

Jiyaue Wang

LAEL, Lancaster University

  • Abstract

English texts on Chinese governmental websites are often criticised for being 'Chinglish' or 'lifeless'. This project investigates how the English versions of Chinese governmental websites could improve their stylistic quality. The project is a computational stylistic comparison between English texts on Chinese governmental websites and English texts on UK and US governmental websites. The approach is corpus-based and employs Biber's (1988) multidimensional analysis. A corpus (comprising two subcorpora) of websites was previously downloaded using wget -m. Perl scripts were used to extract the text content from the web pages into a plain-text file per website, and word frequency lists and trigrams have also been extracted. Keyword lists for the two subcorpora have been generated against a COCA word frequency list. Several issues remain to be dealt with before further analysis can be conducted, including: whether it is possible to separate 'real content' from purely repetitive web-page boilerplate (such as menus, navigation and copyright notices); what the alternatives to manual annotation are, given that it is not practical for a corpus of this size; and how to identify which features to consider in order to make the comparison more meaningful.
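
A minimal sketch of keyword scoring against a reference frequency list using Dunning's log-likelihood, in the spirit of the COCA-based keyword lists mentioned above; all counts are invented:

```python
# Keyword scoring sketch: compare a word's frequency in a study corpus against
# a reference frequency list using log-likelihood. All numbers are invented.
import math

def log_likelihood(freq_study, total_study, freq_ref, total_ref):
    """Dunning-style log-likelihood for a word's frequency in two corpora."""
    combined = (freq_study + freq_ref) / (total_study + total_ref)
    expected_study = total_study * combined
    expected_ref = total_ref * combined
    ll = 0.0
    if freq_study:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical counts: a word in a governmental-website corpus vs. a reference list.
print(round(log_likelihood(150, 1_000_000, 300, 100_000_000), 2))
```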

Week 2

Thursday 20th October 2016

3:00-4:00pm

George Fox LT05

Tweet of the Art: an introduction to collecting, filtering, and exporting Twitter corpora with FireAnt

Claire Hardaker

LAEL, Lancaster University

  • Abstract

This talk will provide a basic introduction to FireAnt, a freeware tool that I have co-developed with Laurence Anthony (Waseda University). FireAnt offers three main utilities: the live collection of real-time tweets; the ability to filter that (and many other kinds of) data based on user-defined parameters; and the ability to export the data in formats suitable for corpus tools, network graphing, time-series analysis, and so forth. I'll demonstrate each of these steps in turn, and provide suggestions along the way for possible types of investigation that FireAnt has been and can be used for. There will be time at the end for questions.
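
Not FireAnt itself, but a minimal sketch of the kind of filter-and-export step it automates, assuming line-delimited tweet JSON with classic Twitter API fields and hypothetical file names:

```python
# Sketch of a filter-and-export step: read line-delimited tweet JSON, keep
# tweets matching user-defined parameters, write a CSV ready for corpus tools.
# File names and thresholds are hypothetical.
import csv
import json

def export_tweets(in_path, out_path, lang="en", min_followers=0):
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["id", "created_at", "user", "text"])
        for line in src:
            tweet = json.loads(line)
            if tweet.get("lang") != lang:
                continue
            if tweet.get("user", {}).get("followers_count", 0) < min_followers:
                continue
            writer.writerow([tweet["id_str"], tweet["created_at"],
                             tweet["user"]["screen_name"], tweet["text"]])

# export_tweets("tweets.jsonl", "tweets.csv", lang="en", min_followers=10)
```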

Week 1

Thursday 13th October 2016

3:00-4:00pm

Management school LT9

Corpus and software resources available at Lancaster

Andrew Hardie1 & Paul Rayson2

1CASS, Lancaster University  2SCC, Lancaster University

  • Abstract

This talk will provide a brief introduction to the UCREL research centre, and an overview of the corpus resources, software tools and infrastructure available to corpus linguistics and NLP researchers at Lancaster University. The talk will cover corpora of English and non-English varieties, and there will be brief descriptions of annotation, retrieval and other software. Two web-based systems (CQPweb and Wmatrix) will be briefly demonstrated. CQPweb is a corpus retrieval and analysis tool which provides fast access to a range of very large standard corpora. Wmatrix, on the other hand, allows you to upload your own English corpora, carries out tagging, and provides keyword and key domain analysis, plus frequency lists and concordancing.
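
A minimal sketch of a KWIC (key word in context) concordance of the kind CQPweb and Wmatrix produce on a far larger scale, using an invented snippet of text:

```python
# Toy KWIC concordancer; CQPweb and Wmatrix do this over very large corpora
# with indexing, sorting and frequency breakdowns.
def kwic(tokens, node, width=4):
    node = node.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield f"{left:>30} [{tok}] {right}"

text = ("the corpus tools available at Lancaster include CQPweb and Wmatrix "
        "and both tools support concordancing of very large corpora").split()
for line in kwic(text, "tools"):
    print(line)
```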