Lancaster University UCREL Home Page
University Centre for Computer Corpus Research on Language 
 

'LEADING THE WAY IN CORPUS-BASED NLP RESEARCH'

UCREL is a research centre of Lancaster University.

  • We specialize in the automatic or computer-aided analysis of large bodies of naturally-occurring language ('corpora').
  • We have a record of achievement of more than twenty years as pioneers in this field.
  • We remain at the leading edge of computer corpus construction and analysis.
  • Our work focusses on modern English, early modern English, modern foreign languages, minority, endangered, and ancient languages.

News:

August 2014:

In memory: Professor Geoffrey Leech

It is with great sorrow that we report the death on 19th August of Professor Geoffrey Leech.

Geoff was not only the founder of the UCREL research centre for corpus linguistics at Lancaster University, he was also the first Professor and founding Head of the Department of Linguistics and English Language. His contributions to linguistics - not only in corpus linguistics, but also in English grammar, pragmatics and stylistics - were immense.

All our thoughts are with Geoff's wife Fanny, and with his family. It is still hard for us to find the right words at this time. For many of us he was an inspirational teacher and mentor, but for all of us, he was a kind and generous friend.

There is a webpage where tributes, messages and memories can be posted: http://wp.lancs.ac.uk/geoffreyleech/

April 2014:

UCREL Summer School in Corpus Linguistics, Lancaster University, UK. 15th to 18th July 2014.
Call for Participation: The UCREL Summer School 2014 is the fourth in a highly successful series of free-to-attend training events that began in 2011. Sponsored by UCREL at Lancaster University - one of the world's leading and longest-established centres for corpus-based research - its aim is to support students of language and linguistics in the development of advanced skills in corpus methods. The UCREL Summer School is intended primarily for postgraduate research students (and secondarily for Masters-level students, postdoctoral researchers, and others) who require in-depth knowledge of corpus-based methodologies for their degree projects. It is not aimed at raw beginners, but rather at students who have at least some introductory experience of analysis using language corpora, and who wish to expand their knowledge of key issues and techniques in cutting-edge corpus research.

January 2014:

This month sees the start of #corpusMOOC, a Massive Open Online Course, led by Tony McEnery and featuring other tutors from UCREL and CASS in Lancaster. "Corpus linguistics: method, analysis, interpretation" can be followed by signing up to the course hosted by FutureLearn.

December 2013:

Season's greetings from everyone in UCREL at Lancaster University. As part of the University's short films from across campus there's a video with Chris Donaldson from the Spatial Humanities project talking about a Dickens ghost story with a Lancaster twist.

October 2013:

ICAME Journal: call for submissions and subscriptions

The ICAME Journal is published annually by UCREL at Lancaster University in both electronic format and paper copy.

CALL FOR SUBMISSIONS:
Deadline for submissions: 1 December 2013
Deadline for reviews: 31 December 2013

The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue in 2014 or the following issue in 2015. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors: Merja Kyto (e-mail: merja.kyto@engelska.uu.se) or Anna-Brita Stenstrom (e-mail: ab.stenstrom@telia.com). Books for review and correspondence on reviews and abstracts should be sent to Ilka Mindt (e-mail: ilka.mindt@upb.de).
Date of publication of issue 38: May 2014

CALL FOR SUBSCRIPTIONS:
More information and previous issues (in PDF) are available on-line at the ICAME Journal website: http://icame.uib.no/journal.html
To subscribe to the ICAME Journal for the next issue, to be published in May 2014, please visit our secure on-line order form. Please note that a subscription to issue 38 (May 2014) is included in the registration for the ICAME 2013 conference organised by the University of Santiago de Compostela. Those who register for ICAME 2013 will automatically receive a copy at the conference.

October 2013:

Teaching and Language Corpora conference (TaLC 2014): First Call for Papers and Workshops

The next TaLC conference will take place in 2014 at Lancaster University, UK. The TaLC series of conferences was inaugurated in 1994. For this 20th anniversary event, we are delighted to welcome the eleventh TaLC back to Lancaster, the original host institution. TaLC 11 will run from Monday 21st to Wednesday 23rd July (inclusive), with a pre-conference workshop day on Sunday 20th July. The call can be found on the conference website http://ucrel.lancs.ac.uk/talc2014

May 2013:

UCREL Summer School in Corpus Linguistics

The programme consists of a series of intensive two-hour sessions, some involving practical work, others more discussion-oriented. The UCREL Summer School is intended primarily for postgraduate research students (and secondarily for Masters-level students and postdoctoral researchers) who require in-depth knowledge of corpus-based methodologies for their degree projects. It is not aimed at raw beginners, but rather at PhD students who have at least some introductory experience of analysis using language corpora, and who wish to expand their knowledge of key issues and techniques in cutting-edge corpus research.

For more details, see the website for the summer school.

January 2013:

Developing new approaches to the study of hate speech, exploring how people talk about climate change and looking at how changes in corporate governance are communicated will be part of the remit of a new £3.5m research hub at Lancaster University, which will study the use and manipulation of language in society. Funded by the Economic and Social Research Council (ESRC), the new Centre for Corpus Approaches to Social Science (CASS) will bring the latest techniques in linguistic analysis to bear on a range of questions in the social sciences.

For more details, see the full story.

December 2012:

Isis Forensics has secured significant investment from The North West Fund for Venture Capital and Lancashire County Council's Rosebud Fund. Founded in 2007 by its CEO, Dr James Walkerdine, and based in co-location facilities at InfoLab21, Isis Forensics is an international digital forensics company specialising in solutions that protect individuals and assist law enforcement with digital investigations. The company licenses language analysis software built by UCREL members. For more details, see the full story.

December 2012:

£700k Project to Boost Clinical Assessment Rates for Cognitive Decline

Currently, only 50% of people with dementia ever receive a diagnosis that could lead to them receiving medical care and support. So urgent is this problem that novel ways to persuade people to present themselves for clinical assessment are being sought. Lancaster University is leading a project to see if computer interaction can offer new opportunities for self-referral.

The £700k SAMS (Software Architecture for Mental health Self management) project, funded under the EPSRC Working Together call, will investigate the use of data and text-mining techniques, combined with adaptive user interfaces to detect early signs of cognitive decline from the way people use their computers.

The project is led from the School of Computing and Communications by Prof. Pete Sawyer, Dr Paul Rayson and Prof. Alistair Sutcliffe, and is run jointly with Prof. Alistair Burns, Dr Iracema Leroi and Prof. John Keane from Manchester University and Prof. Clive Ballard from King's College London. The project is supported by the Dementias and Neurodegenerative Diseases Research Network (DeNDRoN), the Alzheimer's Society, Microsoft Research, the University of British Columbia and Johns Hopkins University School of Medicine.

November 2012:

Applications are welcome for the following posts at the Department of Linguistics and English Language: Lecturer in Corpus Linguistics, Lecturer in Linguistics and English Language, and Lecturer in Second/Foreign Language Education.

In addition, two five-year research posts are available. These are to be based in the new, ESRC-funded Centre for Corpus Approaches to Social Science. The £3.5 million Centre, to be hosted by UCREL at Lancaster University under the direction of Tony McEnery and Andrew Hardie, will run from Easter 2013 for five years and has the goal of encouraging the uptake of corpus approaches across the social sciences. Shorter research contracts and visiting opportunities attached to the Centre will be advertised over the next five years. To find out more and to apply, visit: http://hr-jobs.lancs.ac.uk/vacancies.aspx?cat=248&type=6

October 2012:

The seventh international Corpus Linguistics conference (CL2013) will be held at Lancaster University from Tuesday 23rd July 2013 to Friday 26th July 2013. The main conference will be preceded by a workshop day on Monday 22nd July. For more details, see the conference website.

October 2012:

ICAME Journal call for submissions and subscriptions
The ICAME Journal is published annually by UCREL at Lancaster University in both electronic format and paper copy.

CALL FOR SUBMISSIONS:
Deadline for submissions: 1 December 2012
Deadline for reviews: 31 December 2012

The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue in 2013 or the following issue in 2014. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors: Merja Kyto (e-mail: merja.kyto@engelska.uu.se) or Anna-Brita Stenstrom (e-mail: ab.stenstrom@telia.com). Books for review and correspondence on reviews and abstracts should be sent to Ilka Mindt (e-mail: ilka.mindt@upb.de).
Date of publication of issue 37: May 2013

CALL FOR SUBSCRIPTIONS:
More information and previous issues (in PDF) are available on-line at the ICAME Journal website: http://icame.uib.no/journal.html
To subscribe to the ICAME Journal for the next issue, to be published in May 2013, please visit our secure on-line order form. Please note that a subscription to issue 37 (May 2013) is included in the registration for the ICAME 2013 conference organised by the University of Santiago de Compostela. Those who register for ICAME 2013 will automatically receive a copy at the conference.

October 2012:

Six academics from three faculties have started work on the ESRC-funded project called Metaphor in End-of-Life Care. For the next 17 months, they will study the metaphors used by patients, unpaid family carers and healthcare professionals in a 1.5-million-word data set. The way in which end-of-life care is talked about can shed light on people's views, needs, experiences and challenges, and identify areas where increased anxiety and misunderstanding can occur. Elena Semino, Veronika Koller, Andrew Hardie and Zsofia Demjen (Linguistics and English Language), Paul Rayson (Computing and Communications) and Sheila Payne (International Observatory on End of Life Care) will look at a large body of data from three different groups of people and reflect on any differences between the groups and the implications of the findings for providing end-of-life care. For enquiries, please contact: e.semino@lancaster.ac.uk.

October 2011:

Digital Humanities project awarded 1.5 million euro grant: Lancaster University has been awarded a European Research Council Starting Grant of 1.5 million euros for a five-year project which will act as a flagship programme for Digital Humanities research. Building upon Lancaster's international expertise in Corpus Linguistics and Geographical Information Systems (GIS), the project will develop methodologies for the automatic extraction of place names from large bodies of text, a process which will facilitate spatial interpretations of both historical events and imaginative representations of space and place.

February 2011:

The UCREL research centre is pleased to announce its first Summer School in Corpus Linguistics. This will take place on Wednesday 13th, Thursday 14th, and Friday 15th July 2011 (half-days Wednesday and Friday, full-day Thursday). The event is free to attend, but registration *in advance* is compulsory, as places are limited. We are now inviting applications from anyone interested in participating. For more information, see http://ucrel.lancs.ac.uk/summerschool/

October 2010:

UCREL's NLP tools have been featured twice this week at events in London.

First, they were shown at a "Science in the New Parliament" exhibition hosted by the Parliamentary Office of Science and Technology in collaboration with Research Councils UK. The Minister of State for Universities and Science, David Willetts MP, praised the instant feedback technology included in the Voice Your View project.

Second, they were demonstrated as part of the Isis toolkit at the Online Child Protection conference, where experts met to discuss ways in which technology can help keep children safe on the internet.

September 2010:

Call for subscriptions and submissions: ICAME Journal: Deadline for submissions: 1 December 2010. The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue or the following issue in 2012. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors. More information and previous issues (in PDF) are available on-line at the ICAME Journal website. To subscribe to the ICAME Journal for the next issue to be published in May 2011 please visit our secure on-line order form. Price for one issue is GBP30. Previous issues are still on sale via the same order form.

August 2010:

Call for abstracts: Workshop on Arabic Corpus Linguistics, Lancaster University, 11-12 April 2011.

February 2010:

Call for Papers: Text-mining in the Digital Humanities: The Interface between Conceptual History, Critical Discourse Analysis and Corpus Linguistics. (Lancaster University, Thu 13 - Fri 14 May 2010)

The aim of this interdisciplinary workshop is to explore the potential for collaboration between researchers in Critical Discourse Analysis (CDA), Corpus Linguistics (CL) and Conceptual History (CH), the study of key socio-political concepts in their historical context (see http://www.concepta-net.org/conceptual_history).

February 2010:

Call for participation: BAAL event: Gender and Corpus Linguistics, Tuesday March 30th 2010, Lancaster University.

Corpus linguistics involves the use of large collections of naturally occurring texts, encoded in electronic form so that they can be analysed with the help of computer software to extract linguistic patterns. In recent years, this approach has begun to be utilised by linguists who are interested in analysing issues relating to gender. This BAAL event aims to bring together researchers working in the field, as well as showcasing recent corpus-based gender research, and enabling debate on best practice.

Invited speakers include Bandar Al-Hejin (Lancaster), Cinzia Bevitori (Bologna), Tony McEnery (Lancaster), Brona Murphy (Edinburgh), Louise Mullany (Nottingham), Michael Pearce (Sunderland), Costas Gabrielatos (Lancaster) and Eivind Torgersen (Lancaster).

For more details, see the BAAL Gender and Language SIG pages.

January 2009:

Call for subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL. To subscribe to the ICAME Journal for the next issue published in May 2009 please visit our secure on-line order form that offers credit card, PayPal, fax, and phone order options.

September 2008:

Call for papers: ICAME-30, 27-31 May 2009, Lancaster, UK. See the first circular at the conference website.

June 2008:

Computer experts are now harnessing new developments in language analysis to identify paedophiles posing as children in online chat rooms, to pick up on their vocabulary choices and to trail them as they move around the internet. Led by Professor Awais Rashid (Computing), Lancaster, Swansea and Middlesex Universities have joined forces with specialist UK law enforcement to develop tools to identify paedophiles masquerading as children in online chat rooms. This month the University launched Project Isis - a three-year Child Protection Initiative which aims to develop new tools for policing websites and supporting law enforcement, funded by the Engineering and Physical Sciences Research Council and the Economic and Social Research Council.

The call for papers for the Corpus Linguistics 2009 conference is now available. Please visit the conference website for details.

January 2008:

The next CLARET workshop will take place at Lancaster University on March 31 and April 1 2008.

December 2007:

Call for subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL. To subscribe to the ICAME Journal for the next issue published in May 2008 please visit our secure on-line order form that offers credit card, fax, post, phone and purchase order options.

November 2007:

New issue of the Empirical Text and Culture Research Journal published.

September 2007:

Corpus Linguistics Advanced Research Education and Training (CLARET) has been funded by the AHRC. The first workshop will take place at Liverpool University on 29th-30th November 2007.

July 2007:

New project funded by the AHRC: Corpus-based grammar in contrast (CORGRAM).

November 2006:

Call for papers: the Corpus Linguistics 2007 conference will be held at the University of Birmingham, 27-30 July 2007.

September 2006:

Call for submissions and subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL. To subscribe to the ICAME Journal for the next issue published in May 2007 please visit our secure on-line order form that offers credit card, fax, post, phone and purchase order options. Price for one issue is USD52, around 43 Euros or GBP30, depending on the current exchange rate. Issue 30 is still on sale via the issue 30 order form.

August 2006:

Call for submissions to two journals:

June 2006:

Call for participation: workshop on Historical Text Mining. July 20th and 21st, Lancaster University, UK. For more details, see the workshop webpage: http://ucrel.lancs.ac.uk/events/htm06/

January 2006:

We are now taking orders for issue 30 of the ICAME Journal to be published in the spring of 2006. Please visit the secure on-line order page.

December 2005:

Call for papers: Third International Workshop on Language Resources for Translation Work, Research & Training.
A Satellite Event of LREC 2006 (5th Language Resources and Evaluation Conference).
Date: 28th May 2006.
Venue: Magazzini del Cotone Conference Center, Genoa, Italy.

November 2005:

Call for papers: EACL 2006 Workshop on Multi-word-expressions in a multilingual context. April 3rd 2006, Trento, Italy. http://ucrel.lancs.ac.uk/EACL06MWEmc/

October 2005:

Announcing a new journal: Corpora
Corpora is a new journal focusing on the many and varied uses of corpora both in linguistics and beyond. The journal accepts articles presenting research findings based on the exploitation of corpora as well as accounts of corpus building, corpus tool construction and corpus annotation schemes. The journal will be published by Edinburgh University Press. For more details, see the Corpora Journal Home Page.

August 2005:

New project funded by the Leverhulme Trust entitled "Changing English Across the 20th Century: a corpus-based study". The main aim of the research is to investigate areas of change in grammatical usage in 20th Century British English, focussing on the verb phrase. The study will be based on a package of four corpora sampled at regular intervals: 1991, 1961, 1931 and 1901. Two sub-goals are to:
a) Compile a new corpus of British English called Lancaster1901, focussing on the beginning of the twentieth century.
b) Enhance the encoding and annotation of Lancaster1901 and the three existing corpora (Lancaster1931, LOB and FLOB), and release the enhancements to the academic community.

For more information see the Lancaster University press release and the project details.

April 2005:

New project funded by the EPSRC: Assist (Automated Semantic Assistance for Translators) aims to address the problem of providing contextual examples of translation equivalents for words from the general lexicon. We will employ comparable corpora and an existing semantic field annotation system for English, and develop a new semantic field tagger for Russian.

New project funded by the British Academy: Scragg revisited - a quantitative investigation of spelling variation across the centuries.

October 2004:

Lancaster is involved in the AHRB ICT Methods Network; see the AHRB press release and the network site at CCH for more details.

September 2004:

ELRA have announced that new Written Language Resources are available in their catalogue. You will find below their short descriptions. Please visit their on-line catalogue to get more detailed information: www.elda.fr and www.elra.info

*** ELRA-W0037 The EMILLE/CIIL Corpus ***

The EMILLE/CIIL Corpus consists of monolingual corpora containing approximately 92,799,000 words for 14 South Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu), including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu, together with a parallel corpus of 200,000 words in English with translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. Annotations include Urdu monolingual and parallel corpora annotated for parts-of-speech, and 20 written Hindi corpus files annotated to show the nature of demonstrative use. All other components are annotated at the sentence level. The corpus is marked up using CES-compliant SGML and encoded using Unicode.

*** ELRA-W0038 The EMILLE Lancaster Corpus ***

The EMILLE Lancaster Corpus consists of monolingual corpora containing approximately 58,880,000 words for seven South Asian languages (Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil and Urdu), including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu, together with a parallel corpus of 200,000 words in English with translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. Annotations include Urdu monolingual and parallel corpora annotated for parts-of-speech, and 20 written Hindi corpus files annotated to show the nature of demonstrative use. All other components are annotated at the sentence level. The corpus is marked up using CES-compliant SGML and encoded using Unicode.

*** ELRA-W0039 The Lancaster Corpus of Mandarin Chinese (LCMC) ***

The Lancaster Corpus of Mandarin Chinese (LCMC) samples 15 written text categories, including news, literary texts, academic prose and official documents, published in P. R. China in the early 1990s, for a total of approximately 1 million words. LCMC uses the same sampling frame and period as FLOB/FROWN. The corpus is encoded in Unicode (UTF-8) and marked up in XML.

August 2004:

CALL FOR PRE-CONFERENCE WORKSHOP PROPOSALS
Deadline: December 3rd, 2004
Corpus Linguistics 2005, Birmingham, July 14th-17th
Organisers: University of Birmingham and University of Lancaster
http://www.corpus.bham.ac.uk/conference

Proposals are invited for pre-conference workshops on July 14th at the University of Birmingham. The conference, Corpus Linguistics 2005, is run jointly by the universities of Birmingham and Lancaster, and is the third biennial conference in the series on Corpus Linguistics. The workshops and the conference will be held at the University of Birmingham between July 14th and 17th 2005.

May 2004:

New project funded by the Andrew W. Mellon Foundation: WordHoard applies to highly canonical literary texts the insights and techniques of corpus linguistics.

January 2004:

Release of the Lancaster Corpus of Mandarin Chinese, a Mandarin Chinese match for the FLOB and FROWN corpora. The corpus is part-of-speech tagged and available, free of charge, for use in non-profit making research.

Release of the EMILLE/CIIL corpus. The corpus contains monolingual written corpus data for 14 South Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu). It also contains orthographically transcribed spoken data and parallel corpus data for five South Asian languages (Bengali, Gujarati, Hindi, Punjabi and Urdu). In addition, the parallel corpus contains the English originals from which the translations stored in the corpus were derived. All data in the corpus is CES and Unicode compliant. The EMILLE corpus totals some 94 million words. The corpora were built as part of a collaboration between Lancaster University and the Central Institute of Indian Languages, Mysore.

December 2003:

Call for papers: 5th International Conference on Discourse Anaphora and Anaphor Resolution (DAARC2004)

November 2003:

New project: the Leverhulme Corpus Project plans to build a corpus which matches as closely as possible the LOB and FLOB corpora of written British English, except that the year of data collection is 1931, or near to that date (+/- 3 years).

2nd call for papers: the sixth Teaching And Language Corpora conference (TaLC 2004)

Local Corpus research group meetings will continue this term on Mondays at 4pm in B81, Bowland.

September 2003:

New books containing a selection of papers from the CL2001 conference:


Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) Corpus Linguistics by the Lune: a festschrift for Geoffrey Leech. Peter Lang, Frankfurt. (Volume 8 in the Lodz studies in Language Series edited by Lewandowska-Tomaszczyk, B. and Melia, P. J.) ISBN 3-631-50952-2


Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) A Rainbow of Corpora: Corpus Linguistics and the Languages of the World. Lincom-Europa, München. ISBN 3 89586 872 8. Linguistics Edition 40. 174 pp.


August 2003:

Conference Announcement: The sixth Teaching And Language Corpora conference (TaLC 2004)

CLAWS part-of-speech tagger free web trial extended to 10,000 words for academic users.

March 2003:

Initial release of the Lancaster Newsbooks Corpus

EMILLE Corpus Beta version released

Recent conference: Corpus Linguistics 2003 (CL2003)
Bookshelf Book Series Announcement: Routledge Advances in Corpus Linguistics

   
UCREL,
Lancaster University,
Lancaster,
LA1 4WA.
Tel: +44 1524 510357 Fax: +44 1524 510492
email: ucrel at lancaster.ac.uk

Lancaster University approved pages, maintained by Paul Rayson, Lancaster University, UK.
All material in these pages © 1993-2014 UCREL, Lancaster University.