UCREL Home Page
University Centre for Computer Corpus Research on Language
'LEADING THE WAY IN CORPUS-BASED NLP RESEARCH'
UCREL is a research centre of Lancaster University.
Lancaster University's world-renowned language pioneers, spanning four generations of researchers, are to receive The Queen's Anniversary Prize for Higher and Further Education.
The award for Lancaster's Centre for Corpus Approaches to Social Science (CASS) was announced at a reception at St James's Palace on Thursday evening (November 19).
The Queen's Anniversary Prizes are awarded every two years to universities and colleges who submit work judged to show excellence, innovation, impact and benefit for the institution itself and for people and society generally in the wider world.
Researchers at the Centre, funded by the Economic and Social Research Council, charitable trusts and other research councils, have provided valuable insights into the understanding of language by using computers to analyse billions of words - in writing, speech and online - for the past 45 years.
The work has resulted in a huge range of important, 'real world' applications such as vastly improved dictionaries and has also influenced policy towards important issues in society such as online aggression, hate speech and the way in which end of life care is discussed.
By providing fresh perspectives on such problems, CASS, part of the longer-standing University Centre for Computer Corpus Research on Language, has helped develop new approaches to challenging such practices, both by raising awareness and by informing policy makers and other stakeholders of how language may be used to inform, manipulate, wound and offend.
Computers have enabled the centre, which draws staff from nine departments across campus, to analyse massive datasets of language to account for the changing patterns of use of written and spoken language in everyday contexts.
CORPORA AND DISCOURSE Conference - Siena 30 June - 02 July 2016 - Call for papers
For more details, see the conference website.
The #corpusMOOC starts on 28th September, offering a practical introduction to the methodology of corpus linguistics for researchers in social sciences and humanities.
Corpus Linguistics 2015: In honour of the life and work of Geoffrey Leech
Call for Papers and Pre-Conference Workshops
The eighth international Corpus Linguistics conference (CL2015) will be held at Lancaster University from Tuesday 21st July 2015 to Friday 24th July 2015. The main conference will be preceded by a workshop day on Monday 20th July.
This series of conferences began in 2001 with an event celebrating the career of Professor Geoffrey Leech, on the occasion of his retirement. In August of 2014, we reported with great sadness Geoff's sudden death.
By dedicating this eighth conference in the Corpus Linguistics series once again to a celebration of Geoff's life, his career, and his truly remarkable influence on the field, we once more pay tribute to, and commemorate, a remarkable intellect and a sorely-missed colleague and friend.
For more details, see the conference website.
In memory: Professor Geoffrey Leech
It is with great sorrow that we report the death on 19th August of Professor Geoffrey Leech.
Geoff was not only the founder of the UCREL research centre for corpus linguistics at Lancaster University, he was also the first Professor and founding Head of the Department of Linguistics and English Language. His contributions to linguistics - not only in corpus linguistics, but also in English grammar, pragmatics and stylistics - were immense.
All our thoughts are with Geoff's wife Fanny, and with his family. It is still hard for us to find the right words at this time. For many of us he was an inspirational teacher and mentor, but for all of us, he was a kind and generous friend.
There is a webpage where tributes, messages and memories can be posted: http://wp.lancs.ac.uk/geoffreyleech/
UCREL Summer School in Corpus Linguistics, Lancaster University, UK, 15th to 18th July 2014.
This month sees the start of #corpusMOOC, a Massive Open Online Course, led by Tony McEnery and featuring other tutors from UCREL and CASS in Lancaster. "Corpus linguistics: method, analysis, interpretation" can be followed by signing up to the course hosted by FutureLearn.
Season's greetings from everyone in UCREL at Lancaster University. As part of the University's short films from across campus there's a video with Chris Donaldson from the Spatial Humanities project talking about a Dickens ghost story with a Lancaster twist.
ICAME Journal: call for submissions and subscriptions
The ICAME Journal is published annually by UCREL at Lancaster University in both electronic format and paper copy.
CALL FOR SUBMISSIONS:
Deadline for submissions: 1 December 2013
Deadline for reviews: 31 December 2013
Teaching and Language Corpora conference (TaLC 2014): First Call for Papers and Workshops
The next TaLC conference will take place in 2014 at Lancaster University, UK. The TaLC series of conferences was inaugurated in 1994. For this 20th anniversary event, we are delighted to welcome the eleventh TaLC back to Lancaster, the original host institution. TaLC 11 will run from Monday 21st to Wednesday 23rd July (inclusive), with a pre-conference workshop day on Sunday 20th July. The call can be found on the conference website http://ucrel.lancs.ac.uk/talc2014
UCREL Summer School in Corpus Linguistics
The programme consists of a series of intensive two-hour sessions, some involving practical work, others more discussion-oriented. The UCREL Summer School is intended primarily for postgraduate research students (and secondarily for Masters-level students and postdoctoral researchers) who require in-depth knowledge of corpus-based methodologies for their degree projects. It is not aimed at raw beginners, but rather at PhD students who have at least some introductory experience of analysis using language corpora, and who wish to expand their knowledge of key issues and techniques in cutting-edge corpus research.
For more details, see the website for the summer school.
Developing new approaches to the study of hate speech, exploring how people talk about climate change and looking at how changes in corporate governance are communicated will be part of the remit of a new £3.5m research hub at Lancaster University, which will study the use and manipulation of language in society. Funded by the Economic and Social Research Council (ESRC), the new Centre for Corpus Approaches to Social Science (CASS) will bring the latest techniques in linguistic analysis to bear on a range of questions in the social sciences.
For more details, see the full story.
Isis Forensics has secured significant investment from The North West Fund for Venture Capital and Lancashire County Council's Rosebud Fund. Isis Forensics was founded by CEO Dr James Walkerdine in 2007 and is based in co-location facilities at InfoLab21. It is an international digital forensics company which specialises in developing solutions to protect individuals and assist law enforcement with digital investigations. The company licenses language analysis software built by UCREL members. For more details, see the full story.
£700k Project to Boost Clinical Assessment Rates for Cognitive Decline
Currently, only 50% of people with dementia ever receive a diagnosis that could lead to them receiving medical care and support. So urgent is this problem that novel ways to persuade people to present themselves for clinical assessment are being sought. Lancaster University is leading a project to see if computer interaction can offer new opportunities for self-referral.
The £700k SAMS (Software Architecture for Mental health Self management) project, funded under the EPSRC Working Together call, will investigate the use of data and text-mining techniques, combined with adaptive user interfaces to detect early signs of cognitive decline from the way people use their computers.
The project is led from the School of Computing and Communications by Prof. Pete Sawyer, Dr. Paul Rayson and Prof. Alistair Sutcliffe, and is joint with Prof. Alistair Burns, Dr Iracema Leroi and Prof. John Keane from Manchester University and Prof. Clive Ballard from King's College London. The project is supported by the Dementias and Neurodegenerative Diseases Research Network (DeNDRoN), The Alzheimer's Society, Microsoft Research, the University of British Columbia and Johns Hopkins University School of Medicine.
Applications are welcome for the following posts at the Department of Linguistics and English Language: Lecturer in Corpus Linguistics, Lecturer in Linguistics and English Language, and Lecturer in Second/Foreign Language Education.
In addition, two five-year research posts are available. These are to be based in the new, ESRC-funded Centre for Corpus Approaches to Social Science. The £3.5 million Centre, to be hosted by UCREL at Lancaster University under the direction of Tony McEnery and Andrew Hardie, will run from Easter 2013 for five years and has the goal of encouraging the uptake of corpus approaches across the social sciences. Shorter research contracts and visiting opportunities attached to the Centre will be advertised over the next five years. To find out more and to apply, visit: http://hr-jobs.lancs.ac.uk/vacancies.aspx?cat=248&type=6
The seventh international CORPUS LINGUISTICS conference (CL2013) will be held at Lancaster University from Tuesday 23rd July 2013 to Friday 26th July 2013. The main conference will be preceded by a workshop day on Monday 22nd July. For more details, see the conference website.
ICAME Journal call for submissions and subscriptions
Six academics from three faculties have started work on the ESRC-funded project called Metaphor in End-of-Life Care. For the next 17 months, they will study the metaphors used by patients, unpaid family carers and healthcare professionals in a 1.5-million-word data set. The way in which end-of-life care is talked about can shed light on people's views, needs, experiences and challenges, and identify areas where increased anxiety and misunderstanding can occur. Elena Semino, Veronika Koller, Andrew Hardie and Zsofia Demjen (Linguistics and English Language), Paul Rayson (Computing and Communications) and Sheila Payne (International Observatory on End of Life Care) will look at a large body of data from three different groups of people and reflect on any differences between the groups and the implications of the findings for providing end-of-life care. For enquiries, please contact: firstname.lastname@example.org.
Digital Humanities project awarded 1.5 million Euro grant: Lancaster University has been awarded a European Research Council Starting Grant of 1.5 million Euros for a five-year project which will act as a flagship programme for Digital Humanities research. Building upon Lancaster's international expertise in Corpus Linguistics and Geographical Information Systems (GIS), the project will develop methodologies for the automatic extraction of place names from large bodies of text, a process which will facilitate spatial interpretations of both historical events and imaginative representations of space and
The UCREL research centre is pleased to announce its first Summer School in Corpus Linguistics. This will take place on Wednesday 13th, Thursday 14th, and Friday 15th July 2011 (half-days Wednesday and Friday, full-day Thursday). The event is free to attend, but registration *in advance* is compulsory, as places are limited. We are now inviting applications from anyone interested in participating. For more information, see http://ucrel.lancs.ac.uk/summerschool/
UCREL's NLP tools have been featured twice this week at events in London.
First, at a "Science in the New Parliament" exhibition hosted by the Parliamentary Office for Science and Technology in collaboration with Research Councils UK. The Minister of State for Universities and Science David Willetts MP praised instant feedback technology included in the Voice Your View project. (More details ...)
Second, as part of the Isis toolkit which was demonstrated at the Online Child Protection conference where experts met to discuss ways in which technology can help keep children safe on the internet. (More details ...)
Call for subscriptions and submissions: ICAME Journal: Deadline for submissions: 1 December 2010. The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue or the following issue in 2012. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors. More information and previous issues (in PDF) are available on-line at the ICAME Journal website. To subscribe to the ICAME Journal for the next issue to be published in May 2011 please visit our secure on-line order form. Price for one issue is GBP30. Previous issues are still on sale via the same order form.
Call for abstracts: Workshop on Arabic Corpus Linguistics, Lancaster University, 11-12 April 2011.
Call for Papers: Text-mining in the Digital Humanities: The Interface between Conceptual History, Critical Discourse Analysis and Corpus Linguistics. (Lancaster University Thu 13 - Fri 14 May 2010)
The aim of this interdisciplinary workshop is to explore the potential for collaboration between researchers in Critical Discourse Analysis (CDA), Corpus Linguistics (CL) and Conceptual History (CH), the study of key socio-political concepts in their historical context (see http://www.concepta-net.org/conceptual_history).
Call for participation: BAAL event: Gender and Corpus Linguistics, Tuesday March 30th 2010, Lancaster University.
Corpus linguistics involves the use of large collections of naturally occurring texts, encoded in electronic form so that they can be analysed with the help of computer software to extract linguistic patterns. In recent years, this approach has begun to be utilised by linguists who are interested in analysing issues relating to gender. This BAAL event aims to bring together researchers working in the field, as well as showcasing recent corpus-based gender research, and enabling debate on best practice.
Invited speakers include Bandar Al-Hejin (Lancaster), Cinzia Bevitori (Bologna), Tony McEnery (Lancaster), Brona Murphy (Edinburgh), Louise Mullany (Nottingham), Michael Pearce (Sunderland), Costas Gabrielatos (Lancaster) and Eivind Torgersen (Lancaster).
For more details, see the BAAL Gender and Language SIG pages.
Call for subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL. To subscribe to the ICAME Journal for the next issue published in May 2009 please visit our secure on-line order form that offers credit card, PayPal, fax, and phone order options.
Call for papers: ICAME-30 27-31 May 2009, Lancaster, UK. See the first circular at the conference website.
Computer experts are now harnessing new developments in language analysis to identify paedophiles posing as children in online chat rooms, to pick up on their vocabulary choices and trail them as they move around the internet. Led by Professor Awais Rashid (Computing), Lancaster, Swansea and Middlesex Universities have joined forces with specialist UK law enforcement to develop tools to identify paedophiles masquerading as children in online chat rooms. This month the University launched Project Isis - a three-year Child Protection Initiative which aims to develop new tools for policing websites and supporting law enforcement, funded by the Engineering and Physical Sciences Research Council and the Economic and Social Research Council.
The call for papers for the Corpus Linguistics 2009 conference is now available. Please visit the conference website for details.
The next CLARET workshop will take place at Lancaster University on March 31 and April 1 2008.
Call for subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL. To subscribe to the ICAME Journal for the next issue published in May 2008 please visit our secure on-line order form that offers credit card, fax, post, phone and purchase order options.
New issue of the Empirical Text and Culture Research Journal published.
Corpus Linguistics Advanced Research Education and Training (CLARET) funded by the AHRC. The first workshop will take place at Liverpool University on 29-30th November 2007.
New project funded by the AHRC: Corpus-based grammar in contrast (CORGRAM).
Call for papers: the Corpus Linguistics 2007 conference will be held at the University of Birmingham, 27-30 July 2007.
Call for submissions and subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL. To subscribe to the ICAME Journal for the next issue published in May 2007 please visit our secure on-line order form that offers credit card, fax, post, phone and purchase order options. Price for one issue is USD52, around 43 Euros or GBP30 dependent on the current exchange rate. Issue 30 is still on sale via the issue 30 order form.
Call for submissions to two journals:
Call for participation: workshop on Historical Text Mining. July 20th and 21st, Lancaster University, UK. For more details, see the workshop webpage: http://ucrel.lancs.ac.uk/events/htm06/
Call for papers: Third International Workshop on Language Resources for Translation Work, Research & Training.
Call for papers: EACL 2006 Workshop on Multi-word-expressions in a multilingual context. April 3rd 2006, Trento, Italy. http://ucrel.lancs.ac.uk/EACL06MWEmc/
Announcing a new journal: Corpora
New project funded by the Leverhulme Trust entitled "Changing English Across the 20th Century: a corpus-based study". The main aim of the research is to carry out an investigation of areas of change in grammatical usage in 20th Century British English, focussing on the verb phrase. The study will be based on a package of four corpora sampled at regular intervals: 1991 – 1961 – 1931 – 1901. Two sub-goals are to: a) Compile a new corpus of British English called Lancaster1901 focussing on the beginning of the twentieth century. b) Enhance the encoding and annotation of Lancaster1901 and the three existing corpora (Lancaster1931, LOB and FLOB), and release the enhancements to the academic community.
New project funded by the EPSRC: Assist (Automated Semantic Assistance for Translators) aims to address the problem of providing contextual examples of translation equivalents for words from the general lexicon. We will employ comparable corpora and an existing semantic field annotation system for English, and develop a new semantic field tagger for Russian.
New project funded by the British Academy: Scragg revisited - a quantitative investigation of spelling variation across the centuries.
ELRA have announced that new Written Language Resources are available in their catalogue. You will find below their short descriptions. Please visit their on-line catalogue to get more detailed information: www.elda.fr and www.elra.info
*** ELRA-W0037 The EMILLE/CIIL Corpus ***
The EMILLE/CIIL Corpus consists of monolingual corpora containing approximately 92,799,000 words for 14 South Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telegu and Urdu) (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu), a parallel corpus of 200,000 words in English with translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. Annotations include Urdu monolingual and parallel corpora annotated for parts-of-speech, and 20 written Hindi corpus files annotated to show the nature of demonstrative use. All other components are annotated at the sentence level. The corpus is marked up using CES-compliant SGML and encoded using Unicode.
*** ELRA-W0038 The EMILLE Lancaster Corpus ***
The EMILLE Lancaster Corpus consists of monolingual corpora containing approximately 58,880,000 words for seven South Asian languages (Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil and Urdu) (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu), a parallel corpus of 200,000 words in English with translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. Annotations include Urdu monolingual and parallel corpora annotated for parts-of-speech, and 20 written Hindi corpus files annotated to show the nature of demonstrative use. All other components are annotated at the sentence level. The corpus is marked up using CES-compliant SGML and encoded using Unicode.
*** ELRA-W0039 The Lancaster Corpus of Mandarin Chinese (LCMC) ***
The Lancaster Corpus of Mandarin Chinese (LCMC) samples 15 written text categories, including news, literary texts, academic prose and official documents, published in P. R. China in the early 1990s, for a total of approximately 1 million words. The same sampling frame and period as FLOB/FROWN were used in LCMC. The corpus is encoded in Unicode (UTF-8) and marked up in XML.
CALL FOR PRE-CONFERENCE WORKSHOP PROPOSALS. Deadline: December 3rd, 2004. Corpus Linguistics 2005, Birmingham, July 14th-17th. Organisers: University of Birmingham and University of Lancaster. http://www.corpus.bham.ac.uk/conference Proposals are invited for pre-conference workshops on July 14th at the University of Birmingham. The conference, Corpus Linguistics 2005, is run jointly by the universities of Birmingham and Lancaster, and is the third biennial conference in the series on Corpus Linguistics. The workshops and the conference will be held at the University of Birmingham between July 14th-17th 2005.
New project funded by the Andrew W. Mellon Foundation: WordHoard applies to highly canonical literary texts the insights and techniques of corpus linguistics.
Release of the Lancaster Corpus of Mandarin Chinese, a Mandarin Chinese match for the FLOB and FROWN corpora. The corpus is part-of-speech tagged and available, free of charge, for use in non-profit making research.
Release of the EMILLE/CIIL corpus. The corpus contains monolingual written corpus data for 14 South Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telegu and Urdu). It also contains orthographically transcribed spoken data and parallel corpus data for five South Asian languages (Bengali, Gujarati, Hindi, Punjabi and Urdu). In addition, the parallel corpus contains the English originals from which the translations stored in the corpus were derived. All data in the corpus is CES and Unicode compliant. The EMILLE corpus totals some 94 million words. The corpora were built as part of a collaboration between Lancaster University and the Central Institute of Indian Languages, Mysore.
New project: the Leverhulme Corpus Project plans to build a corpus which matches as closely as possible the LOB and FLOB corpora of written British English, except that the year of data collection is 1931, or near to that date (+/- 3 years).
2nd call for papers: The sixth Teaching And Language Corpora conference (TaLC 2004)
Local Corpus research group meetings will continue this term on Mondays at 4pm in B81, Bowland.
New books containing a selection of papers from the CL2001 conference:
Conference Announcement: The sixth Teaching And Language Corpora conference (TaLC 2004)
CLAWS part-of-speech tagger free web trial extended to 10,000 words for academic users.
Tel: +44 1524 510357 Fax: +44 1524 510492
email: ucrel at lancaster.ac.uk
Lancaster University approved pages maintained by Paul Rayson Lancaster University, UK.
All material in these pages © 1993-2014 UCREL, Lancaster University.