News:
March 2016:
This month, UCREL researchers begin a new collaborative project called
CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes, the National Corpus of
Contemporary Welsh).
This project will pioneer a community-driven approach to linguistic corpus
construction. It is an interdisciplinary, collaborative project led by the
School of English, Communication and Philosophy at Cardiff University.
The £1.8m project commenced on 1st March 2016 and is funded by
the Economic and Social Research Council (ESRC) and the Arts and
Humanities Research Council (AHRC).
January 2016:
UCREL researchers have had four papers accepted for one of the top computational
conferences, the 10th edition of the Language Resources and Evaluation Conference, Portoroz (Slovenia).
The full list of the accepted papers is online at the conference website.
- OSMAN - A Novel Arabic Readability Metric.
Mahmoud El-Haj and Paul Rayson
- Learning Tone and Attribution for Financial Text Mining.
Mahmoud El-Haj, Paul Rayson, Steve Young, Andrew Moore, Martin Walker, Thomas Schleicher and Vasiliki Athanasakou
- UPPC - Urdu Paraphrase Plagiarism Corpus.
Muhammad Sharjeel, Paul Rayson and Rao Muhammad Adeel Nawab
- Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages.
Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez,
Dawn Knight, Michal Kren, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh and Olga Mudraya
January 2016:
We regret to announce the death of Dr Richard Xiao
(Linguistics and English Language) on Saturday 2 January 2016. Richard
started his career in China and first came to Lancaster to study towards
his PhD, which he received in 2002. He stayed on as a Research Associate
until 2007, and returned as a Lecturer in 2012, after working at UCLAN
and Edge Hill. In 2014, he took early retirement due to ill health,
having been promoted first to Senior Lecturer and then to Reader. In
those two years, Richard introduced Chinese at both UG and PG levels,
acted for a year as Director of the Confucius Institute, published many
papers and several books, supervised several PhD students, and received
funding from the British Academy and the ESRC. His premature death is a
huge loss to Corpus Linguistics, Chinese Linguistics and Translation
Studies. He will be much missed by colleagues in the Department of
Linguistics and English Language, the Lancaster Confucius Institute, and
around the world. Richard leaves his wife (Lyn) and daughter (Yina). His
funeral was held in the Chaplaincy Centre on Friday 15 January 2016
at 12.30pm.
A Collection of Essays and Poems in Memory of Richard Xiao
has been produced by his friends and colleagues, edited by Professor Hu.
January 2016:
Lancaster Summer Schools
in Corpus Linguistics and other Digital methods.
Lancaster University, UK - 12th to 15th July 2016.
Call for participation - apply now!!
November 2015:
Royal approval for Lancaster University linguistics centre
Lancaster University's world-renowned language pioneers, spanning four generations of researchers, are to receive The Queen's Anniversary Prize for Higher and Further Education.
The award for Lancaster's Centre for Corpus Approaches to Social Science (CASS) was announced at a reception at St James's Palace on Thursday evening (November 19).
The Queen's Anniversary Prizes are awarded every two years to universities and colleges who submit work judged to show excellence, innovation, impact and benefit for the institution itself and for people and society generally in the wider world.
Researchers at the Centre, funded by the Economic and Social Research Council, charitable trusts and other research councils, have provided valuable insights into the understanding of language by using computers to analyse billions of words - in writing, speech and online - for the past 45 years.
The work has resulted in a huge range of important, 'real world' applications such as vastly improved dictionaries and has also influenced policy towards important issues in society such as online aggression, hate speech and the way in which end of life care is discussed.
By providing fresh perspectives to such problems, CASS, part of the longer-standing University Centre for Computer Corpus Research on Language, has helped develop new approaches to challenging practices both in terms of raising awareness and of informing policy makers and other stakeholders of how such language may be used to inform, manipulate, wound and offend.
Computers have enabled the centre, which draws staff from nine departments across campus, to analyse massive datasets of language to account for the changing patterns of use of written and spoken language in everyday contexts.
October 2015:
CORPORA AND DISCOURSE Conference - Siena 30 June - 02 July 2016 - Call for papers
Corpus-based and corpus-assisted discourse studies (CADS) investigate the employment of corpus techniques to
shed light on aspects of language used for communicative purposes or, put another way, to analyse how
language is used to (attempt to) influence the beliefs and behaviour of other people.
The conference will take place at Siena University Pontignano Conference Centre on June 30-July 2, 2016 and
will feature a Festival of Methods workshop (June 30th) convened by Charlotte Taylor, Tony McEnery, Vaclav
Brezina, in which we explore the effects of our choice of tools, methods and approaches. Ahead of each
conference in the series, a task will be set which researchers are invited to tackle and then time will be set
aside at the conference itself for presentation of findings and extended discussion on the kinds of analyses
which were developed. We are calling this new kind of panel event the Festival of Methods because we hope it
will be an engaging exploration and celebration of the range of methods we have at our disposal. This kind of
activity follows on from inter-researcher and objectivity/subjectivity studies such as Marchi & Taylor
(2009), Baker (2011) and Baker & Levon (2015), but also draws on the traditions of the shared task in
computational linguistics where conference participants are given the chance to all work on the same data
with the same research question.
For more details, see the conference website.
September 2015:
The #corpusMOOC starts on 28th September
offering a practical introduction to the methodology of corpus linguistics for researchers in social sciences and humanities.
December 2014:
Corpus Linguistics 2015: In honour of the life and work of Geoffrey Leech
Call for Papers and Pre-Conference Workshops
The eighth international Corpus Linguistics conference (CL2015) will be held at Lancaster University from Tuesday 21st July 2015 to Friday 24th July 2015. The main conference will be preceded by a workshop day on Monday 20th July.
This series of conferences began in 2001 with an event celebrating the career of Professor Geoffrey Leech, on the occasion of his retirement. In August of 2014, we reported with great sadness Geoff's sudden death.
By dedicating this eighth conference in the Corpus Linguistics series once again to a celebration of Geoff's life, his career, and his truly remarkable influence on the field, we once more pay tribute to, and commemorate, a remarkable intellect and a sorely-missed colleague and friend.
For more details, see the conference website.
August 2014:
In memory: Professor Geoffrey Leech
It is with great sorrow that we report the death on 19th August of
Professor Geoffrey Leech.
Geoff was not only the founder of the UCREL research centre for corpus
linguistics at Lancaster University, he was also the first Professor and
founding Head of the Department of Linguistics and English Language. His
contributions to linguistics - not only in corpus linguistics, but also
in English grammar, pragmatics and stylistics - were immense.
All our thoughts are with Geoff's wife Fanny, and with his family.
It is still hard for us to find the right words at this time. For many
of us he was an inspirational teacher and mentor, but for all of us, he
was a kind and generous friend.
There is a webpage where tributes, messages and memories can be posted:
http://wp.lancs.ac.uk/geoffreyleech/
April 2014:
UCREL Summer School in Corpus Linguistics,
Lancaster University, UK.
15th to 18th July 2014.
Call for Participation:
The UCREL Summer School 2014 is the fourth in a highly successful series
of free-to-attend training events that began in 2011.
Sponsored by UCREL at Lancaster University - one of the world's leading
and longest-established centres for corpus-based research - its aim is
to support students of language and linguistics in the development of
advanced skills in corpus methods.
The UCREL Summer School is intended primarily for postgraduate research
students (and secondarily for Masters-level students, postdoctoral
researchers, and others) who require in-depth knowledge of corpus-based
methodologies for their degree projects. It is not aimed at raw
beginners, but rather at students who have at least some introductory
experience of analysis using language corpora, and who wish to expand
their knowledge of key issues and techniques in cutting-edge corpus
research.
January 2014:
This month sees the start of #corpusMOOC, a Massive Open Online Course, led
by Tony McEnery and featuring other tutors from UCREL and CASS in Lancaster.
"Corpus linguistics: method, analysis, interpretation" can be followed by
signing up to the course hosted by FutureLearn.
December 2013:
Season's greetings from everyone in UCREL at Lancaster University.
As part of the University's short films from across campus
there's a video with Chris Donaldson from the
Spatial Humanities project
talking about a Dickens ghost story with a Lancaster twist.
October 2013:
ICAME Journal: call for submissions and subscriptions
The ICAME Journal is published annually by UCREL at Lancaster University in both electronic format and paper copy.
CALL FOR SUBMISSIONS:
Deadline for submissions: 1 December 2013
Deadline for reviews: 31 December 2013
The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue in 2014 or the following issue in 2015. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors:
Merja Kyto (e-mail: merja.kyto@engelska.uu.se) and
Anna-Brita Stenstrom (e-mail: ab.stenstrom@telia.com)
Books for review and correspondence on reviews and abstracts should be sent to:
Ilka Mindt (e-mail: ilka.mindt@upb.de)
Date of publication of issue 38: May 2014
CALL FOR SUBSCRIPTIONS:
More information and previous issues (in PDF) are available on-line at the ICAME Journal website: http://icame.uib.no/journal.html
To subscribe to the ICAME Journal for the next issue to be published in May 2014 please visit our
secure on-line order form
Please note that a subscription to issue 38 (May 2014) is included in the registration for the ICAME 2013 conference organised by the University of Santiago de Compostela. Those who register for ICAME 2013 will automatically receive a copy at the conference.
October 2013:
Teaching and Language Corpora conference (TaLC 2014): First Call for Papers and Workshops
The next TaLC conference will take place in 2014 at Lancaster
University, UK.
The TaLC series of conferences was inaugurated in 1994. For this 20th
anniversary event, we are delighted to welcome the eleventh TaLC back
to Lancaster, the original host institution.
TaLC 11 will run from Monday 21st to Wednesday 23rd July (inclusive),
with a pre-conference workshop day on Sunday 20th July.
The call can be found on the conference website
http://ucrel.lancs.ac.uk/talc2014
May 2013:
UCREL Summer School in Corpus Linguistics
The programme consists of a series of intensive two-hour sessions, some involving practical work, others more discussion-oriented.
The UCREL Summer School is intended primarily for postgraduate research students (and secondarily for Masters-level students and postdoctoral researchers) who require in-depth knowledge of corpus-based methodologies for their degree projects. It is not aimed at raw beginners, but rather at PhD students who have at least some introductory experience of analysis using language corpora, and who wish to expand their knowledge of key issues and techniques in cutting-edge corpus research.
For more details, see the website for the summer school.
January 2013:
Developing new approaches to the study of hate speech, exploring how people talk about climate change and looking at how changes
in corporate governance are communicated will be part of the remit of a new £3.5m research hub at Lancaster University, which
will study the use and manipulation of language in society.
Funded by the Economic and Social Research Council (ESRC), the new Centre for Corpus Approaches to Social Science (CASS) will
bring the latest techniques in linguistic analysis to bear on a range of questions in the social sciences.
For more details, see the full story.
December 2012:
Isis Forensics has secured significant investment from The North West Fund for Venture Capital and Lancashire County Council's Rosebud Fund. Isis Forensics was founded by CEO Dr James Walkerdine in 2007 and is based in co-location facilities at InfoLab21. The company is an international digital forensics company which specialises in developing solutions to protect individuals and assist law enforcement with digital investigations.
The company licenses language analysis software built by UCREL members.
For more details, see the full story.
December 2012:
£700k Project to Boost Clinical Assessment Rates for Cognitive Decline
Currently, only 50% of people with dementia ever receive a diagnosis that could lead to them receiving medical care and support. So urgent is this problem that novel ways to persuade people to present themselves for clinical assessment are being sought. Lancaster University is leading a project to see if computer interaction can offer new opportunities for self-referral.
The £700k SAMS (Software Architecture for Mental health Self management) project, funded under the EPSRC Working Together call, will investigate the use of data and text-mining techniques, combined with adaptive user interfaces, to detect early signs of cognitive decline from the way people use their computers.
The project is led from the School of Computing and Communications by Prof. Pete Sawyer, Dr. Paul Rayson and Prof. Alistair Sutcliffe, and is joint with Prof. Alistair Burns, Dr Iracema Leroi and Prof. John Keane from Manchester University and Prof. Clive Ballard from King's College London. The project is supported by the Dementias and Neurodegenerative Diseases Research Network (DeNDRoN), The Alzheimer's Society, Microsoft Research, the University of British Columbia and Johns Hopkins University School of Medicine.
November 2012:
Applications are welcome for the following posts at the Department of Linguistics and English Language:
Lecturer in Corpus Linguistics,
Lecturer in Linguistics and English Language, and
Lecturer in Second/Foreign Language Education.
In addition, two five-year research posts are available. These are to be based in the new, ESRC-funded Centre for Corpus Approaches to Social Science.
The £3.5 million Centre, to be hosted by UCREL at Lancaster University under the direction of Tony McEnery and Andrew Hardie, will run from Easter 2013 for five years and has the goal of encouraging the uptake of corpus approaches across the social sciences. Shorter research contracts and visiting opportunities attached to the Centre will be advertised over the next five years.
To find out more and to apply visit:
http://hr-jobs.lancs.ac.uk/vacancies.aspx?cat=248&type=6
October 2012:
The seventh international CORPUS LINGUISTICS conference (CL2013) will
be held at Lancaster University from Tuesday 23rd July 2013 to Friday
26th July 2013. The main conference will be preceded by a workshop day
on Monday 22nd July.
For more details, see the conference website.
October 2012:
ICAME Journal call for submissions and subscriptions
The ICAME Journal is published annually by UCREL at Lancaster University in both electronic format and paper copy.
CALL FOR SUBMISSIONS:
Deadline for submissions: 1 December 2012
Deadline for reviews: 31 December 2012
The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue in 2013 or the following issue in 2014. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors:
Merja Kyto (e-mail: merja.kyto@engelska.uu.se) and
Anna-Brita Stenstrom (e-mail: ab.stenstrom@telia.com)
Books for review and correspondence on reviews and abstracts should be sent to:
Ilka Mindt (e-mail: ilka.mindt@upb.de)
Date of publication of issue 37: May 2013
CALL FOR SUBSCRIPTIONS:
More information and previous issues (in PDF) are available on-line at the ICAME Journal website: http://icame.uib.no/journal.html
To subscribe to the ICAME Journal for the next issue to be published in May 2013 please visit our
secure on-line order form
Please note that a subscription to issue 37 (May 2013) is included in the registration for the ICAME 2013 conference organised by the University of Santiago de Compostela. Those who register for ICAME 2013 will automatically receive a copy at the conference.
October 2012:
Six academics from three faculties have started work on the ESRC-funded project
called Metaphor in End-of-Life Care. For the next 17 months, they will study the
metaphors used by patients, unpaid family carers and healthcare professionals in
a 1.5-million-word data set. The way in which end-of-life care is talked about
can shed light on people's views, needs, experiences and challenges, and
identify areas where increased anxiety and misunderstanding can occur. Elena
Semino, Veronika Koller, Andrew Hardie and Zsofia Demjen (Linguistics and
English Language), Paul Rayson (Computing and Communications) and Sheila Payne
(International Observatory on End of Life Care) will look at a large body of
data from three different groups of people and reflect on any differences
between the groups and the implications of the findings for providing
end-of-life care. For enquiries, please contact: e.semino@lancaster.ac.uk.
October 2011:
Digital Humanities project awarded 1.5 million Euro grant: Lancaster
University has been awarded a European Research Council Starting Grant
of 1.5 million Euros
for a five-year project which will act as a flagship
programme for Digital Humanities research.
Building upon Lancaster's international expertise in Corpus Linguistics
and Geographical Information Systems (GIS), the project will develop
methodologies for the automatic extraction of place names from large
bodies of text, a process which will facilitate spatial interpretations
of both historical events and imaginative representations of space and
place.
More details ...
February 2011:
The UCREL research centre
is pleased to announce its first Summer
School in Corpus Linguistics. This will take place on
Wednesday 13th, Thursday 14th, and Friday 15th July
2011 (half-days Wednesday and Friday, full-day
Thursday).
The event is free to attend, but registration *in
advance* is compulsory, as places are limited. We are
now inviting applications from anyone interested in
participating.
For more information, see
http://ucrel.lancs.ac.uk/summerschool/
October 2010:
UCREL's NLP tools have been featured twice this week at events in London.
First, at a
"Science in the New Parliament" exhibition hosted by the
Parliamentary Office for Science and Technology in collaboration with
Research Councils UK. The Minister of State for Universities and Science
David Willetts MP praised instant feedback technology included in
the Voice Your View project.
(More details ...)
Second, as part of the Isis toolkit which was demonstrated at the
Online Child Protection conference where experts met to discuss
ways in which technology can help keep children safe on the internet.
(More details ...)
September 2010:
Call for subscriptions and submissions: ICAME Journal:
Deadline for submissions: 1 December 2010.
The ICAME Journal invites submissions for proposed contributions in the field of English Corpus Linguistics for immediate consideration for the next issue or the following issue in 2012. Manuscripts for articles, progress reports and shorter notices can be sent to one of the editors.
More information and previous issues (in PDF) are available on-line at the
ICAME Journal website.
To subscribe to the ICAME Journal for the next issue to be published in May 2011 please visit our
secure on-line order form.
Price for one issue is GBP30.
Previous issues are still on sale via the same order form.
August 2010:
Call for abstracts:
Workshop on Arabic Corpus Linguistics, Lancaster University, 11-12 April 2011.
Read more ...
February 2010:
Call for Papers:
Text-mining in the Digital Humanities: The Interface between Conceptual
History, Critical Discourse Analysis and Corpus Linguistics.
(Lancaster University
Thu 13 - Fri 14 May 2010)
The aim of this interdisciplinary workshop is to explore the potential
for collaboration between researchers in Critical Discourse Analysis
(CDA), Corpus Linguistics (CL) and Conceptual History (CH), the study
of key socio-political concepts in their historical context (see
http://www.concepta-net.org/conceptual_history).
Read more ...
February 2010:
Call for participation: BAAL event:
Gender and Corpus Linguistics,
Tuesday March 30th 2010,
Lancaster University.
Corpus linguistics involves the use of large collections of naturally occurring texts, encoded in electronic form so that they can be analysed with the help of computer software to extract linguistic patterns. In recent years, this approach has begun to be utilised by linguists who are interested in analysing issues relating to gender. This BAAL event aims to bring together researchers working in the field, as well as showcasing recent corpus-based gender research, and enabling debate on best practice.
Invited speakers include Bandar Al-Hejin (Lancaster), Cinzia Bevitori (Bologna), Tony McEnery (Lancaster), Brona Murphy (Edinburgh), Louise Mullany (Nottingham), Michael Pearce (Sunderland), Costas Gabrielatos (Lancaster) and Eivind Torgersen (Lancaster).
For more details, see the BAAL Gender and Language SIG pages.
January 2009:
Call for subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL.
To subscribe to the ICAME Journal for the next issue published in May 2009
please visit our
secure on-line order form that offers credit card, PayPal, fax, and phone order options.
September 2008:
Call for papers: ICAME-30 27-31 May 2009, Lancaster, UK.
See the first circular at the
conference website.
June 2008:
Computer experts are now harnessing new developments in language
analysis to identify paedophiles posing as children in online chat
rooms, to pick up on their vocabulary choices and trail them as they
move around the internet. Led by Professor Awais Rashid (Computing),
Lancaster, Swansea and Middlesex Universities have joined forces with
specialist UK law enforcement to develop tools to identify paedophiles
masquerading as children in online chat rooms. This month the
University launched Project Isis - a three-year Child Protection
Initiative which aims to develop new tools for policing websites and
supporting law enforcement, funded by the Engineering and Physical
Sciences Research Council and The Economic and Social Research Council.
Read more ...
The call for papers for the
Corpus Linguistics 2009
conference is now available. Please visit the conference website for details.
January 2008:
The next CLARET workshop will take place at Lancaster University on March 31 and April 1 2008.
December 2007:
Call for subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL.
To subscribe to the ICAME Journal for the next issue published in May 2008
please visit our
secure on-line order form that offers credit card, fax, post, phone and purchase order options.
November 2007:
New issue of the Empirical Text and Culture Research
Journal published.
September 2007:
Corpus Linguistics Advanced Research Education and Training (CLARET)
funded by the AHRC.
The first workshop will take place at Liverpool University
on 29-30th November 2007.
July 2007:
New project funded by the AHRC:
Corpus-based grammar in contrast (CORGRAM).
November 2006:
Call for papers: the Corpus Linguistics 2007 conference
will be held at the University of Birmingham, 27-30 July 2007.
September 2006:
Call for submissions and subscriptions to ICAME Journal published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL.
To subscribe to the ICAME Journal for the next issue published in May 2007
please visit our
secure on-line order form that offers credit card, fax, post, phone and purchase order options.
Price for one issue is USD52, around 43 Euros or GBP30 depending on the current exchange rate.
Issue 30 is still on sale via the issue 30 order form.
August 2006:
Call for submissions to two journals:
June 2006:
Call for participation: workshop on Historical Text Mining.
July 20th and 21st, Lancaster University, UK.
For more details, see the workshop webpage:
http://ucrel.lancs.ac.uk/events/htm06/
January 2006:
We are now taking orders for issue 30 of the
ICAME Journal to be published
in the spring of 2006. Please visit the
secure on-line order page.
December 2005:
Call for papers:
Third International Workshop on Language Resources for Translation Work, Research & Training.
A Satellite Event of LREC 2006 (5th Language Resources and Evaluation
Conference).
Date: 28th May 2006.
Venue: Magazzini del Cotone Conference Center, Genoa, Italy.
November 2005:
Call for papers:
EACL 2006 Workshop on
Multi-word-expressions in a multilingual context.
April 3rd 2006, Trento, Italy.
http://ucrel.lancs.ac.uk/EACL06MWEmc/
October 2005:
Announcing a new journal: Corpora
Corpora is a new journal focusing on the many and varied uses of
corpora both in linguistics and beyond. The journal accepts articles
presenting research findings based on the exploitation of corpora as
well as accounts of corpus building, corpus tool construction and
corpus annotation schemes. The journal will be published by
Edinburgh University Press. For more details, see the
Corpora Journal Home Page.
August 2005:
New project funded by the Leverhulme Trust
entitled "Changing English Across the 20th Century: a corpus-based study".
The main aim of the research is to carry out an investigation of areas
of change in grammatical usage in 20th Century British English,
focussing on the verb phrase. The study will be based on a package of
four corpora sampled at regular intervals: 1991 – 1961 – 1931 – 1901.
Two sub-goals are to: a) Compile a new corpus of British English called
Lancaster1901 focussing on the beginning of the twentieth century. b)
Enhance the encoding and annotation of Lancaster1901 and the three
existing corpora (Lancaster1931, LOB and FLOB), and release the
enhancements to the academic community.
For more information see the
Lancaster University press release
and the project details.
April 2005:
New project funded by the EPSRC:
Assist (Automated Semantic Assistance for Translators)
aims to address the problem of providing contextual examples of
translation equivalents for words from the general lexicon. We will employ
comparable corpora and an existing semantic field annotation system for English, and
develop a new semantic field tagger for Russian.
New project funded by the British Academy:
Scragg revisited - a quantitative
investigation of spelling variation across the centuries.
October 2004:
Lancaster is involved in the AHRB ICT Methods Network, see the
AHRB press release
and the network site at CCH for more details.
September 2004:
ELRA have announced that new Written Language
Resources are available in their catalogue.
You will find below their short descriptions. Please
visit their on-line catalogue to get more detailed
information: www.elda.fr
and www.elra.info
*** ELRA-W0037 The EMILLE/CIIL Corpus ***
The EMILLE/CIIL Corpus consists of monolingual corpora
containing approximately 92,799,000 words for 14 South Asian
languages (Assamese, Bengali, Gujarati, Hindi, Kannada,
Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil,
Telugu and Urdu) (including 2,627,000 words of transcribed spoken
data for Bengali, Gujarati, Hindi, Punjabi and Urdu), and a parallel corpus
of 200,000 words in English with translations in Hindi, Bengali, Punjabi,
Gujarati and Urdu. Annotations include Urdu monolingual and parallel
corpora annotated for parts-of-speech, and 20 written Hindi corpus files
annotated to show the nature of demonstrative use. All other components
are annotated at the sentence level. The corpus is marked up using CES-
compliant SGML and encoded using Unicode.
*** ELRA-W0038 The EMILLE Lancaster Corpus ***
The EMILLE Lancaster Corpus consists of monolingual corpora
containing approximately 58,880,000 words for seven South Asian
languages (Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil and Urdu)
(including 2,627,000 words of transcribed spoken data for Bengali, Gujarati,
Hindi, Punjabi and Urdu), and a parallel corpus of 200,000 words in English with
translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. Annotations include
Urdu monolingual and parallel corpora annotated for parts-of-speech, and 20
written Hindi corpus files annotated to show the nature of demonstrative use.
All other components are annotated at the sentence level. The corpus is
marked up using CES-compliant SGML and encoded using Unicode.
*** ELRA-W0039 The Lancaster Corpus of Mandarin Chinese (LCMC) ***
The Lancaster Corpus of Mandarin Chinese (LCMC) samples 15 written
text categories, including news, literary texts, academic prose and official
documents, published in P. R. China in the early 1990s, for a total of
approximately 1 million words. The same sampling frame and period as
FLOB/FROWN were used in LCMC. The corpus is encoded in Unicode (UTF-8)
and marked up in XML.
August 2004:
CALL FOR PRE-CONFERENCE WORKSHOP PROPOSALS.
Deadline: December 3rd, 2004.
Corpus Linguistics 2005
Birmingham, July 14th-17th.
Organisers: University of Birmingham and University of Lancaster
http://www.corpus.bham.ac.uk/conference
Proposals are invited for pre-conference workshops on July 14th at the
University of Birmingham. The conference, Corpus Linguistics 2005, is
run jointly by the universities of Birmingham and Lancaster, and is the
third biennial conference in the series on Corpus Linguistics. The
workshops and the conference will be held at the University of
Birmingham from July 14th to 17th 2005.
May 2004:
New project funded by the Andrew W. Mellon Foundation:
WordHoard
applies to highly canonical literary texts the insights and
techniques of corpus linguistics.
January 2004:
Release of the
Lancaster Corpus of Mandarin Chinese,
a Mandarin Chinese match for the FLOB and
FROWN corpora. The corpus is part-of-speech tagged and available, free
of charge, for use in non-profit making research.
Release of the
EMILLE/CIIL corpus.
The corpus contains monolingual written corpus data for 14 South
Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri,
Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu). It
also contains orthographically transcribed spoken data and parallel
corpus data for five South Asian languages (Bengali, Gujarati, Hindi,
Punjabi and Urdu). In addition, the parallel corpus contains the English
originals from which the translations stored in the corpus were derived.
All data in the corpus is CES and Unicode compliant. The EMILLE corpus
totals some 94 million words.
The corpora were built as part of a collaboration between Lancaster
University and the Central Institute of Indian Languages, Mysore.
December 2003:
Call for papers: 5th International Conference on Discourse Anaphora and Anaphor Resolution (DAARC2004)
November 2003:
New project: the Leverhulme Corpus Project plans to
build a corpus which matches as closely as possible the LOB and FLOB corpora of written British English, except that the year of data collection is 1931, or near to that date (+/- 3 years).
2nd call for papers: the sixth Teaching And Language Corpora conference (TaLC 2004)
Local Corpus research group
meetings will continue this term on Mondays at 4pm in B81, Bowland.
September 2003:
New books containing a selection of papers from the CL2001 conference:
Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003)
Corpus Linguistics by the Lune: a festschrift for Geoffrey Leech.
Peter Lang, Frankfurt.
(Volume 8 in the Lodz studies in Language Series edited by
Lewandowska-Tomaszczyk, B. and Melia, P. J.)
ISBN 3-631-50952-2
Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003)
A Rainbow of Corpora: Corpus Linguistics and the Languages of the World.
Lincom-Europa, München.
ISBN 3 89586 872 8. Linguistics Edition 40. 174 pp.
August 2003:
Conference Announcement: The sixth Teaching And Language Corpora conference (TaLC 2004)
CLAWS part-of-speech tagger free web trial extended to 10,000 words for academic users.
March 2003:
Initial release of the Lancaster Newsbooks Corpus
EMILLE Corpus Beta version released
Recent conference: Corpus Linguistics 2003
Book Series Announcement: Routledge Advances in Corpus Linguistics