Empirical Text and Culture Research (ETC) published by RAM-Verlag.
ICAME published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL.
The proceedings of the Corpus Linguistics conference series are online:
McEnery, T. and Hardie, A. (2012) Corpus Linguistics: Method, theory and practice. Cambridge: Cambridge University Press.
Corpus Linguistics: Method, theory and practice is a textbook introducing corpus linguistics, published by Cambridge University Press, and written by Tony McEnery and Andrew Hardie. The support website (http://corpora.lancs.ac.uk/clmtp/) contains two broad types of material: four sample sections drawn from the book, and supplementary material including answers to the exercises in the book, extended footnotes, extended references, and directories of web-links relevant to the book.
Baker, P. (ed.) (2009) Contemporary Corpus Linguistics.
"The inclusion of Contemporary in the title is no idle boast - all of these papers take corpus linguistics forward in exciting and challenging ways." - Michael Hoey, Baines Professor of English language, University of Liverpool, UK.
Corpus Linguistics uses large electronic databases of language to examine hypotheses about language use. These can be tested scientifically with computerised analytical tools, without the researcher's preconceptions influencing their conclusions. For this reason, Corpus Linguistics is a popular and expanding area of study.
Contemporary Corpus Linguistics presents a comprehensive survey of the ways in which Corpus Linguistics is being used by researchers. Written by internationally renowned linguists, this volume of seventeen introductory chapters aims to provide a snapshot of the field of corpus linguistics. The contributors present accessible, yet detailed, analyses of recent methods and theory in Corpus Linguistics, ways of analysing corpora, and recent applications in translation, stylistics, discourse analysis and language teaching.
The book represents the best of current practice in Corpus Linguistics, and as a one volume reference will be invaluable to students and researchers looking for an overview of the field.
Andrew Wilson, Dawn Archer and Paul Rayson (eds.) (2006)
Corpus linguistics around the world.
Rodopi, Amsterdam, pp. 233. ISBN 90-420-1836-4
(Appears in the series Language and Computers.)
This volume contains a selection of the papers delivered at the Corpus Linguistics 2003 conference, held at Lancaster University in April 2003. The papers selected address a wide range of world languages - Basque, Chinese, Danish, Dutch, English, French, German, Maltese, Russian, Spanish, and Slovene. Both synchronic and diachronic studies are included, as well as studies of learner language. In addition to mainstream linguistic analyses of phonetics, vocabulary, syntax, semantics, and rhetoric, application areas covered in the volume include financial forecasting, cross-cultural research, corpus processing, and language teaching.
Leech, G. (2006) A Glossary of English Grammar. Edinburgh: Edinburgh University Press.
This is an alphabetic guide to common terms used in the description of the English language. "A Glossary of English Grammar" presents a wide range of terms used to describe the way the English language is structured. Grammatical terms can be a problem for students, especially when there are alternative names for the same thing (for example, 'past tense' and 'preterite'). This book therefore provides a basic and accessible guide, focusing on the English language. Definitions of grammatical terms are given in simple language, with clear examples, many from authentic texts and spoken sources, showing how they are used. The terms used in the "Comprehensive Grammar of the English Language" are widely seen as standard, and form the basis of grammatical terminology in this book. At the same time, this glossary takes account of other variants of English grammar, including the most important terms from Huddleston and Pullum's influential "Cambridge Grammar of the English Language". This book is indispensable for anyone wishing to understand present-day terminology of English grammar more fully.
Baker, P., Hardie, A. and McEnery, A. (2006)
A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.
This book presents a comprehensive glossary of terms used in corpus linguistics. This alphabetic guide provides definitions and discussion of key terms used in corpus linguistics. Corpus data is being used in a growing number of English and Linguistics departments which have no record of past research with corpus data. This is the first comprehensive glossary of the many specialist terms in corpus linguistics and will be useful for corpus linguists and non-corpus linguists alike. Clearly written by a team of experienced academics in the field, the glossary provides full coverage of both traditional and contemporary terminology. Entries are focused around the following broad groupings: important corpora; key technical terms in the field; key linguistic terms relevant to corpus-based research; key statistical measures used in corpus linguistics; key computer programs and retrieval systems used in the construction and exploitation of corpora; and standards applied within the field of corpus linguistics.
Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.
This book examines approaches to carrying out discourse analysis (DA) using techniques that are grounded in corpus linguistics. Assuming no prior knowledge of corpora, the book examines and evaluates a variety of corpus-based methodologies including: collocations, keyness, concordances, dispersion plots, and building and annotating corpora. Illustrated with a number of real-life examples of corpus-based DA from a range of sources and covering a variety of subjects, this is an informative introduction to using corpus linguistics as a methodology in discourse analysis.
McEnery, T., Xiao, R. and Tono, Y. (2005).
Corpus-based Language Studies: An Advanced Resource Book
The corpus-based approach to linguistic analysis and language teaching has come to prominence over the past two decades. This book seeks to bring readers up to date with the latest developments in corpus-based language studies. The only textbook to adopt a 'how to' approach with exercises and cases, it covers all the major theoretical approaches to the use of corpus data and affords students and researchers alike readings from eminent figures in the discipline. In comparison with the existing introductory books in corpus linguistics, Corpus-based Language Studies is unique in a number of ways.
McEnery, T. (2005). Swearing in English: Bad Language, Purity and Power from 1586 to the Present. London: Routledge.
Swearing is an everyday part of the language of most speakers of modern English. This corpus-informed account of swearing describes swearing and also outlines its social function, with a particular focus on the relationship between swearing and abuse. Do men use bad language more than women? How do social class and the use of bad language interact? Do young speakers use bad language more frequently than older speakers? Using the spoken section of the British National Corpus, "Swearing in English" explores questions such as these and considers at length the historical origins of modern attitudes to bad language. Drawing on a variety of methodologies including historical research and corpus linguistics, and a range of data such as corpora, dramatic texts, early modern newsbooks and television, Tony McEnery takes a socio-historical approach to discourses about bad language in English. Arguing that purity of speech and power have come to be connected via a series of moral panics about bad language, the book contends that these moral panics, over time, have generated the differences observable in bad language usage in present-day English. A fascinating, comprehensive insight into an increasingly popular area, this book provides an explanation, and not simply a description, of how modern attitudes to bad language have come about.
Baker, P. (2005) Public Discourses of Gay Men. London: Routledge.
"Public Discourses of Gay Men" brings queer linguistics, an aspect of sociolinguistics, together with corpus linguistics to investigate the way gay male identities are constructed in the public domain. The book uses data from a range of publicly available sources, both written and spoken, to analyze the language surrounding homosexuality. For more details, see Paul's website.
Xiao, R. and McEnery, T. (2004).
Aspect in Mandarin Chinese: A corpus-based study. Amsterdam: John Benjamins.
Chinese, as an aspect language, has played an important role in the development of aspect theory. This book is a systematic and structured exploration of the linguistic devices that Mandarin Chinese employs to express aspectual meanings. The work presented here is the first corpus-based account of aspect in Chinese, encompassing both situation aspect and viewpoint aspect. In using corpus data, the book seeks to achieve a marriage between theory-driven and corpus-based approaches to linguistics. The corpus-based model presented explores aspect at both the semantic and grammatical levels. At the semantic level a two-level model of situation aspect is proposed, which covers both the lexical and sentential levels, thus giving a better account of the compositional nature of situation aspect. At the grammatical level four perfective and four imperfective aspects in Chinese are explored in detail. This exploration corrects many intuition-based misconceptions, and associated misleading conclusions, about aspect in Chinese common in the literature. See the website for the book.
Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003)
Corpus Linguistics by the Lune: a festschrift for Geoffrey Leech.
Peter Lang, Frankfurt.
(Volume 8 in the Lodz Studies in Language series, edited by
Lewandowska-Tomaszczyk, B. and Melia, P. J.)
Geoffrey Leech is one of the pioneers of modern computer corpus
linguistics. This Festschrift contains contributions from colleagues
around the world who have been influenced, in various ways, by the
approaches to language and text which Geoff pioneered. Many of the
contributions focus (both synchronically and diachronically) on the
English language, which has been Geoff's main interest throughout his
career. However, work on Polish, French, Biblical Greek, and Creole
studies is also reported, which demonstrates the influence that Geoff
has had on scholars outside of English linguistics. The papers in this
book - all given at the Corpus Linguistics 2001 conference held in
Geoff's honour - cover a wide range of topics and applications within
corpus linguistics, including corpus building and annotation,
lexicology, syntax, sociolinguistics, stylistics and pragmatics.
Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) A Rainbow of Corpora: Corpus Linguistics and the Languages of the World. Lincom-Europa, München. ISBN 3 89586 872 8. Linguistics Edition 40. 174 pp.
The aim of this volume is to showcase the range of corpus-based linguistic
research currently being carried out on languages other than English.
The papers included report on work carried out on Arabic, Bulgarian, Czech,
Dutch, French, German, Biblical Greek, Biblical Hebrew, Medieval Irish, Korean,
Romanian and Swedish, including a number of regional and social variants. They
also address a range of areas as diverse as corpus design, corpus annotation,
register analysis, syntax, and quantitative linguistics.
The papers in this volume will leave the reader in no doubt that corpus-based
research is now being conducted for a whole "rainbow of languages".
Routledge Advances in Corpus Linguistics
Editors: Tony McEnery (Lancaster University) and Michael Hoey (Liverpool University)
This book series is designed to provide an opportunity for researchers to
publish research monographs in corpus linguistics with a major international
publisher, Routledge. The series aims to publish new and challenging research
reflecting on the methodology of corpus linguistics itself and/or on its
application to specific areas of linguistics. No approach to corpus data is
excluded from the series. Indeed, the editors hope that, over time, the series
will provide a useful forum in which methodological and theoretical issues
related to corpus use can be fruitfully debated.
For further information see the publisher's site.
Leech, G., Rayson, P., and Wilson, A. (2001).
Word Frequencies in Written and Spoken English: based on the British National Corpus.
Longman, London. ISBN 0582-32007-0
For more details see the companion website.
Botley, S. P., McEnery, T., and Wilson, A. (eds.) (2000)
Multilingual Corpora in Teaching and Research
Rodopi, Amsterdam. ISBN 90-420-0541-6
Wichmann, A., Fligelstone, S., McEnery, T., and Knowles, G. (eds.) (1997)
Teaching and Language Corpora
Longman, London. ISBN 0-582-27609-8
Corpora are well established as a resource for language research; they are now also increasingly being used for teaching purposes. This book is the first of its kind to deal explicitly and in a wide-ranging way with the use of corpora in teaching. It contains an extensive collection of articles by corpus linguists and practising teachers, covering not only the use of data to inform and create teaching materials but also the direct exploitation of corpora by students, both in the study of linguistics in general and in the acquisition of proficiency in individual languages, including English, Welsh, German, French and Italian. In addition, the book offers practical information on the sources of corpora and concordancers, including those suitable for work on non-Roman scripts such as Greek and Cyrillic.
Teaching and Language Corpora is suitable for language teachers in higher secondary and tertiary education and applied linguistics specialists who have an interest in learner-centred approaches.
Garside, R., Leech, G., and McEnery, A. (eds.) (1997)
Corpus Annotation: Linguistic Information from Computer Text Corpora
ISBN 0582 29837 7 (pbk). pp 281.
This is the first book to survey the growing field of research known as corpus annotation. It is well known that the computer corpus - an electronic collection of texts, often of millions of words - has become a central resource for many aspects of linguistics, information technology and the processing of human language. Increasingly, it is seen as essential to annotate a corpus linguistically in order to successfully extract information from it. Annotation takes place at various levels, such as part-of-speech tagging, parsing, semantic tagging, and discourse annotation. These tasks are typically carried out by a combination of automatic and manual techniques, the automatic techniques often involving innovative probabilistic models of language. Annotation is not only a highly practical task: it also sheds new light on the nature of language and the most effective means of analysing it.
Corpus Annotation gives an up-to-date picture of this fascinating new area of research, and will provide essential reading for newcomers to the field as well as those already involved in corpus annotation. Early chapters introduce the different levels and techniques of corpus annotation. Later chapters deal with software developments, applications, and the development of standards for the evaluation of corpus annotation. While the book takes detailed account of research world-wide, its focus is particularly on the work of the UCREL (University Centre for Computer Corpus Research on Language) team at Lancaster University, which has been at the forefront of developments in the field of corpus annotation since its beginnings in the 1970s.
All three editors teach at Lancaster University, and are well known for their work in linguistics, computer science and corpus-based language research. Roger Garside is Senior Lecturer in the Department of Computing, Geoffrey Leech is Research Professor in English Linguistics, and Tony McEnery is Lecturer in Linguistics and Modern English Language.
McEnery, T., and Wilson, A. (1996)
(Edinburgh Textbooks in Empirical Linguistics Series).
Edinburgh University Press, UK. pp.209. ISBN 0 7486 0482 0
This is the first undergraduate coursebook for the teaching of a corpus-based approach to language and linguistics. It gives a step-by-step introduction to what a corpus is, how corpora are constructed, and what can be done with them. Each chapter ends with a section of study questions which contain practical corpus-based exercises. With an increased interest in the use of corpora, this book fills an urgent and increasing need for a suitable undergraduate coursebook on this fast-growing subject.
Thomas, J., and Short, M. (eds) (1996).
Using corpora for language research:
Studies in the Honour of Geoffrey Leech.
Longman, London. 301pp + ix,
ISBN 0582 248787 (Hbk) 0582 248779 (Pbk).
[Published January 17th 1996, for Prof. Geoffrey Leech's 60th birthday]
Corpus linguistics is a relatively new subject in linguistics whereby corpora or collections of spoken or written texts are stored on computer in a tagged form and used for linguistic analysis. Up until now most corpus-based research has focused on grammar and lexicography, but as this new text shows, the corpus approach can be applied to a wide range of areas of language study including translation, stylistics, foreign language teaching and language testing. This book is in honour of Geoffrey Leech, a leading contributor to the field of corpus linguistics. It is an authoritative guide showing how to develop and use corpora for language research.
Using Corpora for Language Research is designed to be used by non-specialists in corpus work who have an interest in language study. Written in a clear and accessible style, this important text will act as a catalyst for the use of the corpus approach in many areas of language research and teaching.
Leech, G., Myers, G. and Thomas, J. (eds.) (1995).
Spoken English on Computer: Transcription, Mark-up and Application.
London: Longman, pp.xii+260.
[Based on the proceedings of the ESRC-funded Lancaster
Workshop on Computerized Spoken Discourse.]
The computer analysis of corpora - large bodies of language data stored on computer - has rapidly emerged as a leading paradigm of linguistic research and is now becoming the basis for studying spoken language in many different applications. Both topical and timely, this book addresses the basic issues of how to represent spoken language on the computer. It also brings together for the first time contributions on particular applications of computerised spoken language, such as language pathology, sociolinguistics, lexicography, speech and language technology. The contributions are written by leading world experts in the field, including Wallace Chafe, Jane Edwards, Stig Johansson and John Sinclair.
Divided into three sections, each with an accessible editorial introduction, the book offers a wide coverage of the subject, combining theoretical, practical and descriptive issues and materials. It includes numerous detailed examples of different transcription schemes, together with samples of transcribed spoken data.
This book will be of value to postgraduate students, researchers and lecturers working in corpus linguistics, speech technology and English language studies, as well as undergraduate students of linguistics and linguists needing to know more about the subject generally.
Black, E., Garside, R., Leech, G. (eds) (1993).
Statistically-driven computer grammars of English:
The IBM/Lancaster approach. Amsterdam, Rodopi. pp. 248.
This book is about building computer programs that parse (analyze, or "diagram") sentences of "real-world" English. The English we are concerned with might be a corpus of everyday, naturally-occurring prose, such as the entire text of this morning's newspaper. Most programs that now exist for this purpose are not very successful at finding the correct analysis for everyday sentences. In contrast, the programs described here make use of a more successful statistically-driven approach.
Our book is, first, a record of a five-year research collaboration between IBM and Lancaster University. Large numbers of real-world sentences were "fed into the memory" of a program for grammatical analysis (including a detailed grammar of English) and processed by statistical methods. The idea is to single out the correct parse, among all those offered by the grammar, on the basis of probabilities.
Second, this is a "how-to" book, showing how to build and implement a statistically-driven broad-coverage grammar of English. We even supply our own grammar, with the necessary statistical algorithms, and with the knowledge needed to prepare a very large set (or corpus) of sentences so that it can be used to guide the statistical processing of the grammar's rules.
Garside, R., Leech, G. and Sampson, G. (eds) (1987)
The Computational Analysis of English: A Corpus-based Approach.
Over the past five to ten years, a research team based mainly at the Universities of Lancaster and Leeds has been engaged in a distinctive method of analysing the English language by computer. This book, edited by three leading participants in that research, deals with its background, current achievements in grammatical analysis and practical applications.
The approach this book describes is termed "corpus-based" because it postulates that in order to program computers to process unrestricted human language, it is necessary to work with a large collection of computer-readable texts of varied kinds, that is, a computer corpus, such as the Lancaster-Oslo/Bergen Corpus underlying much of the research reported here.
A large general corpus of English inevitably contains rare and exceptional usages, as well as normal usages. Such research therefore yields more detailed and adequate descriptions of a language, in terms of lexicon and grammar, than are available by other means, and also motivates new methods of language study, based on probabilistic reasoning. The probabilistic approach also leads to "robust" methods of language analysis. This methodology has varied applications in such areas as textual error detection and speech synthesis.