UCREL Publications Bookshelf


Corpora (Corpus-based Language Learning, Language Processing and Linguistics) published by Edinburgh University Press.

Empirical Text and Culture Research (ETC) published by RAM-Verlag.

ICAME published under the auspices of the Aksis Centre (The Department of Culture, Language and Information Technology, University of Bergen, Norway) and UCREL.

Book series

Paul Rayson (Lancaster University) and Mark Davies (Brigham Young University) are editors of the Routledge Frequency Dictionaries Series. Tony McEnery (Lancaster University) and Michael Hoey (Liverpool University) are editors of the Routledge Advances in Corpus Linguistics book series.

Technical papers

UCREL publishes a series of fully-refereed Technical Papers, under the general editorship of Andrew Wilson and Tony McEnery.

Journal and conference papers

There is also a reference list of previous publications by members of UCREL.

The proceedings of the Corpus Linguistics conference series are online.


Here is a selection of books published involving members of UCREL:

McEnery, T. and Hardie, A. (2012) Corpus Linguistics: Method, theory and practice. Cambridge: Cambridge University Press.

Corpus Linguistics: Method, theory and practice is a textbook introducing corpus linguistics, published by Cambridge University Press, and written by Tony McEnery and Andrew Hardie. The support website (http://corpora.lancs.ac.uk/clmtp/) contains two broad types of material: four sample sections drawn from the book, and supplementary material including answers to the exercises in the book, extended footnotes, extended references, and directories of web-links relevant to the book.

Baker, P. (ed.) (2009) Contemporary Corpus Linguistics. Continuum.
Hardback: ISBN: 9780826496102 Pages: 368


"The inclusion of Contemporary in the title is no idle boast - all of these papers take corpus linguistics forward in exciting and challenging ways." - Michael Hoey, Baines Professor of English language, University of Liverpool, UK.

Corpus Linguistics uses large electronic databases of language to examine hypotheses about language use. These can be tested scientifically with computerised analytical tools, without the researcher's preconceptions influencing their conclusions. For this reason, Corpus Linguistics is a popular and expanding area of study.

Contemporary Corpus Linguistics presents a comprehensive survey of the ways in which Corpus Linguistics is being used by researchers. Written by internationally renowned linguists, this volume of seventeen introductory chapters aims to provide a snapshot of the field of corpus linguistics. The contributors present accessible, yet detailed, analyses of recent methods and theory in Corpus Linguistics, ways of analysing corpora, and recent applications in translation, stylistics, discourse analysis and language teaching.

The book represents the best of current practice in Corpus Linguistics, and as a one-volume reference will be invaluable to students and researchers looking for an overview of the field.

Chapter 1 Introduction - Paul Baker
Chapter 2 Searching for Metaphorical Patterns in Corpora - Alice Deignan
Chapter 3 Corpora and Critical Discourse Analysis - Gerlinde Mautner
Chapter 4 Corpus Stylistics and the Pickwickian watering-pot - Michaela Mahlberg
Chapter 5 The Metalanguage of impoliteness: Using Sketch Engine to Explore the Oxford English Corpus - Jonathan Culpeper
Chapter 6 Issues in the Design and Development of Software Tools for Corpus Studies: The Case for Collaboration - Laurence Anthony
Chapter 7 Compatibility Between Corpus Annotation Efforts and its Effect on Computational Linguistics - Adam Meyers
Chapter 8 Spoken Corpus Analysis: Multimodal Approaches to Language Description - Irina Dahlmann and Svenja Adolphs
Chapter 9 Fixed Collocational Patterns in Isolexical and Isotextual Versions of a Corpus - David Oakey
Chapter 10 Corpus Linguistics and Language Variation - Michael P. Oakes
Chapter 11 Integrating Learner Corpus Analysis into a Probabilistic Model of Second Language Acquisition - Yukio Tono
Chapter 12 English Language Teaching and Corpus Linguistics: Lessons from the American National Corpus - Randi Reppen
Chapter 13 The Impact of Corpora on Dictionaries - Patrick Hanks
Chapter 14 Using Corpora in Translation Studies: The State of the Art - Richard Xiao and Ming Yue
Chapter 15 Corpus Linguistics and the Languages of South Asia: Some Current Research Directions - Andrew Hardie
Chapter 16 The Web as Corpus Versus Traditional Corpora: Their Relative Utility for Linguists and Language Learners - Robert Lew
Chapter 17 Building and Analysing Corpora of Computer-Mediated Communication - Brian King

Andrew Wilson, Dawn Archer and Paul Rayson (eds.) (2006) Corpus linguistics around the world. Rodopi, Amsterdam, pp. 233. ISBN 90-420-1836-4 (Appears in the series Language and Computers Number 56).

This volume contains a selection of the papers delivered at the Corpus Linguistics 2003 conference, held at Lancaster University in April 2003. The papers selected address a wide range of world languages - Basque, Chinese, Danish, Dutch, English, French, German, Maltese, Russian, Spanish, and Slovene. Both synchronic and diachronic studies are included, as well as studies of learner language. In addition to mainstream linguistic analyses of phonetics, vocabulary, syntax, semantics, and rhetoric, application areas covered in the volume include financial forecasting, cross-cultural research, corpus processing, and language teaching.

Leech, G. (2006) A Glossary of English Grammar. Edinburgh: Edinburgh University Press.

This is an alphabetic guide to common terms used in the description of the English language. "A Glossary of English Grammar" presents a wide range of terms used to describe the way the English language is structured. Grammatical terms can be a problem for students, especially when there are alternative names for the same thing (for example, 'past tense' and 'preterite'). This book therefore provides a basic and accessible guide, focusing on the English language. Definitions of grammatical terms are given in simple language, with clear examples, many from authentic texts and spoken sources, showing how they are used. The terms used in the "Comprehensive Grammar of the English Language" are widely seen as standard, and form the basis of grammatical terminology in this book. At the same time, this glossary takes account of other variants of English grammar, including the most important terms from Huddleston and Pullum's influential "Cambridge Grammar of the English Language". This book is indispensable for anyone wishing to understand present-day terminology of English grammar more fully.

Baker, P., Hardie, A. & McEnery, A. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

This book presents a comprehensive glossary of terms used in corpus linguistics. This alphabetic guide provides definitions and discussion of key terms used in corpus linguistics. Corpus data is being used in a growing number of English and Linguistics departments which have no record of past research with corpus data. This is the first comprehensive glossary of the many specialist terms in corpus linguistics and will be useful for corpus linguists and non-corpus linguists alike. Clearly written by a team of experienced academics in the field, the glossary provides full coverage of both traditional and contemporary terminology. Entries are focused around the following broad groupings: important corpora; key technical terms in the field; key linguistic terms relevant to corpus-based research; key statistical measures used in corpus linguistics; key computer programs and retrieval systems used in the construction and exploitation of corpora; and standards applied within the field of corpus linguistics.

Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.

This book examines approaches to carrying out discourse analysis (DA) using techniques that are grounded in corpus linguistics. Assuming no prior knowledge of corpora, the book examines and evaluates a variety of corpus-based methodologies including: collocations, keyness, concordances, dispersion plots, and building and annotating corpora. Illustrated with a number of real-life examples of corpus-based DA from a range of sources and covering a variety of subjects, this is an informative introduction to using corpus linguistics as a methodology in discourse analysis.
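
To give a rough idea of two of the techniques named above, concordances and collocations, here is a minimal Python sketch. It is not taken from the book: the function names `tokenize`, `kwic` and `collocates`, the naive tokeniser, the window size and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, node, window=3):
    """Return key-word-in-context lines as (left context, node, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, window=3):
    """Count the words that occur within `window` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(span)
    return counts


# Hypothetical sample text, used only to exercise the functions.
sample = "The corpus shows that the word corpus often appears near the word linguistics."
tokens = tokenize(sample)

for left, node, right in kwic(tokens, "corpus", window=2):
    print(f"{left:>20} | {node} | {right}")

print(collocates(tokens, "corpus", window=2).most_common(3))
```

Real corpus tools add proper tokenisation, sorting of concordance lines and statistical association measures on top of raw co-occurrence counts; this sketch only shows the basic idea of looking at a word in its contexts.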

McEnery, T., Xiao, R. and Tono, Y. (2005). Corpus-based Language Studies: An Advanced Resource Book. Routledge, London.

The corpus-based approach to linguistic analysis and language teaching has come to prominence over the past two decades. This book seeks to bring readers up to date with the latest developments in corpus-based language studies. The only textbook to adopt a 'how to' approach with exercises and cases, it covers all the major theoretical approaches to the use of corpus data and affords students and researchers alike readings from eminent figures in the discipline. In comparison with the existing introductory books in corpus linguistics, Corpus-based Language Studies is unique in a number of ways.

  • A book which covers how-to and why
  • A book which engages with a range of approaches to the use of corpus data
  • A book which is more focused on multilingual corpus linguistics
See the companion website for the book.

McEnery, T. (2005). Swearing in English: Bad Language, Purity and Power from 1586 to the Present. London: Routledge.

Swearing is an everyday part of the language of most speakers of modern English. This corpus-informed account describes swearing and outlines its social function, with a particular focus on the relationship between swearing and abuse. Do men use bad language more than women? How do social class and the use of bad language interact? Do young speakers use bad language more frequently than older speakers? Using the spoken section of the British National Corpus, "Swearing in English" explores questions such as these and considers at length the historical origins of modern attitudes to bad language. Drawing on a variety of methodologies including historical research and corpus linguistics, and a range of data such as corpora, dramatic texts, early modern newsbooks and television, Tony McEnery takes a socio-historical approach to discourses about bad language in English. Arguing that purity of speech and power have come to be connected via a series of moral panics about bad language, the book contends that these moral panics, over time, have generated the differences observable in bad language usage in present-day English. A fascinating, comprehensive insight into an increasingly popular area, this book provides an explanation, and not simply a description, of how modern attitudes to bad language have come about.

Baker, P. (2005) Public Discourses of Gay Men. London: Routledge.

"Public Discourses of Gay Men" brings queer linguistics, an aspect of sociolinguistics, together with corpus linguistics to investigate the way gay male identities are constructed in the public domain. The book uses data from a range of publicly available sources, both written and spoken, to analyze the language surrounding homosexuality. For more details, see Paul's website.

Xiao, R. and McEnery, T. (2004). Aspect in Mandarin Chinese: A corpus-based study. Amsterdam: John Benjamins.

Chinese, as an aspect language, has played an important role in the development of aspect theory. This book is a systematic and structured exploration of the linguistic devices that Mandarin Chinese employs to express aspectual meanings. The work presented here is the first corpus-based account of aspect in Chinese, encompassing both situation aspect and viewpoint aspect. In using corpus data, the book seeks to achieve a marriage between theory-driven and corpus-based approaches to linguistics. The corpus-based model presented explores aspect at both the semantic and grammatical levels. At the semantic level a two-level model of situation aspect is proposed, which covers both the lexical and sentential levels, thus giving a better account of the compositional nature of situation aspect. At the grammatical level four perfective and four imperfective aspects in Chinese are explored in detail. This exploration corrects many intuition-based misconceptions, and associated misleading conclusions, about aspect in Chinese common in the literature. See the website for the book.

Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) Corpus Linguistics by the Lune: a festschrift for Geoffrey Leech. Peter Lang, Frankfurt. (Volume 8 in the Lodz studies in Language Series edited by Lewandowska-Tomaszczyk, B. and Melia, P. J.) ISBN 3-631-50952-2

Geoffrey Leech is one of the pioneers of modern computer corpus linguistics. This Festschrift contains contributions from colleagues around the world who have been influenced, in various ways, by the approaches to language and text which Geoff pioneered. Many of the contributions focus (both synchronically and diachronically) on the English language, which has been Geoff's main interest throughout his career. However, work on Polish, French, Biblical Greek, and Creole studies is also reported, which demonstrates the influence that Geoff has had on scholars outside of English linguistics. The papers in this book - all given at the Corpus Linguistics 2001 conference held in Geoff's honour - cover a wide range of topics and applications within corpus linguistics, including corpus building and annotation, lexicology, syntax, sociolinguistics, stylistics and pragmatics.

Bas Aarts/Evelien Keizer/Mariangela Spinillo/Sean Wallis: Which or what? A study of interrogative determiners in present-day English - Karin Aijmer: Discourse particles in contrast: the case of in fact and actually - Dawn Archer/Jonathan Culpeper: Sociopragmatic annotation: New directions and possibilities in historical corpus linguistics - Ylva Berglund/Oliver Mason: «But this formula doesn't mean anything...!?» - Douglas Biber/Susan Conrad/Viviana Cortes: Lexical bundles in speech and writing: an initial taxonomy - Raymond Hickey: Tracking lexical change in present-day English - Barbara Lewandowska-Tomaszczyk/Michael P. Oakes/Paul Rayson: Annotated Corpora for Assistance with English-Polish Translation - Stanley E. Porter/Matthew B. O'Donnell: Theoretical issues for corpus linguistics and the study of ancient languages - Helena Raumolin-Brunberg: Temporal aspects of language change: what can we learn from the CEEC? - Geoffrey Sampson: Reflections of a dendrographer - Hans-Jörg Schmid: Do women and men really live in different cultures? Evidence from the BNC - Mark Sebba/Susan Dray: Is it Creole, is it English, is it valid? Developing and using a corpus of unstandardised written language - Mick Short: A Corpus-based Approach to Speech, Thought and Writing Presentation - Jean Véronis: Sense tagging: does it make sense? - Anne Wichmann/Richard Cauldwell: Wh Questions and attitude: the effect of context.

Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) A Rainbow of Corpora: Corpus Linguistics and the Languages of the World. Lincom-Europa, München. ISBN 3 89586 872 8. Linguistics Edition 40. 174 pp.

The aim of this volume is to showcase the range of corpus-based linguistic research currently being carried out on languages other than English.

The papers included report on work carried out on Arabic, Bulgarian, Czech, Dutch, French, German, Biblical Greek, Biblical Hebrew, Medieval Irish, Korean, Romanian and Swedish, including a number of regional and social variants. They also address a range of areas as diverse as corpus design, corpus annotation, register analysis, syntax, and quantitative linguistics.

The papers in this volume will leave the reader in no doubt that corpus-based research is now being conducted for a whole "rainbow of languages".

1. The PARIS 7 annotated corpus for French: some experimental results Anne Abeillé, Lionel Clément, Alexandra Kinyon, François Toussenel
2. Lexical frequency of contemporary Canadian French based on a large corpus Martin Beaudoin and Michel Simard
3. The Corpus of Electronic Texts: A digital lexicon of Medieval Irish and an Irish prosopography Beatrix Färber
4. A corpus of written Italian: a defined and a dynamic model R. Rossini Favretti, F. Tamburini and C. De Santis
5. A reusable corpus needs syntactic annotations: Prague Dependency Treebank Eva Hajičová and Petr Sgall
6. Variation across Korean text registers Beom-mo Kang, Hung-gyu Kim and Myung-hoe Huh
7. A tagset for the morphosyntactic tagging of Arabic Shereen Khoja, Roger Garside and Gerry Knowles
8. Tracing referent location in oral picture descriptions Maarten Lemmens
9. Pragmatic and discursive aspects of German modal particles: a corpus-based approach Martina Möllering
10. Investigating characteristic lexical distributions and grammatical patterning in Swedish texts translated from English P-O Nilsson
11. OpenText.org and the problems and prospects of working with ancient discourse Matthew Brook O'Donnell, Stanley E. Porter, and Jeffrey T. Reed
12. Syntactic change in Abidjanee French Katja Ploog
13. HPSG-based syntactic treebank of Bulgarian (BulTreeBank) Kiril Simov, Gergana Popova, Petya Osenova
14. Grammatical aspects of Corpus Linguistics: Design and usage of linear and hierarchical text databases Wolf-Dieter Syring
15. A corpus-based analysis of how accurately printed Romanian obeys to some universal laws Adriana Vlad, Adrian Mitrea and Mihai Mitrea

Routledge Advances in Corpus Linguistics
Editors: Tony McEnery (Lancaster University) and Michael Hoey (Liverpool University)

This book series is designed to provide an opportunity for researchers to publish research monographs in corpus linguistics with a major international publisher, Routledge. The series aims to publish new and challenging research reflecting on the methodology of corpus linguistics itself and/or on its application to specific areas of linguistics. No approach to corpus data is excluded from the series. Indeed the editors hope that, over time, the series will provide a useful forum in which methodological and theoretical differences related to corpus use can be fruitfully debated.

For further information see the publisher's site.

Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London. ISBN 0582-32007-0

For more details see the companion website.
Table of Contents:
  • CHAPTER 1: Frequencies in the Whole Corpus (Spoken and Written English)
  • CHAPTER 2: Spoken and Written English
  • CHAPTER 3: Two Main Varieties of Spoken English Compared
  • CHAPTER 4: Two Main Varieties of Written English Compared
  • CHAPTER 5: Rank Frequency Lists of Words within Word Classes (Parts of Speech) in the whole corpus
  • CHAPTER 6: Frequency Lists of Grammatical Word Classes (based on the Sampler Corpus)

Botley, S. P., McEnery, T., and Wilson, A. (eds.) (2000) Multilingual Corpora in Teaching and Research. Rodopi, Amsterdam. ISBN 90-420-0541-6

Table of Contents:
  • Michael OAKES & Tony McENERY Chapter One: Bilingual text alignment - an overview
  • Michel SIMARD, George FOSTER, Marie-Louise HANNAN, Elliott MACKLOVITCH & Pierre PLAMONDON Chapter Two: Bilingual text alignment: where do we draw the line?
  • Pernilla DANIELSSON and Daniel RIDINGS Chapter Three: Corpus and terminology: software for the translation program at Göteborgs Universitet or getting students to do the work
  • Carol PETERS, Eugenio PICCHI and Lisa BIAGINI Chapter Four: Parallel and comparable bilingual corpora in language teaching and learning
  • Renée MEYER, Mary Ellen OKUROWSKI and Thérèse HAND Chapter Five: Using authentic corpora and language tools for adult-centred learning
  • Jennifer PEARSON Chapter Six: Teaching terminology using electronic resources
  • Michael BARLOW Chapter Seven: Parallel texts in language teaching
  • David WOOLLS Chapter Eight: From purity to pragmatism; user-driven development of a multilingual parallel concordancer
  • Stig JOHANSSON and Knut HOFLAND Chapter Nine: The English-Norwegian parallel corpus: current work and new directions
  • Raphael SALKIE Chapter Ten: Unlocking the power of the SMEMUC
  • Josef SCHMIED and Barbara FINK Chapter Eleven: Corpus-based contrastive lexicology: the case of English 'with' and its German translation equivalents
  • Tony McENERY, Scott PIAO & Xu XIN Chapter Twelve: Parallel alignment in English and Chinese
  • INDEX

Wichmann, A., Fligelstone, S., McEnery, T., and Knowles, G. (eds.) (1997) Teaching and Language Corpora. Longman, London. ISBN 0-582-27609-8

Corpora are well established as a resource for language research; they are now also increasingly being used for teaching purposes. This book is the first of its kind to deal explicitly and in a wide-ranging way with the use of corpora in teaching. It contains an extensive collection of articles by corpus linguists and practising teachers, covering not only the use of data to inform and create teaching materials but also the direct exploitation of corpora by students, both in the study of linguistics in general and in the acquisition of proficiency in individual languages, including English, Welsh, German, French and Italian. In addition, the book offers practical information on the sources of corpora and concordances, including those suitable for work on non-Roman scripts such as Greek and Cyrillic.

Teaching and Language Corpora is suitable for language teachers in higher secondary and tertiary education and applied linguistics specialists who have an interest in learner-centred approaches.

Garside, R., Leech, G., and McEnery, A. (eds.) (1997) Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London. ISBN 0582 29837 7 (pbk). pp 281.

This is the first book to survey the growing field of research known as corpus annotation. It is well known that the computer corpus - an electronic collection of texts, often of millions of words - has become a central resource for many aspects of linguistics, information technology and the processing of human language. Increasingly, it is seen as essential to annotate a corpus linguistically in order to successfully extract information from it. Annotation takes place at various levels, such as part-of-speech tagging, parsing, semantic tagging, and discourse annotation. These tasks are typically carried out by a combination of automatic and manual techniques, the automatic techniques often involving innovative probabilistic models of language. Annotation is not only a highly practical task: it also sheds new light on the nature of language and the most effective means of analysing it.

Corpus Annotation gives an up-to-date picture of this fascinating new area of research, and will provide essential reading for newcomers to the field as well as those already involved in corpus annotation. Early chapters introduce the different levels and techniques of corpus annotation. Later chapters deal with software developments, applications, and the development of standards for the evaluation of corpus annotation. While the book takes detailed account of research world-wide, its focus is particularly on the work of the UCREL (University Centre for Computer Corpus Research on Language) team at Lancaster University, which has been at the forefront of developments in the field of corpus annotation since its beginnings in the 1970s.

All three editors teach at Lancaster University, and are well known for their work in linguistics, computer science and corpus-based language research. Roger Garside is Senior Lecturer in the Department of Computing, Geoffrey Leech is Research Professor in English Linguistics, and Tony McEnery is Lecturer in Linguistics and Modern English Language.

  • Preface
  • 1. Introducing corpus annotation Geoffrey Leech
  • 2. Grammatical tagging Geoffrey Leech
  • 3. Syntactic annotation: treebanks Geoffrey Leech and Elizabeth Eyes
  • 4. Semantic annotation Andrew Wilson and Jenny Thomas
  • 5. Discourse annotation: anaphoric relations in corpora Roger Garside, Steve Fligelstone and Simon Botley
  • 6. Further levels of annotation Geoffrey Leech, Anthony McEnery and Martin Wynne
  • 7. A hybrid grammatical tagger: CLAWS4 Roger Garside and Nicholas Smith
  • 8. How to generalise the task of annotation Steve Fligelstone, Mike Pacey and Paul Rayson
  • 9. Improving a tagger Nicholas Smith
  • 10. Retargeting a tagger Fernando Sánchez León and Amalio F. Nieto-Serrano
  • 11. The use of syntactic annotation tools: partial and full parsing Jeremy Bateman, Jean Forrest, and Tim Willis
  • 12. Higher-level annotation tools Roger Garside and Paul Rayson
  • 13. A corpus/annotation toolbox Anthony McEnery and Paul Rayson
  • 14. A corpus-based grammar tool Anthony McEnery, John Paul Baker and John Hutchinson
  • 15. The exploitation of multilingual annotated corpora for term extraction Anthony McEnery, Jean-Marc Langé, Michael Oakes and Jean Véronis
  • 16. Cross-linguistic guidelines for the annotation of corpora Peter Kahrel, Ruthanna Barnett and Geoffrey Leech
  • 17. Consistency and accuracy in correcting automatically-tagged corpora John Paul Baker
  • Appendix I: Sources for further information (WWW and e-mail addresses)
  • Appendix II: Abbreviations and acronyms
  • Appendix III: Specimen annotation practices: the C7 and C5 tagsets
  • Bibliography
  • Index

McEnery, T., and Wilson, A. (1996) Corpus Linguistics (Edinburgh Textbooks in Empirical Linguistics Series). Edinburgh University Press, UK. pp.209. ISBN 0 7486 0482 0

This is the first undergraduate coursebook for the teaching of a corpus-based approach to language and linguistics. It gives a step-by-step introduction to what a corpus is, how corpora are constructed, and what can be done with them. Each chapter ends with a section of study questions which contain practical corpus-based exercises. With an increased interest in the use of corpora, this book fills an urgent and increasing need for a suitable undergraduate coursebook on this fast-growing subject.

Thomas, J., and Short, M. (eds) (1996). Using corpora for language research: Studies in Honour of Geoffrey Leech. Longman, London. 301pp + ix, ISBN 0582 248787 (Hbk) 0582 248779 (Pbk). [Published January 17th 1996, for Prof. Geoffrey Leech's 60th birthday]

Corpus linguistics is a relatively new subject in linguistics whereby corpora or collections of spoken or written texts are stored on computer in a tagged form and used for linguistic analysis. Up until now most corpus-based research has focused on grammar and lexicography, but as this new text shows, the corpus approach can be applied to a wide range of areas of language study including translation, stylistics, foreign language teaching and language testing. This book is in honour of Geoffrey Leech, a leading contributor to the field of corpus linguistics. It is an authoritative guide showing how to develop and use corpora for language research.

Using Corpora for Language Research is designed to be used by non-specialists in corpus work who have an interest in language study. Written in a clear and accessible style, this important text will act as a catalyst for the use of the corpus approach in many areas of language research and teaching.

Leech, G., Myers, G. and Thomas, J. (eds.) (1995). Spoken English on Computer: Transcription, Mark-up and Application. London: Longman, pp.xii+260. [based on the proceedings of the ESRC-funded Lancaster Workshop on Computerized Spoken Discourse, Sept. 1993].

The computer analysis of corpora - large bodies of language data stored on computer - has rapidly emerged as a leading paradigm of linguistic research and is now becoming the basis for studying spoken language in many different applications. Both topical and timely, this book addresses the basic issues of how to represent spoken language on the computer. It also brings together for the first time contributions on particular applications of computerised spoken language, such as language pathology, sociolinguistics, lexicography, speech and language technology. The contributions are written by leading world experts in the field, including Wallace Chafe, Jane Edwards, Stig Johansson and John Sinclair.

Divided into three sections, each with an accessible editorial introduction, the book offers a wide coverage of the subject, combining theoretical, practical and descriptive issues and materials. It includes numerous detailed examples of different transcription schemes, together with samples of transcribed spoken data.

This book will be of value to postgraduate students, researchers and lecturers working in corpus linguistics, speech technology and English language studies, as well as undergraduate students of linguistics and linguists needing to know more about the subject generally.

Black, E., Garside, R., Leech, G. (eds) (1993). Statistically-driven computer grammars of English: The IBM/Lancaster approach. Amsterdam, Rodopi. pp. 248.

This book is about building computer programs that parse (analyze, or "diagram") sentences of "real-world" English. The English we are concerned with might be a corpus of everyday, naturally-occurring prose, such as the entire text of this morning's newspaper. Most programs that now exist for this purpose are not very successful at finding the correct analysis for everyday sentences. In contrast, the programs described here make use of a more successful statistically-driven approach.

Our book is, first, a record of a five-year research collaboration between IBM and Lancaster University. Large numbers of real-world sentences were "fed into the memory" of a program for grammatical analysis (including a detailed grammar of English) and processed by statistical methods. The idea is to single out the correct parse, among all those offered by the grammar, on the basis of probabilities.

Second, this is a "how-to" book, showing how to build and implement a statistically-driven broad-coverage grammar of English. We even supply our own grammar, with the necessary statistical algorithms, and with the knowledge needed to prepare a very large set (or corpus) of sentences so that it can be used to guide the statistical processing of the grammar's rules.
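
The general idea of singling out one parse on the basis of probabilities can be sketched in a few lines of Python. This is only an illustration in the style of a simple probabilistic context-free grammar, not the IBM/Lancaster grammar or its algorithms: the rule set, the probabilities, the example sentence and the names `RULE_PROB`, `log_prob` and `best_parse` are all hypothetical.

```python
import math

# Hypothetical rule probabilities, of the kind that might be estimated
# from a corpus of hand-parsed sentences. Rules sharing a left-hand side
# sum to 1. Lexical rules are omitted because they are the same in both
# candidate parses below.
RULE_PROB = {
    "S -> NP VP": 1.0,
    "VP -> V NP": 0.6,
    "VP -> V NP PP": 0.4,
    "NP -> NP PP": 0.2,
    "NP -> Det N": 0.8,
    "PP -> P NP": 1.0,
}


def log_prob(parse):
    """Sum the log probabilities of the rules used in one candidate parse."""
    return sum(math.log(RULE_PROB[rule]) for rule in parse)


def best_parse(candidates):
    """Return the name of the candidate whose rules are jointly most probable."""
    return max(candidates, key=lambda name: log_prob(candidates[name]))


# Two analyses the grammar offers for "the man saw the dog with the telescope",
# each written as the list of rules it uses.
candidates = {
    "pp_attaches_to_verb": [
        "S -> NP VP", "NP -> Det N", "VP -> V NP PP",
        "NP -> Det N", "PP -> P NP", "NP -> Det N",
    ],
    "pp_attaches_to_noun": [
        "S -> NP VP", "NP -> Det N", "VP -> V NP",
        "NP -> NP PP", "NP -> Det N", "PP -> P NP", "NP -> Det N",
    ],
}

print(best_parse(candidates))
```

With these made-up numbers the verb-attachment reading scores higher, because the noun-attachment reading needs the less probable rule "NP -> NP PP". Log probabilities are summed rather than multiplying raw probabilities so that long parses do not underflow.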

Garside R., Leech G. and Sampson G. (eds) (1987) The Computational Analysis of English: A Corpus-based Approach. London: Longman.

Over the past five to ten years, a research team based mainly at the Universities of Lancaster and Leeds has been engaged in a distinctive method of analysing the English language by computer. This book, edited by three leading participants in that research, deals with its background, current achievements in grammatical analysis and practical applications.

The approach this book describes is termed "corpus-based" because it postulates that in order to program computers to process unrestricted human language, it is necessary to work with a large collection of computer-readable texts of varied kinds, that is a computer corpus, such as the Lancaster-Oslo/Bergen Corpus underlying much of the research reported here.

A large general corpus of English inevitably contains rare and exceptional usages, as well as normal usages. Such research therefore yields more detailed and adequate descriptions of a language, in terms of lexicon and grammar, than are available by other means, and also motivates new methods of language study, based on probabilistic reasoning. The probabilistic approach also leads to "robust" methods of language analysis. This methodology has varied applications in such areas as textual error detection and speech synthesis.