UCREL Technical Papers

UCREL publishes a series of fully-refereed Technical Papers, under the general editorship of Andrew Wilson and Tony McEnery. These papers fall into two categories: (1) articles dealing with corpora and computational linguistics and (2) corpus manuals.

Electronic versions of some of the volumes are available here for free download in PDF format. Simply click on the appropriate PDF icon. Prices refer to hardcopy by post. You can download the free Acrobat Reader from Adobe.

Current List of Titles:

  1. PDF Database Design for Corpus Storage: The ET10-63 Data Model. Tony McEnery and Béatrice Daille. 1993. £2.50
  2. PDF Corpora and Translation: Uses and Future Prospects. Tony McEnery and Andrew Wilson. 1993. £2.50
  3. PDF Towards an Integration of Content Analysis and Discourse Analysis: The Automatic Linkage of Key Relations in Text. Andrew Wilson. 1993. £2.50
  4. Special Issue. Corpora in Language Education and Research: A Selection of Papers from TALC94. Edited by Andrew Wilson and Tony McEnery. £10.00
  5. PDF Combined Approach for Terminology Extraction: Lexical Statistics and Linguistic Filtering. Béatrice Daille. 1995. £3.50
  6. PDF 'Only Connect'. Critical Discourse Analysis and Corpus Linguistics. Gerlinde Hardt-Mautner. 1995. £2.50
  7. The Evaluation of Multiple Post-Editors: Inter-Rater Consistency in Correcting Automatically Tagged Data. John Paul Baker. 1995. £2.50
  8. Special Issue. Approaches to Discourse Anaphora: Proceedings of the Discourse Anaphora and Resolution Colloquium (DAARC96), edited by Simon Botley, Julia Glass, Tony McEnery and Andrew Wilson. £40.00.
  9. Special Issue. Proceedings of Teaching and Language Corpora 1996 (TALC96), edited by Simon Botley, Julia Glass, Tony McEnery and Andrew Wilson. 280 pages. 1996. ISBN 1 86220 013 1. £40.00
  10. A Study of Text Typology: Multi-Feature and Multi-Dimensional Analyses. Kaoru Takahashi. 1997. 60 pages. ISBN 1 86220 035 1. £4.50
  11. Special Issue. New Approaches to Discourse Anaphora: Proceedings of the Second Colloquium on Discourse Anaphora and Anaphor Resolution (DAARC2), edited by Simon Botley and Tony McEnery. 1998. £40.00.
  12. Special Issue. Proceedings of the Discourse Anaphora and Reference Resolution Conference (DAARC2000), edited by Paul Baker, Andrew Hardie, Tony McEnery and Anna Siewierska. 2000. £40.00.
  13. Special issue. Proceedings of the Corpus Linguistics 2001 conference, edited by Paul Rayson, Andrew Wilson, Tony McEnery, Andrew Hardie and Shereen Khoja. ISBN 1 86220 107 2. 2001. £40.00. This is also available as a CD-ROM (containing PDF versions of the hardcopy) for £6.00. [The contents pages are available as a PDF file. The full CL2001 proceedings are available on the UCREL website. Also please see the conference website for proceedings errata.]
  14. Special issue. Proceedings of the Workshop on Corpus-Based and Processing Approaches to Figurative Language, held in conjunction with Corpus Linguistics 2001, edited by John Barnden, Mark Lee and Katja Markert. ISBN 1 86220 108 0. 2001. £10.00
  15. PDF The prosody of Please-requests: a corpus based approach. Anne Wichmann. 2002. 24 pages. £3.50
  16. Special issue. Proceedings of the Corpus Linguistics 2003 conference, edited by Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery. ISBN 1 86220 131 5. 2003. £45.00. This is also available as a CD-ROM (containing PDF versions of the hardcopy) for £10.00. [The contents pages are available as a PDF file. Please see the conference website for proceedings errata. The full proceedings are available online.]
  17. Special issue. Proceedings of the Workshop on Shallow Processing of Large Corpora (SProLaC 2003), held in conjunction with the Corpus Linguistics 2003 conference, edited by Kiril Simov and Petya Osenova. ISBN 1-86220-134-X. 2003. £5.00. This volume is also available electronically via the organiser's website. The cover and contents page are also included there.
  18. PDF Special issue. Proceedings of the Interdisciplinary Workshop on Corpus-Based Approaches to Figurative Language, held in conjunction with the Corpus Linguistics 2003 conference, edited by John Barnden, Sheila Glasbey, Mark Lee, Katja Markert and Alan Wallington. ISBN 1-86220-147-1. 2003. £5.00.

Printed and bound copies of these Technical Papers can be obtained, at the prices listed (plus postage), from:

Technical Papers
c/o Paul Rayson
Director of UCREL
Computing Department
Infolab21, South Drive,
Lancaster University
LANCASTER, LA1 4WA
United Kingdom

Email: paul at comp.lancs.ac.uk
Tel: +44 1524 510357
Fax: +44 1524 510492

Cheques in UK currency should be made payable to "Lancaster University". Some volumes are now out of print. We will aim to provide electronic versions instead, if this is suitable.

Please contact the address given above for up-to-date details of titles and availability or, for this purpose only, e-mail one of the general editors: Andrew Wilson or Tony McEnery. Or watch this space!


Abstracts

Gerlinde Hardt-Mautner. 'Only Connect'. Critical Discourse Analysis and Corpus Linguistics. 1995.

The methodology traditionally used in critical discourse analysis (CDA) is mainly qualitative and hence unwieldy to apply to larger corpora. Standard forms of quantification, on the other hand, involve elaborate classification and coding procedures that destroy the coherence of the original discourse. On the basis of examples from a research project dealing with newspaper language, this paper assesses the potential of concordancing as a research tool for CDA. It is argued that computer-aided analysis enables researchers to blend together qualitative and quantitative views of the data, and to look at a more representative corpus than they can when working manually.

John Paul Baker. The Evaluation of Multiple Post-Editors: Inter-Rater Consistency in Correcting Automatically Tagged Data. 1995.

The experiment investigated the hypothesis that using human post-editors to check automatically tagged corpora would introduce inconsistencies in the data. Nine experienced post-editors were given sentences of written and spoken data, which had previously been tagged by CLAWS, and asked to remove errors from the output. Once ambivalent words had been removed from the data, mean rater accuracy was found to be higher than the accuracy of CLAWS output (98.7% to 95.3%), while overall consistency between post-editors was 98%. As a result of the experiment, ambivalent cases were resolved through the incorporation of new guidelines. It was also found that if subjects made a slip, it would be highly likely to involve substituting or leaving a noun tag in the place of the correct tag.

Kaoru Takahashi. A Study of Text Typology: Multi-Feature and Multi-Dimensional Analyses. 1997.

This paper is concerned with text typology. The LOB Corpus, a million-word collection of British English texts, is used to characterize text types and to identify the linguistic characteristics of each text type. By means of multivariate analysis, the variation in the occurrence of the assigned linguistic features across genre categories yields a classification and systematization of those categories, and also makes it possible to specify the characteristic linguistic features of the classified groups. The criteria of the classification are based exclusively on the dimensions that are statistically revealed by the multivariate analysis; the groupings are interpreted linguistically afterwards. As a result of the analysis, two main dimensions, i.e., "narrative versus non-narrative concern" and "specification of content versus generalization of content", enable the classification of three groups among genre categories in the LOB Corpus.

As the second stage of this paper, focussing on the tag sequences in the LOB Corpus, the research on text types shifts to the syntactic level. This is carried out by a similar statistical methodology, whereby the syntactic distinction between contrastive linguistic groups, i.e., fiction and exposition, is made explicit.

Lastly, I touch upon discourse analysis. The linguistic features concerning semantics, e.g., proper nouns, common nouns etc., enable more sophisticated classification of text types macroscopically.

This paper concludes with a future plan of research concerning a multi-feature and multi-dimensional approach.

