| Preface |   | viii | 
| Mariko Abe: A Corpus-based Contrastive Analysis of Spoken and Written Learner Corpora: The Case of Japanese-speaking Learners of English |   | 1 | 
| Aduriz I., Aranzabe M.J., Arriola J.M., Atutxa A., Díaz de Ilarraza A., Ezeiza N., Gojenola K., Oronoz M., Soroa A., and Urizar R.: Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing |   | 10 | 
| Khurshid Ahmad, Pensiri Manomaisupat, David Cheng, Tugba Taskaya, Saif Ahmad, Lee Gillam, Andrew Hippisley: The mood of the (financial) markets: In a corpus of words and of pictures |   | 12 | 
| Sandra M. Aluísio, Gisele M. Pinheiro, Marcelo Finger, Maria das Graças V. Nunes, Stella E. O. Tagnin: The Lacio-Web Project: overview and issues in Brazilian Portuguese corpora creation |   | 14 | 
| Dawn Archer, Tony McEnery, Paul Rayson, Andrew Hardie: Developing an automated semantic analysis system for Early Modern English |   | 22 | 
| Dawn Archer, Andrew Hardie, Tony McEnery, Scott Piao: A corpus of seventeenth-century English news reportage: construction, encoding and applications |   | 32 | 
| Bertol Arrieta, Arantza Díaz de Ilarraza, Koldo Gojenola, Montse Maritxalar, Maite Oronoz: A database system for storing second language learner corpora |   | 33 | 
| Jørg Asmussen: Towards a methodology for corpus-based studies of linguistic change: Contrastive observations and their possible diachronic interpretations in the Korpus 2000 and Korpus 90 General Corpora of Danish |   | 42 | 
| Eric Atwell: A New Machine Learning Algorithm for Neoposy: coining new Parts of Speech |   | 43 | 
| Eric Atwell, Paul Gent, Julia Medori, Clive Souter: Detecting student copying in a corpus of science laboratory reports: simple and smart approaches |   | 48 | 
| Francis Henrik Aubert, Stella E. O. Tagnin: A Corpus of Sworn Translations – for linguistic and historical research |   | 54 | 
| Bogdan Babych, Anthony Hartley, Eric Atwell: Statistical modelling of MT output corpora for Information Extraction |   | 62 | 
| Paul Baker, Andrew Hardie, Tony McEnery, and Sri B.D. Jayaram: Constructing Corpora of South Asian Languages |   | 71 | 
| Federica Barbieri: The "new" quotatives in American English: A cross-register comparison |   | 81 | 
| Marco Baroni and Silvia Bernardini: A preliminary analysis of collocational differences in monolingual comparable corpora |   | 82 | 
| Sabine Bartsch: Investigating cross-linguistic constraints on the premodification of adjectival past participles and desubstantival adjectives. A corpus-based study of English and German |   | 92 | 
| Kate Beeching: Synchronic and diachronic variation: the how and why of sociolinguistic corpora. |   | 102 | 
| Luisa Bentivogli, Christian Girardi, Emanuele Pianta: The MEANING Italian Corpus |   | 103 | 
| Julie Carson-Berndsen, Ulrike Gut and Robert Kelly: Discovering regularities in non-native speech |   | 113 | 
| P. Beust, S. Ferrari, V. Perlerin: NLP model and tools for detecting and interpreting metaphors in domain-specific corpora |   | 114 | 
| Philippe Blache, Marie-Laure Guénot and Tristan van Rullen: A corpus-based technique for grammar development |   | 124 | 
| Birte Bös: Towards an integrated model of service encounters |   | 132 | 
| Roderick Bovingdon and Angelo Dalli: Statistical analysis of the source origin of Maltese |   | 140 | 
| Lou Burnard, Tony Dodd: Xara: an XML aware tool for corpus searching |   | 142 | 
| Marianna N. Christou: Expressions and structures of the delexical verb KANΩ [“MAKE” /  “DO”] in Modern Greek language: A corpus-based approach to newspaper articles |   | 145 | 
| Ken Cosh and Pete Sawyer: Using natural language processing tools to assist semiotic analysis of information systems |   | 155 | 
| H. Cunningham, V. Tablan, K. Bontcheva, M. Dimitrov: Language engineering tools for collaborative corpus annotation |   | 165 | 
| Mark Davies: Annotation without lexicons: an alternative to the standard bootstrapping approach |   | 174 | 
| Joost van de Weijer: Consonant variation within words |   | 184 | 
| Debbie Elliott, Anthony Hartley and Eric Atwell: Rationale for a multilingual corpus for machine translation evaluation |   | 191 | 
| John Elliott and Debbie Elliott: The Human Language Chorus Corpus (HULCC) |   | 201 | 
| Jens Fauth, Hans-Jörg Schmid: Detecting gender-preferential patterns of linguistic features in face-to-face communication |   | 211 | 
| Valéria D. Feltrim, Sandra M. Aluísio, Maria das Graças V. Nunes: Analysis of the rhetorical structure of computer science abstracts in Portuguese |   | 212 | 
| Katerina T. Frantzi: Updating LSP dictionaries with collocational information |   | 219 | 
| Robert Gaizauskas, Lou Burnard, Paul Clough and Scott Piao: Using the XARA XML-Aware Corpus Query Tool to Investigate the METER Corpus |   | 227 | 
| Ana Llinares García: Repetition and young learners' initiations in the L2: a corpus-driven analysis |   | 237 | 
| Sandrine Garnier, Youhanizou Tall, Sisay Fissaha, Johann Haller: Learner Corpora: Design, Development and Applications - Development of NLP tools for CALL based on learner corpora (German as a foreign language) |   | 246 | 
| Sara Gesuato: The company women and men keep: what collocations can reveal about culture |   | 253 | 
| Vojko Gorjanc: Tracking lexical changes in the reference corpus of Slovene texts |   | 263 | 
| Stefan Grondelaers, Dirk Speelman, Dirk Geeraerts: A corpus-based approach to informality: the case of Internet chat |   | 264 | 
| Leif Grönqvist and Magnus Gunnarsson: A method for finding word clusters in spoken language |   | 265 | 
| Xiaotian Guo: Between Verbs and Nouns and Between the Base Form and the Other Forms of Verbs – A Contrastive Study into COLEC and LOCNESS |   | 274 | 
| Le An Ha: A method for word segmentation in Vietnamese |   | 282 | 
| Silvia Hansen-Schirra: Linguistic enrichment and exploitation of the Translational English Corpus |   | 288 | 
| Andrew Hardie: Developing a tagset for automated part-of-speech tagging in Urdu |   | 298 | 
| Nigel Harwood: Personal pronouns and academic writing: a multidisciplinary corpus-based critical pragmatic approach to EAP |   | 308 | 
| Laura Hasler, Constantin Orasan and Ruslan Mitkov: Building better corpora for summarisation |   | 309 | 
| Chris Heffer: Not KWIC but Quick: KeyWords in Court |   | 319 | 
| Kris Heylen and Dirk Speelman: A corpus-based analysis of word order variation: The order of verb arguments in the German middle field |   | 320 | 
| Knut Hofland: A web-based concordance system for spoken language corpora |   | 330 | 
| Shelley Ching-yu Hsieh: The Corpus of Mandarin Chinese and German Animal Expressions |   | 332 | 
| Susan Hunston: Frame, phrase or function: a comparison of frame semantics and local grammars |   | 342 | 
| Emi Izumi, Toyomi Saiga, Thepchai Supnithi, Kiyotaka Uchimoto, Hitoshi Isahara: The development of the spoken corpus of Japanese learner English and the applications in collaboration with NLP techniques |   | 359 | 
| Inés Jacob, Joseba Abaitua, Josu Gómez: Automatic feeding of translation memory tools |   | 367 | 
| Steven Jones, M. Lynne Murphy: Antonymy in Childhood: a corpus-based approach to acquisition |   | 372 | 
| Randall L. Jones: An Analysis of Lexical Text Coverage in Contemporary German |   | 373 | 
| Stig W. Jørgensen, Carsten Hansen, Jette Drost, Dorte Haltrup, Anna Braasch, Sussi Olsen: Domain specific corpus building and lemma selection in a computational lexicon |   | 374 | 
| Tomoko Kaneko: How non-native speakers express anger, surprise, anxiety and grief: a corpus-based comparative study |   | 384 | 
| Sachie Karasawa: Patterns of elaboration and interlanguage development: an exploratory corpus analysis of college student essays |   | 394 | 
| Hannah Kermes, Stefan Evert: Text analysis meets corpus linguistics  |   | 402 | 
| Adam Kilgarriff: Linguistic Search Engine |   | 412 | 
| Paul Kingsbury: A methodology for inducing a chronology of the Pāli Canon |   | 413 | 
| Gerry Knowles, Zuraidah Mohd Don: Tagging a corpus of Malay texts, and coping with 'syntactic drift' |   | 422 | 
| Natalie Kübler and Cécile Frérot: Verbs in specialised corpora: from manual corpus-based description to automatic extraction in an English-French parallel corpus |   | 429 | 
| Toshihiko Kubota: A Study on Abridgement for Spoken Word Titles |   | 439 | 
| David YW Lee: Spoken Academic Lexicogrammar and Discourse Patterns |   | 440 | 
| Geoffrey Leech, Martin Weisser: Generic speech act annotation for task-oriented dialogues |   | 441 | 
| Agnieszka Lenko-Szymanska: The curse and the blessing of mobile phones - a corpus-based study into Polish and American rhetoric strategies |   | 447 | 
| Robert Liebscher and David Groppe: Rethinking context availability for concrete and abstract words: a corpus study |   | 449 | 
| Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, Jukka-Pekka Juntunen: Porting an English semantic tagger to the Finnish language |   | 457 | 
| Nadine Lucas, Bruno Crémilleux, Leny Turmel: Signalling well-written academic articles in an English corpus by text mining techniques |   | 465 | 
| Anke Lüdeling and Stefan Evert: Linguistic experience and productivity: corpus evidence for fine-grained distinctions  |   | 475 | 
| Michaela Mahlberg: High frequency nouns in English: aspects of a grammatical description |   | 484 | 
| Belinda Maia: Constructing comparable and parallel corpora for terminology extraction - work in progress |   | 485 | 
| Manolis Maragoudakis, Katia Kermanidis and Nikos Fakotakis: Towards a Bayesian Stochastic Part-Of-Speech and Case Tagger of Natural Language Corpora |   | 486 | 
| Kevin Mark: Learner corpus building and a ‘living’ university foreign language curriculum |   | 496 | 
| Tony McEnery, Zhonghua Xiao: Fuck revisited |   | 504 | 
| Dan McIntyre, Carol Bellard-Thomson, John Heywood, Tony McEnery, Elena Semino and Mick Short: The Construction of a Corpus to Investigate the Presentation of Speech, Thought and Writing in Written and Spoken British English |   | 513 | 
| John McKenny: Seeing the wood and the trees: Reconciling findings from discourse and lexical analysis |   | 523 | 
| Magnus Merkel, Michael Petterstedt and Lars Ahrenberg: Interactive Word Alignment for Corpus Linguistics |   | 533 | 
| José María Guirao Miras, Ana González Ledesma, Guillermo de la Madrid Heitzmann, Manuel Alcántara Plá, Antonio Moreno Sandoval: Relating lexical items to sociolinguistic features in a spontaneous speech corpus of Spanish |   | 543 | 
| Juan M. Montero and M. Mar Duque: ANESTTE: a writer’s assistant for a specific purpose language |   | 544 | 
| Olga Moudraia: The Student Engineering Corpus: Analysing Word Frequency |   | 552 | 
| JoAnne Neff, Francisco Ballesteros, Emma Dafouz, Francisco Martínez, Juan-Pedro Rica: Formulating Writer Stance: A Contrastive Study of EFL Learner Corpora |   | 562 | 
| Diane Nicholls: The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT |   | 572 | 
| Judy Noguchi, Thomas Orr, Yukio Tono: Using a dedicated corpus to identify features of professional English usage: What do “we” do in science journal articles? |   | 582 | 
| Attila Novák, Viktor Nagy, Csaba Oravecz: Corpus assisted development of a Hungarian morphological analyser and guesser |   | 583 | 
| Toshifumi Oba and Eric Atwell: Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners’ English |   | 591 | 
| Marija Omazic: THE METACOMMUNICATIVE SETTING OF PHRASEOLOGICAL UNITS AND THEIR MODIFICATIONS – EVIDENCE FROM THE BRITISH NATIONAL CORPUS |   | 599 | 
| Nelleke Oostdijk: Corpus linguistics meets language technology: deep syntactic parsing for question answering |   | 603 | 
| Maeve Paris: Extending computer-assisted text analysis techniques to the detection of source code plagiarism and collusion: assisting manual inspection |   | 611 | 
| Núria Gala Pavia, Salah Aït-Mokhtar: Lexicalising a robust parser grammar using the WWW |   | 620 | 
| Julien Perrez and Liesbeth Degand: On the combination of corpus-based and experimental methodologies in the study of causal, contrastive and metadiscourse connectives in L1 and L2 text comprehension and production |   | 627 | 
| Scott S.L. Piao and Tony McEnery: A Tool for Text Comparison |   | 637 | 
| James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro and Marcia Lazo: The TIMEBANK Corpus |   | 647 | 
| Andrew Roberts and Eric Atwell: The use of corpora for automatic evaluation of grammar inference systems |   | 657 | 
| Juhani Rudanko: More on horror aequi: evidence from large corpora |   | 662 | 
| Sarah Rule, Emma Marsden, Florence Myles, Rosamond Mitchell: Constructing a database of French interlanguage oral corpora |   | 669 | 
| Geoffrey Sampson: Are we nearly there yet, Mum? |   | 678 | 
| Hans-Jörg Schmid, Jens Fauth: Women's and men's style: fact or fiction? New grammatical evidence |   | 679 | 
| Serge Sharoff: Methods and tools for development of the Russian Reference Corpus |   | 680 | 
| Bayan Abu Shawar and Eric Atwell: Using dialogue corpora to train a chatbot |   | 681 | 
| Gerardo Sierra, Alfonso Medina, Rodrigo Alarcón, César A. Aguilar: Towards the Extraction of Conceptual Information from Corpora |   | 691 | 
| Kiril Simov, Alexander Simov, Milen Kouylekov: Constraints for corpora development and validation |   | 698 | 
| Milena Slavcheva: Corpus shallow parsing: meeting point between paradigmatic knowledge encoding  |   | 706 | 
| Nicholas Smith: A quirky progressive? A corpus-based exploration of the will + be + -ing construction in recent and present day British English. |   | 714 | 
| Harold Somers: Some Issues in the Mark-up of Handwriting in a Learner Corpus |   | 724 | 
| Dirk Speelman, Stefan Grondelaers, Dirk Geeraerts: A profile-based calculation of region and register variation: the synchronic and diachronic status of the national variants of Dutch |   | 733 | 
| Somayajulu G. Sripada and Ehud Reiter and Jim Hunter and Jin Yu: Exploiting a parallel TEXT - DATA corpus |   | 734 | 
| Asa M. Stepak: A proposed mathematical theory explaining the sequence of grammatical categories  |   | 744 | 
| Petra Storjohann: The lexicographic use of corpora and computational tools for disambiguation |   | 754 | 
| Jozsef Szakos: Cultures and Corpora: Extracting Anthropological Information from Corpora of Formosan Endangered Languages |   | 763 | 
| Jun Arata Takahashi: Do we talk (or write?) differently over the Net?- A lexical enquiry into ‘a’ Net-EN - |   | 764 | 
| Kaoru Takahashi: A Study of Text Types and Register Variation in the British National Corpus |   | 773 | 
| Yuri Tambovtsev: The Structure of the Consonant Patterns in the Spanish Speech Sound Chain as a Clue of Typological Closeness |   | 774 | 
| Yuri Tambovtsev: Phonological similarity between Basque and other world languages based on the frequency of occurrence of certain typological consonantal features |   | 775 | 
| Tess Yu-Shan Ke, Liang-Feng Chen, Chien-Chung Chen: Investigation on the uses of temporal subordinators by NS and NNS in academic spoken English |   | 780 | 
| Carole Tiberius, Dunstan Brown, Greville Corbett: Ambiguity in Russian Morphology |   | 790 | 
| Juhani Toivanen, Tapio Seppänen, Eero Väyrynen: Creation and utilisation of the MediaTeam Emotional Speech Corpus |   | 791 | 
| Yukio Tono: Learner corpora: design, development and applications |   | 800 | 
| Montserrat Civit Torruella, Mª Antònia Martí Antonín, Lluís Padró Cirera: Using hybrid probabilistic-linguistic knowledge to improve pos-tagging performance |   | 810 | 
| Patrick Tschorn, Anke Lüdeling: Morphological knowledge and alignment of English-German parallel corpora |   | 818 | 
| Francesca Vaghi, Marco Venuti: The Economist and The Financial Times. A study of movement metaphors |   | 828 | 
| Bertus van Rooy and Lande Schäfer: An evaluation of three POS taggers for the tagging of the Tswana Learner English Corpus |   | 835 | 
| Tamás Váradi: Shallow parsing of Hungarian business news |   | 845 | 
| Isabel Verdaguer and Anna Poch: Collocational and colligational patterns in lexical sets: A corpus-based study |   | 852 | 
| Maria Verde: Shedding light on SHED, CAST and THROW as nodes of extended lexical units |   | 859 | 
| Shih-Ping Wang: Mutual information and corpus-based approaches to reduplicative fixed expressions |   | 869 | 
| Julie Weeds and David Weir: Finding and evaluating sets of nearest neighbours |   | 879 | 
| David Wible, Ping-Yu Huang: Using learner corpora to examine L2 acquisition of tense-aspect markings |   | 889 | 
| Sandra Williams and Ehud Reiter: A corpus analysis of discourse relations for Natural Language Generation |   | 899 | 
| Andrew Wilson, Celia Worth: Building and annotating corpora of spoken Welsh and Gaelic |   | 909 | 
| Andrew Wilson, Celia Worth: Conceptual Glossaries of the Latin Vulgate Bible |   | 918 | 
| Andrew Wilson, Olga Moudraia: Quantitative or Qualitative Content Analysis?  Experiences from a cross-cultural comparison of female students' attitudes to shoe fashions in Germany, Poland and Russia |   | 919 | 
| Martin Wynne, Rowan Wilson, Ylva Berglund: Virtual Corpora at the Oxford Text Archive |   | 920 | 
| Yang Xiaojun: Survey and Prospect of China’s Corpus-Based Researches |   | 930 | 
| Debra Ziegeler, Sarah Lee: Analysing a Corpus-based Semantic Investigation of English Dialects |   | 931 | 
| Heike Zinsmeister, Ulrich Heid: Identifying predicatively used adverbs by means of a statistical grammar model |   | 932 |