Preface |  | viii |
Mariko Abe: A Corpus-based Contrastive Analysis of Spoken and Written Learner Corpora: The Case of Japanese-speaking Learners of English |  | 1 |
Aduriz I., Aranzabe M.J., Arriola J.M., Atutxa A., Díaz de Ilarraza A., Ezeiza N., Gojenola K., Oronoz M., Soroa A., and Urizar R.: Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing |  | 10 |
Khurshid Ahmad, Pensiri Manomaisupat, David Cheng, Tugba Taskaya, Saif Ahmad, Lee Gillam, Andrew Hippisley: The mood of the (financial) markets: In a corpus of words and of pictures |  | 12 |
Sandra M. Aluísio, Gisele M. Pinheiro, Marcelo Finger, Maria das Graças V. Nunes, Stella E. O. Tagnin: The Lacio-Web Project: overview and issues in Brazilian Portuguese corpora creation |  | 14 |
Dawn Archer, Tony McEnery, Paul Rayson, Andrew Hardie: Developing an automated semantic analysis system for Early Modern English |  | 22 |
Dawn Archer, Andrew Hardie, Tony McEnery, Scott Piao: A corpus of seventeenth-century English news reportage: construction, encoding and applications |  | 32 |
Bertol Arrieta, Arantza Díaz de Ilarraza, Koldo Gojenola, Montse Maritxalar, Maite Oronoz: A database system for storing second language learner corpora |  | 33 |
Jørg Asmussen: Towards a methodology for corpus-based studies of linguistic change: Contrastive observations and their possible diachronic interpretations in the Korpus 2000 and Korpus 90 General Corpora of Danish |  | 42 |
Eric Atwell: A New Machine Learning Algorithm for Neoposy: coining new Parts of Speech |  | 43 |
Eric Atwell, Paul Gent, Julia Medori, Clive Souter: Detecting student copying in a corpus of science laboratory reports: simple and smart approaches |  | 48 |
Francis Henrik Aubert, Stella E. O. Tagnin: A Corpus of Sworn Translations – for linguistic and historical research |  | 54 |
Bogdan Babych, Anthony Hartley, Eric Atwell: Statistical modelling of MT output corpora for Information Extraction |  | 62 |
Paul Baker, Andrew Hardie, Tony McEnery, and Sri B.D. Jayaram: Constructing Corpora of South Asian Languages |  | 71 |
Federica Barbieri: The "new" quotatives in American English: A cross-register comparison |  | 81 |
Marco Baroni and Silvia Bernardini: A preliminary analysis of collocational differences in monolingual comparable corpora |  | 82 |
Sabine Bartsch: Investigating cross-linguistic constraints on the premodification of adjectival past participles and desubstantival adjectives. A corpus-based study of English and German |  | 92 |
Kate Beeching: Synchronic and diachronic variation: the how and why of sociolinguistic corpora |  | 102 |
Luisa Bentivogli, Christian Girardi, Emanuele Pianta: The MEANING Italian Corpus |  | 103 |
Julie Carson-Berndsen, Ulrike Gut and Robert Kelly: Discovering regularities in non-native speech |  | 113 |
P. Beust, S. Ferrari, V. Perlerin: NLP model and tools for detecting and interpreting metaphors in domain-specific corpora |  | 114 |
Philippe Blache, Marie-Laure Guénot and Tristan van Rullen: A corpus-based technique for grammar development |  | 124 |
Birte Bös: Towards an integrated model of service encounters |  | 132 |
Roderick Bovingdon and Angelo Dalli: Statistical analysis of the source origin of Maltese |  | 140 |
Lou Burnard, Tony Dodd: Xara: an XML aware tool for corpus searching |  | 142 |
Marianna N. Christou: Expressions and structures of the delexical verb ΚΑΝΩ [“MAKE” / “DO”] in Modern Greek language: A corpus-based approach to newspaper articles |  | 145 |
Ken Cosh and Pete Sawyer: Using natural language processing tools to assist semiotic analysis of information systems |  | 155 |
H. Cunningham, V. Tablan, K. Bontcheva, M. Dimitrov: Language engineering tools for collaborative corpus annotation |  | 165 |
Mark Davies: Annotation without lexicons: an alternative to the standard bootstrapping approach |  | 174 |
Joost van de Weijer: Consonant variation within words |  | 184 |
Debbie Elliott, Anthony Hartley and Eric Atwell: Rationale for a multilingual corpus for machine translation evaluation |  | 191 |
John Elliott and Debbie Elliott: The Human Language Chorus Corpus (HULCC) |  | 201 |
Jens Fauth, Hans-Jörg Schmid: Detecting gender-preferential patterns of linguistic features in face-to-face communication |  | 211 |
Valéria D. Feltrim, Sandra M. Aluísio, Maria das Graças V. Nunes: Analysis of the rhetorical structure of computer science abstracts in Portuguese |  | 212 |
Katerina T. Frantzi: Updating LSP dictionaries with collocational information |  | 219 |
Robert Gaizauskas, Lou Burnard, Paul Clough and Scott Piao: Using the XARA XML-Aware Corpus Query Tool to Investigate the METER Corpus |  | 227 |
Ana Llinares García: Repetition and young learners’ initiations in the L2: a corpus-driven analysis |  | 237 |
Sandrine Garnier, Youhanizou Tall, Sisay Fissaha, Johann Haller: Learner Corpora: Design, Development and Applications - Development of NLP tools for CALL based on learner corpora (German as a foreign language) |  | 246 |
Sara Gesuato: The company women and men keep: what collocations can reveal about culture |  | 253 |
Vojko Gorjanc: Tracking lexical changes in the reference corpus of Slovene texts |  | 263 |
Stefan Grondelaers, Dirk Speelman, Dirk Geeraerts: A corpus-based approach to informality: the case of Internet chat |  | 264 |
Leif Grönqvist and Magnus Gunnarsson: A method for finding word clusters in spoken language |  | 265 |
Xiaotian Guo: Between Verbs and Nouns and Between the Base Form and the Other Forms of Verbs – A Contrastive Study into COLEC and LOCNESS |  | 274 |
Le An Ha: A method for word segmentation in Vietnamese |  | 282 |
Silvia Hansen-Schirra: Linguistic enrichment and exploitation of the Translational English Corpus |  | 288 |
Andrew Hardie: Developing a tagset for automated part-of-speech tagging in Urdu |  | 298 |
Nigel Harwood: Personal pronouns and academic writing: a multidisciplinary corpus-based critical pragmatic approach to EAP |  | 308 |
Laura Hasler, Constantin Orasan and Ruslan Mitkov: Building better corpora for summarisation |  | 309 |
Chris Heffer: Not KWIC but Quick: KeyWords in Court |  | 319 |
Kris Heylen and Dirk Speelman: A corpus-based analysis of word order variation: The order of verb arguments in the German middle field |  | 320 |
Knut Hofland: A web-based concordance system for spoken language corpora |  | 330 |
Shelley Ching-yu Hsieh: The Corpus of Mandarin Chinese and German Animal Expressions |  | 332 |
Susan Hunston: Frame, phrase or function: a comparison of frame semantics and local grammars |  | 342 |
Emi Izumi, Toyomi Saiga, Thepchai Supnithi, Kiyotaka Uchimoto, Hitoshi Isahara: The development of the spoken corpus of Japanese learner English and the applications in collaboration with NLP techniques |  | 359 |
Inés Jacob, Joseba Abaitua, Josu Gómez: Automatic feeding of translation memory tools |  | 367 |
Steven Jones, M. Lynne Murphy: Antonymy in Childhood: a corpus-based approach to acquisition |  | 372 |
Randall L. Jones: An Analysis of Lexical Text Coverage in Contemporary German |  | 373 |
Stig W. Jørgensen, Carsten Hansen, Jette Drost, Dorte Haltrup, Anna Braasch, Sussi Olsen: Domain specific corpus building and lemma selection in a computational lexicon |  | 374 |
Tomoko Kaneko: How non-native speakers express anger, surprise, anxiety and grief: a corpus-based comparative study |  | 384 |
Sachie Karasawa: Patterns of elaboration and interlanguage development: an exploratory corpus analysis of college student essays |  | 394 |
Hannah Kermes, Stefan Evert: Text analysis meets corpus linguistics |  | 402 |
Adam Kilgarriff: Linguistic Search Engine |  | 412 |
Paul Kingsbury: A methodology for inducing a chronology of the Pāli Canon |  | 413 |
Gerry Knowles, Zuraidah Mohd Don: Tagging a corpus of Malay texts, and coping with 'syntactic drift' |  | 422 |
Natalie Kübler and Cécile Frérot: Verbs in specialised corpora: from manual corpus-based description to automatic extraction in an English-French parallel corpus |  | 429 |
Toshihiko Kubota: A Study on Abridgement for Spoken Word Titles |  | 439 |
David YW Lee: Spoken Academic Lexicogrammar and Discourse Patterns |  | 440 |
Geoffrey Leech, Martin Weisser: Generic speech act annotation for task-oriented dialogues |  | 441 |
Agnieszka Lenko-Szymanska: The curse and the blessing of mobile phones - a corpus-based study into Polish and American rhetoric strategies |  | 447 |
Robert Liebscher and David Groppe: Rethinking context availability for concrete and abstract words: a corpus study |  | 449 |
Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, Jukka-Pekka Juntunen: Porting an English semantic tagger to the Finnish language |  | 457 |
Nadine Lucas, Bruno Crémilleux, Leny Turmel: Signalling well-written academic articles in an English corpus by text mining techniques |  | 465 |
Anke Lüdeling and Stefan Evert: Linguistic experience and productivity: corpus evidence for fine-grained distinctions |  | 475 |
Michaela Mahlberg: High frequency nouns in English: aspects of a grammatical description |  | 484 |
Belinda Maia: Constructing comparable and parallel corpora for terminology extraction - work in progress |  | 485 |
Manolis Maragoudakis, Katia Kermanidis and Nikos Fakotakis: Towards a Bayesian Stochastic Part-Of-Speech and Case Tagger of Natural Language Corpora |  | 486 |
Kevin Mark: Learner corpus building and a ‘living’ university foreign language curriculum |  | 496 |
Tony McEnery, Zhonghua Xiao: Fuck revisited |  | 504 |
Dan McIntyre, Carol Bellard-Thomson, John Heywood, Tony McEnery, Elena Semino and Mick Short: The Construction of a Corpus to Investigate the Presentation of Speech, Thought and Writing in Written and Spoken British English |  | 513 |
John McKenny: Seeing the wood and the trees: Reconciling findings from discourse and lexical analysis |  | 523 |
Magnus Merkel, Michael Petterstedt and Lars Ahrenberg: Interactive Word Alignment for Corpus Linguistics |  | 533 |
José María Guirao Miras, Ana González Ledesma, Guillermo de la Madrid Heitzmann, Manuel Alcántara Plá, Antonio Moreno Sandoval: Relating lexical items to sociolinguistic features in a spontaneous speech corpus of Spanish |  | 543 |
Juan M. Montero and M. Mar Duque: ANESTTE: a writer’s assistant for a specific purpose language |  | 544 |
Olga Moudraia: The Student Engineering Corpus: Analysing Word Frequency |  | 552 |
JoAnne Neff, Francisco Ballesteros, Emma Dafouz, Francisco Martínez, Juan-Pedro Rica: Formulating Writer Stance: A Contrastive Study of EFL Learner Corpora |  | 562 |
Diane Nicholls: The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT |  | 572 |
Judy Noguchi, Thomas Orr, Yukio Tono: Using a dedicated corpus to identify features of professional English usage: What do “we” do in science journal articles? |  | 582 |
Attila Novák, Viktor Nagy, Csaba Oravecz: Corpus assisted development of a Hungarian morphological analyser and guesser |  | 583 |
Toshifumi Oba and Eric Atwell: Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners’ English |  | 591 |
Marija Omazic: The metacommunicative setting of phraseological units and their modifications – evidence from the British National Corpus |  | 599 |
Nelleke Oostdijk: Corpus linguistics meets language technology: deep syntactic parsing for question answering |  | 603 |
Maeve Paris: Extending computer-assisted text analysis techniques to the detection of source code plagiarism and collusion: assisting manual inspection |  | 611 |
Núria Gala Pavia, Salah Aït-Mokhtar: Lexicalising a robust parser grammar using the WWW |  | 620 |
Julien Perrez and Liesbeth Degand: On the combination of corpus-based and experimental methodologies in the study of causal, contrastive and metadiscourse connectives in L1 and L2 text comprehension and production |  | 627 |
Scott S.L. Piao and Tony McEnery: A Tool for Text Comparison |  | 637 |
James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro and Marcia Lazo: The TIMEBANK Corpus |  | 647 |
Andrew Roberts and Eric Atwell: The use of corpora for automatic evaluation of grammar inference systems |  | 657 |
Juhani Rudanko: More on horror aequi: evidence from large corpora |  | 662 |
Sarah Rule, Emma Marsden, Florence Myles, Rosamond Mitchell: Constructing a database of French interlanguage oral corpora |  | 669 |
Geoffrey Sampson: Are we nearly there yet, Mum? |  | 678 |
Hans-Jörg Schmid, Jens Fauth: Women's and men's style: fact or fiction? New grammatical evidence |  | 679 |
Serge Sharoff: Methods and tools for development of the Russian Reference Corpus |  | 680 |
Bayan Abu Shawar and Eric Atwell: Using dialogue corpora to train a chatbot |  | 681 |
Gerardo Sierra, Alfonso Medina, Rodrigo Alarcón, César A. Aguilar: Towards the Extraction of Conceptual Information from Corpora |  | 691 |
Kiril Simov, Alexander Simov, Milen Kouylekov: Constraints for corpora development and validation |  | 698 |
Milena Slavcheva: Corpus shallow parsing: meeting point between paradigmatic knowledge encoding |  | 706 |
Nicholas Smith: A quirky progressive? A corpus-based exploration of the will + be + -ing construction in recent and present day British English |  | 714 |
Harold Somers: Some Issues in the Mark-up of Handwriting in a Learner Corpus |  | 724 |
Dirk Speelman, Stefan Grondelaers, Dirk Geeraerts: A profile-based calculation of region and register variation: the synchronic and diachronic status of the national variants of Dutch |  | 733 |
Somayajulu G. Sripada and Ehud Reiter and Jim Hunter and Jin Yu: Exploiting a parallel TEXT - DATA corpus |  | 734 |
Asa M. Stepak: A proposed mathematical theory explaining the sequence of grammatical categories |  | 744 |
Petra Storjohann: The lexicographic use of corpora and computational tools for disambiguation |  | 754 |
Jozsef Szakos: Cultures and Corpora: Extracting Anthropological Information from Corpora of Formosan Endangered Languages |  | 763 |
Jun Arata Takahashi: Do we talk (or write?) differently over the Net?- A lexical enquiry into ‘a’ Net-EN - |  | 764 |
Kaoru Takahashi: A Study of Text Types and Register Variation in the British National Corpus |  | 773 |
Yuri Tambovtsev: The Structure of the Consonant Patterns in the Spanish Speech Sound Chain as a Clue of Typological Closeness |  | 774 |
Yuri Tambovtsev: Phonological similarity between Basque and other world languages based on the frequency of occurrence of certain typological consonantal features |  | 775 |
Tess Yu-Shan Ke, Liang-Feng Chen, Chien-Chung Chen: Investigation on the uses of temporal subordinators by NS and NNS in academic spoken English |  | 780 |
Carole Tiberius, Dunstan Brown, Greville Corbett: Ambiguity in Russian Morphology |  | 790 |
Juhani Toivanen, Tapio Seppänen, Eero Väyrynen: Creation and utilisation of the MediaTeam Emotional Speech Corpus |  | 791 |
Yukio Tono: Learner corpora: design, development and applications |  | 800 |
Montserrat Civit Torruella, Mª Antònia Martí Antonín, Lluís Padró Cirera: Using hybrid probabilistic-linguistic knowledge to improve pos-tagging performance |  | 810 |
Patrick Tschorn, Anke Lüdeling: Morphological knowledge and alignment of English-German parallel corpora |  | 818 |
Francesca Vaghi, Marco Venuti: The Economist and The Financial Times. A study of movement metaphors |  | 828 |
Bertus van Rooy and Lande Schäfer: An evaluation of three POS taggers for the tagging of the Tswana Learner English Corpus |  | 835 |
Tamás Váradi: Shallow parsing of Hungarian business news |  | 845 |
Isabel Verdaguer and Anna Poch: Collocational and colligational patterns in lexical sets: A corpus-based study |  | 852 |
Maria Verde: Shedding light on SHED, CAST and THROW as nodes of extended lexical units |  | 859 |
Shih-Ping Wang: Mutual information and corpus-based approaches to reduplicative fixed expressions |  | 869 |
Julie Weeds and David Weir: Finding and evaluating sets of nearest neighbours |  | 879 |
David Wible, Ping-Yu Huang: Using learner corpora to examine L2 acquisition of tense-aspect markings |  | 889 |
Sandra Williams and Ehud Reiter: A corpus analysis of discourse relations for Natural Language Generation |  | 899 |
Andrew Wilson, Celia Worth: Building and annotating corpora of spoken Welsh and Gaelic |  | 909 |
Andrew Wilson, Celia Worth: Conceptual Glossaries of the Latin Vulgate Bible |  | 918 |
Andrew Wilson, Olga Moudraia: Quantitative or Qualitative Content Analysis? Experiences from a cross-cultural comparison of female students' attitudes to shoe fashions in Germany, Poland and Russia |  | 919 |
Martin Wynne, Rowan Wilson, Ylva Berglund: Virtual Corpora at the Oxford Text Archive |  | 920 |
Yang Xiaojun: Survey and Prospect of China’s Corpus-Based Researches |  | 930 |
Debra Ziegeler, Sarah Lee: Analysing a Corpus-based Semantic Investigation of English Dialects |  | 931 |
Heike Zinsmeister, Ulrich Heid: Identifying predicatively used adverbs by means of a statistical grammar model |  | 932 |