Proceedings contents

Preface viii

Mariko Abe: A Corpus-based Contrastive Analysis of Spoken and Written Learner Corpora: The Case of Japanese-speaking Learners of English 1

Aduriz I., Aranzabe M.J., Arriola J.M., Atutxa A., Díaz de Ilarraza A., Ezeiza N., Gojenola K., Oronoz M., Soroa A., and Urizar R.: Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing 10

Khurshid Ahmad, Pensiri Manomaisupat, David Cheng, Tugba Taskaya, Saif Ahmad, Lee Gillam, Andrew Hippisley: The mood of the (financial) markets: In a corpus of words and of pictures 12

Sandra M. Aluísio, Gisele M. Pinheiro, Marcelo Finger, Maria das Graças V. Nunes, Stella E. O. Tagnin: The Lacio-Web Project: overview and issues in Brazilian Portuguese corpora creation 14

Dawn Archer, Tony McEnery, Paul Rayson, Andrew Hardie: Developing an automated semantic analysis system for Early Modern English 22

Dawn Archer, Andrew Hardie, Tony McEnery, Scott Piao: A corpus of seventeenth-century English news reportage: construction, encoding and applications 32

Bertol Arrieta, Arantza Díaz de Ilarraza, Koldo Gojenola, Montse Maritxalar, Maite Oronoz: A database system for storing second language learner corpora 33

Jørg Asmussen: Towards a methodology for corpus-based studies of linguistic change: Contrastive observations and their possible diachronic interpretations in the Korpus 2000 and Korpus 90 General Corpora of Danish 42

Eric Atwell: A New Machine Learning Algorithm for Neoposy: coining new Parts of Speech 43

Eric Atwell, Paul Gent, Julia Medori, Clive Souter: Detecting student copying in a corpus of science laboratory reports: simple and smart approaches 48

Francis Henrik Aubert, Stella E. O. Tagnin: A Corpus of Sworn Translations – for linguistic and historical research 54

Bogdan Babych, Anthony Hartley, Eric Atwell: Statistical modelling of MT output corpora for Information Extraction 62

Paul Baker, Andrew Hardie, Tony McEnery, and Sri B.D. Jayaram: Constructing Corpora of South Asian Languages 71

Federica Barbieri: The "new" quotatives in American English: A cross-register comparison 81

Marco Baroni and Silvia Bernardini: A preliminary analysis of collocational differences in monolingual comparable corpora 82

Sabine Bartsch: Investigating cross-linguistic constraints on the premodification of adjectival past participles and desubstantival adjectives. A corpus-based study of English and German 92

Kate Beeching: Synchronic and diachronic variation: the how and why of sociolinguistic corpora. 102

Luisa Bentivogli, Christian Girardi, Emanuele Pianta: The MEANING Italian Corpus 103

Julie Carson-Berndsen, Ulrike Gut and Robert Kelly: Discovering regularities in non-native speech 113

P. Beust, S. Ferrari, V. Perlerin: NLP model and tools for detecting and interpreting metaphors in domain-specific corpora 114

Philippe Blache, Marie-Laure Guénot and Tristan van Rullen: A corpus-based technique for grammar development 124

Birte Bös: Towards an integrated model of service encounters 132

Roderick Bovingdon and Angelo Dalli: Statistical analysis of the source origin of Maltese 140

Lou Burnard, Tony Dodd: Xara: an XML aware tool for corpus searching 142

Marianna N. Christou: Expressions and structures of the delexical verb KANΩ [“MAKE” / “DO”] in Modern Greek language: A corpus-based approach to newspaper articles 145

Ken Cosh and Pete Sawyer: Using natural language processing tools to assist semiotic analysis of information systems 155

H. Cunningham, V. Tablan, K. Bontcheva, M. Dimitrov: Language engineering tools for collaborative corpus annotation 165

Mark Davies: Annotation without lexicons: an alternative to the standard bootstrapping approach 174

Joost van de Weijer: Consonant variation within words 184

Debbie Elliott, Anthony Hartley and Eric Atwell: Rationale for a multilingual corpus for machine translation evaluation 191

John Elliott and Debbie Elliott: The Human Language Chorus Corpus (HULCC) 201

Jens Fauth, Hans-Jörg Schmid: Detecting gender-preferential patterns of linguistic features in face-to-face communication 211

Valéria D. Feltrim, Sandra M. Aluísio, Maria das Graças V. Nunes: Analysis of the rhetorical structure of computer science abstracts in Portuguese 212

Katerina T. Frantzi: Updating LSP dictionaries with collocational information 219

Robert Gaizauskas, Lou Burnard, Paul Clough and Scott Piao: Using the XARA XML-Aware Corpus Query Tool to Investigate the METER Corpus 227

Ana Llinares García: Repetition and young learners' initiations in the L2: a corpus-driven analysis 237

Sandrine Garnier, Youhanizou Tall, Sisay Fissaha, Johann Haller: Learner Corpora: Design, Development and Applications - Development of NLP tools for CALL based on learner corpora (German as a foreign language) 246

Sara Gesuato: The company women and men keep: what collocations can reveal about culture 253

Vojko Gorjanc: Tracking lexical changes in the reference corpus of Slovene texts 263

Stefan Grondelaers, Dirk Speelman, Dirk Geeraerts: A corpus-based approach to informality: the case of Internet chat 264

Leif Grönqvist and Magnus Gunnarsson: A method for finding word clusters in spoken language 265

Xiaotian Guo: Between Verbs and Nouns and Between the Base Form and the Other Forms of Verbs – A Contrastive Study into COLEC and LOCNESS 274

Le An Ha: A method for word segmentation in Vietnamese 282

Silvia Hansen-Schirra: Linguistic enrichment and exploitation of the Translational English Corpus 288

Andrew Hardie: Developing a tagset for automated part-of-speech tagging in Urdu 298

Nigel Harwood: Personal pronouns and academic writing: a multidisciplinary corpus-based critical pragmatic approach to EAP 308

Laura Hasler, Constantin Orasan and Ruslan Mitkov: Building better corpora for summarisation 309

Chris Heffer: Not KWIC but Quick: KeyWords in Court 319

Kris Heylen and Dirk Speelman: A corpus-based analysis of word order variation: The order of verb arguments in the German middle field 320

Knut Hofland: A web-based concordance system for spoken language corpora 330

Shelley Ching-yu Hsieh: The Corpus of Mandarin Chinese and German Animal Expressions 332

Susan Hunston: Frame, phrase or function: a comparison of frame semantics and local grammars 342

Emi Izumi, Toyomi Saiga, Thepchai Supnithi, Kiyotaka Uchimoto, Hitoshi Isahara: The development of the spoken corpus of Japanese learner English and the applications in collaboration with NLP techniques 359

Inés Jacob, Joseba Abaitua, Josu Gómez: Automatic feeding of translation memory tools 367

Steven Jones, M. Lynne Murphy: Antonymy in Childhood: a corpus-based approach to acquisition 372

Randall L. Jones: An Analysis of Lexical Text Coverage in Contemporary German 373

Stig W. Jørgensen, Carsten Hansen, Jette Drost, Dorte Haltrup, Anna Braasch, Sussi Olsen: Domain specific corpus building and lemma selection in a computational lexicon 374

Tomoko Kaneko: How non-native speakers express anger, surprise, anxiety and grief: a corpus-based comparative study 384

Sachie Karasawa: Patterns of elaboration and interlanguage development: an exploratory corpus analysis of college student essays 394

Hannah Kermes, Stefan Evert: Text analysis meets corpus linguistics 402

Adam Kilgarriff: Linguistic Search Engine 412

Paul Kingsbury: A methodology for inducing a chronology of the Pāli Canon 413

Gerry Knowles, Zuraidah Mohd Don: Tagging a corpus of Malay texts, and coping with 'syntactic drift' 422

Natalie Kübler and Cécile Frérot: Verbs in specialised corpora: from manual corpus-based description to automatic extraction in an English-French parallel corpus 429

Toshihiko Kubota: A Study on Abridgement for Spoken Word Titles 439

David YW Lee: Spoken Academic Lexicogrammar and Discourse Patterns 440

Geoffrey Leech, Martin Weisser: Generic speech act annotation for task-oriented dialogues 441

Agnieszka Lenko-Szymanska: The curse and the blessing of mobile phones - a corpus-based study into Polish and American rhetoric strategies 447

Robert Liebscher and David Groppe: Rethinking context availability for concrete and abstract words: a corpus study 449

Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, Jukka-Pekka Juntunen: Porting an English semantic tagger to the Finnish language 457

Nadine Lucas, Bruno Crémilleux, Leny Turmel: Signalling well-written academic articles in an English corpus by text mining techniques 465

Anke Lüdeling and Stefan Evert: Linguistic experience and productivity: corpus evidence for fine-grained distinctions 475

Michaela Mahlberg: High frequency nouns in English: aspects of a grammatical description 484

Belinda Maia: Constructing comparable and parallel corpora for terminology extraction - work in progress 485

Manolis Maragoudakis, Katia Kermanidis and Nikos Fakotakis: Towards a Bayesian Stochastic Part-Of-Speech and Case Tagger of Natural Language Corpora 486

Kevin Mark: Learner corpus building and a ‘living’ university foreign language curriculum 496

Tony McEnery, Zhonghua Xiao: Fuck revisited 504

Dan McIntyre, Carol Bellard-Thomson, John Heywood, Tony McEnery, Elena Semino and Mick Short: The Construction of a Corpus to Investigate the Presentation of Speech, Thought and Writing in Written and Spoken British English 513

John McKenny: Seeing the wood and the trees: Reconciling findings from discourse and lexical analysis 523

Magnus Merkel, Michael Petterstedt and Lars Ahrenberg: Interactive Word Alignment for Corpus Linguistics 533

José María Guirao Miras, Ana González Ledesma, Guillermo de la Madrid Heitzmann, Manuel Alcántara Plá, Antonio Moreno Sandoval: Relating lexical items to sociolinguistic features in a spontaneous speech corpus of Spanish 543

Juan M. Montero and M. Mar Duque: ANESTTE: a writer’s assistant for a specific purpose language 544

Olga Moudraia: The Student Engineering Corpus: Analysing Word Frequency 552

JoAnne Neff, Francisco Ballesteros, Emma Dafouz, Francisco Martínez, Juan-Pedro Rica: Formulating Writer Stance: A Contrastive Study of EFL Learner Corpora 562

Diane Nicholls: The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT 572

Judy Noguchi, Thomas Orr, Yukio Tono: Using a dedicated corpus to identify features of professional English usage: What do “we” do in science journal articles? 582

Attila Novák, Viktor Nagy, Csaba Oravecz: Corpus assisted development of a Hungarian morphological analyser and guesser 583

Toshifumi Oba and Eric Atwell: Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners’ English 591

Marija Omazic: THE METACOMMUNICATIVE SETTING OF PHRASEOLOGICAL UNITS AND THEIR MODIFICATIONS – EVIDENCE FROM THE BRITISH NATIONAL CORPUS 599

Nelleke Oostdijk: Corpus linguistics meets language technology: deep syntactic parsing for question answering 603

Maeve Paris: Extending computer-assisted text analysis techniques to the detection of source code plagiarism and collusion: assisting manual inspection 611

Núria Gala Pavia, Salah Aït-Mokhtar: Lexicalising a robust parser grammar using the WWW 620

Julien Perrez and Liesbeth Degand: On the combination of corpus-based and experimental methodologies in the study of causal, contrastive and metadiscourse connectives in L1 and L2 text comprehension and production 627

Scott S.L. Piao and Tony McEnery: A Tool for Text Comparison 637

James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro and Marcia Lazo: The TIMEBANK Corpus 647

Andrew Roberts and Eric Atwell: The use of corpora for automatic evaluation of grammar inference systems 657

Juhani Rudanko: More on horror aequi: evidence from large corpora 662

Sarah Rule, Emma Marsden, Florence Myles, Rosamond Mitchell: Constructing a database of French interlanguage oral corpora 669

Geoffrey Sampson: Are we nearly there yet, Mum? 678

Hans-Jörg Schmid, Jens Fauth: Women's and men's style: fact or fiction? New grammatical evidence 679

Serge Sharoff: Methods and tools for development of the Russian Reference Corpus 680

Bayan Abu Shawar and Eric Atwell: Using dialogue corpora to train a chatbot 681

Gerardo Sierra, Alfonso Medina, Rodrigo Alarcón, César A. Aguilar: Towards the Extraction of Conceptual Information from Corpora 691

Kiril Simov, Alexander Simov, Milen Kouylekov: Constraints for corpora development and validation 698

Milena Slavcheva: Corpus shallow parsing: meeting point between paradigmatic knowledge encoding 706

Nicholas Smith: A quirky progressive? A corpus-based exploration of the will + be + -ing construction in recent and present day British English. 714

Harold Somers: Some Issues in the Mark-up of Handwriting in a Learner Corpus 724

Dirk Speelman, Stefan Grondelaers, Dirk Geeraerts: A profile-based calculation of region and register variation: the synchronic and diachronic status of the national variants of Dutch 733

Somayajulu G. Sripada and Ehud Reiter and Jim Hunter and Jin Yu: Exploiting a parallel TEXT - DATA corpus 734

Asa M. Stepak: A proposed mathematical theory explaining the sequence of grammatical categories 744

Petra Storjohann: The lexicographic use of corpora and computational tools for disambiguation 754

Jozsef Szakos: Cultures and Corpora: Extracting Anthropological Information from Corpora of Formosan Endangered Languages 763

Jun Arata Takahashi: Do we talk (or write?) differently over the Net? - A lexical enquiry into ‘a’ Net-EN - 764

Kaoru Takahashi: A Study of Text Types and Register Variation in the British National Corpus 773

Yuri Tambovtsev: The Structure of the Consonant Patterns in the Spanish Speech Sound Chain as a Clue of Typological Closeness 774

Yuri Tambovtsev: Phonological similarity between Basque and other world languages based on the frequency of occurrence of certain typological consonantal features 775

Tess Yu-Shan Ke, Liang-Feng Chen, Chien-Chung Chen: Investigation on the uses of temporal subordinators by NS and NNS in academic spoken English 780

Carole Tiberius, Dunstan Brown, Greville Corbett: Ambiguity in Russian Morphology 790

Juhani Toivanen, Tapio Seppänen, Eero Väyrynen: Creation and utilisation of the MediaTeam Emotional Speech Corpus 791

Yukio Tono: Learner corpora: design, development and applications 800

Montserrat Civit Torruella, Mª Antònia Martí Antonín, Lluís Padró Cirera: Using hybrid probabilistic-linguistic knowledge to improve pos-tagging performance 810

Patrick Tschorn, Anke Lüdeling: Morphological knowledge and alignment of English-German parallel corpora 818

Francesca Vaghi, Marco Venuti: The Economist and The Financial Times. A study of movement metaphors 828

Bertus van Rooy and Lande Schäfer: An evaluation of three POS taggers for the tagging of the Tswana Learner English Corpus 835

Tamás Váradi: Shallow parsing of Hungarian business news 845

Isabel Verdaguer and Anna Poch: Collocational and colligational patterns in lexical sets: A corpus-based study 852

Maria Verde: Shedding light on SHED, CAST and THROW as nodes of extended lexical units 859

Shih-Ping Wang: Mutual information and corpus-based approaches to reduplicative fixed expressions 869

Julie Weeds and David Weir: Finding and evaluating sets of nearest neighbours 879

David Wible, Ping-Yu Huang: Using learner corpora to examine L2 acquisition of tense-aspect markings 889

Sandra Williams and Ehud Reiter: A corpus analysis of discourse relations for Natural Language Generation 899

Andrew Wilson, Celia Worth: Building and annotating corpora of spoken Welsh and Gaelic 909

Andrew Wilson, Celia Worth: Conceptual Glossaries of the Latin Vulgate Bible 918

Andrew Wilson, Olga Moudraia: Quantitative or Qualitative Content Analysis? Experiences from a cross-cultural comparison of female students' attitudes to shoe fashions in Germany, Poland and Russia 919

Martin Wynne, Rowan Wilson, Ylva Berglund: Virtual Corpora at the Oxford Text Archive 920

Yang Xiaojun: Survey and Prospect of China’s Corpus-Based Researches 930

Debra Ziegeler, Sarah Lee: Analysing a Corpus-based Semantic Investigation of English Dialects 931

Heike Zinsmeister, Ulrich Heid: Identifying predicatively used adverbs by means of a statistical grammar model 932
