Preface | | viii |
Mariko Abe: A Corpus-based Contrastive Analysis of Spoken and Written Learner Corpora: The Case of Japanese-speaking Learners of English | | 1 |
Aduriz I., Aranzabe M.J., Arriola J.M., Atutxa A., Díaz de Ilarraza A., Ezeiza N., Gojenola K., Oronoz M., Soroa A., and Urizar R.: Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing | | 10 |
Khurshid Ahmad, Pensiri Manomaisupat, David Cheng, Tugba Taskaya, Saif Ahmad, Lee Gillam, Andrew Hippisley: The mood of the (financial) markets: In a corpus of words and of pictures | | 12 |
Sandra M. Aluísio, Gisele M. Pinheiro, Marcelo Finger, Maria das Graças V. Nunes, Stella E. O. Tagnin: The Lacio-Web Project: overview and issues in Brazilian Portuguese corpora creation | | 14 |
Dawn Archer, Tony McEnery, Paul Rayson, Andrew Hardie: Developing an automated semantic analysis system for Early Modern English | | 22 |
Dawn Archer, Andrew Hardie, Tony McEnery, Scott Piao: A corpus of seventeenth-century English news reportage: construction, encoding and applications | | 32 |
Bertol Arrieta, Arantza Díaz de Ilarraza, Koldo Gojenola, Montse Maritxalar, Maite Oronoz: A database system for storing second language learner corpora | | 33 |
Jørg Asmussen: Towards a methodology for corpus-based studies of linguistic change: Contrastive observations and their possible diachronic interpretations in the Korpus 2000 and Korpus 90 General Corpora of Danish | | 42 |
Eric Atwell: A New Machine Learning Algorithm for Neoposy: coining new Parts of Speech | | 43 |
Eric Atwell, Paul Gent, Julia Medori, Clive Souter: Detecting student copying in a corpus of science laboratory reports: simple and smart approaches | | 48 |
Francis Henrik Aubert, Stella E. O. Tagnin: A Corpus of Sworn Translations – for linguistic and historical research | | 54 |
Bogdan Babych, Anthony Hartley, Eric Atwell: Statistical modelling of MT output corpora for Information Extraction | | 62 |
Paul Baker, Andrew Hardie, Tony McEnery, and Sri B.D. Jayaram: Constructing Corpora of South Asian Languages | | 71 |
Federica Barbieri: The "new" quotatives in American English: A cross-register comparison | | 81 |
Marco Baroni and Silvia Bernardini: A preliminary analysis of collocational differences in monolingual comparable corpora | | 82 |
Sabine Bartsch: Investigating cross-linguistic constraints on the premodification of adjectival past participles and desubstantival adjectives. A corpus-based study of English and German | | 92 |
Kate Beeching: Synchronic and diachronic variation: the how and why of sociolinguistic corpora | | 102 |
Luisa Bentivogli, Christian Girardi, Emanuele Pianta: The MEANING Italian Corpus | | 103 |
Julie Carson-Berndsen, Ulrike Gut and Robert Kelly: Discovering regularities in non-native speech | | 113 |
P. Beust, S. Ferrari, V. Perlerin: NLP model and tools for detecting and interpreting metaphors in domain-specific corpora | | 114 |
Philippe Blache, Marie-Laure Guénot and Tristan van Rullen: A corpus-based technique for grammar development | | 124 |
Birte Bös: Towards an integrated model of service encounters | | 132 |
Roderick Bovingdon and Angelo Dalli: Statistical analysis of the source origin of Maltese | | 140 |
Lou Burnard, Tony Dodd: Xara: an XML aware tool for corpus searching | | 142 |
Marianna N. Christou: Expressions and structures of the delexical verb KANΩ [“MAKE” / “DO”] in Modern Greek language: A corpus-based approach to newspaper articles | | 145 |
Ken Cosh and Pete Sawyer: Using natural language processing tools to assist semiotic analysis of information systems | | 155 |
H. Cunningham, V. Tablan, K. Bontcheva, M. Dimitrov: Language engineering tools for collaborative corpus annotation | | 165 |
Mark Davies: Annotation without lexicons: an alternative to the standard bootstrapping approach | | 174 |
Joost van de Weijer: Consonant variation within words | | 184 |
Debbie Elliott, Anthony Hartley and Eric Atwell: Rationale for a multilingual corpus for machine translation evaluation | | 191 |
John Elliott and Debbie Elliott: The Human Language Chorus Corpus (HULCC) | | 201 |
Jens Fauth, Hans-Jörg Schmid: Detecting gender-preferential patterns of linguistic features in face-to-face communication | | 211 |
Valéria D. Feltrim, Sandra M. Aluísio, Maria das Graças V. Nunes: Analysis of the rhetorical structure of computer science abstracts in Portuguese | | 212 |
Katerina T. Frantzi: Updating LSP dictionaries with collocational information | | 219 |
Robert Gaizauskas, Lou Burnard, Paul Clough and Scott Piao: Using the XARA XML-Aware Corpus Query Tool to Investigate the METER Corpus | | 227 |
Ana Llinares García: Repetition and young learners' initiations in the L2: a corpus-driven analysis | | 237 |
Sandrine Garnier, Youhanizou Tall, Sisay Fissaha, Johann Haller: Learner Corpora: Design, Development and Applications - Development of NLP tools for CALL based on learner corpora (German as a foreign language) | | 246 |
Sara Gesuato: The company women and men keep: what collocations can reveal about culture | | 253 |
Vojko Gorjanc: Tracking lexical changes in the reference corpus of Slovene texts | | 263 |
Stefan Grondelaers, Dirk Speelman, Dirk Geeraerts: A corpus-based approach to informality: the case of Internet chat | | 264 |
Leif Grönqvist and Magnus Gunnarsson: A method for finding word clusters in spoken language | | 265 |
Xiaotian Guo: Between Verbs and Nouns and Between the Base Form and the Other Forms of Verbs – A Contrastive Study into COLEC and LOCNESS | | 274 |
Le An Ha: A method for word segmentation in Vietnamese | | 282 |
Silvia Hansen-Schirra: Linguistic enrichment and exploitation of the Translational English Corpus | | 288 |
Andrew Hardie: Developing a tagset for automated part-of-speech tagging in Urdu | | 298 |
Nigel Harwood: Personal pronouns and academic writing: a multidisciplinary corpus-based critical pragmatic approach to EAP | | 308 |
Laura Hasler, Constantin Orasan and Ruslan Mitkov: Building better corpora for summarisation | | 309 |
Chris Heffer: Not KWIC but Quick: KeyWords in Court | | 319 |
Kris Heylen and Dirk Speelman: A corpus-based analysis of word order variation: The order of verb arguments in the German middle field | | 320 |
Knut Hofland: A web-based concordance system for spoken language corpora | | 330 |
Shelley Ching-yu Hsieh: The Corpus of Mandarin Chinese and German Animal Expressions | | 332 |
Susan Hunston: Frame, phrase or function: a comparison of frame semantics and local grammars | | 342 |
Emi Izumi, Toyomi Saiga, Thepchai Supnithi, Kiyotaka Uchimoto, Hitoshi Isahara: The development of the spoken corpus of Japanese learner English and the applications in collaboration with NLP techniques | | 359 |
Inés Jacob, Joseba Abaitua, Josu Gómez: Automatic feeding of translation memory tools | | 367 |
Steven Jones, M. Lynne Murphy: Antonymy in Childhood: a corpus-based approach to acquisition | | 372 |
Randall L. Jones: An Analysis of Lexical Text Coverage in Contemporary German | | 373 |
Stig W. Jørgensen, Carsten Hansen, Jette Drost, Dorte Haltrup, Anna Braasch, Sussi Olsen: Domain specific corpus building and lemma selection in a computational lexicon | | 374 |
Tomoko Kaneko: How non-native speakers express anger, surprise, anxiety and grief: a corpus-based comparative study | | 384 |
Sachie Karasawa: Patterns of elaboration and interlanguage development: an exploratory corpus analysis of college student essays | | 394 |
Hannah Kermes, Stefan Evert: Text analysis meets corpus linguistics | | 402 |
Adam Kilgarriff: Linguistic Search Engine | | 412 |
Paul Kingsbury: A methodology for inducing a chronology of the Pāli Canon | | 413 |
Gerry Knowles, Zuraidah Mohd Don: Tagging a corpus of Malay texts, and coping with 'syntactic drift' | | 422 |
Natalie Kübler and Cécile Frérot: Verbs in specialised corpora: from manual corpus-based description to automatic extraction in an English-French parallel corpus | | 429 |
Toshihiko Kubota: A Study on Abridgement for Spoken Word Titles | | 439 |
David YW Lee: Spoken Academic Lexicogrammar and Discourse Patterns | | 440 |
Geoffrey Leech, Martin Weisser: Generic speech act annotation for task-oriented dialogues | | 441 |
Agnieszka Lenko-Szymanska: The curse and the blessing of mobile phones - a corpus-based study into Polish and American rhetoric strategies | | 447 |
Robert Liebscher and David Groppe: Rethinking context availability for concrete and abstract words: a corpus study | | 449 |
Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, Jukka-Pekka Juntunen: Porting an English semantic tagger to the Finnish language | | 457 |
Nadine Lucas, Bruno Crémilleux, Leny Turmel: Signalling well-written academic articles in an English corpus by text mining techniques | | 465 |
Anke Lüdeling and Stefan Evert: Linguistic experience and productivity: corpus evidence for fine-grained distinctions | | 475 |
Michaela Mahlberg: High frequency nouns in English: aspects of a grammatical description | | 484 |
Belinda Maia: Constructing comparable and parallel corpora for terminology extraction - work in progress | | 485 |
Manolis Maragoudakis, Katia Kermanidis and Nikos Fakotakis: Towards a Bayesian Stochastic Part-Of-Speech and Case Tagger of Natural Language Corpora | | 486 |
Kevin Mark: Learner corpus building and a ‘living’ university foreign language curriculum | | 496 |
Tony McEnery, Zhonghua Xiao: Fuck revisited | | 504 |
Dan McIntyre, Carol Bellard-Thomson, John Heywood, Tony McEnery, Elena Semino and Mick Short: The Construction of a Corpus to Investigate the Presentation of Speech, Thought and Writing in Written and Spoken British English | | 513 |
John McKenny: Seeing the wood and the trees: Reconciling findings from discourse and lexical analysis | | 523 |
Magnus Merkel, Michael Petterstedt and Lars Ahrenberg: Interactive Word Alignment for Corpus Linguistics | | 533 |
José María Guirao Miras, Ana González Ledesma, Guillermo de la Madrid Heitzmann, Manuel Alcántara Plá, Antonio Moreno Sandoval: Relating lexical items to sociolinguistic features in a spontaneous speech corpus of Spanish | | 543 |
Juan M. Montero and M. Mar Duque: ANESTTE: a writer’s assistant for a specific purpose language | | 544 |
Olga Moudraia: The Student Engineering Corpus: Analysing Word Frequency | | 552 |
JoAnne Neff, Francisco Ballesteros, Emma Dafouz, Francisco Martínez, Juan-Pedro Rica: Formulating Writer Stance: A Contrastive Study of EFL Learner Corpora | | 562 |
Diane Nicholls: The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT | | 572 |
Judy Noguchi, Thomas Orr, Yukio Tono: Using a dedicated corpus to identify features of professional English usage: What do “we” do in science journal articles? | | 582 |
Attila Novák, Viktor Nagy, Csaba Oravecz: Corpus assisted development of a Hungarian morphological analyser and guesser | | 583 |
Toshifumi Oba and Eric Atwell: Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners’ English | | 591 |
Marija Omazic: THE METACOMMUNICATIVE SETTING OF PHRASEOLOGICAL UNITS AND THEIR MODIFICATIONS – EVIDENCE FROM THE BRITISH NATIONAL CORPUS | | 599 |
Nelleke Oostdijk: Corpus linguistics meets language technology: deep syntactic parsing for question answering | | 603 |
Maeve Paris: Extending computer-assisted text analysis techniques to the detection of source code plagiarism and collusion: assisting manual inspection | | 611 |
Núria Gala Pavia, Salah Aït-Mokhtar: Lexicalising a robust parser grammar using the WWW | | 620 |
Julien Perrez and Liesbeth Degand: On the combination of corpus-based and experimental methodologies in the study of causal, contrastive and metadiscourse connectives in L1 and L2 text comprehension and production | | 627 |
Scott S.L. Piao and Tony McEnery: A Tool for Text Comparison | | 637 |
James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro and Marcia Lazo: The TIMEBANK Corpus | | 647 |
Andrew Roberts and Eric Atwell: The use of corpora for automatic evaluation of grammar inference systems | | 657 |
Juhani Rudanko: More on horror aequi: evidence from large corpora | | 662 |
Sarah Rule, Emma Marsden, Florence Myles, Rosamond Mitchell: Constructing a database of French interlanguage oral corpora | | 669 |
Geoffrey Sampson: Are we nearly there yet, Mum? | | 678 |
Hans-Jörg Schmid, Jens Fauth: Women's and men's style: fact or fiction? New grammatical evidence | | 679 |
Serge Sharoff: Methods and tools for development of the Russian Reference Corpus | | 680 |
Bayan Abu Shawar and Eric Atwell: Using dialogue corpora to train a chatbot | | 681 |
Gerardo Sierra, Alfonso Medina, Rodrigo Alarcón, César A. Aguilar: Towards the Extraction of Conceptual Information from Corpora | | 691 |
Kiril Simov, Alexander Simov, Milen Kouylekov: Constraints for corpora development and validation | | 698 |
Milena Slavcheva: Corpus shallow parsing: meeting point between paradigmatic knowledge encoding | | 706 |
Nicholas Smith: A quirky progressive? A corpus-based exploration of the will + be + -ing construction in recent and present day British English | | 714 |
Harold Somers: Some Issues in the Mark-up of Handwriting in a Learner Corpus | | 724 |
Dirk Speelman, Stefan Grondelaers, Dirk Geeraerts: A profile-based calculation of region and register variation: the synchronic and diachronic status of the national variants of Dutch | | 733 |
Somayajulu G. Sripada and Ehud Reiter and Jim Hunter and Jin Yu: Exploiting a parallel TEXT - DATA corpus | | 734 |
Asa M. Stepak: A proposed mathematical theory explaining the sequence of grammatical categories | | 744 |
Petra Storjohann: The lexicographic use of corpora and computational tools for disambiguation | | 754 |
Jozsef Szakos: Cultures and Corpora: Extracting Anthropological Information from Corpora of Formosan Endangered Languages | | 763 |
Jun Arata Takahashi: Do we talk (or write?) differently over the Net? - A lexical enquiry into ‘a’ Net-EN - | | 764 |
Kaoru Takahashi: A Study of Text Types and Register Variation in the British National Corpus | | 773 |
Yuri Tambovtsev: The Structure of the Consonant Patterns in the Spanish Speech Sound Chain as a Clue of Typological Closeness | | 774 |
Yuri Tambovtsev: Phonological similarity between Basque and other world languages based on the frequency of occurrence of certain typological consonantal features | | 775 |
Tess Yu-Shan Ke, Liang-Feng Chen, Chien-Chung Chen: Investigation on the uses of temporal subordinators by NS and NNS in academic spoken English | | 780 |
Carole Tiberius, Dunstan Brown, Greville Corbett: Ambiguity in Russian Morphology | | 790 |
Juhani Toivanen, Tapio Seppänen, Eero Väyrynen: Creation and utilisation of the MediaTeam Emotional Speech Corpus | | 791 |
Yukio Tono: Learner corpora: design, development and applications | | 800 |
Montserrat Civit Torruella, Mª Antònia Martí Antonín, Lluís Padró Cirera: Using hybrid probabilistic-linguistic knowledge to improve pos-tagging performance | | 810 |
Patrick Tschorn, Anke Lüdeling: Morphological knowledge and alignment of English-German parallel corpora | | 818 |
Francesca Vaghi, Marco Venuti: The Economist and The Financial Times. A study of movement metaphors | | 828 |
Bertus van Rooy and Lande Schäfer: An evaluation of three POS taggers for the tagging of the Tswana Learner English Corpus | | 835 |
Tamás Váradi: Shallow parsing of Hungarian business news | | 845 |
Isabel Verdaguer and Anna Poch: Collocational and colligational patterns in lexical sets: A corpus-based study | | 852 |
Maria Verde: Shedding light on SHED, CAST and THROW as nodes of extended lexical units | | 859 |
Shih-Ping Wang: Mutual information and corpus-based approaches to reduplicative fixed expressions | | 869 |
Julie Weeds and David Weir: Finding and evaluating sets of nearest neighbours | | 879 |
David Wible, Ping-Yu Huang: Using learner corpora to examine L2 acquisition of tense-aspect markings | | 889 |
Sandra Williams and Ehud Reiter: A corpus analysis of discourse relations for Natural Language Generation | | 899 |
Andrew Wilson, Celia Worth: Building and annotating corpora of spoken Welsh and Gaelic | | 909 |
Andrew Wilson, Celia Worth: Conceptual Glossaries of the Latin Vulgate Bible | | 918 |
Andrew Wilson, Olga Moudraia: Quantitative or Qualitative Content Analysis? Experiences from a cross-cultural comparison of female students' attitudes to shoe fashions in Germany, Poland and Russia | | 919 |
Martin Wynne, Rowan Wilson, Ylva Berglund: Virtual Corpora at the Oxford Text Archive | | 920 |
Yang Xiaojun: Survey and Prospect of China’s Corpus-Based Researches | | 930 |
Debra Ziegeler, Sarah Lee: Analysing a Corpus-based Semantic Investigation of English Dialects | | 931 |
Heike Zinsmeister, Ulrich Heid: Identifying predicatively used adverbs by means of a statistical grammar model | | 932 |