EACL 2006 Workshop on

Multi-word-expressions in a multilingual context

April 3rd 2006, Trento, Italy


The EACL 2006 Workshop on "Multi-word-expressions in a multilingual context" will be held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics, which will take place April 3-7, 2006, in Trento, Italy.

For many years, interest within the NLP community in the problems posed by multiword expressions (MWEs) was focussed mainly on English. Recently, for example at the ACL 2004 workshop on multiword expressions, attention has begun to expand to other languages such as Japanese, Russian, Basque and Turkish. This necessitates a re-evaluation of earlier rule-based, statistical and hybrid techniques for MWE identification and classification. In English, the MWE types considered include phrasal verbs, noun phrases, proper names and true non-compositional idioms. In other languages, however, some MWE types are represented as compound words; for example, phrasal verbs in English are generally expressed as prefixed verbs in Russian. At the same time, research on MWEs for languages other than English is confronted with new problems, such as the number of word forms per lemma or free word order. Our focus in this workshop will be to bring together the requirements of different areas, such as translation, language engineering and the study of computational techniques for processing the MWEs of language learners, and to examine how these requirements differ across languages. The scope is deliberately wide to enable cross-disciplinary contact between descriptive, contrastive, educational and computational approaches.

In particular, the importance of MWEs is underestimated in computational approaches to translation. Many applications that involve translation, such as machine translation, word-level alignment, cross-language information retrieval and computer-assisted language learning, quite frequently pay attention only to the translation equivalence between individual words, whereas in the vast majority of cases the equivalence can be established only at the level of units larger than the word. Types of MWEs important in the context of translation involve not only proper idioms (like the much-cited "kick the bucket"), but more prominently:

Another area that is currently under-explored is the role of MWEs in learner language. The appropriate use of MWEs is a key skill in the successful mastery of a second language. Yet, due to the scarcity of learner corpora and of methodologies for analysing such data, there is relatively little empirical research on how language learners acquire, process and use MWEs in different discourse contexts. In order to gain a better understanding of these processes, a number of questions need to be addressed, including the following:

Topics of interest

We welcome papers within the theme of the workshop covering, for example, the following topics:

Draft Programme

The draft programme is as follows. Each paper is allocated a 30-minute slot: 20 minutes for the presentation and 10 minutes for questions and discussion.
9.00 Arrivals and welcome
Workshop co-chairs
9.30 Named Entities Translation Based on Comparable Corpora
Iñaki Alegria, Nerea Ezeiza, and Izaskun Fernandez
10.00 Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation
Patrik Lambert and Rafael Banchs
11.00 Automatic Extraction of Chinese Multiword Expressions with a Statistical Tool
Scott S.L. Piao, Guangfan Sun, Paul Rayson, and Qi Yuan
11.30 Chunking Japanese Compound Functional Expressions by Machine Learning
Masatoshi Tsuchiya, Takao Shime, Toshihiro Takagi, Takehito Utsuro, Kiyotaka Uchimoto, Suguru Matsuyoshi, Satoshi Sato and Seiichi Nakagawa
12.00 Identifying idiomatic expressions using automatic word-alignment
Begoña Villada Moirón and Jörg Tiedemann
12.30 - 14.30
14.30 Collocation Extraction: Needs, Feeds and Results of an Extraction System for German
Julia Ritz
15.00 Extending corpus-based identification of light verb constructions using a supervised learning framework
Yee Fan Tan, Min-Yen Kan and Hang Cui
15.30 Multi-word verbs in a flective language: the case of Estonian
Heiki-Jaan Kaalep and Kadri Muischnek
16.30 Modeling Monolingual and Bilingual Collocation Dictionaries in Description Logics
Dennis Spohr and Ulrich Heid
17.00 Multiword Units in an MT Lexicon
Tamás Váradi
17.30 Closing discussion


Information on registration and registration fees is provided at the conference web page.

Important dates

January 6, 2006 - Deadline for workshop papers
January 27, 2006 - Notification of acceptance
February 10, 2006 - Camera-ready papers due
April 3, 2006 - Workshop

As the schedule is extremely tight, deadline extensions are NOT possible.

Organising committee

Paul Rayson (Lancaster University, UK)
Serge Sharoff (University of Leeds, UK)
Svenja Adolphs (University of Nottingham, UK)

Programme committee

Dawn Archer (University of Central Lancashire, UK)
Timothy Baldwin (University of Melbourne, Australia)
Francis Bond (NTT Communication Science Laboratories, Japan)
Key-Sun Choi (KAIST, Korea)
Béatrice Daille (University of Nantes, France)
Sylviane Granger (Université catholique de Louvain, Belgium)
Chikara Hashimoto (Kyoto University, Japan)
Ulrich Heid (Universität Stuttgart, Germany)
Laura Löfberg (University of Tampere, Finland)
Anke Lüdeling (Humboldt-Universität zu Berlin, Germany)
Olga Mudraya (Lancaster University, UK)
Kyonghee Paik (ATR Spoken Language Translation Research Laboratories, Japan)
Scott Piao (Lancaster University, UK)
Norbert Schmitt (University of Nottingham, UK)

Further information

