EACL 2006 Workshop on
Multi-word-expressions in a multilingual context

April 3rd 2006, Trento, Italy

http://ucrel.lancs.ac.uk/EACL06MWEmc/

The EACL 2006 Workshop on "Multi-word-expressions in a multilingual context" will be held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics, which will take place on April 3-7, 2006, in Trento, Italy.

Final manuscript submission is now closed: http://www.softconf.com/start/EACL06_WS08/

Final Programme now online.

Description

For many years, interest in the NLP community in the problems posed by multiword expressions (MWEs) focussed mainly on English. Recently, for example at the ACL 2004 workshop on multiword expressions, attention has begun to expand to other languages such as Japanese, Russian, Basque and Turkish. This necessitates a re-evaluation of earlier rule-based, statistical and hybrid techniques for MWE identification and classification. For English, MWE types such as phrasal verbs, noun phrases, proper names and truly non-compositional idioms have been considered. However, in other languages some of these types are realised as compound words; for example, English phrasal verbs generally correspond to prefixed verbs in Russian. At the same time, research on MWEs in languages other than English faces new problems, such as the large number of word forms per lemma or free word order. Our focus in this workshop will be on the requirements of different areas, such as translation, language engineering and the computational processing of MWEs used by language learners, and on how these requirements differ across languages. The scope is deliberately wide, to enable cross-disciplinary contact between descriptive, contrastive, educational and computational approaches.

In particular, the importance of MWEs is underestimated in computational approaches to translation. Many translation-related applications, such as machine translation, word-level alignment, cross-language information retrieval and computer-assisted language learning, frequently attend only to translation equivalence between individual words, whereas in the vast majority of cases equivalence can be established only at the level of units larger than the word. The types of MWE that matter for translation include not only true idioms (such as the much-cited "kick the bucket") but, more prominently:

terminology (and detection of links between complex terms in the two languages)
fixed expressions (in the course of, face to face)
ill-formed collocations (by and large, all of a sudden)
habitual collocations (strong tea, face the truth)
support verbs (take a bath, have a look)

Another area that is currently under-explored is the role of MWEs in learner language. The appropriate use of MWEs is a key skill in the successful mastery of a second language. Yet, owing to the scarcity of learner corpora and of methodologies for analysing such data, there is relatively little empirical research on how language learners acquire, process and use MWEs in different discourse contexts. To gain a better understanding of these processes, a number of questions need to be addressed, including the following:

What are the design requirements for spoken and written learner corpora that ensure adequate representation of a particular language variety and, at the same time, facilitate the automatic extraction of MWEs?
How do the MWEs used by native speakers differ from those used by language learners?
What role do MWEs play in the language acquisition process, and how might we develop corpus resources to track the acquisition process?
What are the implications for the development of tools for the extraction of MWEs if they are to be used on different language varieties?
How might we combine different approaches to assess whether the MWEs that we identify in native speaker corpora are also processed as MWEs when used by the language learner?

Topics of interest

We welcome papers within the theme of the workshop covering, for example, the following topics:

Theoretical research on MWEs, including definitions and taxonomies of MWEs in translation and learner language research
Semantic classification of MWEs
Methodologies and algorithms for MWE identification in corpora
Alignment of MWE translation equivalents in multilingual parallel/comparable corpora
Evaluation of automatic MWE extraction tools in a multilingual context
Application of MWE tools to translation, language engineering and language learning tasks
Comparing requirements for MWE tools in these different research fields, application tasks and languages

Draft Programme

The draft programme is as follows. Each paper is allocated a 30-minute slot: 20 minutes for the presentation and 10 minutes for questions and discussion.

MORNING:
9.00	Arrivals and welcome - Workshop co-chairs
9.30	Named Entities Translation Based on Comparable Corpora - Iñaki Alegria, Nerea Ezeiza, and Izaskun Fernandez
10.00	Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation - Patrik Lambert and Rafael Banchs
10.30	COFFEE BREAK
11.00	Automatic Extraction of Chinese Multiword Expressions with a Statistical Tool - Scott S.L. Piao, Guangfan Sun, Paul Rayson, and Qi Yuan
11.30	Chunking Japanese Compound Functional Expressions by Machine Learning - Masatoshi Tsuchiya, Takao Shime, Toshihiro Takagi, Takehito Utsuro, Kiyotaka Uchimoto, Suguru Matsuyoshi, Satoshi Sato and Seiichi Nakagawa
12.00	Identifying idiomatic expressions using automatic word-alignment - Begoña Villada Moirón and Jörg Tiedemann
12.30	LUNCH BREAK (until 14.30)
AFTERNOON:
14.30	Collocation Extraction: Needs, Feeds and Results of an Extraction System for German - Julia Ritz
15.00	Extending corpus-based identification of light verb constructions using a supervised learning framework - Yee Fan Tan, Min-Yen Kan and Hang Cui
15.30	Multi-word verbs in a flective language: the case of Estonian - Heiki-Jaan Kaalep and Kadri Muischnek
16.00	COFFEE BREAK
16.30	Modeling Monolingual and Bilingual Collocation Dictionaries in Description Logics - Dennis Spohr and Ulrich Heid
17.00	Multiword Units in an MT Lexicon - Tamás Váradi
17.30	Closing discussion

Registration

Information on registration and registration fees is provided on the conference web page.

Important dates

January 6, 2006 - Deadline for workshop papers
January 27, 2006 - Notification of acceptance
February 10, 2006 - Camera-ready papers due
April 3, 2006 - Workshop

As the schedule is extremely tight, deadline extensions are NOT possible.

Organising committee

Paul Rayson (Lancaster University, UK)
Serge Sharoff (University of Leeds, UK)
Svenja Adolphs (University of Nottingham, UK)

Programme committee

Dawn Archer (University of Central Lancashire, UK)
Timothy Baldwin (University of Melbourne, Australia)
Francis Bond (NTT Communication Science Laboratories, Japan)
Key-Sun Choi (KAIST, Korea)
Béatrice Daille (University of Nantes, France)
Sylviane Granger (Université catholique de Louvain, Belgium)
Chikara Hashimoto (Kyoto University, Japan)
Ulrich Heid (Universität Stuttgart, Germany)
Laura Löfberg (University of Tampere, Finland)
Anke Lüdeling (Humboldt-Universität zu Berlin, Germany)
Olga Mudraya (Lancaster University, UK)
Kyonghee Paik (ATR Spoken Language Translation Research Laboratories, Japan)
Scott Piao (Lancaster University, UK)
Norbert Schmitt (University of Nottingham, UK)

Further information

Workshop web page http://ucrel.lancs.ac.uk/EACL06MWEmc/

Conference web page http://eacl06.itc.it/

EACL 2006 Workshops site http://www.science.uva.nl/~mdr/EACL2006Workshops/

Contact information

Dr Paul Rayson
Computing Department, Infolab21, South Drive, Lancaster University, Lancaster, LA1 4WA, UK.
Tel: +44 1524 510357
Fax: +44 1524 510492
Email: paul-at-comp.lancs.ac.uk
