EACL 2006 Workshop on
Multi-word-expressions in a multilingual context
April 3rd 2006, Trento, Italy
http://ucrel.lancs.ac.uk/EACL06MWEmc/
The EACL 2006 Workshop on "Multi-word-expressions in a multilingual context" will be hosted in
conjunction with the 11th Conference of the European Chapter of
the Association for Computational Linguistics that will take
place April 3-7, 2006, in Trento, Italy.
Final manuscript submission now closed
http://www.softconf.com/start/EACL06_WS08/
Final Programme now online.
Description
For many years, interest in the NLP community in the problems posed by
multiword expressions (MWEs) was focussed mainly on English.
Recently, for example at the ACL2004 workshop on multiword expressions,
attention has begun to expand to other languages such as Japanese,
Russian, Basque and Turkish. This necessitates a re-evaluation of
earlier rule-based, statistical and hybrid techniques for MWE
identification and classification. In English, MWE types such as
phrasal verbs, noun phrases, proper names, and true non-compositional
idioms, are considered. However, in other languages some MWE types are
represented as compound words, e.g. phrasal verbs in English are
generally expressed as verb-prefix in Russian. At the same time,
research on MWEs for languages other than English is confronted with
new problems, such as the number of word forms per lemma or free word
order. Our focus in this workshop will be to bring together the
requirements of different areas, such as translation, language
engineering and the study of computational techniques for processing
the MWEs of language learners, and to examine how these requirements
differ across languages.
The workshop has a deliberately wide scope to enable cross-disciplinary contact
between descriptive, contrastive, educational and computational
approaches.
In particular, the importance of MWEs is underestimated in
computational approaches to translation. Many applications that
involve translation, such as systems carrying out machine translation,
word-level alignment, cross-language information retrieval,
computer-assisted language learning, etc, quite frequently pay
attention only to the translation equivalence between individual words,
whereas in the vast majority of cases the equivalence can be
established only on the level of units larger than the word. Types of
MWEs important in the context of translation involve not only proper
idioms (like the much-cited kick the bucket), but more prominently:
- terminology (and detection of links between complex terms in the two languages)
- fixed expressions (in the course of, face to face)
- ill-formed collocations (by and large, all of a sudden)
- habitual collocations (strong tea, face the truth)
- support verbs (take a bath, have a look)
Another area which is currently under-explored is the role of MWEs in
learner language. The appropriate use of MWEs is a key skill in the
successful mastery of a second language. Yet, due to the scarcity of
learner corpora and methodologies for analysing such data, there is
relatively little empirical research on how language learners acquire,
process and use MWEs in different discourse contexts. In order to gain
a better understanding of these processes a number of questions need to
be addressed including the following:
- What are the design requirements for both spoken and written learner corpora that ensure adequate representation of a particular language variety and, at the same time, facilitate the automatic extraction of MWEs?
- What are the differences between the MWEs used by native speakers and those used by language learners?
- What role do MWEs play in the language acquisition process, and how might we develop corpus resources to track the acquisition process?
- What are the implications for the development of tools for the extraction of MWEs if they are to be used on different language varieties?
- How might we combine different approaches to assess whether the MWEs that we identify in native speaker corpora are also processed as MWEs when used by the language learner?
Topics of interest
We welcome papers within the theme of the workshop covering, for example, the following topics:
- Theoretical research on MWEs, including definitions and taxonomies of MWEs in translation and learner language research
- Semantic classification of MWEs
- Methodologies and algorithms for MWE identification in corpora
- Alignment of MWE translation equivalents in multilingual parallel/comparable corpora
- Evaluation of automatic MWE extraction tools in a multilingual context
- Application of MWE tools for translation, language engineering and language learning tasks
- Comparing requirements for MWE tools in these different research fields, application tasks and languages
Draft Programme
The draft programme is as follows. Each paper is allocated 20 minutes of presentation time,
followed by 10 minutes for questions and discussion.
MORNING:
9.00 | Arrivals and welcome | Workshop co-chairs
9.30 | Named Entities Translation Based on Comparable Corpora | Iñaki Alegria, Nerea Ezeiza, and Izaskun Fernandez
10.00 | Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation | Patrik Lambert and Rafael Banchs
10.30 | COFFEE BREAK
11.00 | Automatic Extraction of Chinese Multiword Expressions with a Statistical Tool | Scott S.L. Piao, Guangfan Sun, Paul Rayson, and Qi Yuan
11.30 | Chunking Japanese Compound Functional Expressions by Machine Learning | Masatoshi Tsuchiya, Takao Shime, Toshihiro Takagi, Takehito Utsuro, Kiyotaka Uchimoto, Suguru Matsuyoshi, Satoshi Sato and Seiichi Nakagawa
12.00 | Identifying idiomatic expressions using automatic word-alignment | Begoña Villada Moirón and Jörg Tiedemann
12.30 - 14.30 | LUNCH BREAK
AFTERNOON:
14.30 | Collocation Extraction: Needs, Feeds and Results of an Extraction System for German | Julia Ritz
15.00 | Extending corpus-based identification of light verb constructions using a supervised learning framework | Yee Fan Tan, Min-Yen Kan and Hang Cui
15.30 | Multi-word verbs in a flective language: the case of Estonian | Heiki-Jaan Kaalep and Kadri Muischnek
16.00 | COFFEE BREAK
16.30 | Modeling Monolingual and Bilingual Collocation Dictionaries in Description Logics | Dennis Spohr and Ulrich Heid
17.00 | Multiword Units in an MT Lexicon | Tamás Váradi
17.30 | Closing discussion
Registration
Information on registration and registration fees is
provided at the conference web page.
Important dates
January 6, 2006 - Deadline for workshop papers
January 27, 2006 - Notification of acceptance
February 10, 2006 - Camera-ready papers due
April 3, 2006 - Workshop
As the schedule is extremely tight, deadline extensions are NOT
possible.
Organising committee
Paul Rayson (Lancaster University, UK)
Serge Sharoff (University of Leeds, UK)
Svenja Adolphs (University of Nottingham, UK)
Programme committee
Dawn Archer (University of Central Lancashire, UK)
Timothy Baldwin (University of Melbourne, Australia)
Francis Bond (NTT Communication Science Laboratories, Japan)
Key-Sun Choi (KAIST, Korea)
Béatrice Daille (University of Nantes, France)
Sylviane Granger (Université catholique de Louvain, Belgium)
Chikara Hashimoto (Kyoto University, Japan)
Ulrich Heid (Universität Stuttgart, Germany)
Laura Löfberg (University of Tampere, Finland)
Anke Lüdeling (Humboldt-Universität zu Berlin, Germany)
Olga Mudraya (Lancaster University, UK)
Kyonghee Paik (ATR Spoken Language Translation Research Laboratories, Japan)
Scott Piao (Lancaster University, UK)
Norbert Schmitt (University of Nottingham, UK)
Further information
Workshop web page
http://ucrel.lancs.ac.uk/EACL06MWEmc/
Conference web page
http://eacl06.itc.it/
EACL 2006 Workshops site
http://www.science.uva.nl/~mdr/EACL2006Workshops/
Contact information
Dr Paul Rayson
Computing Department, Infolab21, South Drive, Lancaster University, Lancaster, LA1 4WA, UK.
Tel: +44 1524 510357
Fax: +44 1524 510492
Email: paul-at-comp.lancs.ac.uk