Corpus Linguistics 2013

Lancaster University, UK – 22nd to 26th July 2013

Second Workshop on Arabic Corpus Linguistics (WACL-2)

Workshop in conjunction with the Corpus Linguistics 2013 conference

Monday 22nd July 2013 – Lancaster University, UK

Keynote speaker:
Claire Brierley, University of Leeds
“Natural Language Processing working together with Arabic and Islamic Studies”

Workshop organisers:
Eric Atwell, University of Leeds
Andrew Hardie, Lancaster University

Call for papers

The call for papers is now closed. See the timetable below.

We invite proposals for the full-day Second Workshop on Arabic Corpus Linguistics, to be held in conjunction with the Corpus Linguistics 2013 conference. Following on from the successful first WACL in 2011, as well as the related LRE-REL event in 2012, WACL-2 will again take place at Lancaster University.

The aim of this series of workshops is to create a venue for exploring progress in the field of research into the Arabic language using corpora, from across the many areas of corpus linguistics and computational linguistics where the analysis of Arabic structure and usage is an active issue.

The scope of the workshop encompasses both (a) the design, construction and annotation of Arabic corpora, and (b) the use of corpora in research on the Arabic language – in any relevant area, including (but not limited to!) lexis and lexicography, syntax, collocation, NLP systems and analysis tools, contrastive and historical studies, stylistics, and discourse analysis. All varieties of Arabic – including the different Colloquial Arabics as well as Classical/Qur’anic and Modern Standard forms of the language – are within the workshop's purview.

Abstracts for presentations are invited on any of these areas, or on any other topic related to the study of Arabic-language corpora. Presentations either describing finished research or reporting work in progress are welcome. Submissions from postgraduate students are especially welcome.

Abstracts should be up to 600 words; presentations will be in the usual format (20 minutes for the presentation and 10 minutes for questions).

Please submit abstracts by email to Andrew Hardie (a.hardie@lancaster.ac.uk). Please use the abstract format prescribed by the main conference – a template can be found at http://corpora.lancs.ac.uk/submission/template. Acceptable file formats are Microsoft Word .doc(x), RTF, or OpenDocument text (.odt). Please use Unicode characters for any Arabic text examples. All abstracts should be in English rather than Arabic; English will be the language of the workshop.

Please note that we will not accept for WACL-2 any abstract that has already been accepted in verbatim form for the main CL2013 conference. We are happy to consider submissions arising from a research project which is also being presented at the main conference, but the content must not be identical or overlap substantially. For example, it might be appropriate to submit to WACL-2 a presentation focusing on matters of interest to Arabic specialists, while submitting an abstract of broader methodological or theoretical interest to the main conference.

Key dates:

  • Closing date for abstracts: Monday February 25th 2013, extended to Monday March 4th 2013
  • Responses to abstract submission: before Monday March 1st 2013


Participants should register for the workshop day via the CL2013 website (this can be done in addition to, or independently of, registration for the main conference). See this page for details: http://ucrel.lancs.ac.uk/cl2013/register.php.

Registration opens on February 15th 2013.

DRAFT Timetable (subject to revision!)

9:00-9:30am The effects of speakers' gender, age, and region on overall performance of Arabic automatic speech recognition systems using the phonetically rich and balanced Modern Standard Arabic speech corpus.
Mohammad A. M. ABUSHARIAH, Majdi SAWALHA, The University of Jordan
9:30-10:00am Arabic Learner Corpus v1: a new resource for Arabic language research.
Abdullah ALFAIFI, Eric ATWELL, University of Leeds, UK
10:00-10:30am Discourse markers at the local level in NP opinion articles.
Fatima ALKOHLANI, University of Business and Technology
10:30-11:00am The design and construction of the 50 million words KSUCCA King Saud University Corpus of Classical Arabic.
Maha ALRABIAH, AbdulMalik AL-SALMAN, King Saud University, Saudi Arabia
Eric ATWELL, University of Leeds, UK
11:00-11:30am Tea and coffee break
11:30-12:00pm Converging linguistic evidence on two flavors of production: The synonymy of Arabic COME verbs.
Antti ARPPE, Dana ABDULRAHIM, University of Alberta, Canada
12:00-12:30pm arTenTen: a new, vast corpus for Arabic.
Nizar HABASH, Columbia University, USA
Adam KILGARRIFF, Lexical Computing Ltd, UK
Noam ORDAN, University of Haifa, Israel
Ryan ROTH, Columbia University, USA
Vitek SUCHOMEL, Masaryk University, CZ, and Lexical Computing Ltd, UK
12:30-1:00pm KEYNOTE PRESENTATION: Natural Language Processing working together with Arabic and Islamic Studies.
Claire BRIERLEY, School of Computing, University of Leeds, UK
1:00-2:00pm Lunch
2:00-2:30pm KALIMAT a multipurpose Arabic corpus.
Mahmoud EL-HAJ, Lancaster University, UK
Rim KOULALI, Mohammed 1 University, Morocco
2:30-3:00pm AraSAS: A semantic tagger for Arabic.
Ghada MOHAMED, University of Bahrain
Amanda POTTS, Andrew HARDIE, Lancaster University, UK
3:00-3:30pm When collocational and expressive choices are imbued with political stances and ideological opinions: A corpus-based critical discourse analysis of Islamic identity and Egyptian identity in the news media of pre-revolutionary Egypt.
Safwat A. S. MOHAMMED, Lancaster and Cairo Universities, UK and Egypt
3:30-4:00pm Tea and coffee break
4:00-4:15pm Using Subordinate Clauses as Subjects of Verbal Sentences in Modern Standard Arabic.
Ashraf ABDOU, American University in Cairo, Egypt
4:15-4:30pm Unifying linguistic annotations and ontologies for the Arabic Quran.
Noorhan ABBAS, Luluh ALDHUBAYI, Hend AL-KHALIFA, Zainab ALQASSEM, Eric ATWELL, Kais DUKES, Majdi SAWALHA, Abdul-Baquee Muhammad SHARAF, University of Leeds, UK, and King Saud University, Saudi Arabia
4:30-4:45pm Corpus-based lexicography in a language with a long lexicographical tradition: The case of Arabic.
Tressy ARTS, Karen MCNEIL, Oxford University Press, UK
4:45-5:00pm The role of large-scale Arabic corpora in the tasks of sort-out and throw-out of sensory subdivisions of the entry in the general-purpose monolingual Arabic reference works.
Sultan Nasser A. ALMUJAIWEL, King Saud University, Saudi Arabia
5:00-5:15pm A hybrid approach for prepositional phrase attachment in MSA and EA.
Rania AL-SABBAGH, Abbas BENMAMOUN, University of Illinois at Urbana-Champaign, USA
5:15-5:30pm Developing tools for Arabic corpus for researchers.
Bassam HAMMO, Faisal AL-SHARGI, Sane YAGI, Nadim OBEID, University of Jordan
5:30-5:45pm Multi-level analysis and annotation of Arabic corpora for text-to-sign-language Machine Translation.
Abdelaziz LAKHFIF, Ferhat Abbas University, Algeria
Mohammed T. LASKRI, Badji Mokhtar University, Algeria
Eric ATWELL, University of Leeds, UK
5:45-6:00pm Representation of Muslim Brotherhood in Egyptian newspapers.
Sara YOUSSEF, American University in Cairo, Egypt
6:00pm Close of WACL-2 Workshop on Arabic Corpus Linguistics.

POSTER PRESENTATIONS (on show during 2 hours of breaks)

A: Annotating the Arabic Quran with a classical semantic ontology
Nora ABBAS, Eric ATWELL, University of Leeds, UK

B: Generating an Arabic Sentiment Corpus from social media.
Samah ALHAZMI, John MCNAUGHT, University of Manchester, UK

C: Towards an automatic development of Named Entities Corpus from Arabic Wikipedia.
Fahd ALOTAIBI, Mark LEE, University of Birmingham, UK

D: Linguistics features to confirm the chronological order of the Quran.
Sameer ALREHAILI, Eric ATWELL, University of Leeds, UK

E: Quran ontologies and keywords for Question Answering.
Aisha JILANI, Lee MCCLUSKEY, Di CAI, University of Huddersfield, UK

F: Unsupervised morphology learning using the Quranic Arabic Corpus.
Bilal KHALIQ, John CARROLL, University of Sussex, UK

G: Corpus based unsupervised learning of Arabic morphology.
Abdellah LAKHDARI, Hadda CHERROUN, Amar Telidji University, Algeria

H: Enriching Algerian Arabic dialects corpora.
K. MEFTOUH, Badji Mokhtar University, Algeria
S. HARRAT, Ecole Normale Superieure de Bouzareah, Algeria

I: Quranic verse similarity based on Word Sense Disambiguation.
Farah MEHBOOB, Institute of Business Administration, Pakistan

J: Development and implementation of a computational algorithm to predict the classical Arabic conjugate pattern focusing on weak verbs.
Haq NAWAZ, Mufti Ahmad ALI, Jamia Ashrafia Lahore, Pakistan

K: Arabic social media analysis for the construction and the enrichment of NLP tools.
Fatiha SADAT, University of Quebec in Montreal, Canada

L: Accelerating the processing of large corpora: using Grid Computing for lemmatizing the 176 million words Arabic Internet Corpus.
Majdi SAWALHA, University of Jordan
Eric ATWELL, University of Leeds
Mohammad A. M. ABUSHARIAH, University of Jordan

M: Early results for named entity recognition in a Haddith corpus.
Muazzam Ahmed SIDDIQUI, Mostafa El-Sayed SALEH, Abobakr BAGAIS, King Abdulaziz University, Saudi Arabia

