UCREL Corpus Research Seminar

Exploring and classifying the Arabic copula and auxiliary kāna via enhanced part-of-speech tagging

Andrew Hardie¹ & Wessam Ibrahim²

¹CASS, Lancaster University ²Tanta University

Arabic syntax is understudied relative to the language's famously complex morphology, both generally and from a corpus-based perspective. The copula kāna, 'be', additionally functions as an auxiliary, creating periphrastic tense-aspect constructions; but the literature on these functions of kāna is far from exhaustive. To analyse kāna within the million-word Leeds Corpus of Contemporary Arabic, part-of-speech tagging (via a newly enhanced system) is applied to disambiguate copula and auxiliary at a high rate of accuracy. Concordances of both are extracted, and 10% samples of each (499 instances of copula kāna, 387 of auxiliary kāna) are manually analysed to identify surface-level grammatical patterns and meanings. This raw analysis is then systematised according to the main parameters of variation of the more general patterns. Special descriptions are developed for specific, apparently fixed-form expressions, including two phraseologies that allow the expression of verbal and adjectival modality. Overall, substantial new detail not mentioned in existing reference grammars is discovered (e.g. the great predominance of the past imperfect construction over other uses of auxiliary kāna). These corpus-based findings have notable potential to inform and enhance not only grammatical descriptions but also the pedagogy of Arabic as a first or second/foreign language.

University Centre for Computer Corpus Research on Language

Computing & Communications | Linguistics and English Language

Week 25 2018/2019

Thursday 16th May 2019
3:00-4:00pm

CHC - Charles Carter A15