Seminars from previous years are still being added; in the meantime, the archive remains available on the old website.
On 12th April 2003, 83.6% of Hungarians voted in support of Hungary joining the European Union. This decisive result followed a massive parliamentary discussion about the issue and guaranteed Hungary access to the EU. But what was the attitude of Hungarian MPs towards the European Union? And how was Hungarian identity shaped in discourses about EU membership? In this talk I will present the preliminary results of a corpus-assisted study of Hungarian parliamentary speeches delivered between 1998 and 2003. After a brief historical introduction I will first outline the methodological approach I adopted to sketch attitudes and identities by means of collocation analysis. I will then describe the data I employed, namely the self-collected HUNPOL corpus. Finally, using the GraphColl software, I will show how semantic and discourse prosody can highlight the Hungarian politicians' stance regarding the European Union and the status they posit for themselves in a (possibly) new political dimension.
The SAMS project (Software Architecture for Mental health Self-management) is investigating whether monitoring data from everyday computer-use activity can be used to effectively detect subtle signs of cognitive impairment that may indicate the early stages of Alzheimer's disease.
In this talk I will discuss the SAMS project, the collection of data and text from participants, and our approach to mining the text to infer cognitive health. During the SAMS project, bespoke software installed on the participants' home PCs is used to collect data and text. The collection software passively and unobtrusively collects many forms of data and text from the participants' PCs (including typed email and document text), which are securely logged and later transferred to our server for analysis. The analysis applies various data and text mining techniques in an attempt to map trends and patterns in the data to clinical indicators of Alzheimer's disease, e.g. working memory and motor control.
Tool usage within the SAMS project will also be discussed, including the development of the bespoke collection and analysis software, as well as existing tools that are re-used (Part of Speech Tagger, Semantic Tagger).
In this talk, I will present ongoing work on the development of the UCREL multilingual semantic annotation system. Over the past years, the original UCREL English semantic tagger has been adjusted and extended to cover more and more languages, including Finnish, Italian, Portuguese, Chinese, Spanish and French. Currently, a major project, CorCenCC, is underway in which a Welsh semantic tagger is being developed in collaboration with Welsh project partners. This tool is useful for various kinds of corpus-based research, such as cross-lingual studies. I'll discuss the linguistic resources involved in the development of the tool, and introduce a GUI tool which links the multilingual tagger web services and helps researchers to process corpus data conveniently.
This talk will present an overview of newly available resources from the Visualising English Print project (Mellon-funded; University of Wisconsin-Madison, the University of Strathclyde and the Folger Shakespeare Library) for engaging with the Text Creation Partnership texts. It will discuss some of the curation and standardisation principles guiding the project and how we envision scholars will use our resources, and present a case study of how to use our resources to conduct an analysis of Early Modern scientific writing.
Our paper will explore the ways in which religious identities are negotiated in a setting characterised by religious diversity and proximity: Yorubaland in southwest Nigeria. We will explore how interreligious relationships are discursively constructed in extensive survey data (2,819 respondents in total) collected as part of an anthropological project focussing on the coexistence of Islam, Christianity and traditional practice in Yoruba-speaking parts of southwest Nigeria: 'Knowing each other: Everyday religious encounters, social identities and tolerance in southwest Nigeria'. Corpus tools and techniques will be used to examine the 1,535 questionnaires completed in English, particularly the answers to open-ended questions (our corpus). The premise is that by exploring discursive choices made by Christian and Muslim respondents in this corpus, we can gain insights into Yoruba Muslims' and Christians' perceptions of themselves and each other and their experiences of inter-religious encounters.
Owing to the focus of the paper, two sub-corpora of the above-mentioned corpus were compiled: one with all answers by respondents of Muslim faith and another with all answers by respondents of Christian faith. We will use four-grams for each of these corpora to show how corpus-assisted investigations into phraseology have helped us gain insights into the data which traditional anthropological methods alone would not have allowed. Our findings will concern, for example, the specific boundaries our Christian and Muslim respondents draw around their religious behaviour and their shared understanding of religion.
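The four-gram step described above can be illustrated with a minimal sketch. It assumes each sub-corpus is simply a list of answer strings, and the function names and sample answers are invented placeholders rather than the project's tooling or survey data. Four-grams are counted per answer so that no sequence runs across the boundary between two respondents.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(answers, n=4):
    """Count n-grams within each answer, never across answer boundaries."""
    counts = Counter()
    for answer in answers:
        tokens = tokenize(answer)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented placeholder answers, NOT taken from the survey.
subcorpus_a = ["We live together in peace", "We live together in one house"]
subcorpus_b = ["I do not have any problem", "I do not have any objection"]

counts_a = ngram_counts(subcorpus_a)
counts_b = ngram_counts(subcorpus_b)

# Most frequent four-grams in each sub-corpus
print(counts_a.most_common(3))
print(counts_b.most_common(3))
```

Comparing the top-ranked four-grams of the two lists side by side is one simple way to see which recurrent phrasings are particular to one group and which are shared.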
English texts on Chinese governmental websites are often criticised for being 'Chinglish' or 'lifeless'. This project investigates how English versions of Chinese governmental websites can improve their stylistic quality. The project is a computational stylistic comparison between English texts on Chinese governmental websites and English texts on UK and US governmental websites. The approach is corpus-based and employs Biber's (1988) multidimensional analysis. A corpus of websites (comprising two subcorpora) had previously been downloaded using wget -m (mirror mode). Perl scripts were used to extract the text content from web pages into one txt file per website, and word frequency lists and trigrams have also been extracted. Keyword lists for the two subcorpora have been generated against a COCA word frequency list. Several issues remain to be dealt with before further analysis can be conducted, including: whether it is possible to separate 'real content' from purely repetitive content (such as menus, navigation and copyright notices) when data comes from web pages; what the alternatives to manual annotation are when this is not a practical option given the massive size of the corpus; and how to identify which features to consider to make the comparison more meaningful.
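As an illustration of how a keyword list can be derived from a subcorpus frequency list and a reference frequency list, here is a minimal sketch. The abstract does not say which keyness statistic was used; the sketch assumes log-likelihood, which is one common choice in corpus linguistics. The function names and the toy counts are invented for the example, and the `reference` dictionary merely stands in for a reference list such as the COCA one.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a: frequency in the study corpus, b: frequency in the reference corpus,
    c: size of the study corpus, d: size of the reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_freq, ref_freq, top=10):
    """Rank words that are relatively more frequent in the study corpus."""
    c = sum(study_freq.values())
    d = sum(ref_freq.values())
    scored = []
    for word, a in study_freq.items():
        b = ref_freq.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Toy counts invented for illustration; not real corpus data.
study = {"the": 50, "of": 40, "committee": 10}
reference = {"the": 60, "of": 39, "committee": 1}

for word, score in keywords(study, reference):
    print(f"{word}\t{score:.2f}")
```

With real data the two dictionaries would be loaded from the word frequency lists, and a minimum-frequency cut-off would normally be applied before ranking.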
This talk will provide a basic introduction to FireAnt, a freeware tool that I have co-developed with Laurence Anthony (Waseda University). FireAnt offers three main utilities: the live collection of real-time tweets; the ability to filter that (and many other kinds of) data based on user-defined parameters; and the ability to export the data in formats suitable for corpus tools, network graphing, time-series analysis, and so forth. I'll demonstrate each of these steps in turn, and provide suggestions along the way for the kinds of investigation that FireAnt has been, and can be, used for. There will be time at the end for questions.