Supporting bilingual free-text survey and questionnaire data analysis
In a modern consumer-led culture, obtaining and responding to qualitative feedback (often free-text comments or other written feedback) is embedded in professional practice across many walks of life. Surveys are used, for example, in staff development, professional training, product design and testing, and in various forms of service provision across the public and private sectors.
Key Features
CorCenCC’s semantic (meaning-based) and part-of-speech (grammar-based) categorisations of individual words and phrases
Corpus functionalities for the querying of language
Existing tools that we will draw on include those developed as part of the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes, the National Corpus of Contemporary Welsh). These include CorCenCC’s semantic taggers (i.e. meaning-based categorisations of individual words and phrases) and part-of-speech (POS) taggers (i.e. grammar-based categorisations of individual words and phrases, e.g. nouns, verbs), their tagsets for the Welsh language, and corpus functionalities for the querying of language, amongst others. These tools will be integrated into a user-friendly online interface into which users can paste or upload their texts. Users will be able to search for patterns of meaning that emerge in survey responses and feedback; to see which words are most often used in relation to a given theme, place or topic; and to understand what visitors particularly enjoyed about a service or attraction, and what they think could be improved.
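As a rough illustration of the kind of query described above (which words are most often used in relation to a given theme), the following minimal Python sketch counts the words that appear in the same response as a chosen theme word. It is not the FreeTxt/TestunRhydd implementation and does not use the CorCenCC taggers: the function names, the tiny stopword list and the sample responses are all hypothetical, and a real tool would rely on proper English and Welsh tagging rather than plain word matching.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (English and Welsh),
# for illustration only.
STOPWORDS = {"the", "a", "was", "were", "and", "but",
             "y", "yr", "yn", "ac", "roedd"}

def tokenize(text):
    """Lowercase a response and split it into word tokens (Unicode letters)."""
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())

def words_near_theme(responses, theme, top_n=5):
    """Count words that occur in the same response as `theme`."""
    theme = theme.lower()
    counts = Counter()
    for response in responses:
        tokens = tokenize(response)
        if theme in tokens:
            counts.update(
                t for t in tokens if t != theme and t not in STOPWORDS
            )
    return counts.most_common(top_n)

# Invented sample responses, mixing English and Welsh.
responses = [
    "The castle was beautiful but the cafe was crowded.",
    "Roedd y castell yn hardd iawn.",
    "Beautiful castle, friendly staff.",
    "The cafe staff were friendly.",
]

print(words_near_theme(responses, "castle"))
print(words_near_theme(responses, "castell"))
```

Note that this sketch treats "castle" and "castell" as unrelated words; linking equivalent themes across the two languages is the sort of task for which semantic tagging would be needed.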
The final version of the tool will be made freely available and will be adaptable in terms of who can use it and when. It will contain generic analysis features that enable it to be used by any public or professional organisation dealing with varying datasets of qualitative survey data, and it will be of relevance to academic researchers analysing and visualising survey data. The accessibility and usability of this tool will help provide a direct route to potential impact.
Surveys and questionnaires often produce a combination of quantitative and qualitative data. Quantitative forms, such as rating scales (e.g. Likert scale responses), multiple-choice questions and rank-order questions, can be quantified with ease, and their analysis can be conducted in a systematic and often automated way. By contrast, more qualitative data pose a more difficult challenge for the analyst: for example, responses to questions that prompt open-ended, free-text comments or, in the tourism and heritage sector, written feedback about exhibitions, events and/or historical sites left on social media channels or websites. Tackling written, text-based feedback often requires a more labour-intensive, manual approach to analysis. This challenge is compounded where feedback is presented in both English and Welsh, as is often the case in Wales, which represents the largest bilingual community in the UK. The successful analysis of bilingual data relies on the workforce having the appropriate linguistic expertise to process it.
This project aims to bridge this gap by building the novel ‘FreeTxt/TestunRhydd’ toolkit, which is designed to support the analysis and visualisation of multiple forms of open-ended, free-text data in both English and Welsh. FreeTxt/TestunRhydd will draw on existing open-source bilingual corpus-based utilities and methodologies, repackaging these and taking them in a new direction so that they are relevant to new audiences and user groups. We will work closely with project partners to co-design, co-construct and test FreeTxt/TestunRhydd to ensure that the resource is fit for purpose and handles Welsh- and English-language responses fairly and consistently.
Project Team
Dawn Knight
Project PI, Principal Investigator - Cardiff University
Dr. Dawn Knight is a Reader in Applied Linguistics at Cardiff University, UK. She was the Principal Investigator (PI) of the CorCenCC (National Corpus of Contemporary Welsh) project and is the Co-Principal Investigator of the Interactional Variation Online project. Dawn has expertise in corpus linguistics, discourse analysis, digital interaction and non-verbal communication and was formerly Chair of the British Association for Applied Linguistics (BAAL). Dawn is the PI of the FreeTxt/TestunRhydd project.
Paul Rayson
Project CI, Co-Investigator - Lancaster University
Professor Paul Rayson works in the School of Computing and Communications at Lancaster University, and is Director of the UCREL interdisciplinary research centre, which carries out research in corpus linguistics and natural language processing (NLP). A long-term focus of his work is semantic multilingual NLP in extreme circumstances where language is noisy, e.g. in historical, learner, speech, email, txt and other CMC varieties.
Mahmoud El-Haj
Project CI, Co-Investigator - Lancaster University
Dr. Mahmoud El-Haj, also known as Mo, is an NLP Lecturer in Computer Science at the School of Computing and Communications at Lancaster University. Mo received his PhD in Computer Science from the University of Essex, working on multi-document summarization. His work focuses mainly on summarization, information extraction, financial NLP and multilingual NLP, and has been applied to many languages including English, Arabic, Spanish, Portuguese and Welsh. He has an interest in under-resourced languages and building NLP datasets.
Ignatius Ezeani
Senior Research Associate - Lancaster University
Dr Ignatius Ezeani is a Senior Teaching/Research Associate at Lancaster University. He is interested in the application of NLP techniques in building resources for low-resource languages including Igbo and Welsh. He works on the efficient adaptation of existing NLP tools and techniques for creating task-oriented systems for low-resource languages.
Steve Morris
Senior Research Associate - Cardiff University
Steve Morris is an Honorary Research Fellow in Applied Linguistics at Swansea University, where he previously worked as an Associate Professor in Applied Linguistics and Welsh. Together with Dr Dawn Knight and Professor Tess Fitzpatrick, he was a co-creator of the CorCenCC (National Corpus of Contemporary Welsh) project, on which he was also a Co-Investigator. The interdisciplinary interface between Applied Linguistics and the Welsh Language continues to be the prime focus of his work.
Project Advisory Group
- National Trust Wales
- Cadw
Funding Acknowledgement
This project, which runs from 2022 to 2023, is funded by the AHRC.