Staff replies to online patient reviews: A method for analysing a mixed corpus of (deliberately) duplicated and non-duplicated text

Craig Evans

CASS, Lancaster University

Replies to patient reviews on the website NHS Choices are usually produced at the discretion of staff at individual healthcare practices. The extent to which these replies consist of original (i.e. individually written) or duplicated (i.e. copied and pasted) elements can vary. This makes it problematic to apply methods such as keyword and collocation analysis to a corpus of replies, because the results are likely to reflect text reproduction more than patterns of language use. To address this problem, three subcorpora were created from an 11-million-word corpus of staff replies, based on text types linked to duplication tendencies: full-text or near full-text duplicates that represent stock responses; part-text duplicates that represent adapted stock responses; and non-duplicated replies that represent unique responses. The 'duplicate content' tool in WordSmith 7, which allows texts to be filtered based on the percentage of word types / tokens they share, was used to identify the subcorpora. After filtering duplicates using different percentage thresholds, keywords for each subcorpus were identified to direct an analysis of the language use of the three text types. This presentation reports on work in progress: preliminary results are used for illustrative purposes, and the focus is on method and data.
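
The idea of sorting replies into three text types by a percentage of shared word types can be illustrated with a minimal Python sketch. This is not WordSmith's algorithm and not the method used in the study: the overlap measure (shared types divided by the combined types of two replies), the function names, the labels, and the two thresholds (90 and 50) are all assumptions chosen only to show how threshold-based filtering might work.

```python
import re

# Hypothetical thresholds (percent); the abstract does not state the values used.
FULL_THRESHOLD = 90.0   # at or above: treated as a stock response
PART_THRESHOLD = 50.0   # at or above (but below FULL): treated as an adapted stock response


def word_types(text):
    """Return the set of distinct lower-cased word forms (types) in a reply."""
    return set(re.findall(r"[a-z']+", text.lower()))


def shared_type_percentage(a, b):
    """Percentage of word types two replies share, relative to all types in either."""
    types_a, types_b = word_types(a), word_types(b)
    combined = types_a | types_b
    if not combined:
        return 0.0
    return 100.0 * len(types_a & types_b) / len(combined)


def classify_replies(replies):
    """Label each reply by its highest overlap with any other reply in the list."""
    labels = []
    for i, reply in enumerate(replies):
        best = max(
            (shared_type_percentage(reply, other)
             for j, other in enumerate(replies) if j != i),
            default=0.0,
        )
        if best >= FULL_THRESHOLD:
            labels.append("stock")
        elif best >= PART_THRESHOLD:
            labels.append("adapted")
        else:
            labels.append("unique")
    return labels


# Invented example replies, for illustration only.
replies = [
    "Thank you for your feedback. We are glad you had a good experience.",
    "Thank you for your feedback. We are glad you had a good experience.",
    "I am sorry the waiting room was cold on Tuesday; the heating has now been repaired.",
    "Thank you for your feedback. We are glad you had a good experience with the nurse today.",
]
labels = classify_replies(replies)
```

Comparing every reply with every other reply is quadratic in the number of texts, so a sketch like this would only suit small samples; a corpus of 11 million words would need a dedicated tool or an indexing strategy.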

Week 22 2017/2018

Thursday 3rd May 2018
3:00-4:00pm

Fylde D28