Corpus framework analysis: integrating computational linguistics, corpus linguistics, and clinical psychology to analyse Reddit posts on personal recovery in bipolar disorder

Glorianna Jagfeld

Spectrum Centre for Mental Health Research, Lancaster University

The concept of personal recovery, 'a way of living a satisfying, hopeful life even with the limitations caused by the illness' (Anthony, 1993), is of particular value in bipolar disorder, where symptoms often persist despite adequate treatment, yet it has been under-researched. A recent systematic review defined the first conceptual framework for personal recovery in bipolar disorder, POETIC (Purpose & meaning, Optimism & hope, Empowerment, Tensions, Identity, Connectedness) (Jagfeld, Lobban, Marshall, et al., 2021). So far, personal recovery has only been studied in researcher-constructed settings (interviews, focus groups). Posts in online peer support forums can serve as a complementary source of non-reactive data for studying health beliefs and experiences.

By integrating corpus linguistics, computational linguistics, and health research methods, this study analyses a corpus of public bipolar support forum posts from the discussion platform Reddit in relation to the lived experience of personal recovery. Because people talk about a wide variety of topics on Reddit, selecting the relevant material is a key challenge of working with non-reactive data, and it motivated our corpus construction process. Starting from a 1B-word dataset of Reddit posts by people with a self-reported bipolar disorder diagnosis (Jagfeld, Lobban, Rayson, et al., 2021), a series of automatic filtering steps involving computational linguistic methods, combined with manual coding, resulted in the 1.3M-word PR-BD corpus of personal recovery-relevant posts.

To analyse the PR-BD corpus, I coded its lemmas into the POETIC framework via concordance analysis using #LancsBox 6.0. This constitutes a novel integration of corpus and computational linguistics with deductive framework analysis, which we have named corpus framework analysis (CFA). Preliminary CFA results show that three POETIC domains featured most in discussions on Reddit: Connectedness (particularly romantic relationships and social support), Purpose & meaning (parenting, work), and Empowerment (self-management and personal responsibility).
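As a rough illustration of the coding step described above, the following minimal Python sketch builds concordance (keyword-in-context) lines for a lemma and tallies coded lemmas per framework domain. It is not the #LancsBox workflow used in the study: the lemma-to-domain lookup, the function names, and the example sentence are all invented for illustration, and in the study codes were assigned manually after reading concordance lines rather than from a fixed table.

```python
from collections import Counter

# Hypothetical lemma-to-domain codes (illustrative assumption only).
LEMMA_CODES = {
    "partner": "Connectedness",
    "support": "Connectedness",
    "job": "Purpose & meaning",
    "manage": "Empowerment",
}


def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for each hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def domain_counts(tokens, codes):
    """Tally how often coded lemmas occur in each framework domain."""
    return Counter(codes[tok] for tok in tokens if tok in codes)


# Invented, already lemmatised toy text (not taken from the PR-BD corpus).
TOKENS = "my partner support me while i manage my mood and keep my job".split()

if __name__ == "__main__":
    for left, node, right in kwic(TOKENS, "manage", window=2):
        print(f"{left:>12} [{node}] {right}")
    print(domain_counts(TOKENS, LEMMA_CODES))
```

Reading each concordance line before accepting a code matters because the same lemma can belong to different domains depending on context; a fixed lookup like the one above would only be a starting point for manual review.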


Anthony, W. A. (1993). Recovery from mental illness: the guiding vision of the mental health system in the 1990s. Psychosocial Rehabilitation Journal, 16(4), 11-23.

Jagfeld, G., Lobban, F., Marshall, P., & Jones, S. H. (2021). Personal recovery in bipolar disorder: Systematic review and "best fit" framework synthesis of qualitative evidence - a POETIC adaptation of CHIME. Journal of Affective Disorders, 292, 375-385.

Jagfeld, G., Lobban, F., Rayson, P., & Jones, S. H. (2021). Understanding who uses Reddit: Profiling individuals with a self-reported bipolar disorder diagnosis. Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access at NAACL 2021.

Week 20 2021/2022

Thursday 24th March 2022

Microsoft Teams - request a link via email