In addition, when all the manifestos are published you will be able to download crosstabs of word, POS and semantic tag frequencies with log-likelihood and dispersion statistics. These are tab delimited, so they can be opened in your favourite spreadsheet application.
Party | Conservatives | Labour | Liberal Democrats | Green Party | Plaid Cymru | Scottish National Party | Reform UK |
Party website | www.conservatives.com | labour.org.uk | www.libdems.org.uk | greenparty.org.uk | www.plaid.cymru | www.snp.org | www.reformparty.uk |
Launch date | 11 June 2024 | 13 June 2024 | 10 June 2024 | 12 June 2024 | 13 June 2024 | 19 June 2024 | 17 June 2024 |
Direct link | |||||||
Local copy | |||||||
Edited text version | TXT | TXT | TXT | TXT | TXT | TXT | TXT |
Number of words (tokens) | 25,305 | 23,809 | 21,366 | 19,009 | 16,639 | 7,776 | 6,701 |
Number of unique words (types) | 4,469 | 4,078 | 3,924 | 3,646 | 3,255 | 1,832 | 1,869 |
Word frequency list | TXT | TXT | TXT | TXT | TXT | TXT | TXT |
Key word cloud | Figure 1 | Figure 3 | Figure 5 | Figure 7 | Figure 9 | Figure 11 | Figure 13 |
Key semantic tag cloud | Figure 2 | Figure 4 | Figure 6 | Figure 8 | Figure 10 | Figure 12 | Figure 14 |
As in previous elections, I will make local copies of the manifestos available, since they often go offline after the election. To prepare the text versions, I will use Adobe Acrobat Reader to save each PDF as text and then manually edit (using VS Code) a small number of items such as headers, footers and page numbers, storing them in pseudo-XML-style tags. In addition, I will follow the input format guidelines for CLAWS and convert en dashes, pound signs, and opening and closing quotes to XML entities.
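The entity conversion step could be scripted along the following lines. This is only a minimal sketch, not the procedure actually used for these files: the `ENTITY_MAP` table and the `to_entities` helper are hypothetical names, and the entity names for quotes (`&bquo;`, `&equo;`) are an assumption that should be checked against the CLAWS input format guidelines before use.

```python
# Hypothetical character-to-entity table. The entity names are assumptions;
# confirm them against the CLAWS input format guidelines.
ENTITY_MAP = {
    "\u2013": "&ndash;",  # en dash
    "\u00a3": "&pound;",  # pound sign
    "\u201c": "&bquo;",   # opening double quote (assumed entity name)
    "\u201d": "&equo;",   # closing double quote (assumed entity name)
}


def to_entities(text: str, mapping: dict = ENTITY_MAP) -> str:
    """Replace each mapped character with its entity reference."""
    return text.translate(str.maketrans(mapping))


print(to_entities("\u00a35 billion \u2013 \u201cour plan\u201d"))
```

Single curly quotes are deliberately left out of the table, because the closing single quote is also used as an apostrophe and would need to be handled case by case.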
Notes:
1. The choice of which parties to cover is again based on those appearing in the BBC election debates and Newscast leaders and manifesto coverage.
2. For Labour, I used the large print PDF version (local copy PDF) because Adobe Acrobat Reader could not convert the standard PDF to text.
A note on word counts: Wmatrix counts semantically meaningful multiword expressions (MWEs) as one item, so other corpus software may well provide different counts here. These figures also depend on tokenisation in our NLP pipeline (by CLAWS).
Word frequency lists are tab delimited so you can load them into your favourite spreadsheet program. MWEs are shown in the word frequency lists as words connected by underscore characters.
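If you would rather process a frequency list in a script than in a spreadsheet, something like the sketch below may be a starting point. It assumes each row is a word followed by a tab and a count, which may not match the exact column layout of the downloadable files; the `read_frequency_list` helper and the sample rows are invented for illustration.

```python
import io


def read_frequency_list(handle):
    """Return {word: frequency} from rows of word<TAB>count (assumed layout)."""
    freqs = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].strip().isdigit():
            continue  # skip headers, blank lines and malformed rows
        freqs[fields[0]] = int(fields[1].strip())
    return freqs


# Invented sample data, standing in for a downloaded frequency list.
sample = "the\t120\nnational_health_service\t7\nwill\t95\n"
freqs = read_frequency_list(io.StringIO(sample))

tokens = sum(freqs.values())           # total number of words
types = len(freqs)                     # number of unique words
mwes = [w for w in freqs if "_" in w]  # MWEs are joined by underscores

print(tokens, types, mwes)
```

For a real file, replace the `io.StringIO(sample)` handle with `open(path, encoding="utf-8")`, adjusting the encoding if needed.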
Key word and semantic tag clouds are produced by comparing the data with the BNC Written Sampler corpus. The larger the font, the higher the log-likelihood score, so larger items are more significantly overused compared to the reference corpus. See the Wmatrix main page and online tutorials if you want more details about how this works.
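For readers who want to see the arithmetic behind a keyness score, here is a minimal sketch of the standard two-corpus log-likelihood calculation. It is not Wmatrix's own code, the function name is hypothetical, and the frequencies and the reference corpus size in the usage lines are invented for illustration.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood for one item.

    freq_* are the item's frequencies; size_* are the corpus sizes in tokens.
    """
    total = freq_study + freq_ref
    # Expected frequencies if the item were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Invented figures: an item seen 50 times in a 25,305-token manifesto and
# 10 times in a reference corpus assumed to contain 1,000,000 tokens.
score = log_likelihood(50, 25_305, 10, 1_000_000)
overused = 50 / 25_305 > 10 / 1_000_000  # higher relative frequency in the study corpus
print(round(score, 2), overused)
```

A score above 3.84 corresponds to p < 0.05 with one degree of freedom. The score itself does not say which corpus uses the item more, which is why the relative frequencies are compared separately.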
Figure 1: Conservatives Key Word Cloud
Figure 2: Conservatives Key Semantic Tag Cloud
Figure 3: Labour Key Word Cloud
Figure 4: Labour Key Semantic Tag Cloud
Figure 5: Liberal Democrats Key Word Cloud
Figure 6: Liberal Democrats Key Semantic Tag Cloud
Figure 7: Green Key Word Cloud
Figure 8: Green Key Semantic Tag Cloud
Figure 9: Plaid Cymru Key Word Cloud
Figure 10: Plaid Cymru Key Semantic Tag Cloud
Figure 11: SNP Key Word Cloud
Figure 12: SNP Key Semantic Tag Cloud
Figure 13: Reform UK Party Key Word Cloud
Figure 14: Reform UK Party Key Semantic Tag Cloud