CFIE is a research programme exploring accounting and financial market text using natural language processing (NLP) and corpus linguistics methods.
Our work aims to understand the properties and impact of financial narratives, with particular emphasis on annual reports, preliminary earnings announcements, conference calls, and the financial media.
Our multidisciplinary team unites researchers with expertise in financial reporting and financial markets, computer science, and computational linguistics.
Key outputs include:
Research funding to date includes UK Research and Innovation via the Economic and Social Research Council (ESRC), the Financial Conduct Authority (FCA), the Financial Reporting Council (FRC), the accounting profession via the Institute of Chartered Accountants in England and Wales (ICAEW), Lancaster University, and the International Centre for Research in Accounting.
Text extraction: We develop tools supporting structured text retrieval from PDF annual reports, earnings announcements provided in HTML format, and conference calls provided as rich text files.
Strategy and business models: What are the properties of commentary on strategy and business model, and how useful is it for investors?
Annual report quality: What are the distinguishing linguistic features of high quality annual report narratives?
Performance reporting: What focus do management give to earnings- versus non-earnings-based measures of performance, and how useful are alternative performance measures (APMs)?
Preliminary earnings announcements: Are UK preliminary earnings announcement narratives useful beyond quantitative results? Is there any suggestion that management use narratives to present a biased view of financial performance?
Predicting accounting errors and manipulation: Can narrative disclosures provide clues that help to predict accounting manipulation and fraud?
Publications
Working papers and work-in-progress
Research funding
Workshops
Practitioner-focused presentations
The FRC regulates auditors, accountants and actuaries, and sets the UK’s Corporate Governance and Stewardship Codes with the aim of promoting transparency and integrity in business.
The FRC is a project partner and co-funder in our ESRC project Analysing Narrative Aspects of UK Preliminary Earnings Announcements and Annual Reports: Tools and Insights for Researchers and Regulators (contract ES/R003904/1).
We are working on several analyses including the properties of earnings announcement narratives, alternative performance measures (APMs), and strategy and business model reporting.
The FCA is the conduct regulator for UK financial markets and financial services firms, and the prudential regulator for a subset of UK financial services firms. Ongoing research is exploring how automated analysis of text can assist the FCA in its market scrutiny activities.
The PLSA works together with the industry and other parties to raise standards, share best practice and support pension schemes, pension advisors and pension savers.
We worked with colleagues at the PLSA to evaluate annual reporting practices by large UK-listed companies on workforce-related matters. Annual reporting practices were assessed against the PLSA's stewardship toolkit.
Evidence indicated that while exemplars of good reporting practice exist, disclosure practices vary considerably across companies and the overall level of transparency is lower than one might expect given executives’ claims about the key role that their workforce plays in delivering long-term corporate success.
Read the final report here.
CFA UK represents around 12,000 investment professionals and comprises part of the worldwide network of member societies of the CFA Institute.
CFA UK commissioned an analysis of the link between CEO pay and long-term value creation for a sample of the largest companies listed on the London Stock Exchange.
The final report highlighted a weak link between traditional performance metrics used in executive remuneration contracts, such as EPS and TSR, and proxies for long-term value creation. The evidence also suggested a weak association between CEO pay and long-term value creation.
Read the final report here.
The IR Society promotes best practice in investor relations and serves as the UK focal point for investor relations practice and IR professionals.
Since 2015 we have provided input to the Best Annual Report category in the IR Society’s annual Best Practice Awards. We use a version of our CFIE-FRSE app to score aspects of annual reports automatically. These automatic scores serve as a cross-check on detailed manual evaluations performed by members of the IR Society’s expert judging panel.
Summary narrative features for annual reports published between 2003 and 2017 by firms listed on the London Stock Exchange’s Main Market and Alternative Investment Market.
The dataset adjusts for firm name changes to ensure time-series comparability. Fiscal year-ends are matched to Thomson Reuters Datastream.
Note: We do not publish company identifiers due to licensing restrictions. Instead, we provide details on how to match the dataset to Thomson Reuters Datastream using firm names and a SAS script.
A zip file containing the dataset and supplementary material is available to download here.
A range of wordlist resources drawn from prior work and our own research relating to features such as sentiment, forward-lookingness, risk, uncertainty, and strategy.
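As a rough illustration of how a wordlist of this kind is typically applied, the sketch below computes the share of words in a passage that appear in a list. The function name `wordlist_share`, the tokenisation rule, and the four-term mini list are assumptions made for the example; they are not taken from the CFIE wordlists or scripts.

```python
import re


def wordlist_share(text, wordlist):
    """Return the fraction of tokens in `text` that appear in `wordlist`."""
    # Simple lowercase tokenisation on alphabetic runs (an assumption;
    # real pipelines may treat hyphens, numbers, and phrases differently).
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    terms = {term.lower() for term in wordlist}
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


# Hypothetical mini wordlist, for illustration only.
uncertainty_terms = ["may", "might", "uncertain", "approximately"]
sample = "Results may vary and the outlook remains uncertain."
score = wordlist_share(sample, uncertainty_terms)
```

Scaling the count by passage length, as here, makes scores comparable across reports of different sizes. Multi-word entries would need phrase matching rather than the single-token lookup used in this sketch.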
Wordlists available to download:
A set of annual report corpora constructed using reports published between 2003 and 2017. These corpora can be used to study the linguistic properties of UK annual reports and to identify unusual linguistic features associated with a specific report or report section.
Available UK annual report corpora include:
A zip file containing the dataset and supplementary material is available to download here.
The CFIE-FRSE annual report App for digital UK annual reports published as PDF files (submitted as individual files or in bulk for batch processing).
The App supports structured retrieval of annual report text based on the report table of contents (or PDF bookmarks where a valid table of contents cannot be detected). It also classifies annual report contents into a range of generic sections (e.g., chairman’s statements, governance statements, etc.) to facilitate cross-sectional comparisons.
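To give a feel for the classification step, the sketch below maps table-of-contents headings to generic section labels using keyword patterns. It is a simplified illustration only: the patterns, labels, and the function `classify_heading` are hypothetical and do not reproduce the rules implemented in the Java-based App.

```python
import re

# Hypothetical keyword patterns, checked in order; not the App's actual rules.
SECTION_PATTERNS = {
    "chairman_statement": re.compile(
        r"\bchair(man|woman|person)?'?s?\b.*\b(statement|letter|review)\b"
    ),
    "governance": re.compile(r"\bgovernance\b"),
    "remuneration": re.compile(r"\bremuneration\b"),
}


def classify_heading(heading):
    """Return a generic section label for a table-of-contents heading."""
    # Lowercase and normalise typographic apostrophes before matching.
    normalised = heading.lower().replace("’", "'")
    for label, pattern in SECTION_PATTERNS.items():
        if pattern.search(normalised):
            return label
    return "other"


headings = [
    "Chairman’s Statement",
    "Corporate Governance Report",
    "Directors' Remuneration Report",
    "Notes to the Accounts",
]
labels = [classify_heading(h) for h in headings]
```

Mapping firm-specific headings onto a small set of shared labels is what allows the same section to be compared across companies whose reports use different titles.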
Output is provided as individual .txt files and as a pooled Excel spreadsheet.
Note that the App is not recommended for structured extraction from scanned (image-based) PDF files.
The Java-based App is available on GitHub for download to your PC.
See here for an academic journal article providing a detailed discussion and validation of the App.
The tool can be adapted to process reports published in other languages and reporting regimes.
Python scripts for scraping files from EDGAR, retrieving text from specific items (e.g., Item 7, MD&A), and text preprocessing prior to analysis.
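The item-retrieval step can be sketched as follows for a filing that has already been downloaded and converted to plain text. This is a minimal illustration under stated assumptions, not the CFIE scripts themselves: the function `extract_item7`, the heading patterns, and the longest-candidate heuristic are choices made for this example, and real filings usually need HTML stripping and more tolerant heading patterns first.

```python
import re

# Headings are assumed to start a line, e.g. "Item 7." or "ITEM 7A."
ITEM7_START = re.compile(r"^[ \t]*item[ \t]+7\b", re.IGNORECASE | re.MULTILINE)
ITEM7_END = re.compile(r"^[ \t]*item[ \t]+(7a|8)\b", re.IGNORECASE | re.MULTILINE)


def extract_item7(filing_text):
    """Return the longest span running from an Item 7 heading to the next
    Item 7A or Item 8 heading, or an empty string if none is found."""
    best = ""
    for start in ITEM7_START.finditer(filing_text):
        end = ITEM7_END.search(filing_text, start.end())
        stop = end.start() if end else len(filing_text)
        candidate = filing_text[start.start():stop].strip()
        # The table of contents also lists "Item 7", so keep the longest
        # candidate, which is assumed to be the section body.
        if len(candidate) > len(best):
            best = candidate
    return best


sample = """Table of Contents
Item 7. Management's Discussion and Analysis
Item 7A. Quantitative and Qualitative Disclosures
Item 8. Financial Statements

Item 7. Management's Discussion and Analysis
Revenue increased during the year.
Item 7A. Quantitative and Qualitative Disclosures
Not applicable.
"""
mdna = extract_item7(sample)
```

Choosing the longest candidate is a common workaround for the duplicate heading in the table of contents, but it can fail when a filing incorporates Item 7 by reference, so extracted text is worth spot-checking.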
Java scripts to process transcript files saved in RTF and HTML formats.
Scripts support:
Use the following resources to support corpus analysis of financial text