File lists

In section 1.3 on page 4 of the introduction chapter we wrote:

"The BNC has now been produced in two versions. The original release (Version 1.0) was distributed from 1995. The new BNC (Version 2.0), at the time of going to press, is expected to be released in the winter of 2000/1. The new release, unlike the old one, is to be available (under licence) worldwide. It has certain advantages over the old version, notably (a) improved accuracy of grammatical tagging (undertaken at Lancaster in 1995-96), and (b) correction of details of text classification, which has resulted in the transfer of about a million words from the 'imaginative' category of written texts to the 'informative' category. However, it has also been necessary to reduce the size of the BNC in the worldwide release by 69 texts, owing to the difficulty or impossibility of obtaining world rights for the distribution of those texts. For the purposes of this book, we are using the complete BNC of Version 1.0, and at the same time are making use of the improvements of Version 2.0 - including improved grammatical tagging and text category corrections. This ensures that we base our frequency lists on the best and fullest information available."

To make clear the contents of each section of the BNC, we list here the files in each major subcorpus as used in the book. The lists reflect the classification of texts at the time of going to press. For any more recent changes in classification, you may wish to browse the BNC Indexer produced by David Lee. In the table below, click on the number of files in each case to see the corresponding list of files.

Subcorpus                                    Files        Words
Written - imaginative                          526   18,439,114
Written - informative                         2683   71,230,923
Spoken - demographic (conversational)          153    4,214,926
Spoken - context governed (task-oriented)      762    6,161,272
Spoken                                         915   10,376,198
Written                                       3209   89,670,037
Whole corpus                                  4124  100,046,235
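
The subtotal and total rows in the table above are sums of the four basic subcorpora, and this can be cross-checked with a few lines of Python. The sketch below is only an illustration: the dictionary keys are hypothetical labels, and the second figure in each row is assumed to be a word count.

```python
# (files, words) per basic subcorpus, copied from the table above.
# The key names are illustrative labels, not BNC identifiers.
counts = {
    "written_imaginative": (526, 18_439_114),
    "written_informative": (2683, 71_230_923),
    "spoken_demographic": (153, 4_214_926),
    "spoken_context_governed": (762, 6_161_272),
}


def add(*keys):
    """Sum the file counts and the word counts of the named subcorpora."""
    files = sum(counts[k][0] for k in keys)
    words = sum(counts[k][1] for k in keys)
    return files, words


spoken = add("spoken_demographic", "spoken_context_governed")
written = add("written_imaginative", "written_informative")
whole = (spoken[0] + written[0], spoken[1] + written[1])

# Compare the computed sums with the subtotal and total rows of the table.
print(spoken == (915, 10_376_198))
print(written == (3209, 89_670_037))
print(whole == (4124, 100_046_235))
```

If all three comparisons print True, the subtotal and total rows agree with the four basic rows.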