BNC2 POS-tagging Manual

Credits and acknowledgements

The authors of the manual are Geoffrey Leech and Nicholas Smith (Lancaster University).

The original grammatical tagging of the BNC (version 1.0) was undertaken by a team at UCREL, Lancaster led by Roger Garside and Geoffrey Leech. The main members of the team were: Michael Bryant, Elizabeth Eyes, Mary Hodges and Nicholas Smith. Additional members were Tom Barney, Jean Forrest, Mary Kinane and Xungfeng Xu.

The corpus was automatically tagged by the Claws4 tagging software, originally authored by Roger Garside and Ian Marshall. Roger Garside thoroughly improved the tagger and adapted it to the large-scale task of tagging the BNC. Michael Bryant wrote and implemented the support software.

After the completion of the BNC, a phase of tagging improvement was undertaken with funding from the Engineering and Physical Sciences Research Council (Research Grant No. GR/F 99847). The enhancement project was led by Geoffrey Leech, Roger Garside and Tony McEnery. Its main objective was to correct as many tagging errors as possible. To this end, an enhanced version of Claws4 was run over the whole corpus. In addition, a new tool (the Template Tagger) was developed for ‘patching’ the corpus so as to eliminate further sets of errors by rule. This tool was developed by Michael Pacey, building on a prototype written by Steven Fligelstone. The research team working on tagging improvement comprised Nicholas Smith (lead researcher), Martin Wynne and Paul Baker.

Manual correction of tagging errors has also been undertaken by Mary Hodges and Jeremy Bateman.