UCREL Semantic Analysis System (USAS)




Top level codes

Semantic Tagger Framework

The UCREL semantic analysis system is a framework for undertaking the automatic semantic analysis of text. The framework has been designed and used across a number of research projects and this page collects together various pointers to those projects and publications produced since 1990. Originally developed in C for English only by Paul Rayson, subsequent versions of the multilingual semantic tagger have been created in Java by Scott Piao, and then by Andrew Moore in Python (pymusas). Pymusas is an open source version of the semantic tagger under development from 2021 onwards and full details of the progress, methods and usage can be seen in the GitHub repository. Currently, the English tagger (C version) is also available in Wmatrix version 5, and the Chinese, Dutch, Finnish, French, Italian, Portuguese, Spanish, and Welsh semantic taggers from pymusas are available in Wmatrix version 6.

The semantic tagset used by USAS was originally loosely based on Tom McArthur's Longman Lexicon of Contemporary English (McArthur, 1981). It has a multi-tier structure with 21 major discourse fields (shown here on the right), subdivided, and with the possibility of further fine-grained subdivision in certain cases. We have written an introduction to the USAS category system (PDF file) with examples of prototypical words and multi-word units in each semantic field.
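The multi-tier structure can be illustrated with a small parser. This is only a sketch: it assumes that a tag is written as one upper-case field letter, a number, optional dot-separated subdivisions, and optional trailing markers (for example "A1.1.1" or "E4.1+"). The names `ParsedTag` and `parse_tag` are hypothetical and are not part of any USAS software.

```python
import re
from typing import NamedTuple, Tuple

# Assumed tag shape: field letter, number, optional ".n" subdivisions,
# then any trailing markers such as "+" or "-".
TAG_PATTERN = re.compile(r"^([A-Z])(\d+)((?:\.\d+)*)(.*)$")


class ParsedTag(NamedTuple):
    field: str               # top-level discourse field letter
    levels: Tuple[int, ...]  # numeric tiers below the field
    markers: str             # whatever follows the numeric part


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag string into its field, numeric tiers and markers."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognised tag: {tag!r}")
    field, first, rest, markers = match.groups()
    # rest looks like ".1.1"; splitting on "." leaves an empty first item.
    levels = (int(first),) + tuple(int(n) for n in rest.split(".") if n)
    return ParsedTag(field, levels, markers)


if __name__ == "__main__":
    for example in ("A1.1.1", "E4.1+", "Z99"):
        print(example, parse_tag(example))
```

The number of items in `levels` gives the depth of a tag in the hierarchy, so tags can be grouped by field letter alone or by any prefix of their tiers.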

The full tagset is available on-line in plain text form and formatted on one page in PDF. We also have a list of the full descriptive labels of the semantic subcategories.

A visual representation showing the USAS tagset hierarchy is now on-line, along with those for the Louw-Nida model and the Hallig/Von Wartburg/Schmidt/Wilson Model.

Multilingual extension of Semantic Tagger Framework for other languages

Following the research in the Benedict project to extend the system to Finnish, and that in the ASSIST project for Russian, beginning in 2013, the USAS framework was extended to cover many more languages including: Chinese, Dutch, Italian, Portuguese, Spanish and Malay. The Java software framework developed in the Benedict and ASSIST projects was modified to accommodate these languages, and semantic lexicons were compiled for them by automatically "translating" the English semantic lexicon entries, with some manual improvement where possible. Due to the inevitable ambiguity of translations and part-of-speech correspondence across and between languages, the automatically translated lexicons contain errors, which need to be cleaned manually. The Java software framework is no longer being supported, but many of the taggers for languages listed below are now available in the open source pymusas tagger.
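The bootstrapping idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the method used in the Java framework: the function name `bootstrap_lexicon`, the data layout, and the toy entries (including the tags attached to them) are invented for this example.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[Tuple[str, str], List[str]]  # (lemma, pos) -> semantic tags


def bootstrap_lexicon(english_lexicon: Lexicon,
                      bilingual_dict: Dict[str, List[str]]) -> Lexicon:
    """Copy the semantic tags of each English entry to all its translations.

    bilingual_dict maps an English lemma to its target-language lemmas.
    The part of speech is carried over unchanged, which is a simplification.
    """
    target: Lexicon = {}
    for (lemma, pos), tags in english_lexicon.items():
        for translation in bilingual_dict.get(lemma, []):
            entry = target.setdefault((translation, pos), [])
            for tag in tags:
                if tag not in entry:
                    entry.append(tag)
    return target


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    english = {("bank", "noun"): ["I1.1", "W3"]}
    dictionary = {"bank": ["banca", "riva"]}
    for key, tags in bootstrap_lexicon(english, dictionary).items():
        print(key, tags)
```

In the toy data both translations inherit both tags of the ambiguous English word, although each translation would normally need only one of them. This is the kind of error that the manual cleaning step is meant to remove.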

Please get in touch with Paul Rayson if you would like to be involved in further improvements of the tools or the addition of new languages. In order to reference this further development of the multilingual USAS tagger, please cite our paper at NAACL-HLT 2015 which describes the initial bootstrapping method for six languages (Chinese, Dutch, Italian, Portuguese, Spanish, Malay):

Piao, S., Bianchi, F., Dayrell, C., D'Egidio, A. and Rayson, P. (2015). Development of the multilingual semantic annotation system. In proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), Denver, Colorado, United States, pp. 1268-1274. PDF version (Poster: PDF version)

In 2015/16, we extended this work to cover twelve languages, as reported in the LREC 2016 paper:

Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Kren, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh and Olga Mudraya. (2016) Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages. In proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC2016), Portoroz, Slovenia, pp. 2614-2619. PDF version

Note: The lexical resources and other files can be downloaded from our GitHub repository (https://github.com/UCREL/Multilingual-USAS). Unless stated otherwise, the non-English semantic lexicons are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Chinese Semantic Tagger

The Chinese semantic tagger has been developed by incorporating the Stanford Chinese word segmenter and the Chinese POS tagger into the USAS Java framework. The Chinese semantic lexicons have been automatically generated by translating the English semantic lexicon entries using a Chinese-English Dictionary (Xiao et al., 2010) and an LDC (Linguistic Data Consortium) English-Chinese Wordlist. Because the Stanford Chinese POS tagger and Xiao et al.'s dictionary use different Chinese POS tags, their POS tags are mapped into a simplified common tagset to be used internally by the software system. The Chinese lexicon also employs a set of extended kinship semantic tags designed by Qian and Piao (2009). The full tagset is available in Chinese (txt, docx, pdf). We are grateful for the assistance of Dr Richard Xiao (Lancaster University, UK) and Qian Yufang (Zhejiang University of Media and Communications, China) with this research. Currently the Chinese single word and multi-word unit semantic lexicons contain over 64,000 and over 19,000 entries respectively. As automatically generated lexicons, they contain errors.

Dutch Semantic Tagger

The Dutch semantic tagger has been developed using a similar process to that of the Italian semantic tagger, using the Dutch version of TreeTagger. The Dutch lexicon has been compiled by translating the English semantic lexicon entries using a Dutch-English dictionary developed by Tiberius and Schoonheim (2014). As the dictionary and TreeTagger use different POS tagsets, they are both mapped into a simplified common tagset to be used by the software system. Currently the Dutch single word semantic lexicon contains 4,203 entries. We are grateful for the assistance of Dr. Carole Tiberius (INL, Netherlands) with this research.
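The mapping of two POS tagsets into a simplified common tagset can be sketched as follows. All tag labels, table contents and function names here are hypothetical and chosen only for illustration; they are not the actual TreeTagger or dictionary tagsets, nor the mapping used by the software.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping tables from each source tagset to the common tagset.
TAGGER_TO_COMMON = {"nounsg": "noun", "nounpl": "noun", "verbpres": "verb"}
DICTIONARY_TO_COMMON = {"znw": "noun", "ww": "verb", "bnw": "adj"}


def to_common(tag: str, mapping: Dict[str, str]) -> str:
    """Map a source POS tag to the common tagset ("unknown" if unmapped)."""
    return mapping.get(tag.lower(), "unknown")


def build_lexicon(entries: Iterable[Tuple[str, str, List[str]]]
                  ) -> Dict[Tuple[str, str], List[str]]:
    """Key dictionary-derived entries by (lemma, common POS tag)."""
    return {(lemma, to_common(pos, DICTIONARY_TO_COMMON)): tags
            for lemma, pos, tags in entries}


def lookup(lexicon: Dict[Tuple[str, str], List[str]],
           lemma: str, tagger_pos: str) -> List[str]:
    """Find semantic tags for a token that was tagged by the POS tagger."""
    return lexicon.get((lemma, to_common(tagger_pos, TAGGER_TO_COMMON)), [])


if __name__ == "__main__":
    lexicon = build_lexicon([("huis", "znw", ["H1"])])  # invented entry
    print(lookup(lexicon, "huis", "nounsg"))
```

Because both sides are normalised before comparison, a lexicon entry compiled from the dictionary can be matched against tagger output even though the two resources never share a tag label.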

Finnish Semantic Tagger

In the Benedict project, we developed the Finnish Semantic Tagger as the first non-English tagger. Laura Löfberg has continued to update the semantic lexical resources during and after her Ph.D. at Lancaster University. Her PhD thesis (2017) is also available and describes the process of designing, creating, and evaluating the Finnish lexicons (DOI: 10.17635/lancaster/thesis/3). The Finnish semantic lexicon can be accessed here (download as UTF-8): Single word lexicon. The tagset has also been translated by Laura.

French Semantic Tagger

The French semantic tagger itself is under development, and we have translated the semantic tagset into French. The French tagger is being developed in collaboration with Michael Gauthier (Université Lumière Lyon 2, France), Emilie L'Hôte (Paris Diderot University, France), Julien Perrez, Pauline Heyvaert (Université de Liège, Belgium), Min Reuchamps (Université catholique de Louvain, Belgium), and Verena Weiland (Heidelberg, Germany).

Indonesian Semantic Tagger

A first release of the Indonesian semantic tagger has now been developed, and we began with a translation of the semantic tagset into Indonesian. The translation was carried out by Muchamad Sholakhuddin Al Fajri, Lancaster University. Subsequently, we have collaborated with Prihantoro (previously a PhD student at Lancaster University) to link the TreeTagger for Indonesian to the pymusas tagger.

Italian Semantic Tagger

The Italian semantic tagger has been developed in collaboration with Dr Francesca Bianchi (Dip. di Studi Umanistici, Universita del Salento, Italy) and Prof. Elena Semino (Dept. of Linguistics and English Language, Lancaster University, UK). The original Java software framework was modified by incorporating the TreeTagger Italian POS tagger, and in pymusas we use the spaCy pipeline for POS tagging and lemmatisation. The English semantic lexicon entries have been automatically translated into Italian counterparts using FreeLang and other English-Italian dictionaries with the help of Italian native speakers. Although some lexicon entries were manually checked, most of the entries were automatically generated and therefore inevitably contain errors, which need to be cleaned manually in the future. The full tagset is available in Italian (doc, pdf). Currently, there are two Italian semantic lexicons: single word lexicon (over 20,400 entries) and multi-word lexicon (over 4,100 entries which were manually checked).

Malay Semantic Tagger

Via a small grant between Lancaster University and Sunway University, we began developing a Malay Semantic Tagger. A lexicon has been created and reported in the LREC2016 paper above. Thanks to Phoey Lee Teh, a Malay version of the semantic tagset is available.

Portuguese Semantic Tagger

The Portuguese semantic tagger was developed using a similar process to that of the Italian semantic tagger, using the Portuguese TreeTagger (in the Java version) and the spaCy pipeline in pymusas. The Portuguese lexicons have been compiled by translating the English semantic lexicon entries using a Portuguese-English dictionary developed by Davies and Preto-Bay (2007) and the FreeLang Portuguese-English bilingual lexicon. A small section of the lexicons was manually checked, but most of the lexicon entries were automatically generated and therefore contain errors, which need to be cleaned manually in the future. As the POS tagger and lexical resources use different POS tagsets, these tagsets are mapped into a simplified common tagset to be used by the software. Currently, there are two Portuguese semantic lexicons: single word lexicon (over 13,900 entries) and multi-word lexicon (over 1,780 entries). The Portuguese lexicons were created with the help of Carmen Dayrell (CASS, Lancaster University, UK).

Russian Semantic Tagger

The Russian lexicon and MWE list developed in the ASSIST project are available for download here. We provide a zip archive containing three files: a single word lexicon, a proper name lexicon and a MWE lexicon. The current versions (14th December 2006) contain 13,153 single words, 4,444 proper names, and 713 MWE templates. For each word in the files, we include a part-of-speech tag based on the mystem tagset for Russian and a list of possible semantic tags, all of which have been manually checked. You can also see the USAS semantic tagset in Russian as a two page PDF and text file. For more details about the Russian resources, see Mudraya et al (2006).
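A lexicon of this kind, with a part-of-speech tag and a list of possible semantic tags for each word, could be loaded as sketched below. The file layout is an assumption made for this example (one entry per line, tab-separated, with the semantic tags separated by spaces in the third column) and may not match the downloadable files; `read_lexicon` and the sample entries are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_lexicon(handle: TextIO) -> Dict[Tuple[str, str], List[str]]:
    """Read lines of the assumed form: word <TAB> POS <TAB> tag1 tag2 ..."""
    lexicon: Dict[Tuple[str, str], List[str]] = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        word, pos, tags = row[0], row[1], row[2].split()
        lexicon.setdefault((word, pos), []).extend(tags)
    return lexicon


if __name__ == "__main__":
    # Invented sample entries; a real file would be opened with
    # open(path, encoding="utf-8", newline="").
    sample = io.StringIO("дом\tS\tH1 H4\nидти\tV\tM1\n")
    for key, tags in read_lexicon(sample).items():
        print(key, tags)
```

Keying the dictionary by both word and POS tag keeps entries for the same word form apart when it can belong to more than one part of speech.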

Spanish Semantic Tagger

The Spanish semantic tagger is in an early stage of development, following a similar process to that of the Italian semantic tagger, using the Spanish TreeTagger (for the Java version). Currently the Spanish tagger has a single-word semantic lexicon compiled by translating the English semantic lexicon entries using a Spanish-English dictionary compiled by Mark Davies (2006). Because it was initially generated by an automatic process, the lexicon contains errors and requires further manual cleaning in the future. Because the POS tagger and the dictionary employ different POS tagsets, these tagsets are mapped into a simplified common POS tagset to be used by the software. The Spanish lexicon is being created in collaboration with Ricardo-María Jiménez, faculty member of UICbarcelona (Universitat Internacional de Cataluña, Barcelona, Spain). PhD student Hugo Sanjurjo González (University of León, Spain) also contributed to the editing of new lexicon entries while he was a visitor to Lancaster University in 2016. During his visit to Lancaster in 2017, PhD student Carlos Herrero Zorita (Autonomous University of Madrid) contributed the first version of the MWE list and added a large number of entries to the single word list including modals and named entities. The Spanish semantic lexicons can be accessed here (download as UTF-8): (a) Single word lexicon (b) MWE lexicon

Swedish Semantic Tagger

The Swedish semantic tagger is under development. Lisa Sjösten has translated the semantic tagset into Swedish, and Maria Nääs has manually checked over 4,000 lexicon entries. The Swedish tagger is being developed in collaboration with Anna Gustafsson and Johan Frid (Lund University, Sweden). The Swedish semantic lexicon can be accessed here: single word lexicon.

Turkish Semantic Tagger

The Turkish semantic tagger itself has not been developed yet, but we have a translation of the semantic tagset into Turkish. The translation was carried out by Duygu Candarli from Manchester Institute of Education, The University of Manchester.

Urdu Semantic Tagger

The Urdu semantic tagger is under development by Jawad Shafi, a previous PhD student at Lancaster University from COMSATS Institute of Information Technology, Lahore, Pakistan. The Urdu semantic lexicon can be accessed here (download as UTF-8): Single word lexicon. The tagset has also been translated and there is a mapping between the Urdu version of the tags themselves and the English version.

Welsh Semantic Tagger

The Welsh semantic tagger was developed in the CorCenCC project, and we have a translation of the semantic tagset into Welsh, carried out by colleagues on the CorCenCC project. The Java version of the Welsh semantic tagger by Scott Piao was released on the CorCenCC GitHub repository. We also have a Welsh semantic tagger available as part of pymusas.

Funded projects

The software and linguistic resources underpinning the semantic analysis have been designed and produced during five projects:
The ACASD and ACAMRIT projects led to the initial design and implementation of the tools for English and applied them in the area of interview transcripts. The REVERE project applied the tools in the domain of software engineering documentation using a web front end called Wmatrix. In Benedict, we re-implemented the English semantic tagging (EST) tool in Java, and improved the linguistic resources in the tool. In addition, we developed a Finnish Semantic Tagging (FST) tool. In the ASSIST project, collaborating with the University of Leeds, we extended the existing USAS framework to construct a Russian Semantic Tagger (RST). In 2013, the UCREL research centre funded initial development of the Italian, Dutch and Chinese lexicons. In 2015, a Lancaster University - Sunway University small grant funded initial exploration work on a Malay semantic tagger. Starting in 2016, with funding from the CorCenCC project with Cardiff, Swansea and Bangor Universities, we developed the Welsh semantic tagger. From 2021, via Wmatrix licences, the UCREL research centre funded the development of the Python open source version, pymusas.

People

Andrew Wilson was the RA in Linguistics on the first two projects and Paul Rayson was the RA in Computing on all five projects. Scott Piao and Dawn Archer were the RAs on the Benedict project. Olga Mudraya was the RA in Linguistics and Scott Piao was the Computing RA on the ASSIST project. Scott Piao is the Computing RA on the initial development of the Italian, Dutch, Chinese and other taggers and on the CorCenCC project. Andrew Moore is the lead developer of pymusas. The grant holders and supervisors were Roger Garside (Computing), Geoff Leech (Linguistics) and Jenny Thomas (Linguistics, now at Bangor). Tony McEnery was the principal investigator for Benedict. For ASSIST, Roger Garside, Tony McEnery, Andrew Wilson and Paul Rayson were the grant holders.

Publications describing the system (or extensions of the system)

  1. Wilson, A. and Rayson, P. (1993). Automatic Content Analysis of Spoken Discourse. In: C. Souter and E. Atwell (eds), Corpus Based Computational Linguistics. Amsterdam: Rodopi. pp215-226 (text)
  2. Wilson, A. (1993). Towards an Integration of Content Analysis and Discourse Analysis: The Automatic Linkage of Key Relations in Text. UCREL Technical Paper 3, Linguistics Department, Lancaster University. PDF version
  3. Rayson, P., and Wilson, A. (1996). The ACAMRIT semantic tagging system: progress report. In L. J. Evett, and T. G. Rose (eds) Language Engineering for Document Analysis and Recognition, LEDAR, AISB96 Workshop proceedings, pp 13-20. Brighton, England. Faculty of Engineering and Computing, Nottingham Trent University, UK. ISBN 0 905 488628 PDF version
  4. Wilson, A. and Thomas, J.A. (1997) Semantic annotation, in Garside, R., Leech, G., and McEnery, A. (eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London, pp. 53-65.
  5. Garside, R., and Rayson, P. (1997). Higher-level annotation tools. In. R. Garside, G. Leech, and A. McEnery (eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London. pp 179 - 193.
  6. Paul Rayson (2002). USAS: UCREL semantic analysis system. Invited talk at Daito Bunka University, Tokyo, Japan. February 2002. (HTML slides)
  7. Dawn Archer, Andrew Wilson, Paul Rayson (2002). Introduction to the USAS category system. Benedict project report, October 2002. (PDF version)
  8. Dawn Archer, Tony McEnery, Paul Rayson, Andrew Hardie (2003). Developing an automated semantic analysis system for Early Modern English. In Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.) Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16. UCREL, Lancaster University, pp. 22 - 31. PDF version
  9. Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, Jukka-Pekka Juntunen (2003). Porting an English semantic tagger to the Finnish language. In Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.) Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16. UCREL, Lancaster University, pp. 457 - 464. PDF version
  10. Scott S. L. Piao, Paul Rayson, Dawn Archer, Andrew Wilson and Tony McEnery (2003). Extracting Multiword Expressions with a Semantic Tagger. In proceedings of the Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, at ACL 2003, 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, July 12, 2003, pp. 49-56. PDF version
  11. Piao, Scott S. L., Paul Rayson, Dawn Archer, Tony McEnery (2004). Evaluating Lexical Resources for A Semantic Tagger. In proceedings of 4th International Conference on Language Resources and Evaluation (LREC 2004), May 2004, Lisbon, Portugal, Volume II, pp. 499-502. ISBN 2-9517408-1-6. PDF version
  12. Rayson, P., Archer, D., Piao, S. L., McEnery, T. (2004). The UCREL semantic analysis system. In proceedings of the workshop on Beyond Named Entity Recognition Semantic labelling for NLP tasks in association with 4th International Conference on Language Resources and Evaluation (LREC 2004), 25th May 2004, Lisbon, Portugal, pp. 7-12. PDF version
  13. Archer, D., Rayson, P., Piao, S., McEnery, T. (2004). Comparing the UCREL Semantic Annotation Scheme with Lexicographical Taxonomies. In Williams G. and Vessier S. (eds.) Proceedings of the 11th EURALEX (European Association for Lexicography) International Congress (Euralex 2004), Lorient, France, 6-10 July 2004. Université de Bretagne Sud. Volume III, pp. 817-827. ISBN 2-9522-4570-3. PDF version
  14. Paul Rayson, Scott Piao, Dawn Archer (2004). Modern and Historical Aspects of the UCREL Semantic Analysis System. Invited talk at the University of Sheffield, UK, 16th November 2004. (PDF version slides)
  15. Rayson, P. (2005) Right from the word go: identifying multi-word-expressions for semantic tagging. Invited talk at BAAL Corpus Linguistics SIG / OTA Workshop: Identifying and Researching Multi-Word Units. Thursday 21st April 2005, Oxford University Computing Services. (PDF version slides)
  16. Scott S.L. Piao, Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, Andrew Wilson (2005) A Large Semantic Lexicon for Corpus Annotation. In proceedings of the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK. Proceedings from the Corpus Linguistics Conference Series on-line e-journal, Vol. 1, no. 1, ISSN 1747-9398. PDF version
  17. Piao, S., Rayson, P., Archer, D., McEnery, T. (2005) Comparing and combining a semantic tagger and a statistical tool for MWE extraction. Computer Speech and Language, (Special issue on Multiword expressions), Volume 19, issue 4, pp. 378 - 397, Elsevier. doi:10.1016/j.csl.2004.11.002
  18. Mudraya, O., Babych, B., Piao, S., Rayson, P., Wilson, A. (2006). Developing a Russian semantic tagger for automatic semantic annotation. In proceedings of Corpus Linguistics 2006, St. Petersburg, from 10-14 October 2006. English PDF version Russian PDF version (slides)
  19. Qian, Yufang and Scott Piao (2009). The Development of A Semantic Annotation Scheme for Chinese Kinship. Corpora, Vol. 4 (2), Edinburgh University Press. pp. 189-208.
  20. Piao, S., Bianchi, F., Dayrell, C., D'Egidio, A. and Rayson, P. (2015). Development of the multilingual semantic annotation system. In proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), Denver, Colorado, United States, pp. 1268-1274. PDF version
  21. Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Kren, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh and Olga Mudraya. (2016) Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages. In proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC2016), Portoroz, Slovenia, pp. 2614-2619. PDF version

Publications describing applications of the system

  1. Wilson, A. and Leech, G.N. (1993). Automatic Content Analysis and the Stylistic Analysis of Prose Literature. Revue: Informatique et Statistique dans les Sciences Humaines 29: 219-234.
  2. Thomas, J., and Wilson, A. (1996). Methodologies for studying a corpus of doctor-patient interaction. In J. Thomas and M. Short (eds) Using corpora for language research. Longman, London, pp 92-109.
  3. Rayson, P., Garside, R., and Sawyer, P. (1999). Recovering Legacy Requirements. In Proceedings of REFSQ'99 Fifth International Workshop on Requirements Engineering: Foundations of Software Quality, June 14-15 1999, Heidelberg, Germany. Published by University of Namur, pp. 49-54. ISBN 2 87037 307 4. PDF version
  4. Rayson, P., Garside, R., and Sawyer, P. (2000). Assisting requirements engineering with semantic document analysis. In Proceedings of Content-based multimedia information access RIAO 2000 (Recherche d'Informations Assistie par Ordinateur, Computer-Assisted Information Retrieval) International Conference, College de France, Paris, France, April 12-14, 2000. C.I.D., Paris, pp. 1363 - 1371. ISBN 2-905450-07-X PDF version
  5. Rayson, P., Emmet, L., Garside, R., and Sawyer, P. (2000). The REVERE Project: Experiments with the application of probabilistic NLP to Systems Engineering. In proceedings of 5th International Conference on Applications of Natural Language to Information Systems (NLDB'2000). Versailles, France, June 28-30th, 2000. PDF version
  6. Rayson, P., Garside, R., and Sawyer, P. (2000). Assisting Requirements Recovery from Legacy Documents. In Henderson, P. (ed.) Systems Engineering for Business Process Change: collected papers from the EPSRC research programme. Springer-Verlag, London, pp. 251 - 263. ISBN 1-85233-2220 PDF version
  7. Barbara Lewandowska-Tomaszczyk, Michael Oakes & Paul Rayson (2001). Annotated Corpora for Assistance with English-Polish Translation. Paper presented at Corpus Linguistics 2001, Lancaster University, UK, March 30-April 2, 2001. PDF version
  8. S. Sharoff, P. Rayson, O. Mudraya, A. Wilson and T. McEnery (2004). A tool for assisting translators using automatic semantic annotation. Presented at Corpus Use and Learning to Translate (CULT-BCN) Barcelona, January 22nd-24th 2004.
  9. Marilyn Deegan, Harold Short, Dawn Archer, Paul Baker, Tony McEnery, Paul Rayson (2004) Computational Linguistics Meets Metadata, or the Automatic Extraction of Key Words from Full Text Content. RLG Diginews, Vol. 8, No. 2. ISSN 1093-5371.
  10. Jones, M., Rayson, P. and Leech, G. (2004) Key category analysis of a spoken corpus for EAP. Presented at The 2nd Inter-Varietal Applied Corpus Studies (IVACS) International Conference on "Analyzing Discourse in Context" The Graduate School of Education, Queen's University, Belfast, Northern Ireland, 25 - 26 June, 2004. PDF version
  11. Löfberg L, Juntunen J-P, Nykanen A, Varantola K, Rayson P, Archer D. (2004). Using a semantic tagger as dictionary search tool. In Williams G. and Vessier S. (eds.) Proceedings of the 11th EURALEX (European Association for Lexicography) International Congress (Euralex 2004), Lorient, France, 6-10 July 2004. Université de Bretagne Sud. Volume I, pp. 127-134. ISBN 2-9522-4570-3.
  12. Archer, D. and Rayson, P. (2004) Using an historical semantic tagger as a diagnostic tool for variation in spelling. Presented at Thirteenth International Conference on English Historical Linguistics (ICEHL 13) University of Vienna, Austria 23-29 August, 2004.
  13. Sharoff, S., Babych, B., Rayson, P., Mudraya, P. and Piao, S. (2006) ASSIST: Automated Semantic Assistance for Translators. In companion proceedings to the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, April 3-7, 2006, pp. 139 - 142. ISBN 1-932432-60-4. PDF version
  14. Piao, S. L., Rayson, P., Mudraya, O., Wilson, A. and Garside, R. (2006) Measuring MWE compositionality using semantic annotation. In proceedings of COLING/ACL workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, July 23, 2006, Sydney, Australia. PDF version (Download data for human ratings)
  15. Andrew Wilson, Olga Moudraia (2006) Quantitative or Qualitative Content Analysis? Experiences from a cross-cultural comparison of female students' attitudes to shoe fashions in Germany, Poland and Russia. In Andrew Wilson, Paul Rayson and Dawn Archer (eds.) Corpus Linguistics around the world. Rodopi, Amsterdam.
  16. For more recent applications of the English Semantic Tagger, see the list on the Wmatrix website