UCREL research centre

11th Teaching and Language Corpora Conference

Lancaster University, UK – 20th to 23rd July 2014


Pre-conference workshops

The main conference will be preceded by a workshop day on Sunday 20th July.

The day is organised as two half-day workshop sessions, with a choice of sessions in the morning and in the afternoon, as per previous TaLC conferences. All workshops will be practical, and in some cases laboratory-based, sessions.

Registering for the workshop day

You should sign up for the workshop day by selecting the appropriate option when you register. It is possible, if you wish, to register for the workshop day without registering for the main conference. After you have registered, you will be given the opportunity to select your morning option and afternoon option for the workshop day.

Places may be limited on some workshops; we will post updates here if space on any particular workshop is running low.


Workshop descriptions

An introduction to working with written and spoken corpus data

Led by: Chris Tribble and Guy Aston

Corpus applications in language education are often associated with large-scale corpus projects such as the British National Corpus (2001). However, while these large corpora have been invaluable for the elaboration of lexicographic and grammatical accounts of language, they have proved problematic for many language learning and language teaching applications. A response to this concern can be found in the development of small or specialist corpora, and in their exploitation for pedagogic purposes.

In this workshop, you will have the opportunity to develop your own pedagogic corpus and to create learning/teaching materials for classroom purposes. No previous experience of classroom applications of corpora is required. Participants should, ideally, bring with them a collection of electronic texts which can be used as a micro-corpus. By the end of the three-hour workshop, participants will be able to use WordSmith or AntConc to generate wordlists, n-gram lists, and edited concordances which can be used as the basis for classroom materials.

Click here for a full workshop description.

Analysing English vocabulary with the New General Service List: Pedagogical applications

Led by: Vaclav Brezina and Dana Gablasova

This practical workshop explores the development and use of pedagogical wordlists – lists of vocabulary intended for teaching and learning purposes. In particular, the focus will be on the New General Service List (new-GSL). The new-GSL is a list of 2,500 lemmas based on analysis of four language corpora with a total size of almost 13 billion running words. The workshop will cover the following topics:

  • Different notions of a “word” (types, lemmas, word families, lexemes)
  • Identifying the core vocabulary
  • Reconceptualising lexical coverage of texts
  • Pedagogical materials development
  • Online vocabulary analysis tool
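
One of the topics above, lexical coverage, lends itself to a small illustration. The sketch below is not part of the workshop materials and the file names are hypothetical; it reports what proportion of the running words in a text appear on a one-word-per-line wordlist. Unlike the lemma-based new-GSL, it matches raw word forms, so it deliberately simplifies the question the workshop will address:

# What proportion of the running words in text.txt are on the list in
# wordlist.txt (one word per line)?  File names are hypothetical.
awk 'NR == FNR { list[tolower($1)] = 1; next }   # first file: read the wordlist
     {
       for (i = 1; i <= NF; i++) {
         w = tolower($i); gsub(/[^a-z]/, "", w)  # lower-case and strip punctuation
         if (w != "") { total++; if (w in list) covered++ }
       }
     }
     END { printf "coverage: %.1f%% (%d of %d tokens)\n", 100*covered/total, covered, total }
    ' wordlist.txt text.txt

With a genuine lemma-based list such as the new-GSL, the same idea applies, except that each running word first has to be mapped to its lemma.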

Click here for a full workshop description.

Unix for corpus-users: a beginner's guide

Led by: John Williams

The Unix command line offers a range of powerful, customizable commands and utilities that can greatly enhance the corpus user's ability to compile and manipulate corpora. This workshop is for corpus users with little or no knowledge of the Unix command line who would like to extend their repertoire of searching, sorting, and synthesizing techniques.

The workshop will introduce some simple Unix programming techniques, including a script to batch-convert Word documents and PDFs to corpus-friendly plain text files (.txt). Following this, the main task of the workshop will be to build up a reusable and customizable command to compile an ordered frequency list of the words in a corpus. By the end of the workshop, even the most nervous participants should be able to make sense of commands such as:

cat $(ls) | sed 's/[[:punct:]]/ /g' | sed 's/[[:space:]]/\n/g' | grep '^[A-Za-z]' | tr "[A-Z]" "[a-z]" | sort | uniq -c | sort -nr
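
Purely as a reading aid, this is the same command reformatted with a comment on each stage (it assumes the GNU versions of these tools, in particular for the newline used in the second sed stage):

cat $(ls) |                   # concatenate every file in the current directory
  sed 's/[[:punct:]]/ /g' |   # turn punctuation characters into spaces
  sed 's/[[:space:]]/\n/g' |  # put each word on a line of its own
  grep '^[A-Za-z]' |          # keep only lines that begin with a letter
  tr "[A-Z]" "[a-z]" |        # fold everything to lower case
  sort | uniq -c |            # count how often each word occurs
  sort -nr                    # list the counts in descending order of frequency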

Click here for a full workshop description.

Helping L2 learners and translators find L1-L2 equivalence with the help of corpora

Led by: Ana Frankenberg-Garcia

Language learners have always relied on their first or native language (L1) in order to help them communicate in the second language (L2). Seen from this perspective, one of the most important aids for L2 learners is the bilingual L1-L2 dictionary. However, bilingual dictionaries traditionally focus on single words, and do not usually provide much information beyond that. This half-day hands-on workshop aims to show how information regarding L1-L2 equivalence beyond the level of the isolated word can be obtained by navigating through monolingual corpora in two different languages.

The starting point for the workshop is a brief explanation of the concept of collocation and of how different languages do not always combine words in the same way, comparing separate L1 and L2 corpora. Next, for the practical, hands-on part of the workshop, participants will be guided through using the Sketch Engine to obtain word sketches, i.e., automatic, corpus-based summaries of a word's grammatical and collocational behaviour, and will be shown how to deal with bilingual word sketches.

Click here for a full workshop description.

Readability and vocabulary management

Led by: George Weir and Laurence Anthony

Readability and vocabulary management are important issues in language teaching, materials writing, and documentation.

This practical workshop will explore the issue of vocabulary management through the concept of readability. Although readability estimates developed in the twentieth century can still be found in current word processors and learner support applications, we will argue that such 'traditional' metrics are of limited use and may usefully be replaced by more sophisticated techniques based on current language technologies. A range of software tools will be introduced in hands-on computer exercises to illustrate different methods for estimating readability and managing vocabulary level. Discussion sessions will allow attendees to consider the scope and benefits of such techniques in their own areas of interest.
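
One widely used example of such a traditional metric is the Flesch Reading Ease score. The sketch below is an illustration of the general idea, not part of the workshop materials: the file name is hypothetical and syllables are approximated by a deliberately crude vowel-group count.

# A rough Flesch Reading Ease estimate for a plain text file (illustrative only).
awk '
{
  nw += NF                                  # running word count
  ns += gsub(/[.!?]+/, " ")                 # sentence-final punctuation marks
  n = split(tolower($0), toks, /[^a-z]+/)   # crude tokenisation into letter strings
  for (i = 1; i <= n; i++)
    nsyl += gsub(/[aeiouy]+/, "", toks[i])  # vowel groups stand in for syllables
}
END {
  if (ns == 0) ns = 1
  printf "Flesch Reading Ease = %.1f\n", 206.835 - 1.015*(nw/ns) - 84.6*(nsyl/nw)
}' text.txt

Its reliance on nothing more than sentence length and an approximate syllable count is exactly the kind of limitation the workshop will examine.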

Click here for a full workshop description.

Keyword and collocation statistics: under the hood of CQPweb

Led by: Stefan Evert and Andrew Hardie

Many corpus analysis programs perform certain statistical calculations “behind the scenes”. The most common such calculations are for keywords and collocations, both of which are based on the calculation of an effect-size measure or a significance test statistic.

The aim of this workshop is to give participants a behind-the-scenes understanding of how these statistics work – taking as our example the CQPweb system's approach to the calculation of keyword and collocation measures.

No knowledge of statistics will be assumed. Instead, we will work from the ground up, introducing the notions of a contingency table, significance testing, and effect size measures. Participants will learn to work through these analyses by hand before seeing how they are implemented under the hood in CQPweb. At the end of the workshop, participants will have a firmer and deeper understanding of the procedures upon which not only CQPweb but many other corpus tools rely.
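
As a taste of the by-hand calculation, the sketch below works through one common keyness statistic, log-likelihood (G2), for a 2x2 contingency table with made-up frequencies. It illustrates the general approach only and is not a description of CQPweb's internal code.

# Log-likelihood for one keyword candidate: 150 hits in a 1,000,000-word study
# corpus vs 100 hits in a 2,000,000-word reference corpus (made-up figures).
awk 'BEGIN {
  o11 = 150;      n1 = 1000000      # observed frequency and size, study corpus
  o12 = 100;      n2 = 2000000      # observed frequency and size, reference corpus
  o21 = n1 - o11; o22 = n2 - o12    # all other tokens in each corpus
  N = n1 + n2
  # expected frequencies from the row and column totals of the 2x2 table
  e11 = n1 * (o11 + o12) / N;  e12 = n2 * (o11 + o12) / N
  e21 = n1 * (o21 + o22) / N;  e22 = n2 * (o21 + o22) / N
  # G2 = 2 * sum over the four cells of O * ln(O / E)
  g2 = 2 * (o11*log(o11/e11) + o12*log(o12/e12) + o21*log(o21/e21) + o22*log(o22/e22))
  printf "expected frequency in study corpus = %.2f, log-likelihood = %.2f\n", e11, g2
}'

A larger value indicates a greater departure from the frequencies the reference corpus would lead us to expect; the effect-size measures mentioned above quantify instead how big that difference is, independently of corpus size.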

Click here for a full workshop description.

