A method for the annotation of metaphors in corpora

Elena Semino
Department of Linguistics and Modern English Language
Lancaster University
Lancaster, LA1 4YT
e.semino@lancaster.ac.uk

Gerard Steen
Department of English
Free University
Amsterdam
GJ.STEEN@hetnet.nl

This paper proposes a method for the annotation (and quantification) of metaphors in electronic data. The method derives from the five-step procedure for metaphor analysis outlined in the paper by Steen and Semino (see Steen 1999; Semino et al. in press), and is based on the typology of metaphorical patterns in language presented in Crisp et al. (in press). The paper by Steen and Semino reports on the development of a reliable procedure for identifying metaphorically used open-class words in authentic texts. Applying such a procedure to electronic data makes it possible to insert tags for metaphoricity at the level of individual lexical items. As well as indicating that a particular word can be analysed as a linguistic metaphor, each tag could include additional information. For example, it might include a percentage indicating the level of inter-judge agreement reached at the end of the reliability exercise outlined in the paper by Steen and Semino, and/or a number on a 3- or 5-point scale indicating the degree of (linguistic) conventionality of a particular metaphorical expression (on the rather major assumption that the degree of metaphorical conventionality can be sensibly operationalised).

Counting the metaphorically used words in a set of data can be useful but, on its own, is a rather crude approach to the study of metaphors in language. This is partly because metaphorically used words can pattern with one another in a number of ways. Previous work conducted within our team (Crisp et al. in press) has identified a specific set of structural metaphorical patterns in language. The following extracts from Sara Maitland's novel Three Times Table exemplify the basic patterns (the words that we have analysed as metaphorical have been underlined):

(1) Rachel seemed able to absorb everything that Phoebe tried
(2) He had set his mark on them
(3) Phoebe [. . .] asked herself with a sudden rush of nostalgia.
(4) That was the stuff of melodrama.
(5) their women turned on them, snarling

Each of the first four examples corresponds to a T-unit (Fox 1987), which Crisp et al. (in press) adopt as the basic unit of discourse for the purposes of their analysis. Example (5) consists of two T-units ('their women turned on them' and 'snarling'). Example (1) contains one metaphorically used word, while each of the remaining examples contains two. Moreover, in (2) and (3) the two metaphorically used words can be said to relate to the same metaphorical source domain, but the same cannot be said for (4). However, (2) and (3) also differ from each other. In (3) the two metaphorically used words stand in a modifier-modified grammatical relationship and therefore jointly refer to a single element in the text world, while in (2) they do not. Crisp et al. (in press) use propositional analysis to describe this distinction. Finally, in (5) the cross-domain mapping realised by 'turn on' in the first T-unit is instantiated again in the following T-unit by 'snarling'.

Crisp et al. (in press) capture these different structural patterns by means of a typology based on four binary oppositions. According to this typology, examples (1) to (4) involve restricted metaphor, while (5) involves extended metaphor. In addition, examples (2) to (4) contrast with (1) as follows: the metaphorical pattern in (2) is multiple, while that in (1) is singular; the pattern in (3) is complex, while that in (1) is simple; and the pattern in (4) is mixed, while that in (1) is pure. We will spell out these oppositions and show how they can be incorporated within our annotation system at the word-tag level. This would allow different patterns of metaphorical structure to be quantified within a corpus. Crisp et al. have shown that this kind of quantification can provide a useful measure of stylistic complexity by comparing an extract from Maitland's novel with an extract from Salman Rushdie's The Moor's Last Sigh.
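The abstract does not fix a tag format or a procedure for assigning the oppositions. As a minimal sketch, assuming that each word-level tag records a source-domain label and an identifier for the text-world element the word helps to refer to (both hypothetical fields, as are the names `WordTag` and `classify_t_unit`), three of the four oppositions can be derived for a single T-unit. The criteria in the code are an illustrative operationalisation of the distinctions drawn above, not Crisp et al.'s definitions; restricted versus extended is left out because it depends on neighbouring T-units.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordTag:
    """Hypothetical word-level tag for a metaphorically used word."""
    word: str
    source_domain: str                     # label for the source domain evoked
    referent: str                          # id of the text-world element referred to
    agreement: Optional[float] = None      # inter-judge agreement (%), if recorded
    conventionality: Optional[int] = None  # e.g. a 1-5 score, if operationalised


def classify_t_unit(tags: list[WordTag]) -> dict[str, str]:
    """Derive three pattern labels for one metaphorical T-unit.

    Illustrative criteria (assumptions, not Crisp et al.'s definitions):
      multiple vs singular: the metaphorical words pick out more than one referent
      complex vs simple:    some referent is picked out by more than one word
      mixed vs pure:        the words evoke more than one source domain
    """
    if not tags:
        raise ValueError("a metaphorical T-unit needs at least one tagged word")
    words_per_referent = Counter(tag.referent for tag in tags)
    domains = {tag.source_domain for tag in tags}
    return {
        "number": "multiple" if len(words_per_referent) > 1 else "singular",
        "structure": "complex" if max(words_per_referent.values()) > 1 else "simple",
        "domains": "mixed" if len(domains) > 1 else "pure",
    }


# Placeholder tags shaped like example (2): two words, one domain, two referents.
example_2 = [WordTag("w1", "D1", "r1"), WordTag("w2", "D1", "r2")]
print(classify_t_unit(example_2))
# {'number': 'multiple', 'structure': 'simple', 'domains': 'pure'}
```

Keeping the referent on each word tag is what lets a modifier-modified pair, as in (3), be told apart from two words with separate referents, as in (2).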
Finally, we will show how such a method of analysis and annotation also makes it possible to count the number of linguistically realised cross-domain conceptual mappings. Crisp et al. spell out the relationship between different types of metaphorical T-units and metaphorical mappings as follows:

Every metaphorical T-unit which is neither mixed nor extended will signal one such mapping, and so will count as one. Every metaphorical T-unit that is mixed will have assigned to it the number of the source domains which it can potentially activate. Any sequence of T-units which is extended will count as one, no matter how many members it may have. (Crisp et al. in press)
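The quoted counting rule is explicit enough to be stated as a short function. The sketch below is a minimal Python rendering, assuming hypothetical T-unit records (`TUnit`) that list the source domains a T-unit can activate and, for T-units belonging to an extended sequence, a shared sequence identifier. The quotation does not say how to count a T-unit that is both mixed and part of an extended sequence; the sketch assumes that the sequence rule takes precedence. Domain labels in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TUnit:
    """Hypothetical T-unit record used only for counting mappings."""
    source_domains: frozenset[str]  # domains the T-unit can activate; empty if literal
    sequence: Optional[str] = None  # shared id for members of one extended sequence


def count_mappings(units: list[TUnit]) -> int:
    """Count linguistically realised cross-domain mappings (rule quoted above).

    Assumption: a T-unit inside an extended sequence is counted only via its
    sequence, even if it is also mixed.
    """
    total = 0
    counted_sequences: set[str] = set()
    for unit in units:
        if not unit.source_domains:
            continue  # non-metaphorical T-units contribute no mappings
        if unit.sequence is not None:
            if unit.sequence not in counted_sequences:
                counted_sequences.add(unit.sequence)
                total += 1  # an extended sequence counts as one, however long
        elif len(unit.source_domains) > 1:
            total += len(unit.source_domains)  # mixed: one per potential source domain
        else:
            total += 1  # neither mixed nor extended: one mapping
    return total


# Structures like examples (1)-(5); domain labels are placeholders.
units = [
    TUnit(frozenset({"A"})),                 # (1)
    TUnit(frozenset({"B"})),                 # (2): two words, one domain
    TUnit(frozenset({"C"})),                 # (3): two words, one domain
    TUnit(frozenset({"D", "E"})),            # (4): mixed
    TUnit(frozenset({"F"}), sequence="s1"),  # (5) 'their women turned on them'
    TUnit(frozenset({"F"}), sequence="s1"),  # (5) 'snarling'
]
print(count_mappings(units))  # 6
```

Under these assumptions the five extracts yield six mappings: one each for (1), (2) and (3), two for the mixed T-unit in (4), and one for the extended sequence in (5).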

Our proposed annotation system includes tags at the level of the T-unit which incorporate information about the number of mappings involved in each case. The annotation method we propose therefore allows analysts to quantify metaphors at four different levels: the word, the metaphorical pattern, the T-unit and the conceptual mapping. It is, however, also a time-consuming method, which in practice will have to be implemented gradually and incrementally, as resources become available and as different parts of the analysis are operationalised. We hope that the explicitness of the analytical procedure will inspire the development of software that can partially automate the tagging process.
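The abstract does not specify how the word and T-unit tags are encoded. Assuming, purely for illustration, an XML encoding with a `met` attribute on word elements, `pattern` and `mappings` attributes on T-unit elements, and a wrapping `seq` element that carries a single mapping count for an extended sequence (all element and attribute names are hypothetical), counts at the four levels can be read off with Python's standard `xml.etree.ElementTree` module. The tokens `x1` to `x6` and their labels are placeholders; only the last two T-units reproduce example (5).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical encoding: <w met="1"> marks a metaphorically used word; <tu> is a
# T-unit with pattern labels and a mapping count; <seq> wraps an extended
# sequence and carries one mapping count for all of its T-units.
SAMPLE = """<text>
  <tu pattern="restricted singular simple pure" mappings="1">
    <w>x1</w> <w met="1">x2</w> <w>x3</w>
  </tu>
  <tu pattern="restricted multiple simple mixed" mappings="2">
    <w met="1">x4</w> <w>x5</w> <w met="1">x6</w>
  </tu>
  <seq type="extended" mappings="1">
    <tu pattern="extended">
      <w>their</w> <w>women</w> <w met="1">turned on</w> <w>them</w>
    </tu>
    <tu pattern="extended"><w met="1">snarling</w></tu>
  </seq>
</text>"""


def summarise(xml_text: str) -> dict[str, object]:
    """Count metaphors at word, pattern, T-unit and mapping level."""
    root = ET.fromstring(xml_text)
    t_units = list(root.iter("tu"))  # includes T-units nested inside <seq>
    return {
        "metaphorical_words": sum(1 for w in root.iter("w") if w.get("met") == "1"),
        "t_units": len(t_units),
        "patterns": Counter(
            label for tu in t_units for label in tu.get("pattern", "").split()
        ),
        # Only direct children of <text> carry mapping counts, so each extended
        # sequence is counted once through its <seq> element.
        "mappings": sum(
            int(el.get("mappings", "0")) for el in root if el.tag in ("tu", "seq")
        ),
    }


summary = summarise(SAMPLE)
print(summary["metaphorical_words"], summary["t_units"], summary["mappings"])  # 5 4 4
```

Storing the mapping count on the sequence rather than on its member T-units keeps the "counts as one, no matter how many members" rule from being applied twice.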

REFERENCES

Crisp, P., Heywood, J. and G. Steen (in press) 'Identification and Analysis, Classification and Quantification', Language and Literature.

Fox, B. (1987) Discourse Structure and Anaphora. Cambridge: Cambridge University Press.

Semino, E., Heywood, J. and M. Short (in press) 'Methodological Problems in the Analysis of Metaphors in a Corpus of Conversations about Cancer', Journal of Pragmatics.

Steen, G. (1999) 'From Linguistic to Conceptual Metaphor in Five Steps', in R. W. Gibbs Jr. and G. Steen (eds) Metaphor in Cognitive Linguistics, pp. 57-77. Amsterdam: John Benjamins.