"Big data" in language studies: from cargo-cult science to phantom revolution

Andrew Hardie

CASS, Lancaster University

"Big data" has recently become a buzzword in a range of academic disciplines as well as in certain quarters of business and government, although the criteria by which data counts as "big" are not always clear. The standard account is that the age of big data, which arrived approximately three to four years ago, promises revolutionary change across all the fields where it can be applied. What does this mean for the study of language and text? In this lecture, I will argue that for linguistics, work that can be labelled as using "big data" falls under two banners: that which is novel is not good, and that which is good is not novel. First, I will look at some of the fruits to date of the big data approach to language usage data. The last several years have seen numerous high-profile academic publications, albeit not in linguistics journals, purporting to implement such an approach. I will explore some examples of such studies, illustrating basic flaws which prevent them from rising above the level of pseudo-research. Second, I will develop a critical appraisal of the concept of "big data" in the textual sense: what does the term even mean in the context of language and linguistics? On this basis, I will argue that the prospect of a big data revolution in linguistics is fundamentally illusory.

Week 28 2014/2015

Thursday 11th June 2015
2:00-3:00pm

Furness LT 1