Big Educational Data: any good for SLA research?

Dora Alexopoulou

University of Cambridge, Theoretical and Applied Linguistics

The emergence of online EFL teaching platforms offering teaching and learning to students around the globe has resulted in unprecedented amounts of learner production data: data can come from rich task sets across the proficiency spectrum and from learners with a variety of linguistic, educational and cultural backgrounds. Exploiting such datasets opens important opportunities for SLA research and, in particular, for linking SLA findings to second language teaching. At the same time, however, such datasets have all the pitfalls of big data: a range of variables standardly controlled for in carefully designed data collections (e.g. task sets) are not considered. Access to unprecedented numbers of learners comes at the cost of the rich learner metadata targeted in typical data collections. In addition, the very context of production imposes arbitrary constraints (e.g. word limits on writings). Last, but not least, the size of such datasets brings new challenges for extracting information and for dealing with the noisy aspects of the data.

Can we, then, use such data for SLA research and, crucially, to link SLA findings to second language teaching? I will argue that Natural Language Processing (NLP) tools can help us address many of these methodological issues, and I will show that we can obtain valuable information for SLA research. I will use the EF-Cambridge Open Language Database (EFCAMDAT) as an example of a big data resource. I will focus on the developmental trajectory of Relative Clauses (RCs) as a case study and consider specific issues that can affect the developmental picture, such as task effects, formulaic language and national language effects. I will conclude by showing that not only can we arrive at reliable generalisations about RC development based on a resource like EFCAMDAT, but we can also obtain new generalisations, a fact that strongly indicates the potential of big educational data for SLA research.

Week 8 2014/2015

Wednesday 26th November 2014
4:00-5:00pm

County South C89

Extra session co-organised with SSLAT