A Solution to the Problem of High Variance When Tuning NLP Models With K-fold Cross Validation

Henry Moss

STOR-i, Lancaster University

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data, so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unstable estimates cannot be relied upon for effective parameter tuning. The resulting tuned parameters are highly sensitive to how the data is partitioned, meaning that we often select sub-optimal parameters and face serious reproducibility issues.
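As a concrete illustration of this instability (a sketch, not material from the talk itself), the following scikit-learn snippet re-runs 10-fold CV on the same model and data under different random partitions; the spread of the resulting estimates is exactly the variance at issue. The dataset, model, and number of seeds are arbitrary illustrative choices.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=200, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Repeat 10-fold CV on identical data, varying only the random partition.
    estimates = []
    for seed in range(20):
        cv = KFold(n_splits=10, shuffle=True, random_state=seed)
        estimates.append(cross_val_score(model, X, y, cv=cv).mean())

    print(f"mean accuracy estimate: {np.mean(estimates):.3f}")
    # A nonzero spread: the CV estimate is stochastic in the partition.
    print(f"std across partitions:  {np.std(estimates):.3f}")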

We propose instead to use performance estimates based on the less variable J-K-fold CV, which runs K-fold CV J times over independent random partitions and averages the results. Our main contributions are extending the use of J-K-fold CV from performance estimation to parameter tuning and investigating how best to choose J and K. To balance effectiveness and computational efficiency, we advocate lower choices of K than are typically seen in the NLP literature, using the saved computation to increase J instead. We provide empirical evidence for this claim across a range of NLP tasks.
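A minimal sketch of the J-K-fold estimator just described, assuming the standard repetition-and-averaging scheme (scikit-learn's RepeatedKFold implements this repetition); the values J = 5, K = 2 below are illustrative, not a recommendation from the talk.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RepeatedKFold, cross_val_score

    X, y = make_classification(n_samples=200, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Many repetitions (J) of a cheap low-K CV, per the abstract's advice.
    J, K = 5, 2
    cv = RepeatedKFold(n_splits=K, n_repeats=J, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)  # J*K fold scores

    print(f"J-K-fold CV estimate: {scores.mean():.3f} (from {len(scores)} folds)")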

Week 17, 2017/2018

Thursday 1st March 2018
3:00-4:00pm

Management School LT 9

Joint Data Science Group and UCREL talk