Scaling Entity Linking with Crowdsourcing

Dyaa Albakour

Signal Media

Signal Media is a research-led technology company that uses Artificial Intelligence (AI) and Machine Learning (ML) to turn streams of unstructured text, e.g. news articles, into useful information.

One of the core components of Signal's text analytics pipeline is entity linking (EL). In this presentation, I will first review the current state of the art in EL and make the case for using supervised learning approaches to tackle it. These approaches require large amounts of labelled data, which is a bottleneck for scaling them out to cover large numbers of entities. To mitigate this, we have developed a production-ready solution that uses Active Learning and Crowdsourcing to collect high-quality labelled data efficiently and at scale. In particular, I will discuss the steps involved and the challenges in tuning the design parameters of the crowdsourcing task to limit noise, reduce cost and maximise the effectiveness of the resulting machine learning models for EL.

Week 14 2017/2018

Thursday 8th February 2018
3:00-4:00pm

Infolab C60b/c

Joint Data Science Group and UCREL talk