Signal Media is a research-led technology company that uses Artificial Intelligence (AI) and Machine Learning (ML) to turn streams of unstructured text, such as news articles, into useful information.
One of the core components of Signal's text analytics pipeline is entity linking (EL). In this presentation, I first review the current state of the art in EL and make the case for tackling it with supervised learning. Supervised approaches require large amounts of labelled data, which becomes a bottleneck when scaling them to cover large numbers of entities. To mitigate this, we have developed a production-ready solution that efficiently collects high-quality labelled data at scale using Active Learning and Crowdsourcing. In particular, I discuss the steps and challenges involved in tuning the design parameters of the crowdsourcing task to limit noise, reduce cost and maximise the effectiveness of the resulting machine learning models for EL.
Joint Data Science Group and UCREL talk