Language Understanding in the Wild: Combining Crowdsourcing and Machine Learning

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)



Social media has led to the democratisation of opinion sharing. A wealth of information about public opinions, current events, and authors' insights into specific topics can be gained by understanding the text written by users. However, there is wide variation in the language used by different authors in different contexts on the web. This diversity in language makes interpretation an extremely challenging task. Crowdsourcing presents an opportunity to interpret the sentiment, or topic, of free text. However, the subjectivity and bias of human interpreters raise challenges in inferring the semantics expressed by the text. To overcome this problem, we present a novel Bayesian approach to language understanding that relies on aggregated crowdsourced judgements. Our model encodes the relationships between labels and text features in documents, such as tweets, web articles, and blog posts, accounting for the varying reliability of human labellers. It allows inference of annotations that scales to arbitrarily large pools of documents. Our evaluation using two challenging crowdsourcing datasets shows that, by efficiently exploiting language models learnt from aggregated crowdsourced labels, we can provide up to 25% improved classifications when only a small portion (less than 4%) of documents has been labelled. Compared with six state-of-the-art methods, we reduce the number of crowd responses required to achieve comparable accuracy by up to 67%. Our method was a joint winner of the CrowdFlower - CrowdScale 2013 Shared Task challenge at the conference on Human Computation and Crowdsourcing (HCOMP 2013).
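The idea of aggregating crowd labels while accounting for labeller reliability can be illustrated with a small sketch. This is not the model from the paper: it omits the text features entirely and substitutes a much simpler "one-coin" scheme in which each worker has a single accuracy value, estimated by alternating between document posteriors and worker accuracies. The function name `aggregate_labels`, its parameters, and the toy responses are all hypothetical.

```python
import math
from collections import defaultdict


def aggregate_labels(responses, classes, n_iter=20, smoothing=1.0):
    """Aggregate (worker, doc, label) triples into per-document class posteriors.

    Simplified one-coin model (an assumption of this sketch): each worker
    answers correctly with probability `accuracy[w]` and otherwise picks one
    of the remaining classes uniformly at random. Needs at least two classes.
    """
    k = len(classes)
    docs = sorted({d for _, d, _ in responses})
    workers = sorted({w for w, _, _ in responses})

    # Initialise document posteriors from raw vote proportions.
    post = {d: {c: 0.0 for c in classes} for d in docs}
    for _, d, label in responses:
        post[d][label] += 1.0
    for d in docs:
        total = sum(post[d].values())
        post[d] = {c: v / total for c, v in post[d].items()}

    accuracy = {}
    for _ in range(n_iter):
        # Re-estimate each worker's accuracy as the expected fraction of
        # answers that agree with the current posteriors. The smoothing
        # term keeps the estimate strictly between 0 and 1.
        correct = defaultdict(float)
        count = defaultdict(float)
        for w, d, label in responses:
            correct[w] += post[d][label]
            count[w] += 1.0
        accuracy = {
            w: (correct[w] + smoothing) / (count[w] + 2.0 * smoothing)
            for w in workers
        }

        # Recompute document posteriors, weighting each vote by the
        # worker's estimated accuracy (uniform prior over classes).
        log_post = {d: {c: 0.0 for c in classes} for d in docs}
        for w, d, label in responses:
            a = accuracy[w]
            for c in classes:
                p = a if c == label else (1.0 - a) / (k - 1)
                log_post[d][c] += math.log(p)
        for d in docs:
            m = max(log_post[d].values())
            unnorm = {c: math.exp(v - m) for c, v in log_post[d].items()}
            z = sum(unnorm.values())
            post[d] = {c: v / z for c, v in unnorm.items()}

    return post, accuracy


# Toy data: workers "a" and "b" agree with each other, "c" always disagrees.
toy_responses = []
for doc, truth, other in [("d1", "pos", "neg"), ("d2", "neg", "pos"),
                          ("d3", "pos", "neg"), ("d4", "neg", "pos")]:
    toy_responses += [("a", doc, truth), ("b", doc, truth), ("c", doc, other)]

posteriors, accuracy = aggregate_labels(toy_responses, ["pos", "neg"])
```

On this toy data the two agreeing workers end up with a higher estimated accuracy than the dissenting one, so the posterior for each document moves further towards their label than a plain majority vote would suggest.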
Original language: English
Title of host publication: WWW '15
Subtitle of host publication: Proceedings of the 24th International Conference on World Wide Web
Publisher: Association for Computing Machinery (ACM)
Number of pages: 11
ISBN (Print): 978-1-4503-3469-3
Publication status: Published - May 2015
Event: WWW 2015 - Florence, Italy
Duration: 18 May 2015 to 22 May 2015


Conference: WWW 2015

