Low Resource Sequence Tagging with Weak Labels

Edwin Simpson, Jonas Pfeiffer, Iryna Gurevych

Research output: Contribution to conference › Conference Paper › peer-review


Abstract

Current methods for sequence tagging depend on large quantities of domain-specific training data, limiting their use in new, user-defined tasks with few or no annotations. While crowdsourcing can be a cheap source of labels, it often introduces errors that degrade the performance of models trained on such crowdsourced data. Another solution is to use transfer learning to tackle low resource sequence labelling, but current approaches rely heavily on similar high resource datasets in different languages. In this paper, we propose a domain adaptation method using Bayesian sequence combination to exploit pre-trained models and unreliable crowdsourced data that does not require high resource data in a different language. Our method boosts performance by learning the relationship between each labeller and the target task and trains a sequence labeller on the target domain with little or no gold-standard data. We apply our approach to labelling diagnostic classes in medical and educational case studies, showing that the model achieves strong performance through zero-shot transfer learning and is more effective than alternative ensemble methods. Using NER and information extraction tasks, we show how our approach can train a model directly from crowdsourced labels, outperforming pipeline approaches that first aggregate the crowdsourced data, then train on the aggregated labels.
Original language: English
Number of pages: 8
Publication status: Published - 12 Feb 2020
Event: AAAI Conference on Artificial Intelligence - New York, United States
Duration: 7 Feb 2020 to 12 Feb 2020
Conference number: 34
https://aaai.org/Conferences/AAAI-20/

Conference

Conference: AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI-20
Country: United States
City: New York
Period: 7/02/20 to 12/02/20
