Abstract
Current methods for sequence tagging depend on large quantities of domain-specific training data, limiting their use in new, user-defined tasks with few or no annotations. While crowdsourcing can be a cheap source of labels, it often introduces errors that degrade the performance of models trained on such data. Another solution is transfer learning for low-resource sequence labelling, but current approaches rely heavily on similar high-resource datasets in different languages. In this paper, we propose a domain adaptation method using Bayesian sequence combination that exploits pre-trained models and unreliable crowdsourced data without requiring high-resource data in a different language. Our method boosts performance by learning the relationship between each labeller and the target task, and trains a sequence labeller on the target domain with little or no gold-standard data. We apply our approach to labelling diagnostic classes in medical and educational case studies, showing that the model achieves strong performance through zero-shot transfer learning and is more effective than alternative ensemble methods. Using NER and information extraction tasks, we show how our approach can train a model directly from crowdsourced labels, outperforming pipeline approaches that first aggregate the crowdsourced data, then train on the aggregated labels.
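As context for the pipeline baseline the abstract contrasts against, here is a minimal sketch of the "aggregate first, then train" step: token-level majority voting over several annotators' BIO tag sequences. The function name and the example data are illustrative, not from the paper, which instead models each labeller's reliability jointly with training.

```python
from collections import Counter

def aggregate_majority_vote(annotations):
    """Token-level majority vote over several annotators' BIO tag sequences.

    annotations: list of tag sequences (one per annotator), all the same length.
    Returns a single aggregated tag sequence, which a pipeline approach would
    then treat as ground truth when training a sequence labeller.
    """
    aggregated = []
    for token_tags in zip(*annotations):
        # Pick the most frequent tag for this token position.
        most_common_tag, _ = Counter(token_tags).most_common(1)[0]
        aggregated.append(most_common_tag)
    return aggregated

# Three hypothetical annotators labelling the same five-token sentence:
votes = [
    ["B-PER", "I-PER", "O", "O",     "B-LOC"],
    ["B-PER", "O",     "O", "O",     "B-LOC"],
    ["B-PER", "I-PER", "O", "B-LOC", "B-LOC"],
]
print(aggregate_majority_vote(votes))  # → ['B-PER', 'I-PER', 'O', 'O', 'B-LOC']
```

Note that majority voting weights every annotator equally and discards the labels once aggregated, which is exactly the information the Bayesian combination approach described above retains.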
Original language | English |
---|---|
Number of pages | 8 |
Publication status | Published - 12 Feb 2020 |
Event | AAAI Conference on Artificial Intelligence, New York, United States. Duration: 7 Feb 2020 → 12 Feb 2020. Conference number: 34. https://aaai.org/Conferences/AAAI-20/ |
Conference
Conference | AAAI Conference on Artificial Intelligence |
---|---|
Abbreviated title | AAAI-20 |
Country/Territory | United States |
City | New York |
Period | 7/02/20 → 12/02/20 |
Internet address | https://aaai.org/Conferences/AAAI-20/ |