Enhanced word decomposition by calibrating the decision threshold of probabilistic models and using a model ensemble

Sebastian Spiegler, Peter Flach

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

2 Citations (Scopus)

Abstract

This paper demonstrates that using ensemble methods and carefully calibrating the decision threshold can significantly improve the performance of machine learning methods for morphological word decomposition. We employ two algorithms which come from a family of generative probabilistic models. The models treat segment boundaries as hidden variables and include probabilities for letter transitions within segments. The advantage of this model family is that it can learn from small datasets and generalises easily to larger datasets. The first algorithm, PROMODES, which participated in the Morpho Challenge 2009 (an international competition for unsupervised morphological analysis), employs a lower-order model, whereas the second algorithm, PROMODES-H, is a novel development of the first using a higher-order model. We present the mathematical description of both algorithms, conduct experiments on the morphologically rich language Zulu, and compare characteristics of both algorithms based on the experimental results.
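The two ideas highlighted in the abstract, combining models into an ensemble and calibrating the decision threshold, can be illustrated independently of the PROMODES internals. The sketch below is hypothetical and does not reproduce the paper's models: it assumes each model outputs, for every candidate boundary position in a word, a probability that a morpheme boundary occurs there, averages two models' probabilities as a simple ensemble, and picks the threshold that maximises boundary F1 on held-out gold segmentations.

```python
def f1(pred, gold):
    """F1 score over boundary positions (sets of indices)."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def decode(probs, threshold):
    """Place a boundary wherever the model's probability exceeds the threshold."""
    return {i for i, p in enumerate(probs) if p > threshold}

def calibrate(prob_lists, gold_sets, candidates):
    """Return the candidate threshold with the highest mean F1 on held-out data."""
    return max(candidates,
               key=lambda t: sum(f1(decode(ps, t), g)
                                 for ps, g in zip(prob_lists, gold_sets)))

# Hypothetical per-position boundary probabilities from two models, two words.
model_a = [[0.9, 0.2, 0.6], [0.1, 0.8]]
model_b = [[0.7, 0.2, 0.5], [0.3, 0.9]]
# Simple ensemble: average the two models' probabilities at each position.
ensemble = [[(p + q) / 2 for p, q in zip(xs, ys)]
            for xs, ys in zip(model_a, model_b)]
gold = [{0, 2}, {1}]  # gold boundary positions for each word

threshold = calibrate(ensemble, gold, [t / 10 for t in range(1, 10)])
```

Calibrating on held-out data rather than using a fixed cut-off of 0.5 matters because the two error types (over- and under-segmentation) are rarely balanced; the chosen threshold trades precision against recall to maximise F1.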
Translated title of the contribution: Enhanced word decomposition by calibrating the decision threshold of probabilistic models and using a model ensemble
Original language: English
Title of host publication: 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
Publication status: Published - 2010

Bibliographical note

Other page information: 375-383
Conference Proceedings/Title of Journal: 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
Other identifier: 2001194

