Promodes: A probabilistic generative model for word decompositions

Sebastian Spiegler, Bruno S G Golenia, Peter A Flach

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)

4 Citations (Scopus)

Abstract

For the Morpho Challenge 2009 we present an algorithm for unsupervised morphological analysis called Promodes, which is based on a probabilistic generative model. The model considers segment boundaries as hidden variables and includes probabilities for letter transitions within segments. Promodes concentrates purely on segmenting words; its labeling method is simplistic, with the segments themselves serving as morpheme labels. The algorithm can be employed with different degrees of supervision. For the challenge, however, we demonstrate three unsupervised versions. The first uses a simple segmenting algorithm, based on letter succession probabilities in substrings, on a small subset of the data and then estimates the model parameters using a maximum likelihood approach. The second version estimates its parameters through expectation maximization. Independently of the parameter estimation, we utilized each model to decompose words from the original language data. The third method is a committee of unsupervised learners in which each learner corresponds to the second version but with a different initialization of the expectation maximization. The solution is then found by majority vote, which decides whether or not to segment at each word position. In this paper, we describe the details of the probabilistic model, how parameters are estimated and how the most likely decomposition of an input word is found. We have tested Promodes on Arabic (vowelized and non-vowelized), English, Finnish, German and Turkish. All three methods achieved competitive results in the Morpho Challenge 2009.
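
The committee step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each learner outputs one boolean per position between two adjacent letters (True meaning "place a boundary here"), and that ties are resolved as "no boundary"; the abstract specifies neither. The function names and the example votes are hypothetical.

```python
from typing import List, Sequence


def majority_vote(votes: Sequence[Sequence[bool]]) -> List[bool]:
    """Combine per-position boundary decisions from several learners.

    votes[k][i] is True if learner k places a boundary after letter i.
    Assumption: a strict majority is required, so ties mean "no boundary".
    """
    if not votes:
        raise ValueError("need at least one learner")
    n_positions = len(votes[0])
    if any(len(v) != n_positions for v in votes):
        raise ValueError("all learners must vote on the same positions")
    n_learners = len(votes)
    return [
        2 * sum(1 for v in votes if v[i]) > n_learners
        for i in range(n_positions)
    ]


def segment(word: str, boundaries: Sequence[bool]) -> List[str]:
    """Split a word at every position whose boundary flag is True."""
    if len(boundaries) != len(word) - 1:
        raise ValueError("expected one flag per gap between letters")
    segments, start = [], 0
    for i, cut in enumerate(boundaries):
        if cut:
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Hypothetical votes of three learners on the word "unlocked"
# (seven gaps: after u, n, l, o, c, k, e).
example_votes = [
    [False, True, False, False, False, True, False],   # un|lock|ed
    [False, True, False, False, False, False, False],  # un|locked
    [False, False, True, False, False, True, False],   # unl|ock|ed
]
committee_segments = segment("unlocked", majority_vote(example_votes))
```

In this made-up example two of the three learners agree on the boundaries after "un" and after "unlock", so the committee keeps those two and discards the single vote for a boundary after "unl".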
Translated title of the contribution: Promodes: A probabilistic generative model for word decompositions
Original language: English
Title of host publication: Working Notes for the CLEF 2009 Workshop, Corfu, Greece
Publication status: Published - 2009

Bibliographical note

Other identifier: 2001115
