Abstract
For the Morpho Challenge 2009 we present an algorithm for unsupervised morphological
analysis called Promodes, which is based on a probabilistic generative model.
The model considers segment boundaries as hidden variables and includes probabilities
for letter transitions within segments. Promodes concentrates purely on segmenting
words; its labeling method is simplistic, in that the morpheme labels are the segments
themselves. The algorithm can be employed with different degrees of supervision. For
the challenge, however, we demonstrate three unsupervised versions. The first one applies
a simple segmenting algorithm, based on letter succession probabilities in substrings,
to a small subset of the data and then estimates the model parameters using
a maximum likelihood approach. The second version estimates its parameters through
expectation maximization. Independently of the parameter estimation, we utilized
each model to decompose words from the original language data. A third method is
a committee of unsupervised learners in which each learner corresponds to the second
version but uses a different initialization of the expectation maximization. The
solution is then found by majority vote, which decides whether to segment at a word
position or not. In this paper, we describe the details of the probabilistic model, how
parameters are estimated and how the most likely decomposition of an input word is
found. We have tested Promodes on Arabic (vowelized and non-vowelized), English,
Finnish, German and Turkish. All three methods achieved competitive results in the
Morpho Challenge 2009.
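
The committee step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `majority_vote` and `apply_boundaries`, the boolean boundary encoding (one entry per gap between adjacent letters), and the rule that ties count as "no boundary" are all assumptions made for illustration, and the votes in the example are invented rather than produced by trained learners.

```python
from typing import List


def majority_vote(boundary_votes: List[List[bool]]) -> List[bool]:
    """Combine per-learner boundary decisions for one word.

    boundary_votes[k][i] is True if hypothetical learner k places a
    boundary between letter i and letter i + 1 of the word.
    Assumption: a boundary needs a strict majority, so ties yield no cut.
    """
    n_learners = len(boundary_votes)
    n_positions = len(boundary_votes[0])
    return [
        2 * sum(votes[i] for votes in boundary_votes) > n_learners
        for i in range(n_positions)
    ]


def apply_boundaries(word: str, boundaries: List[bool]) -> List[str]:
    """Split a word at every gap whose boundary flag is True."""
    segments, start = [], 0
    for i, cut in enumerate(boundaries):
        if cut:
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Invented votes of three learners for the word "walked" (5 gaps).
votes = [
    [False, False, False, True, False],  # walk|ed
    [False, False, False, True, False],  # walk|ed
    [False, True, False, False, False],  # wa|lked
]
combined = majority_vote(votes)
segments = apply_boundaries("walked", combined)
```

Voting on each position separately, rather than on whole segmentations, means the committee can output a decomposition that no single learner proposed.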
Translated title of the contribution | Promodes: A probabilistic generative model for word decompositions |
---|---|
Original language | English |
Title of host publication | Working Notes for the CLEF 2009 Workshop, Corfu, Greece |
Publication status | Published - 2009 |
Bibliographical note
Other page information: Conference Proceedings/Title of Journal: Working Notes for the CLEF 2009 Workshop, Corfu, Greece
Other identifier: 2001115