Unsupervised Word Decomposition with the Promodes Algorithm

Sebastian Spiegler, Bruno Golenia, Peter Flach

Research output: Chapter in Book/Report/Conference proceeding › Chapter in a book

5 Citations (Scopus)


We present PROMODES, an algorithm for unsupervised word decomposition, which is based on a probabilistic generative model. The model considers segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For the Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first version uses a simple segmentation algorithm on a subset of the data and applies maximum likelihood estimates for model parameters when decomposing words of the original language data. The second version estimates its parameters through expectation maximization (EM). The third version is a committee of unsupervised learners, where the learners correspond to different EM initializations. The solution is found by majority vote, which decides whether or not to segment at each word position. In this paper, we describe the probabilistic model, parameter estimation and how the most likely decomposition of an input word is found. We have tested PROMODES on non-vowelized and vowelized Arabic as well as on English, Finnish, German and Turkish. All three methods achieved competitive results.
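The committee step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of each learner's output as a set of boundary positions, and the rule that ties count as "do not segment" are all assumptions made for illustration. The learners themselves (EM-trained models with different initializations) are not reproduced; their outputs are hard-coded as hypothetical values.

```python
from collections import Counter


def majority_vote_boundaries(word, committee_boundaries):
    """Return the boundary positions chosen by a strict majority of learners.

    Position i means "split between word[i-1] and word[i]", so valid
    positions are 1 .. len(word) - 1. Ties are resolved as "no split"
    (an assumption of this sketch, not taken from the paper).
    """
    n_learners = len(committee_boundaries)
    votes = Counter()
    for boundaries in committee_boundaries:
        for pos in set(boundaries):
            if 1 <= pos < len(word):
                votes[pos] += 1
    return sorted(pos for pos, count in votes.items() if 2 * count > n_learners)


def segment(word, boundaries):
    """Cut the word at the given boundary positions."""
    cuts = [0] + sorted(boundaries) + [len(word)]
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical outputs of three learners for the word "unhelpful".
word = "unhelpful"
committee = [{2, 6}, {2}, {2, 6, 7}]

chosen = majority_vote_boundaries(word, committee)
print(segment(word, chosen))  # ['un', 'help', 'ful']
```

Voting per position rather than per whole segmentation means the committee can output a decomposition that no single learner proposed, which is one plausible reading of "decides whether to segment at a word position or not".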
Original language: English
Title of host publication: Multilingual Information Access Evaluation, Lecture Notes in Computer Science
Publisher: Springer Verlag
Publication status: Published - 2010

Bibliographical note

Other identifier: 2001195

