TY - GEN
T1 - Morphosyntactic Tagging of Slovene Using Progol
AU - Cussens, James
AU - Džeroski, Sašo
AU - Erjavec, Tomaž
PY - 1999
Y1 - 1999
AB - We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. In the case of Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (consisting of 161,314 examples) produced by the MULTEXT-East project. P-Progol produced 1,148 rules taking 36 hours. Using simple grammatical background knowledge, e.g. looking for case disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These rules have proved effective at detecting and explaining incorrect MSD annotations in an independent test set, but have not so far produced a tagger comparable to other existing taggers in terms of accuracy.
U2 - 10.1007/3-540-48751-4_8
DO - 10.1007/3-540-48751-4_8
M3 - Conference Contribution (Conference Proceeding)
VL - 1634
T3 - LNAI
SP - 68
EP - 79
BT - Proceedings of the 9th International Workshop on Inductive Logic Programming (ILP-99)
PB - Springer
ER -