Abstract
This paper investigates how to adapt standard classification rule learning approaches to subgroup
discovery. The goal of subgroup discovery is to find rules describing subsets of the population
that are sufficiently large and statistically unusual. The paper presents a subgroup discovery algorithm,
CN2-SD, developed by modifying parts of the CN2 classification rule learner: its covering
algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. Experimental
evaluation of CN2-SD on 23 UCI data sets shows a substantial reduction in the number
of induced rules, increased rule coverage and rule significance, as well as slight improvements
in terms of the area under the ROC curve, when compared with the CN2 algorithm. Application of
CN2-SD to a large traffic accident data set confirms these findings.
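Two of the modified components, the search heuristic and the covering algorithm, can be illustrated with a short sketch. It assumes the heuristic is weighted relative accuracy, WRAcc(Class ← Cond) = p(Cond) · (p(Class | Cond) − p(Class)), computed from weighted example counts. It also assumes a multiplicative weighted-covering scheme in which a positive example covered i times gets weight γ^i instead of being removed. The function names `wracc` and `multiplicative_reweight` and the default `gamma=0.5` are hypothetical and chosen for illustration. This is not the authors' implementation.

```python
from typing import List, Sequence


def wracc(covered: Sequence[bool], positive: Sequence[bool],
          weights: Sequence[float]) -> float:
    """Weighted relative accuracy of a rule (assumed CN2-SD heuristic):
    p(Cond) * (p(Class | Cond) - p(Class)), estimated from weighted counts."""
    total = sum(weights)
    covered_w = sum(w for w, c in zip(weights, covered) if c)
    if total == 0 or covered_w == 0:
        return 0.0  # a rule covering nothing has zero coverage and zero unusualness
    pos_w = sum(w for w, y in zip(weights, positive) if y)
    covered_pos_w = sum(w for w, c, y in zip(weights, covered, positive) if c and y)
    # coverage term times the gain in class probability over the default rate
    return (covered_w / total) * (covered_pos_w / covered_w - pos_w / total)


def multiplicative_reweight(covered: Sequence[bool], positive: Sequence[bool],
                            times_covered: List[int], gamma: float = 0.5) -> List[float]:
    """Hypothetical weighted-covering step: increment the cover count of each
    covered positive example (in place) and return weights gamma ** count.
    Uncovered and negative examples keep count 0, i.e. weight 1.0."""
    for j, (c, y) in enumerate(zip(covered, positive)):
        if c and y:
            times_covered[j] += 1
    return [gamma ** i for i in times_covered]


if __name__ == "__main__":
    positive = [True, True, False, False]   # target-class membership
    rule = [True, True, True, False]        # which examples the rule covers
    counts = [0] * len(positive)
    print(round(wracc(rule, positive, [1.0] * 4), 4))  # 0.125
    weights = multiplicative_reweight(rule, positive, counts)
    print(weights)                                     # [0.5, 0.5, 1.0, 1.0]
```

After reweighting, the same rule scores lower (about 0.111 instead of 0.125). Covered positives are down-weighted rather than deleted, which nudges later iterations toward rules that cover different parts of the population.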
| Field | Value |
|---|---|
| Translated title of the contribution | Subgroup discovery with CN2-SD |
| Original language | English |
| Pages (from-to) | 153–188 |
| Number of pages | 36 |
| Journal | Journal of Machine Learning Research |
| Volume | 5 |
| Publication status | Published - Feb 2004 |
Bibliographical note
Publisher: Microtome Publishing
Other: http://www.cs.bris.ac.uk/Publications/pub_info.jsp?id=2000064