Exploiting the High Predictive Power of Multi-class Subgroups

Tarek Abudawood, Peter Flach, Masashi Sugiyama, Qiang Yang

Research output: Chapter in Book/Report/Conference proceeding, Conference Contribution (Conference Proceeding)

Abstract

Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. A number of multi-class subgroup discovery methods have previously been investigated, proposed and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it led to the construction of accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing a method that uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker, RankFree-MSD, without the need for an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, that removes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we show experimentally that its predictive performance is generally comparable to that of other decision tree and rule learners on 10 multi-class UCI data sets.
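The abstract describes RankFree-MSD only at a high level. The Python sketch below illustrates one plausible reading of "a tree-based probability estimator built from the subgroups' training information", under our own assumptions: each subgroup is a hypothetical boolean predicate over an example, level i of the tree splits on whether subgroup i covers the example, every node stores the training class counts that reach it, probabilities are Laplace-corrected frequencies, and an example whose coverage path was never seen in training backs off to the deepest ancestor node that received data. The paper's actual subgroup ordering, smoothing and handling of empty leaves may differ, and the names `SubgroupTreeEstimator`, `fit_counts` and `rank` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a subgroup is a boolean predicate over an example.
Subgroup = Callable[[dict], bool]
Path = Tuple[bool, ...]


class SubgroupTreeEstimator:
    """Sketch: level i of the tree splits on whether subgroup i covers the example."""

    def __init__(self, subgroups: Sequence[Subgroup], classes: Sequence[str]):
        self.subgroups = list(subgroups)
        self.classes = list(classes)
        # Class counts for every tree node that received training data,
        # keyed by the coverage-path prefix leading to that node.
        self.counts: Dict[Path, Counter] = {}

    def _path(self, x: dict) -> Path:
        return tuple(bool(s(x)) for s in self.subgroups)

    def fit_counts(self, X: Sequence[dict], y: Sequence[str]) -> "SubgroupTreeEstimator":
        # No search or optimisation: only tally training labels at every node on each path.
        for x, label in zip(X, y):
            path = self._path(x)
            for depth in range(len(path) + 1):
                self.counts.setdefault(path[:depth], Counter())[label] += 1
        return self

    def predict_proba(self, x: dict) -> Dict[str, float]:
        path = self._path(x)
        depth = len(path)
        # Back off to the deepest node on the path that received training examples.
        while depth > 0 and path[:depth] not in self.counts:
            depth -= 1
        node = self.counts.get(path[:depth], Counter())
        total = sum(node.values())
        k = len(self.classes)
        # Laplace-corrected class frequencies at that node.
        return {c: (node[c] + 1) / (total + k) for c in self.classes}


def rank(est: SubgroupTreeEstimator, X: Sequence[dict], cls: str) -> List[int]:
    """Indices of X ordered by decreasing estimated probability of class `cls`."""
    return sorted(range(len(X)), key=lambda i: est.predict_proba(X[i])[cls], reverse=True)


if __name__ == "__main__":
    sgs = [lambda x: x["a"] == 1, lambda x: x["b"] == 1]  # two toy subgroups
    est = SubgroupTreeEstimator(sgs, ["p", "q", "r"]).fit_counts(
        [{"a": 1, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 0}, {"a": 0, "b": 1}],
        ["p", "p", "q", "r"],
    )
    # Path (False, False) was never seen, so this backs off to the (False,) node.
    print(est.predict_proba({"a": 0, "b": 0}))  # {'p': 0.25, 'q': 0.25, 'r': 0.5}
```

In this sketch `fit_counts` performs no search or optimisation; it only tallies training examples along each coverage path, which mirrors the abstract's point that no additional learning phase is needed once the subgroups are given. Ranking for a class is then just a sort on the estimated probabilities.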
Original language: English
Title of host publication: JMLR Workshop and Conference Proceedings Volume 13: 2nd Asian Conference on Machine Learning (ACML'10)
Publication status: Published, 2010

Bibliographical note

Other page information: 177-192
Other identifier: 2001250
