ROC 'n' Rule Learning -- Towards a Better Understanding of Covering Algorithms

J Fürnkranz, PA Flach

Research output: Contribution to journalArticle (Academic Journal)peer-review

179 Citations (Scopus)

Abstract

This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in coverage space, a variant of ROC space. Our results show that most commonly used metrics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2's significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil's MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
Translated title of the contributionROC 'n' Rule Learning -- Towards a Better Understanding of Covering Algorithms
Original languageEnglish
Pages (from-to)39 - 77
Number of pages39
JournalMachine Learning
Volume58 (1)
DOIs
Publication statusPublished - Jan 2005

Bibliographical note

Publisher: Springer
Other: http://www.cs.bris.ac.uk/Publications/pub_info.jsp?id=2000264

Fingerprint

Dive into the research topics of 'ROC 'n' Rule Learning -- Towards a Better Understanding of Covering Algorithms'. Together they form a unique fingerprint.

Cite this