Learning Decision Trees Using the Area Under the ROC Curve

C Ferri, PA Flach, J Hernández-Orallo

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)

629 Citations (Scopus)

Abstract

ROC analysis is increasingly being recognised as an important tool for evaluation and comparison of classifiers when the operating characteristics (i.e. class distribution and cost parameters) are not known at training time. Usually, each classifier is characterised by its estimated true and false positive rates and is represented by a single point in the ROC diagram. In this paper, we show how a single decision tree can represent a set of classifiers by choosing different labellings of its leaves, or equivalently, an ordering on the leaves. In this setting, rather than estimating the accuracy of a single tree, it makes more sense to use the area under the ROC curve (AUC) as a quality metric. We also propose a novel splitting criterion which chooses the split with the highest local AUC. To the best of our knowledge, this is the first probabilistic splitting criterion that is not based on weighted average impurity. We present experiments suggesting that the AUC splitting criterion leads to trees with equal or better AUC value, without sacrificing accuracy if a single labelling is chosen.
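The idea of ordering the leaves can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-class problem, describes each leaf only by hypothetical training counts (positives, negatives), orders leaves by decreasing proportion of positives, and measures the area under the resulting piecewise-linear ROC curve with the trapezoid rule. The names Leaf, roc_points and leaf_auc are invented for this illustration. Applying leaf_auc to the children of a candidate split is one plausible reading of "local AUC"; the exact criterion is defined in the paper itself.

```python
from typing import List, Tuple

# Hypothetical representation: (positives, negatives) training counts in one leaf.
Leaf = Tuple[int, int]


def roc_points(leaves: List[Leaf]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by labelling leaves positive one at a time."""
    leaves = [leaf for leaf in leaves if leaf[0] + leaf[1] > 0]  # ignore empty leaves
    total_pos = sum(p for p, _ in leaves)
    total_neg = sum(n for _, n in leaves)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Order leaves from the purest positive leaf to the purest negative leaf.
    ordered = sorted(leaves, key=lambda leaf: leaf[0] / (leaf[0] + leaf[1]), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for p, n in ordered:
        # Labelling one more leaf as positive adds its counts to TP and FP.
        tp += p
        fp += n
        points.append((fp / total_neg, tp / total_pos))
    return points


def leaf_auc(leaves: List[Leaf]) -> float:
    """Area under the ROC curve through the points above (trapezoid rule)."""
    pts = roc_points(leaves)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A tree with four leaves (made-up counts): one AUC value for the whole family
# of labellings, rather than one accuracy value for a single labelling.
tree_leaves = [(4, 0), (3, 1), (1, 3), (0, 4)]
tree_auc = leaf_auc(tree_leaves)

# Two candidate binary splits of the same node (made-up counts); under the
# assumption stated above, the split with the higher local AUC is preferred.
split_a = [(6, 2), (2, 6)]
split_b = [(5, 3), (3, 5)]
best_split = max([split_a, split_b], key=leaf_auc)
```

With these made-up counts, split_a separates the classes more cleanly than split_b, so it is the one selected. Each point returned by roc_points corresponds to one labelling of the leaves, which is why a single tree yields a whole curve rather than a single point in ROC space.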
Translated title of the contribution: Learning Decision Trees Using the Area Under the ROC Curve
Original language: English
Title of host publication: Proceedings of the 19th International Conference on Machine Learning
Editors: Claude Sammut, Achim Hoffmann
Publisher: Morgan Kaufmann
Pages: 139-146
Number of pages: 8
ISBN (Print): 1558608737
Publication status: Published - 2002

Bibliographical note

Other: http://portal.acm.org/citation.cfm?id=645531.655987&coll=GUIDE&dl=GUIDE&CFID=36744475&CFTOKEN=16437528#
