Cost-based Sampling of Individual Instances

William Klement, Peter Flach, Nathalie Japkowicz, Stan Matwin

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

2 Citations (Scopus)

Abstract

In many practical domains, misclassification costs can differ greatly and may be represented by class ratios; however, most learning algorithms struggle with skewed class distributions. This difficulty is attributed to classifiers being designed to maximize accuracy. Researchers have proposed several techniques to address this problem, including under-sampling the majority class, employing a probabilistic algorithm, and adjusting the classification threshold. In this paper, we propose a general sampling approach that assigns weights to individual instances according to the cost function. This approach helps reveal the relationship between classification performance and class ratios, and allows the identification of an appropriate class distribution for which the learning method achieves reasonable performance on the data. Our results show that combining an ensemble of Naive Bayes classifiers with threshold selection and under-sampling techniques works well for imbalanced data.
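The abstract does not specify how instance weights are turned into a sample or how the threshold is chosen. The sketch below illustrates two of the named ingredients under stated assumptions. It uses cost-proportionate rejection sampling (keep each instance with probability equal to its cost divided by the maximum cost) as a stand-in for per-instance cost-based sampling, and it picks the score threshold that minimizes total misclassification cost on labelled scores. Both function names, `cost_proportionate_sample` and `select_threshold`, are hypothetical and are not taken from the paper.

```python
import random


def cost_proportionate_sample(instances, costs, rng):
    """Keep each instance with probability cost / max(costs).

    Hypothetical helper: cost-proportionate rejection sampling, used here as
    a stand-in for the paper's per-instance cost-based sampling. Instances
    with the maximum cost are always kept; zero-cost instances are dropped.
    """
    if len(instances) != len(costs):
        raise ValueError("instances and costs must have the same length")
    max_cost = max(costs)
    # rng.random() returns a float in [0, 1), so c == max_cost is always kept.
    return [x for x, c in zip(instances, costs) if rng.random() < c / max_cost]


def select_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing misclassification cost.

    Hypothetical helper. An instance is predicted positive when its score is
    >= threshold. Candidates are the observed scores plus +inf, which means
    "predict every instance negative". Ties keep the lowest threshold found.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        cost = 0.0
        for s, y in zip(scores, labels):
            predicted = 1 if s >= t else 0
            if predicted == 1 and y == 0:
                cost += cost_fp  # false positive
            elif predicted == 0 and y == 1:
                cost += cost_fn  # false negative
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Usage: minority positives cost 10 to misclassify, negatives cost 1.
rng = random.Random(0)
data = [(i, 1) for i in range(2)] + [(i, 0) for i in range(2, 10)]
costs = [10 if y == 1 else 1 for _, y in data]
sample = cost_proportionate_sample(data, costs, rng)  # keeps both positives

# Scores from some probabilistic classifier (e.g. Naive Bayes), hypothetical.
threshold, total = select_threshold(
    [0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0], cost_fp=1.0, cost_fn=5.0
)
```

Because the positives carry the maximum cost they always survive, while each negative survives with probability 0.1, so the sample's class ratio shifts toward the minority class in proportion to the costs. Repeating the sampling with different seeds and training one classifier per sample would give an ensemble in the spirit of the abstract; that step is not implemented here.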
Original language: English
Title of host publication: Canadian Conference on Artificial Intelligence
Pages: 86-97
Publication status: Published - 2009

Bibliographical note

ISBN: 9783642018176
Publisher: Springer
Name and Venue of Conference: Canadian Conference on Artificial Intelligence
Other identifier: 2001063
