An information-theoretic approach to finding informative noisy tiles in binary databases

KN Kontonasios, Tijl De Bie

Research output: Chapter in Book/Report/Conference proceeding; Conference Contribution (Conference Proceeding)

39 Citations (Scopus)

Abstract

The task of finding informative recurring patterns in data has been central to data mining research since the introduction of the task of frequent itemset mining in [1,2,14]. In these seminal papers, the informativeness of a recurring itemset in a binary database was formalized by its support in the database. However, it is now widely recognized that an itemset's support is not the best measure of its informativeness. Furthermore, recent work has highlighted that the support of an itemset is highly susceptible to noise, such that it may be more appropriate to search for itemsets that recur only approximately. In this paper, we present a new measure of informativeness for noisy itemsets in binary databases within the formalism of tiles [6]. We demonstrate the benefits of our new measure by means of experiments on artificial and real-life data, allowing for objective and subjective evaluation.
Translated title of the contribution: An information-theoretic approach to finding informative noisy tiles in binary databases
Original language: English
Title of host publication: The 2010 SIAM International Conference on Data Mining (SDM)
Number of pages: 12
Publication status: Published - Apr 2010

Bibliographical note

Conference Organiser: SIAM
