The task of finding informative recurring patterns in data has been central to data mining research since the introduction of the task of frequent itemset mining in [1,2,14]. In these seminal papers, the informativeness of a recurring itemset in a binary database was formalized by its support in the database. However, it is now widely recognized that an itemset's support is not the best measure of its informativeness. Furthermore, recent work has highlighted that the support of an itemset is highly susceptible to noise, such that it may be more appropriate to search for itemsets that recur only approximately. In this paper, we present a new measure of informativeness for noisy itemsets in binary databases within the formalism of tiles. We demonstrate the benefits of our new measure by means of experiments on artificial and real-life data, allowing for objective and subjective evaluation.
|Translated title of the contribution|An information-theoretic approach to finding informative noisy tiles in binary databases|
|Title of host publication|The 2010 SIAM International Conference on Data Mining (SDM)|
|Number of pages|12|
|Publication status|Published - Apr 2010|