Finding interesting itemsets using a probabilistic model for binary databases

Tijl De Bie

Research output: Working paper

Abstract

A good formalization of interestingness of a pattern should satisfy two criteria: it should conform well to intuition, and it should be computationally tractable to use. The focus has long been on the latter, with the development of frequent pattern mining methods. However, it is now recognized that more appropriate measures than frequency are required. In this paper we report results in this direction for itemset mining in binary databases. In particular, we introduce a probabilistic model that can be fitted efficiently to any binary database, and that has a compact and explicit representation. We then show how this model enables the formalization of an intuitive and tractable interestingness measure for itemsets, relying on concepts from information theory. Our probabilistic model is closely related to the uniform distribution over all databases that can be obtained by means of swap randomization [8]. However, in contrast to the swap randomization model, our model is explicit, which is key to its use for defining practical interestingness measures.
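
For readers unfamiliar with swap randomization, the following is a minimal Python sketch of the general technique the abstract refers to, not of the probabilistic model introduced in the paper. It assumes the binary database is a list of rows of 0/1 values; the function name `swap_randomize` and its parameters are hypothetical, chosen for illustration only. Each accepted swap moves two 1-entries so that every row sum and column sum is preserved, which is why the randomized databases are only available implicitly, by sampling.

```python
import random


def swap_randomize(db, num_attempts, seed=None):
    """Return a randomized copy of a 0/1 matrix with the same row and column sums.

    Hypothetical helper for illustration: `db` is a list of equal-length
    lists of 0/1 values, `num_attempts` is the number of swaps to attempt.
    """
    rng = random.Random(seed)
    data = [row[:] for row in db]  # work on a copy, leave the input untouched
    # Track the positions of all 1-entries so pairs can be sampled directly.
    ones = [(r, c) for r, row in enumerate(data)
            for c, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return data

    for _ in range(num_attempts):
        i, j = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[i], ones[j]
        # A swap is valid only if the two "opposite corners" are both 0.
        # This also rules out pairs sharing a row or a column.
        if data[r1][c2] == 0 and data[r2][c1] == 0:
            data[r1][c1] = 0
            data[r2][c2] = 0
            data[r1][c2] = 1
            data[r2][c1] = 1
            ones[i] = (r1, c2)
            ones[j] = (r2, c1)
    return data


if __name__ == "__main__":
    example = [[1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 1, 0, 0]]
    for row in swap_randomize(example, num_attempts=1000, seed=0):
        print(row)
```

An interestingness measure built on this procedure has to be estimated from many such samples, whereas the abstract states that the proposed model has an explicit representation.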
Original language: English
Publisher: University of Bristol
Number of pages: 9
Publication status: Published - 2009
