Abstract
In this paper, we are concerned with the problem of modeling a data miner's prior information about the data, with the purpose of quantifying the subjective interestingness of patterns. Recent results have achieved this for the specific case of prior expectations on the row and column marginals, based on the Maximum Entropy principle [2, 12]. In the current paper, we extend these ideas to make them applicable to more general prior information, such as knowledge of the frequencies of itemsets, a cluster structure in the data, or the presence of dense areas in the database. As in [2, 12], we show how information theory can be used to quantify subjective interestingness using this model as a background. Our method provides an efficient, flexible, and rigorous alternative to the randomization approach presented in [6]. That randomization method was developed for very similar purposes, but suffers from convergence issues and computational limitations. Furthermore, randomization techniques can only quantify interestingness through empirical hypothesis testing, which severely limits their applicability. We demonstrate our method by searching for interesting patterns in real-life data with respect to various realistic types of prior information, and we note that, like the approach from [6], our work can be used for iterative data mining.
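The abstract describes modeling a data miner's prior expectations as a Maximum Entropy distribution over binary databases and scoring patterns by their information content under that model. The Python sketch below illustrates that general idea under stated assumptions; it is not the paper's algorithm. It assumes every piece of prior information can be written as an expected number of ones over a set of cells (row sums, column sums, or a hypothetical dense block), fits the resulting MaxEnt model by cyclic coordinate descent on its convex dual, and scores a tile (a set of rows and columns that is all ones in the data) by its self-information in bits. The names `fit_maxent`, `tile_information`, the toy matrix `D`, and the block prior are all hypothetical and chosen for illustration only.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_maxent(n_rows, n_cols, constraints, sweeps=5000, tol=1e-8):
    """Fit a MaxEnt model of an n_rows x n_cols binary matrix (sketch).

    constraints: list of (cells, target) pairs, where cells is a list of
    (i, j) positions and target is the expected number of ones in them.
    Each target must lie strictly between 0 and len(cells), and the
    constraints must be jointly satisfiable by probabilities in (0, 1).
    Because every constraint is linear in the cells, the MaxEnt model
    factorizes into independent Bernoulli cells with
    p_ij = sigmoid(sum of multipliers of the constraints covering (i, j)).
    """
    # theta[i][j] accumulates the multipliers of all constraints covering (i, j).
    theta = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(sweeps):
        worst_gap = 0.0
        for cells, target in constraints:
            expected = sum(sigmoid(theta[i][j]) for i, j in cells)
            gap = target - expected
            worst_gap = max(worst_gap, abs(gap))
            # Gradient step on this constraint's multiplier in the convex
            # dual; 4 / len(cells) is the inverse of an upper bound on the
            # dual's curvature along that multiplier.
            step = 4.0 * gap / len(cells)
            for i, j in cells:
                theta[i][j] += step
        if worst_gap < tol:
            break
    return [[sigmoid(t) for t in row] for row in theta]


def tile_information(P, rows, cols):
    # Self-information, in bits, of the event that every cell in the tile
    # rows x cols is 1 under the independent-Bernoulli model P.
    return -sum(math.log2(P[i][j]) for i in rows for j in cols)


# Hypothetical toy database: 4 transactions (rows) over 4 items (columns).
D = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
n, m = len(D), len(D[0])
row_cons = [([(i, j) for j in range(m)], sum(D[i])) for i in range(n)]
col_cons = [([(i, j) for i in range(n)], sum(D[i][j] for i in range(n)))
            for j in range(m)]
# Hypothetical extra prior: about 3 ones expected in the top-left 2x2 block.
block = [(i, j) for i in range(2) for j in range(2)]
P = fit_maxent(n, m, row_cons + col_cons + [(block, 3)])
print(round(tile_information(P, [1, 2], [1, 2]), 3))
```

Coordinate descent with a fixed step of 4/|S| is used only for simplicity; Newton-type updates would typically converge in fewer sweeps. The sketch scores a tile by self-information alone and does not normalize by any description length of the pattern.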
Field | Value |
---|---|
Translated title of the contribution | Formalizing complex prior information to quantify subjective interestingness of frequent pattern sets |
Original language | English |
Publisher | University of Bristol |
Number of pages | 16 |
Publication status | Published - 2011 |
Projects
FROM FREQUENT ITEMSETS TO INFORMATIVE PATTERNS
De Bie, T. E. P. (Principal Investigator)
1/10/09 → 1/04/13
Project: Research