In this paper, we are concerned with the problem of modeling the prior information of a data miner about the data, with the purpose of quantifying the subjective interestingness of patterns. Recent results have achieved this for the specific case of prior expectations on the row and column marginals, based on the Maximum Entropy principle [2, 12]. In the current paper, we extend these ideas to make them applicable to more general prior information, such as knowledge of frequencies of itemsets, a cluster structure in the data, or the presence of dense areas in the database. As in [2, 12], we show how information theory can be used to quantify subjective interestingness against this model as a background. Our method presents an efficient, flexible, and rigorous alternative to the randomization approach presented in . This randomization method was developed for very similar purposes, but suffers from convergence issues and computational limitations. Furthermore, randomization techniques can only be used for empirical hypothesis testing as a way to quantify interestingness, severely limiting their applicability. We demonstrate our method by searching for interesting patterns in real-life data with respect to various realistic types of prior information, and we note that, like the approach from , our work can be used for iterative data mining.
|Translated title of the contribution||Formalizing complex prior information to quantify subjective interestingness of frequent pattern sets|
|Publisher||University of Bristol|
|Number of pages||16|
|Publication status||Published - 2011|