Abstract
Statistical assessment of the results of data mining
is increasingly recognised as a core task in the knowledge
discovery process. It is of key importance in practice, as
results that might seem interesting at first glance can often
be explained by well-known basic properties of the data. In
pattern mining, for instance, such trivial results can be so
overwhelming in number that filtering them out is a necessity
in order to identify the truly interesting patterns.
In this paper, we propose an approach for assessing results
on real-valued rectangular databases. More specifically, using
our analytical model we are able to statistically assess whether
or not a discovered structure may be the trivial result of the
row and column marginal distributions in the database.
Our main approach is to use the Maximum Entropy principle
to fit a background model to the data while respecting
its marginal distributions. To find these distributions, we
employ an MDL-based histogram estimator, and we fit these
in our model using efficient convex optimisation techniques.
Subsequently, our model can be used to calculate probabilities
directly, as well as to efficiently sample data with the purpose of
assessing results by means of empirical hypothesis testing. Notably,
our approach is efficient, parameter-free, and naturally
deals with missing values. As such, it represents a well-founded
alternative to swap randomisation.
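
As a rough illustration of the empirical hypothesis testing mentioned in the abstract, the Python sketch below compares a statistic computed on the observed data against the same statistic on datasets sampled from a background model. The sampler here is a deliberately simple stand-in assumed for illustration (it permutes each column independently, so it preserves column marginals only); it is not the Maximum Entropy model proposed in the paper. All function names and the toy data are hypothetical.

```python
import random


def empirical_p_value(observed_stat, null_stats):
    """One-sided empirical p-value with the usual +1 correction."""
    at_least_as_extreme = sum(1 for s in null_stats if s >= observed_stat)
    return (at_least_as_extreme + 1) / (len(null_stats) + 1)


def shuffle_columns(data, rng):
    """Stand-in background sampler (an assumption, NOT the paper's model).

    Permutes every column independently, which keeps each column's
    value distribution intact but ignores the row marginals.
    """
    n_rows = len(data)
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [[col[i] for col in columns] for i in range(n_rows)]


def max_row_mean(data):
    """Toy test statistic: the largest row average in the database."""
    return max(sum(row) / len(row) for row in data)


rng = random.Random(0)

# Toy real-valued rectangular database: 30 rows, 5 columns.
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
# Plant a structure: one row with consistently elevated values.
data[3] = [value + 3.0 for value in data[3]]

observed = max_row_mean(data)
null_stats = [max_row_mean(shuffle_columns(data, rng)) for _ in range(999)]
p_value = empirical_p_value(observed, null_stats)
print(f"observed statistic = {observed:.3f}, empirical p-value = {p_value:.3f}")
```

A small p-value indicates that the observed statistic is rarely matched by data drawn from the background model, so the structure is unlikely to be a trivial consequence of what that model preserves. Replacing `shuffle_columns` with a sampler for a richer background model changes which properties count as trivial.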
Translated title of the contribution  Maximum Entropy Modeling for Real Valued Databases 

Original language  English 
Publisher  University of Bristol 
Number of pages  10 
Publication status  Published  2011 
Projects
 1 Finished

FROM FREQUENT ITEMSETS TO INFORMATIVE PATTERNS
De Bie, T. E. P. (Principal Investigator)
1/10/09 → 1/04/13
Project: Research