Abstract
Statistical assessment of the results of data mining
is increasingly recognised as a core task in the knowledge
discovery process. It is of key importance in practice, as
results that might seem interesting at first glance can often
be explained by well-known basic properties of the data. In
pattern mining, for instance, such trivial results can be so
overwhelming in number that filtering them out is a necessity
in order to identify the truly interesting patterns.
In this paper, we propose an approach for assessing results
on real-valued rectangular databases. More specifically, using
our analytical model we are able to statistically assess whether
or not a discovered structure may be the trivial result of the
row and column marginal distributions in the database.
Our main approach is to use the Maximum Entropy principle
to fit a background model to the data while respecting
its marginal distributions. To find these distributions, we
employ an MDL-based histogram estimator, and we fit these
in our model using efficient convex optimisation techniques.
Subsequently, our model can be used to calculate probabilities
directly, as well as to efficiently sample data with the purpose of
assessing results by means of empirical hypothesis testing. Notably,
our approach is efficient, parameter-free, and naturally
deals with missing values. As such, it represents a well-founded
alternative to swap randomisation.
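
The empirical hypothesis testing step mentioned above can be illustrated with a minimal sketch. It is not the paper's method: the Maximum Entropy background model is replaced by a much cruder stand-in sampler that permutes each column independently (preserving column marginals only), and the names `empirical_p_value`, `shuffle_columns`, and `max_row_sum`, as well as the toy data and test statistic, are assumptions made purely for illustration.

```python
import random


def empirical_p_value(observed_stat, sampled_stats):
    """Empirical p-value with the usual +1 correction.

    Counts how many statistics computed on null samples are at least
    as large as the statistic observed on the real data.
    """
    extreme = sum(1 for s in sampled_stats if s >= observed_stat)
    return (1 + extreme) / (1 + len(sampled_stats))


def shuffle_columns(data, rng):
    """Stand-in null sampler (NOT the MaxEnt model from the abstract).

    Permutes every column independently, so each column keeps exactly
    its own empirical distribution while row structure is destroyed.
    """
    cols = [list(col) for col in zip(*data)]
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]


def max_row_sum(data):
    """Hypothetical test statistic: the largest row sum in the table."""
    return max(sum(row) for row in data)


rng = random.Random(0)

# Toy real-valued rectangular database: 20 rows, 5 columns.
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
# Plant a structure: one row with consistently elevated values.
for j in range(5):
    data[0][j] += 3.0

observed = max_row_sum(data)
null_stats = [max_row_sum(shuffle_columns(data, rng)) for _ in range(199)]
p = empirical_p_value(observed, null_stats)
print(f"empirical p-value: {p:.3f}")
```

A small p-value here would suggest that the planted row is not explained by the column marginals alone. In the setting of the abstract, the samples would instead be drawn from the fitted Maximum Entropy model, which respects both row and column marginal distributions.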
Translated title of the contribution | Maximum Entropy Modeling for Real Valued Databases |
---|---|
Original language | English |
Publisher | University of Bristol |
Number of pages | 10 |
Publication status | Published - 2011 |
Projects
- 1 Finished
- FROM FREQUENT ITEMSETS TO INFORMATIVE PATTERNS
De Bie, T. E. P. (Principal Investigator)
1/10/09 → 1/04/13
Project: Research