Abstract
Cluster analysis seeks to identify homogeneous subgroups of cases in a population. This article provides an introduction to model-based clustering using finite mixture models and their extensions. Finite mixtures have been used successfully for clustering and classification for more than a hundred years, but they have become increasingly popular in the last decade owing to advances in computing power and the availability of software. Unlike traditional methods of cluster analysis, which rely on heuristic or distance-based procedures, finite mixture modeling provides a formal statistical framework on which to base the clustering procedure. Finite mixture models assume that the population is made up of several distinct subsets (or clusters), each following a different multivariate probability distribution. Model-based cluster analysis can handle a mix of nominal, ordinal, count, and continuous variables, any of which may contain missing values. We demonstrate how the problems of determining the number of clusters and choosing an appropriate clustering method reduce to a model selection problem, for which objective procedures exist. We also briefly discuss how model-based cluster analysis can be used to analyze complex and structured (e.g., longitudinal) datasets.

WIREs Comput Stat 2012. doi: 10.1002/wics.1204
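As a rough illustration of two ideas in the abstract, estimating a finite mixture by EM and treating the number of clusters as a model selection problem, here is a minimal sketch. It assumes univariate continuous data and normal components, f(y) = π_1 N(y | μ_1, σ_1²) + ... + π_K N(y | μ_K, σ_K²), and uses BIC as the selection criterion; the article itself covers far more general models (multivariate, mixed-mode, and incomplete data). The function names `fit_gmm_1d` and `select_k_by_bic` are hypothetical helpers, not part of any library.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of the univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_likelihood(data, weights, means, variances):
    """Log-likelihood of data under f(y) = sum_k w_k N(y | mean_k, var_k)."""
    return sum(
        log_sum_exp([math.log(w) + log_normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)])
        for x in data
    )


def fit_gmm_1d(data, k, max_iter=500, tol=1e-8):
    """Hypothetical helper: fit a k-component univariate Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood)."""
    n = len(data)
    xs = sorted(data)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    var_floor = 1e-3 * overall_var  # stops a component collapsing onto one point
    # Deterministic start: means at evenly spaced quantiles, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: posterior probability that each case belongs to each cluster.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = log_sum_exp(logs)
            resp.append([math.exp(lg - total) for lg in logs])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
        ll = mixture_log_likelihood(xs, weights, means, variances)
        if ll - prev_ll < tol:  # EM increases ll; stop when gains are negligible
            break
        prev_ll = ll
    return weights, means, variances, ll


def select_k_by_bic(data, k_max):
    """Return (best_k, {k: bic}) with BIC = -2 log L + p ln n (smaller is better)."""
    n = len(data)
    bics = {}
    for k in range(1, k_max + 1):
        _, _, _, ll = fit_gmm_1d(data, k)
        p = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
        bics[k] = -2.0 * ll + p * math.log(n)
    return min(bics, key=bics.get), bics


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated data: two well-separated normal clusters centred at -4 and +4.
    data = [rng.gauss(-4, 1) for _ in range(150)] + [rng.gauss(4, 1) for _ in range(150)]
    best_k, bics = select_k_by_bic(data, k_max=4)
    print("BIC by k:", {k: round(b, 1) for k, b in bics.items()})
    print("selected number of clusters:", best_k)
```

BIC is written here in the "smaller is better" convention; some software (for example the R package mclust) reports it with the opposite sign. The quantile-based starting values keep the sketch deterministic, but in practice EM is usually run from several starts because it can stop at a local maximum of the likelihood.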
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 341-358 |
Number of pages | 18 |
Journal | Wiley Interdisciplinary Reviews: Computational Statistics |
Volume | 4 |
Issue number | 4 |
DOIs | 10.1002/wics.1204 |
Publication status | Published - 2012 |
Keywords
- finite mixture modeling
- finite mixture densities
- model-based cluster analysis
- model selection
- variable selection
- mixed-mode data
- latent-class analysis
- EM optimization
- finite regression mixtures