Model-based cluster analysis

Daniel Stahl, Hannah Sallis

Research output: Contribution to journal › Article (Academic Journal) › peer-review

46 Citations (Scopus)


Cluster analysis seeks to identify homogeneous subgroups of cases in a population. This article provides an introduction to model-based clustering using finite mixture models and extensions. Finite mixtures have been successfully used for more than a hundred years for clustering and classification, but have become increasingly popular in the last decade due to recent advances in computer technology and software availability. Unlike traditional methods of cluster analysis, which are based on heuristic or distance-based procedures, finite mixture modeling provides a formal statistical framework on which to base the clustering procedure. Finite mixture models assume that the population is made up of several distinct subsets (or clusters), each following a different multivariate probability distribution. Model-based cluster analysis can deal with a mix of nominal, ordinal, count, or continuous variables, any of which may contain missing values. We will demonstrate how the problems of determining the number of clusters and choosing an appropriate clustering method reduce to a model selection problem, for which objective procedures exist. We briefly discuss how model-based cluster analysis can be used to analyze complex and structured (e.g., longitudinal) datasets.

WIREs Comput Stat 2012, doi: 10.1002/wics.1204
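The idea that choosing the number of clusters reduces to model selection can be illustrated with a minimal sketch. It is not taken from the article: it assumes univariate Gaussian components, fits them with a basic EM loop, and compares candidate models with the Bayesian information criterion (BIC), which is one common choice of criterion among several. The function names `normal_pdf`, `fit_mixture`, and `bic`, the simulated data, and the variance floor `min_var` are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, k, n_iter=100, min_var=1e-3):
    """Fit a k-component univariate Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, variances, log_likelihood).
    """
    xs = sorted(data)
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    # Deterministic start: means at evenly spaced quantiles, common variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    def component_densities(x):
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(n_iter):
        # E-step: posterior probability that each case belongs to each cluster.
        resp = []
        for x in xs:
            dens = component_densities(x)
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from those posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            # Assumed floor on the variance to avoid degenerate one-point clusters.
            variances[j] = max(var_j, min_var)

    log_lik = sum(math.log(max(sum(component_densities(x)), 1e-300)) for x in xs)
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC for a k-component univariate Gaussian mixture (lower is better)."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -2.0 * log_lik + n_params * math.log(n)


# Simulated data (an assumption for illustration): two well-separated subgroups.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(150)] + \
       [random.gauss(6.0, 1.0) for _ in range(150)]

# Fit models with 1 to 4 clusters and keep the one with the lowest BIC.
scores = {k: bic(fit_mixture(data, k)[3], k, len(data)) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print("BIC by number of clusters:", scores)
print("selected number of clusters:", best_k)
```

Each candidate number of clusters is treated as a separate statistical model, so the comparison rests on a penalized likelihood rather than on a heuristic stopping rule. In practice a dedicated mixture-modeling package would replace this hand-written EM loop, particularly for multivariate or mixed-mode data.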
Original language: English
Pages (from-to): 341-358
Number of pages: 18
Journal: Wiley Interdisciplinary Reviews: Computational Statistics
Issue number: 4
Publication status: Published - 2012


  • finite mixture modeling
  • finite mixture densities
  • model-based cluster analysis
  • model selection
  • variable selection
  • mixed-mode data
  • latent-class analysis
  • EM optimization
  • finite regression mixtures

