Colouring and breaking sticks: random distributions and heterogeneous clustering

Research output: Chapter in Book/Report/Conference proceeding › Chapter in a book


We begin by reviewing some probabilistic results about the Dirichlet Process and its close relatives, focussing on their implications for statistical modelling and analysis. We then introduce a class of simple mixture models in which clusters are of different 'colours', with statistical characteristics that are constant within colours, but different between colours. Thus cluster identities are exchangeable only within colours. The basic form of our model is a variant on the familiar Dirichlet process, and we find that much of the standard modelling and computational machinery associated with the Dirichlet process may be readily adapted to our generalisation. The methodology is illustrated with an application to the partially-parametric clustering of gene expression profiles.
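For readers unfamiliar with the stick-breaking view of the Dirichlet process mentioned above, the following is a minimal sketch of the standard (uncoloured) construction only; it is not the chapter's coloured model. The function name `stick_breaking_weights`, the truncation level `num_sticks`, and the concentration value are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    alpha      -- concentration parameter (assumed value, for illustration)
    num_sticks -- truncation level; the true construction is infinite
    rng        -- a random.Random instance, so results are reproducible
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) fraction of what remains.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=20, rng=rng)
```

The returned weights plus the leftover stick length sum to one; smaller values of `alpha` tend to concentrate mass on the first few pieces, which corresponds to fewer, larger clusters.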
Original language: English
Title of host publication: Probability and Mathematical Genetics: Papers in Honour of Sir John Kingman
Editors: N H Bingham, C M Goldie
Number of pages: 26
Publication status: Published - Jul 2010


  • Bayesian nonparametrics, gene expression profiles, hierarchical models, loss functions, MCMC samplers, optimal clustering, partition models, Polya urn, stick breaking.

