Abstract
We propose Bayesian generative models for unsupervised learning with two types of data and
an assumed dependency of one type of data on the other. We consider two algorithmic approaches,
both based on a correspondence model in which latent variables are shared across datasets. These models
indicate the appropriate number of clusters in addition to indicating relevant features in both
types of data. We evaluate the model on artificially created data. We then apply the method to
a breast cancer dataset consisting of gene expression and microRNA array data derived from the
same patients. We assume partial dependence of gene expression on microRNA expression in
this study. Within each subtype, the method ranks genes that have statistically significant abnormal
expression and ranks the associated abnormally expressed microRNAs. We report a genetic signature
for the basal-like subtype of breast cancer found across a number of previous gene expression
array studies. Using the two algorithmic approaches, we find that this signature also arises from
clustering on the microRNA expression data and appears to derive from these data.
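
The kind of artificially created data mentioned in the abstract can be illustrated with a toy generator. This is only a sketch under stated assumptions, not the authors' model or their evaluation data: the function name `make_paired_data`, the Gaussian noise, the linear link between the two views, and all parameter values are hypothetical choices made for illustration. Each sample receives one shared cluster label, the first view (a stand-in for microRNA expression) is drawn around a cluster-specific mean, and the second view (a stand-in for gene expression) depends partly on the first view and partly on its own cluster-specific signal.

```python
import random


def make_paired_data(n_samples=60, n_clusters=3, n_mirna=5, n_genes=8,
                     noise=0.1, seed=0):
    """Draw toy two-view data whose views share one cluster label per sample.

    Hypothetical illustration only: Gaussian noise and a linear dependency
    of view 2 on view 1 are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # Cluster-specific mean profiles for each view.
    mirna_means = [[rng.gauss(0, 2) for _ in range(n_mirna)]
                   for _ in range(n_clusters)]
    gene_means = [[rng.gauss(0, 2) for _ in range(n_genes)]
                  for _ in range(n_clusters)]
    # Linear weights carrying the assumed dependency of view 2 on view 1.
    weights = [[rng.gauss(0, 1) for _ in range(n_genes)]
               for _ in range(n_mirna)]

    labels, mirna, genes = [], [], []
    for _ in range(n_samples):
        # Shared latent variable: one cluster assignment per sample.
        z = rng.randrange(n_clusters)
        # View 1: cluster mean plus noise.
        x = [m + rng.gauss(0, noise) for m in mirna_means[z]]
        # View 2: linear function of view 1, plus its own cluster mean and noise.
        y = [
            sum(x[i] * weights[i][j] for i in range(n_mirna))
            + gene_means[z][j]
            + rng.gauss(0, noise)
            for j in range(n_genes)
        ]
        labels.append(z)
        mirna.append(x)
        genes.append(y)
    return labels, mirna, genes


labels, mirna, genes = make_paired_data()
print(len(labels), len(mirna[0]), len(genes[0]))
```

Because the labels are known, data generated this way could be used to check whether a clustering method recovers the shared grouping from either view.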
Translated title of the contribution | Bayesian Unsupervised Learning with Multiple Data Types |
---|---|
Original language | English |
Article number | Article 27 |
Number of pages | 29 |
Journal | Statistical Applications in Genetics and Molecular Biology |
Volume | 8 |
Publication status | Published - Jun 2009 |
Bibliographical note
Author of Publication Reviewed: Phaedra Agius, Yiming Ying and Colin Campbell
Publisher: Berkeley University Press
Other identifier: Issue 1