Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma

David Page, Fenghuan Zhan, James Cussens, Michael Waddell, Johanna Hardin, Bart Barlogie, John Shaughnessy

Research output: Working paper


Supervised machine learning and data mining tools have become popular for the analysis of gene expression microarray data. They have the potential to uncover new therapeutic targets for diseases, to predict how patients will respond to specific treatments, and to uncover regulatory relationships among genes in normal and disease situations. Comparative experiments are needed to identify the advantages of the leading supervised learning algorithms for microarray data, as well as to give direction in methodological decisions. This paper compares support vector machines, Bayesian networks, decision trees, boosted decision trees, and voting (ensembles of decision stumps) on a new microarray data set for cancer with over 100 samples. The paper provides evidence for several important lessons for mining microarray data, including: (1) Bayes nets and ensembles perform at least as well as other approaches but arguably provide more direct insight; (2) the common practice of throwing out low or negative average differences, or those accompanied by an absent call, is a mistake; (3) looking for consistent differences in expression may be more important than large differences.
Original language: English
Publisher: University of Wisconsin–Madison
Publication status: Published - 1 Nov 2002

Bibliographical note

Computer Science Department, University of Wisconsin, Technical Report #1453

