Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma

Research output: Working paper (Working paper and Preprints)

Abstract

Supervised machine learning and data mining tools have become popular for the analysis of gene expression microarray data. They have the potential to uncover new therapeutic targets for diseases, to predict how patients will respond to specific treatments, and to uncover regulatory relationships among genes in normal and disease situations. Comparative experiments are needed to identify the advantages of the leading supervised learning algorithms for microarray data, as well as to give direction in methodological decisions. This paper compares support vector machines, Bayesian networks, decision trees, boosted decision trees, and voting (ensembles of decision stumps) on a new microarray data set for cancer with over 100 samples. The paper provides evidence for several important lessons for mining microarray data, including: (1) Bayes nets and ensembles perform at least as well as other approaches but arguably provide more direct insight; (2) the common practice of throwing out low or negative average differences, or those accompanied by an absent call, is a mistake; (3) looking for consistent differences in expression may be more important than large differences.
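Lesson (3) can be illustrated with a minimal sketch. It is not the paper's method or data: the two "genes" below are hypothetical toy values, and the helper names `best_stump_accuracy` and `mean_difference` are invented for this illustration. The sketch scores each gene in two ways: by the difference in class means (a "large difference" criterion) and by the training accuracy of the best single-threshold decision stump (a "consistent difference" criterion).

```python
from statistics import mean


def best_stump_accuracy(values, labels):
    """Training accuracy of the best single-threshold rule on one gene."""
    xs = sorted(values)
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    n = len(values)
    best = 0.0
    for t in thresholds:
        # Rule: predict class 1 when the expression value exceeds t.
        correct = sum((v > t) == (y == 1) for v, y in zip(values, labels))
        # The rule may also point the other way (predict 1 when v <= t).
        best = max(best, correct / n, 1 - correct / n)
    return best


def mean_difference(values, labels):
    """Difference between the class-1 mean and the class-0 mean."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return mean(pos) - mean(neg)


# Hypothetical toy data: six samples per class (0 = normal, 1 = disease).
labels = [0] * 6 + [1] * 6

# Small shift between classes, but present in every sample.
consistent_gene = [10, 11, 12, 10, 11, 12, 14, 15, 16, 14, 15, 16]

# Large mean shift driven by one extreme sample; classes otherwise overlap.
outlier_gene = [10, 50, 30, 20, 40, 25, 15, 35, 45, 22, 28, 500]

for name, gene in [("consistent", consistent_gene), ("outlier", outlier_gene)]:
    print(name, mean_difference(gene, labels), best_stump_accuracy(gene, labels))
```

On this toy data the outlier-driven gene has the larger mean difference, yet only the consistent gene is separated perfectly by a single threshold. This is one way a ranking by consistency can disagree with a ranking by magnitude.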
Original language: English
Publication status: Published - 1 Nov 2002
