Kernel-based data fusion for gene prioritization

TEP De Bie, L-C Tranchevent, LMM van Oeffelen, Y Moreau

Research output: Contribution to journal > Article (Academic Journal) > peer-review

111 Citations (Scopus)


Motivation: Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps: first, a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques; second, these genes are further investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet lab experiments, biologists must test the candidate genes starting with the most probable candidates. So far, biologists have relied on literature studies, extensive queries to multiple databases, and hunches about expected properties of the disease gene to determine such an ordering. Recently, we introduced the data mining tool ENDEAVOUR (Aerts et al., 2006), which performs this task automatically by relying on different genome-wide data sources, such as Gene Ontology, literature, microarray, sequence, and more.

Results: In this paper, we present a novel kernel method that operates in the same setting: based on a number of different views on a set of training genes, a prioritization of test genes is obtained. We furthermore provide a thorough learning-theoretical analysis of the method's guaranteed performance. Finally, we apply the method to the disease data sets on which ENDEAVOUR (Aerts et al., 2006) has been benchmarked, and report a considerable improvement in empirical performance.
Translated title of the contribution: Kernel-based data fusion for gene prioritization
Original language: English
Pages (from-to): i25 - i32
Number of pages: 8
Volume: 23 (13)
Publication status: Published - Jul 2007

Bibliographical note

Publisher: Oxford University Press


