Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction

Raul Santos-Rodriguez, Tijl De Bie, Matt McVicar

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)

Separating the singing voice from a polyphonic audio mixture is a challenging but important task, with a wide range of applications across the music industry and music informatics research. Various methods have been devised over the years, ranging from deep learning approaches to dedicated ad hoc solutions. In this paper, we present a novel machine learning method for the task that uses a Conditional Random Field (CRF) for structured output prediction. We exploit the diversity of previously proposed approaches by using their predictions as input features to our method, thus effectively developing an ensemble method. Our empirical results demonstrate the potential of integrating the predictions of different previously proposed methods into one ensemble method, and additionally show that more complex CRF models generally lead to superior performance.
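The abstract does not specify the structure of the CRF, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a binary label per time frame (1 = vocal, 0 = accompaniment) for a single frequency bin, a linear chain over time, and hand-set rather than learned weights. The soft-mask predictions of several base separators serve as the ensemble input features, and Viterbi decoding finds the highest-scoring label sequence. The function name `viterbi_mask`, the feature layout, and all numbers are hypothetical; parameter training (for example, maximising the conditional log-likelihood) is omitted.

```python
import numpy as np


def viterbi_mask(base_preds, w_unary, b_unary, w_trans):
    """MAP decoding of a hypothetical binary linear-chain CRF over time frames.

    base_preds: (T, K) array; column k holds base separator k's soft vocal
        mask value in [0, 1] for one frequency bin (ensemble features).
    w_unary: (2, K) weights mapping the ensemble features to label scores.
    b_unary: (2,) per-label bias.
    w_trans: (2, 2) score for label i at frame t-1 followed by label j at t.
    Returns a length-T int array of labels (1 = vocal, 0 = accompaniment).
    """
    T = base_preds.shape[0]
    # Unary potentials: linear scores of the ensemble features, shape (T, 2).
    unary = base_preds @ w_unary.T + b_unary
    score = unary[0].copy()
    backptr = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in label i at t-1, then label j at t.
        cand = score[:, None] + w_trans
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    # Trace the best path backwards from the best final label.
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):
        labels[t - 1] = backptr[t, labels[t]]
    return labels


# Hypothetical soft masks from three base separators, one bin, six frames.
preds = np.array([
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.2, 0.6, 0.6],  # weak evidence: unary scores alone favour label 0
    [0.9, 0.8, 0.8],
    [0.1, 0.2, 0.1],
    [0.2, 0.1, 0.3],
])
w_unary = np.array([[-1.0, -1.0, -1.0],   # label 0 (accompaniment)
                    [1.0, 1.0, 1.0]])     # label 1 (vocal)
b_unary = np.array([1.5, -1.5])
w_trans = np.array([[0.5, -0.5],
                    [-0.5, 0.5]])         # rewards keeping the same label

print(viterbi_mask(preds, w_unary, b_unary, w_trans))  # [1 1 1 1 0 0]
```

Taken frame by frame, the unary scores would label the third frame as accompaniment. The transition scores pull it back to the vocal label, which is the kind of dependency between neighbouring outputs that structured output prediction captures and an independent per-frame classifier does not.
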

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
ISBN (Electronic): 9781479999880
ISBN (Print): 9781479999873
Publication status: Published, 19 May 2016

Publication series

ISSN (Electronic): 2379-190X

Keywords

  • Ensemble method
  • Singing voice separation
  • Conditional random fields
  • Spectrogram
  • Hidden Markov model
  • Time-frequency analysis
  • Computational modeling
  • Machine learning
  • Harmonic analysis
  • Radio frequency
