Visual Voice Activity Detection in the Wild

Foteini Patrona, Alexandros Iosifidis, Anastasios Tefas, Nikolaos Nikolaidis, Ioannis Pitas

Research output: Contribution to journal › Article (Academic Journal) › peer-review

18 Citations (Scopus)
388 Downloads (Pure)


The Visual Voice Activity Detection (V-VAD) problem in unconstrained environments is investigated in this paper. A novel method for V-VAD in the wild is proposed, which exploits local shape and motion information appearing at spatiotemporal locations of interest for facial video segment description, and the Bag of Words (BoW) model for facial video segment representation. Facial video segment classification is subsequently performed using state-of-the-art classification algorithms. Experimental results on a publicly available V-VAD data set demonstrate the effectiveness of the proposed method, which achieves better generalization performance on unseen users than recently proposed state-of-the-art methods. Additional results on a new, unconstrained data set provide evidence that the proposed method can be effective even in cases where existing methods fail.
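The BoW representation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the local descriptors are random stand-ins for the shape/motion descriptors extracted at space-time interest points, the codebook size and the tiny k-means routine are arbitrary choices, and all names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in: each facial video segment yields a set of local
# shape/motion descriptors at detected space-time interest points.
# Here we simply draw random 32-D descriptors for illustration.
rng = np.random.default_rng(0)
segments = [rng.normal(size=(int(rng.integers(50, 100)), 32)) for _ in range(5)]

def build_codebook(descriptors, k=16, iters=10):
    """Tiny k-means codebook (illustrative, not the paper's exact clustering)."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest codeword, then update centers.
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_histogram(desc, centers):
    """L1-normalised histogram of codeword assignments for one segment."""
    d = np.linalg.norm(desc[:, None] - centers[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / hist.sum()

codebook = build_codebook(np.vstack(segments), k=16)
X = np.array([bow_histogram(s, codebook) for s in segments])
print(X.shape)  # (5, 16): one fixed-length BoW vector per video segment
```

Each segment, regardless of how many interest points it contains, is thus mapped to a fixed-length vector suitable for the classifiers mentioned below.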
Original language: English
Pages (from-to): 967-977
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Issue number: 6
Publication status: Published - 26 Feb 2016


  • Voice Activity Detection in the wild
  • Space-Time Interest Points
  • Bag of Words model
  • kernel Extreme Learning Machine
  • Action Recognition
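The kernel Extreme Learning Machine listed among the keywords admits a compact closed-form training rule (Huang et al.): with a kernel matrix K over the training set and one-hot targets T, the output weights are β = (K + I/C)⁻¹ T. The sketch below is a generic kernel ELM on toy data, not the paper's configuration; the RBF kernel, gamma, and C values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between row sets A and B (illustrative choice)."""
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelELM:
    """Kernel Extreme Learning Machine: closed-form ridge solution
    in the kernel-induced feature space. C and gamma are illustrative
    hyperparameters, not values taken from the paper."""
    def __init__(self, C=10.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        T = np.eye(y.max() + 1)[y] * 2 - 1        # one-hot targets in {-1, +1}
        K = rbf(X, X, self.gamma)
        # beta = (K + I/C)^{-1} T
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, Xq):
        return (rbf(Xq, self.X, self.gamma) @ self.beta).argmax(axis=1)

# Two well-separated toy clusters standing in for "speaking" / "silent"
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
acc = (KernelELM().fit(X, y).predict(X) == y).mean()
print(acc)
```

Because training reduces to one linear solve, kernel ELM is a natural fast classifier for fixed-length BoW histograms such as those produced by the proposed representation.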

