Learning to classify gender from four million images

Sen Jia, Nello Cristianini

Research output: Contribution to journal › Article (Academic Journal) › peer-review

59 Citations (Scopus)


Applying learning algorithms to large datasets has long been recognised as an effective way to tackle important tasks in pattern recognition, but generating large annotated datasets is costly. We present a simple and effective method for building a classifier of face images: we train a linear classification algorithm on a massive dataset assembled and labelled entirely by automated means. In doing so, we perform the largest experiment on face gender recognition published so far and report the highest performance to date. Four million images and more than 60,000 features are used to train online classifiers. Using an ensemble of linear classifiers, we achieve an accuracy of 96.86% on the most challenging public database, Labeled Faces in the Wild (LFW), 2.05% higher than the previous best result on the same dataset (Shan, 2012). This result is relevant both to the machine learning community, as it addresses the role of large datasets, and to the computer vision community, as it provides a way to build high-quality face gender classifiers. Furthermore, we propose a general way to generate and exploit massive data without human annotation. Finally, we demonstrate a simple and effective adaptation of the Pegasos algorithm that makes it more robust.
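As a rough illustration of the kind of online linear classifier the abstract refers to, the sketch below implements the standard Pegasos update (stochastic sub-gradient descent on the hinge-loss SVM objective) and combines several such classifiers by majority vote. It is not the authors' adapted variant, which the abstract does not describe. The function names, the regularisation value `lam`, the random-subset training scheme, and the majority-vote rule are all assumptions made for illustration.

```python
import random


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pegasos_train(samples, labels, lam=0.1, steps=1000, seed=0):
    """Standard Pegasos (not the paper's adaptation).

    samples: list of feature vectors; labels: list of +1 / -1.
    At step t, one example is drawn at random and the weights are
    updated with step size eta = 1 / (lam * t).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)
        violated = y * dot(w, x) < 1.0  # margin check uses the old weights
        # Shrink step from the L2 regulariser.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Hinge-loss sub-gradient step, only when the margin is violated.
        if violated:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Predicted label (+1 or -1) from a single linear classifier."""
    return 1 if dot(w, x) >= 0.0 else -1


def train_ensemble(samples, labels, n_models=3, subset_size=None,
                   lam=0.1, steps=1000, seed=0):
    """Train each member on a random subset (an assumed scheme)."""
    rng = random.Random(seed)
    n = len(samples)
    k = subset_size or n
    models = []
    for m in range(n_models):
        idx = rng.sample(range(n), k)
        xs = [samples[i] for i in idx]
        ys = [labels[i] for i in idx]
        models.append(pegasos_train(xs, ys, lam=lam, steps=steps, seed=seed + m))
    return models


def ensemble_predict(models, x):
    """Majority vote over the members; use an odd n_models to avoid ties."""
    votes = sum(predict(w, x) for w in models)
    return 1 if votes >= 0 else -1
```

Because each update touches only one example, memory use is independent of the dataset size, which is what makes this style of training practical at the scale of millions of images. Calling `train_ensemble(X, y, n_models=3)` followed by `ensemble_predict(models, x)` classifies a single feature vector `x`.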
Original language: English
Pages (from-to): 35-41
Number of pages: 7
Journal: Pattern Recognition Letters
Early online date: 26 Feb 2015
Publication status: Published - 1 Jun 2015


  • Big data
  • Gender classification
  • On-line learning

