Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Adam Sutton, Thomas Lansdall-Welfare, Nello Cristianini

Research output: Chapter in Book/Report/Conference proceeding; Conference Contribution (Conference Proceeding)


Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered "from the wild" and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on the use of word lists created for social psychology applications; we observe how gender bias in occupations reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems.
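The abstract does not spell out the exact bias measure or projection, so the following is only a minimal sketch under stated assumptions: bias is measured as the difference in mean cosine similarity between a word and two word lists, and it is reduced by projecting the word vector onto the subspace orthogonal to a difference-of-means direction. The toy 3-dimensional vectors and all function names are hypothetical and are not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def mean_vector(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def bias_direction(group_a, group_b):
    # Assumed definition: unit vector pointing from the mean of
    # group_b to the mean of group_a.
    diff = [a - b for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]

def association(word, group_a, group_b):
    # Assumed measure: mean cosine similarity to group_a minus
    # mean cosine similarity to group_b.
    sim_a = sum(cosine(word, a) for a in group_a) / len(group_a)
    sim_b = sum(cosine(word, b) for b in group_b) / len(group_b)
    return sim_a - sim_b

def project_out(word, direction):
    # Orthogonal projection: remove the component of `word`
    # that lies along the unit vector `direction`.
    scale = dot(word, direction)
    return [w - scale * d for w, d in zip(word, direction)]

# Toy embeddings invented for illustration only.
embeddings = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "engineer": [0.5, 0.1, 0.8],
}
male = [embeddings["he"]]
female = [embeddings["she"]]

direction = bias_direction(male, female)
before = association(embeddings["engineer"], male, female)
debiased = project_out(embeddings["engineer"], direction)
after = association(debiased, male, female)

print(f"association before projection: {before:.3f}")
print(f"association after projection:  {after:.3f}")
```

In this toy setup each word list contains a single vector, so the projection removes the measured association entirely. With realistic word lists and learnt embeddings, a projection of this kind would be expected to reduce the measured association, not necessarily eliminate it.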
Original language: English
Title of host publication: Advances in Intelligent Data Analysis XVII
Subtitle of host publication: 17th International Symposium, IDA 2018, ’s-Hertogenbosch, The Netherlands, October 24–26, 2018, Proceedings
Publisher: Springer, Cham
Number of pages: 12
ISBN (Electronic): 9783030017682
ISBN (Print): 9783030017675
Publication status: Published - 5 Oct 2018

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743


  • Fairness in AI
  • Bias in data
  • Artificial intelligence
  • Natural language processing
  • Word embeddings

