Counterfactual explanations of machine learning predictions: Opportunities and challenges for AI safety

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)


Abstract

One necessary condition for creating a safe AI system is making it transparent enough to uncover any unintended or harmful behaviour. Transparency can be achieved by explaining the predictions of an AI system with counterfactual statements, which are becoming a de facto standard for explaining algorithmic decisions. The popularity of counterfactuals is mainly attributed to their compliance with the “right to explanation” introduced by the European Union’s General Data Protection Regulation, and to their being understandable by lay audiences as well as domain experts. In this paper we describe our experience and the lessons learnt from explaining decision tree models, trained on the UCI German Credit and FICO Explainable Machine Learning Challenge data sets, with class-contrastive counterfactual statements. We review how counterfactual explanations can affect an artificial intelligence system and its safety by investigating their risks and benefits. We show example explanations, discuss their strengths and weaknesses, demonstrate how they can be used to debug the underlying model, inspect its fairness, and unveil the security and privacy challenges that they pose.
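To make the idea of a class-contrastive counterfactual statement concrete, the sketch below trains a decision tree on toy credit-like data and perturbs one feature of a rejected applicant until the predicted class flips, phrasing the result as a counterfactual. This is not the authors' implementation: the data, feature names, and single-feature greedy search are assumptions chosen for brevity.

```python
# Minimal, illustrative sketch of a class-contrastive counterfactual
# explanation for a decision tree. Hypothetical data and search strategy;
# not the method from the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy credit-scoring data: [income (k), loan amount (k)] -> approved?
X = np.array([[50, 10], [20, 15], [80, 30], [30, 25], [60, 5], [25, 20]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = approved, 0 = rejected

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def counterfactual(instance, feature, step, max_steps=100):
    """Greedily perturb a single feature until the predicted class flips."""
    original = clf.predict([instance])[0]
    cf = instance.astype(float)
    for _ in range(max_steps):
        cf[feature] += step
        flipped = clf.predict([cf])[0]
        if flipped != original:
            return cf, original, flipped
    return None, original, original  # no counterfactual found in budget

applicant = np.array([25, 20])  # a rejected applicant
cf, old, new = counterfactual(applicant, feature=0, step=1.0)
if cf is not None:
    print(f"Had your income been {cf[0]:.0f}k instead of {applicant[0]}k, "
          f"the decision would have been 'approved' rather than 'rejected'.")
```

A real explainer would search over multiple features and prefer minimal, plausible changes; the single-feature greedy loop here only conveys the general shape such an explanation takes.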

Original language: English
Title of host publication: Proceedings of the AAAI Workshop on Artificial Intelligence Safety 2019
Subtitle of host publication: co-located with the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI 2019), Honolulu, Hawaii, January 27, 2019
Publisher: CEUR Workshop Proceedings
Number of pages: 4
Volume: 2301
Publication status: Published - 27 Jan 2019
Event: 2019 AAAI Workshop on Artificial Intelligence Safety, SafeAI 2019 - Honolulu, United States
Duration: 27 Jan 2019 → …

Publication series

Name: CEUR Workshop Proceedings
ISSN (Print): 1613-0073

Conference

Conference: 2019 AAAI Workshop on Artificial Intelligence Safety, SafeAI 2019
Country: United States
City: Honolulu
Period: 27/01/19 → …


Cite this

Sokol, K., & Flach, P. (2019). Counterfactual explanations of machine learning predictions: Opportunities and challenges for AI safety. In Proceedings of the AAAI Workshop on Artificial Intelligence Safety 2019: co-located with the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI 2019), Honolulu, Hawaii, January 27, 2019 (Vol. 2301). (CEUR Workshop Proceedings). CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2301/