Reconciling Training and Evaluation Objectives in Location Agnostic Surrogate Explainers

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)

Abstract

Transparency in AI models is crucial to designing, auditing, and deploying AI systems. However, 'black box' models are still used in practice for their predictive power despite their lack of transparency. This has led to a demand for post-hoc, model-agnostic surrogate explainers which provide explanations for decisions of any model by approximating its behaviour close to a query point with a surrogate model. However, it is often overlooked how the location of the query point in the decision surface of the black box model affects the faithfulness of the surrogate explainer. Here, we show that when using standard techniques, there is a decrease in agreement between the black box and the surrogate model for query points towards the edge of the test dataset and when moving away from the decision boundary. This originates from a mismatch between the data distributions used to train and evaluate surrogate explainers. We address this by leveraging knowledge about the test data distribution captured in the class labels of the black box model. By addressing this and encouraging users to take care in understanding the alignment of training and evaluation objectives, we empower them to construct more faithful surrogate explainers.
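
The mismatch described in the abstract can be illustrated with a small toy sketch. This is not the authors' method or experimental setup. The black box (a unit-circle classifier), the Gaussian sampling width, the linear least-squares surrogate, the 0.5 threshold, and the uniform evaluation set are all assumptions chosen for illustration. The sketch trains a surrogate on samples drawn close to a query point and then compares its agreement with the black box on those local samples against its agreement on a wider evaluation set.

```python
import random


def black_box(x):
    """Hypothetical black box: class 1 inside the unit circle, else 0."""
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else 0


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_linear_surrogate(points, labels):
    """Ordinary least squares on features [1, x0, x1] via the normal equations."""
    rows = [[1.0, p[0], p[1]] for p in points]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(3)]
    return solve(xtx, xty)


def surrogate_predict(beta, x):
    """Threshold the linear score at 0.5 to obtain a class label (assumed rule)."""
    return 1 if beta[0] + beta[1] * x[0] + beta[2] * x[1] >= 0.5 else 0


def fidelity(beta, points):
    """Fraction of points on which surrogate and black box agree."""
    agree = sum(surrogate_predict(beta, p) == black_box(p) for p in points)
    return agree / len(points)


def explain(query, sigma=0.1, n_samples=200, seed=0):
    """Fit a surrogate near `query`; return (local fidelity, wider fidelity)."""
    rng = random.Random(seed)
    # Training distribution: Gaussian samples centred on the query point.
    local = [(rng.gauss(query[0], sigma), rng.gauss(query[1], sigma))
             for _ in range(n_samples)]
    beta = fit_linear_surrogate(local, [black_box(p) for p in local])
    # Evaluation distribution: uniform over a wider box (an assumption).
    wider = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(500)]
    return fidelity(beta, local), fidelity(beta, wider)


if __name__ == "__main__":
    local_fid, wider_fid = explain(query=(0.0, 0.0))
    print(f"fidelity on local training samples: {local_fid:.2f}")
    print(f"fidelity on wider evaluation set:   {wider_fid:.2f}")
```

In this toy setting the query point lies far from the decision boundary, so every local sample receives the same label and the surrogate reduces to a constant prediction. It agrees perfectly with the black box on the samples it was trained on, yet much less on the wider set, which is one simple way a gap between training and evaluation distributions can show up.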
Original language: English
Title of host publication: CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery (ACM)
Pages: 3833-3837
Number of pages: 5
ISBN (Electronic): 9798400701245
Publication status: Published - 21 Oct 2023

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Bibliographical note

Funding Information:
This work is supported by the UKRI Centre for Doctoral Training in Interactive AI EP/S022937/1, UKRI Turing AI Fellowship EP/V024817/1, and the TAILOR ICT-48 Network funded by EU Horizon 2020 under grant agreement 952215.

Publisher Copyright:
© 2023 Copyright held by the owner/author(s).
