Abstract
Transparency in AI models is crucial to designing, auditing, and deploying AI systems. However, 'black box' models are still used in practice for their predictive power despite their lack of transparency. This has led to a demand for post-hoc, model-agnostic surrogate explainers, which explain any model's decisions by approximating its behaviour close to a query point with a simpler surrogate model. Yet it is often overlooked how the location of the query point on the decision surface of the black box model affects the faithfulness of the surrogate explainer. Here, we show that with standard techniques, agreement between the black box and the surrogate model decreases for query points towards the edge of the test dataset and for query points moving away from the decision boundary. This originates from a mismatch between the data distributions used to train and to evaluate surrogate explainers. We address the mismatch by leveraging knowledge about the test data distribution captured in the class labels of the black box model. By doing so, and by encouraging users to check that training and evaluation objectives are aligned, we empower them to construct more faithful surrogate explainers.
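To make the setup the abstract describes concrete, below is a minimal LIME-style sketch in Python: a Gaussian-sampled neighbourhood around a query point, a proximity-weighted linear surrogate fitted to the black box's outputs, and local agreement between the two models as the fidelity measure. The function name `local_fidelity`, the sampling scale, and the kernel width are illustrative assumptions; this is a generic surrogate-explainer sketch, not the paper's method, which additionally exploits the black box's class labels to correct the train/evaluate distribution mismatch.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Train an opaque "black box" model (a stand-in for any classifier).
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def local_fidelity(query, black_box, n_samples=1000, scale=0.5, seed=0):
    """Fit a LIME-style linear surrogate around `query` and return the
    agreement between surrogate and black box on the sampled
    neighbourhood -- one common notion of surrogate faithfulness."""
    rng = np.random.default_rng(seed)
    # Sample a local neighbourhood around the query point.
    Z = query + rng.normal(scale=scale, size=(n_samples, query.shape[0]))
    # Query the black box: the surrogate must mimic the model,
    # not the ground-truth labels.
    p = black_box.predict_proba(Z)[:, 1]
    # Weight samples by proximity to the query (Gaussian kernel).
    w = np.exp(-np.sum((Z - query) ** 2, axis=1) / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
    # Agreement: do thresholded surrogate outputs match black-box labels?
    return np.mean((surrogate.predict(Z) > 0.5) == (p > 0.5))

# Fidelity is typically high near the decision boundary and degrades
# for queries far from the training distribution -- the effect the
# paper studies.
print(local_fidelity(np.array([0.5, 0.25]), black_box))  # near boundary
print(local_fidelity(np.array([3.0, 3.0]), black_box))   # off-distribution
```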
Original language | English
---|---
Title of host publication | CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
Publisher | Association for Computing Machinery (ACM)
Pages | 3833-3837
Number of pages | 5
ISBN (Electronic) | 9798400701245
DOIs |
Publication status | Published - 21 Oct 2023
Publication series
Name | International Conference on Information and Knowledge Management, Proceedings
---|---
ISSN (Print) | 2155-0751
Bibliographical note
Funding Information: This work is supported by the UKRI Centre for Doctoral Training in Interactive AI EP/S022937/1, UKRI Turing AI Fellowship EP/V024817/1, and the TAILOR ICT-48 Network funded by EU Horizon 2020 under grant agreement 952215.
Publisher Copyright: © 2023 Copyright held by the owner/author(s).