Performance Evaluation in Machine Learning: The Good, the Bad, the Ugly, and the Way Forward

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

This paper gives an overview of some ways in which our understanding of performance evaluation measures for machine-learned classifiers has improved over the last twenty years. I also highlight a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This suggests that in order to make further progress we need to develop a proper measurement theory of machine learning. I then demonstrate by example what such a measurement theory might look like and what kinds of new results it would entail. Finally, I argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models and causal inference.
Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence
Pages: 9808-9814
Number of pages: 7
DOIs
Publication status: Published - 17 Jul 2019
Event: AAAI Conference on Artificial Intelligence - Hilton Hawaiian Village, Honolulu, United States
Duration: 27 Jan 2019 – 1 Feb 2019
https://aaai.org/Conferences/AAAI-19/

Conference

Conference: AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Country/Territory: United States
City: Honolulu
Period: 27/01/19 – 01/02/19
Internet address: https://aaai.org/Conferences/AAAI-19/

Structured keywords

  • Jean Golding

