Abstract
We consider algorithms for learning functions f: X → Y, where X and Y are finite and the data are assumed to be noise-free. Each learning algorithm Alg is connected with γ(Alg), the set of prior probability distributions for which it is optimal. A method for constructing γ(Alg) from Alg is given, and the relationship between the various γ(Alg) is discussed. Improper algorithms are identified as those for which γ(Alg) has zero volume; they are investigated using linear algebra, and two examples are given. This framework is then applied to the question of choosing between competing algorithms, whereby “leave-one-out” cross-validation is characterised as a crude method of ML-II prior selection. We conclude by examining how the mathematical results bear on practical problems, by discussing related work, and by suggesting directions for future work.
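As a concrete illustration of the abstract's closing claim, the sketch below scores candidate priors by the leave-one-out accuracy of the Bayes-optimal learner each prior induces over a finite hypothesis space. This is a minimal stand-in for ML-II prior selection under stated assumptions, not the paper's construction: the spaces X and Y, the two candidate priors, and the dataset are all illustrative choices.

```python
# A minimal sketch (not the paper's construction): score candidate priors by
# the leave-one-out accuracy of the Bayes-optimal learner each one induces.
# X, Y, the candidate priors, and the dataset are illustrative assumptions.
from itertools import product

X = [0, 1, 2]  # finite input space
Y = [0, 1]     # finite output space

# Hypothesis space: every function f: X -> Y, encoded as a tuple of outputs.
HYPOTHESES = list(product(Y, repeat=len(X)))

def bayes_predict(prior, data, x):
    """Noise-free Bayes prediction: only hypotheses consistent with the
    observed data get a vote, weighted by prior mass (ties break toward
    the first y in Y)."""
    votes = {y: 0.0 for y in Y}
    for h, p in zip(HYPOTHESES, prior):
        if all(h[xi] == yi for xi, yi in data):
            votes[h[x]] += p
    return max(votes, key=votes.get)

def loo_score(prior, data):
    """Leave-one-out accuracy of the learner induced by `prior`."""
    hits = sum(
        bayes_predict(prior, data[:i] + data[i + 1:], xi) == yi
        for i, (xi, yi) in enumerate(data)
    )
    return hits / len(data)

# Two candidate priors over the 2**3 = 8 hypotheses (illustrative):
uniform = [1 / len(HYPOTHESES)] * len(HYPOTHESES)
weights = [4.0 if len(set(h)) == 1 else 1.0 for h in HYPOTHESES]
constant_biased = [w / sum(weights) for w in weights]

data = [(0, 1), (1, 1), (2, 1)]  # noise-free data from a constant function
for name, prior in [("uniform", uniform), ("constant-biased", constant_biased)]:
    print(f"{name}: LOO score = {loo_score(prior, data):.2f}")
```

Under these assumptions the constant-biased prior wins every leave-one-out fold, so the LOO score selects it over the uniform prior, mirroring the abstract's reading of cross-validation as a crude proxy for choosing a prior by marginal likelihood (ML-II).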
Original language | English |
---|---|
Title of host publication | Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) |
Publisher | Morgan Kaufmann |
Pages | 142-149 |
DOIs | |
Publication status | Published - 9 Jul 1995 |