Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach in which one optimises the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation. We apply our algorithm to call admission control. Our proposed policy gradient SMDP algorithm and its application to admission control are novel. (c) 2006 Elsevier B.V. All rights reserved.
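To make the idea concrete, the following is a minimal sketch of a likelihood-ratio (score-function) policy gradient with a stochastic-approximation update, applied to a toy call-admission problem. Everything here is an illustrative assumption, not the paper's algorithm: the logistic admission policy, the exponential sojourn times, the departure dynamics, and the function names (`accept_prob`, `run_episode`, `train`) are all invented for this example.

```python
import math
import random

def accept_prob(theta, free):
    # Parameterised admission policy (illustrative assumption):
    # probability of accepting an arriving call, logistic in the
    # number of free channels.
    return 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * free)))

def run_episode(theta, rng, capacity=10, n_calls=200):
    """Simulate one episode of a toy call-admission SMDP.

    Returns total reward, total elapsed time, and the accumulated
    score-function terms (grad of log policy) used in the
    likelihood-ratio gradient estimate.
    """
    busy = 0
    total_reward, total_time = 0.0, 0.0
    score = [0.0, 0.0]
    for _ in range(n_calls):
        free = capacity - busy
        p = accept_prob(theta, free)
        accept = free > 0 and rng.random() < p
        if free > 0:
            # grad log pi for a Bernoulli(p) decision with logistic p:
            # (1 - p) * features if accepted, -p * features if rejected
            g = (1.0 - p) if accept else -p
            score[0] += g
            score[1] += g * free
        if accept:
            busy += 1
            total_reward += 1.0  # revenue per accepted call
        # random sojourn time between decision epochs (the SMDP feature)
        total_time += rng.expovariate(1.0)
        # departures: each busy channel frees independently
        busy -= sum(1 for _ in range(busy) if rng.random() < 0.3)
    return total_reward, total_time, score

def train(theta, n_iters=50, step=0.01, seed=0):
    # Stochastic-approximation ascent on the estimated gradient of the
    # average-reward criterion (total reward / total time).
    rng = random.Random(seed)
    for _ in range(n_iters):
        r, t, score = run_episode(theta, rng)
        rate = r / t
        theta[0] += step * rate * score[0]
        theta[1] += step * rate * score[1]
    return theta
```

The paper's online algorithm estimates the gradient and updates the parameters simultaneously along a single trajectory; this sketch instead uses a crude episode-based REINFORCE-style estimate without a variance-reducing baseline, purely to show the shape of the score-function update under the average-reward criterion.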
Translated title of the contribution: A policy gradient method for semi-Markov decision processes with application to call admission
Pages (from-to): 808-818
Number of pages: 11
Journal: European Journal of Operational Research
Publication status: Published - 1 May 2007
Bibliographical note: Publisher: Elsevier Science BV
Other identifier: IDS number 115JT