A policy gradient method for semi-Markov decision processes with application to call admission

SS Singh, VB Tadic, A Doucet

Research output: Contribution to journal › Article (Academic Journal) › peer-review

32 Citations (Scopus)

Abstract

Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach in which one optimises the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation. We apply our algorithm to call admission control. Our proposed policy gradient SMDP algorithm and its application to admission control are novel. (c) 2006 Elsevier B.V. All rights reserved.
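The paper's online estimator is not reproduced here, but the underlying idea of optimising a parameterised admission policy by stochastic gradient ascent can be illustrated with a minimal score-function (REINFORCE-style) sketch. Everything below is an assumption for illustration: the single-link call admission model, its rates and capacity, the sigmoid policy form, and the batch update rule are not taken from the paper.

```python
import math
import random

random.seed(0)

# Hypothetical single-link call admission model (assumed, not the paper's setup):
# calls arrive in a Poisson stream; an admitted call earns a fixed reward and
# holds one of C circuits for an exponentially distributed time.
C = 10             # circuit capacity (assumed)
ARRIVAL_RATE = 1.0
SERVICE_RATE = 0.2
REWARD = 1.0

def admit_prob(theta, occupancy):
    """Parameterised (sigmoid) admission policy: P(admit | occupancy)."""
    return 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * occupancy)))

def run_episode(theta, horizon=500):
    """Simulate `horizon` arrivals; return total reward and the accumulated
    score function grad(log pi) along the trajectory."""
    departures = []          # scheduled departure times of calls in progress
    total_reward = 0.0
    grad = [0.0, 0.0]
    t = 0.0
    for _ in range(horizon):
        t += random.expovariate(ARRIVAL_RATE)
        departures = [d for d in departures if d > t]   # completed calls leave
        occupancy = len(departures)
        if occupancy >= C:
            continue                                    # forced rejection: no policy choice
        p = admit_prob(theta, occupancy)
        if random.random() < p:
            total_reward += REWARD
            departures.append(t + random.expovariate(SERVICE_RATE))
            # d/dtheta log p for a sigmoid is (1 - p) times the feature
            grad[0] += (1 - p)
            grad[1] += (1 - p) * occupancy
        else:
            # d/dtheta log(1 - p) is -p times the feature
            grad[0] -= p
            grad[1] -= p * occupancy
    return total_reward, grad

def policy_gradient(theta, episodes=50, step=1e-4):
    """Stochastic-approximation ascent: theta <- theta + step * R * grad(log pi)."""
    for _ in range(episodes):
        reward, grad = run_episode(theta)
        theta = [th + step * reward * g for th, g in zip(theta, grad)]
    return theta

theta = policy_gradient([0.0, 0.0])
```

Unlike this batch sketch, the algorithm in the paper is online: the gradient estimate and the policy parameters are updated simultaneously as the SMDP evolves, so no episode boundaries or model knowledge are needed.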
Original language: English
Pages (from-to): 808-818
Number of pages: 11
Journal: European Journal of Operational Research
Volume: 178 (3)
DOIs
Publication status: Published - 1 May 2007

Bibliographical note

Publisher: Elsevier Science BV
Other identifier: IDS number 115JT

