Elaboration on the policy improvement theorem for soft policies in reinforcement learning

Tim Kovacs

Research output: Working paper

Abstract

In Section 5.4 of their book on reinforcement learning, Sutton and Barto show that the policy improvement theorem applies to soft policies: making a soft policy greedier (but still soft) with respect to its Q-function yields an improved policy. I found this material difficult to follow and wrote this short document to elaborate on their proof. Familiarity with the material up to that section is assumed.
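
As a rough illustration of the claim in the abstract, the sketch below numerically checks the one-step inequality at a single state: averaging the action values under the epsilon-greedy policy gives at least as much as averaging them under an arbitrary epsilon-soft policy. It is not taken from the working paper. The helper names (`epsilon_greedy`, `random_epsilon_soft`, `expected_q`) are hypothetical, the action values are random stand-ins rather than the Q-function of a real policy, and epsilon-greedy is assumed as the "greedier but still soft" policy.

```python
import random


def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy action probabilities for one state (hypothetical helper)."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n          # every action keeps probability epsilon / n
    probs[best] += 1.0 - epsilon       # remaining mass goes to the greedy action
    return probs


def random_epsilon_soft(n, epsilon, rng):
    """Sample an arbitrary epsilon-soft policy: each probability >= epsilon / n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [epsilon / n + (1.0 - epsilon) * w / total for w in weights]


def expected_q(probs, q_values):
    """Expected action value under the given action probabilities."""
    return sum(p * q for p, q in zip(probs, q_values))


rng = random.Random(0)
epsilon, n_actions = 0.1, 4

# Assumption: random numbers standing in for the action values q(s, a) of one state.
q = [rng.uniform(-1.0, 1.0) for _ in range(n_actions)]

pi = random_epsilon_soft(n_actions, epsilon, rng)   # original soft policy at s
pi_new = epsilon_greedy(q, epsilon)                 # greedier, still soft

v_old = expected_q(pi, q)       # sum_a pi(a|s) q(s, a)
v_new = expected_q(pi_new, q)   # sum_a pi'(a|s) q(s, a)

print(f"soft policy: {v_old:.4f}, epsilon-greedy: {v_new:.4f}")
```

The check covers only the single-state inequality; the policy improvement theorem is what extends it to a statement about the value functions of the two policies.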
Original language: English
Publisher: Department of Computer Science, University of Bristol
Publication status: Published - 2010

Bibliographical note

Other identifier: 2001266
