Resource-based Dynamic Rewards for Factored MDPs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017)
Subtitle of host publication: Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA
Publisher or commissioning body: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1320-1327
Number of pages: 8
ISBN (Electronic): 9781538638767
ISBN (Print): 9781538638774
Date accepted/in press: 15 Aug 2017
Date e-pub ahead of print: 7 Jun 2018
Date published (current): Jun 2018

Publication series

ISSN (Print): 2375-0197

Abstract

Factored MDPs provide an efficient way to reduce the complexity of large, real-world domains by exploiting structure within the state space. This avoids the need for the state space to be fully enumerated, which is impractical in large domains. However, defining a reward function for state transitions is difficult in a factored MDP since transitions are not known prior to execution. In this paper, we provide a novel method for deriving rewards from information within the states in order to determine intermediate rewards for state transitions. We do this by treating some specific state variables as resources, allowing costs and rewards to be inferred from changes to the resources and ensuring the agent is resource-aware while also being goal-oriented. To facilitate this, we propose a novel variant of Dynamic Bayesian Networks specifically for modelling action transitions and capable of dealing with relative changes to real-valued state variables (such as resources) in a compact fashion. We also propose a number of reward functions which model resource types commonly found in real-world situations. We go on to show that our proposed framework offers an improvement over existing techniques involving reward functions for factored MDPs as it improves both the efficiency and decision quality of online planners when operating on these models.
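
As a rough illustration of the idea described in the abstract (rewards inferred from changes to resource variables, with actions expressed as relative changes to real-valued state variables), the following minimal Python sketch may help. It is not the paper's method: the names `apply_relative_effect` and `resource_reward`, the variables `fuel` and `cargo`, and the linear weighting are all assumptions made purely for illustration, and the paper's own reward functions and Dynamic Bayesian Network variant are not reproduced here.

```python
from typing import Dict, Mapping

# A factored state: a mapping from variable name to a real value.
State = Mapping[str, float]


def apply_relative_effect(state: State, deltas: Mapping[str, float]) -> Dict[str, float]:
    """Return a new state in which each listed variable is shifted by a
    relative amount (hypothetical stand-in for an action's effect)."""
    new_state = dict(state)
    for name, delta in deltas.items():
        new_state[name] = new_state[name] + delta
    return new_state


def resource_reward(before: State, after: State, weights: Mapping[str, float]) -> float:
    """Infer a transition reward from the change in each resource variable.

    Assumption: a simple weighted sum of differences. Variables that are
    not listed in `weights` are not treated as resources and are ignored.
    """
    return sum(w * (after[name] - before[name]) for name, w in weights.items())


# Hypothetical example: an action that burns 2 units of fuel and gains 1 cargo.
before = {"fuel": 10.0, "cargo": 0.0, "at_goal": 0.0}
after = apply_relative_effect(before, {"fuel": -2.0, "cargo": 1.0})

# Illustrative weights: each unit of fuel is worth 1, each unit of cargo 5.
weights = {"fuel": 1.0, "cargo": 5.0}
reward = resource_reward(before, after, weights)
print(reward)
```

Because the reward is computed from the two states alone, no table of rewards over enumerated transitions is needed, which is the property the abstract highlights for factored models.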

Documents

  • Full-text PDF (accepted author manuscript)

    Rights statement: This is the author accepted manuscript (AAM). The final published version (version of record) is available online via IEEE at https://ieeexplore.ieee.org/document/8372101 . Please refer to any applicable terms of use of the publisher.

    Accepted author manuscript, 715 KB, PDF document
