Resource-based Dynamic Rewards for Factored MDPs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Standard

Resource-based Dynamic Rewards for Factored MDPs. / Killough, Ronan; Bauters, Kim; McAreavey, Kevin; Liu, Weiru; Hong, Jun. 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017): Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA. Institute of Electrical and Electronics Engineers (IEEE), 2018. p. 1320-1327.

Harvard

Killough, R, Bauters, K, McAreavey, K, Liu, W & Hong, J 2018, Resource-based Dynamic Rewards for Factored MDPs. in 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017): Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA. Institute of Electrical and Electronics Engineers (IEEE), pp. 1320-1327. https://doi.org/10.1109/ICTAI.2017.00198

APA

Killough, R., Bauters, K., McAreavey, K., Liu, W., & Hong, J. (2018). Resource-based Dynamic Rewards for Factored MDPs. In 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017): Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA (pp. 1320-1327). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/ICTAI.2017.00198

Vancouver

Killough R, Bauters K, McAreavey K, Liu W, Hong J. Resource-based Dynamic Rewards for Factored MDPs. In 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017): Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA. Institute of Electrical and Electronics Engineers (IEEE). 2018. p. 1320-1327 https://doi.org/10.1109/ICTAI.2017.00198

Author

Killough, Ronan ; Bauters, Kim ; McAreavey, Kevin ; Liu, Weiru ; Hong, Jun. / Resource-based Dynamic Rewards for Factored MDPs. 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017): Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA. Institute of Electrical and Electronics Engineers (IEEE), 2018. pp. 1320-1327

Bibtex

@inproceedings{3c3b99d46cc74e48b72cba9599392b92,
title = "Resource-based Dynamic Rewards for Factored MDPs",
abstract = "Factored MDPs provide an efficient way to reduce the complexity of large, real-world domains by exploiting structure within the state space. This avoids the need for the state space to be fully enumerated, which is impractical in large domains. However, defining a reward function for state transitions is difficult in a factored MDP since transitions are not known prior to execution. In this paper, we provide a novel method for deriving rewards from information within the states in order to determine intermediate rewards for state transitions. We do this by treating some specific state variables as resources, allowing costs and rewards to be inferred from changes to the resources and ensuring the agent is resource-aware while also being goal oriented. To facilitate this, we propose a novel variant of Dynamic Bayesian Networks specifically for modelling action transitionsand capable of dealing with relative changes to real-valued state variables (such as resources) in a compact fashion. We also propose a number of reward functions which model resource types commonly found in real-world situations. We go on to show that our proposed framework offers an improvement over existing techniques involving reward functions for factored MDPs as it improves both the efficiency and decision quality of online planners when operating on these models.",
author = "Ronan Killough and Kim Bauters and Kevin McAreavey and Weiru Liu and Jun Hong",
year = "2018",
month = "6",
doi = "10.1109/ICTAI.2017.00198",
language = "English",
isbn = "9781538638774",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
pages = "1320--1327",
booktitle = "2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017)",
address = "United States",

}

RIS - suitable for import to EndNote

TY - GEN

T1 - Resource-based Dynamic Rewards for Factored MDPs

AU - Killough, Ronan

AU - Bauters, Kim

AU - McAreavey, Kevin

AU - Liu, Weiru

AU - Hong, Jun

PY - 2018/6

Y1 - 2018/6

N2 - Factored MDPs provide an efficient way to reduce the complexity of large, real-world domains by exploiting structure within the state space. This avoids the need for the state space to be fully enumerated, which is impractical in large domains. However, defining a reward function for state transitions is difficult in a factored MDP since transitions are not known prior to execution. In this paper, we provide a novel method for deriving rewards from information within the states in order to determine intermediate rewards for state transitions. We do this by treating some specific state variables as resources, allowing costs and rewards to be inferred from changes to the resources and ensuring the agent is resource-aware while also being goal-oriented. To facilitate this, we propose a novel variant of Dynamic Bayesian Networks specifically for modelling action transitions and capable of dealing with relative changes to real-valued state variables (such as resources) in a compact fashion. We also propose a number of reward functions which model resource types commonly found in real-world situations. We go on to show that our proposed framework offers an improvement over existing techniques involving reward functions for factored MDPs as it improves both the efficiency and decision quality of online planners when operating on these models.

AB - Factored MDPs provide an efficient way to reduce the complexity of large, real-world domains by exploiting structure within the state space. This avoids the need for the state space to be fully enumerated, which is impractical in large domains. However, defining a reward function for state transitions is difficult in a factored MDP since transitions are not known prior to execution. In this paper, we provide a novel method for deriving rewards from information within the states in order to determine intermediate rewards for state transitions. We do this by treating some specific state variables as resources, allowing costs and rewards to be inferred from changes to the resources and ensuring the agent is resource-aware while also being goal-oriented. To facilitate this, we propose a novel variant of Dynamic Bayesian Networks specifically for modelling action transitions and capable of dealing with relative changes to real-valued state variables (such as resources) in a compact fashion. We also propose a number of reward functions which model resource types commonly found in real-world situations. We go on to show that our proposed framework offers an improvement over existing techniques involving reward functions for factored MDPs as it improves both the efficiency and decision quality of online planners when operating on these models.

U2 - 10.1109/ICTAI.2017.00198

DO - 10.1109/ICTAI.2017.00198

M3 - Conference contribution

SN - 9781538638774

SP - 1320

EP - 1327

BT - 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017): Proceedings of a meeting held 6-8 November 2017, Boston, Massachusetts, USA

PB - Institute of Electrical and Electronics Engineers (IEEE)

ER -
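
The abstract above describes rewards inferred from relative changes to resource-valued state variables, combined with goal-oriented behaviour. As a rough illustration of that idea only (not the paper's actual method or code), the Python sketch below derives an intermediate reward for a single transition from per-resource deltas plus a goal bonus; the dict-of-floats state representation, the linear per-resource weights, and all names and values are assumptions made for this example.

# Illustrative sketch of resource-based dynamic rewards (hypothetical,
# not taken from the paper). Assumptions: states are dicts of real-valued
# variables, a designated subset of variables ("resources") carries a
# linear cost/reward weight, and reaching the goal adds a fixed bonus.

from typing import Dict, Set

State = Dict[str, float]

def resource_reward(prev: State,
                    nxt: State,
                    resources: Set[str],
                    weights: Dict[str, float],
                    goal_bonus: float = 100.0,
                    is_goal: bool = False) -> float:
    """Derive an intermediate reward for the transition prev -> nxt
    from relative changes to the designated resource variables."""
    reward = 0.0
    for var in resources:
        delta = nxt[var] - prev[var]          # relative change to the resource
        reward += weights.get(var, 1.0) * delta
    if is_goal:
        reward += goal_bonus                  # keeps the agent goal-oriented
    return reward

# Example: spending fuel is penalised, collecting ore is rewarded.
prev = {"fuel": 10.0, "ore": 0.0, "x": 3.0}
nxt  = {"fuel":  8.5, "ore": 2.0, "x": 4.0}
r = resource_reward(prev, nxt,
                    resources={"fuel", "ore"},
                    weights={"fuel": 1.0, "ore": 5.0})
# r = 1.0 * (8.5 - 10.0) + 5.0 * (2.0 - 0.0) = -1.5 + 10.0 = 8.5
print(r)

Because costs and rewards fall out of resource deltas rather than a fully enumerated transition-reward table, the agent stays resource-aware on intermediate steps while the goal bonus preserves goal-directed behaviour, which is the trade-off the abstract describes.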