Tree-based fitted Q-iteration for multi-objective Markov decision processes in water resource management

F. Pianosi*, A. Castelletti, M. Restelli

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review

27 Citations (Scopus)


Multi-objective Markov decision processes (MOMDPs) provide an effective modeling framework for decision-making problems involving water systems. The traditional approach is to define many single-objective problems (resulting from different combinations of the objectives), each solvable by standard optimization. This paper presents an approach based on reinforcement learning (RL) that can learn the operating policies for all combinations of objectives in a single training process. The key idea is to enlarge the approximation of the action-value function, which is performed by single-objective RL over the state-action space, to the space of the objectives' weights. The batch-mode nature of the algorithm allows for enriching the training dataset without further interaction with the controlled system. The approach is demonstrated on a numerical test case and evaluated on a real-world application, the Hoa Binh reservoir, Vietnam. Experimental results on the test case show that the proposed approach (multi-objective fitted Q-iteration; MOFQI) becomes computationally preferable to the repeated application of its single-objective version (fitted Q-iteration; FQI) when more than five weight combinations are evaluated. In the Hoa Binh case study, the operating policies computed with MOFQI and FQI have comparable efficiency, while MOFQI provides a continuous approximation of the Pareto frontier at no additional computing cost.
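
The key idea in the abstract (appending the objectives' weights to the state-action input of the action-value function, and reusing one batch of transitions for every weight) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy reservoir in `step`, the two rewards, the weighted-sum scalarization, the weight grid, and all names. To stay dependency-free, a lookup table with linear interpolation over the weight (`TableRegressor`) stands in for the tree-based regressor named in the title.

```python
from bisect import bisect_left
from itertools import product

STATES = range(4)    # hypothetical discretised storage levels
ACTIONS = range(3)   # hypothetical release decisions
WEIGHTS = [0.0, 0.25, 0.5, 0.75, 1.0]  # weight on the flood objective
GAMMA = 0.9          # discount factor
INFLOW = 1           # constant inflow per step

def step(s, a):
    """Toy reservoir: return next storage and a vector of two rewards."""
    release = min(a, s + INFLOW)             # cannot release more than is available
    s_next = min(s + INFLOW - release, 3)    # storage is capped at 3
    r_flood = -float(max(release - 1, 0))    # releases above 1 flood downstream
    r_supply = -float(max(2 - release, 0))   # unmet part of a demand of 2
    return s_next, (r_flood, r_supply)

# One batch of transitions (s, a, s_next, reward_vector), collected once.
batch = [(s, a, *step(s, a)) for s, a in product(STATES, ACTIONS)]

def enlarge(batch, weights):
    """Replicate every transition for every weight; no new system interaction."""
    return [(s, a, w, w * r[0] + (1 - w) * r[1], s_next)
            for (s, a, s_next, r) in batch for w in weights]

class TableRegressor:
    """Stand-in regressor: exact on grid weights, linear in between."""
    def __init__(self, weights):
        self.weights = sorted(weights)
        self.table = {}

    def fit(self, inputs, targets):
        sums = {}
        for x, y in zip(inputs, targets):
            total, n = sums.get(x, (0.0, 0))
            sums[x] = (total + y, n + 1)
        self.table = {x: total / n for x, (total, n) in sums.items()}

    def predict(self, s, a, w):
        ws = self.weights
        i = min(max(bisect_left(ws, w), 1), len(ws) - 1)
        lo, hi = ws[i - 1], ws[i]
        frac = (w - lo) / (hi - lo)
        return (1 - frac) * self.table[(s, a, lo)] + frac * self.table[(s, a, hi)]

def mofqi(batch, weights, n_iter=100):
    """Fitted Q-iteration over the enlarged input (state, action, weight)."""
    data = enlarge(batch, weights)
    inputs = [(s, a, w) for s, a, w, _, _ in data]
    q = None  # Q_0 is identically zero
    for _ in range(n_iter):
        targets = [
            r + GAMMA * (max(q.predict(s_next, b, w) for b in ACTIONS)
                         if q is not None else 0.0)
            for _, _, w, r, s_next in data
        ]
        q = TableRegressor(weights)
        q.fit(inputs, targets)
    return q

def greedy_action(q, s, w):
    """Release decision that maximises the learned Q for weight w."""
    return max(ACTIONS, key=lambda a: q.predict(s, a, w))

q = mofqi(batch, WEIGHTS)
for w in (0.0, 0.6, 1.0):  # 0.6 is not on the training grid
    print(w, [greedy_action(q, s, w) for s in STATES])
```

A single call to `mofqi` yields a policy for any weight, including values such as 0.6 that were never used in training, whereas single-objective FQI would need one full run per weight. With a tree-based regressor in place of the table, the generalization across weights would come from the trees rather than from explicit interpolation.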

Original language: English
Pages (from-to): 258-270
Journal: Journal of Hydroinformatics
Issue number: 2
Publication status: Published - 2013


  • multi-objective optimization
  • optimal control
  • reinforcement learning
  • tree-based models
  • reservoir operation


