Abstract
We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback. We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree, which enables the automated scoring of trajectories alongside an explanatory rationale. We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective, and thereby generate data for iterative preference collection and further refinement of both tree and agent. Experiments with synthetic preferences show reward trees to be competitive with uninterpretable neural network reward models on quantitative and qualitative evaluations.
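The abstract does not say how a reward tree is represented or how it is fitted to preferences. The following is a minimal sketch, assuming an axis-aligned binary tree over per-timestep features whose leaves each carry a scalar reward and a short textual rationale. It shows how such a tree could score a trajectory and produce a simple explanatory trace. It also shows how scores could be linked to a pairwise preference through a Bradley-Terry model, a common choice in preference-based RL that is swapped in here as an assumption. The feature names, thresholds, leaf rewards, and the helpers `Node`, `leaf_for`, `score`, `explain` and `preference_prob` are hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

Trajectory = Sequence[Sequence[float]]  # one feature vector per timestep


@dataclass
class Node:
    """Leaf when `feature` is None; otherwise an axis-aligned split x[feature] < threshold."""
    reward: float = 0.0
    rationale: str = ""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf_for(node: Node, x: Sequence[float]) -> Node:
    """Descend from the root to the leaf whose rule x satisfies."""
    while node.feature is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node


def score(tree: Node, traj: Trajectory) -> float:
    """Trajectory return: sum of per-timestep leaf rewards."""
    return sum(leaf_for(tree, x).reward for x in traj)


def explain(tree: Node, traj: Trajectory) -> Dict[str, int]:
    """Rationale trace: how many timesteps fell into each leaf."""
    counts: Dict[str, int] = {}
    for x in traj:
        r = leaf_for(tree, x).rationale
        counts[r] = counts.get(r, 0) + 1
    return counts


def preference_prob(tree: Node, traj_i: Trajectory, traj_j: Trajectory) -> float:
    """Bradley-Terry model (an assumption): P(traj_i preferred over traj_j)."""
    return 1.0 / (1.0 + math.exp(score(tree, traj_j) - score(tree, traj_i)))


# Hypothetical features per timestep: [altitude error (m), normal load factor (g)].
tree = Node(
    feature=0, threshold=50.0,
    left=Node(
        feature=1, threshold=6.0,
        left=Node(reward=1.0, rationale="on altitude, safe g"),
        right=Node(reward=-1.0, rationale="over-g"),
    ),
    right=Node(reward=-0.5, rationale="off altitude"),
)

traj_a = [[10.0, 2.0], [20.0, 3.0], [30.0, 4.0]]
traj_b = [[100.0, 2.0], [40.0, 7.0], [10.0, 2.0]]
```

With this toy tree, `score(tree, traj_a)` is 3.0 and `score(tree, traj_b)` is -0.5. As a result, `preference_prob(tree, traj_a, traj_b)` equals 1/(1 + e^-3.5), roughly 0.97. `explain(tree, traj_b)` reports one timestep in each leaf, which is the kind of rule-level rationale an interpretable model can offer. The sketch deliberately omits learning. In a preference-based setting, the tree's splits and leaf rewards would be fitted to the collected pairwise preferences rather than set by hand.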
Original language | English |
---|---|
Title of host publication | AIAA SCITECH 2024 Forum |
Place of Publication | Orlando, FL |
Publisher | American Institute of Aeronautics and Astronautics Inc. (AIAA) |
Number of pages | 17 |
ISBN (Electronic) | 9781624107115 |
Publication status | Published - 4 Jan 2024 |
Event | 2024 AIAA SciTech Forum, Orlando, United States. Duration: 8 Jan 2024 → 12 Jan 2024. https://www.aiaa.org/scitech |
Conference
Conference | 2024 AIAA SciTech Forum |
---|---|
Abbreviated title | SciTech |
Country/Territory | United States |
City | Orlando |
Period | 8 Jan 2024 → 12 Jan 2024 |
Internet address | https://www.aiaa.org/scitech |
Bibliographical note
Publisher Copyright: © 2024 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
Keywords
- cs.AI