Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

Tom Bewley*, Jonathan Lawry, Arthur G Richards

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

1 Citation (Scopus)
76 Downloads (Pure)

Abstract

We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback. We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree, which enables the automated scoring of trajectories alongside an explanatory rationale. We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective, and thereby generate data for iterative preference collection and further refinement of both tree and agent. Experiments with synthetic preferences show reward trees to be competitive with uninterpretable neural network reward models on quantitative and qualitative evaluations.
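To make the abstract's pipeline concrete, here is a minimal illustrative sketch (not the authors' implementation) of its two core ingredients: a rule-based "reward tree" that scores each state of a trajectory with a readable rationale, and the standard Bradley-Terry model that turns two trajectory returns into a pairwise preference probability. The features (altitude error in feet, roll angle in degrees), thresholds, and leaf rewards are invented for illustration only.

```python
import math

def reward_tree(state):
    """Score one state with axis-aligned rules, returning (reward, rationale).
    Hypothetical thresholds: 50 ft altitude tolerance, 10 deg roll tolerance."""
    if abs(state["alt_error"]) < 50.0:      # close to target altitude
        if abs(state["roll"]) < 10.0:       # wings close to level
            return 1.0, "on altitude, wings level"
        return 0.5, "on altitude, but banked"
    return 0.0, "off altitude"

def trajectory_return(traj):
    """Sum the tree's per-state rewards over a trajectory (list of state dicts)."""
    return sum(reward_tree(s)[0] for s in traj)

def preference_prob(traj_a, traj_b):
    """Bradley-Terry probability that a rater prefers traj_a over traj_b,
    given the tree's returns for each trajectory."""
    r_a, r_b = trajectory_return(traj_a), trajectory_return(traj_b)
    return 1.0 / (1.0 + math.exp(r_b - r_a))

# Usage: a well-flown trajectory should be preferred over a poorly flown one.
good = [{"alt_error": 10.0, "roll": 2.0}] * 3
bad = [{"alt_error": 200.0, "roll": 30.0}] * 3
print(trajectory_return(good))        # 3.0
print(preference_prob(good, bad))     # ~0.95
```

In the paper's loop, preferences like these (from synthetic or human raters) would be used to refit the tree's splits and leaf values, whereas this sketch hard-codes the tree purely to show how interpretable per-state rationales accompany each score.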
Original language: English
Title of host publication: AIAA SCITECH 2024 Forum
Place of Publication: Orlando, FL
Publisher: American Institute of Aeronautics and Astronautics Inc. (AIAA)
Number of pages: 17
ISBN (Electronic): 9781624107115
DOIs
Publication status: Published - 4 Jan 2024
Event: 2024 AIAA SciTech Forum - Orlando, United States
Duration: 8 Jan 2024 - 12 Jan 2024
https://www.aiaa.org/scitech

Conference

Conference: 2024 AIAA SciTech Forum
Abbreviated title: SciTech
Country/Territory: United States
City: Orlando
Period: 8/01/24 - 12/01/24

Bibliographical note

Publisher Copyright:
© 2024 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.

Keywords

  • cs.AI
