Abstract
This thesis presents a neurosymbolic framework called Symplex (Symbolically-constrained Policy Learning from Exploration/Examples), which combines policy optimisation with example-based user interaction designed to elicit additional preferences that end users find easier to illustrate than to formalise directly. To achieve this, Symplex interleaves a symbolic system based on interactive Inductive Logic Programming (ILP), which learns user preferences as first-order logic constraints derived from example demonstrations, with a neural system based on Deep Q-learning (DQL), which learns near-optimal policies subject to those constraints.
The core contribution of Symplex lies in its ability to satisfy three key properties: (i) learned constraints take the form of first-order symbolic clauses, making them inherently human-interpretable and generalisable across different environment configurations; (ii) users are only required to provide examples of desirable behaviour, rather than formal specifications, and can use interactive feedback mechanisms to refine constraints, resolve conflicts and provide counter-examples; (iii) constraints are encoded as low-level state-action penalties in the DQL reward function, so they can be overridden to account for newly provided user demonstrations. Additionally, to support practical use and broaden accessibility to a wider user population, Symplex includes a visual interface designed to allow users without expertise in ILP to guide and refine constraint induction.
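As a purely illustrative sketch of the third property, the snippet below shows one way a learned constraint could be folded into a reward signal as a state-action penalty. It is not taken from the thesis: the function `shaped_reward`, the predicate `runs_red_light`, the dictionary-based state and the penalty value of 10.0 are all hypothetical choices made for this example.

```python
from typing import Any, Callable, Iterable

# Hypothetical type: a constraint returns True when (state, action) violates it.
Constraint = Callable[[Any, Any], bool]


def shaped_reward(
    base_reward: float,
    state: Any,
    action: Any,
    constraints: Iterable[Constraint],
    penalty: float = 10.0,  # assumed value, chosen only for illustration
) -> float:
    """Subtract a fixed penalty for every constraint the state-action pair violates."""
    violations = sum(1 for violates in constraints if violates(state, action))
    return base_reward - penalty * violations


def runs_red_light(state: dict, action: str) -> bool:
    """Hypothetical traffic-domain constraint: do not drive forward on a red light."""
    return state["light"] == "red" and action == "forward"


# With the constraint active, the violating action is penalised.
penalised = shaped_reward(1.0, {"light": "red"}, "forward", [runs_red_light])

# Dropping the constraint (for example after a contradicting demonstration)
# restores the unpenalised reward, which is the sense in which it is defeasible.
overridden = shaped_reward(1.0, {"light": "red"}, "forward", [])
```

Because the penalty lives in the reward rather than in a hard action mask, removing or revising a constraint only changes the reward seen in later training steps, which is one plausible reading of how such constraints could remain overridable.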
Experimental results show that Symplex outperforms existing approaches in both efficiency and accuracy when learning hard constraints in a simulated traffic domain, and that its support for defeasibility enables continued adaptation even when new demonstrations contradict previously learned rules. In a benchmark Pacman environment, Symplex successfully learned logical constraints and associated policies, achieving compliance comparable to prior work that required manual specification. In the same setting, Symplex's interactive ILP mechanisms are shown to accelerate convergence to more accurate and efficient rule sets. Finally, a formal user study verified that these improvements generalise to real-world users and suggested that the visual interface enhances the usability of interactive ILP, supporting more effective, user-driven constraint refinement.
| Date of Award | 9 Dec 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Oliver Ray (Supervisor) |