
Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

Yoav Alon, Cristina David

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)


Abstract

Large Language Models (LLMs) were shown to struggle with long-term planning, which may be caused by the limited way in which they explore the space of possible solutions. We propose an architecture where a Reinforcement Learning (RL) Agent guides an LLM's space exploration: (1) the Agent has access to domain-specific information, and can therefore make decisions about the quality of candidate solutions based on specific and relevant metrics, which were not explicitly considered by the LLM's training objective; (2) the LLM can focus on generating immediate next steps, without the need for long-term planning. We allow non-linear reasoning by exploring alternative paths and backtracking. We evaluate this architecture on the program equivalence task, and compare it against Chain of Thought (CoT) and Tree of Thoughts (ToT). We assess both the downstream task, denoting the binary classification, and the intermediate reasoning steps. Our approach compares positively against CoT and ToT.
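
The division of labour described in the abstract can be illustrated with a toy search loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `propose` is a hypothetical stand-in for the LLM that only suggests immediate next steps, `score` is a hypothetical hand-written heuristic standing in for the RL Agent's learned, domain-specific judgement, and the arithmetic puzzle replaces the program equivalence task purely for illustration. The search tries the best-scored candidate first and backtracks when a path dead-ends.

```python
from typing import Callable, List, Optional, Tuple

Path = Tuple[str, ...]  # a reasoning path: the sequence of steps taken so far


def apply_steps(path: Path, start: int = 1) -> int:
    """Toy domain (assumption): each step is '+3' or '*2', applied to an integer."""
    value = start
    for step in path:
        value = value + 3 if step == "+3" else value * 2
    return value


def guided_search(
    propose: Callable[[Path], List[str]],  # stand-in for the LLM: next-step candidates only
    score: Callable[[Path], float],        # stand-in for the agent's quality estimate
    is_goal: Callable[[Path], bool],
    max_depth: int = 4,
) -> Optional[Path]:
    """Depth-first search that follows the best-scored candidate and backtracks on failure."""

    def expand(path: Path) -> Optional[Path]:
        if is_goal(path):
            return path
        if len(path) >= max_depth:
            return None  # dead end: the caller moves on to its next candidate
        candidates = [path + (step,) for step in propose(path)]
        # The agent ranks the candidates; the generator never plans ahead.
        for candidate in sorted(candidates, key=score, reverse=True):
            found = expand(candidate)
            if found is not None:
                return found
        return None  # every alternative failed: backtrack further up

    return expand(())


TARGET = 10  # hypothetical goal value for the toy puzzle


def propose(path: Path) -> List[str]:
    return ["+3", "*2"]


def score(path: Path) -> float:
    return -abs(TARGET - apply_steps(path))  # closer to the target scores higher


def is_goal(path: Path) -> bool:
    return apply_steps(path) == TARGET


solution = guided_search(propose, score, is_goal)
print(solution, apply_steps(solution) if solution is not None else None)
```

In this sketch the ranking is greedy and fixed. In a learned setting the scoring function would be trained from reward rather than written by hand, which is the part this example deliberately leaves out.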
Original language: English
Title of host publication: FSE '25
Subtitle of host publication: ACM International Conference on the Foundations of Software Engineering (FSE)
Publisher: Association for Computing Machinery
Pages: 957-977
Number of pages: 21
DOIs
Publication status: Published - 1 Jul 2025
Event: The ACM International Conference on the Foundations of Software Engineering (FSE) 2025 - Trondheim, Norway
Duration: 23 Jul 2025 to 27 Jul 2025
https://conf.researchr.org/home/fse-2025

Publication series

Name: Proceedings of the ACM on Software Engineering
Publisher: ACM
Number: FSE
Volume: 2
ISSN (Electronic): 2994-970X

Conference

Conference: The ACM International Conference on the Foundations of Software Engineering (FSE) 2025
Abbreviated title: FSE 2025
Country/Territory: Norway
City: Trondheim
Period: 23/07/25 to 27/07/25
Internet address
