Scalable, Validated Code Translation of Entire Projects using Large Language Models

Hanliang Zhang, Cristina David, Meng Wang, Brandon Paulsen, Daniel Kroening

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, by checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 9,700 lines of code and 780 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work. An artifact for our work can be found at: https://zenodo.org/records/15049238.
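
As a rough illustration of what validating a translated fragment by I/O equivalence can look like, the sketch below runs a reference function and a candidate translation on the same inputs and reports the first input on which they disagree. It is not the authors' tool: every name here (`wrapping_add_i32`, `naive_add`, `sample_inputs`, `find_counterexample`) is hypothetical, and both "source" and "translation" are modelled as Python callables purely for the sake of a self-contained example. The 32-bit wraparound case is chosen as an assumed instance of a language feature without a direct one-to-one mapping.

```python
import random
from typing import Callable, List, Optional, Tuple

Args = Tuple[int, int]


def wrapping_add_i32(a: int, b: int) -> int:
    """Reference behaviour (assumed): 32-bit signed addition that wraps on overflow."""
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s


def naive_add(a: int, b: int) -> int:
    """Hypothetical faulty translation that ignores the wraparound."""
    return a + b


def sample_inputs(n: int, seed: int = 0) -> List[Args]:
    """Boundary values first, then n pseudo-random pairs (fixed seed for repeatability)."""
    rng = random.Random(seed)
    lo, hi = -2**31, 2**31 - 1
    edge = [lo, -1, 0, 1, hi]
    cases = [(a, b) for a in edge for b in edge]
    cases += [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]
    return cases


def find_counterexample(
    source: Callable[[int, int], int],
    translated: Callable[[int, int], int],
    inputs: List[Args],
) -> Optional[Args]:
    """Return the first input on which the two functions disagree, or None."""
    for args in inputs:
        if source(*args) != translated(*args):
            return args
    return None


if __name__ == "__main__":
    inputs = sample_inputs(100)
    print(find_counterexample(wrapping_add_i32, wrapping_add_i32, inputs))
    print(find_counterexample(wrapping_add_i32, naive_add, inputs))
```

Agreement on sampled inputs is only evidence of equivalence, not a proof; a counterexample, on the other hand, pinpoints a single small fragment to repair rather than the whole project.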
Original language: English
Title of host publication: Proceedings of the Conference on Programming Language Design and Implementation
Volume: PLDI
Publication status: Accepted/In press - 19 Apr 2025
Event: 46th ACM SIGPLAN Conference on Programming Language Design and Implementation - Westin Josun Seoul, Seoul, Korea, Republic of
Duration: 16 Jun 2025 to 20 Jun 2025
https://pldi25.sigplan.org/

Publication series

Name: Proceedings of the ACM on Programming Languages
Publisher: Association for Computing Machinery (ACM)
ISSN (Print): 2475-1421

Conference

Conference: 46th ACM SIGPLAN Conference on Programming Language Design and Implementation
Abbreviated title: PLDI 2025
Country/Territory: Korea, Republic of
City: Seoul
Period: 16/06/25 to 20/06/25

Research Groups and Themes

  • Programming Languages
