Abstract
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, by checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 9,700 lines of code and 780 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work. An artifact for our work can be found at: https://zenodo.org/records/15049238.
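As a rough illustration of what validating a translated fragment by I/O equivalence can look like, the following Python sketch runs a reference function and a candidate translation on the same sample inputs and reports any disagreements. It is a sketch under stated assumptions, not the authors' tool: the names `io_equivalent`, `source_div`, `naive_translation`, and `mapped_translation` are hypothetical, and truncating versus flooring integer division is used only as a stand-in for a language feature that lacks a direct counterpart in the target language.

```python
from typing import Any, Callable, Iterable, List, Tuple


def io_equivalent(
    reference: Callable[..., Any],
    candidate: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> List[Tuple[Any, Any, Any]]:
    """Return (args, expected, actual) for every sample where the outputs differ.

    An empty list means the two functions agree on all sampled inputs;
    it does not prove equivalence on inputs outside the sample.
    """
    mismatches = []
    for args in inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash in the candidate counts as a mismatch
            mismatches.append((args, expected, repr(exc)))
            continue
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


def source_div(a: int, b: int) -> int:
    """Hypothetical source-language behaviour: division truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def naive_translation(a: int, b: int) -> int:
    """Hypothetical naive translation: floor division, wrong when signs differ."""
    return a // b


def mapped_translation(a: int, b: int) -> int:
    """Hypothetical translation following a predefined rule for this feature."""
    return -(-a // b) if (a < 0) != (b < 0) else a // b


samples = [(7, 2), (-7, 2), (7, -2), (-7, -2)]
naive_report = io_equivalent(source_div, naive_translation, samples)
mapped_report = io_equivalent(source_div, mapped_translation, samples)
```

Here the naive candidate disagrees with the reference on the two mixed-sign samples, while the rule-guided candidate agrees on all four. Checking each small fragment in this way keeps a failure local to one function rather than to the whole program.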
Original language | English |
---|---|
Title of host publication | Proceedings of the Conference on Programming Language Design and Implementation |
Volume | PLDI |
Publication status | Accepted/In press - 19 Apr 2025 |
Event | 46th ACM SIGPLAN Conference on Programming Language Design and Implementation - Westin Josun Seoul, Seoul, Korea, Republic of. Duration: 16 Jun 2025 → 20 Jun 2025. https://pldi25.sigplan.org/ |
Publication series
Name | Proceedings of the ACM on Programming Languages |
---|---|
Publisher | Association for Computing Machinery (ACM) |
ISSN (Print) | 2475-1421 |
Conference
Conference | 46th ACM SIGPLAN Conference on Programming Language Design and Implementation |
---|---|
Abbreviated title | PLDI 2025 |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 16/06/25 → 20/06/25 |
Internet address | https://pldi25.sigplan.org/ |
Research Groups and Themes
- Programming Languages