Abstract
Deep learning (DL) compilers are essential infrastructure for optimizing DL models for efficient execution across heterogeneous hardware. Like traditional compilers, they are bug-prone. However, not all bug reports submitted to DL compiler repositories reflect genuine bugs. Many are false-positive bug reports caused by incorrect configurations or user misunderstandings. These reports can mislead developers, waste debugging resources, and delay critical bug fixes. This paper presents the first comprehensive study of false-positive bug reports in DL compilers, analyzing 1,075 closed issues and discussions from two representative systems: TVM and OpenVINO. We find that false-positive bug reports demand substantial developer effort; occur throughout the compiler workflow, especially during the build, import, and IR transformation stages; and frequently result from incorrect environment configuration, incorrect usage, or misunderstanding of compiler features or limitations. To address this challenge, we further investigate the potential of large language models (LLMs) to automatically mitigate false-positive bug reports. Through extensive experiments, we find that few-shot prompting achieves promising performance, with strong accuracy and explanation quality. Our study sheds light on an overlooked yet important category of compiler issues and demonstrates the potential of LLMs in supporting more efficient bug report triage in DL compilers.
| Original language | English |
|---|---|
| Journal | ACM Transactions on Software Engineering and Methodology |
| Early online date | 6 Nov 2025 |
| DOIs | |
| Publication status | E-pub ahead of print - 6 Nov 2025 |
Research Groups and Themes
- Programming Languages