Abstract
Chain-of-Thought (CoT) guides large language models to reason step-by-step, yielding remarkable performance gains across diverse tasks. However, this structured reasoning process also introduces novel and underexplored security risks. In this paper, we present an in-depth analysis of fine-tuning attacks targeting CoT-enabled LLMs, with particular focus on "aha moments" during reasoning, which are critical intermediate steps the model takes to make a significant decision or change its behavior. Through experiments on six CoT models and three non-CoT baselines, we find that even aligned CoT models can be more harmful than their base models. Moreover, the reasoning process frequently contains more harmful and actionable content than the final answer, even when the final answer refuses a harmful request. By examining the causal relationship between the reasoning process and the final outputs, we identify two distinct failure modes, Unintentional Leakage and Harmful Escalation, that systematically drive the generation of harmful reasoning. To rigorously assess these risks, we propose an evaluation framework grounded in the EU AI Act and construct a policy-aligned benchmark dataset for CoT reasoning. Our findings expose inherent vulnerabilities in CoT and offer insights for supervising and aligning the reasoning process in LLMs.
| Original language | English |
|---|---|
| Title of host publication | ASIA CCS '26: 21st ACM Asia Conference on Computer and Communications Security |
| Publisher | Association for Computing Machinery |
| Number of pages | 18 |
| Publication status | Accepted/In press - 20 Nov 2025 |
| Event | 21st ACM ASIA Conference on Computer and Communications Security, Bangalore, India. Duration: 1 Jun 2026 → 5 Jun 2026. https://asiaccs2026.cse.iitkgp.ac.in/ |
Conference
| Conference | 21st ACM ASIA Conference on Computer and Communications Security |
|---|---|
| Abbreviated title | ACM ASIACCS 2026 |
| Country/Territory | India |
| City | Bangalore |
| Period | 1/06/26 → 5/06/26 |
| Internet address | https://asiaccs2026.cse.iitkgp.ac.in/ |
Research Groups and Themes
- Cyber Security
Keywords
- LLMs
- Safe AI