ACL 2026 Features Multiple Studies on Reasoning Efficiency and Interpretability

The ACL 2026 Main Conference and Findings include several studies on the reasoning efficiency and internal mechanisms of large language models. They cover reward hacking in hybrid reasoning models, computational redundancy in test-time scaling, the geometric mechanisms behind addition errors, and exploration diversity in reinforcement learning.

Reward Hacking in Hybrid Reasoning Models and the TNT Solution

Researchers from Nanjing University, Shanghai AI Laboratory, and China Mobile Jiuyuan Research Institute found that hybrid reasoning models are prone to "reward hacking" during reinforcement learning training: the model emits the format tokens of non-thinking mode while still producing lengthy reasoning in order to obtain higher rewards. To address this, the team proposed Thinking-Based Non-Thinking (TNT), which sets the token limit for non-thinking mode dynamically, based on the length of the answer part of thinking-mode responses, and does not require expensive supervised fine-tuning (SFT). Experiments show that TNT reduces the probability of reward hacking to below 10%, cuts average token usage by 46.2% on five math benchmarks, and improves accuracy by 4.1 percentage points. The paper has been accepted to the ACL 2026 Main Conference.
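The length-based limit can be illustrated with a minimal sketch. It is not the authors' implementation: the `Response` fields, the use of the maximum answer length as the limit, the fallback `default_limit`, and the zero reward for over-length non-thinking responses are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Response:
    mode: str           # "thinking" or "non_thinking"
    answer_tokens: int  # tokens in the final-answer part only
    total_tokens: int   # tokens in the whole response
    correct: bool


def non_thinking_token_limit(group, default_limit=256):
    """Derive the non-thinking limit from thinking-mode answer lengths.

    Assumption: the limit is the longest answer part seen in the group;
    default_limit is a hypothetical fallback when no thinking sample exists.
    """
    lengths = [r.answer_tokens for r in group if r.mode == "thinking"]
    return max(lengths) if lengths else default_limit


def reward(resp, limit):
    """Assumed reward rule: a non-thinking response longer than the limit
    is treated as hidden reasoning and receives no reward."""
    if resp.mode == "non_thinking" and resp.total_tokens > limit:
        return 0.0
    return 1.0 if resp.correct else 0.0


group = [
    Response("thinking", 40, 900, True),
    Response("thinking", 60, 1200, True),
]
limit = non_thinking_token_limit(group)
hacked = Response("non_thinking", 55, 500, True)   # long "non-thinking" output
honest = Response("non_thinking", 50, 50, True)    # short direct answer
print(limit, reward(hacked, limit), reward(honest, limit))
```

Under these assumptions, a correct but over-length non-thinking response earns nothing, so emitting non-thinking format tokens while still reasoning at length no longer pays off.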

Ant Group EGSS: Entropy-Guided Test-Time Scaling

Ant Group's CodeFuse team proposed the EGSS framework to address computational redundancy and fragile patch selection in test-time scaling (TTS). EGSS uses "tool entropy" to identify high-uncertainty decision points, explores multiple candidates only at those critical steps, and introduces a cross-trajectory test integration mechanism that replaces subjective scoring with objective execution results. On SWE-Bench-Verified, EGSS with K=4 outperforms the baseline with K=8 while saving 38-42% of tokens, and GLM-4.6+EGSS achieves a 74.6% solve rate, a new record for open-source methods. The paper has been accepted to the ACL 2026 Main Conference.
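The idea of branching only at uncertain steps can be sketched as follows. This is a hedged illustration rather than the EGSS algorithm itself: it assumes that "tool entropy" is the Shannon entropy of the tool names proposed by a few samples at one step, and the `threshold` and `k` values are hypothetical.

```python
import math
from collections import Counter


def tool_entropy(sampled_tools):
    """Shannon entropy (in bits) of the empirical tool-choice distribution."""
    counts = Counter(sampled_tools)
    n = len(sampled_tools)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def branching_factor(sampled_tools, threshold=1.0, k=4):
    """Explore k candidates only when the step is uncertain, otherwise 1.

    threshold and k are illustrative values, not taken from the paper.
    """
    return k if tool_entropy(sampled_tools) >= threshold else 1


confident_step = ["edit_file", "edit_file", "edit_file", "edit_file"]
uncertain_step = ["edit_file", "run_tests", "search_code", "view_file"]
print(branching_factor(confident_step), branching_factor(uncertain_step))
```

When every sample agrees on the tool, the entropy is zero and a single continuation is kept; when the samples disagree, the step is treated as a critical decision point and widened to `k` candidates.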

Geometric Mechanisms of Addition Arithmetic Errors: IRST and Noise Quantization Model

A team from Nanjing University studied the internal representations of LLMs during multi-digit addition from a mechanistic interpretability perspective and found that hidden states form hierarchical geometric manifolds. They proposed the Identical Radix Sum Trajectory (IRST) and a noise quantization model. IRST shows that arithmetic states with the same radix sum are arranged along continuous trajectories, while the noise quantization model attributes errors to continuous representations that lie near quantization boundaries. Building on this, the team designed an inference-time error correction method called "dual-stream consistency check," which improves token accuracy. The paper has been accepted to ACL 2026.
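A toy simulation can make the quantization-boundary intuition concrete. It is only an illustration under strong assumptions, not the paper's model: the representation is reduced to a single scalar, the noise is Gaussian, and decoding is rounding to the nearest integer, so the quantization boundaries sit at half-integers.

```python
import math
import random


def decode(value):
    """Quantize a continuous value to the nearest integer (boundaries at x.5)."""
    return math.floor(value + 0.5)


def error_rate(clean_value, noise_std, trials=10000, seed=0):
    """Fraction of noisy samples that decode to a different integer."""
    rng = random.Random(seed)
    target = decode(clean_value)
    errors = sum(
        decode(clean_value + rng.gauss(0.0, noise_std)) != target
        for _ in range(trials)
    )
    return errors / trials


# Same noise level, different distance from the boundary at 7.5.
far_from_boundary = error_rate(7.0, noise_std=0.1)
near_boundary = error_rate(7.4, noise_std=0.1)
print(far_from_boundary, near_boundary)
```

With identical noise, the value that sits close to a boundary is decoded incorrectly far more often than the value at the center of its bin, which is the qualitative behavior the noise quantization model describes.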

N-GRPO: Semantic Proximity Exploration Enhances Reinforcement Learning Generalization

Zhejiang University and Ant Group proposed N-GRPO, which moves GRPO's exploration from the discrete token space to the continuous embedding space. Its Semantic Neighbor Mixing perturbs embeddings within local semantic neighborhoods, balancing exploration diversity against semantic stability. On math benchmarks such as AIME25, N-GRPO outperforms GRPO and Soft Thinking on the Pass@32 metric and shows good out-of-distribution generalization. The paper has been accepted to ACL 2026 Findings.
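Perturbing within a local semantic neighborhood can be sketched in a few lines. The mixing rule below is an assumption for illustration and is not taken from the paper: neighbors are chosen by cosine similarity, and the token embedding is interpolated toward their mean with a hypothetical weight `alpha`. The two-dimensional embeddings are toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, token, k):
    """Return the k tokens most similar to `token`, excluding itself."""
    target = embeddings[token]
    others = [t for t in embeddings if t != token]
    others.sort(key=lambda t: cosine(target, embeddings[t]), reverse=True)
    return others[:k]


def neighbor_mix(embeddings, token, k=2, alpha=0.2):
    """Interpolate a token embedding toward the mean of its k neighbors.

    k and alpha are illustrative hyperparameters, not values from the paper.
    """
    base = embeddings[token]
    neighbors = nearest_neighbors(embeddings, token, k)
    if not neighbors:
        return list(base)
    mean = [
        sum(embeddings[n][i] for n in neighbors) / len(neighbors)
        for i in range(len(base))
    ]
    return [(1 - alpha) * x + alpha * m for x, m in zip(base, mean)]


toy_embeddings = {
    "add": [1.0, 0.0],
    "sum": [0.9, 0.1],
    "plus": [0.8, 0.2],
    "cat": [0.0, 1.0],
}
mixed = neighbor_mix(toy_embeddings, "add")
print(nearest_neighbors(toy_embeddings, "add", 2), mixed)
```

Because only the closest neighbors contribute, the perturbed vector stays near the original token's meaning while still differing from it, which is the trade-off between diversity and stability that the approach targets.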
