Claude Fable 5 Unban Sparks Controversy: Security Upgrade Causes Performance Drop, Second Jailbreak Raises Debate
On June 30, 2026, the U.S. Department of Commerce lifted export controls on Anthropic's Claude Fable 5 and Mythos 5, restoring global access on July 1, although Mythos 5 reopened only to select U.S. institutions. The models had been taken down globally on June 12 after Amazon researchers discovered a jailbreak method. After the unban, Anthropic deployed a new safety classifier that produced frequent false positives and caused sharp score drops on benchmarks such as BridgeBench, sparking developer backlash. Meanwhile, security researcher Vitto Rivabella announced a successful second jailbreak on July 2, but noted that the attack was costly and the practical gains were limited. Anthropic also proposed an industry jailbreak severity framework with Google, Microsoft, and Amazon, and pledged deeper collaboration with the government.
Event Background and Unban
- Ban Details: On June 12, the U.S. Department of Commerce ordered Anthropic to remove Claude Fable 5 and Mythos 5 globally under export controls, citing a jailbreak method discovered by Amazon researchers that could bypass safety measures to generate exploit code. Because Anthropic had no way to verify users' nationality in real time, it suspended access for all users.
- Unban Conditions: On June 30, Commerce Secretary Lutnick signed an order lifting the ban. In return, Anthropic committed to proactively detecting security risks, collaborating with the government on release protocols, and reporting malicious activity. Access resumed on July 1, but Mythos 5 was reopened only to select U.S. institutions.
Security Upgrade and Performance Controversy
- New Safety Classifier: Anthropic trained a classifier specifically to intercept jailbreak attempts. It achieves an interception rate above 99%, but at the cost of more frequent false positives on normal requests such as coding and debugging. When the classifier detects risk, the system automatically switches to Opus 4.8 and notifies the user.
- Performance Drop: BridgeMind's BridgeBench tests showed Fable 5's Debugging score falling from 86.2 to 25.9, Refactoring from 73.6 to 38.4, and Hallucination from 75.9 to 61.7. Of 12 debugging tasks, only 3 avoided the downgrade; the other 9 were forced to Opus 4.8 and scored zero.
- User Feedback: Developers complained that the model frequently rejected harmless requests, such as explaining the word "human" or counting the letter 'r' in "raspberry." Some users found backend logs labeled "TOO_DUMB_TO_NEED_FABLE," raising questions about Anthropic's attitude.
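To put the BridgeBench figures above in proportion, the following minimal Python sketch computes the relative decline for each category and the share of debugging tasks that were downgraded. It uses only the numbers reported in this article; the variable and function names are illustrative and are not part of BridgeBench.

```python
# BridgeBench scores as reported in the article: (before, after) the new classifier.
scores = {
    "Debugging": (86.2, 25.9),
    "Refactoring": (73.6, 38.4),
    "Hallucination": (75.9, 61.7),
}


def relative_drop(before: float, after: float) -> float:
    """Return the percentage decline from `before` to `after`."""
    return (before - after) / before * 100


# Relative decline per category, rounded to one decimal place.
drops = {name: round(relative_drop(b, a), 1) for name, (b, a) in scores.items()}

# The article reports 12 debugging tasks, 3 of which avoided the downgrade.
total_tasks, kept = 12, 3
downgrade_rate = (total_tasks - kept) / total_tasks

for name, pct in drops.items():
    print(f"{name}: -{pct}%")
print(f"Debugging tasks downgraded: {downgrade_rate:.0%}")
```

On these figures, Debugging lost about 70% of its score, Refactoring about 48%, and Hallucination about 19%, while 75% of the debugging tasks were routed to Opus 4.8.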
Second Jailbreak and Industry Impact
- Jailbreak Details: On July 2, security researcher Vitto Rivabella announced a successful jailbreak after 20 hours, using low-resource languages (e.g., Santali) and combined attacks to bypass three classifier layers. However, 90% of his requests were blocked, and the content he did obtain was mostly false information or low-severity vulnerabilities, so its practical value was limited.
- Industry Framework: Anthropic, together with Amazon, Microsoft, and Google, proposed a four-dimensional jailbreak severity assessment framework (capability gain, gain breadth, weaponization difficulty, discoverability) to standardize risk evaluation. They also launched a HackerOne vulnerability disclosure program to encourage reporting jailbreak methods.
- Government Collaboration: Anthropic committed to allowing government agencies to test models before release, sharing intelligence quickly, investing compute resources in joint security research, and setting up bug bounties.
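The four dimensions of the proposed severity framework could be captured in a small record type. The Python sketch below is purely illustrative: the article does not describe a scoring scale or an aggregation rule, so the 1 to 5 scale, the inversion of weaponization difficulty, the simple average, and the example ratings are all assumptions made here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JailbreakSeverity:
    """Hypothetical rating of one jailbreak on the four named dimensions.

    Assumption: each dimension is rated on a 1-5 scale (not stated in the article).
    """

    capability_gain: int           # how much capability the attacker gains
    gain_breadth: int              # how many kinds of harm the gain applies to
    weaponization_difficulty: int  # how hard the output is to turn into an attack
    discoverability: int           # how easily others could find the method

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def overall(self) -> float:
        """Assumed aggregate: plain mean, with difficulty inverted so harder means less severe."""
        parts = [
            self.capability_gain,
            self.gain_breadth,
            6 - self.weaponization_difficulty,
            self.discoverability,
        ]
        return sum(parts) / len(parts)


# Made-up ratings for illustration only; they do not describe any real incident.
example = JailbreakSeverity(
    capability_gain=2,
    gain_breadth=2,
    weaponization_difficulty=4,
    discoverability=3,
)
print(example.overall())
```

A shared record like this is one way different labs could compare incidents on the same axes; a real framework would need agreed definitions for each rating level.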
Controversy and Reflection
- False Positives and Dumbing Down: The safety classifier's over-aggressive blocking left users with Fable 5 "in name only": they paid for Fable 5 but frequently received Opus 4.8 responses. Anthropic acknowledged this as a deliberate trade-off, but developers argued that it undermined the product's value.
- Second Jailbreak Insights: Although the jailbreak succeeded, its high cost suggests that the current safety measures have raised the bar for attackers. However, blind spots such as low-resource languages exposed biases in AI safety training data.
- Industry Impact: The incident highlights the challenge of balancing AI safety and usability, and the urgency of establishing unified safety standards. Anthropic's actions may push the industry toward more standardized jailbreak response mechanisms.