Models | Jul 5, 2026
Fable 5 Returns to Mixed Reception: Overly Strict Safety Guardrails Cause Score Plunge, Developers Question 'Bait and Switch'
Anthropic re-released its strongest model, Fable 5, on July 1, but the release quickly drew widespread negative feedback. Developers report that the model frequently triggers its safety guardrails, which forces requests to be downgraded to the weaker Opus 4.8 and significantly degrades the experience.
Score Plunge and the 'Downgrade' Truth
- The BridgeMind benchmark shows that the re-released Fable 5's Debugging score dropped from 86.2 to 25.9 (a 70% decline), Refactoring fell from 73.6 to 38.4, and Hallucination fell from 75.9 to 61.7.
- Further analysis reveals that of the 12 Debugging tasks, only 3 were completed by Fable 5; the remaining 9 were intercepted by the safety classifier and routed to Opus 4.8, which scores zero under the benchmark's rules. BridgeMind notes: 'The model itself hasn't weakened, but the guardrails prevent most tasks from being executed.'
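The relative declines behind these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the scores and task counts are the ones quoted above, while the variable and function names are made up for this example and are not part of any BridgeMind tooling.

```python
# Scores as quoted in the article: (before, after) per BridgeMind category.
scores = {
    "Debugging": (86.2, 25.9),
    "Refactoring": (73.6, 38.4),
    "Hallucination": (75.9, 61.7),
}


def relative_decline(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100


declines = {name: relative_decline(b, a) for name, (b, a) in scores.items()}

# Debugging routing figures quoted in the article: 12 tasks, 3 completed by Fable 5.
total_tasks = 12
completed_by_fable = 3
routed_share = (total_tasks - completed_by_fable) / total_tasks * 100

for name, pct in declines.items():
    print(f"{name}: {pct:.1f}% decline")
print(f"Debugging tasks routed to Opus 4.8: {routed_share:.0f}%")
```

Run as written, this gives roughly a 70% decline for Debugging, 48% for Refactoring, and 19% for Hallucination, with 75% of the Debugging tasks routed away from Fable 5.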
Safety Guardrail 'False Positives' and Double Standards
- Anthropic acknowledged in an official blog post that the new classifier is 'deliberately broad' and will block many harmless requests. User feedback includes:
  - Simple questions, such as explaining the word 'human' or asking 'how many r's in raspberry', are blocked.
  - A request from an ecology PhD researching 'tree cooling' was deemed unsafe, while a request to 'design a drone swarm' passed through.
  - System logs even show a 'TOO_DUMB_TO_NEED_FABLE' tag, which users read as implying that they 'don't deserve Fable 5'.
  - Some users found that Fable 5 outputs 'inner monologues' such as 'GRRR' and 'GAAAH' during background thinking, which they interpret as the model's own stress response.
Second Jailbreak and Safety Controversy
- Just two days after the return, security researcher Vitto Rivabella announced a successful jailbreak of Fable 5. The effort took about 20 hours, and he noted that 'direct Google search is faster and cheaper.'
- The jailbreak combined several attacks and exploited low-resource languages such as Santali, but it yielded only 'marginal' information, such as false data, and did not touch core red lines. Anthropic classified the jailbreak as 'minor.'
- Previously, Fable 5 faced a global ban after Amazon's team discovered a jailbreak vulnerability. This time, Anthropic launched a HackerOne bounty program.
User and Developer Reactions
- Developers complain about 'paying for Fable 5 but getting Opus 4.8 service.' One bill shows a $321 programming session in which 75% of the workload was transferred to Opus 4.8.
- Some users question Fable 5's authenticity, suspecting that it is merely a 'skin' for Opus 4.8. Anthropic responds that the model's capabilities have not shrunk, but that the trade-off for a wider safety margin leads to false positives.
- Despite the controversy, Fable 5 still demonstrates strong capabilities when not limited by guardrails, such as rebuilding a 3D model of New York City in 20 minutes or generating a complete game for $173.
Subsequent Impact
- Anthropic announced that Fable 5 will be removed from subscription plans after July 7 and offered on a pay-per-use basis, though the company plans to restore it as a standard subscription component as soon as possible.
- The company, along with Amazon, Microsoft, Google, and others, proposed an 'AI Jailbreak Severity Assessment Framework' to establish industry safety standards.