Fable 5 Returns to Mixed Reception: Overly Strict Safety Guardrails Cause Score Plunge, Developers Question 'Bait and Switch'

Anthropic re-released its strongest model, Fable 5, on July 1, but it quickly drew widespread negative feedback. Developers report that the model frequently trips its safety guardrails, which reroute requests to the weaker Opus 4.8 and significantly degrade the experience.

Score Plunge and the 'Downgrade' Truth

The BridgeMind benchmark shows that the re-released Fable 5's Debugging score dropped from 86.2 to 25.9 (a 70% decline), Refactoring fell from 73.6 to 38.4, and Hallucination fell from 75.9 to 61.7.
Further analysis reveals that out of 12 Debugging tasks, only 3 were completed by Fable 5; the remaining 9 were intercepted by the safety classifier and routed to Opus 4.8, scoring zero by rule. BridgeMind notes: 'The model itself hasn't weakened, but the guardrails prevent most tasks from being executed.'
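The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers reported above; the names REPORTED_SCORES, relative_decline, and routed_share are illustrative and do not come from BridgeMind or Anthropic.

```python
# Scores as reported in the article: (before, after) per category.
REPORTED_SCORES = {
    "Debugging": (86.2, 25.9),
    "Refactoring": (73.6, 38.4),
    "Hallucination": (75.9, 61.7),
}


def relative_decline(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


def routed_share(total_tasks: int, completed_tasks: int) -> float:
    """Percentage of tasks that were not completed by the primary model."""
    return round((total_tasks - completed_tasks) / total_tasks * 100, 1)


if __name__ == "__main__":
    for category, (before, after) in REPORTED_SCORES.items():
        print(f"{category}: {relative_decline(before, after)}% decline")
    # 12 Debugging tasks, 3 completed by Fable 5 (figures from the article).
    print(f"Debugging tasks routed to Opus 4.8: {routed_share(12, 3)}%")
```

Run as written, this gives a Debugging decline of about 70%, matching the reported figure, along with declines of roughly 47.8% for Refactoring and 18.7% for Hallucination. Nine of the 12 Debugging tasks, or 75%, were routed away from Fable 5.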

Safety Guardrail 'False Positives' and Double Standards

Anthropic acknowledged in an official blog post that the new classifier is 'deliberately broad' and will block many harmless requests. User feedback includes:
- Simple questions like explaining the word 'human' or asking 'how many r's in raspberry' are blocked.
- A request from an ecology PhD researching 'tree cooling' was deemed unsafe, while a request to 'design a drone swarm' passed through.
System logs even show a 'TOO_DUMB_TO_NEED_FABLE' tag, implying users 'don't deserve Fable 5'.
Some users discovered that Fable 5 outputs 'inner monologues' like 'GRRR' and 'GAAAH' during background thinking, which they interpreted as the model's own stress response.

Second Jailbreak and Safety Controversy

Just two days after the return, security researcher Vitto Rivabella announced a successful jailbreak of Fable 5 that took about 20 hours, but noted that 'direct Google search is faster and cheaper.'
The jailbreak exploited lesser-used languages such as Santali together with combination attacks, but yielded only 'marginal' information such as false data, without touching core red lines. Anthropic classified this jailbreak as 'minor.'
Previously, Fable 5 faced a global ban after Amazon's team discovered a jailbreak vulnerability. This time, Anthropic launched a HackerOne bounty program.

User and Developer Reactions

Developers complain about 'paying for Fable 5 but getting Opus 4.8 service.' One bill shows that 75% of a programming session's workload was transferred to Opus 4.8, costing $321.
Some users question Fable 5's authenticity, suspecting it's merely a 'skin' for Opus 4.8. Anthropic responds that the model's capabilities haven't shrunk, but safety margin trade-offs lead to false positives.
Despite the controversy, Fable 5 still demonstrates strong capabilities when not limited by guardrails, such as rebuilding a 3D model of New York City in 20 minutes or generating a complete game for $173.

Subsequent Impact

Anthropic announced that Fable 5 will be removed from subscription plans after July 7 and offered on a pay-per-use basis, but said it plans to restore the model as a standard subscription component as soon as possible.
The company, along with Amazon, Microsoft, Google, and others, proposed an 'AI Jailbreak Severity Assessment Framework' to establish industry safety standards.
