Models · Jul 5, 2026

Fable 5 Returns to Mixed Reception: Overly Strict Safety Guardrails Cause Score Plunge, Developers Question 'Bait and Switch'

Anthropic re-released its strongest model, Fable 5, on July 1, but it quickly drew widespread negative feedback. Developers report that the model frequently trips its safety guardrails, which reroute requests to the weaker Opus 4.8 and significantly degrade the experience.

Score Plunge and the 'Downgrade' Truth

  • BridgeMind's benchmark shows that the re-released Fable 5's Debugging score dropped from 86.2 to 25.9 (a 70% decline), Refactoring from 73.6 to 38.4 (about 48%), and Hallucination from 75.9 to 61.7 (about 19%).
  • Further analysis reveals that of 12 Debugging tasks, only 3 were completed by Fable 5; the remaining 9 were intercepted by the safety classifier and routed to Opus 4.8, which scores zero under the benchmark's rules. BridgeMind notes: 'The model itself hasn't weakened, but the guardrails prevent most tasks from being executed.'
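
The percentages behind these figures can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers reported above, not BridgeMind's scoring method, which the report does not describe. The helper name `relative_decline` is hypothetical and introduced here for illustration.

```python
def relative_decline(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, relative to `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (before, after) scores as reported in the article.
reported = {
    "Debugging": (86.2, 25.9),
    "Refactoring": (73.6, 38.4),
    "Hallucination": (75.9, 61.7),
}

# Relative decline per category, rounded to one decimal place.
declines = {
    name: round(relative_decline(before, after), 1)
    for name, (before, after) in reported.items()
}

# Share of Debugging tasks routed away from Fable 5 (9 of 12, per the article).
routed_share = 9 / 12 * 100.0

for name, pct in declines.items():
    print(f"{name}: {pct}% decline")
print(f"Debugging tasks routed to Opus 4.8: {routed_share:.0f}%")
```

The Debugging drop works out to roughly 70%, matching the figure quoted above, and three quarters of the Debugging tasks never reached Fable 5 at all.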

Safety Guardrail 'False Positives' and Double Standards

  • Anthropic acknowledged in an official blog post that the new classifier is 'deliberately broad' and will block many harmless requests. User feedback includes:
    • Simple questions, such as explaining the word 'human' or asking 'how many r's in raspberry', are blocked.
    • A request from an ecology PhD researching 'tree cooling' was deemed unsafe, while a request to 'design a drone swarm' passed through.
  • System logs even show a 'TOO_DUMB_TO_NEED_FABLE' tag, implying users 'don't deserve Fable 5'.
  • Some users discovered that Fable 5 outputs 'inner monologues' like 'GRRR' and 'GAAAH' during background thinking, interpreted as the model's own stress response.

Second Jailbreak and Safety Controversy

  • Just two days after the re-release, security researcher Vitto Rivabella announced a successful jailbreak of Fable 5. It took about 20 hours, and he noted that 'direct Google search is faster and cheaper.'
  • The jailbreak combined lesser-used languages such as Santali with combination attacks, but yielded only 'marginal' information such as false data, without touching core red lines. Anthropic classified this jailbreak as 'minor.'
  • Previously, Fable 5 faced a global ban after Amazon's team discovered a jailbreak vulnerability. This time, Anthropic launched a HackerOne bounty program.

User and Developer Reactions

  • Developers complain about 'paying for Fable 5 but getting Opus 4.8 service.' One bill shows that 75% of a programming session's workload was routed to Opus 4.8, costing $321.
  • Some users question Fable 5's authenticity, suspecting it's merely a 'skin' for Opus 4.8. Anthropic responds that the model's capabilities haven't shrunk, but safety margin trade-offs lead to false positives.
  • Despite the controversy, Fable 5 still demonstrates strong capabilities when not limited by guardrails, such as rebuilding a 3D model of New York City in 20 minutes or generating a complete game for $173.

Subsequent Impact

  • Anthropic announced that Fable 5 will be removed from subscription plans after July 7 and offered on a pay-per-use basis, though the company plans to restore it as a standard subscription component as soon as possible.
  • The company, along with Amazon, Microsoft, Google, and others, proposed an 'AI Jailbreak Severity Assessment Framework' to establish industry safety standards.
