AI Safety
Anthropic Publishes Constitutional AI Safety Update: Claude 3.7 Security and Jailbreak Resistance
Anthropic has released its most comprehensive AI safety update to date, detailing Constitutional AI improvements in Claude 3.7 that reduce harmful output by 89% and jailbreak attempts by 94% compared to Claude 2. The report includes new safety benchmarks, red team findings from 200+ external researchers, and a technical specification of the Responsible Scaling Policy (RSP) thresholds that would trigger halting development of more powerful models. Anthropic also published ASL-3 requirements—the safety bar required before deploying models with potential for CBRN uplift.
2025年3月1日来源:Anthropic
AnthropicClaudeAI SafetyConstitutional AIRed TeamingRSP