AI Safety
Anthropic Publishes Updated Model Spec: New Guidelines for AI Behavior
Anthropic has released a comprehensive update to the Claude Model Spec, detailing new guidelines for handling sensitive topics, improved calibration of confidence expressions, and enhanced corrigibility principles.
May 15, 2025 | Source: Anthropic
Anthropic, AI-safety, model-spec, alignment, Claude