AI Safety
Anthropic Achieves Breakthrough in Mechanistic Interpretability Research
Anthropic researchers publish landmark paper on mechanistic interpretability, successfully mapping how Claude represents concepts internally and identifying circuits responsible for safety behaviors.
April 28, 2025 | Source: Anthropic Research
AI-safety, interpretability, Anthropic, alignment, research