AI Safety

Anthropic Achieves Breakthrough in Mechanistic Interpretability Research

Anthropic researchers have published a landmark paper on mechanistic interpretability that maps how Claude represents concepts internally and identifies circuits responsible for safety behaviors.

April 28, 2025 | Source: Anthropic Research
AI-safety, interpretability, Anthropic, alignment, research

This item comes from Anthropic Research; see the original source for the full report.