Models | Jun 18, 2026
DeepSeek Image Recognition Goes Fully Live: Can Grade Homework but Misidentifies Its Own Founder
Event Overview
Just before the 2025 Dragon Boat Festival, DeepSeek rolled out its image recognition mode to all users on its official platform and updated the mobile app at the same time. The feature had previously been limited to a staged (grayscale) rollout. Tests by multiple media outlets and users show mixed performance on visual understanding tasks: the model can handle some complex tasks, such as grading elementary school math papers, but frequently makes errors in person identification, handwritten text recognition, and logical reasoning. It even fails to recognize its own founder, Liang Wenfeng.
Key Test Results
Person Identification: Can't Recognize the Boss, but "Knows" Liu Qiangdong
- Liang Wenfeng (DeepSeek Founder): Across multiple tests, DeepSeek misidentified him as Zhang Xiaolong, Su Hua, Wang Xiaochuan, Robin Li, and others, giving a different answer each time.
- Jensen Huang: Correctly identified, but misjudged the douzhi (fermented bean drink) in his hand as milk; after switching to deep thinking mode, correctly inferred it was douzhi through reasoning.
- Liu Qiangdong: Identified with certainty, sticking to the correct answer even when users tried to mislead.
- Lei Jun: When the image was uploaded, the model warned that the request "may violate usage guidelines" and refused to identify him.
Homework Grading: Performs Well, but Logic Has Flaws
- Tested a fourth-grade math exam (including image-based questions); DeepSeek successfully recognized and graded it, correctly identifying wrong answers.
- However, on an average-height problem, the model correctly read the original average height (140.2 cm) and the new average height (139 cm) but concluded that Honghong's height was 139 cm (option B); the correct answer is below 139 cm (option A).
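The reasoning behind option A can be checked with a little arithmetic. The sketch below assumes the problem has Honghong joining a group whose average height then falls from 140.2 cm to 139 cm. The original group size `n` is not given in the source, so the code tries a few hypothetical sizes, and the function name `joiner_height` is purely illustrative.

```python
def joiner_height(n: int, old_avg: float = 140.2, new_avg: float = 139.0) -> float:
    """Height a newcomer must have to move the average of n people
    from old_avg to new_avg.

    Derived from (old_avg * n + h) / (n + 1) = new_avg.
    """
    return new_avg * (n + 1) - old_avg * n


# Hypothetical group sizes: the source does not state n.
for n in (1, 5, 10, 20):
    print(n, round(joiner_height(n), 1))
```

Under that assumption the formula simplifies to h = 139 - 1.2n, so for any group of at least one person the newcomer is shorter than 139 cm, which rules out option B.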
Handwritten Text Recognition: Low Accuracy
- Tested with messy handwritten Chinese characters (including interfering horizontal lines, connected strokes, and typos); 4 out of 7 characters were misidentified, indicating significant room for improvement in real-world handwritten text recognition.
Other Visual Tasks
- Clock Recognition: Misread a time of roughly 6:04:50 as 6:00:50 and insisted on the wrong answer.
- Artifact Recognition: Correctly identified the artifact as Mughal Empire style and provided a detailed analysis of its craftsmanship, but could not determine its specific origin.
- Find Matching Socks: Failed to correctly identify the identical pair (correct answer: first row, third sock and third row, second sock).
- Piano Chord Recognition: Given a real photo of a piano being played and asked "what chord is being played," DeepSeek gave a wrong answer (correct answer: ACE).
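For the clock item above, it can help to see how far apart the two readings are on the dial. The sketch below uses the standard geometry of a 12-hour analog clock (the minute hand moves 6 degrees per minute, the hour hand 30 degrees per hour). The function name `hand_angles` is illustrative and says nothing about how DeepSeek processes images.

```python
def hand_angles(hour: int, minute: int, second: int) -> tuple[float, float, float]:
    """Return (hour, minute, second) hand angles in degrees clockwise from 12."""
    seconds_into_hour = minute * 60 + second
    hour_angle = ((hour % 12) * 3600 + seconds_into_hour) / 120  # 360 deg per 12 h
    minute_angle = seconds_into_hour / 10                        # 360 deg per 3600 s
    second_angle = second * 6.0                                  # 360 deg per 60 s
    return hour_angle, minute_angle, second_angle


actual = hand_angles(6, 4, 50)    # approximate time shown in the test image
reported = hand_angles(6, 0, 50)  # time DeepSeek reported
print(actual[1] - reported[1])    # minute-hand gap in degrees
```

The two readings put the minute hand 24 degrees apart (four minute marks), while the hour hand differs by only about 2 degrees and the second hand not at all, which suggests the error lies in reading the minute hand.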
Reactions and Comparisons
- User Feedback: Many users on social media found that DeepSeek also misidentified well-known figures like He Tongxue. Some users joked that it "talks nonsense with a straight face."
- Competitor Comparison:
  - Doubao: Performed more accurately and faster on tasks like person identification, homework grading, and clock recognition.
  - Gemini 3.5 Flash, GPT 5.5: Both gave wrong answers on the piano chord recognition task.
  - Claude Sonnet 4.6: Refused outright to answer the piano chord recognition task.
  - Opus 4.8: Stood out by reflecting on its answer and giving correct reasoning on a Tank Battle logic problem.
Impact and Next Steps
- Technical Questions: Developers are curious whether this mode is related to DeepSeek 4.1, whether it uses native multimodal technology, and when the multimodal API will be released. DeepSeek multimodal team researcher Xiaokang Chen did not respond.
- Industry Observation: The launch of DeepSeek's image recognition feature fills a gap in its multimodal capabilities, but current accuracy is low, especially in person identification and handwritten text recognition. Technical documentation or further optimizations may follow.