JD.com Open-Source Model Tops HuggingFace Video Understanding Chart; Individual Developer Model Also Breaks into Trending

JD.com's open-source model JoyAI-VL-Interaction recently topped the HuggingFace video understanding category. The model is built around "streaming interaction": instead of passively answering questions, it actively decides when to speak and when to stay silent. According to the technical report, it achieved an overall win rate of 87.9% in human evaluations against Gemini's video call assistant, and 100% in monitoring and alert scenarios. JD.com open-sourced the 8B model, 4 million alignment interaction data entries, the training recipes, and a fully deployable system that supports ASR/TTS, long-term memory, a visual interface, and Agent bridging. The release targets use cases such as game commentary, monitoring alerts, and real-time translation.

Meanwhile, individual developer Lu Yuxin (HuggingFace ID: yuxinlu1) also broke into the top trending list, with total downloads exceeding 700,000. His Gemma4-12B GGUF models (V1 Coder and V2 Agentic) perform well on coding and Agent tasks, and V1 topped the charts for several consecutive days. The V1 model is as small as 4.5GB and can run locally on consumer-grade GPUs; V2 scores 55% on the tau2-bench telecom subset, 3.5 times the base model's score. Lu Yuxin stated that the project was entirely self-funded and took over 40 hours, and that the models were trained on an RTX 5090 with only about 10,000 data entries, emphasizing data quality over quantity. He plans to release V3 and a larger version based on Qwen3.6-27B.
