Multimodal AI
Google Gemini 1.5 Pro's Native Multimodality: 1-Hour Video Analysis in a Single API Call
Google's Gemini 1.5 Pro can process up to 1 hour of video, sampled at 1 frame per second, in a single context window, so the model reasons over the whole clip at once rather than over isolated snippets. Early benchmarks reportedly show 87% accuracy on extracting relationships from technical diagrams and 94% accuracy on extracting chart data from financial reports, significantly outperforming models that add vision as a separate bolt-on component.
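To see why one hour at 1 fps can plausibly fit in a single context window, here is a rough token-budget sketch. The per-frame token cost (258) and the 1,000,000-token window are assumptions for illustration and are not figures from this article; the function names are hypothetical, and audio tokens are ignored.

```python
# Rough token-budget estimate for video sampled at a fixed frame rate.
# ASSUMPTIONS (not from the article): each sampled frame costs 258 tokens
# and the context window holds 1,000,000 tokens. Audio is not counted.
TOKENS_PER_FRAME = 258
CONTEXT_WINDOW_TOKENS = 1_000_000


def video_frame_tokens(duration_s: int, fps: float = 1.0,
                       tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Return the estimated number of tokens used by the sampled frames."""
    frames = int(duration_s * fps)
    return frames * tokens_per_frame


def fits_in_context(duration_s: int, fps: float = 1.0,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the sampled frames alone fit inside the window."""
    return video_frame_tokens(duration_s, fps) <= window


if __name__ == "__main__":
    one_hour = 60 * 60  # 3600 seconds, so 3600 frames at 1 fps
    print(video_frame_tokens(one_hour))  # 3600 * 258 = 928800
    print(fits_in_context(one_hour))     # True under the assumed figures
```

Under these assumed figures, a one-hour clip uses roughly 93% of the window on frames alone, which leaves limited room for the prompt, any audio, and the response.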
May 18, 2025 | Source: Google DeepMind
Google, Gemini, multimodal, video-understanding, vision-AI