Baidu Open-Sources Unlimited OCR: Parses Dozens of Pages in a Single Forward Pass, Sets New SOTA on OmniDocBench

Baidu recently open-sourced Unlimited OCR, a new OCR model with 3B total parameters and 500M activated parameters. It reaches composite scores of 93.23% on OmniDocBench v1.5 and 93.92% on v1.6, setting a new end-to-end SOTA. The model builds on the DeepSeek OCR architecture. Its core innovation is Reference Sliding Window Attention (R-SWA), which holds the decoder KV cache at a constant size instead of letting it grow linearly, so the model can parse dozens of pages in a single forward pass rather than page by page. In long-document tests, the edit distance is only 0.057 on 20-page documents and stays below 0.107 even beyond 40 pages. On inference speed, throughput in tokens per second (TPS) when generating 6000 tokens is about 35% higher than that of DeepSeek OCR. Model weights and code are available on GitHub and Hugging Face. The technical report credits its technical director as "YY", which has sparked speculation that this may be Wei Haoran, formerly a core researcher on DeepSeek OCR.
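To make the "linear growth to constant" claim concrete, here is a toy sketch of a bounded KV cache. It is not the R-SWA implementation. It assumes, purely for illustration, that a fixed set of reference entries is always retained while generated tokens are kept only within a sliding window. The class name, the function name, and the sizes (256 reference entries, a window of 128) are hypothetical and do not come from the technical report.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: fixed reference entries plus a sliding window of recent entries.

    Illustrative assumption only; not the actual R-SWA mechanism.
    """

    def __init__(self, reference_entries, window_size):
        # Reference entries are never evicted (hypothetical design choice).
        self.reference = list(reference_entries)
        # deque with maxlen drops the oldest entry once the window is full.
        self.window = deque(maxlen=window_size)

    def append(self, entry):
        self.window.append(entry)

    def __len__(self):
        return len(self.reference) + len(self.window)


def full_cache_size(num_reference, num_generated):
    """Size of an ordinary KV cache, which keeps every token it has seen."""
    return num_reference + num_generated


# Hypothetical sizes chosen only for the demonstration.
cache = BoundedKVCache(reference_entries=range(256), window_size=128)
sizes = []
for step in range(6000):
    cache.append(step)
    sizes.append(len(cache))

print("bounded cache size after 6000 tokens:", len(cache))
print("full cache size after 6000 tokens:", full_cache_size(256, 6000))
```

In this toy setup the bounded cache stops growing once the window fills, while the full cache keeps growing with every generated token. That is the general property a constant-size KV cache needs for very long outputs.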

