Highlights

  • DeepSeek-OCR 2 is a new 3B-parameter model for SOTA vision and document understanding, released on Jan 27, 2026 by DeepSeek. The model focuses on image-to-text with stronger visual reasoning, not just text extraction. DeepSeek-OCR 2 introduces DeepEncoder V2, which enables the model to 'see' an image in the same logical order as a human. Unlike traditional vision LLMs that scan images in a fixed grid (top-left → bottom-right), DeepEncoder V2 builds a global understanding first, then learns a human-like reading order—what to attend to first, next, and so on. This boosts OCR on complex layouts by better following columns, linking labels to values, reading tables coherently, and handling mixed text and structure.
  • Benchmarks for the DeepSeek-OCR 2 model are derived from the official research paper. Table 1: Comprehensive evaluation of document reading on OmniDocBench v1.5. V-token_max represents the maximum number of visual tokens used per page in this benchmark. R-order denotes reading order. Except for DeepSeek-OCR and DeepSeek-OCR 2, all other model results in this table are sourced from the OmniDocBench repository.
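The reading-order idea in the first highlight can be illustrated with a toy sketch. This is not DeepSeek's code: block names, coordinates, and the column-split heuristic are all hypothetical, chosen only to show why a fixed raster scan interleaves the columns of a two-column page while a layout-aware ordering reads them the way a human would.

```python
# Toy illustration (hypothetical, not DeepSeek-OCR 2's actual method):
# fixed raster-scan order vs. a layout-aware reading order on a
# two-column page. Each block is (label, x, y), with (x, y) the
# block's top-left corner on the page.
blocks = [
    ("left-1",  0,  0), ("right-1", 50,  0),
    ("left-2",  0, 10), ("right-2", 50, 10),
]

def raster_order(blocks):
    """Fixed grid scan: strictly top-to-bottom, then left-to-right."""
    return [b[0] for b in sorted(blocks, key=lambda b: (b[2], b[1]))]

def column_order(blocks, column_split=25):
    """Layout-aware: finish the left column, then read the right column.
    column_split is a made-up x threshold separating the two columns."""
    left  = sorted((b for b in blocks if b[1] <  column_split), key=lambda b: b[2])
    right = sorted((b for b in blocks if b[1] >= column_split), key=lambda b: b[2])
    return [b[0] for b in left + right]

# Raster scan jumps across the column gutter on every row:
print(raster_order(blocks))   # ['left-1', 'right-1', 'left-2', 'right-2']
# Layout-aware order matches how a human reads the page:
print(column_order(blocks))   # ['left-1', 'left-2', 'right-1', 'right-2']
```

DeepEncoder V2 learns this ordering from data rather than using a hand-coded column split, but the failure mode it addresses is the one the raster scan exhibits here.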