DeepSeek-OCR is a 3B-parameter vision model for OCR and document understanding. It uses context optical compression to convert 2D layouts into vision tokens, enabling efficient long-context processing.
Fine-tuning DeepSeek-OCR on a 200K-sample Persian dataset yielded substantial gains in Persian text recognition. We evaluated the base model against the fine-tuned version on 200 Persian transcript samples: after only 60 training steps (batch size = 8), the mean Character Error Rate (CER) dropped from 149.07% to 60.81%, an absolute reduction of 88.26 percentage points and a relative improvement of roughly 59%. Note that a CER above 100% is possible; it means the model needed more edit operations than there are characters in the reference, typically due to heavy insertion or hallucination errors.
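For readers unfamiliar with the metric, CER is the Levenshtein (edit) distance between the model's transcription and the reference, divided by the reference length. The sketch below is a minimal, dependency-free illustration of how such a metric could be computed; it is not the evaluation code used for the numbers above.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions to turn `hyp` into `ref` (classic DP edit distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost  # substitution / match
                            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance / reference length.
    Can exceed 1.0 (i.e., 100%) when the hypothesis contains many
    inserted characters, as seen with the base model above."""
    return levenshtein(reference, hypothesis) / len(reference)


# A perfect Persian transcription scores 0.0:
print(cer("سلام دنیا", "سلام دنیا"))  # → 0.0
```

In practice a mean CER over a test set would average this value across all reference/hypothesis pairs.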