Typhoon OCR: Open Vision-Language Model For Thai Document Extraction
SCB 10X introduces Typhoon OCR, an open vision-language model fine-tuned on a Thai-focused dataset using a multi-stage data construction pipeline. The model provides a unified framework capable of performing text translation, key-value extraction, and structured document parsing for both Thai and English.

Typhoon OCR is an open-source vision-language model specifically engineered to handle the unique complexities of Thai document extraction, such as lack of word boundaries and intricate scripts. Developed by SCB 10X, the project addresses the historical performance gap in low-resource languages where general-purpose models often fail. The researchers utilized a multi-stage data construction pipeline that integrates traditional OCR, synthetic data, and human verification to train the model for both layout reconstruction and text transcription. The latest iteration, Typhoon OCR V1.5, offers a streamlined 2-billion parameter architecture that operates more efficiently than its predecessors while maintaining high accuracy across financial reports, government forms, and handwritten notes. Evaluations demonstrate that this compact model frequently matches or exceeds the capabilities of much larger proprietary systems like GPT-4o and Gemini. By releasing these resources under permissive licenses, the authors aim to support reproducible research and practical digital workflows in the Thai language.
The Core Problem: AI Struggles to Read Thai
Right now, highly advanced Artificial Intelligence (like ChatGPT or Google Gemini) is great at reading and analyzing English documents. However, these models struggle heavily with the Thai language. This is because Thai has a very complex writing system with stacked vowels, no spaces between words, and documents that often have extremely dense, messy layouts (like government forms or receipts). Because big tech companies mostly train their AI on English data, Thai users are often left with AI that misreads their documents, messes up tables, or gets confused by simple financial reports.
The Solution: Typhoon OCR
To fix this, researchers developed Typhoon OCR, an open-source AI model specifically trained to be an expert at reading, extracting, and understanding both Thai and English documents.
Key Insights from the Research
1. “David vs. Goliath” Performance The researchers recently released Typhoon OCR V1.5, which is a highly compact “small” AI model. Despite being a fraction of the size of massive AI models like GPT-4o or Gemini 2.5, Typhoon OCR actually beats them at accurately reading and organizing complex Thai documents, such as financial reports and government forms.
2. It Doesn’t Just Read Words; It Understands Layouts Traditional scanners just spit out a messy wall of text. Typhoon OCR acts like a human eye—it understands the structure of a document. If it looks at a financial report or an infographic, it knows how to keep tables, columns, and reading order perfectly intact.
3. Beating the “Data Shortage” with Fake Documents To make AI smart, you have to feed it millions of examples. Because high-quality Thai documents are scarce on the internet, the researchers got creative and built a “synthetic data” pipeline. They basically generated fake Thai documents featuring math, charts, different fonts, and rare vocabulary to teach the AI how to read anything thrown at it.
Practical Benefits for Everyday Consumers and Businesses
While this is a highly technical research paper, the creation of Typhoon OCR brings several exciting, real-world benefits:
- Ultimate Data Privacy: Because Typhoon OCR V1.5 is designed to be “compact and lightweight,” a company or a user can run it directly on their own computer or local servers. This means highly sensitive documents—like medical records, bank statements, or private legal contracts—do not have to be sent over the internet to big tech companies to be processed.
- Faster, Cheaper App Development: Because the model is completely open-source (free for developers to use), creators can easily build new apps that process Thai paperwork quickly. Imagine a mobile app that instantly categorizes your Thai receipts for taxes, or software that instantly digitizes a stack of handwritten Thai administrative forms.
- Less Waiting in Line: Many administrative processes in Thailand require manual data entry from physical forms. By accurately automating how data is pulled from messy, real-world documents and handwritten notes, businesses and government offices can process paperwork significantly faster.


