Intelligent Document Processing administration

Contents:


About IDP

Intelligent Document Processing (IDP) - it is enterprise level solution for end-to-end intelligent document processing. IDP Solution is designed to intelligently process scanned or digitally generated documents (images) of different format. 

High-level IDP diagram

Supported OCR Engines

The elDoc IDP system supports multiple OCR engines, allowing flexibility in accuracy, performance, and deployment scenarios.

  • Tesseract
    elDoc IDP includes a built-in OCR engine based on the latest version of Tesseract, enhanced to deliver optimal recognition accuracy. (See Supported Languages below)
  • Google Vision API
    elDoc IDP can be configured to use Google Vision API for OCR processing, providing high accuracy and robust language support. For more details and languages support, refer to Google Vision API Supported Languages.
  • PaddleOCR API
    elDoc IDP can integrate with PaddleOCR, a high-performance open-source OCR framework optimized for multilingual text detection and recognition. PaddleOCR offers strong accuracy for complex layouts, supports a wide range of languages, and is particularly effective for structured documents and dense text scenarios. PaddleOCR can be also deployed on-prem.
  • VL Model API
    Enables AI/LLM-based OCR using OpenAI API, leveraging vision-language models for advanced document understanding, including complex layouts, context-aware extraction, and semantic interpretation. VL model can be deployed on-prem using ollama, vLLM, llama.cpp, etc.

Tesseract Supported languages


##Language (English name)Code in the system
1Afrikaansafr
2Albaniansqi
3Amharicamh
4Arabicara
5Armenianhye
6Assameseasm
7Azerbaijaniaze
8Azerbaijani - Cyrillicaze_cyrl
9Basqueeus
10Belarusianbel
11Bengaliben
12Bosnianbos
13Bretonbre
14Bulgarianbul
15Burmesemya
16Catalan; Valenciancat
17Cebuanoceb
18Central Khmerkhm
19Cherokeechr
20Chinese - Simplifiedchi_sim
21Chinese - Simplified (Vertical)chi_sim_vert
22Chinese - Traditionalchi_tra
23Chinese - Traditional (Vertical)chi_tra_vert
24Corsicancos
25Croatianhrv
26Czechces
27Danishdan
28Dutch; Flemishnld
29Dzongkhadzo
30Englisheng
31English, Middle (1100-1500)enm
32Esperantoepo
33Estonianest
34Faroesefao
35Filipinofil
36Finnishfin
37Frenchfra
38French, Middle (ca. 1400-1600)frm
39Western Frisianfry
40Galicianglg
41Georgiankat
42Georgian - Oldkat_old
43Germandeu
44German Frakturdeu_frak
45Greek, Ancient (-1453)grc
46Greek, Modern (1453-)ell
47Gujaratiguj
48Haitian; Haitian Creolehat
49Hebrewheb
50Hindihin
51Hungarianhun
52Icelandicisl
53Indonesianind
54Inuktitutiku
55Irishgle
56Italianita
57Italian - Oldita_old
58Japanesejpn
59Japanese (Vertical)jpn_vert
60Javanesejav
61Kannadakan
62Kazakhkaz
63Kirghiz; Kyrgyzkir
64Koreankor
65Korean (Vertical)kor_vert
66Kurdish (Arabic Script)kur
67Laolao
68Latinlat
69Latvianlav
70Lithuanianlit
71Macedonianmkd
72Malaymsa
73Malayalammal
74Maltesemlt
75Maorimri
76Marathimar
77Mongolianmon
78Nepalinep
79Norwegiannor
80Occitan (post 1500)oci
81Oriyaori
82Panjabi; Punjabipan
83Persianfas
84Polishpol
85Portuguesepor
86Pushto; Pashtopus
87Quechuaque
88Romanian; Moldavian; Moldovanron
89Russianrus
90Sanskritsan
91Scottish Gaelicgla
92Serbiansrp
93Serbian - Latinsrp_latn
94Sindhisnd
95Sinhala; Sinhalesesin
96Slovakslk
97Slovenianslv
98Spanish; Castilianspa
99Sundasun
100Swahiliswa
101Swedishswe
102Syriacsyr
103Tajiktgk
104Tamiltam
105Tatartat
106Telugutel
107Thaitha
108Tibetanbod
109Tigrinyatir
110Turkishtur
111Uighur; Uyghuruig
112Ukrainianukr
113Urduurd
114Uzbekuzb
115Uzbek - Cyrillicuzb_cyrl
116Vietnamesevie
117Welshcym
118Yiddishyid
119Yorubayor

Last modified: April 16, 2026