Intelligent Document Processing administration

About IDP

Intelligent Document Processing (IDP) is an enterprise-level solution for end-to-end intelligent document processing. It is designed to intelligently process scanned or digitally generated documents (images) in different formats.

High-level IDP diagram

Supported OCR Engines

The elDoc IDP system supports several OCR engines:

  1. Tesseract - elDoc IDP comes with an embedded OCR engine that uses Tesseract OCR (latest version) with enhancements to achieve the best possible results.
  2. Google Vision API - elDoc IDP can be switched to use the Google Vision API for OCR.

Image quality recommendations

A set of standard recommendations applies to any project based on IDP and OCR technologies. To achieve the best possible results from scanned document images, follow the recommendations below:

  • Images should be scanned at a minimum resolution of 300 dpi (color is preferable, or at least grayscale);
  • Images should be properly aligned (not skewed or rotated);
  • Images should be free of artifacts and "salt-and-pepper" (snow-like) noise;
  • Images should be free of handwriting and other hand-made marks, stamps, and chops;
  • Images containing Chinese characters should be of good quality;
  • Where possible, use digitally generated documents rather than scanned images;
  • Disable any enhancement features of the scanning software or the scanner itself; use original (raw) scanned images that have not been modified by any third-party software.
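
As a rough illustration of the 300 dpi recommendation, the effective resolution of a scan can be estimated from its pixel dimensions and the physical page size. The sketch below is not part of elDoc: the function names are hypothetical, and the US Letter page size (8.5 x 11 inches) is an assumption chosen only for the example.

```python
# Illustrative sketch only; not part of elDoc. All names here are hypothetical.

MIN_DPI = 300  # minimum resolution recommended in the list above

# US Letter page size in inches; assumed for this example.
LETTER_INCHES = (8.5, 11.0)


def effective_dpi(pixel_size, page_size_inches=LETTER_INCHES):
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    width_px, height_px = pixel_size
    width_in, height_in = page_size_inches
    return min(width_px / width_in, height_px / height_in)


def meets_recommendation(pixel_size, page_size_inches=LETTER_INCHES):
    """Return True if the scan reaches the recommended minimum resolution."""
    return effective_dpi(pixel_size, page_size_inches) >= MIN_DPI


if __name__ == "__main__":
    # A Letter page scanned at 300 dpi is 2550 x 3300 pixels.
    print(meets_recommendation((2550, 3300)))
    # The same page scanned at 200 dpi is 1700 x 2200 pixels.
    print(meets_recommendation((1700, 2200)))
```

A check like this only covers resolution; skew, noise, and handwriting still need to be assessed separately.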

Supported languages

#    Language (English name)           Code in the system
1    Afrikaans                         afr
2    Albanian                          sqi
3    Amharic                           amh
4    Arabic                            ara
5    Armenian                          hye
6    Assamese                          asm
7    Azerbaijani                       aze
8    Azerbaijani - Cyrillic            aze_cyrl
9    Basque                            eus
10   Belarusian                        bel
11   Bengali                           ben
12   Bosnian                           bos
13   Breton                            bre
14   Bulgarian                         bul
15   Burmese                           mya
16   Catalan; Valencian                cat
17   Cebuano                           ceb
18   Central Khmer                     khm
19   Cherokee                          chr
20   Chinese - Simplified              chi_sim
21   Chinese - Simplified (Vertical)   chi_sim_vert
22   Chinese - Traditional             chi_tra
23   Chinese - Traditional (Vertical)  chi_tra_vert
24   Corsican                          cos
25   Croatian                          hrv
26   Czech                             ces
27   Danish                            dan
28   Dutch; Flemish                    nld
29   Dzongkha                          dzo
30   English                           eng
31   English, Middle (1100-1500)       enm
32   Esperanto                         epo
33   Estonian                          est
34   Faroese                           fao
35   Filipino                          fil
36   Finnish                           fin
37   French                            fra
38   French, Middle (ca. 1400-1600)    frm
39   Western Frisian                   fry
40   Galician                          glg
41   Georgian                          kat
42   Georgian - Old                    kat_old
43   German                            deu
44   German Fraktur                    deu_frak
45   Greek, Ancient (-1453)            grc
46   Greek, Modern (1453-)             ell
47   Gujarati                          guj
48   Haitian; Haitian Creole           hat
49   Hebrew                            heb
50   Hindi                             hin
51   Hungarian                         hun
52   Icelandic                         isl
53   Indonesian                        ind
54   Inuktitut                         iku
55   Irish                             gle
56   Italian                           ita
57   Italian - Old                     ita_old
58   Japanese                          jpn
59   Japanese (Vertical)               jpn_vert
60   Javanese                          jav
61   Kannada                           kan
62   Kazakh                            kaz
63   Kirghiz; Kyrgyz                   kir
64   Korean                            kor
65   Korean (Vertical)                 kor_vert
66   Kurdish (Arabic Script)           kur
67   Lao                               lao
68   Latin                             lat
69   Latvian                           lav
70   Lithuanian                        lit
71   Macedonian                        mkd
72   Malay                             msa
73   Malayalam                         mal
74   Maltese                           mlt
75   Maori                             mri
76   Marathi                           mar
77   Mongolian                         mon
78   Nepali                            nep
79   Norwegian                         nor
80   Occitan (post 1500)               oci
81   Oriya                             ori
82   Panjabi; Punjabi                  pan
83   Persian                           fas
84   Polish                            pol
85   Portuguese                        por
86   Pushto; Pashto                    pus
87   Quechua                           que
88   Romanian; Moldavian; Moldovan     ron
89   Russian                           rus
90   Sanskrit                          san
91   Scottish Gaelic                   gla
92   Serbian                           srp
93   Serbian - Latin                   srp_latn
94   Sindhi                            snd
95   Sinhala; Sinhalese                sin
96   Slovak                            slk
97   Slovenian                         slv
98   Spanish; Castilian                spa
99   Sunda                             sun
100  Swahili                           swa
101  Swedish                           swe
102  Syriac                            syr
103  Tajik                             tgk
104  Tamil                             tam
105  Tatar                             tat
106  Telugu                            tel
107  Thai                              tha
108  Tibetan                           bod
109  Tigrinya                          tir
110  Turkish                           tur
111  Uighur; Uyghur                    uig
112  Ukrainian                         ukr
113  Urdu                              urd
114  Uzbek                             uzb
115  Uzbek - Cyrillic                  uzb_cyrl
116  Vietnamese                        vie
117  Welsh                             cym
118  Yiddish                           yid
119  Yoruba                            yor
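
The codes in the table match the names of Tesseract's trained-data files. As a sketch of how an administrator's script might check codes before using them, the example below validates requested codes against a short excerpt of the table and joins them with "+", which is how the Tesseract command line accepts multiple languages. Whether elDoc itself accepts a "+"-joined string is an assumption; the function name and the excerpt dictionary are hypothetical and not part of any elDoc API.

```python
# Illustrative sketch only; not an elDoc API. All names here are hypothetical.

# Excerpt of the table above: system code -> English language name.
SUPPORTED_LANGUAGES = {
    "eng": "English",
    "deu": "German",
    "fra": "French",
    "chi_sim": "Chinese - Simplified",
    "chi_tra": "Chinese - Traditional",
    "jpn": "Japanese",
}


def build_language_string(codes):
    """Validate codes and join them with '+' (Tesseract CLI convention)."""
    unknown = [code for code in codes if code not in SUPPORTED_LANGUAGES]
    if unknown:
        raise ValueError("Unsupported language code(s): " + ", ".join(unknown))
    # Drop duplicates while preserving the order in which codes were given.
    unique = list(dict.fromkeys(codes))
    return "+".join(unique)


if __name__ == "__main__":
    print(build_language_string(["eng", "chi_sim"]))  # prints "eng+chi_sim"
```

Rejecting unknown codes early gives a clearer error than letting a misspelled code reach the OCR engine.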