Language support for Form Recognizer

This article lists the languages supported for text and field extraction (by feature) and for language detection (Read only). The two lists are maintained separately: a language that appears in one is not necessarily in the other.

Read, layout, and custom form (template) model

The following lists include the languages currently available in the v2.1 GA version and in the most recent v3.0 preview. These languages are supported by the Read, Layout, and Custom form (template) model features.

Note

Language code optional

Form Recognizer's deep-learning-based universal models extract all multilingual text in your documents, including text lines with mixed languages, and do not require a language code. Do not provide the language code as a parameter unless you are sure about the language and want to force the service to apply only the relevant model. Otherwise, the service may return incomplete and incorrect text.

To use the preview languages, refer to the v3.0 REST API migration guide to understand the differences from the v2.1 GA API and explore the v3.0 preview SDK quickstarts.
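The guidance in this note can be sketched as a small helper that attaches a language code only when the caller explicitly supplies one. This is a minimal illustration, not the service's API: the function name `build_analyze_params` is hypothetical, and the query-parameter name `language` is an assumption that should be checked against the API reference for the version you target.

```python
from typing import Dict, Optional


def build_analyze_params(language: Optional[str] = None) -> Dict[str, str]:
    """Return query parameters for an analyze request (hypothetical helper).

    The language code is omitted by default so the service can apply its
    universal model; it is included only when the caller passes one.
    """
    params: Dict[str, str] = {}
    code = (language or "").strip()
    if code:
        # Assumed parameter name; forces the service to apply one language model.
        params["language"] = code
    return params


print(build_analyze_params())      # no language code: the service decides
print(build_analyze_params("fr"))  # caller is sure the document is French
```

Leaving the code out by default mirrors the recommendation above: forcing a language on a mixed-language document is the case the note warns about.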

Handwritten text (preview and GA)

The following table lists the supported languages for extracting handwritten text.

Language Language code (optional) Language Language code (optional)
English en Japanese (preview) ja
Chinese Simplified (preview) zh-Hans Korean (preview) ko
French (preview) fr Portuguese (preview) pt
German (preview) de Spanish (preview) es
Italian (preview) it
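As a rough sketch, the handwritten-text table above can be encoded as a lookup that reports whether a language is GA, preview, or not listed. The function name `handwriting_status` and the return labels are illustrative assumptions; only the language codes come from the table.

```python
from typing import Optional

# Codes copied from the handwritten-text table above (lowercased for matching).
HANDWRITTEN_GA = {"en"}
HANDWRITTEN_PREVIEW = {"ja", "zh-hans", "ko", "fr", "pt", "de", "es", "it"}


def handwriting_status(code: str) -> Optional[str]:
    """Return "ga", "preview", or None for a language code (illustrative)."""
    normalized = code.strip().lower()
    if normalized in HANDWRITTEN_GA:
        return "ga"
    if normalized in HANDWRITTEN_PREVIEW:
        return "preview"
    return None


print(handwriting_status("en"))       # ga
print(handwriting_status("zh-Hans"))  # preview
print(handwriting_status("ru"))       # None
```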

This section lists the supported languages for extracting printed text in the latest preview.

Language Code (optional) Language Code (optional)
Angika (Devanagari) anp Lakota lkt
Arabic ar Latin la
Awadhi-Hindi (Devanagari) awa Lithuanian lt
Azerbaijani (Latin) az Lower Sorbian dsb
Bagheli bfy Lule Sami smj
Belarusian (Cyrillic) be, be-cyrl Mahasu Pahari (Devanagari) bfz
Belarusian (Latin) be, be-latn Maltese mt
Bhojpuri-Hindi (Devanagari) bho Malto (Devanagari) kmj
Bodo (Devanagari) brx Maori mi
Bosnian (Latin) bs Marathi mr
Brajbha bra Mongolian (Cyrillic) mn
Bulgarian bg Montenegrin (Cyrillic) cnr-cyrl
Bundeli bns Montenegrin (Latin) cnr-latn
Buryat (Cyrillic) bua Nepali ne
Chamling rab Niuean niu
Chhattisgarhi (Devanagari) hne Nogay nog
Croatian hr Northern Sami (Latin) sme
Dari prs Ossetic os
Dhimal (Devanagari) dhi Pashto ps
Dogri (Devanagari) doi Persian fa
Erzya (Cyrillic) myv Punjabi (Arabic) pa
Faroese fo Ripuarian ksh
Gagauz (Latin) gag Romanian ro
Gondi (Devanagari) gon Russian ru
Gurung (Devanagari) gvr Sadri (Devanagari) sck
Halbi (Devanagari) hlb Samoan (Latin) sm
Haryanvi bgc Sanskrit (Devanagari) sa
Hawaiian haw Santali (Devanagari) sat
Hindi hi Serbian (Latin) sr, sr-latn
Ho (Devanagari) hoc Sherpa (Devanagari) xsr
Icelandic is Sirmauri (Devanagari) srx
Inari Sami smn Skolt Sami sms
Jaunsari (Devanagari) jns Slovak sk
Kangri (Devanagari) xnr Somali (Arabic) so
Karachay-Balkar krc Southern Sami sma
Kara-Kalpak (Cyrillic) kaa-cyrl Tajik (Cyrillic) tg
Kazakh (Cyrillic) kk-cyrl Thangmi thf
Kazakh (Latin) kk-latn Tongan to
Khaling klr Turkmen (Latin) tk
Korku kfq Tuvan tyv
Koryak kpy Urdu ur
Kosraean kos Uyghur (Arabic) ug
Kumyk (Cyrillic) kum Uzbek (Arabic) uz-arab
Kurdish (Arabic) ku-arab Uzbek (Cyrillic) uz-cyrl
Kurukh (Devanagari) kru Welsh cy
Kyrgyz (Cyrillic) ky

This section lists the supported languages for extracting printed text in the latest GA version.

Language Code (optional) Language Code (optional)
Afrikaans af Japanese ja
Albanian sq Javanese jv
Asturian ast K'iche' quc
Basque eu Kabuverdianu kea
Bislama bi Kachin (Latin) kac
Breton br Kara-Kalpak (Latin) kaa
Catalan ca Kashubian csb
Cebuano ceb Khasi kha
Chamorro ch Korean ko
Chinese Simplified zh-Hans Kurdish (Latin) ku-latn
Chinese Traditional zh-Hant Luxembourgish lb
Cornish kw Malay (Latin) ms
Corsican co Manx gv
Crimean Tatar (Latin) crh Neapolitan nap
Czech cs Norwegian no
Danish da Occitan oc
Dutch nl Polish pl
English en Portuguese pt
Estonian et Romansh rm
Fijian fj Scots sco
Filipino fil Scottish Gaelic gd
Finnish fi Slovenian sl
French fr Spanish es
Friulian fur Swahili (Latin) sw
Galician gl Swedish sv
German de Tatar (Latin) tt
Gilbertese gil Tetum tet
Greenlandic kl Turkish tr
Haitian Creole ht Upper Sorbian hsb
Hani hni Uzbek (Latin) uz
Hmong Daw (Latin) mww Volapük vo
Hungarian hu Walser wae
Indonesian id Western Frisian fy
Interlingua ia Yucatec Maya yua
Inuktitut (Latin) iu Zhuang za
Irish ga Zulu zu
Italian it

Custom neural model

Language Locale code
English (United States) en-us

Receipt and business card models

Note

Specifying a locale is optional. Form Recognizer's deep-learning technology automatically detects the language of the text in your image.

The prebuilt receipt and business card models support English receipts and business cards in the following locales:

Language Locale code
English (Australia) en-au
English (Canada) en-ca
English (United Kingdom) en-gb
English (India) en-in
English (United States) en-us
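Because the locale is optional, a caller might normalize a user-supplied value and fall back to auto-detection when it is not in the table above. The following is a minimal sketch under that assumption; `pick_locale` is a hypothetical helper, and returning `None` to mean "send no locale" is a design choice of this example, not documented service behavior.

```python
from typing import Optional

# Locale codes copied from the receipt and business card table above.
RECEIPT_CARD_LOCALES = {"en-au", "en-ca", "en-gb", "en-in", "en-us"}


def pick_locale(requested: Optional[str]) -> Optional[str]:
    """Return a listed locale to send, or None to rely on auto-detection."""
    if requested is None:
        return None
    # Accept common variants such as "en_GB" or "EN-gb".
    normalized = requested.strip().lower().replace("_", "-")
    return normalized if normalized in RECEIPT_CARD_LOCALES else None


print(pick_locale("en_GB"))  # en-gb
print(pick_locale("fr-fr"))  # None: not listed, so no locale is sent
print(pick_locale(None))     # None: caller did not specify a locale
```

A stricter variant could raise an error for unlisted locales instead of silently dropping them; which behavior is preferable depends on whether the value comes from end users or from configuration.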

Business card model

The 2022-06-30-preview release includes Japanese language support:

Language Locale code
Japanese ja

Invoice model

Language Locale code
English (United States) en-us
Spanish es
German (2022-06-30-preview) de
French (2022-06-30-preview) fr
Italian (2022-06-30-preview) it
Portuguese (2022-06-30-preview) pt
Dutch (2022-06-30-preview) nl

ID documents

This technology is currently available for US driver licenses and the biographical page from international passports (excluding visa and other travel documents).

General Document

Language Locale code
English (United States) en-us

Detected languages: Read API

The Read API supports detecting the following languages in your documents. This list may include languages not currently supported for text extraction.

Note

Language detection

The Form Recognizer Read model can detect the possible presence of languages and return the language codes for the languages it detects. To determine whether text can also be extracted for a given language, see the previous sections.

Note

Detected languages vs extracted languages

This section lists the languages that the Read model can detect in a document, if present. This list differs from the lists of languages supported for text extraction, which are given in the preceding sections for each model.

Language Code
Afrikaans af
Albanian sq
Amharic am
Arabic ar
Armenian hy
Assamese as
Azerbaijani az
Basque eu
Belarusian be
Bengali bn
Bosnian bs
Bulgarian bg
Burmese my
Catalan ca
Central Khmer km
Chinese zh
Chinese Simplified zh_chs
Chinese Traditional zh_cht
Corsican co
Croatian hr
Czech cs
Danish da
Dari prs
Divehi dv
Dutch nl
English en
Esperanto eo
Estonian et
Fijian fj
Finnish fi
French fr
Galician gl
Georgian ka
German de
Greek el
Gujarati gu
Haitian ht
Hausa ha
Hebrew he
Hindi hi
Hmong Daw mww
Hungarian hu
Icelandic is
Igbo ig
Indonesian id
Inuktitut iu
Irish ga
Italian it
Japanese ja
Javanese jv
Kannada kn
Kazakh kk
Kinyarwanda rw
Kirghiz ky
Korean ko
Kurdish ku
Lao lo
Latin la
Latvian lv
Lithuanian lt
Luxembourgish lb
Macedonian mk
Malagasy mg
Malay ms
Malayalam ml
Maltese mt
Maori mi
Marathi mr
Mongolian mn
Nepali ne
Norwegian no
Norwegian Nynorsk nn
Oriya or
Pashto ps
Persian fa
Polish pl
Portuguese pt
Punjabi pa
Queretaro Otomi otq
Romanian ro
Russian ru
Samoan sm
Serbian sr
Shona sn
Sindhi sd
Sinhala si
Slovak sk
Slovenian sl
Somali so
Spanish es
Sundanese su
Swahili sw
Swedish sv
Tagalog tl
Tahitian ty
Tajik tg
Tamil ta
Tatar tt
Telugu te
Thai th
Tibetan bo
Tigrinya ti
Tongan to
Turkish tr
Turkmen tk
Ukrainian uk
Urdu ur
Uzbek uz
Vietnamese vi
Welsh cy
Xhosa xh
Yiddish yi
Yoruba yo
Yucatec Maya yua
Zulu zu
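Since a detected language is not necessarily one that can be extracted, client code may want to separate the two cases. The sketch below assumes the detected codes are already available as a list of strings; `split_detected` is a hypothetical helper, and `EXTRACTABLE` is a small illustrative subset of the extraction tables in this article, not the full list.

```python
from typing import Iterable, List, Set, Tuple

# Illustrative subset of the printed-text extraction tables, not the full list.
EXTRACTABLE: Set[str] = {"en", "fr", "de", "ja"}


def split_detected(codes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split detected language codes into (extractable, detect_only) lists."""
    extractable: List[str] = []
    detect_only: List[str] = []
    for code in codes:
        normalized = code.strip().lower()
        if normalized in EXTRACTABLE:
            extractable.append(normalized)
        else:
            detect_only.append(normalized)
    return extractable, detect_only


# "hy" (Armenian) and "th" (Thai) appear in the detection table above
# but not in the extraction tables of this article.
print(split_detected(["en", "hy", "th"]))  # (['en'], ['hy', 'th'])
```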