Complex scripts and shaping engines

A complex script is any writing system whose text needs contextual, nonlinear processing to be rendered correctly. These requirements include:

  • Ligatures, where two or more consecutive characters are combined into one shape (Latin, Devanagari)
  • Reordering, where some characters are written before the letter they follow in pronunciation (Bengali, Sinhala, and other Indic scripts)
  • Context-shaping, where some letters change shape depending on whether they occur at the beginning, in the middle, or at the end of a word (Arabic, Mongolian)
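
Two of the requirements above, reordering and context-shaping, can be illustrated with a minimal Python sketch. It is not how a shaping engine works internally: real engines reorder whole syllables and select contextual glyphs through font features, whereas this sketch moves a Bengali pre-base vowel sign in front of only the one character before it and looks up Arabic positional forms in the Unicode compatibility block U+FE70 to U+FEFF. The function names to_visual_order and positional_form are hypothetical and exist only for this illustration.

```python
import unicodedata

# Bengali vowel signs drawn to the left of the consonant they follow in the text.
PRE_BASE_VOWEL_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}  # vowel signs I, E, AI


def to_visual_order(text: str) -> str:
    """Move each pre-base vowel sign in front of the character before it.

    Simplification (assumption of this sketch): a real engine moves the sign
    in front of the whole consonant cluster of the syllable.
    """
    out = []
    for ch in text:
        if ch in PRE_BASE_VOWEL_SIGNS and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)


def positional_form(letter: str, position: str) -> str:
    """Return the Arabic presentation form of `letter` for a position.

    `position` is "isolated", "initial", "medial", or "final". The lookup
    relies on the compatibility decompositions of U+FE70..U+FEFF; if no form
    exists for that position, the letter is returned unchanged.
    """
    wanted = f"<{position}> {ord(letter):04X}"
    for cp in range(0xFE70, 0xFF00):
        if unicodedata.decomposition(chr(cp)) == wanted:
            return chr(cp)
    return letter


if __name__ == "__main__":
    # KA followed by vowel sign I is stored KA-first but drawn sign-first.
    print([hex(ord(c)) for c in to_visual_order("\u0995\u09BF")])
    # The letter BEH takes a different shape in each position of a word.
    for pos in ("initial", "medial", "final"):
        print(pos, hex(ord(positional_form("\u0628", pos))))
```

The stored text always stays in logical (pronunciation) order; only the rendered glyph sequence changes.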

Such processing isn't optional; it's essential for rendering text in these scripts correctly. Beyond the minimum required to make the text readable, additional glyph processing might be desirable to produce more sophisticated typography.

Uniscribe is a set of Windows APIs and shaping engines that allow a high degree of control over typography and the processing of complex scripts. Each of the shaping engines in Uniscribe contains the shaping knowledge for a particular script or closely related group of scripts. This shaping knowledge focuses on the basic element of each script, which varies depending on the nature of the writing system. In the Indic scripts, for example, the basic element that needs to be processed is the syllable. In the Arabic script, the basic element is always a pair of letters, with the second letter of a pair becoming the first letter of the next. Uniscribe analyses and prepares strings of Unicode text by breaking runs (that is, strings of text in a single script with uniform formatting) into clusters corresponding to the basic element for that script. The kind of character preprocessing that some complex scripts require (for example, reordering of certain characters in the string) is detailed in the Unicode Standard.
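
The run-and-cluster idea described above can be sketched in a few lines of Python. This is only an illustration and does not call Uniscribe. It rests on two loudly stated assumptions: the script of a letter is guessed from the first word of its Unicode character name (the Python standard library has no Script property), and a cluster is approximated as one base character plus the combining marks that follow it, which is much coarser than a real Indic syllable. The names script_of, split_runs, and split_clusters are hypothetical.

```python
import unicodedata


def script_of(ch: str):
    """Crude script guess (assumption): first word of the character name.

    Only letters define a script; spaces, digits, punctuation, and marks
    return None and are attached to the neighbouring run.
    """
    if unicodedata.category(ch).startswith("L"):
        return unicodedata.name(ch, "UNKNOWN").split()[0]
    return None


def split_runs(text: str):
    """Split text into (script, substring) runs of a single script."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            prev_script, prev_text = runs[-1]
            runs[-1] = (prev_script or script, prev_text + ch)
        else:
            runs.append((script, ch))
    return runs


def split_clusters(run: str):
    """Approximate clusters: a base character plus its following marks."""
    clusters = []
    for ch in run:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    for script, run in split_runs("abc \u0995\u09BF\u0995"):
        print(script, [c.encode("unicode_escape") for c in split_clusters(run)])
```

A production itemizer would also have to account for formatting changes and bidirectional levels, which this sketch ignores.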

Uniscribe uses several script-specific shaping engines for handling typography in supported complex scripts, including Arabic, Buginese, Korean (Hangul), Hebrew, Indic, Javanese, Khmer, Lao, Myanmar, Sinhala, Syriac, Thaana, Thai, and Tibetan. Uniscribe has two more engines:

  • Standard shaping engine for use with any noncomplex script (Latin, Cyrillic, Greek, etc.)
  • Universal Shaping Engine (USE) for use with complex scripts that aren't supported by one of the dedicated shaping engines. The following complex scripts included in version 15.0 of the Unicode Standard are supported in the Universal Shaping Engine: ADLaM, Ahom, Bhaiksuki, Balinese, Batak, Brahmi, Buginese, Buhid, Chakma, Cham, Chorasmian, Cypro-Minoan, Dives Akuru, Dogra, Duployan, Egyptian Hieroglyphs, Elymaic, Grantha, Gunjala Gondi, Hanifi Rohingya, Hanunoo, Javanese, Kaithi, Kawi, Kayah Li, Kharoshthi, Khitan Small Script, Khojki, Khudawadi, Lepcha, Limbu, Mahajani, Makasar, Mandaic, Manichaean, Marchen, Masaram Gondi, Medefaidrin, Meitei Mayek, Miao, Modi, Mongolian, Multani, Nag Mundari, Nandinagari, Newa, N’Ko, Nyiakeng Puachue Hmong, Old Sogdian, Old Uyghur, Pahawh Hmong, Phags-pa, Psalter Pahlavi, Rejang, Saurashtra, Sharada, Siddham, Sinhala, Sogdian, Soyombo, Sundanese, Syloti Nagri, Tagalog, Tagbanwa, Tai Le, Tai Tham, Tai Viet, Takri, Tangsa, Tibetan, Tifinagh, Tirhuta, Toto, Vithkuqi, Wancho, Yezidi, Zanabazar Square.
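
The three kinds of engine described above amount to a lookup from script to engine. The sketch below shows that lookup in Python under stated assumptions: it is not Uniscribe's actual selection logic, the choose_engine function and the set names are hypothetical, and the USE set holds only a few sample entries from the list above. Scripts named in both the dedicated list and the USE list (Buginese, Javanese, Sinhala, Tibetan) are resolved in favor of the dedicated engine here, which is an arbitrary choice of this sketch.

```python
# Scripts the text lists as having a dedicated shaping engine.
DEDICATED = {
    "Arabic", "Buginese", "Hangul", "Hebrew", "Indic", "Javanese", "Khmer",
    "Lao", "Myanmar", "Sinhala", "Syriac", "Thaana", "Thai", "Tibetan",
}

# Noncomplex scripts named in the text as handled by the standard engine.
NONCOMPLEX = {"Latin", "Cyrillic", "Greek"}

# A few sample entries from the USE list above (not the full list).
USE_SAMPLE = {"Balinese", "Mongolian", "Tai Tham", "Tifinagh"}


def choose_engine(script: str) -> str:
    """Return "dedicated", "standard", or "universal" for a script name.

    Hypothetical helper: the dedicated list is checked first, so a script
    that appears in more than one list gets the dedicated engine.
    """
    if script in DEDICATED:
        return "dedicated"
    if script in NONCOMPLEX:
        return "standard"
    if script in USE_SAMPLE:
        return "universal"
    raise ValueError(f"script not covered by this sketch: {script}")


if __name__ == "__main__":
    for name in ("Arabic", "Latin", "Mongolian"):
        print(name, "->", choose_engine(name))
```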