Developing OpenType Fonts for Syriac Script

Introduction

This document presents information that will help font developers create or support OpenType fonts for all Syriac script languages covered by the Unicode Standard. The three styles of Syriac in use today — Estrangelo, Serto and East Syriac — all use the same Syriac encoding.

Font developers will learn how to encode script features in their fonts, choose character sets, organize font information, and use existing tools to produce Syriac script fonts. Registered features of Syriac scripts are defined and illustrated, encodings are listed, and templates are included for compiling layout tables for OpenType fonts.

This document also presents information about the Syriac OpenType shaping engine of Uniscribe, an operating system component responsible for text layout.

In addition to being a primer and specification for the creation and support of Syriac script fonts, this document is intended to more broadly illustrate the OpenType Layout architecture, feature schemes, and operating system support for shaping and positioning text.

Glossary

The following terms are useful for understanding the layout features and script rules discussed in this document.

Base Glyph - Any glyph that can have a diacritic mark above or below it. Layout operations are defined in terms of a base glyph, not a base character, as a ligature may act as the base.

Character - Each character represents a Unicode character code point. For example, the 'alaph' character is U+0710. A character may have multiple forms of glyphs.

Diacritic Mark - A character that is positioned above or below a character to provide pronunciation guidance.

Glyph - A glyph represents the displayed form of one or more characters. For example, the final, initial and medial 'beth' glyphs are all forms of the 'beth' character (U+0712).

Kashida - Also known as the 'tatweel' character (U+0640). This character is used for elongation between connecting characters, and for justification.

Ligature - A combination of glyphs that join to form a single glyph. For example, the 'rish seyame' combination (U+072A + U+0308) is a mandatory ligature for Syriac. Other ligatures are optional.

Nominal form - The glyph that is represented by the Unicode character value.

Shaping Engine

The Uniscribe Syriac shaping engine processes text in stages. The stages are:

  1. Analyze characters for contextual shape
  2. Shape (substitute) glyphs with OTLS (OpenType Library Services)
  3. Position glyphs with OTLS

The descriptions that follow will help font developers understand the rationale for the Syriac feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.

Analyze Characters

The unit that the shaping engine receives for the purpose of shaping is a string of Unicode characters, in a sequence. The contextual analysis engine determines the correct contextual form the character should take based on the character before and after it. The contextual shape maps to an OTL feature for that form (isol, init, medi, med2, fina, fin2, fin3).
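The contextual analysis described above can be sketched in Python. This is a minimal illustration, not Uniscribe's implementation: the function name `contextual_forms` and the small `JOINING_TYPE` table are assumptions made for the example, and the table covers only a handful of letters. The Alaph-specific forms (fin2, fin3, med2) are deliberately left out of this sketch.

```python
import unicodedata

# Illustrative subset of joining behavior for Syriac letters (assumed table):
# "D" joins on both sides, "R" joins only to the preceding letter.
JOINING_TYPE = {
    "\u0712": "D",  # beth
    "\u0713": "D",  # gamal
    "\u0715": "R",  # dalath
    "\u0717": "R",  # he
    "\u0718": "R",  # waw
    "\u071B": "D",  # teth
    "\u0720": "D",  # lamadh
    "\u072A": "R",  # rish
    "\u072B": "D",  # shin
}


def contextual_forms(text):
    """Return one feature tag per character (None for marks and unknowns)."""
    # Non-spacing marks are skipped so they do not interrupt joining.
    letters = [i for i, ch in enumerate(text)
               if unicodedata.category(ch) != "Mn"]
    forms = [None] * len(text)
    for pos, i in enumerate(letters):
        current = JOINING_TYPE.get(text[i])
        if current is None:
            continue  # not a letter covered by the sketch
        prev_type = (JOINING_TYPE.get(text[letters[pos - 1]])
                     if pos > 0 else None)
        next_type = (JOINING_TYPE.get(text[letters[pos + 1]])
                     if pos + 1 < len(letters) else None)
        joins_prev = prev_type == "D"  # the previous letter joins forward
        joins_next = current == "D" and next_type in ("D", "R")
        if joins_prev and joins_next:
            forms[i] = "medi"
        elif joins_prev:
            forms[i] = "fina"
        elif joins_next:
            forms[i] = "init"
        else:
            forms[i] = "isol"
    return forms
```

For the word shin, waw, beth (U+072B U+0718 U+0712), the sketch assigns 'init' to shin, 'fina' to waw, and 'isol' to beth, because waw does not join to the letter that follows it.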

Additionally, during the analysis process, the engine also verifies valid diacritic combinations. For additional information, see Invalid Combining Marks.

Shape Glyphs with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal form glyphs (e.g. the glyph for U+0710, 'alaph'). Then, Uniscribe applies contextual shape features to the glyph string.

Next, Uniscribe calls OTLS to apply the features. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section). Each feature is applied, one by one, to the appropriate glyphs in the run, and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.

The steps of the shaping process are outlined below.

Shaping features:

  1. Language forms
    1. Apply feature 'ccmp' to preprocess any glyphs that require composition or decomposition.
    2. Apply feature 'isol' to get the isolated form of characters.
    3. Apply feature 'fina' to get final form glyphs.
    4. Apply feature 'fin2' to replace the 'Alaph' glyph at the end of Syriac words with its appropriate form, when the preceding base character cannot be joined to, and that preceding base character is not a 'Dalath', 'Rish', or dotless 'Dalath-Rish'.
    5. Apply feature 'fin3' to replace the 'Alaph' glyph at the end of Syriac words when the preceding base character is a 'Dalath', 'Rish', or dotless 'Dalath-Rish'.
    6. Apply feature 'medi' to get medial form glyphs.
    7. Apply feature 'med2' to replace the 'Alaph' glyph in the middle of Syriac words when the preceding base character can be joined to it.
    8. Apply feature 'init' to get initial form glyphs.
    9. Apply feature 'rlig' to compose any mandatory ligatures, like 'rish seyame'.
    10. Apply feature 'calt' to apply any desired alternative forms of connections. This can provide type designers with the capability to contextually exchange a glyph to give a better calligraphic presentation.
  2. Typographical forms
    1. Apply feature 'liga' to compose any optional ligatures, like 'lamadh alaph'.
    2. Apply feature 'dlig' to compose any discretionary ligatures.

Position Glyphs with OTLS

Uniscribe next applies features concerned with positioning, calling functions of OTLS to position glyphs.

Positioning features:

  1. Kerning
    1. Apply feature 'kern' to provide pair kerning between base glyphs requiring adjustment for better typographical quality.
  2. Mark to base
    1. Apply feature 'mark' to position diacritic glyphs to the base glyph.
  3. Mark to mark
    1. Apply feature 'mkmk' to position diacritic glyphs to other diacritic glyphs.
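The substitution and positioning steps listed above amount to a fixed loop with one call per feature. The Python sketch below illustrates that idea only. The `shape` function, the `font_features` mapping, and the glyph names are hypothetical stand-ins for calls into OTLS, and positioning features are modeled as glyph-list callables purely to keep the example short.

```python
# Feature order as listed in the shaping and positioning steps.
GSUB_ORDER = ["ccmp", "isol", "fina", "fin2", "fin3", "medi", "med2",
              "init", "rlig", "calt", "liga", "dlig"]
GPOS_ORDER = ["kern", "mark", "mkmk"]


def shape(glyphs, font_features, disabled=()):
    """Apply each feature the font provides, one call per feature, in order.

    font_features maps a feature tag to a callable that takes and returns a
    glyph list (a hypothetical stand-in for one call into OTLS).
    """
    applied = []
    for tag in GSUB_ORDER + GPOS_ORDER:
        if tag in disabled or tag not in font_features:
            continue
        glyphs = font_features[tag](glyphs)
        applied.append(tag)
    return glyphs, applied


def compose_rish_seyame(glyphs):
    """Toy 'rlig' rule: replace rish + seyame with a single ligature glyph."""
    out = []
    for glyph in glyphs:
        if glyph == "seyame" and out and out[-1] == "rish":
            out[-1] = "rish_seyame"
        else:
            out.append(glyph)
    return out


# The dictionary order does not matter; the fixed lists decide the sequence.
features = {"liga": lambda g: g, "rlig": compose_rish_seyame,
            "ccmp": lambda g: g}
result, order = shape(["rish", "seyame"], features)
# result == ["rish_seyame"], order == ["ccmp", "rlig", "liga"]
```

Passing `disabled=("liga",)` models an application that turns off optional ligatures while the required ones in 'rlig' still apply.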

Invalid Combining Marks

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in the Unicode Standard (section 5.12, 'Rendering Non-Spacing Marks' of the Unicode Standard 3.1), i.e. positioned on a dotted circle.

Please note that to render a sign standalone (in apparent isolation from any base) one should apply it on a space (see section 2.5 'Combining Marks' of Unicode Standard 3.1). Uniscribe requires a ZWJ to be placed between the space and a mark for them to combine into a standalone sign.

For the fallback mechanism to work properly, a Syriac OTL font should contain a glyph for the dotted circle (U+25CC). If this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).

In addition to the dotted circle, other Unicode code points recommended for inclusion in any Syriac font are: ZWNJ (zero width non-joiner; U+200C), ZWJ (zero width joiner; U+200D), LRM (left-to-right mark; U+200E), and RLM (right-to-left mark; U+200F). The ZWNJ can be used between two letters to prevent them from forming a cursive connection.

Illustration that shows the dotted circle character, plus Unicode characters zero width non-joiner, zero width joiner, left to right mark, and right to left mark with their suggested glyphs.
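A font build script could check for these recommended code points before release. The following Python sketch assumes the caller has already collected the set of code points the font maps (for example, the keys of its character map); the helper name `missing_recommended` is hypothetical.

```python
# Code points recommended for any Syriac font, per the text above.
RECOMMENDED = {
    0x25CC: "dotted circle",
    0x200C: "zero width non-joiner",
    0x200D: "zero width joiner",
    0x200E: "left-to-right mark",
    0x200F: "right-to-left mark",
}


def missing_recommended(mapped_codepoints):
    """Return the recommended code points absent from the font, sorted."""
    return sorted(cp for cp in RECOMMENDED if cp not in mapped_codepoints)


# A font that maps only alaph, the dotted circle, and ZWNJ:
for cp in missing_recommended({0x0710, 0x25CC, 0x200C}):
    print(f"missing U+{cp:04X} ({RECOMMENDED[cp]})")
```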

If an invalid combination is found, like two 'pthahas' on the same base character, the diacritic that causes the invalid state is placed on a dotted circle to indicate to the user the invalid combination. The shaping engine for non-OpenType fonts will cause invalid mark combinations to overstrike. This is the problem that inserting the dotted circle for the invalid base solves. It should also be noted that the dotted circle is not inserted into the application's backing store. This is a run-time insertion into the glyph array that is returned from the ScriptShape function.

The invalid diacritic logic for Syriac is based on the classes listed below. There is a check to make sure more than one mark of a class is not placed on the same base.

Class    Description                 Code points
DIAC1    Syriac above Greek          U+0730, U+0733, U+0736, U+073A, U+073D
DIAC2    Syriac below Greek          U+0731, U+0734, U+0737, U+073B, U+073E
DIAC3    Syriac other                U+0740, U+0749, U+074A
DIAC4    Syriac dotted class above   U+0732, U+0735, U+073F
DIAC5    Syriac dotted class below   U+0738, U+0739, U+073C
DIAC6    Syriac qushshaya            U+0741, U+030A
DIAC7    Syriac rukkakha             U+0742, U+0325
DIAC8    Syriac line type above      U+0747, U+0303, U+0304
DIAC9    Syriac line type below      U+0748, U+032D, U+032E, U+0330, U+0331
DIAC10   Syriac seyame above         U+0308
DIAC11   Syriac seyame below         U+0304
DIAC12   Syriac dot above            U+0307
DIAC13   Syriac dot below            U+0323
DIAC14   Syriac two dots above       U+0743
DIAC15   Syriac two dots below       U+0744
DIAC16   Syriac three dots above     U+0745
DIAC17   Syriac three dots below     U+0746
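The class check can be illustrated with a short Python sketch. It is a simplified model, not the Uniscribe algorithm: the function name `insert_dotted_circles` is hypothetical, the definition of a valid base (a Syriac letter in U+0710 to U+072F, excluding U+0711) is an assumption, and because U+0304 is listed under both DIAC8 and DIAC11 in the table, the sketch keeps only the first listing.

```python
DOTTED_CIRCLE = "\u25CC"

CLASSES = [
    ("DIAC1", (0x0730, 0x0733, 0x0736, 0x073A, 0x073D)),
    ("DIAC2", (0x0731, 0x0734, 0x0737, 0x073B, 0x073E)),
    ("DIAC3", (0x0740, 0x0749, 0x074A)),
    ("DIAC4", (0x0732, 0x0735, 0x073F)),
    ("DIAC5", (0x0738, 0x0739, 0x073C)),
    ("DIAC6", (0x0741, 0x030A)),
    ("DIAC7", (0x0742, 0x0325)),
    ("DIAC8", (0x0747, 0x0303, 0x0304)),
    ("DIAC9", (0x0748, 0x032D, 0x032E, 0x0330, 0x0331)),
    ("DIAC10", (0x0308,)),
    ("DIAC11", (0x0304,)),
    ("DIAC12", (0x0307,)),
    ("DIAC13", (0x0323,)),
    ("DIAC14", (0x0743,)),
    ("DIAC15", (0x0744,)),
    ("DIAC16", (0x0745,)),
    ("DIAC17", (0x0746,)),
]

MARK_CLASS = {}
for name, codepoints in CLASSES:
    for cp in codepoints:
        MARK_CLASS.setdefault(chr(cp), name)  # first listing wins for U+0304


def is_base(ch):
    """Assumed definition of a valid base: a Syriac letter."""
    return 0x0710 <= ord(ch) <= 0x072F and ch != "\u0711"


def insert_dotted_circles(text):
    """Return the display string with U+25CC before each invalid mark."""
    out = []
    seen = None  # classes already placed on the current base; None = no base
    for ch in text:
        cls = MARK_CLASS.get(ch)
        if cls is None:
            out.append(ch)
            seen = set() if is_base(ch) else None
        elif seen is None or cls in seen:
            # No valid base, or a second mark of the same class.
            out.extend((DOTTED_CIRCLE, ch))
            seen = {cls}  # the dotted circle acts as the new base
        else:
            out.append(ch)
            seen.add(cls)
    return "".join(out)
```

The function returns a new string and leaves its input untouched, which mirrors the point that the dotted circle is added only at run time and never to the backing store.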

Features

The features listed below have been defined to create the basic forms for the languages that are supported on Syriac systems. Regardless of the model an application chooses for supporting layout of complex scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

The order of the lookups within each feature is also very important. For more information on lookups and defining features in OpenType fonts, see the Encoding section of the OpenType Font Development document.

The standard order for applying Syriac features encoded in OpenType fonts:

Feature  Feature function                                  Layout operation  Required

Language based forms:
stch     Stretching glyph decomposition                    GSUB              X
ccmp     Character composition/decomposition substitution  GSUB
isol     Isolated character form substitution              GSUB
fina     Final character form substitution                 GSUB              X
fin2     Final character form #2 substitution              GSUB              X
fin3     Final character form #3 substitution              GSUB              X
medi     Medial character form substitution                GSUB              X
med2     Medial character form #2 substitution             GSUB              X
init     Initial character form substitution               GSUB              X
rlig     Required ligature substitution                    GSUB              X
calt     Connection form substitution                      GSUB

Typographical forms:
liga     Standard ligature substitution                    GSUB
dlig     Discretionary ligature substitution               GSUB

Positioning features:
kern     Pair kerning                                      GPOS
mark     Mark to base positioning                          GPOS
mkmk     Mark to mark positioning                          GPOS

[GSUB = glyph substitution, GPOS = glyph positioning]

Feature examples

Stretching Glyph Decomposition

Feature Tag: "stch"

The ‘stch’ feature is used to render the Syriac Abbreviation Mark (U+070F) which has the special property of enclosing other Syriac letters and so needs to be able to stretch in order to dynamically adapt to the width of the enclosed text. There are three phases to rendering this feature:

  • Decomposition
    The feature defines a decomposition from a single glyph into an odd number of glyphs which describe the stretching glyph.

  • Reordering
    The shaping engine reorders the glyphs output by the stch feature so that all glyphs in the decomposition sequence come at the start of the enclosed sequence except for the final glyph of the decomposition. The final glyph is reordered so that it comes after the final glyph of the enclosed sequence.

  • Positioning
    The odd numbered glyphs in the decomposition are fixed reference points which are distributed evenly from the start to the end of the enclosed text. The even numbered glyphs are repeated as necessary to fill the space between the fixed glyphs. All decomposition glyphs used for the Syriac Abbreviation Mark must be defined as mark glyphs.

For example, the sequence U+070F U+072B U+0718 U+0712 (܏ܫܘܒ) may be rendered using the stch feature as follows:

Chart that shows glyphs in backing store and the phases of decomposition, reordering, and positioning. A tan background shows the width used to render each glyph.

Note that in the above example the tan background has been used to illustrate the widths used to render each glyph. All glyphs are rendered with full height.
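The reordering and positioning phases can be sketched in Python as follows. The reordering function follows the description above directly. The repeat-count arithmetic is only one plausible reading: it assumes the free space is divided equally among the repeating glyphs and that repeats are rounded up so that neighbors overlap slightly instead of leaving gaps. Both function names and all widths are hypothetical.

```python
import math


def reorder_stch(decomposition, enclosed):
    """Move all but the last decomposition glyph before the enclosed glyphs,
    and the last decomposition glyph after them."""
    if len(decomposition) % 2 == 0:
        raise ValueError("the decomposition must have an odd number of glyphs")
    return decomposition[:-1] + enclosed + decomposition[-1:]


def repeat_counts(target_width, fixed_widths, repeating_widths):
    """Estimate how often each even-numbered glyph repeats.

    fixed_widths holds the widths of the odd-numbered (fixed) glyphs and
    repeating_widths those of the even-numbered (repeating) glyphs.
    """
    if len(fixed_widths) != len(repeating_widths) + 1:
        raise ValueError("expected one more fixed glyph than repeating glyphs")
    if not repeating_widths:
        return []
    free = max(0, target_width - sum(fixed_widths))
    gap = free / len(repeating_widths)  # assumed: equal share per gap
    return [math.ceil(gap / width) for width in repeating_widths]


# Hypothetical glyph names for a three-part decomposition of U+070F.
glyphs = reorder_stch(["sam.start", "sam.fill", "sam.end"],
                      ["shin", "waw", "beth"])
# glyphs == ["sam.start", "sam.fill", "shin", "waw", "beth", "sam.end"]
```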

This feature may be used with contextual rules in order to control details of the stretched form. For example, when the target width is narrow, a narrower substitution may be used:

Screenshot that shows the G S A M narrow S G L mod lookup in Microsoft VOLT.
Using the stch feature in the context of a single character in Microsoft VOLT

In other cases, the standard substitution may be used:

Screenshot that shows the G S A M medium D B L mod lookup in Microsoft VOLT.
Using the stch feature in the context of two characters in Microsoft VOLT

Character composition (and decomposition)

Feature Tag: "ccmp"

The 'ccmp' feature is used to compose a number of glyphs into one glyph, or decompose one glyph into a number of glyphs. This feature is applied before any other feature because a font vendor may want to control how certain glyphs are shaped before the contextual forms are selected. An example of using this table is shown below. The 'ccmp' table maps default alphabetic forms to both a composed form (essentially a ligature, GSUB lookup type 4) and decomposed forms (GSUB lookup type 2).

Chart that shows U 0 7 3 2 in backing store in the first box. 2 dots are vertically stacked and slightly misaligned. The second box has U 0 7 3 C and U 0 7 3 F in character composition form. There are 2 dots, they are spaced out and aligned diagonally.
Splitting the character into two glyphs, a dot above and a dot below, allows the dots to correctly float above and below any glyph without having to make many forms of the U+0732 character.

Isolated form

Feature Tag: "isol"

The 'isol' feature is used to map the Unicode character value to its isolated form (GSUB lookup type 1). This is usually the same glyph as the nominal form.

Table that shows a letter in backing store and the corresponding isolated form glyph. The images look identical.
Estrangelo style font

Final form

Feature Tag: "fina"

The 'fina' feature is used to map the Unicode character value to its final form (GSUB lookup type 1).

Table that shows 3 letters in backing store and the corresponding final form glyphs in Estrangelo, Serto, and East Syriac font styles.

Final form #2

Feature Tag: "fin2"

The 'fin2' feature is used to replace the 'Alaph' glyph at the end of Syriac words with its appropriate form, when the preceding base character cannot be joined to, and that preceding base character is not a 'Dalath', 'Rish', or dotless 'Dalath-Rish'. The 'fin2' table maps default alphabetic forms to corresponding final forms (GSUB lookup type 5).

This feature is used only for the Syriac script 'Alaph' character.

Table that shows 3 combinations of 2 letters in backing store and the corresponding final 2 form glyphs in Estrangelo, Serto, and East Syriac font styles.
Example: When an 'Alaph' is preceded by a He (one of Syriac's right-joining only characters), the 'Alaph' would be replaced by the 'fin2' non-joining form (instead of the joining 'fina' form).

Final form #3

Feature Tag: "fin3"

The 'fin3' feature is used to replace 'Alaph' glyphs at the end of Syriac words when the preceding base character is a 'Dalath', 'Rish', or dotless 'Dalath-Rish'. The 'fin3' table maps default alphabetic forms to corresponding final forms (GSUB lookup type 5).

This feature is used only for the Syriac script 'Alaph' character.

Table that shows 3 combinations of 2 letters in backing store and the corresponding final 3 form glyphs in Estrangelo, Serto, and East Syriac font styles.
Example: When an 'Alaph' is preceded by a 'Dalath', the 'Alaph' would be replaced by the 'fin3' non-joining form (instead of the joining 'fina' form).
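The choice among 'fina', 'fin2', and 'fin3' for a word-final 'Alaph' can be summarized in a small Python sketch. It restates the rules given in this document and is not taken from any shaping engine's source. The function name `final_alaph_feature` is hypothetical, and the set of letters that cannot be joined to is an illustrative list of right-joining Syriac letters.

```python
# Dalath, dotless Dalath-Rish, and Rish.
DALATH_RISH = {"\u0715", "\u0716", "\u072A"}

# Illustrative set of letters that do not join to a following letter
# (Alaph, Dalath, dotless Dalath-Rish, He, Waw, Zain, Yudh He, Sadhe,
# Rish, Taw).
CANNOT_BE_JOINED_TO = {
    "\u0710", "\u0715", "\u0716", "\u0717", "\u0718",
    "\u0719", "\u071E", "\u0728", "\u072A", "\u072C",
}


def final_alaph_feature(preceding_base):
    """Pick the feature for a word-final Alaph from the preceding base."""
    if preceding_base in DALATH_RISH:
        return "fin3"
    if preceding_base in CANNOT_BE_JOINED_TO:
        return "fin2"
    return "fina"  # the preceding letter joins, so the joining form applies


print(final_alaph_feature("\u0717"))  # He
print(final_alaph_feature("\u0715"))  # Dalath
print(final_alaph_feature("\u0712"))  # Beth
```

The 'fin3' test comes first because Dalath and Rish are themselves letters that cannot be joined to, so checking them later would never be reached.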

Medial form

Feature Tag: "medi"

The 'medi' feature is used to map the Unicode character value to its medial form (GSUB lookup type 1). The 'Alaph' glyph should not be included in this lookup. Joining forms of 'Alaph' are created with the med2 lookup.

Table that shows a letter in backing store and the corresponding medial form glyph.
Estrangelo style font

Medial form #2

Feature Tag: "med2"

The 'med2' feature is used to replace the 'Alaph' glyph in the middle of Syriac words when the preceding base character can be joined to. The 'med2' table maps default alphabetic forms to corresponding medial forms (GSUB lookup type 5).

This feature is used only for the Syriac script 'Alaph' character.

Table that shows 3 combinations of 2 letters in backing store and the corresponding medial 2 form glyphs in Estrangelo, Serto, and East Syriac font styles.
Example: When an 'Alaph' is preceded by a 'Waw', the 'Alaph' would be replaced by the 'med2' non-joining form (instead of the joining 'medi' form).

Initial form

Feature Tag: "init"

The 'init' feature is used to map the Unicode character value to its initial form (GSUB lookup type 1).

Table that shows a letter in backing store and the corresponding initial form glyph.
Estrangelo style font

Required ligatures

Feature Tag: "rlig"

The 'rlig' feature is used to map glyph values to their correct ligated form. Font developers should use this table for all ligatures that must always be formed. Ligatures that should be optional, based on user preferences, should not be included in this table. Optional ligatures are defined in the 'liga' table.

The 'rlig' feature maps sequences of glyphs to corresponding ligatures (GSUB lookup type 4). Ligatures with more components must be stored ahead of those with fewer components in order to be found. See Ordering ligatures in the Encoding section of the OpenType Font Development document. The set of required ligatures will vary by design and script.
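The ordering requirement is easy to see with a first-match-wins substitution loop. The Python sketch below is a simplified model of a ligature lookup, not the OpenType table format itself; `apply_ligatures` and the glyph names are hypothetical.

```python
def apply_ligatures(glyphs, ligatures):
    """Replace glyph sequences with ligatures; the first listed match wins.

    ligatures is an ordered list of (components, ligature_glyph) pairs.
    """
    out = []
    i = 0
    while i < len(glyphs):
        for components, ligature in ligatures:
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:
            # No ligature starts at this position; keep the glyph as is.
            out.append(glyphs[i])
            i += 1
    return out


longest_first = [(("a", "b", "c"), "a_b_c"), (("a", "b"), "a_b")]
shortest_first = [(("a", "b"), "a_b"), (("a", "b", "c"), "a_b_c")]

good = apply_ligatures(["a", "b", "c"], longest_first)
bad = apply_ligatures(["a", "b", "c"], shortest_first)
# good == ["a_b_c"]; bad == ["a_b", "c"], so the longer ligature is never found
```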

Table that shows 2 letters in backing store and the corresponding required ligatures form.
Estrangelo style font

Connection forms

Feature Tag: "calt"

The 'calt' feature replaces default glyphs with alternate forms that provide better joining behavior in specified situations. It is used in script typefaces that are designed to have some or all of their glyphs join. The 'calt' table specifies the context in which each substitution occurs, and maps one or more default glyphs to replacement glyphs (GSUB lookup type 6).

Example: If a Gamal is preceded by a Teth, a short expansion glyph is inserted between the glyphs for a better connection (to prevent the characters from crashing). In the picture, uni0713.fina is substituted with uni0640.short + uni0713.fina when the context of the letter before is uni071B.init.
Table that shows 2 letters in backing store and the corresponding calt form glyph.
Serto style font

Standard ligatures

Feature Tag: "liga"

The 'liga' feature is used to map glyphs to their optional ligated form. Font developers should use this table for all ligatures that they want the user to be able to control by user preference. Uniscribe has a flag that will allow this type of feature to be deactivated. The 'liga' feature maps sequences of glyphs to corresponding ligatures (GSUB lookup type 4). Ligatures with more components must be stored ahead of those with fewer components in order to be found. See Ordering ligatures in the Encoding section of the OpenType Font Development document. The set of optional ligatures will vary by typeface design and script.

Note: Ligatures that should be formed all of the time should not be included in this feature type. Required ligatures are defined in the 'rlig' table.

Table that shows 2 letters in backing store and the corresponding liga form glyph.
Estrangelo style font. Note that if Alaph has been replaced with a joining form under med2, then the joining form will need to be specified as the source of the liga/rlig replacement for Lamadh + Alaph.

Discretionary ligatures

Feature Tag: "dlig"

The 'dlig' feature is also used to map glyphs to their optional ligated form. Font developers should use this table for all ligatures that they want the user to be able to control by user preference. Uniscribe has a flag that will allow this type of feature to be deactivated. The 'dlig' feature maps sequences of glyphs to corresponding ligatures (GSUB lookup type 4). Ligatures with more components must be stored ahead of those with fewer components in order to be found. See Ordering ligatures in the Encoding section of the OpenType Font Development document. The set of optional ligatures will vary by typeface design and script.

Table that shows 2 letters in backing store and the corresponding D lig form glyph.
Serto style font

Kerning

Feature Tag: "kern"

The 'kern' feature is used to adjust the amount of space between glyphs, generally to provide optically consistent spacing between glyphs. Although a well-designed typeface has consistent inter-glyph spacing overall, some glyph combinations require adjustment for improved legibility. Besides standard adjustment in either horizontal or vertical direction, this feature can supply size-dependent kerning data via device tables, "cross-stream" kerning in the Y text direction, and adjustment of glyph placement independent of the advance adjustment. Note that this feature would not be used in monospaced fonts.

The font stores a set of adjustments for pairs of glyphs (GPOS lookup type 2 or 8). These may be stored as one or more tables matching left and right classes, and/or as individual pairs. If both forms are used, the classes should be listed last, so as to provide a means to replace any non-ideal values that may result from the class tables. Additional adjustments may be provided for larger sets of glyphs (e.g., triplets, quadruplets, etc.) to overwrite the results of pair kerns in particular combinations. These should precede the pairs.
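Listing individual pairs ahead of class tables means a specific pair value overrides the class value. The Python sketch below models that lookup order with plain dictionaries. The function name, glyph names, class names, and kerning values are all hypothetical.

```python
def kern_value(left, right, pair_table, left_class, right_class, class_table):
    """Return the adjustment for a glyph pair: exact pairs first, then classes."""
    if (left, right) in pair_table:
        return pair_table[(left, right)]
    key = (left_class.get(left), right_class.get(right))
    return class_table.get(key, 0)


# Hypothetical data: one exception pair and one class-based rule.
pair_table = {("waw.fina", "rish"): -40}
left_class = {"waw.fina": "round_left", "dalath.fina": "round_left"}
right_class = {"rish": "dotted_right", "dalath": "dotted_right"}
class_table = {("round_left", "dotted_right"): -25}

exact = kern_value("waw.fina", "rish", pair_table,
                   left_class, right_class, class_table)
by_class = kern_value("dalath.fina", "dalath", pair_table,
                      left_class, right_class, class_table)
# exact == -40 (the pair entry wins); by_class == -25 (class fallback)
```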

Screenshot that shows glyphs being kerned in Microsoft VOLT.
Creating kern table using Microsoft VOLT

Mark to base positioning

Feature Tag: "mark"

The 'mark' feature positions mark glyphs in relation to a base glyph, or a ligature glyph. This feature may be implemented as a MarkToBase Attachment lookup (GPOS LookupType = 4) or a MarkToLigature Attachment lookup (GPOS LookupType = 5).
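Mark attachment comes down to aligning two anchor points: the mark is moved so that its own anchor lands on the matching anchor of the base. The Python sketch below shows the arithmetic only. The function name and the coordinates (in font units) are hypothetical.

```python
def attach_mark(base_anchor, mark_anchor, base_origin=(0, 0)):
    """Return the mark's origin so that its anchor meets the base anchor.

    Anchors are (x, y) offsets from each glyph's own origin.
    """
    base_x, base_y = base_origin
    return (base_x + base_anchor[0] - mark_anchor[0],
            base_y + base_anchor[1] - mark_anchor[1])


# Base anchor above the letter at (250, 600); the mark's anchor sits
# slightly below its own origin at (30, -20).
mark_origin = attach_mark((250, 600), (30, -20))
# mark_origin == (220, 620)
```

The same arithmetic applies to mark-to-mark attachment, with the first mark playing the role of the base.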

Screenshot that shows the Mark On Base mod lookup in Microsoft VOLT.
Positioning mark to base using Microsoft VOLT

Mark to mark positioning

Feature Tag: "mkmk"

The 'mkmk' feature positions mark glyphs in relation to another mark glyph. This feature may be implemented as a MarkToMark Attachment lookup (GPOS LookupType = 6).

Screenshot that shows the Above Mark To Mark mod lookup in Microsoft VOLT.
Positioning mark to mark using Microsoft VOLT

Appendix

Appendix A: Writing System Tags

Features are encoded according to both a designated script and language system. The language system tag specifies a typographic convention associated with a language or linguistic subgroup.

Currently, the Uniscribe engine only supports the "default" language for each script. However, font developers may want to build language specific features which are supported in other applications and will be supported in future Microsoft OpenType implementations.

* NOTE: It is strongly recommended to include the "dflt" language tag in all OpenType fonts because it defines the basic script handling for a font. The "dflt" language system is used as the default if no other language specific features are defined or if the application does not support that particular language. If the "dflt" tag is not present for the script being used, the font may not work in some applications.

The following tables list the registered tag names for scripts and language systems.

Registered tags for the Syriac script

Script tag   Script
"syrc"       Syriac

Registered tags for Syriac language systems

Language system tag   Language
"dflt"                *default script handling
"SYR "                Syriac

Note: both the script and language tags are case sensitive (script tags should be lowercase, language tags are all caps) and must contain four characters (i.e. you must add a space to three-character language tags).
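The four-character rule can be enforced when tags are written out. The Python sketch below pads short tags with trailing spaces. The helper name `make_tag` is hypothetical.

```python
def make_tag(name):
    """Return a four-byte tag, padding short names with trailing spaces."""
    if not 1 <= len(name) <= 4 or not name.isascii():
        raise ValueError("a tag must be one to four ASCII characters")
    return name.ljust(4).encode("ascii")


script_tag = make_tag("syrc")   # b"syrc"
language_tag = make_tag("SYR")  # b"SYR " (note the trailing space)

# Tags are compared byte for byte, so case matters.
assert make_tag("SYR") != make_tag("syr")
```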