Advanced Typographic Extensions - OpenType Layout (OpenType 1.6)

The Advanced Typographic tables (OpenType Layout tables) extend the functionality of fonts with either TrueType or CFF outlines. OpenType Layout fonts contain additional information that extends the capabilities of the fonts to support high-quality international typography:

  • OpenType Layout fonts allow a rich mapping between characters and glyphs, which supports ligatures, positional forms, alternates, and other substitutions.
  • OpenType Layout fonts include information to support features for two-dimensional positioning and glyph attachment.
  • OpenType Layout fonts contain explicit script and language information, so a text-processing application can adjust its behavior accordingly.
  • OpenType Layout fonts have an open format that allows font developers to define their own typographical features.

This overview introduces the power and flexibility of the OpenType Layout font model. The OpenType Layout tables are described in more detail in the “Font File Tables” section of the OpenType specification.

OpenType Layout Common Table Formats are documented in the chapter “OpenType Layout Common Table Formats”.

Registered OpenType Layout Tags for scripts, languages, and baselines, are documented in the chapter “OpenType Layout Registered Features”.

OpenType Layout at a Glance

OpenType Layout addresses complex typographical issues that especially affect people using text-processing applications in multi-lingual and non-Latin environments.

OpenType Layout fonts may contain alternative forms of characters and mechanisms for accessing them. For example, in Arabic, the shape of a character often varies with the character’s position in a word. As shown here, the ha character will take any of four shapes, depending on whether it stands alone or whether it falls at the beginning, middle, or end of a word. OpenType Layout helps a text-processing application determine which variant to substitute when composing text.

Glyphs for different positional forms of hah
Figure 1a Isolated, initial, medial, and final forms of the Arabic character ha.

Similarly, OpenType Layout helps an application use the correct forms of characters when text is positioned vertically instead of horizontally, such as with Kanji. For example, Kanji uses alternative forms of parentheses when positioned vertically.

Kanji ideograph with parentheses in horizontal and vertical layout
Figure 1b Alternative forms of parentheses used when positioning Kanji vertically.

The OpenType Layout font format also supports the composition and decomposition of ligatures. For example, English, French, and other languages based on Latin can substitute a single ligature, such as “fi”, for its component glyphs - in this case, “f” and “i”. Conversely, the individual “f” and “i” glyphs could replace the ligature, possibly to give a text-processing application more flexibility when spacing glyphs to fill a line of justified text.

Glyphs for f and i and an f-i ligature glyph
Figure 1c Two Latin glyphs and their associated ligature.
Sequence of three Arabic glyphs and the associated ligature glyph
Figure 1d Three Arabic glyphs and their associated ligature.

Glyph substitution is just one way OpenType Layout extends font capabilities. Using precise X and Y coordinates for positioning glyphs, OpenType Layout fonts also can identify points for attaching one glyph to another to create cursive text and glyphs that need diacritical or other special marks.

OpenType Layout fonts also may contain baseline information that specifies how to position glyphs horizontally or vertically. Because baselines may vary from one script (set of characters) to another, this information is especially useful for aligning text that mixes glyphs from scripts for different languages.

a line of text with both Latin and Arabic scripts
Figure 1e A line of text, baselines adjusted, mixing Latin and Arabic scripts.

TrueType versus OpenType Layout

A TrueType font is a collection of several tables that contain different types of data: glyph outlines, metrics, bitmaps, mapping information, and much more. OpenType Layout fonts contain all this basic information, plus additional tables containing information for advanced typography.

Text-processing applications - referred to as “clients” of OpenType Layout - can retrieve and parse the information in OpenType Layout tables. So, for example, a text-processing client can choose the correct character shapes and space them properly.

As much as possible, the tables of OpenType Layout define only the information that is specific to the font layout. The tables do not try to encode information that remains constant within the conventions of a particular language or the typography of a particular script. Such information that would be replicated across all fonts in a given language belongs in the text-processing application for that language, not in the fonts.

OpenType Layout Terminology

The OpenType Layout model is organized around glyphs, scripts, language systems, and features.

Characters versus glyphs

Users don’t view or print characters: a user views or prints glyphs. A glyph is a representation of a character. The character “capital letter A” is represented by the glyph “A” in Times New Roman Bold and “A” in Arial Bold. A font is a collection of glyphs. To retrieve glyphs, the client uses information in the “cmap” table of the font, which maps the client’s character codes to glyph indices in the table.

Glyphs can also represent combinations of characters and alternative forms of characters: glyphs and characters do not strictly correspond one-to-one. For example, a user might type two characters, which might be better represented with a single ligature glyph. Conversely, the same character might take different forms at the beginning, middle, or end of a word, so a font would need several different glyphs to represent a single character. OpenType Layout fonts contain a table that provides a client with information about possible glyph substitutions.

Alternative ampersand glyphs
Figure 1f Multiple glyphs for the ampersand character.

Scripts

A script is composed of a group of related characters, which may be used by one or more languages. Latin, Arabic, and Thai are examples of scripts. A font may use a single script, or it may use many scripts. Within an OpenType Layout font, scripts are identified by unique 4-byte tags.

a Latin glyph, Kanji glyph, and Arabic glyph
Figure 1g Glyphs in the Latin, Kanji, and Arabic scripts.

Language systems

Scripts, in turn, may be divided into language systems. For example, the Latin script is used to write English, French, or German, but each language has its own special requirements for text processing. A font developer can choose to provide information that is tailored to the script, to the language system, or to both.

Language systems, unlike scripts, are not necessarily evident when a text-processing client examines the characters being used. To avoid ambiguity, the user or the operating system needs to identify the language system. Otherwise, the client will use the default language-system information provided with each script.

three lines of text, in English, French and German
Figure 1h Differences in the English, French, and German language system.

Features

Features define the basic functionality of the font. A font that contains tables to handle diacritical marks will have a “mark” feature. A font that supports substitution of vertical glyphs will have a “vert”  feature.

The OpenType Layout feature model provides great flexibility to font developers because features do not have to be predefined by Microsoft Corporation. Instead, font developers can work with application developers to determine useful features for fonts, add such features to OpenType Layout fonts, and enable client applications to support such features.

block diagram showing script, language system and feature table organization
Figure 1i The relationship of scripts, language systems, features, and lookups for substitution and positioning tables.

OpenType Layout tables

OpenType Layout comprises five new tables: GSUB, GPOS, BASE, JSTF, and GDEF. These tables and their formats are discussed in detail in the chapters that follow this overview.

GSUB: Contains information about glyph substitutions to handle single glyph substitution, one-to-many substitution (ligature decomposition), aesthetic alternatives, multiple glyph substitution (ligatures), and contextual glyph substitution.

GPOS: Contains information about X and Y positioning of glyphs to handle single glyph adjustment, adjustment of paired glyphs, cursive attachment, mark attachment, and contextual glyph positioning.

BASE: Contains information about baseline offsets on a script-by-script basis.

JSTF: Contains justification information, including whitespace and Kashida adjustments.

GDEF: Contains information about all individual glyphs in the font: type (simple glyph, ligature, or combining mark), attachment points (if any), and ligature caret (if a ligature glyph).

Common Table Formats: Several common table formats are used by the OpenType Layout tables.

Text processing with OpenType Layout fonts

A text-processing client follows a standard process to convert the string of characters entered by a user into positioned glyphs. To produce text with OpenType Layout fonts:

  1. Using the cmap table in the font, the client converts the character codes into a string of glyph indices.
  2. Using information in the GSUB table, the client modifies the resulting string, substituting positional or vertical glyphs, ligatures, or other alternatives as appropriate.
  3. Using positioning information in the GPOS table and baseline offset information in the BASE table, the client then positions the glyphs.
  4. Using design coordinates the client determines device-independent line breaks. Design coordinates are high-resolution and device-independent.
  5. Using information in the JSTF table, the client justifies the lines, if the user has specified such alignment.
  6. The operating system rasterizes the line of glyphs and renders the glyphs in device coordinates that correspond to the resolution of the output device.

Throughout this process the text-processing client keeps track of the association between the character codes for the original text and the glyph indices of the final, rendered text. In addition, the client may save language and script information within the text stream to clearly associate character codes with typographical behavior.

Left-to-right and right-to-left text

When an OpenType text layout engine applies the Unicode bidi algorithm and gets to the point where mirroring needs to be performed on runs with an even, i.e. left-to-right (LTR), resolved level, it does the following:

  1. Glyph-level mirroring:

    Apply feature 'ltrm' to the entire LTR run to substitute mirrored forms.

  2. LTR glyph alternates:

    Apply feature 'ltra' to the entire LTR run to finesse glyph selection.

For runs with an odd, i.e. right-to-left (RTL), resolved level, the engine does the following:

  1. Character-level mirroring:

    For each character i in the RTL run:
        If it is mapped to character j by the OMPL and cmap(j) is non-zero:
            Use glyph cmap(j) at character i

    Here OMPL refers to the OpenType Mirroring Pairs List, and cmap(j) refers to the glyph at code point j in the Unicode cmap subtable.

    For example, suppose U+0028, LEFT PARENTHESIS, occurred in the run at resolved level 1. The glyph at that code point in the run will be replaced by cmap(U+0029), since {U+0028, U+0029} is a pair in the OMPL.

  2. Glyph-level mirroring:

    The engine applies the 'rtlm' feature to the entire RTL run. The feature, if present, substitutes mirrored forms for characters other than those covered by the first elements of OMPL pairs (otherwise, it could cancel the effects of character-level mirroring).

    The data contents of the OMPL are identical to the Bidi Mirroring Glyph Property file of Unicode 5.1, and will never be revised. Thus, it will be up to the 'rtlm' feature to provide, if needed, mirrored forms for both (a) Unicode 5.1 code points with the “mirrored” property but no appropriate Unicode 5.1 character mirrors, as well as (b) all future “mirrored” property additions to Unicode, whether or not character mirrors exist for them.

    With such a division of labor between the layout engine and the font, most fonts will not need to include an 'rtlm' feature, since the mirrored forms in their Unicode cmap subtable would be adequate.

  3. RTL glyph alternates:

    The engine applies the 'rtla' feature to the entire RTL run. The feature, if present, substitutes variants appropriate for right-to-left text (other than mirrored forms).

In practice, the engine may apply features simultaneously; thus, it is up to the font vendor to ensure that the features’ lookups are ordered to achieve the desired effect of the algorithms described above. The engine may optimize its implementation in various ways, e.g. by taking advantage of the fact that character- and glyph-level mirroring won’t both apply on the same element in the run.