OpenType development

Legacy information: We're no longer updating this content regularly.

Introduction

The OpenType Layout model provides a powerful architecture for supporting complex scripts and advanced typography. The infrastructure has three components:

  • A publicly specified file format that supports advanced typographic layout information (OpenType)
  • Windows System Services that produce "shaped" and positioned glyph strings from character strings (RichEdit and Uniscribe, the Unicode Script Processor)
  • A freely available, cross-platform library for low-level access to layout information and layout operations (OpenType Layout Services Library)

OpenType font format

OpenType fonts may contain either TrueType or PostScript outlines. The fonts are Unicode-based and allow a rich mapping between characters and glyphs. This enables support for ligatures, positional forms, alternates, and other substitutions. OpenType fonts also may include information that supports two-dimensional glyph positioning and glyph attachment.
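As a rough illustration of the two outline flavors, a font file can be classified by the 4-byte version tag at the start of its header. The sketch below assumes a single-font file (not a font collection), and the function name outline_flavor is invented for this example.

```python
def outline_flavor(header: bytes) -> str:
    """Classify a font by the 4-byte sfnt version tag at the start of the file."""
    if len(header) < 4:
        raise ValueError("need at least the first 4 bytes of the font file")
    tag = header[:4]
    if tag == b"\x00\x01\x00\x00":
        return "TrueType"          # version 1.0: TrueType outlines
    if tag == b"OTTO":
        return "PostScript (CFF)"  # 'OTTO': PostScript-flavored outlines
    return "unknown"               # collections and other containers are not handled


# Usage: pass the first bytes read from a font file.
print(outline_flavor(b"OTTO" + bytes(8)))
```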

Layout features within OpenType fonts are organized by scripts and languages, allowing a single font to support multiple writing systems, even within the same script.
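One way to picture this organization is as a nested mapping from script tag to language system tag to feature list, with a default language system used when a specific one is not defined. The table contents below are invented for illustration and do not describe any real font; only the general shape is the point.

```python
# Hypothetical, simplified view of a font's script -> language system -> features data.
FEATURES = {
    "latn": {
        "dflt": ["liga", "kern"],
        "TRK ": ["kern"],          # illustrative: a language system without 'liga'
    },
    "arab": {
        "dflt": ["init", "medi", "fina", "liga"],
        "URD ": ["init", "medi", "fina"],
    },
}


def features_for(script, language="dflt"):
    """Return the features for a script/language pair, falling back to the default."""
    languages = FEATURES.get(script)
    if languages is None:
        languages = FEATURES.get("DFLT", {})   # font-wide default script, if present
    return languages.get(language, languages.get("dflt", []))


print(features_for("latn", "TRK "))
print(features_for("latn", "DEU "))   # no specific entry, so the default applies
```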

OpenType fonts are not dependent on a single character-encoding scheme, and in fact the format supports all the encoding schemes in common use today. Internally, however, all OpenType fonts are "plumbed" with Unicode.

Windows system services

Windows provides service libraries that assist applications in text-layout operations. Many Microsoft applications now use these libraries, which provide consistency, save development time, and insulate product developers from many complex script issues. These libraries are publicly exposed as a part of the operating system, are documented in the Windows SDK, and are governed by the same licensing restrictions as the rest of the operating system.

Although any text layout client may use these services to perform the bulk of text layout, the interfaces are also designed to allow clients to use the services to augment the operation of their own proprietary text engines.

Unicode script processor

The Unicode Script Processor (USP10.DLL) is a collection of APIs that enable a text-layout client to format complex scripts. The Unicode Script Processor, also known as "Uniscribe," supports the complex rules found in scripts such as Arabic, the Indic scripts, and Thai. Uniscribe also handles scripts written from right to left, such as Arabic or Hebrew, and supports the mixing of scripts.

Uniscribe has multiple shaping engines that contain the layout knowledge for particular scripts (for example, Arabic, Hebrew, Thai, Hindi, Tamil). In addition, there is an OpenType Layout shaping engine for handling script features unknown to Uniscribe. Uniscribe provides character-to-glyph mapping; dx,dy positioning; line breaking at word boundaries; hit testing and cursor positioning.

Uniscribe subdivides strings of characters into "items" (a character string having all the same script and direction attributes), "runs" (portions of an item that have continuous formatting attributes), and "clusters" (script-defined, indivisible character groupings). A client builds runs based on its own stored formatting attributes, and on the item boundaries obtained from Uniscribe.
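The way a client combines the two sources of boundaries can be sketched as follows. This is a simplified model, not Uniscribe code: it assumes the item boundaries have already been obtained (in Uniscribe they come from ScriptItemize) and that the client knows where its own formatting changes. The function name build_runs is hypothetical.

```python
def build_runs(text_length, item_starts, format_starts):
    """Split [0, text_length) into runs at every item or formatting boundary.

    item_starts   -- character offsets where a new item begins (from the shaper)
    format_starts -- character offsets where the client's formatting changes
    """
    cuts = sorted(set(item_starts) | set(format_starts) | {0})
    cuts = [c for c in cuts if 0 <= c < text_length]
    ends = cuts[1:] + [text_length]
    return list(zip(cuts, ends))


# Ten characters: a new item starts at offset 6, formatting changes at offset 3.
print(build_runs(10, item_starts=[0, 6], format_starts=[0, 3]))
```

Each resulting run lies within one item and carries one set of formatting attributes, which matches the definition of a run given above.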

Using Uniscribe, clients need only manage a backing store of Unicode character codes, typed by the user in "logical order" (as defined by Unicode). Text-layout clients do not need to maintain any other buffer or mapping table to track character order, and the backing store never changes as a result of layout operations.

Clients of Uniscribe include Win32 APIs, plain-text applets, edit controls, RichEdit 3.0, WordPad, Office 9, Internet Explorer 4.0+, FrontPage Express, and Outlook Express. Uniscribe ships with Windows 2000 and with Internet Explorer 4.0 and later. It may also be used on NT4, Windows 95, and Windows 98 systems.

RichEdit

RichEdit is a higher-level collection of interfaces that may be used to call Uniscribe or other shaping engines or routines. RichEdit serves to further insulate text-layout clients from the complexities of certain scripts.

RichEdit 3.0 provides fast, versatile editing of rich Unicode multilingual text and simple plain text. It includes extensive message and COM interfaces. Features include text editing, formatting, line breaking, simple table layout, vertical text layout, bidirectional text layout, Indic and Thai support, a Word edit UI, and Text Object Model interfaces.

RichEdit is the simplest way for a client to support features of complex scripts. Clients use the TextOut function of RichEdit to automatically parse, shape, position, and break lines.

RichEdit is designed for clients whose primary purpose is not necessarily text layout, but who nonetheless need to display complex scripts.

OpenType Layout Services Library (OTLS)

The OTLS is a set of text-processing helper functions. The services simplify the job of text processing by insulating a client application from the details of the font file format. The services library is freely available under license from Microsoft, and will ultimately be distributed with the operating system.

Although any text-layout client may use these services to perform the bulk of text layout, the interfaces are also designed to allow clients to use the services to augment the operation of their own proprietary text engines.

The OTLS allows clients to work at the more familiar level of features and characters. The OTLS handles the details of lookup tables and glyph IDs, which may be less familiar.

The simplest way to use the OTLS is to identify sections of text with OTLS features and use the text layout functions for all text-processing operations. In this approach, the client is still responsible for deciding where to break lines, whether to do justification, and whether to lay out text in a device-dependent or device-independent fashion.

A more sophisticated client may directly manipulate the data structures of the text, enabling the management of glyph substitution or positioning. Sophisticated clients may also choose to intercept or replace resource management calls in order to handle memory allocation or access font tables.

The OTLS can be used as a set of shareable functions (as a DLL), or used indirectly by applications calling Uniscribe or RichEdit.

Encoding

Much of the knowledge about laying out text and the semantics of languages is embodied in the system components. This model ensures consistency in the layout operations that are required to arrive at the basic form, and relieves a font developer from having to define generalized script rules within a font (as is the case with Apple TrueType GX fonts).

Some clients may introduce their own knowledge or preferences regarding script layout, and OpenType Layout fonts may contain layout features that duplicate or override those applied by OS services. The layered structure of OS services supporting text layout allows a client to choose what layout information to use, and how to apply it. However, such an architecture also presents the possibility that duplicate feature information and layout intelligence may exist in more than one place.

At a minimum, font developers should be able to expect that applications have knowledge of (or services for executing) script rules as defined in the Unicode Standard. Application developers should be able to expect that fonts have glyphs and positioning information representing layout features as defined by the Unicode Standard.

Features

All of the features described in each of the script or language documents are "registered" and supported by Uniscribe and clients of USP, such as RichEdit. Thus, they provide a means for applications to work with operating system services to lay out complex scripts.

Regardless of the model an application chooses for supporting layout of complex scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. The feature order is different for each script or language system and is described within those documents.
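The idea of a fixed, script-specific feature order can be sketched as a table that the shaping engine consults regardless of the order in which a client names features. The Arabic order shown here is illustrative only and should not be taken as the authoritative list; the real order is given in each script's document. The names SCRIPT_FEATURE_ORDER and ordered_features are hypothetical.

```python
# Illustrative order only; consult the script-specific document for the real one.
SCRIPT_FEATURE_ORDER = {
    "arab": ["ccmp", "isol", "fina", "medi", "init", "rlig", "calt", "liga"],
}


def ordered_features(script, requested):
    """Return the requested features in the script's fixed order.

    Features the table does not know about are kept, after the known ones,
    in the order the client supplied them.
    """
    order = SCRIPT_FEATURE_ORDER.get(script, [])
    rank = {tag: position for position, tag in enumerate(order)}
    known = sorted((t for t in requested if t in rank), key=rank.__getitem__)
    unknown = [t for t in requested if t not in rank]
    return known + unknown


print(ordered_features("arab", ["liga", "init", "ccmp", "ss01"]))
```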

Ordering lookups

Following the OpenType specification, the font developer defines the lookup sequence in the lookup array to control the order a text processing client uses to apply lookup data to glyph substitution and positioning operations. The order of the lookup within the feature tag is critical for desired processing. The lookup you define first will take priority.

  • Example: If you define two ligatures, AB and BC, in your lookup table with BC listed first, and you type ‘ABC’, you get only the BC ligature and not AB, because the B has already been consumed by the BC ligature.
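The AB/BC example can be reproduced with a small model in which each lookup is applied across the whole glyph string before the next lookup runs. This is a deliberately simplified sketch of lookup ordering, not a description of any particular layout engine; the glyph names and helper functions are made up for the example.

```python
def apply_ligature(glyphs, components, ligature):
    """Replace every occurrence of `components` in `glyphs` with `ligature`."""
    out, i, n = [], 0, len(components)
    while i < len(glyphs):
        if glyphs[i:i + n] == components:
            out.append(ligature)
            i += n
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_lookups(glyphs, lookups):
    """Apply each (components, ligature) lookup in the order given."""
    for components, ligature in lookups:
        glyphs = apply_ligature(glyphs, components, ligature)
    return glyphs


AB = (["A", "B"], "A_B")
BC = (["B", "C"], "B_C")

print(apply_lookups(["A", "B", "C"], [BC, AB]))  # BC first: B is consumed by B_C
print(apply_lookups(["A", "B", "C"], [AB, BC]))  # AB first: B is consumed by A_B
```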

Ordering ligatures and conjuncts

To ensure that ligatures and conjuncts are formed properly, substitutions must be ordered so those with higher priority take precedence. It is also important to form longer lookups before shorter ones.

When forming ligatures, lookups must be encoded as follows:

  • The first substitution in a lookup maps the longest string of component characters to the appropriate glyph. The next substitution provides the glyph corresponding to the next longest string of characters, and so on. This is very important because the search process through the lookups terminates with the first match.
  • For consonant conjuncts, full-form conjuncts must precede half forms.

  • For fi and ffi ligatures, feature tag ‘liga’, if you ordered 'uni0066, uni0069 -> uniFB01' before 'uni0066, uni0066, uni0069 -> uniFB03', the ffi ligature would not be formed, because the search process stopped with the fi.

  • When the longer lookup is listed first, the ffi ligature is formed correctly.
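The longest-first rule is easiest to see when one component string is a prefix of another. The sketch below uses the ff (uniFB00) and ffi (uniFB03) ligatures for that reason, and models a single lookup in which the first matching substitution at each glyph position wins. It is a simplified illustration under that assumption, and the function name apply_lookup is hypothetical.

```python
def apply_lookup(glyphs, substitutions):
    """At each position, apply the first substitution whose components match."""
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in substitutions:
            if glyphs[i:i + len(components)] == components:
                out.append(ligature)
                i += len(components)
                break                      # the search stops at the first match
        else:
            out.append(glyphs[i])          # no substitution matched here
            i += 1
    return out


FF = (["uni0066", "uni0066"], "uniFB00")
FFI = (["uni0066", "uni0066", "uni0069"], "uniFB03")
word = ["uni0066", "uni0066", "uni0069"]

print(apply_lookup(word, [FF, FFI]))   # shorter first: ffi is never reached
print(apply_lookup(word, [FFI, FF]))   # longer first: ffi is formed
```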

Client support

Supporting dx, dy arrays

Applications can support dx, dy positioning of glyphs if they are running on Windows NT5 and using USP and/or RichEdit 3.0 to create and position the glyph string.

ExtTextOut does not support delta y positioning on Windows NT4, Windows 98, or Windows 95. Applications running on these platforms must call ExtTextOut multiple times in order to support dx, dy positioning arrays, as follows:

  1. Use the OTLS to produce the glyph string from the character string.
  2. Use the OTLS to obtain the advance width information for each glyph.
  3. Create a buffer containing only glyphs with the same delta y value (say, dy1). Replace glyphs having other delta y values with placeholders of the associated advance widths.
  4. Call ExtTextOut to position the glyphs with the same delta y (dy1).
  5. Replace the appropriate placeholders with the next group of glyphs, all of which have the same delta y (say, dy2).
  6. Call ExtTextOut to position the glyphs with the same delta y (dy2).
  7. Replace the appropriate placeholders with the next group of glyphs, all of which have the same delta y (say, dy3).
  8. Call ExtTextOut to position the glyphs with the same delta y (dy3).
  9. Continue replacing placeholders and calling ExtTextOut for each group of glyphs with discrete delta y values.
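The grouping logic in the steps above can be sketched without calling any Windows API. The function below builds one buffer per distinct delta y value, keeping the same advance widths in every pass so that placeholders occupy the space of the glyphs they stand in for. The names dy_passes and PLACEHOLDER are invented for this sketch, and the actual drawing call (one ExtTextOut call per pass) is left to the caller.

```python
PLACEHOLDER = None  # stands in for a blank glyph that only occupies its advance width


def dy_passes(glyphs, advances, dys):
    """Return one (dy, buffer, advances) tuple per distinct delta y value.

    In each buffer, glyphs whose delta y differs from the pass's dy are
    replaced by PLACEHOLDER; the advance widths are identical in every pass.
    """
    if not (len(glyphs) == len(advances) == len(dys)):
        raise ValueError("glyphs, advances and dys must have the same length")
    passes = []
    for dy in dict.fromkeys(dys):          # distinct dy values, first-seen order
        buffer = [g if d == dy else PLACEHOLDER for g, d in zip(glyphs, dys)]
        passes.append((dy, buffer, list(advances)))
    return passes


# Four glyph IDs with three distinct delta y values need three drawing passes.
for dy, buffer, dx in dy_passes([10, 11, 12, 13], [5, 6, 5, 7], [0, 3, 0, -2]):
    # A real client would draw `buffer` here, offset vertically by dy.
    print(dy, buffer, dx)
```

The number of drawing calls equals the number of distinct delta y values in the glyph string, so text with few vertical adjustments needs only a few passes.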

This process is illustrated in the diagram below.

Making multiple ExtTextOut calls to perform dy positioning on systems without dx, dy support:

Diagram

Suggested glyphs

General punctuation and ‘Latin’ numbers

In addition to script- and language-specific punctuation and native numbers, general punctuation and 'Latin' numbers are highly recommended for inclusion in all OpenType Layout fonts.

  • Unicode range 0020 to 003F

The euro

The European currency sign, the 'euro', should also be included in all OpenType Layout fonts. The Unicode code point of the euro sign is U+20AC.

Suggested glyphs for Microsoft Office

These 41 glyphs are recommended for inclusion in all OpenType Layout fonts so they will function properly in Microsoft Office applications:

41 Recommended

Suggested glyphs for complex scripts

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism, on a dotted circle. For the fallback mechanism to work properly, an OTL font should contain a glyph for the dotted circle (U+25CC). In case this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).
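The fallback behavior can be approximated with a short function that inserts a dotted circle before any combining mark that has no base. This is only a rough sketch: it treats any letter as a valid base and uses Unicode general categories, whereas a real shaping engine applies script-specific rules, and it does not handle the space-plus-joiner convention for standalone signs. The function name add_fallback_bases is hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def add_fallback_bases(text):
    """Insert U+25CC before combining marks that do not follow a base letter."""
    out = []
    has_base = False
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):          # Mn, Mc, Me: combining marks
            if not has_base:
                out.append(DOTTED_CIRCLE)
                has_base = True               # later marks stack on the circle
        else:
            has_base = category.startswith("L")
        out.append(ch)
    return "".join(out)


# A Devanagari vowel sign with no consonant before it gets a dotted circle.
print([f"U+{ord(c):04X}" for c in add_fallback_bases("\u093f")])
```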

To render a standalone sign (in apparent isolation from any base), apply it to a space. Uniscribe requires a ZWJ (zero width joiner; U+200D) to be placed between the space and a mark for them to combine into a standalone sign. A ZWNJ (zero width non-joiner; U+200C) can be used between two letters to prevent them from forming a cursive connection.
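The two character sequences described here are simple to build. The sketch below uses the code points that the Unicode character database assigns to the two joiners, and prints their names so the assignment can be checked. The helper names standalone_sign and prevent_joining are made up for the example.

```python
import unicodedata

SPACE = "\u0020"
ZWJ = "\u200d"    # ZERO WIDTH JOINER
ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER


def standalone_sign(mark):
    """Build the sequence space + ZWJ + mark for a sign shown without a base."""
    return SPACE + ZWJ + mark


def prevent_joining(left, right):
    """Place a ZWNJ between two letters so they do not connect cursively."""
    return left + ZWNJ + right


print(unicodedata.name(ZWJ), "/", unicodedata.name(ZWNJ))
print([f"U+{ord(c):04X}" for c in standalone_sign("\u093f")])
```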

Suggested glyphs for right-to-left scripts

In addition to the above glyphs for complex scripts, directional marks for right-to-left scripts should be included: LRM (left-to-right mark; U+200E) and RLM (right-to-left mark; U+200F).
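A font developer could check these recommendations against the set of code points a font maps. The sketch below assumes that set has already been extracted from the font by some other means. The grouping, and the inclusion of the two joiners among the complex-script glyphs, reflect one reading of this section rather than an official list; the list of glyphs for Microsoft Office is not covered. The names RECOMMENDED and missing_recommended are hypothetical.

```python
RECOMMENDED = {
    "general punctuation and Latin digits": range(0x0020, 0x0040),  # U+0020..U+003F
    "euro sign": [0x20AC],
    "dotted circle": [0x25CC],
    "joiners": [0x200C, 0x200D],
    "directional marks": [0x200E, 0x200F],
}


def missing_recommended(mapped_code_points):
    """Return {group: [missing code points]} for the recommended characters."""
    mapped = set(mapped_code_points)
    report = {}
    for group, code_points in RECOMMENDED.items():
        missing = [f"U+{cp:04X}" for cp in code_points if cp not in mapped]
        if missing:
            report[group] = missing
    return report


# A font that maps only U+0020..U+003F is missing everything else.
print(missing_recommended(range(0x20, 0x40)))
```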

Tools

The tools available for building OpenType fonts that support Arabic are the same ones used to create all OpenType Layout fonts. Click through the links for download or availability information.

  • ADDTABLE - A tool for adding tables to existing fonts. The tool properly updates offset entries and checksum values.

  • TTOASM and TTODASM - Assembler/disassembler of OpenType Layout tables. These tools create binary table files that can then be added to an existing font using the ADDTABLE tool.

  • Visual OpenType Layout Tool (VOLT) - This tool is used to visually specify ligatures, other glyph substitution operations, and glyph positioning operations. The tool automatically builds the source files required for the TTOASM assembler.

  • Visual TrueType (VTT) - This tool is used for hinting OpenType fonts, and also includes the TTOASM tool.

  • SIGNCODE (OpenType Font Signing Tool) - This tool is used to add a digital signature to OpenType fonts, indicating the publisher and "sealing" the bits of the font.
