Creating and Supporting OpenType Fonts for the Javanese Script

Introduction

This document targets developers implementing shaping behavior compatible with the Microsoft OpenType specification for the Javanese script. It contains information about terminology, font features and behavior of the Javanese shaping engine. While it does not contain instructions for creating Javanese fonts, it will help font developers understand how the Javanese shaping engine processes Javanese text.

This document presents information that will help font developers in creating OpenType fonts for Javanese script as covered by the Unicode Standard 6.3. The Javanese script is used to write the Javanese language. It is also used to write other languages including Sanskrit, Sasak, and Sundanese.

NOTE: Starting in Windows 10, Javanese is supported by the Universal Shaping Engine rather than a stand-alone shaping engine. Moving forward, developers should refer to the Universal Shaping Engine specification.

Terms

The following terms are useful for understanding the layout features and script rules discussed in this document.

Base glyph – Any glyph that can have a diacritic mark attached to it. Layout operations are defined in terms of a base glyph, not a base character, as a ligature may act as a base

Character – Each character represents a Unicode character code point. A character may have multiple glyph forms

Cluster – A group of characters that form an integral unit in Brahmi-derived scripts; this often corresponds to a syllable

Consonant – Javanese consonants have an inherent vowel (the short vowel /a/). For example, “Ka” and “Ta”, rather than just “K” or “T”

Consonant conjunct (aka ‘conjunct’) – A ligature of two or more consonants

Format controls – Special formatting characters used in the shaping process of the Javanese script (U+200C and U+200D). These characters have no visual appearance, except when an application chooses to display zero-width glyphs

Glyph – A glyph represents a form of one or more characters

Halant – The character used after a consonant to “strip” it of its inherent vowel

Ligature – A combination of glyphs that join to form a single glyph

Matra (dependent vowel) – Used to represent a vowel sound that is not inherent to the consonant. Dependent vowels are referred to as “matras” in Sanskrit. They are always depicted in combination with a single consonant, or with a consonant cluster

OpenType layout engine – The library responsible for executing OpenType layout features in a font. In the Microsoft text formatting stack, it is named OTLS (OpenType layout services)

OpenType tag – A 4-byte identifier for script, language system or feature in the font

Shaping engine – Code responsible for shaping input text that has been classified as belonging to a particular script

How the Javanese shaping engine works

The Uniscribe Javanese shaping engine processes text in stages. The stages are:

  1. Analyzing the characters
  2. Reordering pre-base vowels (VPre)
  3. Applying OpenType GSUB features for localized forms and basic shaping forms
  4. Reordering Medial Ra (MR)
  5. Applying OpenType GSUB presentation features
  6. Applying OpenType GPOS features to position glyphs or marks

The descriptions which follow will help font developers understand the rationale for the Javanese feature encoding model, and help application developers better understand how layout clients can divide responsibilities with operating system functions.

Analyzing the characters

The run of text that the shaping engine receives for the purpose of shaping is a sequence of Unicode characters. The shaping engine divides the text into syllable clusters and identifies character properties. Character properties are used in parsing syllables and identifying their parts as well as determining whether any contextual reordering is required.

In the diagrams below, the rules for forming clusters are given in terms of the classes of characters in the character stream. The meanings of the symbols are:

C Consonants (A984, A989–A98B, A98F–A9B2)
D Javanese digits (A9D0–A9D9)
GB Generic base characters (00A0, 00D7, 2012–2015, 2022, 25CC, 25FB–25FE)
H Halant/virama (A9C0)
IV Independent vowel (A985–A988, A98C–A98E)
J Joiners (200C, 200D)
M Modifiers (A980–A983)
MR Medial consonant Ra (A9BF)
MY Medial consonant Ya (A9BE)
N Nukta/Cecak Telu (A9B3)
O SCRIPT_COMMON characters in a Javanese run
P Punctuation (A9C1–A9CD)
R Reserved characters from the Javanese block (A9CE, A9DA–A9DD)
S Symbols (A9CF, A9DE, A9DF)
VAbv Above base dependent vowel (A9B6, A9B7, A9BC)
VBlw Below base dependent vowel (A9B8, A9B9)
VPre Pre base dependent vowel (A9BA, A9BB)
VPst Post base dependent vowel (A9B4, A9B5, A9BD)
VS Variation selectors (FE00–FE0F)
WJ Word joiner (2060)
WS White space (any white space character including ZWSP)
X* sequence of zero or more occurrences of X. Since this could extend a cluster indefinitely an arbitrary limit of 31 characters in a sequence has been used
X+ sequence of one or more occurrences of X
<X | Y> disjunction of elements: X or Y
[X] optional (zero or one) occurrence of X
# occurrence of a boundary
× no boundary allowed at indicated position
÷ boundary allowed at indicated position
^ Except
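
The class list above can be read as a lookup table from code point to class symbol. The following is a minimal sketch of such a lookup, not the engine's actual implementation: the names CLASS_RANGES and classify are hypothetical, and the handling of WS and O (anything not otherwise listed) is an assumption made for illustration.

```python
# Code point ranges copied from the class list above. The table layout and
# the function name are illustrative, not part of any shaping engine API.
CLASS_RANGES = [
    ("C",    [(0xA984, 0xA984), (0xA989, 0xA98B), (0xA98F, 0xA9B2)]),
    ("D",    [(0xA9D0, 0xA9D9)]),
    ("GB",   [(0x00A0, 0x00A0), (0x00D7, 0x00D7), (0x2012, 0x2015),
              (0x2022, 0x2022), (0x25CC, 0x25CC), (0x25FB, 0x25FE)]),
    ("H",    [(0xA9C0, 0xA9C0)]),
    ("IV",   [(0xA985, 0xA988), (0xA98C, 0xA98E)]),
    ("J",    [(0x200C, 0x200D)]),
    ("M",    [(0xA980, 0xA983)]),
    ("MR",   [(0xA9BF, 0xA9BF)]),
    ("MY",   [(0xA9BE, 0xA9BE)]),
    ("N",    [(0xA9B3, 0xA9B3)]),
    ("P",    [(0xA9C1, 0xA9CD)]),
    ("R",    [(0xA9CE, 0xA9CE), (0xA9DA, 0xA9DD)]),
    ("S",    [(0xA9CF, 0xA9CF), (0xA9DE, 0xA9DF)]),
    ("VAbv", [(0xA9B6, 0xA9B7), (0xA9BC, 0xA9BC)]),
    ("VBlw", [(0xA9B8, 0xA9B9)]),
    ("VPre", [(0xA9BA, 0xA9BB)]),
    ("VPst", [(0xA9B4, 0xA9B5), (0xA9BD, 0xA9BD)]),
    ("VS",   [(0xFE00, 0xFE0F)]),
    ("WJ",   [(0x2060, 0x2060)]),
]


def classify(cp: int) -> str:
    """Return the class symbol for a code point."""
    for name, ranges in CLASS_RANGES:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    # Assumption: white space (including ZWSP, U+200B) is WS and every
    # other character in the run is treated as SCRIPT_COMMON (O).
    if chr(cp).isspace() or cp == 0x200B:
        return "WS"
    return "O"
```

Because the explicit ranges are checked first, U+00A0 is reported as a generic base (GB) rather than as white space, in line with the class list.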

The shaping engine inserts a placeholder glyph (U+25CC) wherever combining marks occur without a valid base. The character U+25CC belongs to the class of generic bases (GB). Well-formed Javanese character clusters are defined as follows:

Simple non-compounding cluster

< IV | P | D | S | R | WS | O | WJ >

Independent vowels (IV), punctuation (P), digits (D), symbols (S), reserved characters from the Javanese block (R), white space (WS), other SCRIPT_COMMON characters (O), and word joiner (WJ) contain one character per cluster.

Cluster terminating in Halant

< C | GB > [VS] [N] (H C [VS] [N])* H

A consonant or generic base and optional variation selector and optional nukta. When Halant follows a base or stack it will form a cluster. Any character other than C following the Halant will terminate the cluster after the Halant.

Complex cluster

< C | GB > [VS] [N] (H C [VS] [N])* [MR] [MY] (VPre)* (VAbv)* (VBlw)* (VPst)* (M)*

A consonant or generic base and optional variation selector and optional nukta. Zero or more stacked consonants and optional variation selector and optional nukta. Zero or one of each medial consonant. Zero or more dependent pre-base and above-base vowels. Zero or more dependent below-base and post-base vowels. Zero or more modifiers.

The cluster rules need to permit this level of complexity in order to be able to handle the full range of possible encoded sequences. However, only test cases will try to exercise more than a few of these positions in a single cluster.
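
One way to experiment with the three cluster rules is to express them as regular expressions over class symbols. The sketch below is an illustration under stated assumptions, not the engine's parser: it follows the prose description of the complex cluster (zero or more stacked consonants, at most one of each medial, zero or more of each vowel class and of modifiers), it picks the longest matching rule at each position, and it isolates any character that cannot start a cluster. The names PATTERNS and split_clusters are hypothetical, and the 31-character limit is not modeled.

```python
import re

# Every class symbol is followed by one space, so "M " cannot match the
# start of "MR ". All names here are illustrative.
_BASE = r"(?:C|GB) (?:VS )?(?:N )?"
_STACK = r"(?:H C (?:VS )?(?:N )?)*"
PATTERNS = [
    # Simple non-compounding cluster
    re.compile(r"(?:IV|P|D|S|R|WS|O|WJ) "),
    # Cluster terminating in Halant
    re.compile(_BASE + _STACK + r"H "),
    # Complex cluster
    re.compile(_BASE + _STACK + r"(?:MR )?(?:MY )?"
               r"(?:VPre )*(?:VAbv )*(?:VBlw )*(?:VPst )*(?:M )*"),
]


def split_clusters(classes):
    """Split a list of class symbols into clusters (lists of symbols)."""
    text = "".join(name + " " for name in classes)
    clusters, index, offset = [], 0, 0
    while index < len(classes):
        matches = [p.match(text, offset) for p in PATTERNS]
        # Assumption: the longest match wins, so "C H C ..." is parsed as a
        # stack and not as a Halant-terminated cluster followed by "C".
        longest = max((m.end() for m in matches if m), default=offset)
        # A character that starts no valid cluster is isolated (count 1).
        count = text[offset:longest].count(" ") or 1
        clusters.append(classes[index:index + count])
        offset += sum(len(name) + 1 for name in classes[index:index + count])
        index += count
    return clusters
```

For example, the class sequence C H C MR MY VPre VPre VAbv is returned as a single complex cluster, while IV C H P splits into three clusters: IV, then C H, then P.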

Reordering pre-base vowels (VPre)

Once the Javanese shaping engine has analyzed the run into clusters as described above, it performs any required reordering. Pre-base vowels (VPre) are reordered to the start of the syllable cluster. A sequence of multiple pre-base vowels is permitted. Such sequences are moved as a block to the beginning of the cluster. In the following example, the run of code points represents a single cluster.

INPUT

A98F A9C0 A98F A9BF A9BE A9BA A9BA A9B7

Base Medial Ra Pre-base vowels

REORDERED

A9BA A9BA A98F A9C0 A98F A9BF A9BE A9B7

Note: Medial Ra (U+A9BF) does not reorder until later in the shaping process.
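
The reordering described above can be sketched in a few lines. This is an illustration only: it operates on the code points of a single, already-segmented cluster, and the names VPRE and reorder_pre_base_vowels are hypothetical.

```python
# Pre-base dependent vowels, from the VPre entry in the class list.
VPRE = {0xA9BA, 0xA9BB}


def reorder_pre_base_vowels(cluster):
    """Move all pre-base vowels, as a block, to the start of the cluster.

    The relative order of the vowels and of all other characters is kept.
    Medial Ra (U+A9BF) is deliberately left in place at this stage.
    """
    pre = [cp for cp in cluster if cp in VPRE]
    rest = [cp for cp in cluster if cp not in VPRE]
    return pre + rest


source = [0xA98F, 0xA9C0, 0xA98F, 0xA9BF, 0xA9BE, 0xA9BA, 0xA9BA, 0xA9B7]
reordered = reorder_pre_base_vowels(source)
```

Applied to the INPUT sequence of the example, the function yields the REORDERED sequence shown above.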

Note: the OpenType lookups in a Javanese font must be written to match glyph sequences after re-ordering has occurred. OpenType fonts should not have substitutions that attempt to perform the re-ordering. If a font developer attempted to encode such reordering information in an OpenType font, they would need to add a huge number of many-to-many glyph mappings to cover the general algorithms that a shaping engine will use.

Apply OpenType GSUB features for localized forms and basic shaping forms

Uniscribe calls OTLS to apply the features. OTL processing is divided into sets of predefined features (described and illustrated in the Features section of this document). The first set of GSUB features is applied per cluster, in the following order.

A. Localized forms

  • Apply feature ‘locl’ to preprocess any localized forms for the current language

B. Basic Shaping forms

  • Apply feature ‘pref’ to get pre-base glyph forms
  • Apply feature ‘abvf’ to get above-base glyph forms
  • Apply feature ‘blwf’ to get below-base glyph forms
  • Apply feature ‘pstf’ to get post-base forms

Note: not all of the features listed here need to be used when defining a font for the Javanese script.

Reorder Medial Ra

Medial Ra reorders in a given syllable depending on the context and the font. The shaping engine uses the following logic to determine whether to reorder Medial Ra:

Diagram that shows the logic used to determine whether to reorder Medial Ra.

In the Javanese shaping engine, the ‘pref’ feature should only be used to replace the Medial Ra glyph with its pre-base form. That pre-base form should consist of a single glyph.

Ligatures of Medial Ra are treated as marks that are positioned relative to the base character. Therefore, any ligatures with Medial Ra should not reorder before the base.

INPUT

A9BA A9BA A98F A9C0 A98F A9BF A9BE A9B7

Pre-base vowels Base Medial Ra

REORDERED

A9BA A9BA A9BF A98F A9C0 A98F A9BE A9B7
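
The effect of this step on the example can be sketched as follows. The decision logic of the engine is given only as a diagram, so the condition used here is an assumption drawn from the surrounding text: Medial Ra moves only when the font's ‘pref’ feature replaced it with a single pre-base glyph. The sketch works on code points rather than glyphs, and the names reorder_medial_ra and has_single_glyph_pref_form are hypothetical.

```python
MEDIAL_RA = 0xA9BF
VPRE = {0xA9BA, 0xA9BB}  # pre-base vowels, already moved to the front


def reorder_medial_ra(cluster, has_single_glyph_pref_form):
    """Move Medial Ra in front of the base, after any pre-base vowels.

    `has_single_glyph_pref_form` stands in for the engine's check that
    'pref' produced a single pre-base glyph (an assumption of this
    sketch). Ligatures with Medial Ra are treated as marks and stay put.
    """
    if not has_single_glyph_pref_form or MEDIAL_RA not in cluster:
        return list(cluster)
    rest = [cp for cp in cluster if cp != MEDIAL_RA]
    base_index = 0
    while base_index < len(rest) and rest[base_index] in VPRE:
        base_index += 1
    return rest[:base_index] + [MEDIAL_RA] + rest[base_index:]


source = [0xA9BA, 0xA9BA, 0xA98F, 0xA9C0, 0xA98F, 0xA9BF, 0xA9BE, 0xA9B7]
reordered = reorder_medial_ra(source, has_single_glyph_pref_form=True)
```

With the flag set, the INPUT sequence above becomes the REORDERED sequence; with the flag cleared, the cluster is returned unchanged.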

Apply OpenType GSUB features for presentation forms

The presentation form features are applied simultaneously over the entire run, so together they are operationally equivalent to a single feature. The order of application is therefore the order in which the features are defined in the font.

C. Presentation forms

  • Apply feature ‘pres’ to substitute pre-base glyph forms
  • Apply feature ‘abvs’ to substitute above-base glyph forms
  • Apply feature ‘blws’ to substitute below-base glyph forms
  • Apply feature ‘psts’ to substitute post-base glyph forms
  • Apply feature ‘ccmp’ to substitute glyph composition/decomposition glyph forms
  • Apply feature ‘rlig’ to substitute required ligature glyph forms
  • Apply feature ‘liga’ to substitute standard ligature glyph forms
  • Apply feature ‘clig’ to substitute contextual ligature glyph forms
  • Apply feature ‘calt’ to substitute contextual alternate glyph forms

Apply OpenType GPOS features

The shaping engine next processes the GPOS (glyph positioning) table, applying features concerned with positioning. All features are applied simultaneously to the entire run.

The font developer must consider the effects of re-ordering when creating the GPOS feature and lookup tables.

D. Kerning

  • Apply feature ‘dist’ to make any required distance adjustments
  • Apply feature ‘kern’ to provide pair kerning between glyphs for better typographic quality. Note this feature may be disabled by some applications

E. Mark Placement

  • Apply feature ‘mark’ to position diacritic glyphs relative to the base glyph
  • Apply feature ‘mkmk’ to position diacritic glyphs relative to each other
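
The stages A through E can be summarized as a small data structure, which may be convenient when testing a font against the expected feature order. This is a sketch for illustration: the feature tags, their grouping, and their scope come from the text above, while the names FEATURE_STAGES and features_for and the tuple layout are hypothetical.

```python
# (stage, table, scope, feature tags in the order listed above)
FEATURE_STAGES = [
    ("localized forms",     "GSUB", "per cluster", ["locl"]),
    ("basic shaping forms", "GSUB", "per cluster",
     ["pref", "abvf", "blwf", "pstf"]),
    ("presentation forms",  "GSUB", "entire run",
     ["pres", "abvs", "blws", "psts", "ccmp", "rlig", "liga", "clig", "calt"]),
    ("kerning",             "GPOS", "entire run", ["dist", "kern"]),
    ("mark placement",      "GPOS", "entire run", ["mark", "mkmk"]),
]


def features_for(table):
    """Return the feature tags handled through the given table, in order."""
    return [tag
            for _, stage_table, _, tags in FEATURE_STAGES
            if stage_table == table
            for tag in tags]
```

For instance, features_for("GPOS") returns the four positioning features in the order dist, kern, mark, mkmk.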

Features of the Javanese Script

The features listed below have been defined to create the basic forms for languages that use the Javanese script. Regardless of the model an application chooses for supporting layout of complex scripts, Uniscribe requires a fixed order for executing localized and basic shaping form features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

The order of the lookups within each feature is also very important. For more information on lookups and defining features in OpenType fonts, see Encoding feature information in the OpenType font development section.

The standard order for applying Javanese features encoded in OpenType fonts:

Feature   Feature function                  Layout operation   Required

Localized forms
locl                                        GSUB

Basic shaping forms
pref      Pre-base forms                    GSUB               X
abvf      Above-base forms                  GSUB               X
blwf      Below-base forms                  GSUB               X
pstf      Post-base forms                   GSUB               X

Presentation forms
pres      Pre-base substitutions            GSUB               X
abvs      Above-base substitutions          GSUB               X
blws      Below-base substitutions          GSUB               X
psts      Post-base substitutions           GSUB               X
ccmp      Glyph composition/decomposition   GSUB               X
rlig      Required ligatures                GSUB               X
liga      Standard ligatures                GSUB
clig      Contextual ligatures              GSUB
calt      Contextual alternates             GSUB

Kerning
kern      Pair kerning                      GPOS
dist      Distance adjustments              GPOS               X

Mark placement
mark      Mark positioning                  GPOS
mkmk      Mark to mark positioning          GPOS

[GSUB = glyph substitution, GPOS = glyph positioning]

Feature examples

The registered features described and illustrated in this document are based on the Microsoft OpenType font Javanese Text (javatext.ttf). Javanese Text contains layout information and glyphs to support all of the required features of the Javanese script.

The illustrations in the following examples show the result of that particular feature being applied. Features must be written to match glyph sequences after re-ordering has occurred. Note that the input context for a feature may be the result of a previous feature having already been applied.

Localized forms

Feature Tag: “locl”

This feature is used in association with OpenType language system tags to trigger lookups that will select alternate glyphs needed for language-specific typographic conventions. The ‘locl’ feature should not be used in association with the default language system; use it only with other language system tags. See the Appendix of this document for language system tags associated with the Javanese script.

Basic shaping forms

Feature Tag: “pref”

This feature should only be used to substitute the fallback pre-base form of the medial consonant Ra (MR).

Screenshot that shows the 'pref' feature only used to substitute the fallback pre-base form of the medial consonant Ra (MR).

Feature Tag: “abvf”

This feature is used to substitute the above-base forms. The Javanese Text font does not use this feature.

Feature Tag: “blwf”

This feature is used to substitute the below-base forms. For example, the Javanese Text font uses this feature to form stacked consonant glyphs.

Screenshot that shows the 'blwf' feature used to substitute the below-base forms.

Feature Tag: “pstf”

This feature is used to substitute the post-base forms. For example, the Javanese Text font uses this feature to substitute post-base stacked consonant glyphs.

Screenshot that shows the 'pstf' feature used to substitute the post-base forms.

Presentation forms

Feature Tag: “pres”

This feature is used to substitute presentation forms involving pre-base elements. For example, the Javanese Text font uses this feature to process ligatures of stacked consonants.

Screenshot that shows the 'pres' feature used to substitute presentation forms involving pre-base elements.

Feature Tag: “abvs”

This feature is used to substitute presentation forms relating to above-base elements. For example, the Javanese Text font uses this feature to form ligatures of above-base marks.

Screenshot that shows the 'abvs' feature used to substitute presentation forms relating to above-base elements.

Feature Tag: “blws”

This feature is used to substitute presentation forms relating to below-base elements. For example, the Javanese Text font uses this feature to substitute ligatures with below-base vowels.

Screenshot that shows the 'blws' feature used to substitute presentation forms relating to below-base elements.

Feature Tag: “psts”

This feature is used to substitute presentation forms relating to post-base elements. For example, the Javanese Text font uses this feature to substitute presentation forms for combinations of medial consonant Ya with a below-base vowel.

Screenshot that shows the 'psts' feature used to substitute presentation forms relating to post-base elements.

Feature Tag: “ccmp”

This feature may be used to do glyph composition and decompositions. The Javanese Text font does not use this feature.

Feature Tag: “rlig”

This feature may be used to form required ligatures. The Javanese Text font does not use this feature.

Feature Tag: “liga”

This feature may be used to form standard ligatures. The Javanese Text font does not use this feature.

Feature Tag: “clig”

This feature may be used to form contextual ligatures. The Javanese Text font does not use this feature.

Feature Tag: “calt”

This feature may be used to substitute contextual alternates. The Javanese Text font does not use this feature.

Kerning

Feature Tag: “kern”

This feature may be used to adjust the positioning of glyph pairs. The Javanese Text font does not use this feature.

Feature Tag: “dist”

This feature may be used to adjust distances. The Javanese Text font uses this feature to adjust bases, adding the space required to accommodate the extended swash of the below-base Medial Ra mark.

Screenshot that shows the 'dist' feature used to adjust distances.

Note that mark glyphs have their width set to zero by OTLS. If a mark glyph must have width, it is necessary to add back lost width for correct display. The dist feature is a required feature and should be used for this purpose as well as other required distance adjustments.

Mark placement

Feature Tag: “mark”

This feature is used to position marks relative to a base glyph. The Javanese Text font uses this feature to position above and below marks on bases.

Note that the mark feature is a required feature and will always be triggered by the shaping engine.

Screenshot that shows the 'mark' feature used to position marks relative to a base glyph.

Feature Tag: “mkmk”

This feature is used to position marks relative to each other. The Javanese Text font uses this feature to position a sequence of below-base marks.

Note that the mkmk feature is a required feature and will always be triggered by the shaping engine.

Screenshot that shows the 'mkmk' feature used to position marks relative to each other.

Other encoding issues

Handling invalid combining marks

Combining marks and signs that do not occur in conjunction with a valid base are considered invalid. Shaping engine implementations may adopt different strategies for how invalid marks are handled. For example, a shaping engine implementation might treat an invalid mark as a separate cluster and display the stand-alone mark positioned on some default base glyph, such as a dotted circle (U+25CC). (See Fallback Rendering in section 5.13 of the Unicode Standard 4.0.) Shaping engine implementations may vary somewhat with regard to what sequences are or are not considered valid. For instance, some implementations may impose a limit of at most one above-base vowel mark while others may not.

To allow for shaping engine implementations that expect to position an invalid mark on a dotted circle, it is recommended that a Javanese OT font contain a glyph for the dotted circle character, U+25CC, and that appropriate mark positioning lookups are written to position marks relative to it. If this character is not supported in the font, such implementations will display invalid signs on the missing glyph shape (white box).
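
The dotted-circle strategy described above can be sketched as a single pass over the code points. This is a deliberately simplified illustration, not a statement of what any particular engine does: it treats consonants and generic bases as the only valid bases, lets variation selectors pass through without breaking a cluster, and does not model joiners or the full cluster grammar. The function names are hypothetical.

```python
DOTTED_CIRCLE = 0x25CC


def is_base(cp):
    # Consonants (C) and generic bases (GB), per the class list.
    return (cp == 0xA984 or 0xA989 <= cp <= 0xA98B or 0xA98F <= cp <= 0xA9B2
            or cp in (0x00A0, 0x00D7, 0x2022, 0x25CC)
            or 0x2012 <= cp <= 0x2015 or 0x25FB <= cp <= 0x25FE)


def is_mark(cp):
    # Modifiers (A980-A983), then nukta, dependent vowels, medial
    # consonants and halant (A9B3-A9C0).
    return 0xA980 <= cp <= 0xA983 or 0xA9B3 <= cp <= 0xA9C0


def insert_dotted_circles(codepoints):
    """Insert U+25CC before any mark that does not follow a valid base."""
    out = []
    attached = False  # True while the current cluster has a base
    for cp in codepoints:
        if is_base(cp):
            attached = True
        elif is_mark(cp):
            if not attached:
                out.append(DOTTED_CIRCLE)
                attached = True  # following marks share the same circle
        elif 0xFE00 <= cp <= 0xFE0F:
            pass  # variation selectors do not end the cluster
        else:
            attached = False
        out.append(cp)
    return out
```

A font that maps U+25CC and includes it in its mark positioning lookups will then position the stranded mark on the inserted dotted circle.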

Unicode code points that are strongly recommended for inclusion in any Javanese font are:

Code point   Description

U+200B       Zero Width Space
U+200C       Zero Width Non-Joiner
U+200D       Zero Width Joiner
U+25CC       Dotted Circle
U+002D       Hyphen-minus
U+00A0       No-break space
U+00D7       Multiplication sign
U+2012       Figure dash
U+2013       En dash
U+2014       Em dash
U+2015       Horizontal bar
U+2022       Bullet
U+25FB       White medium square
U+25FC       Black medium square
U+25FD       White medium small square
U+25FE       Black medium small square

These characters may be used in text as generic bases, so their glyphs should be covered by the mark positioning lookups in the font. Shaping engines that recognize them as legitimate bases do not insert the dotted circle base when marks follow them, which is why they are recommended for inclusion in fonts.

Appendix

Writing system and language tags

Features are encoded according to both a designated script and language system. The language system tag specifies a typographic convention associated with a language or linguistic subgroup. For example, there are different language systems defined for the Javanese script: Javanese, Sanskrit, Sasak, and Sundanese.

Not all software applications support specific language tags for use when rendering text runs.

NOTE: It is strongly recommended to include the “dflt” language tag in all OpenType fonts because it defines the basic script handling for a font. The “dflt” language system is used as the default if no other language-specific features are defined or if the application does not support that particular language. If the “dflt” tag is not present for the script being used, the font may not work in some applications.

The following tables list the registered tag names for scripts and language systems.

Registered tags for the Javanese script

Script tag   Script
“java”       Javanese

Registered tags for Javanese language systems

Language system tag   Language
“dflt”                *default script handling
JAV                   Javanese
SAN                   Sanskrit
SUN                   Sundanese
SAS                   Sasak

Note: both the script and language tags are case-sensitive (script tags should be lowercase, language tags are all caps) and must contain four characters (i.e., you must add a trailing space to the three-character language tags).
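
The four-character requirement can be illustrated with a small helper that pads a tag with trailing spaces. This is a sketch for illustration; the function name make_tag is hypothetical and is not part of any font tooling API.

```python
def make_tag(name: str) -> bytes:
    """Pad a script or language system tag to four ASCII bytes."""
    if not 1 <= len(name) <= 4 or not name.isascii():
        raise ValueError(f"invalid tag: {name!r}")
    # Three-character language tags such as "JAV" gain one trailing space.
    return name.ljust(4).encode("ascii")


script_tag = make_tag("java")
language_tag = make_tag("JAV")
```

Here script_tag is the four bytes of "java" unchanged, while language_tag is "JAV" followed by one space.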