Unicode Math Calligraphic Alphabets

Unicode needs a way to encode bold and regular math Calligraphic/Chancery alphabets as well as bold and regular script alphabets, since it turns out that Calligraphic and Script alphabets are used contrastively by some authors and [La]TeX has had both kinds of letters. In most documents, Script and Calligraphic shapes can be substituted for one another pretty much as a choice of font. Accordingly, when the math alphanumeric symbols were added to Unicode, the two shapes were unified. But since then we have come to realize that the two kinds of shapes aren’t always interchangeable and that we need a way to distinguish calligraphic from script in the same document. This post discusses two ways to do this in spite of the quandary that some math fonts have calligraphic letters at the existing math script code points, while others and the Unicode Standard have fancy script letters at those code points. Note from John Hudson: from a typographic and paleographic perspective, it's more precise to name 'fancy script' as round hand, more specifically English round hand, and 'calligraphic' as chancery, more specifically Italian chancery. There's a long history of such typography.

First, here’s an example of script and calligraphic F’s being used in the same document:

And here are examples featuring P’s and C’s in which script letters denote infinity categories:

Accordingly the need for both script and calligraphic alphabets is attested.

Let’s turn now to the unfortunate fact that the current math script alphabets may be fancy script in one font and calligraphic in another. Cambria Math, the first widely used Unicode math font, has calligraphic letters at the math script code points, while STIX and the Unicode Standard have fancy script letters at those code points. For example, here’s the upper-case math script H (U+210B) in Cambria Math followed by the one in STIX:

We really can’t change Cambria Math’s math script alphabet choice at this late stage in computing history; too many documents use it. Consequently, it would be inadequate to add only bold and regular Calligraphic alphabets and expect the current bold and regular script alphabets to fulfill the need for bold and regular fancy script alphabets. Because of the unification, the current script alphabets are ambiguous with respect to calligraphic versus fancy script, and they have to stay that way.
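For readers who want to inspect the existing math script code points, here is a minimal Python sketch using the standard unicodedata module. The helper name math_script_capital is made up for illustration. It relies on the fact that a few script capitals, such as H at U+210B, were encoded in the Letterlike Symbols block before the math alphabets were added, so they have no "MATHEMATICAL SCRIPT CAPITAL" name and the corresponding slots in the math block are reserved.

```python
import unicodedata

def math_script_capital(letter: str) -> str:
    """Return the regular-weight math script capital for an ASCII capital.

    Hypothetical helper: tries the Mathematical Alphanumeric Symbols name
    first, then falls back to the older Letterlike Symbols name.
    """
    candidates = (
        f"MATHEMATICAL SCRIPT CAPITAL {letter}",
        f"SCRIPT CAPITAL {letter}",
    )
    for name in candidates:
        try:
            return unicodedata.lookup(name)
        except KeyError:
            continue  # no character with this name; try the next one
    raise ValueError(f"no math script capital for {letter!r}")

def code_point(ch: str) -> str:
    """Format a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"

if __name__ == "__main__":
    for letter in "AFHP":
        print(letter, code_point(math_script_capital(letter)))
```

Whether the glyph shown for any of these code points is calligraphic or fancy script depends entirely on the font, which is the ambiguity discussed above.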

There are two unambiguous ways to allow math script and math calligraphic symbols to appear in the same plain text document:

1)      Follow a character in the current math script alphabets with one of two variation selectors, similar to the way we use variation selectors (U+FE0E, U+FE0F) for emoji to force text and emoji glyphs, respectively. Specifically, to ensure use of the math calligraphic alphabet, follow the current math script letter with U+FE00. To ensure use of the math fancy script alphabet, follow the current math script letter with U+FE01.

2)      Add four new unambiguous math alphabets: bold and regular, fancy script and calligraphic, leaving the current math script alphabets as ambiguous.
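To make option 1 concrete, here is a small Python sketch of what the plain text would look like. The selector assignments (U+FE00 for calligraphic, U+FE01 for fancy script) are the ones proposed above; the constant and function names are my own for illustration.

```python
# Variation selectors as proposed in option 1 above.
VS_CALLIGRAPHIC = "\uFE00"   # VARIATION SELECTOR-1
VS_FANCY_SCRIPT = "\uFE01"   # VARIATION SELECTOR-2

def calligraphic(script_letter: str) -> str:
    """Request the calligraphic (chancery) shape for a math script letter."""
    return script_letter + VS_CALLIGRAPHIC

def fancy_script(script_letter: str) -> str:
    """Request the fancy script (round hand) shape for a math script letter."""
    return script_letter + VS_FANCY_SCRIPT

def code_points(text: str) -> list[str]:
    """List the code points in a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

if __name__ == "__main__":
    script_f = "\u2131"  # SCRIPT CAPITAL F
    print(code_points(script_f))                # ambiguous: font decides
    print(code_points(calligraphic(script_f)))  # forces calligraphic
    print(code_points(fancy_script(script_f)))  # forces fancy script
```

The base letter is unchanged in all three cases. Only the trailing selector differs, so software that ignores the selectors still sees the same math script letter.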

The variation selector choice has the following advantages:

a)       Contemporary software supports variation sequences for East Asia and emoji, so adding new variation sequences shouldn’t be much of a burden

b)      The variation selector U+FE00 is already used with a number of math operators

c)       No new code points need to be allocated

d)      Typical documents can continue to do what they have been doing: ignore the distinction

e)       If a math font doesn’t support the variation sequences, it falls back naturally to the current script/calligraphic letters instead of displaying the missing-glyph box
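Advantages (d) and (e) amount to saying that the selectors can be ignored without losing the letter. A renderer does this at the font level, but the same idea can be sketched on plain text in Python: dropping U+FE00 or U+FE01 after a math script letter leaves the ordinary, ambiguous script letter. The function name and the character class below are my own; the class covers the regular and bold math script ranges plus the script letters in the Letterlike Symbols block.

```python
import re

# Math script letters: Letterlike Symbols fill-ins plus the regular and
# bold script ranges of the Mathematical Alphanumeric Symbols block.
MATH_SCRIPT = (
    "\u210A\u210B\u2110\u2112\u211B\u212C\u212F-\u2131\u2133\u2134"
    "\U0001D49C-\U0001D503"
)

# A math script letter followed by one of the two proposed selectors.
_SCRIPT_WITH_SELECTOR = re.compile("([" + MATH_SCRIPT + "])[\uFE00\uFE01]")

def strip_script_selectors(text: str) -> str:
    """Drop U+FE00/U+FE01 when they follow a math script letter.

    Selectors after other characters (for example math operators that
    already use U+FE00) are left alone.
    """
    return _SCRIPT_WITH_SELECTOR.sub(r"\1", text)

if __name__ == "__main__":
    marked = "\u2131\uFE00 = \U0001D49C\uFE01"
    plain = strip_script_selectors(marked)
    print([f"U+{ord(ch):04X}" for ch in plain])
```

This is only a model of the fallback, assuming the selector assignments proposed above. In practice the text would stay as is and the font would simply render the base glyph.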

These advantages, together with the fact that the majority of documents don’t require a script/calligraphic distinction, seem to make the variation selector approach preferable. Adding two variation selectors for the math script letters may make people ask why the math alphabets weren't implemented with variation selectors in the first place. They were considered, but the Unicode Technical Committee was concerned that people might misuse them to encode rich-text properties, which are not the domain of plain text. Adding two variation selectors seems to solve the present calligraphic quandary quite well, although variation selectors are generally a poor choice for situations where symbol shapes need to be used in a contrastive manner. This case should therefore not serve as a general precedent, but should be seen as an exception, tailored to fit this specific case.

In fact, LaTeX has the \mathsf{} and \mathsfit{} control words for math sans serif upright and italic characters, respectively, and they work with Greek letters. Unlike the calligraphic/script distinction, which is seldom used contrastively, upright and italic are usually used contrastively in mathematics. Unicode has normal weight upright and italic sans serif math alphabets corresponding to the ASCII letters, but not for the Greek letters. Accordingly, these two math Greek alphabets will probably be added, perhaps in the range U+1D380..U+1D3FF. This range has been reserved for math alphanumeric symbols and immediately precedes the Mathematical Alphanumeric Symbols block at U+1D400..U+1D7FF.

It might also be worthwhile for programs like Word to have a math document-level property that specifies which script/calligraphic alphabet to use for the whole document. Then a user who wants the fancy script glyphs could get them without making any changes except for choosing the desired document property setting. A similar setting could be used for choosing sans serif alphabets as the default. It appears such alphabets are often used in chemical formulas.
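Here is one way such a document-level property could interact with the variation selectors, sketched in Python. Everything in it is hypothetical: the ScriptStyle enum, the resolve_style function, and in particular the rule that an explicit selector overrides the document default are my assumptions, not a description of how Word or any other program behaves.

```python
from enum import Enum

class ScriptStyle(Enum):
    CALLIGRAPHIC = "calligraphic"
    FANCY_SCRIPT = "fancy script"

# Selector assignments as proposed earlier in this post.
SELECTOR_STYLE = {
    "\uFE00": ScriptStyle.CALLIGRAPHIC,
    "\uFE01": ScriptStyle.FANCY_SCRIPT,
}

def resolve_style(sequence: str, document_default: ScriptStyle) -> ScriptStyle:
    """Pick the glyph style for a math script letter.

    `sequence` is the script letter, optionally followed by a selector.
    Assumed rule: an explicit selector wins; otherwise the document-level
    property decides.
    """
    if len(sequence) >= 2 and sequence[1] in SELECTOR_STYLE:
        return SELECTOR_STYLE[sequence[1]]
    return document_default

if __name__ == "__main__":
    default = ScriptStyle.FANCY_SCRIPT
    print(resolve_style("\u2131", default).value)        # bare letter
    print(resolve_style("\u2131\uFE00", default).value)  # explicit selector
```

With a rule like this, a typical document needs no selectors at all, and only documents that use both shapes contrastively have to mark the exceptions.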

The choice of calligraphic glyphs for the math script letters in Cambria Math is partly my fault. I had expected to see fancy script letters in Cambria Math as in the Unicode code charts. In my physics career I used math script letters a lot, starting with my PhD thesis on laser theory (1967) and followed by many published papers in the Physical Review and elsewhere and in my three books on lasers and quantum optics. Occasionally in a review article, calligraphic letters were substituted for the fancy script letters because the publishers didn’t have the latter. And in the early days, the IBM Selectric Script ball and the script daisy wheels only had calligraphic letters. So I kind of got used to this substitution.

In addition, Cambria Math was designed partly to look really good on screens, which didn’t have the resolution to display the narrow stem widths of Times New Roman and fancy script letters well. ClearType rendering certainly helped, but it seemed like a good idea to use less resolution-demanding calligraphic letters. (Later, Word 2013 disabled ClearType for various reasons and many readers of this blog have complained passionately ever since! With high resolution screens as on my Samsung laptop and the Surface Book, even Times New Roman looks crisp and nice with only gray-scale antialiasing, so hopefully this problem will diminish in time.) In contrast, it’s appropriate that the STIX font, based on Times Roman with its narrow glyph stems, would have the fancy script glyphs. With the mechanism described here, people could use calligraphic and script letters contrastively in the same document (assuming the fonts add the missing glyphs).