Directionality in Math Zones
In most places, mathematical text is written “left to right” (LTR). For example, in the expression x + y the plus is displayed to the right of the x and the y is displayed to the right of the plus. But in some Arabic locales, mathematical text is written right to left (RTL). Instead of E = mc², one would see ²cm = E, although the letters would be Arabic, not Latin.
In such RTL locales, square roots are mirrored, so that the surd symbol √ is reflected about the vertical axis. Similarly, integral signs are mirrored, although the circular arrows in contour integrals are not, since they pertain to the 2D complex plane, not the 2D text plane.
The Presentation MathML 3.0 specification provides for RTL math zones. In fact, it allows a dir attribute with the value “ltr” or “rtl” on the top-level <math> element as well as on <mrow>, <mstyle> and token elements like <mi>. Except in rare cases, only the <math> direction need be specified, since all the elements inside have the same directionality (see Section 3.1.5 of the MathML 3.0 specification). The specification has now been through Last Call, so we need implementations of the new features. Accordingly, I’m interested in implementing at least part of the RTL functionality, namely RTL math zones.
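To make the attribute concrete, here is a minimal Python sketch that builds the MathML for x + y with an overall direction on the <math> element. The helper name make_math_zone is hypothetical and only for illustration; the sketch sets dir on <math> alone, on the assumption that the child elements inherit it.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def make_math_zone(direction="ltr"):
    """Return MathML markup for x + y with the given overall direction.

    Hypothetical helper for illustration: only the top-level <math>
    element carries dir; the children are assumed to inherit it.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    math = ET.Element("math", {"xmlns": MATHML_NS, "dir": direction})
    row = ET.SubElement(math, "mrow")
    for tag, text in (("mi", "x"), ("mo", "+"), ("mi", "y")):
        ET.SubElement(row, tag).text = text
    return ET.tostring(math, encoding="unicode")

print(make_math_zone("rtl"))
```

The markup keeps x, +, y in logical order in both cases; a renderer that honors dir="rtl" is what lays the children out from right to left.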
First, consider what an LTR math zone is. This is what Word 2007 and the Office 2010 applications implement. Such a zone does display text RTL wherever Arabic or standard Hebrew characters appear adjacent to one another. But all operators and other “neutral” characters are considered to be “strong LTR”, that is, they are displayed to the right of the character that precedes them. This can be quite different from a display that obeys the Unicode Bidirectional Algorithm. A sequence of digits is always displayed LTR internally, regardless of the character that precedes it; this holds outside math zones as well, under the Unicode bidi algorithm. What differs is where the sequence is placed. Inside LTR math zones, a sequence of digits is displayed to the right of the character that precedes it even if that character is Arabic. According to the Unicode bidi algorithm, a number following an Arabic character is displayed to the left of the Arabic character in both LTR and RTL paragraphs. Inside normal text embedded in a math zone, the usual rules for bidi text are followed. Note that except for such text, the math-zone bidi rules are much simpler than those of the Unicode bidi algorithm, which gets quite tricky in complicated scenarios.
Perhaps you noticed the term “standard Hebrew characters” above. By this I mean all Hebrew characters except the four Hebrew letter-like math symbols ALEF SYMBOL, BET SYMBOL, GIMEL SYMBOL, and DALET SYMBOL (U+2135..U+2138). These symbols are strong LTR characters, unlike their HEBREW LETTER counterparts located in the Unicode Hebrew block (U+0590..U+05FF).
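The distinction can be checked against the Unicode Character Database. The following short Python sketch uses the standard unicodedata module to list the bidi class of the four math symbols and of the corresponding Hebrew letters; the helper name bidi_report is just for illustration.

```python
import unicodedata

def bidi_report(chars):
    """Return (code point, Unicode name, bidi class) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.bidirectional(c))
        for c in chars
    ]

# ALEF SYMBOL .. DALET SYMBOL in the Letterlike Symbols block
math_symbols = [chr(cp) for cp in range(0x2135, 0x2139)]
# HEBREW LETTER ALEF .. HEBREW LETTER DALET in the Hebrew block
hebrew_letters = [chr(cp) for cp in range(0x05D0, 0x05D4)]

for row in bidi_report(math_symbols) + bidi_report(hebrew_letters):
    print(*row)
```

The symbols report bidi class L (strong LTR), while the letters report R (strong RTL).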
Analogously, in an RTL math zone and in the absence of directional overrides, operators and other neutrals are treated as strong RTL characters. A sequence of digits is still displayed LTR, but it appears to the left of the character that precedes it even if that character is Latin. Sequences of Arabic and standard Hebrew letters are RTL as usual. At least that’s how I think a typical RTL math zone should be displayed.
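The simplified rules for LTR and RTL math zones can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Word or any MathML renderer does it: it works on plain characters rather than math objects, ignores mirroring, directional overrides, combining marks and embedded normal text, and assumes that a run of adjacent Latin letters keeps its internal LTR order in an RTL zone (the description above does not say). The function name visual_order is hypothetical. It returns the characters in left-to-right display order.

```python
import unicodedata
from itertools import groupby

def _kind(ch):
    """Classify a character by its Unicode bidi class."""
    bc = unicodedata.bidirectional(ch)
    if bc in ("EN", "AN"):
        return "digit"    # European or Arabic-Indic digit
    if bc in ("R", "AL"):
        return "rtl"      # Hebrew or Arabic letter
    if bc == "L":
        return "ltr"      # Latin letter, Hebrew-like math symbol, ...
    return "neutral"      # operators, spaces and everything else

def visual_order(text, zone_dir="ltr"):
    """Return text in left-to-right display order for a math zone.

    Assumption: runs of like characters are the units that get placed;
    neutrals are placed one at a time in the zone direction.
    """
    runs = []
    for kind, group in groupby(text, key=_kind):
        chars = list(group)
        if kind == "neutral":
            runs.extend(chars)                     # one unit per neutral
        elif kind == "rtl":
            runs.append("".join(reversed(chars)))  # RTL inside the run
        else:
            runs.append("".join(chars))            # digits and Latin stay LTR
    if zone_dir == "rtl":
        runs.reverse()                             # place the units right to left
    return "".join(runs)

print(visual_order("x+y", "rtl"))  # y+x
print(visual_order("a12", "rtl"))  # 12a: digits land to the left of the letter
```

In an LTR zone the same function leaves x+y unchanged and puts a digit sequence to the right of a preceding Arabic letter, which is the behavior described for LTR math zones.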
This description of math-zone directionality is somewhat simplified compared to the generality encountered in the real world. To see some of the special cases that can happen, please read the papers by Azzeddine Lazrek:
The following review papers are excellent sources for overviews of RTL math: