RichEdit Text Pointers
A text editor has to provide ways of reading and modifying text. For external clients, the RichEdit editor provides the Text Object Model (TOM) interfaces including ITextRange and ITextSelection.These interfaces are quite efficient, since they are lightweight wrappers around the internal editing machinery. This post describes how RichEdit's design came to be and notes a few pitfalls where a design can be unwieldy from performance and/or maintenance standpoints.
As explained in the post Flyweight RichEdit Controls, an important design criterion was to make plain-text editing fast and small. Accordingly, Christian Fortini’s original model for the text pointers into the RichEdit 2.0 backing store gave priority to plain-text controls. Since RichEdit would also be used for rich-text controls, the design had to accommodate rich text as well. The first attempt was the double-diamond, multiple-inheritance, text pointer hierarchy
CTxtSelection → CTxtRange → CTxtPtr
↑ ↑ ↑
CRchTxtSelection → CRchTxtRange → CRchTxtPtr
Here the CTxtPtr class manipulates the Unicode plain text in the memory backing store, the CTxtRange class manipulates ranges of plain text and the CTxtSelection is a CTxtRange that has added user-interface functionality such as keyboard and mouse handling. The rich-text row in the hierarchy adds the ability to manipulate text runs with different character and paragraph formatting. I implemented this hierarchy back in 1995 partly as an exercise in learning C++. Up to then the only major C++ feature not in C that I had used was operator overloading for handling complex arithmetic elegantly and efficiently in quantum optics calculations.
The double-diamond hierarchy worked. Nevertheless, it seemed overly complex, so one weekend Alex Gounares simplified it to the simple single-inheritance model
CTxtSelection → CTxtRange → CRchTxtPtr
in which CRchTxtPtr contains a CTxtPtr text-run pointer along with similar run pointers for character formatting and paragraph formatting. The resulting riched20.dll went from 145KB down to 90KB! (Now it’s 2.5 MB!) There was a bunch of hidden overhead in the multiple inheritance hierarchy. For sufficiently simple text, the single-inheritance model didn't instantiate any formatting runs, which boosted performance for plain text, a goal of the original model. Ironically the double-diamond inheritance hierarchy turned out to be a bad approach also from a functional point of view, since a multilingual plain-text editor needs multiple fonts to handle multiple fonts and proofing tools need some text run character formatting. As such any international plain-text editor must have at least some degree of richness.
RichEdit 2.0 also shipped with Version 1 of the Text Object Model (TOM). This object model includes the ITextSelection and ITextRange interfaces. CTxtSelection inherits from CTxtRange, since it’s adding UI functionality to a range. Meanwhile ITextSelection inherits from ITextRange. So how can CTxtSelection inherit from ITextSelection without another diamond? For RichEdit up through version 5, we would have had
ITextSelection → ITextRange
CTxtSelection → CTxtRange → CRchTxtPtr
The single inheritance solution for the ranges was to have CTxtRange inherit from ITextSelection and have it return E_NOTIMPL for the selection-specific UI methods. This gives the simplified inheritance
CTxtSelection → CTxtRange → ITextSelection, CRchTxtPtr
RichEdit 6.0 added several more TOM interfaces including ITextRange2 and ITextSelection2. To avoid diamond inheritance, ITextRange2 inherits from ITextSelection and ITextSelection2 inherits from ITextRange2. Unlike ITextSelection, ITextSelection2 doesn’t add any methods to ITextRange2. Starting with RichEdit 6.0, CTxtRange inherits from ITextSelection2 and CTxtSelection continues to inherit from CTxtRange. CTxtRange also inherits from CRchTxtPtr, which has some virtual methods, but the overhead for switching “this” pointers is substantially less than it would be with diamond inheritance.
There are other C++ areas with surprise overhead. Smart pointers have become popular since they don’t need explicit clean up, even when exceptions are thrown. But despite clever operator overloading, smart pointers involve a new language to learn and result in extra steps in debugging. Smart pointers are built into C++ for Windows Universal apps and use the ^ to indicate the smart pointer. Meanwhile in spite of a plethora of new notation, templates, and classes, C++ operators are still mired in the ASCII world, using <= for ≤ and != for ≠. Why not accept these well-defined operators as aliases for the original ASCII operator sequences?
Some habits don’t result in code bloat, but can slow down reading and code maintenance. Some people treat C++ like Lisp, sticking in quantities of unnecessary parentheses and curly braces. When there’s more syntactic sugar to read, you have to wade through it. Mathematics is successful in part due to its conciseness. (Although one shouldn’t be concise to the point of inscrutability). One good technique in writing functions is to return as soon as the results or an error are found. Often you find code where the only return is at the end of a function which may force the code to be deeply nested in hard to follow curly braces.
There’s a moral to be learned from the RichEdit text pointer design: keep things simple and easy to read. Avoid multiple inheritance (except for interfaces) unless it dramatically improves your model. And in any event, avoid diamond inheritance!