Aria Beingessner
September 28th, 2019
Rendering text, how hard could it be? As it turns out, incredibly hard! To my knowledge, literally no system renders text “perfectly”. It’s all best-effort, although some efforts are more important than others.
I’ll be assuming you want to support arbitrary text provided by users with custom fonts, colors, and styles with line-wrapping and support for text-selection. Basically the minimum required to properly display a simple rich-text document, terminal, webpage, or anything else.
The overarching theme here will be: there are no consistent right answers, everything is way more important than you think, and everything affects everything else.
The topics I focus on here have no particular rhyme or reason, they’re just the ones that come to mind after a few years of working on rendering in Firefox. For instance, I don’t spend much time talking about the challenges of text-segmentation or managing the different platform-specific text libraries, because I don’t look at that much.
Text is complicated and English is bad at expressing these nuances. For the purpose of this document, I will try to stick to the following terms. Note that these words aren’t “right”, I just find them useful for communicating the key concepts to native English speakers who don’t have backgrounds in linguistics.
Characters:
- Scalar: A Unicode Scalar, the “smallest unit” Unicode describes (roughly a code point; strictly, any code point that isn’t a surrogate).
- Character: A Unicode Extended Grapheme Cluster (EGC), the “biggest unit” Unicode describes (potentially composed of multiple scalars).
- Glyph: An atomic unit of rendering yielded by the font. Generally this will have a unique ID in the font.
- Ligature: A glyph that is made up of several scalars, and potentially even several characters (native speakers may or may not think of a ligature as multiple “characters”, but to the font it’s just one “character”).
- Emoji: A “full color” glyph. 🙈🙉🙊
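To make the scalar/character distinction concrete, here is a small sketch in Python, whose `len` on a string counts scalars rather than characters. The example strings are my own choices, not taken from any particular renderer, and the standard library has no EGC segmenter, so the "one character" claims in the comments are stated rather than computed.

```python
import unicodedata

# "é" written as a base letter plus a combining accent: a reader sees one
# character (EGC), but the string holds two scalars.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single precomposed scalar U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# A facepalm emoji with a skin tone and gender: one character to a reader,
# five scalars joined by a zero-width joiner (U+200D).
facepalm = "\U0001F926\U0001F3FF\u200d\u2640\ufe0f"

scalar_counts = {
    "decomposed": len(decomposed),
    "composed": len(composed),
    "facepalm": len(facepalm),
}
print(scalar_counts)
```

Counting scalars is therefore not a reliable way to count what a user would call characters; that requires a proper EGC segmentation library.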
Fonts:
- Font: A document that maps characters to glyphs.
- Script: The set of glyphs that make up some language (fonts tend to implement particular scripts).
- Cursive Script: Any script where glyphs touch and flow into each other (like Arabic).
- Color: RGB and Alpha values for fonts (alpha isn’t needed for some usecases, but it’s interesting).
- Style: Bold and Italics modifiers for fonts (hinting, aliasing, and other settings tend to also get crammed in here in practical implementations).
Just so you have an idea for how a typical text-rendering pipeline works, here’s a quick sketch:
- Styling (parse markup, query system for fonts)
- Layout (break text into lines)
- Shaping (compute the glyphs in a line and their positions)
- Rasterization (rasterize needed glyphs into an atlas/cache)
- Composition (copy glyphs from the atlas to their desired positions)
Unfortunately, these steps aren’t as clean as they might seem.
Most fonts don’t actually provide every glyph in existence. There are too many glyphs, so fonts are usually designed to only implement a particular script. End users usually don’t know or care about this, and so a robust system must cascade into other fonts when characters aren’t available.
For instance, even though the markup of the following text doesn’t suggest the presence of multiple fonts, drawing it correctly on all systems absolutely requires it: hello 😺 मनीष بسم 好. This is dangerously close to Step 1 (Styling) depending on the results of Step 3 (Shaping)!
(Alternatively, you can take the Noto approach and use a single Uber Font that contains every character ever. Although that means users can’t configure the font, and you can’t provide a “native” text experience to users on all platforms. But let’s assume you want the more robust solution.)
Similarly, layout requires you to know how much space each part of your text takes up, but this is only known once you shape the text! Step 2 depends on the results of Step 3?
Shaping absolutely depends on you knowing your layout and styling, so we seem to be stuck. What do we do?
First off, styling gets to cheat. Although what we really want from a font is full glyphs, styling only needs to ask about scalars. If a font doesn’t properly support a script it shouldn’t claim to know anything about the scalars that make up that script. So we can easily find the “best” font as follows:
For every character (EGC) in our text, keep asking each font in our cascade if it knows about all the scalars that make up that character, and use it if it does. If we get to the end of the cascade with no providers, then we yield tofu (a missing glyph indicator, usually drawn as an empty box).
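The cascade described above can be sketched roughly as follows. Everything here is hypothetical: the `Font` class, its `scalars` coverage set, and `pick_fonts` are illustrative names, real font APIs expose coverage differently, and the input is assumed to be already split into characters (EGCs) by some segmentation library.

```python
from dataclasses import dataclass

TOFU = None  # stands in for "no font found, draw the missing-glyph box"


@dataclass(frozen=True)
class Font:
    """Hypothetical font handle: a name plus the scalars it claims to cover."""
    name: str
    scalars: frozenset

    def covers(self, character: str) -> bool:
        # Iterating a Python str yields its scalars one at a time.
        return all(scalar in self.scalars for scalar in character)


def pick_fonts(characters, cascade):
    """Pair each character with the first font covering all its scalars."""
    result = []
    for character in characters:
        chosen = next((f.name for f in cascade if f.covers(character)), TOFU)
        result.append((character, chosen))
    return result


latin = Font("Latin", frozenset("helo "))
emoji = Font("Emoji", frozenset("\U0001F63A"))
picked = pick_fonts(["h", "\U0001F63A", "\u597d"], [latin, emoji])
print(picked)
```

In this toy run the third character has no provider in the cascade, so it falls through to tofu. Note that the check is per scalar, which is exactly why a font can claim support for a character and still only draw its components.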
In the case of emoji, you’ve probably seen the failure mode of this process before! Because some emoji are actually ligatures of several simpler emoji, a font may successfully report support for the character while only yielding the components. So 🤦🏿♀️ may literally appear as 🤦 🏿 ♀ if the font is “too old” to know about the new ligature. This can also happen if your unicode implementation is “too old” to know about a character, causing the styling system to accept a partial match in the font.
So now we know exactly what fonts we’ll use without looking at layout or shape (although shaping might change our colors, more on that in later sections). Can we untie layout and shape as well? Nope! Things like paragraph breaks give you a nice hard break on lines, but the only way to do wrapping is to iteratively do shaping!
You have to assume that your text fits on a single line and shape it until you run out of space. At that point you can perform layout operations and figure out where to break the text and start the next line. Repeat until everything is shaped and laid out.
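A minimal sketch of that loop, under heavy assumptions: `shape_width` is a toy stand-in for a real shaper (it pretends every scalar advances one unit), break opportunities are only between pre-split words, and `wrap` is a hypothetical helper name. The one thing it tries to show faithfully is that each candidate line is re-measured as a whole, because shaped widths depend on context and can't be summed word by word.

```python
def shape_width(text: str) -> float:
    """Toy stand-in for a shaper: every scalar advances exactly 1 unit."""
    return float(len(text))


def wrap(words, max_width, measure=shape_width):
    """Greedily fill lines, re-measuring the whole candidate line each time."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # Out of space: close the current line and start the next one.
            lines.append(current)
            current = word
        else:
            # Either it fits, or it is a lone word too wide to break further.
            current = candidate
    if current:
        lines.append(current)
    return lines


wrapped = wrap("the quick brown fox".split(), max_width=10)
print(wrapped)
```

Re-shaping the whole line on every step is wasteful, and a real implementation would likely cache or shape incrementally, but the dependency between layout and shaping is the same.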
Coming from English, you might think ligatures are just fancy fluff. I mean, who really cares if “æ” is written as “ae”? Well, as it turns out, some languages are basically entirely ligatures. For instance “ड्ड بسم” has individual characters of “ड् ड ب س م”. If you’re viewing this in a competent text-rendering system (any of the major browsers), those two strings should look very different.
And no: this isn’t about the difference between Unicode scalars and extended grapheme clusters. If you ask a Unicode-robust system (such as Swift) for the extended grapheme clusters of that string, it will spit out those 5 characters!
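As a quick check that the difference lives in shaping rather than in the underlying data, here is a sketch that dumps the scalars of the Devanagari half of the example. The escape sequences are my reading of that text (an assumption), written as escapes so the snippet doesn't depend on how your editor renders them.

```python
import unicodedata

# Assumed scalars for the joined form: DDA, VIRAMA, DDA.
conjunct = "\u0921\u094d\u0921"
# The spaced-out form: the same scalars with a space after the virama.
separated = "\u0921\u094d \u0921"

names = [unicodedata.name(scalar) for scalar in conjunct]
same_scalars = separated.replace(" ", "") == conjunct

print(names)
print(same_scalars)
```

The two strings differ only by a space, yet a shaper draws the first as a single stacked form and the second as two separate pieces.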
The shape of a character depends on its neighbours: you cannot correctly draw text character-by-character.
Which is to say, you must use a shaping library. The industry standard for this is HarfBuzz, and it’s extremely hard to implement your own. Use HarfBuzz.
Cursive scripts frequently have their glyphs intersect to avoid seams, and that