Blog de Frédéric

To content | To menu | To search

Tag - fonts

Entries feed - Comments feed

Saturday, April 16 2016

OpenType MATH in HarfBuzz

TL;DR:

  • Work is in progress to add OpenType MATH support in HarfBuzz and will be instrumental for many math rendering engines relying on that library, including browsers.

  • For stretchy operators, an efficient way to determine the required number of glyphs and their overlaps has been implemented and is described here.

In the context of Igalia browser team effort to implement MathML support using TeX rules and OpenType features, I have started implementation of OpenType MATH support in HarfBuzz. This table from the OpenType standard is made of three subtables:

  • The MathConstants table, which contains layout constants. For example, the thickness of the fraction bar of ab\frac{a}{b}.

  • The MathGlyphInfo table, which contains glyph properties. For instance, the italic correction indicating how slanted an integral is e.g. to properly place the subscript in D\displaystyle\displaystyle\int_{D}.

  • The MathVariants table, which provides larger size variants for a base glyph or data to build a glyph assembly. For example, either a larger parenthesis or a assembly of U+239B, U+239C, U+239D to write something like:

      (abcdefgh\left(\frac{\frac{\frac{a}{b}}{\frac{c}{d}}}{\frac{\frac{e}{f}}{\frac{g}{h}}}\right.  

Code to parse this table was added to Gecko and WebKit two years ago. The existing code to build glyph assembly in these Web engines was adapted to use the MathVariants data instead of only private tables. However, as we will see below the MathVariants data to build glyph assembly is more general, with arbitrary number of glyphs or with additional constraints on glyph overlaps. Also there are various fallback mechanisms for old fonts and other bugs that I think we could get rid of when we move to OpenType MATH fonts only.

In order to add MathML support in Blink, it is very easy to import the OpenType MATH parsing code from WebKit. However, after discussions with some Google developers, it seems that the best option is to directly add support for this table in HarfBuzz. Since this library is used by Gecko, by WebKit (at least the GTK port) and by many other applications such as Servo, XeTeX or LibreOffice it make senses to share the implementation to improve math rendering everywhere.

The idea for HarfBuzz is to add an API to

  1. 1.

    Expose data from the MathConstants and MathGlyphInfo.

  2. 2.

    Shape stretchy operators to some target size with the help of the MathVariants.

It is then up to a higher-level math rendering engine (e.g. TeX or MathML rendering engines) to beautifully display mathematical formulas using this API. The design choice for exposing MathConstants and MathGlyphInfo is almost obvious from the reading of the MATH table specification. The choice for the shaping API is a bit more complex and discussions is still in progress. For example because we want to accept stretching after glyph-level mirroring (e.g. to draw RTL clockwise integrals) we should accept any glyph and not just an input Unicode strings as it is the case for other HarfBuzz shaping functions. This shaping also depends on a stretching direction (horizontal/vertical) or on a target size (and Gecko even currently has various ways to approximate that target size). Finally, we should also have a way to expose italic correction for a glyph assembly or to approximate preferred width for Web rendering engines.

As I mentioned at the beginning, the data and algorithm to build glyph assembly is the most complex part of the OpenType MATH and deserves a special interest. The idea is that you have a list of n1n\geq 1 glyphs available to build the assembly. For each 0in-10\leq i\leq n-1, the glyph gig_{i} has advance aia_{i} in the stretch direction. Each gig_{i} has straight connector part at its start (of length sis_{i}) and at its end (of length eie_{i}) so that we can align the glyphs on the stretch axis and glue them together. Also, some of the glyphs are “extenders” which means that they can be repeated 0, 1 or more times to make the assembly as large as possible. Finally, the end/start connectors of consecutive glyphs must overlap by at least a fixed value omino_{\mathrm{min}} to avoid gaps at some resolutions but of course without exceeding the length of the corresponding connectors. This gives some flexibility to adjust the size of the assembly and get closer to the target size tt.

gig_{i}

sis_{i}

eie_{i}

aia_{i}

gi+1g_{i+1}

si+1s_{i+1}

ei+1e_{i+1}

ai+1a_{i+1}

oi,i+1o_{i,i+1}

Figure 0.1: Two adjacent glyphs in an assembly

To ensure that the width/height is distributed equally and the symmetry of the shape is preserved, the MATH table specification suggests the following iterative algorithm to determine the number of extenders and the connector overlaps to reach a minimal target size tt:

  1. 1.

    Assemble all parts by overlapping connectors by maximum amount, and removing all extenders. This gives the smallest possible result.

  2. 2.

    Determine how much extra width/height can be distributed into all connections between neighboring parts. If that is enough to achieve the size goal, extend each connection equally by changing overlaps of connectors to finish the job.

  3. 3.

    If all connections have been extended to minimum overlap and further growth is needed, add one of each extender, and repeat the process from the first step.

We note that at each step, each extender is repeated the same number of times r0r\geq 0. So if IExtI_{\mathrm{Ext}} (respectively INonExtI_{\mathrm{NonExt}}) is the set of indices 0in-10\leq i\leq n-1 such that gig_{i} is an extender (respectively is not an extender) we have ri=rr_{i}=r (respectively ri=1r_{i}=1). The size we can reach at step rr is at most the one obtained with the minimal connector overlap omino_{\mathrm{min}} that is

i=0N-1(j=1riai-omin)+omin=(iINonExtai-omin)+(iIExtr(ai-omin))+omin\sum_{i=0}^{N-1}\left(\sum_{j=1}^{r_{i}}{a_{i}-o_{\mathrm{min}}}\right)+o_{% \mathrm{min}}=\left(\sum_{i\in I_{\mathrm{NonExt}}}{a_{i}-o_{\mathrm{min}}}% \right)+\left(\sum_{i\in I_{\mathrm{Ext}}}r{(a_{i}-o_{\mathrm{min}})}\right)+o% _{\mathrm{min}}

We let NExt=|IExt|N_{\mathrm{Ext}}={|I_{\mathrm{Ext}}|} and NNonExt=|INonExt|N_{\mathrm{NonExt}}={|I_{\mathrm{NonExt}}|} be the number of extenders and non-extenders. We also let SExt=iIExtaiS_{\mathrm{Ext}}=\sum_{i\in I_{\mathrm{Ext}}}a_{i} and SNonExt=iINonExtaiS_{\mathrm{NonExt}}=\sum_{i\in I_{\mathrm{NonExt}}}a_{i} be the sum of advances for extenders and non-extenders. If we want the advance of the glyph assembly to reach the minimal size tt then

  SNonExt-omin(NNonExt-1)+r(SExt-ominNExt)t{S_{\mathrm{NonExt}}-o_{\mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}+{r% \left(S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}\right)}\geq t  

We can assume SExt-ominNExt>0S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}>0 or otherwise we would have the extreme case where the overlap takes at least the full advance of each extender. Then we obtain

  rrmin=max(0,t-SNonExt+omin(NNonExt-1)SExt-ominNExt)r\geq r_{\mathrm{min}}=\max\left(0,\left\lceil\frac{t-{S_{\mathrm{NonExt}}+o_{% \mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}}{S_{\mathrm{Ext}}-o_{\mathrm{% min}}N_{\mathrm{Ext}}}\right\rceil\right)  

This provides a first simplification of the algorithm sketched in the MATH table specification: Directly start iteration at step rminr_{\mathrm{min}}. Note that at each step we start at possibly different maximum overlaps and decrease all of them by a same value. It is not clear what to do when one of the overlap reaches omino_{\mathrm{min}} while others can still be decreased. However, the sketched algorithm says all the connectors should reach minimum overlap before the next increment of rr, which means the target size will indeed be reached at step rminr_{\mathrm{min}}.

One possible interpretation is to stop overlap decreasing for the adjacent connectors that reached minimum overlap and to continue uniform decreasing for the others until all the connectors reach minimum overlap. In that case we may lose equal distribution or symmetry. In practice, this should probably not matter much. So we propose instead the dual option which should behave more or less the same in most cases: Start with all overlaps set to omino_{\mathrm{min}} and increase them evenly to reach a same value oo. By the same reasoning as above we want the inequality

  SNonExt-o(NNonExt-1)+rmin(SExt-oNExt)t{S_{\mathrm{NonExt}}-o\left(N_{\mathrm{NonExt}}-1\right)}+{r_{\mathrm{min}}% \left(S_{\mathrm{Ext}}-oN_{\mathrm{Ext}}\right)}\geq t  

which can be rewritten

  SNonExt+rminSExt-o(NNonExt+rminNExt-1)tS_{\mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-{o\left(N_{\mathrm{NonExt% }}+{r_{\mathrm{min}}N_{\mathrm{Ext}}}-1\right)}\geq t  

We note that N=NNonExt+rminNExtN=N_{\mathrm{NonExt}}+{r_{\mathrm{min}}N_{\mathrm{Ext}}} is just the exact number of glyphs used in the assembly. If there is only a single glyph, then the overlap value is irrelevant so we can assume NNonExt+rNExt-1=N-11N_{\mathrm{NonExt}}+{rN_{\mathrm{Ext}}}-1=N-1\geq 1. This provides the greatest theorical value for the overlap oo:

  ominoomaxtheorical=SNonExt+rminSExt-tNNonExt+rminNExt-1o_{\mathrm{min}}\leq o\leq o_{\mathrm{max}}^{\mathrm{theorical}}=\frac{S_{% \mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-t}{N_{\mathrm{NonExt}}+{r_{% \mathrm{min}}N_{\mathrm{Ext}}}-1}  

Of course, we also have to take into account the limit imposed by the start and end connector lengths. So omaxo_{\mathrm{max}} must also be at most min(ei,si+1)\min{(e_{i},s_{i+1})} for 0in-20\leq i\leq n-2. But if rmin2r_{\mathrm{min}}\geq 2 then extender copies are connected and so omaxo_{\mathrm{max}} must also be at most min(ei,si)\min{(e_{i},s_{i})} for iIExti\in I_{\mathrm{Ext}}. To summarize, omaxo_{\mathrm{max}} is the minimum of omaxtheoricalo_{\mathrm{max}}^{\mathrm{theorical}}, of eie_{i} for 0in-20\leq i\leq n-2, of sis_{i} 1in-11\leq i\leq n-1 and possibly of e0e_{0} (if 0IExt0\in I_{\mathrm{Ext}}) and of of sn-1s_{n-1} (if n-1IExt{n-1}\in I_{\mathrm{Ext}}).

With the algorithm described above NExtN_{\mathrm{Ext}}, NNonExtN_{\mathrm{NonExt}}, SExtS_{\mathrm{Ext}}, SNonExtS_{\mathrm{NonExt}} and rminr_{\mathrm{min}} and omaxo_{\mathrm{max}} can all be obtained using simple loops on the glyphs gig_{i} and so the complexity is O(n)O(n). In practice nn is small: For existing fonts, assemblies are made of at most three non-extenders and two extenders that is n5n\leq 5 (incidentally, Gecko and WebKit do not currently support larger values of nn). This means that all the operations described above can be considered to have constant complexity. This is much better than a naive implementation of the iterative algorithm sketched in the OpenType MATH table specification which seems to require at worst

  r=0rmin-1NNonExt+rNExt=NNonExtrmin+rmin(rmin-1)2NExt=O(n×rmin2)\sum_{r=0}^{r_{\mathrm{min}}-1}{N_{\mathrm{NonExt}}+rN_{\mathrm{Ext}}}=N_{% \mathrm{NonExt}}r_{\mathrm{min}}+\frac{r_{\mathrm{min}}\left(r_{\mathrm{min}}-% 1\right)}{2}N_{\mathrm{Ext}}={O(n\times r_{\mathrm{min}}^{2})}  

and at least Ω(rmin)\Omega(r_{\mathrm{min}}).

One of issue is that the number of extender repetitions rminr_{\mathrm{min}} and the number of glyphs in the assembly NN can become arbitrary large since the target size tt can take large values e.g. if one writes \underbrace{\hspace{65535em}} in LaTeX. The improvement proposed here does not solve that issue since setting the coordinates of each glyph in the assembly and painting them require Θ(N)\Theta(N) operations as well as (in the case of HarfBuzz) a glyph buffer of size NN. However, such large stretchy operators do not happen in real-life mathematical formulas. Hence to avoid possible hangs in Web engines a solution is to impose a maximum limit NmaxN_{\mathrm{max}} for the number of glyph in the assembly so that the complexity is limited by the size of the DOM tree. Currently, the proposal for HarfBuzz is Nmax=128N_{\mathrm{max}}=128. This means that if each assembly glyph is 1em large you won’t be able to draw stretchy operators of size more than 128em, which sounds a quite reasonable bound. With the above proposal, rminr_{\mathrm{min}} and so NN can be determined very quickly and the cases NNmaxN\geq N_{\mathrm{max}} rejected, so that we avoid losing time with such edge cases…

Finally, because in our proposal we use the same overlap oo everywhere an alternative for HarfBuzz would be to set the output buffer size to nn (i.e. ignore r-1r-1 copies of each extender and only keep the first one). This will leave gaps that the client can fix by repeating extenders as long as oo is also provided. Then HarfBuzz math shaping can be done with a complexity in time and space of just O(n)O(n) and it will be up to the client to optimize or limit the painting of extenders for large values of NN

Saturday, June 23 2012

Mozilla MathML Project: installation of mathematical fonts

Just before I posted my blog entry about mathematical fonts, I received a mail from Dave Barton about which mathematical fonts to use in browsers. We also have this kind of discussion at MathJax. So I think this issue is important and I would like to give more updates:

  • Mac OS X: STIX fonts are currently installed by default on Lion (and not shipped in Webkit browsers as I claimed in my previous post).
  • Linux: I asked Mike Hommey to update Iceweasel's font dependencies. To all the package maintainers of Linux distributions: Firefox no longer uses Mathematica or TeX fonts since at least version 4.0 ; the latest versions rely on MathJax fonts, STIX fonts and Asana Math.
  • Windows: Recent versions have Cambria Math installed but Firefox can not use it yet. I just read this article about MSI for deploying Fonts. The approach seems pretty easy: it uses the free software WiX to create a font installer from a small XML file. Any volunteer to create such an installer for MathJax fonts, STIX fonts and Asana Math?
  • Android: The situation is not really clear to me, but according to Bill Gianopoulos it is difficult to install fonts on some devices and the MathML font add-on seems a good alternative. If I understand correctly, this add-on works on native Android build but not XUL-based ones. Bill's builds contain an experimental patch to make a modified version of the add-on usable in XUL-based build too.
  • About warning that mathematical fonts are missing and suggesting to install them: my approach has not raised much enthusiasm so far, so I'm not sure the patches will be integrated in Firefox.

Saturday, April 21 2012

Mozilla MathML Project: about mathematical fonts

After yet another feedback about the fact that Mozilla's MathML rendering is ugly without appropriate fonts installed, more progress has be made towards our font improvement plan. In this post I will talk about what is going on, even if that is still work-in-progress. I will hopefully have the occasion to describe the evolution in my next status report for Gecko 13, 14 and 15. I will also mention other activities in the MathML project.

The Big Picture

First it may be worth to recall again why appropriate fonts are necessary to render mathematical expressions correctly. One of the most obvious reason is that there are exotic mathematical symbols not contained in mainstream fonts. The style of the font, its spacing etc are also more subtle aspects (that may depend on tastes or habits) making people feel that a formula is "good-looking".

More importantly, some symbols (e.g. sums or integrals) should be displayed larger in some situations while others (e.g. braces or roots) have to stretch to cover subexpressions. For that purpose, one needs fonts with size variants as well as glyphs to build characters by parts.

Since Firefox 3.6, the Mozilla MathML team has made several efforts to improve the font support. The table below compares the rendering of a Nightly build (with work-in-progress patches) against the default rendering of Firefox 3.6.

Default (Firefox 3.6) No fonts (Firefox 3.6)
Default No fonts
Asana, MathJax, STIX Asana Font MathJax fonts STIX fonts

The Default Rendering

Let's first consider the default rendering. Note that some generic Unicode code points exist to stretch characters and are used if possible. Hence, this rendering may be worse or better than the one shown above, depending on which fonts are available on the system.

Additionally since bug 414277 is fixed, a scale transform may also be applied to stretch a character if it is not possible to construct it from the fonts available (compare the current default version with the one of Firefox 3.6). Note that for some less often used operators (e.g. quadruple arrow) this latter technique is the only way to stretch them because no mathematical fonts provide the necessary glyphs.

Although the default rendering is acceptable in some situations like elementary math, specific fonts are indispensable to obtain a decent rendering with more advanced notations. There are essentially two directions possible: support more fonts and/or ensure recommended fonts are available. That is the subject of the two next sections.

Generic Support for Open Type fonts

The first direction is to support as much fonts as possible, to increase the chance to find one that can be used. Many mathematical fonts are installed by default on most operating systems, like Cambria Math or Symbol. These fonts allow to display the most common mathematical symbols and to use Unicode constructions to stretch operators.

To obtain complete support of a given font, we currently use our own tables to describe the size variants and constructions. However, we have to write these tables for each font and so this method does not provide a generic way to support mathematical fonts. MathJax fonts will be supported in Firefox 13 and become the default. For the moment, STIX fonts are completely supported since Firefox 3.7 and Asana font is supported since Firefox 7.

Many new Open Type mathematical fonts have recently appeared, but there are technical limitations that prevent us to use our current approach. The big advantage of these recent fonts is that they contain a table describing the size variants and constructions. Hence a worthwhile work could be to refactor our code for stretchy operators and to be able to directly read the tables from the font files. The corresponding Bugzilla entries are bug 663740 and bug 407059. That would give us a generic support for all Open Type mathematical fonts that have the relevant table. This is also interesting to provide more font styles to the users. Here is a list of the Open Type mathematical fonts I am aware of:

  • Cambria Math
  • Neo Euler
  • STIX 1.1
  • Latin Modern Math
  • Lucida fonts
  • Asana Math

Additionally, these fonts actually contain a wealth of information allowing accurate positioning and spacing for the mathematical formulas. In the future we could use the Open Type Math table instead of our heuristic rules inspired from TeX. That could greatly improve the rendering of the formulas.

Ensuring Availability of Recommended Fonts

The second direction is to ensure that recommended fonts are available. Indeed, the main fonts on which we rely are generally not installed by default on the operating systems. Moreover, the installation procedure may not be convenient for all users and there are even some platforms like mobile devices where it is difficult to install new fonts.

We are working to make Web fonts usable to stretch MathML operators. When this bug is fixed, it will be possible either to ship some fonts inside the browser (as Webkit folks do) or available from a public server (as MathJax folks do). However, size and licence may be a problem for the former option. As for the latter, MathJax experience shows that mathematical fonts are generally too big to make this efficient (download time, caching etc), especially on mobile devices.

Another option is to make the installation easier. The MathML-fonts add-on has been written for that purpose. The users will just have to install these mathematical fonts like any other restartless Firefox add-on. The add-on essentially attaches a stylesheet to each page visited. This stylesheet contains @font-face rules to load the mathematical fonts contained in the add-on. Again, the work is not finished yet: because of the bug mentioned above, these rules do not work for stretchy MathML operators yet. Moreover, whatever the technique used, it seems impossible to register the stylesheet with e10s Fennec. That is annoying, since as said above this add-on is particularly important for mobile devices. I expect the first issue to be fixed in Firefox 15 but the second one is a bit out of the scope of the Mozilla MathML team...

The plan is then to display an information bar suggesting to install the add-on when a page containing MathML is visited. Actually, this message will only appear when we detect a stretch failure for MathML operators. Once this bug fixed, that will end a series of regular reports that started more than 10 years ago!

Friday, June 11 2010

stixfonts.org was last updated on 28 May. The next update will occur mid-June.

Two weeks ago, STIX Fonts Version 1.0 was finally released! Hence after about 15 years of work and regular announce of upcoming release, we now have a comprehensive set of Unicode fonts for mathematical symbols. Hopefully, this will greatly improve the rendering of MathML in browsers. However, this new version is not entirely compatible with the current version of Firefox, because some font files were renamed. A patch landed yesterday on trunk to fix this issue but I don't know whether it will also be pushed to old branch. For those who want to test the new STIX Fonts with the current version of Firefox, you just need to install STIX 1.0 and keep the former STIXSiz*.otf files from STIX Beta...