Blog de Frédéric

To content | To menu | To search

Tag - latex

Entries feed - Comments feed

Saturday, April 16 2016

OpenType MATH in HarfBuzz

TL;DR:

  • Work is in progress to add OpenType MATH support in HarfBuzz and will be instrumental for many math rendering engines relying on that library, including browsers.

  • For stretchy operators, an efficient way to determine the required number of glyphs and their overlaps has been implemented and is described here.

In the context of Igalia browser team effort to implement MathML support using TeX rules and OpenType features, I have started implementation of OpenType MATH support in HarfBuzz. This table from the OpenType standard is made of three subtables:

  • The MathConstants table, which contains layout constants. For example, the thickness of the fraction bar of ab\frac{a}{b}.

  • The MathGlyphInfo table, which contains glyph properties. For instance, the italic correction indicating how slanted an integral is e.g. to properly place the subscript in D\displaystyle\displaystyle\int_{D}.

  • The MathVariants table, which provides larger size variants for a base glyph or data to build a glyph assembly. For example, either a larger parenthesis or a assembly of U+239B, U+239C, U+239D to write something like:

      (abcdefgh\left(\frac{\frac{\frac{a}{b}}{\frac{c}{d}}}{\frac{\frac{e}{f}}{\frac{g}{h}}}\right.  

Code to parse this table was added to Gecko and WebKit two years ago. The existing code to build glyph assembly in these Web engines was adapted to use the MathVariants data instead of only private tables. However, as we will see below the MathVariants data to build glyph assembly is more general, with arbitrary number of glyphs or with additional constraints on glyph overlaps. Also there are various fallback mechanisms for old fonts and other bugs that I think we could get rid of when we move to OpenType MATH fonts only.

In order to add MathML support in Blink, it is very easy to import the OpenType MATH parsing code from WebKit. However, after discussions with some Google developers, it seems that the best option is to directly add support for this table in HarfBuzz. Since this library is used by Gecko, by WebKit (at least the GTK port) and by many other applications such as Servo, XeTeX or LibreOffice it make senses to share the implementation to improve math rendering everywhere.

The idea for HarfBuzz is to add an API to

  1. 1.

    Expose data from the MathConstants and MathGlyphInfo.

  2. 2.

    Shape stretchy operators to some target size with the help of the MathVariants.

It is then up to a higher-level math rendering engine (e.g. TeX or MathML rendering engines) to beautifully display mathematical formulas using this API. The design choice for exposing MathConstants and MathGlyphInfo is almost obvious from the reading of the MATH table specification. The choice for the shaping API is a bit more complex and discussions is still in progress. For example because we want to accept stretching after glyph-level mirroring (e.g. to draw RTL clockwise integrals) we should accept any glyph and not just an input Unicode strings as it is the case for other HarfBuzz shaping functions. This shaping also depends on a stretching direction (horizontal/vertical) or on a target size (and Gecko even currently has various ways to approximate that target size). Finally, we should also have a way to expose italic correction for a glyph assembly or to approximate preferred width for Web rendering engines.

As I mentioned at the beginning, the data and algorithm to build glyph assembly is the most complex part of the OpenType MATH and deserves a special interest. The idea is that you have a list of n1n\geq 1 glyphs available to build the assembly. For each 0in-10\leq i\leq n-1, the glyph gig_{i} has advance aia_{i} in the stretch direction. Each gig_{i} has straight connector part at its start (of length sis_{i}) and at its end (of length eie_{i}) so that we can align the glyphs on the stretch axis and glue them together. Also, some of the glyphs are “extenders” which means that they can be repeated 0, 1 or more times to make the assembly as large as possible. Finally, the end/start connectors of consecutive glyphs must overlap by at least a fixed value omino_{\mathrm{min}} to avoid gaps at some resolutions but of course without exceeding the length of the corresponding connectors. This gives some flexibility to adjust the size of the assembly and get closer to the target size tt.

gig_{i}

sis_{i}

eie_{i}

aia_{i}

gi+1g_{i+1}

si+1s_{i+1}

ei+1e_{i+1}

ai+1a_{i+1}

oi,i+1o_{i,i+1}

Figure 0.1: Two adjacent glyphs in an assembly

To ensure that the width/height is distributed equally and the symmetry of the shape is preserved, the MATH table specification suggests the following iterative algorithm to determine the number of extenders and the connector overlaps to reach a minimal target size tt:

  1. 1.

    Assemble all parts by overlapping connectors by maximum amount, and removing all extenders. This gives the smallest possible result.

  2. 2.

    Determine how much extra width/height can be distributed into all connections between neighboring parts. If that is enough to achieve the size goal, extend each connection equally by changing overlaps of connectors to finish the job.

  3. 3.

    If all connections have been extended to minimum overlap and further growth is needed, add one of each extender, and repeat the process from the first step.

We note that at each step, each extender is repeated the same number of times r0r\geq 0. So if IExtI_{\mathrm{Ext}} (respectively INonExtI_{\mathrm{NonExt}}) is the set of indices 0in-10\leq i\leq n-1 such that gig_{i} is an extender (respectively is not an extender) we have ri=rr_{i}=r (respectively ri=1r_{i}=1). The size we can reach at step rr is at most the one obtained with the minimal connector overlap omino_{\mathrm{min}} that is

i=0N-1(j=1riai-omin)+omin=(iINonExtai-omin)+(iIExtr(ai-omin))+omin\sum_{i=0}^{N-1}\left(\sum_{j=1}^{r_{i}}{a_{i}-o_{\mathrm{min}}}\right)+o_{% \mathrm{min}}=\left(\sum_{i\in I_{\mathrm{NonExt}}}{a_{i}-o_{\mathrm{min}}}% \right)+\left(\sum_{i\in I_{\mathrm{Ext}}}r{(a_{i}-o_{\mathrm{min}})}\right)+o% _{\mathrm{min}}

We let NExt=|IExt|N_{\mathrm{Ext}}={|I_{\mathrm{Ext}}|} and NNonExt=|INonExt|N_{\mathrm{NonExt}}={|I_{\mathrm{NonExt}}|} be the number of extenders and non-extenders. We also let SExt=iIExtaiS_{\mathrm{Ext}}=\sum_{i\in I_{\mathrm{Ext}}}a_{i} and SNonExt=iINonExtaiS_{\mathrm{NonExt}}=\sum_{i\in I_{\mathrm{NonExt}}}a_{i} be the sum of advances for extenders and non-extenders. If we want the advance of the glyph assembly to reach the minimal size tt then

  SNonExt-omin(NNonExt-1)+r(SExt-ominNExt)t{S_{\mathrm{NonExt}}-o_{\mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}+{r% \left(S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}\right)}\geq t  

We can assume SExt-ominNExt>0S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}>0 or otherwise we would have the extreme case where the overlap takes at least the full advance of each extender. Then we obtain

  rrmin=max(0,t-SNonExt+omin(NNonExt-1)SExt-ominNExt)r\geq r_{\mathrm{min}}=\max\left(0,\left\lceil\frac{t-{S_{\mathrm{NonExt}}+o_{% \mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}}{S_{\mathrm{Ext}}-o_{\mathrm{% min}}N_{\mathrm{Ext}}}\right\rceil\right)  

This provides a first simplification of the algorithm sketched in the MATH table specification: Directly start iteration at step rminr_{\mathrm{min}}. Note that at each step we start at possibly different maximum overlaps and decrease all of them by a same value. It is not clear what to do when one of the overlap reaches omino_{\mathrm{min}} while others can still be decreased. However, the sketched algorithm says all the connectors should reach minimum overlap before the next increment of rr, which means the target size will indeed be reached at step rminr_{\mathrm{min}}.

One possible interpretation is to stop overlap decreasing for the adjacent connectors that reached minimum overlap and to continue uniform decreasing for the others until all the connectors reach minimum overlap. In that case we may lose equal distribution or symmetry. In practice, this should probably not matter much. So we propose instead the dual option which should behave more or less the same in most cases: Start with all overlaps set to omino_{\mathrm{min}} and increase them evenly to reach a same value oo. By the same reasoning as above we want the inequality

  SNonExt-o(NNonExt-1)+rmin(SExt-oNExt)t{S_{\mathrm{NonExt}}-o\left(N_{\mathrm{NonExt}}-1\right)}+{r_{\mathrm{min}}% \left(S_{\mathrm{Ext}}-oN_{\mathrm{Ext}}\right)}\geq t  

which can be rewritten

  SNonExt+rminSExt-o(NNonExt+rminNExt-1)tS_{\mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-{o\left(N_{\mathrm{NonExt% }}+{r_{\mathrm{min}}N_{\mathrm{Ext}}}-1\right)}\geq t  

We note that N=NNonExt+rminNExtN=N_{\mathrm{NonExt}}+{r_{\mathrm{min}}N_{\mathrm{Ext}}} is just the exact number of glyphs used in the assembly. If there is only a single glyph, then the overlap value is irrelevant so we can assume NNonExt+rNExt-1=N-11N_{\mathrm{NonExt}}+{rN_{\mathrm{Ext}}}-1=N-1\geq 1. This provides the greatest theorical value for the overlap oo:

  ominoomaxtheorical=SNonExt+rminSExt-tNNonExt+rminNExt-1o_{\mathrm{min}}\leq o\leq o_{\mathrm{max}}^{\mathrm{theorical}}=\frac{S_{% \mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-t}{N_{\mathrm{NonExt}}+{r_{% \mathrm{min}}N_{\mathrm{Ext}}}-1}  

Of course, we also have to take into account the limit imposed by the start and end connector lengths. So omaxo_{\mathrm{max}} must also be at most min(ei,si+1)\min{(e_{i},s_{i+1})} for 0in-20\leq i\leq n-2. But if rmin2r_{\mathrm{min}}\geq 2 then extender copies are connected and so omaxo_{\mathrm{max}} must also be at most min(ei,si)\min{(e_{i},s_{i})} for iIExti\in I_{\mathrm{Ext}}. To summarize, omaxo_{\mathrm{max}} is the minimum of omaxtheoricalo_{\mathrm{max}}^{\mathrm{theorical}}, of eie_{i} for 0in-20\leq i\leq n-2, of sis_{i} 1in-11\leq i\leq n-1 and possibly of e0e_{0} (if 0IExt0\in I_{\mathrm{Ext}}) and of of sn-1s_{n-1} (if n-1IExt{n-1}\in I_{\mathrm{Ext}}).

With the algorithm described above NExtN_{\mathrm{Ext}}, NNonExtN_{\mathrm{NonExt}}, SExtS_{\mathrm{Ext}}, SNonExtS_{\mathrm{NonExt}} and rminr_{\mathrm{min}} and omaxo_{\mathrm{max}} can all be obtained using simple loops on the glyphs gig_{i} and so the complexity is O(n)O(n). In practice nn is small: For existing fonts, assemblies are made of at most three non-extenders and two extenders that is n5n\leq 5 (incidentally, Gecko and WebKit do not currently support larger values of nn). This means that all the operations described above can be considered to have constant complexity. This is much better than a naive implementation of the iterative algorithm sketched in the OpenType MATH table specification which seems to require at worst

  r=0rmin-1NNonExt+rNExt=NNonExtrmin+rmin(rmin-1)2NExt=O(n×rmin2)\sum_{r=0}^{r_{\mathrm{min}}-1}{N_{\mathrm{NonExt}}+rN_{\mathrm{Ext}}}=N_{% \mathrm{NonExt}}r_{\mathrm{min}}+\frac{r_{\mathrm{min}}\left(r_{\mathrm{min}}-% 1\right)}{2}N_{\mathrm{Ext}}={O(n\times r_{\mathrm{min}}^{2})}  

and at least Ω(rmin)\Omega(r_{\mathrm{min}}).

One of issue is that the number of extender repetitions rminr_{\mathrm{min}} and the number of glyphs in the assembly NN can become arbitrary large since the target size tt can take large values e.g. if one writes \underbrace{\hspace{65535em}} in LaTeX. The improvement proposed here does not solve that issue since setting the coordinates of each glyph in the assembly and painting them require Θ(N)\Theta(N) operations as well as (in the case of HarfBuzz) a glyph buffer of size NN. However, such large stretchy operators do not happen in real-life mathematical formulas. Hence to avoid possible hangs in Web engines a solution is to impose a maximum limit NmaxN_{\mathrm{max}} for the number of glyph in the assembly so that the complexity is limited by the size of the DOM tree. Currently, the proposal for HarfBuzz is Nmax=128N_{\mathrm{max}}=128. This means that if each assembly glyph is 1em large you won’t be able to draw stretchy operators of size more than 128em, which sounds a quite reasonable bound. With the above proposal, rminr_{\mathrm{min}} and so NN can be determined very quickly and the cases NNmaxN\geq N_{\mathrm{max}} rejected, so that we avoid losing time with such edge cases…

Finally, because in our proposal we use the same overlap oo everywhere an alternative for HarfBuzz would be to set the output buffer size to nn (i.e. ignore r-1r-1 copies of each extender and only keep the first one). This will leave gaps that the client can fix by repeating extenders as long as oo is also provided. Then HarfBuzz math shaping can be done with a complexity in time and space of just O(n)O(n) and it will be up to the client to optimize or limit the painting of extenders for large values of NN

Tuesday, June 3 2014

TeXZilla 0.9.7 Released

Today the Mozilla MathML team released a new version of TeXZilla. You can download a release package or install it with npm. We fixed a few bugs, but there are known issues due to errors in the unicode.xml file of XML Entity Definitions for Characters or inherited from the itex2MML grammar that does not make it ready for version 1.0. The main improvements in this new release are enhancements to the public API and to the command line interface.

Stream filter

TeXZilla can now be used as a stream filter. Each TeX expressions delimited by the classical $ ... $, $$ ... $$, \[ ... \] and \( ... \) will be converted into inline or display MathML. Outside these delimiters, you can use \$ and \\ as escaped characters. We offer three ways to apply that stream filter:

  • From the command line, in a UNIX pipeline:

    cat foo-tex.html | phantomjs TeXZilla.js streamfilter > foo-mathml.html

    echo "This is a **Markdown** document with a *math formula*: $ z = \\sqrt{x^2 + y^2} $" | markdown | nodejs TeXZilla.js streamfilter | sed '1s/^/\n<!-- HTML5 document -->\n/'

    (note: this is not yet supported by slimerjs)

  • Using the TeXZilla.filterString(aString) function, for example TeXZilla.filterString("blah $x^2$ blah") will return the filtered string.

  • Using the TeXZilla.filterElement(aElement) function. This one will browse recursively the descendants of the DOM element aElement and the stream filter will be applied to the text leaves.

By introducting these TeXZilla.filter* function, it becomes tempting to use TeXZilla the same way as MathJax, that is to process all the text nodes in your Web pages and to filter the TeX strings. This is not the intended goal of TeXZilla and it is strongly discouraged: not only the MathML content won't appear in crawlers (e.g. search engines or feed readers) but also browsing all the DOM elements and appending new ones can be very slow for large documents. Instead, it is recommend to filter your static Web page with commonJS TeXZilla.js streamfilter before publishing it or to use a server-side conversion for example using the Web server mode. There are situations where you do not have other choice, though. In that case try to reduce as much as possible the number of elements being processed (see the example in the next section). Of course, if you do not care about performance and MathML availibility outside your web site, you can just use MathJax.

New Safe and Itex-Identifier parsing modes

The most notable difference between TeXZilla and itex2MML is the handling of some expressions like $xy$ or $Func$. By default, TeXZilla interprets this as individual MathML identifiers <mi>x</mi><mi>y</mi> (so that as in LaTeX, they will render in italic) while itex2MML interprets this as a single indentifier <mi>Func</mi>. It is now possible to configure TeXZilla to align with itex2MML's behavior. To do that, use TeXZilla.setItexIdentifierMode or pass the appropriate boolean to the command line. Consecutive non-basic letters (like Greek or Arabic) are still treated as individual tokens. With that change, we hope that TeXZilla could be used to parse all the commands supported by itex2MML into an equivalent output. Together with the command line stream filter, this should allow to recover all the nice itex2MML features.

Similarly, a safe mode is now available and can be enabled with TeXZilla.setSafeMode or by passing the appropriate boolean to the command line. This mode will forbid commands that could be used for XSS injections like \href. With that mode and the new TeXZilla.filterElement function, I'm now able to remove MathJax's use from my blog (users of browsers without good MathML support can still enable it or choose the lighter mathml.css stylesheet). MathJax was a bit overkill for my blog since I'm only parsing visitor comments. To illustrate how the setSafeMode and filterString functions can be used, I now just have to do

// Process TeX fragments in blog comments and comment preview.
window.addEventListener("DOMContentLoaded", function() {
  TeXZilla.setSafeMode(true);
  var toProcess =
    document.querySelectorAll("#comments > dl > dd, #comment-form dd.comment-preview");
  for (var i = 0; i < toProcess.length; i++) {
    TeXZilla.filterElement(toProcess[i]);
  }
});

Inserting equations in a 2D/WebGL canvas

The new function TeXZilla.toImage has been introduced to convert a TeX fragment into a math HTML image with a base64-encoded src attribute. Contrary to other functions of the API, this one needs to do some work to determine the image size and perform the conversion, so it is unlikely to work as expected in a non-browser context. The goal is really only to have a convenient function to generate image of mathematical formulas and insert them into a canvas context to draw 2D or 3D scientific schemas. At the moment, this works well only in Gecko. For instance,

var image =
    TeXZilla.toImage("\\vec{F} = G \\frac{m_1 m_2}{r^2} \\mathbf{u}");
image.onload = function() {
    canvas.getContext("2d").drawImage(image,
        (canvas.width - image.width) / 2,
        (canvas.height - image.height) / 2);
}

will insert a mathematical formula in the middle of a 2D canvas. Similarly, you can insert a mathematical formula as a texture in a WebGL canvas. It is recommended to pass aRoundToPowerOfTwo=true to TeXZilla.toImage, so that the image will have dimensions that are power of two. Note that the mathematical formula will be automatically centered in the middle of the generated image. See this example for how to setup the formulas with three.js and make them always oriented in the direction of the camera.

MathML in WebGL

Integration in Mozilla products

  • The CKeditor editor plugin is now integrated in MDN, so you can click on the square root logo square root logo in the editor toolbar to insert mathematical formulas. By the way, the mathml.css is now used for browsers without MathML support. See for example the pages for acosh, atanh or CSS transform.

  • The editor/ in comm-central now integrates a small input box to insert mathematical formulas, accessible from the Insert menu. This will be available in Thunderbird 31 and Seamonkey 2.28, so that you can write mathematics in your emails and in the WYSIWYG editors.

  • Various FirefoxOS Web math apps have been written and use TeXZilla. Raniere is also working on a math keyboard for FirefoxOS as a GSoC project, which will allow to type mathematics faster on mobile devices.

Tuesday, February 25 2014

TeXZilla 0.9.4 Released

update 2014/03/11: TeXZilla is now available as an npm module.

Introduction

For the past two months, the Mozilla MathML team has been working on TeXZilla, yet another LaTeX-to-MathML converter. The idea was to rely on itex2MML (which dates back from the beginning of the Mozilla MathML project) to create a LaTeX parser such that:

  • It is compatible with the itex2MML syntax and is similarly generated from a LALR(1) grammar (the goal is only to support a restricted set of core LaTeX commands for mathematics, for a more complete converter of LaTeX documents see LaTeXML).
  • It is available as a standalone Javascript module usable in all the Mozilla Web applications and add-ons (of course, it will work in non-Mozilla products too).
  • It accepts any Unicode characters and supports right-to-left mathematical notation (these are important for the world-wide aspect of the Mozilla community).

The parser is generated with the help of Jison and relies on a grammar based on the one of itex2MML and on the unicode.xml file of the XML Entity Definitions for Characters specification. As suggested by the version number, this is still in development. However, we have made enough progress to present interesting features here and get more users and developers involved.

Quick Examples

\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1

x2a2+y2b2=1\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1

∑_{n=1}^{+∞} \frac{1}{n^2} = \frac{π^2}{6}

n=1+1n2=π26∑_{n=1}^{+∞} \frac{1}{n^2} = \frac{π^2}{6}

س = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}

س=-ب±ب٢-٤اج٢اس = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}

Live Demo / FirefoxOS Web app

A live demo is available to let you test the LaTeX-to-MathML converter with various options and examples. For people willing to use the converter on their mobiles a FirefoxOS Web app is also available.

Using TeXZilla in a CommonJS program or Web page

TeXZilla is made of a single TeXZilla.js file with a public API to convert LaTeX to MathML or extract the TeX source from a MathML element. The converter accepts some options like inline/display mode or RTL/LTR direction of mathematics.

You can load it the standard way in any Javascript program and obtain a TeXZilla object that exposes the public API. For example in a commonJS program, to convert a TeX source into a MathML source:

  var TeXZilla = require("./TeXZilla");
  console.log(TeXZilla.toMathMLString("\\sqrt{\\frac{x}{2}+y}"));

or in a Web Page, to convert a TeX source into a MathML DOM element:

  <script type="text/javascript" src="TeXZilla.js"></script>
  ...
  var MathMLElement = TeXZilla.toMathML("\\sqrt{\\frac{x}{2}+y}");

Using TeXZilla in Mozilla Add-ons

One of the goal of TeXZilla is to be integrated in Mozilla add-ons, allowing people to write cool math applications (in particular, we would like to have an add-on for Thunderbird). A simple Firefox add-on has been written and passed the AMO review, which means that you can safely include the TeXZilla.js script in your own add-ons.

TeXZilla can be used as an addon-sdk module. However, if you intend to use features requiring a DOMParser instance (for example toMathML), you need to initialize the DOM explicitly:

  var {Cc, Ci} = require("chrome");
  TeXZilla.setDOMParser(Cc["@mozilla.org/xmlextras/domparser;1"].
                        createInstance(Ci.nsIDOMParser));

More generally, for traditional Mozilla add-ons, you can do

  TeXZilla.setDOMParser(Components.
                        classes["@mozilla.org/xmlextras/domparser;1"].
                        createInstance(Components.interfaces.nsIDOMParser));

Using TeXZilla from the command line

TeXZilla has a basic command line interface. However, since CommonJS is still being standardized, this may work inconsistently between commonjs interpreters. We have tested it on slimerjs (which uses Gecko), phantomjs and nodejs. For example you can do

  $ slimerjs TeXZilla.js parser "a^2+b^2=c^2" true
  <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><...

or launch a Web service (see next section). We plan to implement a stream filter too so that it can behave the same as itex2MML: looking the LaTeX fragments from a text document and converting them into MathML.

Using TeXZilla as a Web Server

TeXZilla can be used as a Web Server that receives POST and GET HTTP requests with the LaTeX input and sends JSON replies with the MathML output. The typical use case is for people willing to perform some server-side LaTeX-to-MathML conversion.

For instance, to start the TeXZilla Webserver on port 7777:

  $ nodejs TeXZilla.js webserver 7777
  Web server started on http://localhost:7777

Then you can sent a POST request:

  $ curl -H "Content-Type: application/json" -X POST -d '{"tex":"x+y","display":"true"}' http://localhost:7777
  {"tex":"x+y","mathml":"<math xmlns=\"http://www.w3.org/1998/Math/MathML\"...

or a GET request:

  $ curl "http://localhost:7777/?tex=x+y&rtl=true"
  {"tex":"x+y","mathml":"<math xmlns=\"http://www.w3.org/1998/Math/MathML\"...

Note that client-side conversion is trivial using the public API, but see the next section.

Web Components Custom Element <x-tex>

We used the X-Tag library to implement a simple Web Components Custom Element <x-tex>. The idea is to have a container for LaTeX expressions like

  <x-tex dir="rtl">س = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}</x-tex>

that will be converted into MathML by TeXZilla and displayed in your browser: س=-ب±ب٢-٤اج٢اس = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}. You can set the display/dir attributes on that <x-tex> element and they will be applied to the <math> element. Instances of <x-tex> elements also have a source property that you can use to retrieve or set the LaTeX source. Of course, the MathML output will automatically be updated when dynamic changes occur. You can try this online demo.

CKEditor Plugins / Integration in MDN

Finally, we created a first version of a TeXZilla CKEditor plugin. An online demo is available here. We already sent a pull request to Kuma and we hope it will soon enable users to put mathematical mathematical formulas in MDN articles without having to paste the MathML into the source code view. It could be enhanced later with a more advanced UI.

Wednesday, January 29 2014

New MathML Firefox add-ons on AMO

While the patches for MathML integration in MediaWiki are progressively being reviewed and merged and while we are working on the support for Open Type fonts with a MATH table in Gecko, I finally found time to check the progress in Mozilla's add-on SDK. In particular, since the last time I tried (some years ago) they have introduced a cleaner interface for content scripts as well as the possibility to use XPCOM for missing features. Hence I have been able to update some of my experimental MathML add-ons. I have submitted two new add-ons to Mozilla's AMO that I hope could be useful to some people:

  • MathJax Native MathML, an add-on to force MathJax to switch to Gecko's MathML support without having to use the MathJax menu to change the output mode and works even on Websites where that menu is disabled. This also removes MathJax's automatic rescaling and inline-block span that are currently causing random rendering bugs with Gecko's native MathML (and will confuse possible future line-breaking support anyway).
    MathJax Native MathML
  • MathML Copy (at the moment only partially reviewed by the AMO team), an add-on to copy MathML and TeX into the clipboard. For MathML, two flavors are copied: the source as plain text (to paste in your favorite text editor) and the MathML as HTML (to paste in Thunderbird, MDN, any Gecko-based HTML editor etc). Copying TeX is only possible when it is provided via the standard MathML annotation method, which is the case in e.g. LaTeXML and Instiki documents as well as in Wikipedia in the future.
    MathML Copy

As usual, there is room for improvements and bug fixes, but that's a start. In particular I would be happy to get translations for the two strings of the MathML Copy add-on: "Copy MathML Formula" and "Copy TeX Source". Also, because I used the add-on SDK these add-ons are unfortunately only available for Firefox at the moment...

Wednesday, November 14 2012

Writing mathematics in emails

People writing mathematics in emails, like researchers in mathematics or physics, have probably encountered this difficulty to properly format complex mathematical formulas. The most common technique is just to write text with LaTeX-like or ASCIIMathML-like syntax and hope that the recipient will just understand the expressions. Obviously, this is not really convenient to write and read, some errors may happen and result in misunderstandings between the sender and the recipient. There are other classical issues like how to write the math (special syntax? math panel? handwriting recognition?), accessibility, rendering quality etc Of course, these issues are well-known and expected to be addressed by MathML. Since HTML is a common format for email and MathML is now part of HTML5, this is clearly a good candidate to solve the problem of mathematics in emails.

The idea to use MathML in emails is not new and was already suggested in a screenshot from the Mozilla MathML Project more than 10 years ago. Thunderbird has been able to render MathML in newsfeeds for a long time, provided that the author served his content as XHTML. I may also mention Amaya, which added support for sending a document by email in 2007, although I have never figured out how to configure it to send emails. Two years ago, I tried without success to fix a bug to display XHTML attachment inline and which could be a partial solution to the problem. Finally, one year ago Bob Mathews (from Design Science) asked me about the status of MathML in Thunderbird, and I could unfortunately not give him a better answer than what is in the present paragraph. But I hoped that MathML in HTML5 will change the situation.

Indeed, while I was working on some MathML-in-clipboard patches, I realized that it is now possible to paste MathML inside an email. After further discussions with Bob Mathews, Paul Topping & David Carlisle, I've been able to do more testing. The situation is the following:

  • Thunderbird can send emails containing MathML and render them correctly.
  • Apple Mail (used in Mac OS X and iOS) can receive emails containing MathML and should render them correctly since MathML is enabled in Apple's products.
  • Microsoft Outlook does not render MathML in emails. However the rendering is based on Microsoft Word which has MathML support. Basically, Thunderbird sends MathML in HTML5 and Word displays MathML after an XSLT conversion into Microsoft's own OMML format. Hence Microsoft might be able to do something not too complicated to make the whole stuff work.
  • Web Mail Clients like Gmail or Zimbra seem to filter the MathML in emails and so do not render it correctly. If this filter is removed, they can certainly let the browser do the rendering job or use MathJax to do so.

Now let's consider a basic example about how to send MathML in emails, using Thunderbird. One of the issue is that Gecko's editor has really been designed with only HTML-editing features in mind and if you start editing MathML formulas you are going to get some invalid markup messages or other troubles. And of course Thunderbird does not have any math panel or other WYSIWYG tools to write mathematics. However it might not be too difficult to write an add-on to add MathML editing features in Thunderbird like BlueGriffon's add-on or Firemath (these add-on might even be installed without too much trouble in Thunderbird). Or one can of course use one of the existing tools to generate MathML and just paste the code in Thunderbird. Here I'm going to use the itex2MML filter. So first write your mail in a separate text file:

mail.txt

Hi Matthew, I just read your email about the behavior of the factorial function and harmonic series for large values of $n$. If you denote by $\gamma \approx 0.5772156649$ the Euler's number, by $e \approx 2.7182818284$ the Euler's constant then you have the well-known Stirling's approximation:

$$n! = \sqrt{2 \pi n} {\left( \frac{n}{e} \right)}^n \left( 1 + O \left( \frac{1}{n} \right) \right)$$

where of course I use the classical constant $\pi \approx 3.1415926535$. We also have the following asymptotic expansion:

$$\sum_{k=1}^n \frac{1}{k} = \ln(n) + \gamma + O \left( \frac{1}{n} \right)$$

I hope that this answers your question.

then call itex2MML to replace the LaTeX code by <math> elements:

cat mail.txt | itex2MML > mail.html

Write a new mail in Thunderbird and use the menu "Insert ; HTML" . David Carlisle told me that you have to be sure that the "send as HTML" is enabled if it does not show up. Then just copy the mail.html source into the window:

insert MathML

Once you click the insert button, the MathML should be automatically rendered in Thunderbird:

MathML in Thunderbird

When your email is ready, just send it as usual! Here is how it appears on an iPod:

MathML in Apple Mail

Let's just hope that other mail clients will support MathML in emails!

- page 1 of 2