Blog de Frédéric


Wednesday, June 18 2014

Mozilla MathML Add-ons

Four years ago I started to write some MathML add-ons using Jetpack 0.8, now called the Add-on SDK. I've recently made progress on this project, and all the initial features are now available as Firefox add-ons (my initial hope was that the Add-on SDK would eventually be compatible with all Gecko browsers, but unfortunately that still does not seem to be the case). The Mathzilla collection is available on AMO, but some of the add-ons are still undergoing review. Here is an overview:

  • The math editor feature is now provided by the TeXZilla add-on. The Arabic math support I experimented with a bit later is also available.

  • The conversion of content MathML using David Carlisle's XSLT stylesheet is now in its own MathML-ctop add-on. There is another similar add-on called MathML-mml3ff that adds MathML3 features missing in Gecko. Note that these add-ons do not rely on the Add-on SDK and will work in any Gecko browser. However, they should probably be improved.

  • Another add-on that does not rely on the Add-on SDK is MathML-fonts, which adds mathematical fonts. I uploaded version 2.0 to use the new OpenType MATH fonts supported in Gecko 31, but I hope that it will no longer be necessary in the future (more on this later).

  • The conversion of PNG images into MathML is now provided by the Image to MathML add-on. At the moment, it is still experimental; see the details on mozilla.dev.tech.mathml if you want to help. It only works for some Web sites using LaTeX in alt text, but I hope to find a solution for Wolfram Web sites.

  • Since many Web sites use MathJax, and because MathJax has in the meantime switched its default to the slow HTML-CSS output, I had to write an add-on to force MathJax to use native MathML, which is available here. It actually goes further: it disables the mml2jax preprocessor to avoid useless work by MathJax on Web sites that already use MathML in the source code. It also prevents the MathJax menu from overriding the browser user interface (note that the three add-ons below provide some UI features similar to what one can find in MathJax).

  • The feature to copy a MathML formula is now provided by the MathML Copy add-on. Note that it actually copies two flavors (text and HTML). It is also possible to copy the original TeX source when it is provided (e.g. on MDN).

  • A new MathML Zoom add-on provides a zooming feature similar to what MathJax does.

  • A new MathML Font Settings add-on lets you configure the font-family and font-size of mathematics, similar to what MathJax provides. Note, however, that the list of font-family choices in the context menu is based on the OpenType MATH fonts, which will only be supported starting with Gecko 31.

I believe splitting the original Mathzilla add-on into many add-ons gives more flexibility to let people choose the desired features. As usual, help to localize the add-ons is very welcome.

Tuesday, June 3 2014

TeXZilla 0.9.7 Released

Today the Mozilla MathML team released a new version of TeXZilla. You can download a release package or install it with npm. We fixed a few bugs, but some known issues, due either to errors in the unicode.xml file of XML Entity Definitions for Characters or inherited from the itex2MML grammar, mean that it is not yet ready for version 1.0. The main improvements in this release are enhancements to the public API and to the command line interface.

Stream filter

TeXZilla can now be used as a stream filter. Each TeX expression delimited by the classical $ ... $, $$ ... $$, \[ ... \] and \( ... \) will be converted into inline or display MathML. Outside these delimiters, you can use \$ and \\ as escaped characters. We offer three ways to apply the stream filter:

  • From the command line, in a UNIX pipeline:

    cat foo-tex.html | phantomjs TeXZilla.js streamfilter > foo-mathml.html

    echo "This is a **Markdown** document with a *math formula*: $ z = \\sqrt{x^2 + y^2} $" | markdown | nodejs TeXZilla.js streamfilter | sed '1s/^/\n<!-- HTML5 document -->\n/'

    (note: this is not yet supported by slimerjs)

  • Using the TeXZilla.filterString(aString) function, for example TeXZilla.filterString("blah $x^2$ blah") will return the filtered string.

  • Using the TeXZilla.filterElement(aElement) function. This one recursively browses the descendants of the DOM element aElement and applies the stream filter to the text leaves.
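To make the delimiter and escaping rules above concrete, here is a toy JavaScript stream filter. It is only a sketch, not TeXZilla's implementation: the function name filterTeX and its convert callback are hypothetical, and in practice the callback would be a real converter such as something like TeXZilla.toMathMLString. Escapes inside a math fragment are deliberately not handled.

```javascript
// Toy stream filter: replaces TeX fragments delimited by $...$, $$...$$,
// \(...\) and \[...\] with convert(tex, display). Outside math, \$ and \\
// are escapes for a literal "$" and "\". Unmatched delimiters are copied
// unchanged. Escapes inside a fragment are not handled.
function filterTeX(input, convert) {
  // Opening/closing delimiter pairs; "$$" must be tried before "$".
  const delimiters = [
    { open: "$$", close: "$$", display: true },
    { open: "$", close: "$", display: false },
    { open: "\\[", close: "\\]", display: true },
    { open: "\\(", close: "\\)", display: false },
  ];
  let out = "";
  let i = 0;
  while (i < input.length) {
    const next = input[i + 1];
    if (input[i] === "\\" && (next === "$" || next === "\\")) {
      out += next; // escaped "$" or "\"
      i += 2;
      continue;
    }
    const d = delimiters.find(dl => input.startsWith(dl.open, i));
    const end = d ? input.indexOf(d.close, i + d.open.length) : -1;
    if (end !== -1) {
      // Hand the TeX between the delimiters to the converter.
      out += convert(input.slice(i + d.open.length, end), d.display);
      i = end + d.close.length;
    } else {
      out += input[i]; // plain text or unmatched delimiter
      i += 1;
    }
  }
  return out;
}
```

With a callback that wraps its argument in <math> tags, filterTeX("blah $x^2$ blah", callback) leaves the surrounding text untouched and only replaces the fragment, which mirrors the behavior described for TeXZilla.filterString.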

By introducing these TeXZilla.filter* functions, it becomes tempting to use TeXZilla the same way as MathJax, that is, to process all the text nodes in your Web pages and filter the TeX strings. This is not the intended goal of TeXZilla and it is strongly discouraged: not only will the MathML content not appear in crawlers (e.g. search engines or feed readers), but browsing all the DOM elements and appending new ones can also be very slow for large documents. Instead, it is recommended to filter your static Web page with commonJS TeXZilla.js streamfilter before publishing it, or to use a server-side conversion, for example using the Web server mode. There are situations where you have no other choice, though. In that case, try to reduce as much as possible the number of elements being processed (see the example in the next section). Of course, if you do not care about performance and MathML availability outside your Web site, you can just use MathJax.

New Safe and Itex-Identifier parsing modes

The most notable difference between TeXZilla and itex2MML is the handling of some expressions like $xy$ or $Func$. By default, TeXZilla interprets them as individual MathML identifiers, e.g. <mi>x</mi><mi>y</mi> (so that, as in LaTeX, they render in italic), while itex2MML interprets them as a single identifier, e.g. <mi>Func</mi>. It is now possible to configure TeXZilla to align with itex2MML's behavior. To do that, use TeXZilla.setItexIdentifierMode or pass the appropriate boolean to the command line. Consecutive non-basic letters (like Greek or Arabic) are still treated as individual tokens. With that change, we hope that TeXZilla can be used to parse all the commands supported by itex2MML into an equivalent output. Together with the command line stream filter, this should make it possible to recover all the nice itex2MML features.
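The difference between the two identifier modes can be illustrated with a toy tokenizer for a run of letters. The function lettersToMathML is hypothetical and written only for this illustration; it is not TeXZilla's parser, and it assumes its input contains only letters.

```javascript
// Toy illustration of the two tokenization modes for letters:
//  - default mode: one <mi> per letter, as in LaTeX (xy -> x, y);
//  - itex identifier mode: a run of basic Latin letters becomes a single
//    <mi> (Func -> Func), as in itex2MML.
// Non-basic letters (Greek, Arabic, ...) always stay individual tokens.
// Assumes the input contains only letters.
function lettersToMathML(text, itexIdentifierMode) {
  // Split into runs of basic Latin letters and single other code points.
  const runs = text.match(/[A-Za-z]+|./gu) || [];
  return runs
    .flatMap(run =>
      itexIdentifierMode && /^[A-Za-z]+$/.test(run) ? [run] : Array.from(run))
    .map(token => `<mi>${token}</mi>`)
    .join("");
}
```

For instance, lettersToMathML("xy", false) yields <mi>x</mi><mi>y</mi>, while lettersToMathML("Func", true) yields <mi>Func</mi>.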

Similarly, a safe mode is now available and can be enabled with TeXZilla.setSafeMode or by passing the appropriate boolean to the command line. This mode forbids commands that could be used for XSS injections, like \href. With that mode and the new TeXZilla.filterElement function, I have been able to stop using MathJax on my blog (users of browsers without good MathML support can still enable it or choose the lighter mathml.css stylesheet). MathJax was a bit overkill for my blog since I'm only parsing visitor comments. To illustrate how the setSafeMode and filterElement functions can be used, here is all I now have to do:

// Process TeX fragments in blog comments and comment preview.
window.addEventListener("DOMContentLoaded", function() {
  TeXZilla.setSafeMode(true);
  var toProcess =
    document.querySelectorAll("#comments > dl > dd, #comment-form dd.comment-preview");
  for (var i = 0; i < toProcess.length; i++) {
    TeXZilla.filterElement(toProcess[i]);
  }
});
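The idea behind a safe mode can be sketched as a scan of the TeX input for control sequences that appear in a deny-list. This is only an illustration, not TeXZilla's implementation: the names are hypothetical and the list contains only \href, the one command named above, so it should be assumed incomplete. A real filter would more robustly reject everything outside an allow-list.

```javascript
// Toy "safe mode" check: report control sequences found in a deny-list.
// The list only contains \href (the example from the text) and is
// assumed to be incomplete.
const FORBIDDEN_COMMANDS = new Set(["href"]);

function forbiddenCommands(tex) {
  const found = [];
  // A control sequence is a backslash followed by letters, e.g. \frac.
  for (const match of tex.matchAll(/\\([A-Za-z]+)/g)) {
    if (FORBIDDEN_COMMANDS.has(match[1])) found.push(match[1]);
  }
  return found;
}

// True when the input uses none of the forbidden commands.
function isSafe(tex) {
  return forbiddenCommands(tex).length === 0;
}
```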

Inserting equations in a 2D/WebGL canvas

The new function TeXZilla.toImage has been introduced to convert a TeX fragment into an HTML image with a base64-encoded src attribute. Contrary to other functions of the API, this one needs to do some work to determine the image size and perform the conversion, so it is unlikely to work as expected in a non-browser context. The goal is really only to have a convenient function to generate images of mathematical formulas and insert them into a canvas context to draw 2D or 3D scientific diagrams. At the moment, this works well only in Gecko. For instance,

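// Assumes the page contains a <canvas> element.
var canvas = document.querySelector("canvas");
var image =
    TeXZilla.toImage("\\vec{F} = G \\frac{m_1 m_2}{r^2} \\mathbf{u}");
// Draw the formula centered in the canvas once the image has loaded.
image.onload = function() {
    canvas.getContext("2d").drawImage(image,
        (canvas.width - image.width) / 2,
        (canvas.height - image.height) / 2);
};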

will insert a mathematical formula in the middle of a 2D canvas. Similarly, you can insert a mathematical formula as a texture in a WebGL canvas. It is recommended to pass aRoundToPowerOfTwo=true to TeXZilla.toImage, so that the image dimensions are powers of two. Note that the mathematical formula will be automatically centered in the generated image. See this example for how to set up the formulas with three.js and make them always face the camera.
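The arithmetic behind centering and power-of-two rounding is simple. The helpers below are hypothetical and not part of TeXZilla: one rounds a size up to the next power of two, which WebGL 1 requires for textures that use mipmaps or repeat wrapping, and the other computes where to draw an image so that it is centered, as in the drawImage call above.

```javascript
// Smallest power of two that is >= n (returns 1 for n <= 1).
function nextPowerOfTwo(n) {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

// Top-left corner at which to draw an image of size (w, h) so that it is
// centered in a canvas of size (canvasW, canvasH).
function centeredOrigin(canvasW, canvasH, w, h) {
  return { x: (canvasW - w) / 2, y: (canvasH - h) / 2 };
}
```

For example, a 100×40 formula image would be rounded up to a 128×64 texture.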

MathML in WebGL

Integration in Mozilla products

  • The CKEditor plugin is now integrated in MDN, so you can click on the square root logo in the editor toolbar to insert mathematical formulas. By the way, mathml.css is now used for browsers without MathML support. See for example the pages for acosh, atanh or CSS transform.

  • The editor/ directory in comm-central now integrates a small input box to insert mathematical formulas, accessible from the Insert menu. This will be available in Thunderbird 31 and SeaMonkey 2.28, so that you can write mathematics in your emails and in the WYSIWYG editors.

  • Various FirefoxOS Web math apps have been written using TeXZilla. Raniere is also working on a math keyboard for FirefoxOS as a GSoC project, which will make it possible to type mathematics faster on mobile devices.

Tuesday, February 25 2014

TeXZilla 0.9.4 Released

update 2014/03/11: TeXZilla is now available as an npm module.

Introduction

For the past two months, the Mozilla MathML team has been working on TeXZilla, yet another LaTeX-to-MathML converter. The idea was to rely on itex2MML (which dates back to the beginning of the Mozilla MathML project) to create a LaTeX parser such that:

  • It is compatible with the itex2MML syntax and is similarly generated from a LALR(1) grammar (the goal is only to support a restricted set of core LaTeX commands for mathematics; for a more complete converter of LaTeX documents, see LaTeXML).
  • It is available as a standalone Javascript module usable in all the Mozilla Web applications and add-ons (of course, it will work in non-Mozilla products too).
  • It accepts any Unicode characters and supports right-to-left mathematical notation (these are important for the world-wide aspect of the Mozilla community).

The parser is generated with the help of Jison and relies on a grammar based on itex2MML's and on the unicode.xml file of the XML Entity Definitions for Characters specification. As suggested by the version number, this is still in development. However, we have made enough progress to present interesting features here and to get more users and developers involved.

Quick Examples

\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1

∑_{n=1}^{+∞} \frac{1}{n^2} = \frac{π^2}{6}

س = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}

Live Demo / FirefoxOS Web app

A live demo is available to let you test the LaTeX-to-MathML converter with various options and examples. For people who want to use the converter on their mobile phones, a FirefoxOS Web app is also available.

Using TeXZilla in a CommonJS program or Web page

TeXZilla is made of a single TeXZilla.js file with a public API to convert LaTeX to MathML or extract the TeX source from a MathML element. The converter accepts some options like inline/display mode or RTL/LTR direction of mathematics.

You can load it the standard way in any Javascript program and obtain a TeXZilla object that exposes the public API. For example, in a CommonJS program, to convert a TeX source into a MathML source:

  var TeXZilla = require("./TeXZilla");
  console.log(TeXZilla.toMathMLString("\\sqrt{\\frac{x}{2}+y}"));

or in a Web Page, to convert a TeX source into a MathML DOM element:

  <script type="text/javascript" src="TeXZilla.js"></script>
  ...
  var MathMLElement = TeXZilla.toMathML("\\sqrt{\\frac{x}{2}+y}");

Using TeXZilla in Mozilla Add-ons

One of the goals of TeXZilla is to be integrated in Mozilla add-ons, allowing people to write cool math applications (in particular, we would like to have an add-on for Thunderbird). A simple Firefox add-on has been written and has passed the AMO review, which means that you can safely include the TeXZilla.js script in your own add-ons.

TeXZilla can be used as an Add-on SDK module. However, if you intend to use features requiring a DOMParser instance (for example toMathML), you need to initialize the DOM parser explicitly:

  var {Cc, Ci} = require("chrome");
  TeXZilla.setDOMParser(Cc["@mozilla.org/xmlextras/domparser;1"].
                        createInstance(Ci.nsIDOMParser));

More generally, for traditional Mozilla add-ons, you can do

  TeXZilla.setDOMParser(Components.
                        classes["@mozilla.org/xmlextras/domparser;1"].
                        createInstance(Components.interfaces.nsIDOMParser));

Using TeXZilla from the command line

TeXZilla has a basic command line interface. However, since CommonJS is still being standardized, it may behave inconsistently across CommonJS interpreters. We have tested it on slimerjs (which uses Gecko), phantomjs and nodejs. For example, you can do

  $ slimerjs TeXZilla.js parser "a^2+b^2=c^2" true
  <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><...

or launch a Web service (see next section). We plan to implement a stream filter too, so that it can behave the same as itex2MML: looking for LaTeX fragments in a text document and converting them into MathML.

Using TeXZilla as a Web Server

TeXZilla can be used as a Web server that receives POST and GET HTTP requests with the LaTeX input and sends JSON replies with the MathML output. The typical use case is for people who want to perform server-side LaTeX-to-MathML conversion.

For instance, to start the TeXZilla Webserver on port 7777:

  $ nodejs TeXZilla.js webserver 7777
  Web server started on http://localhost:7777

Then you can send a POST request:

  $ curl -H "Content-Type: application/json" -X POST -d '{"tex":"x+y","display":"true"}' http://localhost:7777
  {"tex":"x+y","mathml":"<math xmlns=\"http://www.w3.org/1998/Math/MathML\"...

or a GET request:

  $ curl "http://localhost:7777/?tex=x+y&rtl=true"
  {"tex":"x+y","mathml":"<math xmlns=\"http://www.w3.org/1998/Math/MathML\"...
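From JavaScript, the same requests can be built with the standard URL and JSON APIs. This is a sketch that only uses the tex, display and rtl parameters shown in the curl examples; the helper names buildGetURL and buildPostRequest are hypothetical. Note that many query-string parsers decode a literal + as a space, so percent-encoding it (as URLSearchParams does) is the safer way to send x+y.

```javascript
// Build a GET URL for the TeXZilla Web server, e.g. with { tex, rtl }.
function buildGetURL(baseURL, params) {
  const url = new URL(baseURL);
  for (const [key, value] of Object.entries(params)) {
    // URLSearchParams percent-encodes "+" as %2B, so "x+y" is not
    // decoded as "x y" by form-style query parsers.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Build fetch() options for the JSON POST request shown above.
function buildPostRequest(params) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(params),
  };
}
```

In a browser, or in a recent nodejs (18 or later) where fetch is available, you could then do fetch(buildGetURL("http://localhost:7777", { tex: "x+y" })).then(r => r.json()) and read the mathml property of the reply.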

Note that client-side conversion is trivial using the public API, but see the next section.

Web Components Custom Element <x-tex>

We used the X-Tag library to implement a simple Web Components Custom Element <x-tex>. The idea is to have a container for LaTeX expressions like

  <x-tex dir="rtl">س = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}</x-tex>

that will be converted into MathML by TeXZilla and displayed in your browser. You can set the display/dir attributes on that <x-tex> element and they will be applied to the <math> element. Instances of <x-tex> elements also have a source property that you can use to retrieve or set the LaTeX source. Of course, the MathML output will automatically be updated when dynamic changes occur. You can try this online demo.

CKEditor Plugins / Integration in MDN

Finally, we created a first version of a TeXZilla CKEditor plugin. An online demo is available here. We already sent a pull request to Kuma and we hope it will soon enable users to put mathematical formulas in MDN articles without having to paste MathML into the source code view. It could be enhanced later with a more advanced UI.

Wednesday, January 29 2014

New MathML Firefox add-ons on AMO

While the patches for MathML integration in MediaWiki are progressively being reviewed and merged, and while we are working on support for OpenType fonts with a MATH table in Gecko, I finally found time to check the progress of Mozilla's Add-on SDK. In particular, since the last time I tried (some years ago), they have introduced a cleaner interface for content scripts as well as the possibility to use XPCOM for missing features. Hence I have been able to update some of my experimental MathML add-ons. I have submitted two new add-ons to Mozilla's AMO that I hope will be useful to some people:

  • MathJax Native MathML, an add-on that forces MathJax to switch to Gecko's MathML support without having to use the MathJax menu to change the output mode, and that works even on Web sites where that menu is disabled. It also removes MathJax's automatic rescaling and inline-block spans, which currently cause random rendering bugs with Gecko's native MathML (and would confuse possible future line-breaking support anyway).
  • MathML Copy (at the moment only partially reviewed by the AMO team), an add-on to copy MathML and TeX into the clipboard. For MathML, two flavors are copied: the source as plain text (to paste in your favorite text editor) and the MathML as HTML (to paste in Thunderbird, MDN, any Gecko-based HTML editor, etc.). Copying TeX is only possible when it is provided via the standard MathML annotation method, which is the case in e.g. LaTeXML and Instiki documents, and will be the case on Wikipedia in the future.

As usual, there is room for improvements and bug fixes, but that's a start. In particular I would be happy to get translations for the two strings of the MathML Copy add-on: "Copy MathML Formula" and "Copy TeX Source". Also, because I used the add-on SDK these add-ons are unfortunately only available for Firefox at the moment...
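Retrieving the TeX source stored with the standard MathML annotation method, as MathML Copy does when it is available, can be sketched as follows. This regex-based version is a simplification written for illustration, not the add-on's code; it assumes the annotation's encoding attribute is "TeX" or "application/x-tex", and a real implementation should walk the DOM instead.

```javascript
// Extract the TeX source from <annotation encoding="TeX">...</annotation>
// (or encoding="application/x-tex") in a MathML string. Returns null when
// there is no such annotation, in which case TeX cannot be copied.
function extractTeXAnnotation(mathml) {
  const match = mathml.match(
    /<annotation\b[^>]*\bencoding\s*=\s*["'](?:application\/x-tex|TeX)["'][^>]*>([\s\S]*?)<\/annotation>/i);
  if (!match) return null;
  // Undo basic XML escaping; &amp; is decoded last to avoid double-decoding.
  return match[1]
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}
```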

Monday, January 13 2014

Improvements to Mathematics on Wikipedia

Introduction


As mentioned during the Mozilla Summit and recent MathML meetings, progress has recently been made on the way mathematical equations are handled on Wikipedia. This work has mainly been done by the volunteer contributor Moritz Schubotz (alias Physikerwelt), Wikimedia Foundation's developer Gabriel Wicke, as well as members of MathJax. Moritz has been particularly involved in the project: he even travelled from Germany to San Francisco to meet MediaWiki developers and spent one month doing volunteer work on it. Although the solution has essentially been ready for a couple of months, the review of the patches is progressing slowly. If you wish to speed up the integration of what is probably the most important improvement to MediaWiki Math, please read how you can help below.

Current Status

The approach that has been used on Wikipedia so far is the following:

  • Equations are written in LaTeX or more precisely, using a specific set of LaTeX commands accepted by texvc. One issue for the MediaWiki developers is that this program is written in OCaml and no longer maintained, so they would like to switch to a more modern setup.
  • texvc calls the LaTeX program to convert the LaTeX source into PNG images, and this is the default mode. Unfortunately, using images to represent mathematical equations on the Web leads to classical problems (alignment or rendering quality, to mention just a few) that cannot be addressed without changing the approach.
  • For a long time, registered users have been able to switch to the MathJax mode thanks to the help of nageh, a member of the MathJax community. This mode solves many of the issues with PNG images but unfortunately it adds its own problems, some of them being just unacceptable for MediaWiki developers. Again, these issues are intrinsic to the use of a Javascript polyfill and thus yet another approach is necessary for a long-term perspective.
  • Finally, registered users can also switch to the LaTeX source mode, that is only display the text source of equations.

Short Term Plan

Native MathML is the appropriate way to fix all the issues regarding the display of mathematical formulas in browsers. However, the language is still not perfectly implemented in Web rendering engines, so some fallback is necessary. The new approach will thus be:

  • TeX equations will still be edited by hand, but it will also be possible to use a visual editor.
  • texvc will be used as a filter to validate the TeX source. This will ensure that only the texvc LaTeX syntax is accepted and will avoid other potential security issues. The LaTeX-to-PNG conversion as well as the OCaml implementation will be kept in the short term, but the plan is to drop the former and to replace the latter with a PHP equivalent.
  • A LaTeX-to-MathML conversion followed by a MathML-to-SVG conversion will be performed server-side using MathJax.
  • By default, all users will receive the same output (MathML+SVG+PNG), but only one will be made visible, according to your browser's capabilities. As a first step, native MathML will only be used in Gecko and other rendering engines will see the SVG/PNG fallback; but the goal is to progressively drop the old PNG output and to move to native MathML.
  • Registered users will still be able to switch to the LaTeX source mode.
  • Registered users will still be able to use MathJax client-side, especially if they want to use the HTML-CSS output. However, this will no longer be a separate mode but an option to enable. That is, the MathML/SVG/PNG/Source is displayed normally and progressively replaced with MathJax's output.
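The selection rule of this short-term plan can be summarized with a small function. It is a hypothetical sketch: the capability flags isGecko and supportsSVG are assumptions made for the illustration, and the Math extension itself may detect browser capabilities differently.

```javascript
// Every formula is sent as MathML, SVG and PNG; pick which one is shown.
// The capability flags are assumptions made for this sketch.
function visibleMathOutput({ isGecko, supportsSVG }) {
  if (isGecko) return "mathml"; // first step: native MathML only in Gecko
  if (supportsSVG) return "svg"; // other engines: SVG fallback
  return "png"; // legacy PNG image as the last resort
}
```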

Most of the features above have already been approved and integrated in the development branch or are undergoing review process.

How can you help?


The main point is that everybody can review the patches on Gerrit. If you know about Javascript and/or PHP, if you are interested in math typesetting and wish to get involved in an important Open Source project such as Wikipedia then it is definitely the right time to help the MediaWiki Math project. The article How to become a MediaWiki hacker is a very good introduction.

When getting involved in a new open source project, one of the most important steps is to set up the development environment. There are various ways to set up a local installation of MediaWiki, but using MediaWiki-Vagrant might be the simplest one: just follow the Quick Start Guide and use vagrant enable-role math to enable the Math extension.

The second step is to create a WikiTech account and to set up the appropriate SSH keys on your MediaWiki-Vagrant virtual machine. Then you can check the Open Changes, test and review them. The Gerrit code review guide may be helpful here.

If you need more information, you can ask Moritz or try to reach people on the #mediawiki (freenode) or #mathml (mozilla) channels. Thanks in advance for your help!
