Maths, Informatique, Jeux
Web site created by Frédéric and François WANG
Why you should use MathML

Introduction

Everybody agrees that the Internet is a powerful way to exchange information and consequently a very useful medium for scientists. More and more people are asking themselves what the proper way is to put their mathematical content on the Web. In this article, I try to explain why it is worth using MathML, the standard for mathematical formulae on the Web.

The document could also be entitled "Answer to the 5 criticisms against MathML and 20 good reasons to use it" but I think new reasons to use MathML will be added in the future - and I hope criticisms will disappear - so I did not choose this title. Because I believe that this document could help make MathML widely used and that people could improve it, I have put it under a Creative Commons license. Consequently you can freely both distribute and modify its content.

The criticisms against MathML

Nowadays, using MathML on web pages is not easy because of the lack of software implementations and the misunderstandings of some scientists. In this section I answer the classical criticisms and indicate which are really justified.

MathML is not the standard language for scientists

I have already heard something like « (La)TeX is the lingua franca for scientists. All scientists use it and they will never change to anything else ». I find such dogmatism a bit worrying and I hope no serious scientist would say something like that. On the contrary, scientists believe in progress and even if - like all human beings - they naturally prefer what they are familiar with, they are always interested in better tools. Fortunately, since the Rhind Papyrus, methods to produce scientific documents have improved a lot, and the advancement stopped neither after the invention of the printing press by Gutenberg nor - more recently - after the birth of the first computer.

TeX is currently the most convenient way to produce a scientific document, but it is not certain that it is perfectly suited to computers. Scientists have long used a visual representation of mathematical formulae, and this idea was kept in the early years of computer science. Nevertheless, some developments in this area (scientific computation, theorem proving...) and the birth of the Web - with the paradigm of a semantic Web - show that the mathematical meaning is also important if formulae are to be easily processed by a computer. As a consequence, a standard language for scientists must at least deal with both aspects, just as MathML does. In fact, mathematical languages should not be opposed but seen as complementary. Being the mathematical language for the Web, MathML plays a central role and should rather be seen as a bridge between all of them.

MathML is not human-readable and too verbose to be edited

It is false to say that MathML is not human-readable. Like all XML languages, it uses a source code that can be read and edited by human beings. Nevertheless, as its syntax is much more complex than that of TeX, it is not meant to be written by hand. Further explanation is given later.
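To make the comparison concrete, here is a minimal sketch in Python that builds the Presentation MathML for x² + 1 with the standard `xml.etree.ElementTree` module and prints it next to the equivalent TeX source. The function name `build_x_squared_plus_one` and the choice of formula are only illustrative assumptions.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def build_x_squared_plus_one():
    """Return the Presentation MathML tree for x^2 + 1 (illustrative formula)."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    row = ET.SubElement(math, "mrow")
    power = ET.SubElement(row, "msup")       # base and exponent
    ET.SubElement(power, "mi").text = "x"    # identifier
    ET.SubElement(power, "mn").text = "2"    # number
    ET.SubElement(row, "mo").text = "+"      # operator
    ET.SubElement(row, "mn").text = "1"
    return math

tex_source = "x^2 + 1"
mathml_source = ET.tostring(build_x_squared_plus_one(), encoding="unicode")

print(tex_source)
print(mathml_source)
print(len(tex_source), "characters of TeX versus", len(mathml_source), "of MathML")
```

The markup is readable, every element says what it contains, but it is clearly longer than the TeX source, which is why it is better produced by a program than typed.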

TeX was a great creation because it let people stop coding formulae in ASCII format and get them in their computer documents with a high printing quality. Nevertheless, it left behind the persistent idea that the source code of a formula must be edited directly by human beings. If you look at other types of computer data such as text, images or music, you see that tools are available to manipulate them, and it is quite surprising and even paradoxical that scientists have kept primitive methods.

MathML can be replaced by images, CSS or SVG to render formulae

Because HTML cannot display mathematical formulae, authors have been used to inserting images generated from (La)TeX in their web pages. But images cause a lot of problems with accessibility, rendering and file size. These issues, in which MathML has a great role, are developed later.

Some people think they can solve the layout problems thanks to CSS or SVG. But for accessibility - and more generally for all the cases where mathematical meaning is important - this method is not adequate. Nevertheless, it has the advantage of allowing wider software compatibility.

The method can be saved by coding in MathML and then offering the reader the choice either to get the formulae in MathML format or to have them converted to CSS/SVG. The feasibility of such a conversion is discussed later. Another idea to save this method comes from the Opera community. The Opera browser had good CSS support but no MathML support at all. Members of Opera worked with the W3C to produce a MathML for CSS profile that proposes rules to display MathML using the CSS rendering engine.

MathML can not be displayed by the majority of browsers

This is false, even if you have to do some manual configuration before MathML works. First, as MathML is an XML format, the webmaster must either save the document with an XML extension or send a special header to indicate the MIME type. For instance, in a PHP file:

header("Content-Type:application/xhtml+xml");
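The same header can be sent from other server-side languages. As a minimal sketch, assuming a Python WSGI server, the page could be served as follows; the name `application` and the content of `XHTML_PAGE` are illustrative and not taken from any existing site.

```python
# Illustrative XHTML page containing one MathML formula (x squared).
XHTML_PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>MathML test</title></head>
  <body>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <msup><mi>x</mi><mn>2</mn></msup>
    </math>
  </body>
</html>
"""

def application(environ, start_response):
    """WSGI application that serves the page as XHTML rather than text/html."""
    headers = [
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(XHTML_PAGE))),
    ]
    start_response("200 OK", headers)
    return [XHTML_PAGE]
```

To try it locally, the standard library call `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` should be enough; the port number is arbitrary.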

Firefox natively supports MathML, but currently you have to download and install proprietary fonts yourself. The latest version of Opera also supports MathML. Internet Explorer needs a plug-in such as MathPlayer to read MathML.

Another issue in rendering MathML properly is the need for mathematical fonts that cover the whole set of Unicode characters for mathematics. The STIX project recently released a beta version of free fonts that will certainly achieve this goal.

MathML can only be used with the intolerant XML format

If you study web pages, you will see that the majority of them are not valid. For instance, I invite you to try the W3C validator on a web page. Normally, you will get a lot of syntax errors, unless you are very lucky - or choose the page well. This is a real difficulty, because a web page with MathML must contain no errors and must additionally be encoded in X(HT)ML.

It will take a long time before everybody produces valid XHTML pages, and from this point of view one can say that the adoption of MathML is likely to remain really limited. In fact, there are already tools like Amaya to make XHTML+MathML documents, so the problem comes from the users who have not adopted XML yet rather than from the developers... Also, note that MathML is likely to be introduced in HTML 5. I believe this will be useful for people who lag behind in web technologies and claim not to be able to use XML yet...

Web pages and layout

This section shows the most visible advantages of a web page that uses MathML, which were strong motivations for its creation (see chapter 1 of [2]).

MathML is loaded and displayed faster than images

If you are used to visiting scientific websites, you must already have seen how pitiful it is to see the text of the page displayed immediately and then to wait a long time before all the images of formulae appear... Despite its verbosity, MathML is less voluminous than images and uses less bandwidth. Moreover, being text with a lot of repeated byte sequences, it compresses well. Finally, as an XML language, its syntax is well-defined, so algorithms to process and display it are expected to be fast.
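The compressibility argument can be illustrated with a short Python sketch that runs `zlib` on MathML markup. The sample is an assumption made for illustration: one small formula repeated fifty times, which exaggerates the redundancy of a real page, and no image is measured here.

```python
import zlib

# Hypothetical sample: the same small formula repeated, standing in for a
# page with many formulae that share the same tags.
FORMULA = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow>"
    "</math>"
)
page_markup = (FORMULA * 50).encode("utf-8")

# Same DEFLATE algorithm as the one used by gzip content encoding.
compressed = zlib.compress(page_markup, 9)
ratio = len(compressed) / len(page_markup)

print(len(page_markup), "bytes raw,", len(compressed), "bytes compressed")
print("compressed size is {:.1%} of the original".format(ratio))
```

On real pages the gain will be smaller than on this artificial sample, but the repeated element names are exactly the kind of redundancy that such compressors remove.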

MathML adapts to the surrounding text

You may object that the previous point is not a big deal, as progress will always bring faster machines and better storage. However, images still have a lot of drawbacks, as they do not adapt at all to the surrounding text. For instance, you will have difficulty getting images of formulae that have the same size as the surrounding text and are also well aligned with it. Even if you succeed, you can never be sure that the layout will be good on another machine, with another screen configuration or with different fonts. Moreover, even on a screen that displays a good layout for your formulae, zooming only makes the text bigger, not the images. All these problems disappear with MathML, as formulae are simply handled as text.

One of the great ideas of the Web is the principle of the hypertext link. But when you have an image it is difficult to delineate one part and make it linkable - although it is possible with the <map> element. The XML nature of MathML makes the use of XLink [3] possible, and as a consequence linking part of a formula becomes really easy. Also, webmasters are used to adding stylistic properties to components of a web page thanks to CSS. Of course, this possibility has been kept for MathML formulae.

Among the different media the Web must deal with - media for the visually impaired, screens, search engines, etc. - the first that comes to mind for a scientist used to TeX is the printed document. More precisely, a high level of quality is expected. Not only do images of formulae fail to adapt to the surrounding text and cause layout problems, but their quality is also really inferior to that of the text. The MathML specification [2] indicates about 70 dpi for images and 300, 600 or more for the text. And as is well known, increasing the quality of the images also increases their size...

MathML avoids the "printable documents to download"

Sadly enough, scientists have kept the habit of putting their work (papers, theses...) on the Web as printable documents to download - you can even find some PDF documents in the references I give :-( ... This is certainly because, for a lot of people, creating a PDF from TeX is currently easier than creating an XHTML+MathML document...

But doing so totally destroys the first principle of the Web, which is to allow everybody - including the visually impaired - to go from document to document thanks to links. The size of a PDF is typically several megabytes - whereas a web page is several kilobytes - so even with a plug-in that opens the document directly in a tab, your browser will not handle it so easily. In any case, when you open a PDF file you suddenly leave the Web, as you lose access to the source code, which is necessary for a lot of things such as accessibility, sharing or search engines.

On the contrary, an XHTML+MathML document really makes your work part of the World Wide Web and prevents your site from simply becoming a place to download files...

Editing scientific documents

The goal of this section is to explain how mathematical formulae can be produced and re-used. As indicated in the MathML specification [2]:

While MathML is human-readable, it is anticipated that, in all but the simplest cases, authors will use equation editors, conversion programs, and other specialized software tools to generate MathML. Several early versions of such MathML tools already exist, and a number of others, both freely available software and commercial products, are under development.

MathML does not require you to learn its syntax

Often, people refuse to use MathML because they claim it is too verbose. As I said, this is because they think the only way to make formulae is to edit the source directly. But while manual editing is certainly necessary when you have to do things at a low level - for instance writing a program or using the command line - it is a bit strange to do it for sophisticated computer operations. To give a caricatural example, nobody creates an image by entering the value of each pixel!

Moreover, learning the syntax of a language is not easy for everybody, so a formula editor is required if you want mathematics to be accessible to a large audience. From this point of view, the complexity of the MathML syntax is an advantage: the formulae can be handled more easily by software tools and one can imagine efficient tools - among them editors, of course, but many others are presented later - whereas TeX reduces the use of the computer to file exchange and printed documents.

The first page of [6] presents an interesting point of view about the communication between users and machines, its beginning, its evolution and the future role the scientists will have to play to improve it.

MathML can be converted to / generated from other languages

It is really important to have converters between MathML and other mathematical languages. As for all other XML languages, the use of XSLT style sheets [4] makes the work easier. One can convert MathML to CSS / SVG, but there are also style sheets to render MathML as TeX. Generating MathML from TeX may be more difficult, as you have to "guess" the additional information, but because TeX is widely used, people have already worked on heuristics to do the job. Some of them use JavaScript/DOM to transform raw text to MathML, but I am not sure this is a good idea - though I have not studied it much - as the new DOM tree may not be available to be copied and pasted, read by the visually impaired or analysed by search engines... In any case, I recommend that webmasters always make the MathML source of their pages available.
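To show why the direction MathML to TeX is the easy one, here is a minimal sketch of such a converter. It is written as a recursive Python function rather than as an XSLT style sheet, and it only covers a handful of Presentation MathML elements; the function names and the sample formula are illustrative assumptions, not an existing tool.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

def local_name(element):
    """Strip the namespace that ElementTree puts in front of tag names."""
    return element.tag.replace(MATHML_NS, "")

def to_tex(element):
    """Convert a small subset of Presentation MathML to a TeX string."""
    name = local_name(element)
    children = [to_tex(child) for child in element]
    if name in ("mi", "mn", "mo"):          # token elements: keep their text
        return (element.text or "").strip()
    if name in ("math", "mrow"):            # horizontal grouping
        return " ".join(children)
    if name == "msup":
        return "{%s}^{%s}" % (children[0], children[1])
    if name == "msub":
        return "{%s}_{%s}" % (children[0], children[1])
    if name == "mfrac":
        return "\\frac{%s}{%s}" % (children[0], children[1])
    if name == "msqrt":
        return "\\sqrt{%s}" % " ".join(children)
    raise ValueError("unsupported element: " + name)

source = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mfrac>
    <mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow>
    <msqrt><mi>y</mi></msqrt>
  </mfrac>
</math>
"""
tex = to_tex(ET.fromstring(source))
print(tex)
```

Each MathML element maps to one TeX construct, so no guessing is needed. The reverse direction has to decide, for example, whether a letter is an identifier or part of a function name, which is where the heuristics come in.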

The presentation markup allows easy conversion to and from languages such as TeX that only deal with layout, but there is also a need to build a bridge with the systems that deal with the meaning of the formulae. Fortunately, the content markup makes this possible, as you will see at the end of this page. To make internal conversion between presentation and content markup easier, the MathML specification allows the use of parallel markup.

The W3C MathML implementations page lists a lot of available tools such as converters and style sheets.

MathML allows formulae to be re-used

Before being distributed, TeX sources are supposed to be compiled and transformed into images, PDF files, etc. This limits the possibility for documents to be shared and edited by a community of scientists: something as basic as selecting one part of a formula, then copying and pasting it into a computer algebra system, is nowadays impossible. In the age of the so-called Web 2.0, and even more for scientists - who are supposed to communicate and share - this is not acceptable. In the case of MathML, it is always possible to copy and paste the formula - or at least its source - into a program and then to process it.

Compatibility, accessibility and internationalization

Using the Web means sharing your work with a lot of people around the world. Consequently, MathML has been designed to overcome several issues in compatibility, accessibility and internationalization that are just ignored when you use a printable document - such as pdf - generated from a TeX source.

MathML is a W3C standard

W3C is the international organization that creates the "standards" for the Web. The MathML specification is consequently a normative document, which allows MathML to be highly compatible. Also, it was created by a Math Working Group composed of people coming from several countries and scientific fields. Hence MathML takes the different needs into account. Additionally, MathML is used in the ISO-standardized OpenDocument format (ODT).

MathML is an XML language

The fact that MathML is an eXtensible Markup Language means that new features can be added if new needs appear - see for instance the Arabic mathematical notation - or, on the contrary, can be deprecated if experience shows they are useless. MathML can also be used in combination with other Web languages. For example, formulae are often included inside a document with text, tables, images... so MathML is likely to be combined with XHTML and SVG [5]. As seen before, you can use CSS and XLink to apply styles or make links, and convert or generate MathML thanks to XSLT. Finally, formulae can be handled by scripts: chapter 8 of the specification [2] describes the DOM for MathML.

I made a page where I show examples of MathML mixed with other W3C standards. Dana Lee Ling told me that these examples allowed him to produce his own "MathML in SVG" page, which shows the importance - especially for physics - of mixing formulae and diagrams. See also his "Web page technologies".

MathML can be read by the visually impaired

TeX was essentially designed for visual rendering. This allows a simpler syntax - and editing by hand - but makes it difficult for the visually impaired to access documents. Such a limitation cannot be accepted for a web language, and accessibility was one of the aspects taken into account when MathML was designed [2].

Braille is a system that allows the visually impaired to write and read. For mathematical Braille, I have only found the French software BraMaNet, whose sources contain a style sheet to transform Presentation MathML to Braille, but I do not know whether there is an interface to send information directly from a browser to a Braille terminal. The LAMBDA project is another example of a mathematical program that enables Braille display.

Also, a speech synthesizer is a good option for blind people to replace human readers [11]. Firevox, an extension for Firefox, can read MathML formulae. The DAISY standard goes further and enables audio books with synchronised text (on screen or on a Braille display). This is essential for blind and for dyslexic people. MathML has been chosen by the DAISY Consortium as the way to include mathematical content.

Finally, blind people have their own notations for mathematical content, and here again, MathML converters are important. Tools are already available, for instance a program to convert MathML to Nemeth (a Braille code for encoding mathematical and scientific notation linearly) or MathML to AMS (a notation developed for blind students at the University of Karlsruhe).

Thanks to Michael Zacherle for his e-mail with a lot of information about mathematics for the blind. It has been really useful in completing this section.

MathML can be written from right to left

MathML is made by an international organization and consequently the particularities of each country are supposed to be taken into account. A W3C note about Arabic mathematical notation [1] has been added to clarify some points not described in MathML 2.0 and prepare improvements for future versions of the MathML specification. The browser Dadzilla includes this feature.

MathML can avoid ambiguity of notations

As indicated in chapter 4 of the MathML specification [2], "there are many to one mappings from presentation to semantics and vice versa" and "Mathematical presentation also changes with culture and time". As a consequence, when you work in an international context, it is better to encode the meaning of the formulae and allow each reader to choose how to display them. The Content Markup of MathML makes this possible.
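As an illustration, the open interval between a and b is written (a, b) in some countries and ]a, b[ in others, while Content MathML encodes it once with the `interval` element. The following Python sketch renders the same source in both conventions; the table `OPEN_INTERVAL_DELIMITERS`, the convention names and the function `render_interval` are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Content MathML for the open interval between a and b.
CONTENT = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <interval closure="open"><ci>a</ci><ci>b</ci></interval>
</math>
"""

# Hypothetical notation table: delimiters of an open interval in two conventions.
OPEN_INTERVAL_DELIMITERS = {
    "english": ("(", ")"),
    "french": ("]", "["),
}

def render_interval(content_source, convention):
    """Render an open <interval> as plain text in the chosen convention."""
    interval = ET.fromstring(content_source).find(MATHML_NS + "interval")
    if interval.get("closure") != "open":
        raise ValueError("this sketch only handles open intervals")
    lower, upper = [ci.text.strip() for ci in interval.findall(MATHML_NS + "ci")]
    left, right = OPEN_INTERVAL_DELIMITERS[convention]
    return "%s%s, %s%s" % (left, lower, upper, right)

print(render_interval(CONTENT, "english"))
print(render_interval(CONTENT, "french"))
```

The source never changes; only the rendering step depends on the reader, which is the point of separating meaning from presentation.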

The scientific community and the Web 2.0

The Web 2.0 is a new Web where collaboration and sharing become predominant. While it is supposed to be really well suited to scientists, tools to handle formulae in blogs or wikis are currently limited. This section shows the importance of MathML for exchanges over the Internet.

MathML could be sent by e-mail

Here, I use "could" as I am not sure this is already possible. Amaya can send documents by mail, but this feature requires an SMTP server without authentication, so I cannot test it for the moment. Nevertheless, I think Thunderbird can be configured to receive mail with MathML formulae, as shown by this screenshot on the Mozilla MathML page or these instructions for Fedora.

MathML can be used in your scientific blog

It is fashionable for a scientist to have a blog where he talks about his current work, his future projects... But of course a scientific blog is supposed to contain formulae, which implies the use of MathML in view of what has been said before. Unfortunately, one often finds a blog with a "test" post that fails to display MathML and makes the author think that browsers cannot handle MathML...

Another issue comes with the "comments" from the readers of the scientific blog, as they are likely to contain formulae too. Generally, blogs do not allow XHTML markup in comments, as checking its validity would be an additional constraint. An interesting aspect of the W3C project Annotea is that it allows this feature. Unfortunately, the use of annotations is even less widespread than MathML. Annotations are external remarks that can be attached to a web document without modifying it. As a consequence, the web pages do not need to be produced by a dynamic language such as PHP. Annotations in Amaya can be written in all the languages the editor can deal with - including MathML - and can be attached to any part of the page. Annozilla is an extension for Gecko-based browsers that makes it possible to use annotations, but the current version seems to have a more limited editing interface.

MathML could be used in wikis, discussion groups, forums, etc.

For mailing lists, see what has been said before about MathML in e-mail. For other types of discussion groups or wikis, as seen in the blog section, including XHTML markup is not likely to be possible in the near future because of the way users write - i.e. by editing a source - and because of the validity constraint.

Nevertheless, the possibility to transform the TeX in the source into images is generally available, so a temporary solution could be to add an option to transform TeX to Presentation MathML. Of course you need to have XHTML documents, but this is already the case, as you can see with Wikipedia or PunBB. I found a wiki that can handle both MathML and SVG. Blahtex is a project to bring MathML support to Wikipedia, but it looks abandoned.

MathML can be used in e-learning

Nowadays, the majority of teachers use computers to produce their documents. More and more of them have a website where students can get lessons, exercises, help... This is a very efficient way for them to distribute resources. Moreover, students can help each other when the site is a collaborative place. This is good, as pupils learn both to understand and to explain ideas.

Of course, when you teach in a scientific field, you are confronted with several problems if you want to put resources with mathematical formulae on the Web and refuse to let your site become a place for "printable documents to download". The paper [12] shows how browsing/editing capabilities and the use of annotations could be combined to get such a site for scientific e-learning.

The Content MathML and the semantic Web

The semantic Web goes further than Web 2.0: not only do people communicate with each other, but the data are also easily handled by computers, so that you finally get powerful tools to work with. Paradoxically, whereas the recursive structure of mathematical formulae should naturally allow easy processing by computer, such tools are really limited compared with those for other data [9]. In addition to the basic copy-and-paste operation I talked about before, this section describes new features I hope we will have when MathML is more widely adopted.

MathML can be exported to / imported from computer algebra systems

According to their web sites, the two most widely used computer algebra systems, Mathematica and Maple, can import and export MathML. As for free software, I know that Maxima has an experimental implementation, and I started an XML processor that could improve the MathML integration.

Of course, this feature belongs to the idea of re-using mathematical content: formulae are easily sent from one program to another - such as a mail client, browser or editor. But I also have in mind a new generation of authoring tools that would make editing closer to what you can do with a pen.

Typically, when you create a document with a semantic mathematical language - such as Content MathML - the software tool can "understand" the formula and propose operations to transform the last formula written. For instance, when you type the expression "∫ exp x ⅆx =", the computer could guess that you want to take an antiderivative and propose this operation. There are a lot of other basic transformations one can imagine - differentiation, grouping of terms... - but even when they are really easy for a human who has practised them for a long time, they need a sophisticated program. In this case, interaction between the editor and a computer algebra system becomes necessary.
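A real computer algebra system is far beyond the scope of a short example, but the following Python sketch shows the first step: because Content MathML encodes operators and arguments explicitly, a program can walk the tree and compute with it. Only the `plus`, `times` and `exp` operators are handled; the function `evaluate` and the sample formula are illustrative assumptions.

```python
import math
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

def evaluate(element, variables):
    """Numerically evaluate a tiny subset of Content MathML."""
    name = element.tag.replace(MATHML_NS, "")
    if name == "math":
        return evaluate(element[0], variables)
    if name == "cn":                         # numeric constant
        return float(element.text)
    if name == "ci":                         # identifier, looked up in variables
        return variables[element.text.strip()]
    if name == "apply":                      # first child is the operator
        operator = element[0].tag.replace(MATHML_NS, "")
        arguments = [evaluate(child, variables) for child in element[1:]]
        if operator == "plus":
            return sum(arguments)
        if operator == "times":
            return math.prod(arguments)
        if operator == "exp":
            return math.exp(arguments[0])
        raise ValueError("unsupported operator: " + operator)
    raise ValueError("unsupported element: " + name)

# Content MathML for exp(x) + 2 * x
SOURCE = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><plus/>
    <apply><exp/><ci>x</ci></apply>
    <apply><times/><cn>2</cn><ci>x</ci></apply>
  </apply>
</math>
"""
value = evaluate(ET.fromstring(SOURCE), {"x": 0.0})
print(value)
```

Symbolic operations such as taking an antiderivative would walk the same tree but return a new tree instead of a number, which is the kind of work an editor would delegate to a computer algebra system.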

MathML could be used in theorem proving

The possibility of formalising mathematics makes it possible to imagine software that helps mathematicians by finding or checking proofs of theorems. This is really interesting, as the complexity of mathematical proofs and the number of mathematical fields they involve are increasing.

Even if theorem proving software already exists, it is difficult to extract semantic information from current papers such as LaTeX documents. The content markup of MathML is a first step towards making papers easier to process by theorem proving software. [10] describes such a use of MathML. Also, OMDoc [7] is an extension of Content MathML that increases its expressive power.

MathML can be found by search engines

If you read what I said before, you may have understood that PDF documents cannot be found by search engines because they do not give access to the source. In fact, I meant that from the "semantic web" point of view, the scope of a search engine is not reduced to "keywords" but includes metadata, relations between web pages and other information more complex than raw text.

The section "Math-Aware searching" of [9] shows how important the possibility to search for mathematical formulae could be, and that it is sometimes more powerful than "keywords". As explained in [8], you have to send queries that avoid the ambiguity of mathematical notations, and as a consequence you need a language that encodes the mathematical meaning. The MathWebSearch prototype is already available, but unfortunately, as MathML - and a fortiori Content MathML - is not much used on the Web, the set of indexed pages is currently quite limited...

Conclusion

The study of the criticisms of MathML shows that they are due both to misunderstandings on the part of scientists who refuse to use it and to the lack of software implementations. These two issues are of course related, and we can hope they will disappear in the future.

From a purely technical point of view, using MathML is the right way to truly put mathematical content on the Web, whereas PDF documents and images must be avoided. Even if other ways could also give a good layout, the MathML source is necessary for compatibility, accessibility and internationalization. Moreover, you can either use other languages in a primitive fashion - i.e. writing the source by hand - before converting to MathML, or use sophisticated interfaces. In any case, the MathML source makes it easy to re-use the mathematical content.

MathML is also really important for the next generations of the World Wide Web. The paradigm of the Web 2.0 - i.e. sharing - is now widespread, but quite ironically the scientific community still has problems using mathematical formulae in wikis, forums or e-mails. While Presentation MathML is enough to solve these issues, Content MathML will fulfil the dream of a (scientific) semantic web.

After reading this document, you may understand the importance of MathML for communication over the Internet, and more precisely on the Web. While LaTeX should only be used for what it was made for - i.e. creating high-quality printable documents from an easy-to-write source code - MathML is likely to be used in computers for several purposes. But the future of MathML strongly depends on its use by scientists and software vendors. Because of all the beautiful things MathML could bring, I chose to join the people who use it, and I hope the readers of this document will join us too. A word to the wise...

References

  1. W3C Interest Group Note - Arabic mathematical notation
  2. W3C Recommendation - Mathematical Markup Language (MathML) Version 2.0 (Second Edition)
  3. W3C Recommendation - XML Linking Language (XLink) Version 1.0
  4. W3C Recommendation - XSL Transformations (XSLT) Version 2.0
  5. W3C Working Draft - An XHTML + MathML + SVG Profile
  6. Conrad Barski - How to tell stuff to a computer
  7. Michael Kohlhase - An Open Markup Format for Mathematical Documents
  8. Michael Kohlhase and Ioan Sucan - A Search Engine for Mathematical Formulae
  9. Robert Miner - The Importance of MathML to Mathematics Communication
  10. Hanane Naciri and Laurence Rideau - The Marriage of MathML and Theorem Proving
  11. Abraham Nemeth - Mathspeak
  12. Vincent Quint and Irène Vatton - MathML in E-Learning with Amaya
This work is licensed under a Creative Commons License.
This page is W3C-compliant - Author: Frédéric WANG