# Blog de Frédéric

## Tag - xml

Saturday, July 24 2010

## Using Mozilla to print a scientific report based on Web formats

So I'm finally done with my Master's Project in Quantum Computing at Århus. My final report on The Hidden Subgroup Problem is now available in XHTML and PDF formats. The slides I made for the oral defense are also available, although, as always, they are not really useful without the accompanying comments. This time I used the W3C's Slidy tool, so my work is 100% based on Web formats :-)

The Hidden Subgroup Problem remains difficult with our current knowledge, but I really enjoyed thinking about this challenging topic and I hope my work will help research in this area. As an advocate of the Web as a publishing medium for scientists, I also found a meta-interest in seeing whether Web formats can be used to produce documents readable both on screen and on paper (and, in the latter case, with the same quality as more traditional methods). One way to do that is to write TeX-like documents and export them to Web and printing formats using tools such as TeX4ht or GELLMU. Since my personal preference is to write Web content with a hybrid method (mixing WYSIWYG and simple-syntax parsers), I instead considered how to print these Web formats directly. One tool for that is Prince, but I'm not sure it handles SVG/MathML that well (there were some issues the last time I tried it). Hence I preferred to use a free layout engine and printed my report with Gecko :-).

One of the difficulties is handling numbering (pages, sections, formulas, theorems...) and the corresponding references (HTML links or page numbers), both in the table of contents and in the body of the report itself. Moreover, the report has to be split into several small pieces, because otherwise it is too large to edit easily (in my case, 1.1 MB in a single XHTML page, for a total of 99 pages). Hence I had to quickly write an automated system once and for all at the beginning. I'm not really proud of it - it is truly an usine à gaz (an overly complicated contraption) involving CSS counters, the Flex lexical analyzer and a command-line print extension for Firefox - but at least it works almost as I expect. Ideally, it should be possible to print each piece separately before grouping them into a single PDF document. However, because of the choice of CSS counters and the lack of control over page numbering in Mozilla, I could only print the whole large XHTML document. This was really inconvenient and was one of the two most annoying issues. The other is that there is no access to the page numbers, so I had to write them manually in the table of contents :-(. It seems that one way to overcome the page number issues would be to implement some CSS rules for printing, although I don't know whether that would help with working on separate pieces.
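The cross-referencing part of the problem can be illustrated with a small Python sketch. This is not the CSS-counter/Flex system mentioned above, only a stand-in that shows why numbering needs a global view of all the pieces: a reference in one piece may point to an item declared in a later one. The `@label{kind:name}` and `@ref{kind:name}` markers and the function name `number_pieces` are hypothetical, invented for this example.

```python
import re

# Hypothetical markers (not from the original system):
#   @label{kind:name} declares a numbered item, @ref{kind:name} points to it.
LABEL = re.compile(r"@label\{(\w+):([\w-]+)\}")
REF = re.compile(r"@ref\{(\w+):([\w-]+)\}")


def number_pieces(pieces):
    """Number labelled items across all pieces, then resolve references."""
    counters = {}  # one running counter per kind (thm, eq, ...)
    numbers = {}   # (kind, name) -> assigned number

    # First pass: assign numbers in document order over ALL pieces.
    for piece in pieces:
        for kind, name in LABEL.findall(piece):
            counters[kind] = counters.get(kind, 0) + 1
            numbers[(kind, name)] = counters[kind]

    def label_number(match):
        return str(numbers[(match.group(1), match.group(2))])

    def ref_number(match):
        key = (match.group(1), match.group(2))
        # Unknown targets are marked instead of raising an error.
        return str(numbers[key]) if key in numbers else "??"

    # Second pass: replace declarations and references by their numbers.
    return [REF.sub(ref_number, LABEL.sub(label_number, p)) for p in pieces]


pieces = [
    "Theorem @label{thm:shor}. See Eq. @ref{eq:qft}.",
    "Equation @label{eq:qft} and Theorem @label{thm:hsp}; by Theorem @ref{thm:shor}.",
]
for line in number_pieces(pieces):
    print(line)
```

The first piece refers to an equation that is only declared in the second one, which is the kind of dependency that makes it hard to process each piece in isolation.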

The other problems are various bugs in Gecko. Thanks to the recent improvements in MathML (related to stretchy operators and fonts), the mathematical formulas are now displayed with good quality, or at least a quality I'm satisfied with. One issue is the incorrect computation of dimensions in mtable, which is slightly visible for some split equations. I also discovered a wrong thickness of bars, which seems to be the only difference between the print and screen renderings. However, I could work around it, and Karl gave me a hint that should hopefully make it possible to fix the bug. There are still some annoying bugs with line breaking within mathematical expressions and around them. For the former, I can avoid the problem by adding <mrow/>'s, but the latter produces particularly weird effects. Typically, when a comma or period is placed just after an inline formula, a line break may occur and move the punctuation symbol to the beginning of the next line... Regarding quantum circuits and other diagrams, the SVG code I use is very elementary. Hence I have no complaints, even though I'm among those waiting for the possibility of using SVG images in the standard <img/> tag. Finally, I have some mysterious bugs with HTML tables printed across several pages and more issues with CSS page breaking, which required some small manual tweaking in two or three places.

In conclusion, using Web formats to print a scientific report is not yet that easy. However, I'm quite satisfied with the result and I expect the issues above to be fixed in the near future...

Monday, June 14 2010

## XSLT Stylesheet to help updating the MathML Operator Dictionary

MathML 3 comes with a new version of the operator dictionary and I expect to update Mozilla's own version accordingly. Some members of the MathML WG have actually written a separate recommendation, XML Entity Definitions for Characters, which contains various information on Unicode characters, including some properties relevant to mathematical operators. They provide a file, unicode.xml, which is very large (> 6 MB), so I've written an XSLT stylesheet to extract only the information required for the operator dictionary. The stylesheet does nothing magical: it only creates an XML document with a <root/> element and <entry/> children. The attributes from the initial unicode.xml are attached to each of these entries, using the compact form of the MathML 3 dictionary for boolean-valued properties.

Using the xsltproc tool, the typical command is

xsltproc -o dictionary.xml operatorDictionary.xsl unicode.xml

which produces an XML file dictionary.xml of moderate size (< 150 KB) that should be easier to process. I hope it will help developers of MathML tools align with the new operator dictionary. Note that the stylesheet is distributed under a triple MPL/GPL/LGPL licence, so people should be able to use it freely.
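For readers who prefer a script to XSLT, the same kind of extraction can be sketched in Python. This is not the stylesheet itself, only an illustration under stated assumptions: the tiny inline sample stands in for unicode.xml, the element and attribute names (`character`, `operator-dictionary`, `form`, `lspace`, ...) are assumed to match the real file, and the `unicode` and `properties` attributes on each entry are my own choice of output shape for the "compact form" of boolean properties.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the layout of the real unicode.xml is assumed.
SAMPLE = """<unicode><charlist>
  <character id="U0002B" dec="43">
    <description>PLUS SIGN</description>
    <operator-dictionary form="infix" lspace="4" rspace="4" priority="275"/>
  </character>
  <character id="U00028" dec="40">
    <description>LEFT PARENTHESIS</description>
    <operator-dictionary form="prefix" lspace="0" rspace="0" priority="20"
                         fence="true" stretchy="true" symmetric="true"/>
  </character>
  <character id="U00061" dec="97">
    <description>LATIN SMALL LETTER A</description>
  </character>
</charlist></unicode>"""

# Boolean-valued properties folded into one attribute (illustrative subset).
BOOLEAN_PROPERTIES = ("fence", "stretchy", "symmetric")


def extract_dictionary(source_xml):
    """Build a <root/> with one <entry/> per operator-dictionary element."""
    root = ET.Element("root")
    for character in ET.fromstring(source_xml).iter("character"):
        for op in character.findall("operator-dictionary"):
            entry = ET.SubElement(root, "entry",
                                  {"unicode": character.get("id", "")})
            flags = []
            for name, value in op.attrib.items():
                if name in BOOLEAN_PROPERTIES:
                    if value == "true":
                        flags.append(name)
                else:
                    entry.set(name, value)  # copy other attributes as-is
            if flags:
                entry.set("properties", " ".join(sorted(flags)))
    return root


print(ET.tostring(extract_dictionary(SAMPLE), encoding="unicode"))
```

Characters without operator information, such as the letter in the sample, simply produce no entry, which is where most of the size reduction would come from.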

Sunday, May 2 2010

## Hacking Dotclear 2 for writing XHTML+MathML+SVG

Of course, the first thing I checked after setting up this blog was whether it is possible to have an XHTML+MathML+SVG page in Dotclear. As indicated on the Mozilla MathML Project page, one has to:

1. serve the page as application/xhtml+xml.

This is simply done by editing the value of the parameter \$content_type in the function serveDocument of the file dotclear/inc/public/lib.urlhandlers.php.

2. write a well-formed XML document.

Apparently, Dotclear is good enough at that. However, its editor does not seem to have any option for producing MathML or SVG. Personally, I don't use the wiki syntax and prefer to write the XHTML page myself with the tools I'm used to, so this is not really a problem for me.

It is also a good idea to write a valid XML document. By default, Dotclear uses the XHTML 1.0 Strict doctype, so I had to modify the files themes/name_of_your_theme/tpl/*.html to set XHTML+MathML+SVG instead.
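Since a page served as application/xhtml+xml must be well-formed for an XML parser to accept it, it can be handy to check a post before publishing it. Here is a minimal Python sketch of the well-formedness requirement from step 2; the function name `is_well_formed` is made up for this example, and the check says nothing about validity against the XHTML+MathML+SVG doctype.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


good = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    "</p></body></html>"
)
bad = "<p>unclosed line break<br></p>"  # fine in HTML, not in XML

print(is_well_formed(good), is_well_formed(bad))
```

The second sample shows a typical pitfall: markup that is tolerated when served as text/html, such as an unclosed `<br>`, is rejected once the page is parsed as XML.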

Once these simple changes are made, you can insert MathML formulas such as this expression of a general Quantum Fourier Transform over a finite group: $\forall g \in G,\; F_G |g\rangle = \frac{1}{\sqrt{|G|}} \sum_{\rho \in \hat{G}} \sqrt{d_\rho} \sum_{i,j=1}^{d_\rho} \rho_{ij}(g)\, |\rho\, i\, j\rangle$

or SVG images such as this quantum circuit used in Shor's famous algorithm:

Voilà!

--update 30/01/2011: you can find my patch for Dotclear 2.2