Exporting notebook to HTML, using conversion rules

The idea is to export the whole notebook to HTML with all the math equations in LaTeX form. That way we can add the MathJax script to the page and let it take care of rendering the math.

Suppose we have the following in a notebook:

This is an equation: $f(x) = x^2$. We let $x \in [0,1]$ for this example.

The following is the cell expression that generates the text above:

Cell[TextData[{
 "This is an equation: ",
 Cell[BoxData[
  FormBox[
   RowBox[{
    RowBox[{"f", "(", "x", ")"}], " ", "=", " ", 
    SuperscriptBox["x", "2"]}], TraditionalForm]],
  FormatType->"TraditionalForm"],
 ". We let ",
 Cell[BoxData[
  FormBox[
   RowBox[{"x", " ", "\[Element]", " ", 
    RowBox[{"[", 
     RowBox[{"0", ",", "1"}], "]"}]}], TraditionalForm]],
  FormatType->"TraditionalForm"],
 " for this example."
}], "Text"]

There is an example of how to use conversion rules here. Unfortunately, I haven't had much luck with them, as you can see in one of my previous posts.

In any case, here is an attempt:

ExportString[
 Cell[TextData[{"This is an equation: ", 
    Cell[BoxData[
      FormBox[RowBox[{RowBox[{"f", "(", "x", ")"}], " ", "=", " ", 
         SuperscriptBox["x", "2"]}], TraditionalForm]], 
     FormatType -> "TraditionalForm"], ". We let ", 
    Cell[BoxData[
      FormBox[RowBox[{"x", " ", "\[Element]", " ", 
         RowBox[{"[", RowBox[{"0", ",", "1"}], "]"}]}], 
       TraditionalForm]], FormatType -> "TraditionalForm"], 
    " for this example."}], "Text"
  ],
 "HTML", "ConversionRules" -> {"TraditionalForm" -> {"$", 
     Convert`TeX`BoxesToTeX[#] &, "$"}}]

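The result is the following HTML code, in which the rule was evidently not applied (the first equation became a GIF image and the second became plain text):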

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
        "HTMLFiles/xhtml-math11-f.dtd">

<!-- Created by Wolfram Mathematica 8.0 : www.wolfram.com -->

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
 <title>
  Untitled
 </title>
 <link href="HTMLFiles/m0001756441.css" rel="stylesheet" type="text/css" />
</head>

<body>

<p class="Text">
 This is an equation: <span><span><img src="HTMLFiles/m0001756441_1.gif" alt="m0001756441_1.gif"
   width="51" height="15" style="vertical-align:middle" /></span></span>. We let
 <span><span>x &isin; [0,1]</span></span> for this example.
</p>

<div style="font-family:Helvetica; font-size:11px; width:100%; border:1px none #999999; border-top-style:solid; padding-top:2px; margin-top:20px;">
 <a href="http://www.wolfram.com/products/mathematica/" style="color:#000; text-decoration:none;">
  <img src="HTMLFiles/spikeyIcon.png" alt="Spikey" width="20" height="21" style="padding-right:2px; border:0px solid white; vertical-align:middle;" />
  <span style="color:#555555">Created with</span> Wolfram <span style="font-style: italic;">Mathematica</span> 8.0
 </a>
</div>
</body>

</html>

Exporting to TeX isn't any better: Mathematica ignores the rule, and instead of dollar signs it wraps the inline math in \( and \):

\documentclass{article}
\usepackage{amsmath, amssymb, graphics, setspace}

\newcommand{\mathsym}[1]{{}}
\newcommand{\unicode}[1]{{}}

\newcounter{mathematicapage}
\begin{document}

This is an equation: \(f(x) = x^2\). We let \(x \in  [0,1]\) for this example.

\end{document}
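On the TeX side, one workaround (not from the original post, and only a sketch) is to post-process the exported string and turn each \( ... \) into $ ... $. The helper name toDollarMath is hypothetical, and the pattern assumes inline math is never nested and never contains a literal \). Display math in \[ ... \] is left untouched.

```mathematica
(* Hypothetical helper: rewrite \( ... \) inline math as $ ... $ in a TeX string,
   e.g. the string returned by ExportString[cell, "TeX"].
   Shortest makes each \( pair with the nearest following \). *)
toDollarMath[tex_String] := StringReplace[tex,
   "\\(" ~~ m : Shortest[__] ~~ "\\)" :> "$" <> m <> "$"];

(* Applied to the body line of the TeX output shown above *)
toDollarMath["This is an equation: \\(f(x) = x^2\\). We let \\(x \\in  [0,1]\\) for this example."]
(* This is an equation: $f(x) = x^2$. We let $x \in  [0,1]$ for this example. *)
```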

The question is: how do we export a notebook to HTML and TeX so that the math is wrapped in dollar signs? Any other tips or comments are welcome.

EDIT:

This conversion rule for HTML might be more useful:

"Text" -> {"<p class=\"text\">\n",
  If[MatchQ[#, _FormBox], 
    "$" <> Convert`TeX`BoxesToTeX[#[[1]]] <> "$", #] &, "\n</p>"}
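The middle element of this rule is an ordinary pure function, so its behavior can be checked by hand before handing it to ExportString. This is a sketch: the name inlineTeX is ours, not part of any API, and Convert`TeX`BoxesToTeX is the undocumented function already used in the rule.

```mathematica
(* The middle element of the "Text" rule, given a name so it can be called
   directly. inlineTeX is a hypothetical name used only for illustration. *)
inlineTeX = If[MatchQ[#, _FormBox],
    "$" <> Convert`TeX`BoxesToTeX[#[[1]]] <> "$", #] &;

(* A FormBox, like the ones inside the inline cells: its contents are
   converted to TeX and wrapped in dollar signs *)
inlineTeX[FormBox[SuperscriptBox["x", "2"], TraditionalForm]]

(* Anything that is not a FormBox, such as a plain string, passes through *)
inlineTeX[" for this example."]
(* " for this example." *)
```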

Using that we obtain:

<p class="text">
This is an equation: <span><span>$f(x) = x^2$</span></span>. We let
<span><span>$x \in  [0,1]$</span></span> for this example.
</p>

The question now is: how do we get rid of the span tags?

jmlopez

Posted 2012-05-25T07:02:07.570

Reputation: 6 130

+1 just for letting me know about ConversionRules! – Szabolcs – 2012-05-25T09:57:05.690

The span tag is superfluous and will not affect the display of the HTML, so I wouldn't worry about it. – rcollyer – 2012-05-25T14:35:29.787

I changed the highlighter used for the $\TeX$ block to lang-tex, per this answer on meta.so; it is listed among the optional extensions near the bottom. – rcollyer – 2012-05-25T14:52:31.370

@rcollyer, that is the case with the span tag, but what I want is full control over the conversion rules. Is there anywhere we can see the source code for the conversion rules? What are the default settings for HTML and TeX? Knowing them would make it much easier to write rules, instead of relying on the very few examples in the documentation center. – jmlopez – 2012-05-25T15:16:15.873

I don't have access to the source code, but I posted an answer with some ideas, including how to remove the spans. – rcollyer – 2012-05-25T15:17:11.123

@rcollyer, I think I understand conversion rules a little better now. But going back to an example I had before, I found something interesting; you can see it in the edit. – jmlopez – 2012-05-25T16:17:44.550

Okay, that is the third change in scope of the question. So, do me a favor and post your update as an answer, and your new question as a separate post. – rcollyer – 2012-05-25T16:32:39.587

Answers

It seems that every time a new cell with no style (that is, style "") is opened, the exporter wraps it in a span tag. To avoid this we can use this rule:

"ConversionRules" -> {
   "" -> {"", ""}
  }

So now, evaluating

ExportString[
    Cell[TextData[{"This is an equation: ", 
        Cell[BoxData[
            FormBox[RowBox[{RowBox[{"f", "(", "x", ")"}], " ", "=", " ", 
            SuperscriptBox["x", "2"]}], TraditionalForm]], 
            FormatType -> "TraditionalForm"
        ], 
        ". We let ", 
        Cell[BoxData[
            FormBox[RowBox[{"x", " ", "\[Element]", " ", 
            RowBox[{"[", RowBox[{"0", ",", "1"}], "]"}]}], 
            TraditionalForm]], FormatType -> "TraditionalForm"
        ], 
        " for this example."}
    ], "Text"], 
    "HTML",
    "FullDocument" -> False,
    "ConversionRules" -> {
        "Text" -> {"<p class=\"text\">\n", If[MatchQ[#, _FormBox], "$" <> Convert`TeX`BoxesToTeX[#[[1]]] <> "$", #] &, "\n</p>"},
        "" -> {"", ""}
    }
]

Now we obtain the desired result:

<p class="text">
This is an equation: $f(x) = x^2$. We let $x \in  [0,1]$ for this example.
</p>
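Since the original goal was to let MathJax render the math, the fragment still has to be placed in a page that loads MathJax. The following is a sketch, not part of the original answer: the helper name wrapWithMathJax, the cdnjs URL and the MathJax 2.7.7 version are assumptions. One detail matters here: MathJax 2 does not treat single $...$ as inline math unless the tex2jax preprocessor is configured to, so the page enables it explicitly.

```mathematica
(* Sketch: wrap an HTML fragment, as produced with "FullDocument" -> False,
   in a minimal page that loads MathJax 2.x. The CDN URL, the MathJax version
   and the helper name wrapWithMathJax are assumptions for illustration. *)

(* MathJax 2 ignores single dollar signs by default, so tex2jax is told
   to accept both $...$ and \(...\) as inline math delimiters *)
mathJaxHead = StringJoin[
   "<script type=\"text/x-mathjax-config\">\n",
   "MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\\\(','\\\\)']]}});\n",
   "</script>\n",
   "<script type=\"text/javascript\" ",
   "src=\"https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_HTML\">",
   "</script>\n"];

(* Build a complete HTML page around the fragment; title defaults to "Untitled" *)
wrapWithMathJax[body_String, title_String : "Untitled"] := StringJoin[
   "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\" />\n",
   "<title>", title, "</title>\n",
   mathJaxHead,
   "</head>\n<body>\n", body, "\n</body>\n</html>\n"];

(* The fragment obtained above, typed in by hand *)
fragment = "<p class=\"text\">\nThis is an equation: $f(x) = x^2$. We let $x \\in  [0,1]$ for this example.\n</p>";

page = wrapWithMathJax[fragment, "Example"];

(* Export["example.html", page, "Text"] would write the page to disk *)
```

In practice fragment would be the string returned by the ExportString call above rather than a hand-typed copy.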

jmlopez

Posted 2012-05-25T07:02:07.570

Reputation: 6 130

The span tag does nothing unless you specifically style it; that is how it is designed:

The <span> tag is used to group inline-elements in a document.

The <span> tag provides no visual change by itself.

The <span> tag provides a way to add a hook to a part of a text or a part of a document.

Using CSS, you can hook onto that and style those sections. For example,

p.text>span>span
{
  color:blue;
  background-color:red;
}

would render the equations in blue on a red background (screenshot omitted).

Not my idea of reasonable styling, but MathJax seems to respect the styles of the HTML element that the $\TeX$ is placed in.

However, if you really want to remove the spans, I would do the following:

StringReplace[ExportString[...],
  "<span><span>$" ~~ t : Shortest[__] ~~ "$</span></span>" :> 
      "$" <> t <> "$"
]

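Note the use of Shortest: without it, __ matches greedily from the first <span><span>$ to the last $</span></span>, so a single match swallows the text between the equations and the inner tags survive.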
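The effect of Shortest can be checked on a hand-typed string standing in for the exported HTML. The variable names html, shortest and greedy are ours, for illustration only.

```mathematica
(* Stand-in for the exported fragment, on one line for simplicity *)
html = "This is an equation: <span><span>$f(x) = x^2$</span></span>. We let <span><span>$x \\in  [0,1]$</span></span> for this example.";

(* With Shortest, each span pair is stripped separately *)
shortest = StringReplace[html,
   "<span><span>$" ~~ t : Shortest[__] ~~ "$</span></span>" :> "$" <> t <> "$"]
(* This is an equation: $f(x) = x^2$. We let $x \in  [0,1]$ for this example. *)

(* Without Shortest, t__ is greedy: one match runs from the first opening
   to the last closing, so the tags in the middle are left behind *)
greedy = StringReplace[html,
   "<span><span>$" ~~ t__ ~~ "$</span></span>" :> "$" <> t <> "$"]
(* This is an equation: $f(x) = x^2$</span></span>. We let <span><span>$x \in  [0,1]$ for this example. *)
```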

rcollyer

Posted 2012-05-25T07:02:07.570

Reputation: 32 561

Hi, do you know how to set the title of the HTML page when exporting a notebook to an HTML file? I've tried the option "HeadElements" -> "<title>My title</title>", but it's overridden by the default <title>Untitled</title>. – HyperGroups – 2013-12-24T05:24:03.710

@HyperGroups no I don't. I have never looked at it. – rcollyer – 2013-12-24T13:43:45.063

This is a way to remove those tags, +1 for that. But I'm really looking for a better understanding of those darn conversion rules. I'm pulling my hair out just trying to find out what the default settings are and how the notebook gets mapped to HTML. – jmlopez – 2012-05-25T15:23:43.213

@jmlopez I honestly don't know, I have not explored that part of mathematica. – rcollyer – 2012-05-25T15:24:29.617