How to convert a notebook cell to a string, retaining all formatting, colorization of identifiers, etc.?

I have an open Mathematica notebook containing several cells. Suppose I am interested in one of them. It may contain a complete or incomplete expression (e.g., with syntax errors or highlighted unbalanced brackets), possibly with embedded graphics, with some of the identifiers automatically colorized by the Mathematica front end according to their meaning (undefined symbols, Block locals, Module locals, conflicting names, ...) and some possibly having a manually changed style (font, size, color, etc.). I need to get a String whose content represents the content of this cell with the highest possible fidelity (recall that Mathematica strings preserve formatting and support arbitrary embedded expressions, including graphics). The string should mimic the structure of the cell and be editable to the same extent as the original cell (so it should not, for example, contain just a graphical image depicting the cell).

Could you please suggest how to do this?

Vladimir Reshetnikov

Posted 2013-06-18T02:27:03.100

Perhaps rasterize the cell contents, run text recognition with TextRecognize or Tesseract, and do a bit of image processing to get the colors/style? – István Zachar – 2013-12-11T22:01:01.953

What do you want to do with this string? Or, more specifically: with what program do you want to be able to edit this string? – Karsten 7. – 2014-07-23T23:02:08.087

@Karsten7. with Mathematica – Vladimir Reshetnikov – 2014-07-23T23:22:08.377

@VladimirReshetnikov Although you can include all kinds of formatting in a Mathematica string, the things you want to extract are computed on the fly by the front end, for display purposes only. While it would make some sense to provide, e.g., access to a hidden front end Mathematica lexer, I would be really surprised if there were any way to use the front end's internal highlighting algorithms. Anyway, maybe the only person who could give you an answer here mentions in several places...

– halirutan – 2014-07-29T20:12:01.890

(here and here) a rather pessimistic view about the possibility of accessing the front end's highlighting.

– halirutan – 2014-07-29T20:14:38.680

I hope you're not expecting the "string local variables" to change color when I remove them from the "Module" in the string :P – rm -rf – 2013-06-18T19:17:02.877

@rm-rf Of course, I'm not. I want to capture a snapshot of the cell, not its behavior. – Vladimir Reshetnikov – 2013-06-19T08:17:16.667

I know, I was only joking... :) This is a good question. I have no idea where to dig into this, but most probably, it will be some crazy function in FrontEnd`. – rm -rf – 2013-06-19T08:48:51.750

Answers

While I was working on an alternative TeX export, I had a similar requirement. I wanted to export annotated Mathematica code to TeX, with annotations reflecting the front end's syntax highlighting.

Since I couldn't find a way to make the front end itself do it, I decided to write my own package. My SyntaxAnnotations package is now available on GitHub.

It works by analyzing boxes. When a special form such as

RowBox[{func_, "[", args_, "]"}]

is encountered, it looks up the SyntaxInformation associated with func and wraps the symbols in args with special boxes describing their syntactic role. It also supports a wide range of elements not governed by SyntaxInformation.
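The lookup step described above can be sketched in a few lines. This is not the package's code: callsWithSyntaxInfo is a hypothetical helper that only collects each call head found in the boxes together with the SyntaxInformation attached to it, i.e. the information the package then uses to decide how to wrap the arguments.

```mathematica
(* Hypothetical helper, not part of SyntaxAnnotations: find every box of the form
   RowBox[{func, "[", args, "]"}] and pair the head name with its SyntaxInformation.
   Note: ToExpression may create a new Global` symbol for an unknown head name. *)
callsWithSyntaxInfo[boxes_] :=
  DeleteCases[
    Cases[boxes,
      RowBox[{func_String, "[", _, "]"}] :>
        func -> Quiet@Check[ToExpression[func, StandardForm, SyntaxInformation], {}],
      {0, Infinity}],
    _ -> {}]  (* drop heads that carry no SyntaxInformation *)

callsWithSyntaxInfo[MakeBoxes[Solve[a == b, a]]]
```

Only the head and its syntax specification are collected here; deciding which argument symbols become "LocalVariables" and so on is the part the package implements on top of this.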

The list of syntax elements highlighted by the front end can be found in the Options Inspector under Editing Options > Private Editing Options > Auto Style Options.
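The same settings can presumably also be inspected from code. The sketch below assumes a running front end and that the "Auto Style Options" entry in the Options Inspector corresponds to the front end option AutoStyleOptions; autoStyleOptions is a hypothetical helper name.

```mathematica
(* Sketch, assuming a running front end and that "Auto Style Options" in the
   Options Inspector is the AutoStyleOptions front end option.
   autoStyleOptions is a hypothetical helper name. *)
autoStyleOptions[] := AbsoluteCurrentValue[$FrontEnd, AutoStyleOptions]

(* list the names of the syntax-highlighting settings the front end uses *)
Keys[autoStyleOptions[]]
```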

Currently my package implements highlighting of the following elements:

  • LocalVariables,
  • FunctionLocalVariables,
  • PatternVariables,
  • LocalScopeConflicts,
  • UndefinedSymbols,
  • Strings,
  • Comments

I don't have access to the actual algorithms used by the front end, so the package implements my observations of the built-in highlighting. In some rare edge cases, highlighting done by my package may differ slightly from the built-in highlighting.


Examples

We'll create a notebook with pairs of cells. The first cell in each pair is an "Input" cell using built-in highlighting; the second is an "Output" cell with highlighting done purely by the SyntaxAnnotations package.

Import["https://raw.githubusercontent.com/jkuczm/MathematicaSyntaxAnnotations/master/SyntaxAnnotations/SyntaxAnnotations.m"]

ClearAll[a, b, c, d, e, f, customFunction]
SyntaxInformation[customFunction] = {"LocalVariables" -> {"Integrate", {2, 3}}};

CreateDocument[Join @@ (
    {
        Cell[BoxData[#], "Input"],
        Cell[BoxData@AnnotateSyntax[#], "Output", FontWeight -> Bold, ShowStringCharacters -> True]
    } & /@
        Join[
            List @@ (MakeBoxes /@ HoldComplete[
                (* Function with user defined SyntaxInformation *)
                customFunction[a b c d e f, b, {c, d, e}, f],
                (* Built-in functions with SyntaxInformation *)
                Solve[a == b, a],
                Table[a b c d e f, {a, b, c}, {d, e, f}],
                (* Functions with special box forms *)
                Sum[a b c, {a, b, c}],
                Integrate[a b c d e f, {a, b, c}, {d, e, f}],
                #1 ##2 #name &,
                (* Nesting *)
                a b_ c__ d___ _e __f ___g h_i (j : k : l) 2 := a a_ a_b (a : b) b b_ b__ b___ _b __b ___b b_a c c_ d d_ e e_ f f_ g g_ h h_ i i_ j j_  k k_ 2,
                Module[{a, b, c_, C, 2}, {Block[{a, C, c, d, e_}, a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2], a_ C c d e_ :> a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2}],
                a_ b_ _b c C 2 := {Block[{a, C, c, d, e_, f = b}, a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2], a_ C c d_ e :> a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2}
            ]),
            {
                RowBox[{RowBox[{"f", "::", "usage"}], "=", "\"\<f[] does something\>\""}],
                RowBox[{"Module", "[",
                    RowBox[{RowBox[{"{", RowBox[{"a", ",", RowBox[{"b", "=", RowBox[{"a", " ", "c"}]}]}], "}"}], ",", "\[IndentingNewLine]", 
                    RowBox[{"(*", " ", RowBox[{"Manually", " ", "formatted", " ", "Module", " ", "with", " ", "a", " ", "comment"}], " ", "*)"}], "\[IndentingNewLine]",
                    RowBox[{"a", " ", "b", " ", "c"}]}], "\[IndentingNewLine]",
                "]"}]
            }
        ]
)]

[Screenshot of results]

Semantic annotations

The examples above show the default behavior of the AnnotateSyntax function, which wraps boxes in the appropriate StyleBox. This behavior is controlled by the "Annotation" option.

By using a custom "Annotation" we can see the semantic information instead of style boxes.

Solve[a == b, a] // MakeBoxes
(* RowBox[{"Solve", "[", RowBox[{RowBox[{"a", "\[Equal]", "b"}], ",", "a"}], "]"}] *)

AnnotateSyntax[%, "Annotation" -> myAnnotation]
(*
RowBox[{"Solve", "[", RowBox[{
    RowBox[{
        myAnnotation["a", {"FunctionLocalVariable", "UndefinedSymbol"}],
        "\[Equal]",
        myAnnotation["b", {"UndefinedSymbol"}]
    }], ",",
    myAnnotation["a", {"FunctionLocalVariable", "UndefinedSymbol"}]
}], "]"}]
*)

(a_ := a a_ ) // MakeBoxes
(* RowBox[{"a_", ":=", RowBox[{"a", " ", "a_"}]}] *)

AnnotateSyntax[%, "Annotation" -> myAnnotation]
(*
RowBox[{
    myAnnotation["a_", {"PatternVariable"}],
    ":=", 
    RowBox[{
        myAnnotation["a", {"PatternVariable", "UndefinedSymbol"}], 
        " ",
        myAnnotation["a_", {"LocalScopeConflict"}]
    }]
}]
*)
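Since the annotated boxes are ordinary expressions, they can be post-processed with replacement rules. As an illustration, a hypothetical annotationsToTooltips helper (not part of the package) could turn every myAnnotation wrapper into a TooltipBox that shows the syntactic roles when hovering over an identifier:

```mathematica
(* Hypothetical post-processing step, not part of SyntaxAnnotations:
   replace each myAnnotation[boxes, roles] with a TooltipBox listing the roles *)
annotationsToTooltips[boxes_] :=
  boxes /. myAnnotation[b_, roles_List] :> TooltipBox[b, StringRiffle[roles, ", "]]

(* usage, assuming SyntaxAnnotations is loaded *)
AnnotateSyntax[MakeBoxes[a_ := a a_], "Annotation" -> myAnnotation] //
  annotationsToTooltips // DisplayForm
```

Doing this as a separate replacement step keeps the annotation head inert, so the same annotated boxes can be rendered in different ways.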

Encoding syntax highlighting in a String

Mathematica comes with built-in support for string representations of boxes. So once we have syntax highlighting at the box level, nothing more is needed to encode it in a String.

Convert annotated boxes to a String:

ToString[Solve[x == y, x] // MakeBoxes // AnnotateSyntax, InputForm]
(* version >= 10.2:
"RowBox[{\"Solve\", \"[\", RowBox[{RowBox[{StyleBox[\"x\", \
{{FontColor -> RGBColor[0.235, 0.49, 0.568]}, {FontColor -> \
RGBColor[0., 0.173, 0.765]}}], \"\[Equal]\", StyleBox[\"y\", \
{{FontColor -> RGBColor[0., 0.173, 0.765]}}]}], \",\", \
StyleBox[\"x\", {{FontColor -> RGBColor[0.235, 0.49, 0.568]}, \
{FontColor -> RGBColor[0., 0.173, 0.765]}}]}], \"]\"}]"
*)
(* version < 10.2:
"\\(Solve[\\(\\(\\*StyleBox[\"x\", List[List[Rule[FontColor, \
RGBColor[0.235`, 0.49`, 0.568`]]], List[Rule[FontColor, RGBColor[0.`, \
0.173`, 0.765`]]]]] \[Equal] \\*StyleBox[\"y\", \
List[List[Rule[FontColor, RGBColor[0.`, 0.173`, 0.765`]]]]]\\), \
\\*StyleBox[\"x\", List[List[Rule[FontColor, RGBColor[0.235`, 0.49`, \
0.568`]]], List[Rule[FontColor, RGBColor[0.`, 0.173`, \
0.765`]]]]]\\)]\\)"
*)

Recover boxes from string:

% // ToExpression
(*
RowBox[{"Solve", "[", 
  RowBox[{RowBox[{StyleBox[
   "x", {{FontColor -> RGBColor[
      0.235, 0.49, 0.568]}, {FontColor -> RGBColor[
      0., 0.173, 0.765]}}], "\[Equal]", 
  StyleBox["y", {{FontColor -> RGBColor[0., 0.173, 0.765]}}]}], 
",", StyleBox[
 "x", {{FontColor -> RGBColor[0.235, 0.49, 0.568]}, {FontColor -> 
    RGBColor[0., 0.173, 0.765]}}]}], "]"}]
*)

Display recovered boxes:

% // DisplayForm
(* Solve[x == y, x] (colored) *)

Convert boxes to original expression:

ToExpression[%, StandardForm, HoldForm]
(* Solve[x == y, x] *)

We can also create a colored string by converting the DisplayForm of the boxes to a StandardForm string.

ToString[
    Solve[x == y, x] // MakeBoxes // AnnotateSyntax // DisplayForm,
    StandardForm
]
(* Solve[x == y, x] (colored string) *)

If we prefer, we can encode the semantic annotations in a string instead:

ToString[
    AnnotateSyntax[Solve[a == b, a] // MakeBoxes, "Annotation" -> myAnnotation],
    InputForm
]
(* version >= 10.2:
"RowBox[{\"Solve\", \"[\", RowBox[{RowBox[{myAnnotation[\"a\", \
{\"FunctionLocalVariable\", \"UndefinedSymbol\"}], \"\[Equal]\", \
myAnnotation[\"b\", {\"UndefinedSymbol\"}]}], \",\", myAnnotation[\"a\
\", {\"FunctionLocalVariable\", \"UndefinedSymbol\"}]}], \"]\"}]"
*)
(* version < 10.2:
"\\(Solve[\\(\\(\\*myAnnotation[\"a\", \
List[\"FunctionLocalVariable\", \"UndefinedSymbol\"]] \[Equal] \
\\*myAnnotation[\"b\", List[\"UndefinedSymbol\"]]\\), \
\\*myAnnotation[\"a\", List[\"FunctionLocalVariable\", \
\"UndefinedSymbol\"]]\\)]\\)"
*)
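Applied to the original question, the same steps can be combined for a whole cell. The following is only a sketch under assumptions, not part of the package: cellToAnnotatedString is a hypothetical helper that replaces the BoxData of a Cell expression with annotated boxes and serializes the entire Cell with InputForm, so explicit styling already stored in the cell (StyleBoxes, cell options) is kept alongside the annotations. The Cell expression itself can be obtained from the notebook, e.g. with NotebookRead.

```mathematica
(* Hypothetical helper, assuming SyntaxAnnotations is loaded (AnnotateSyntax).
   It annotates the BoxData in a Cell expression and serializes the whole Cell,
   so explicit styling already present in the cell is preserved as well. *)
cellToAnnotatedString[cell_Cell] :=
  ToString[cell /. BoxData[b_] :> BoxData[AnnotateSyntax[b]], InputForm]

str = cellToAnnotatedString[Cell[BoxData[MakeBoxes[Solve[x == y, x]]], "Input"]];

(* the string converts back into a Cell expression ... *)
restored = ToExpression[str];
(* ... which, with a front end, could be written into a notebook:
   NotebookWrite[EvaluationNotebook[], restored] *)
```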

jkuczm

Posted 2013-06-18T02:27:03.100
