While working on an alternative TeX export, I had a similar requirement:
I wanted to export annotated *Mathematica* code to TeX, with annotations reflecting the front end's syntax highlighting.

Since I couldn't find a way to use the front end itself to do this, I decided to write my own package. My SyntaxAnnotations package is now available on GitHub.

It works by analyzing boxes. When a special form, e.g.

```
RowBox[{func_, "[", args_, "]"}]
```

is encountered, it looks up the `SyntaxInformation` associated with `func` and wraps symbols in `args` with special boxes describing their syntactic role. It also supports a wide range of elements not governed by `SyntaxInformation`.
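The lookup step described above can be sketched roughly as follows. This is only an illustration of the idea, not the package's actual implementation: `lookupLocalVariables` is a hypothetical helper name, and the `customFunction` definition is just a stand-in with user-defined `SyntaxInformation`.

```mathematica
ClearAll[customFunction, lookupLocalVariables]
SyntaxInformation[customFunction] = {"LocalVariables" -> {"Integrate", {2, 3}}};

(* Hypothetical helper: for boxes of the form func[args], return the
   "LocalVariables" specification of func, or None if there is none. *)
lookupLocalVariables[RowBox[{func_String, "[", args_, "]"}]] :=
  "LocalVariables" /.
    ToExpression[func, InputForm, SyntaxInformation] /.
      "LocalVariables" -> None
lookupLocalVariables[_] := None

lookupLocalVariables[MakeBoxes[customFunction[a b, b, {c, d}]]]
(* {"Integrate", {2, 3}} *)
```

A real annotator would then use this specification to decide which symbols inside `args` should be wrapped as local variables; that part is omitted here.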

The list of syntax elements highlighted by the front end can be found in the `Options Inspector` under `Editing Options` > `Private Editing Options` > `Auto Style Options`.
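These settings can most likely be inspected programmatically as well. A minimal sketch, assuming a running front end and that `AutoStyleOptions` can be queried on `$FrontEnd` like other front end options; the sub-option name `"LocalVariableStyle"` is an assumption about how the corresponding inspector entry is named:

```mathematica
(* All automatic styling settings currently in effect (requires a front end) *)
Options[$FrontEnd, AutoStyleOptions]

(* A single entry; the sub-option name used here is an assumption *)
CurrentValue[$FrontEnd, {AutoStyleOptions, "LocalVariableStyle"}]
```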

Currently my package implements highlighting of the following elements:

- LocalVariables,
- FunctionLocalVariables,
- PatternVariables,
- LocalScopeConflicts,
- UndefinedSymbols,
- Strings,
- Comments

I don't have access to the actual algorithms used by the front end,
so the package implements my observations of the built-in highlighting.
In some rare edge cases, highlighting done by my package may differ slightly from the built-in highlighting.

# Examples

We'll create a notebook with pairs of cells. The first cell in each pair is an `"Input"` cell using built-in highlighting; the second is an `"Output"` cell with highlighting done purely by the `SyntaxAnnotations` package.

```
Import["https://raw.githubusercontent.com/jkuczm/MathematicaSyntaxAnnotations/master/SyntaxAnnotations/SyntaxAnnotations.m"]
ClearAll[a, b, c, d, e, f, customFunction]
SyntaxInformation[customFunction] = {"LocalVariables" -> {"Integrate", {2, 3}}};
CreateDocument[Join @@ (
  {
    Cell[BoxData[#], "Input"],
    Cell[BoxData@AnnotateSyntax[#], "Output", FontWeight -> Bold, ShowStringCharacters -> True]
  } & /@
    Join[
      List @@ (MakeBoxes /@ HoldComplete[
        (* Function with user defined SyntaxInformation *)
        customFunction[a b c d e f, b, {c, d, e}, f],
        (* Built-in functions with SyntaxInformation *)
        Solve[a == b, a],
        Table[a b c d e f, {a, b, c}, {d, e, f}],
        (* Functions with special box forms *)
        Sum[a b c, {a, b, c}],
        Integrate[a b c d e f, {a, b, c}, {d, e, f}],
        #1 ##2 #name &,
        (* Nesting *)
        a b_ c__ d___ _e __f ___g h_i (j : k : l) 2 := a a_ a_b (a : b) b b_ b__ b___ _b __b ___b b_a c c_ d d_ e e_ f f_ g g_ h h_ i i_ j j_ k k_ 2,
        Module[{a, b, c_, C, 2}, {Block[{a, C, c, d, e_}, a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2], a_ C c d e_ :> a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2}],
        a_ b_ _b c C 2 := {Block[{a, C, c, d, e_, f = b}, a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2], a_ C c d_ e :> a a_ b b_ c c_ C C_ d d_ D D_ e e_ 2}
      ]),
      {
        RowBox[{RowBox[{"f", "::", "usage"}], "=", "\"\<f[] does something\>\""}],
        RowBox[{"Module", "[",
          RowBox[{RowBox[{"{", RowBox[{"a", ",", RowBox[{"b", "=", RowBox[{"a", " ", "c"}]}]}], "}"}], ",", "\[IndentingNewLine]",
            RowBox[{"(*", " ", RowBox[{"Manually", " ", "formatted", " ", "Module", " ", "with", " ", "a", " ", "comment"}], " ", "*)"}], "\[IndentingNewLine]",
            RowBox[{"a", " ", "b", " ", "c"}]}], "\[IndentingNewLine]",
          "]"}]
      }
    ]
)]
```

Screenshot of the results:

### Semantic annotations

The examples above show the default behavior of the `AnnotateSyntax` function, which wraps boxes in an appropriate `StyleBox`. This behavior is controlled by the `"Annotation"` option.

By using a custom `"Annotation"` we can see semantic information instead of style boxes.

```
Solve[a == b, a] // MakeBoxes
AnnotateSyntax[%, "Annotation" -> myAnnotation]
```

```
RowBox[{"Solve", "[", RowBox[{RowBox[{"a", "\[Equal]", "b"}], ",", "a"}], "]"}]
RowBox[{"Solve", "[", RowBox[{
RowBox[{
myAnnotation["a", {"FunctionLocalVariable", "UndefinedSymbol"}],
"\[Equal]",
myAnnotation["b", {"UndefinedSymbol"}]
}], ",",
myAnnotation["a", {"FunctionLocalVariable", "UndefinedSymbol"}]
}], "]"}]
```

```
(a_ := a a_ ) // MakeBoxes
AnnotateSyntax[%, "Annotation" -> myAnnotation]
```

```
RowBox[{"a_", ":=", RowBox[{"a", " ", "a_"}]}]
RowBox[{
myAnnotation["a_", {"PatternVariable"}],
":=",
RowBox[{
myAnnotation["a", {"PatternVariable", "UndefinedSymbol"}],
" ",
myAnnotation["a_", {"LocalScopeConflict"}]
}]
}]
```

# Encoding syntax highlighting in a String

*Mathematica* comes with built-in support for
string representations of boxes.
So if we have syntax highlighting at the box level, there's nothing more we need to do to encode it in a String.

Convert annotated boxes to String:

```
ToString[Solve[x == y, x] // MakeBoxes // AnnotateSyntax, InputForm]
(* version >= 10.2:
"RowBox[{\"Solve\", \"[\", RowBox[{RowBox[{StyleBox[\"x\", \
{{FontColor -> RGBColor[0.235, 0.49, 0.568]}, {FontColor -> \
RGBColor[0., 0.173, 0.765]}}], \"\[Equal]\", StyleBox[\"y\", \
{{FontColor -> RGBColor[0., 0.173, 0.765]}}]}], \",\", \
StyleBox[\"x\", {{FontColor -> RGBColor[0.235, 0.49, 0.568]}, \
{FontColor -> RGBColor[0., 0.173, 0.765]}}]}], \"]\"}]"
*)
(* version < 10.2:
"\\(Solve[\\(\\(\\*StyleBox[\"x\", List[List[Rule[FontColor, \
RGBColor[0.235`, 0.49`, 0.568`]]], List[Rule[FontColor, RGBColor[0.`, \
0.173`, 0.765`]]]]] \[Equal] \\*StyleBox[\"y\", \
List[List[Rule[FontColor, RGBColor[0.`, 0.173`, 0.765`]]]]]\\), \
\\*StyleBox[\"x\", List[List[Rule[FontColor, RGBColor[0.235`, 0.49`, \
0.568`]]], List[Rule[FontColor, RGBColor[0.`, 0.173`, \
0.765`]]]]]\\)]\\)"
*)
```

Recover boxes from string:

```
% // ToExpression
(*
RowBox[{"Solve", "[",
RowBox[{RowBox[{StyleBox[
"x", {{FontColor -> RGBColor[
0.235, 0.49, 0.568]}, {FontColor -> RGBColor[
0., 0.173, 0.765]}}], "\[Equal]",
StyleBox["y", {{FontColor -> RGBColor[0., 0.173, 0.765]}}]}],
",", StyleBox[
"x", {{FontColor -> RGBColor[0.235, 0.49, 0.568]}, {FontColor ->
RGBColor[0., 0.173, 0.765]}}]}], "]"}]
*)
```

Display recovered boxes:

```
% // DisplayForm
(* Solve[x == y, x] (colored) *)
```

Convert boxes to original expression:

```
ToExpression[%, StandardForm, HoldForm]
(* Solve[x == y, x] *)
```
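The round trip through a string can be checked without the package at all, since it only relies on boxes being ordinary expressions. A small sketch using plain, unannotated boxes (the variable names are illustrative); on versions 10.2 and later I would expect the final comparison to give `True`:

```mathematica
boxes = MakeBoxes[Solve[x == y, x]];

(* Encode the boxes as a string, then parse the string back into boxes *)
str = ToString[boxes, InputForm];
recovered = ToExpression[str];

StringQ[str]
recovered === boxes
```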

We can also create a colored string by converting the `DisplayForm` of the boxes to a `StandardForm` string.

```
ToString[
Solve[x == y, x] // MakeBoxes // AnnotateSyntax // DisplayForm,
StandardForm
]
(* Solve[x == y, x] (colored string) *)
```

If we prefer, we can encode semantic annotations in a string:

```
ToString[
AnnotateSyntax[Solve[a == b, a] // MakeBoxes, "Annotation" -> myAnnotation],
InputForm
]
(* version >= 10.2:
"RowBox[{\"Solve\", \"[\", RowBox[{RowBox[{myAnnotation[\"a\", \
{\"FunctionLocalVariable\", \"UndefinedSymbol\"}], \"\[Equal]\", \
myAnnotation[\"b\", {\"UndefinedSymbol\"}]}], \",\", myAnnotation[\"a\
\", {\"FunctionLocalVariable\", \"UndefinedSymbol\"}]}], \"]\"}]"
*)
(* version < 10.2:
"\\(Solve[\\(\\(\\*myAnnotation[\"a\", \
List[\"FunctionLocalVariable\", \"UndefinedSymbol\"]] \[Equal] \
\\*myAnnotation[\"b\", List[\"UndefinedSymbol\"]]\\), \
\\*myAnnotation[\"a\", List[\"FunctionLocalVariable\", \
\"UndefinedSymbol\"]]\\)]\\)"
*)
```
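Going the other way, a string with semantic annotations can be parsed back and the annotations stripped to recover plain boxes. A minimal sketch, assuming version 10.2 or later and that `myAnnotation` has no definitions; the annotated boxes are written by hand here rather than produced by `AnnotateSyntax`:

```mathematica
ClearAll[myAnnotation]

(* Hand-written annotated boxes for f[x] *)
annotated = RowBox[{"f", "[", myAnnotation["x", {"UndefinedSymbol"}], "]"}];
str = ToString[annotated, InputForm];

(* Parse the string and replace each annotation wrapper by the box it wraps *)
stripped = ToExpression[str] /. myAnnotation[box_, _] :> box
(* RowBox[{"f", "[", "x", "]"}] *)
```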

Perhaps rasterizing cell contents, do text-recognizing with `TextRecognize` or Tesseract, and a bit of image-processing to get the colors/style? – István Zachar – 2013-12-11T22:01:01.953

What do you want to do with this string? Or more specific: With what program do you want to be able to edit this string? – Karsten 7. – 2014-07-23T23:02:08.087

@Karsten7. with Mathematica – Vladimir Reshetnikov – 2014-07-23T23:22:08.377

@VladimirReshetnikov Although you can include all kinds of formatting in a Mathematica string, the things you want to extract are done on-the-fly by the front-end for viewing purposes only. While it makes some sense to provide e.g. access to a hidden front end Mathematica lexer, I would really be surprised, if there is any way to use the front ends internal algorithms used for highlighting. Anyway, maybe – halirutan – 2014-07-29T20:12:01.890

the only person that could give you an answer here mentions at several places...(here and here) a rather pessimistic view about the possibility to access front end used for highlighting. – halirutan – 2014-07-29T20:14:38.680

I hope you're not expecting the "string local variables" to change color when I remove it from the "Module" in the string :P – rm -rf – 2013-06-18T19:17:02.877

@rm-rf Of course, I'm not. I want to capture a snapshot of the cell, not its behavior. – Vladimir Reshetnikov – 2013-06-19T08:17:16.667

I know, I was only joking... :) This is a good question. I have no idea where to dig into this, but most probably, it will be some crazy function in `` FrontEnd` ``. – rm -rf – 2013-06-19T08:48:51.750