Programmatic formatting for Mathematica code - possible?


It seems to be both an interesting programming challenge and a very useful practical application to have a Mathematica program which would allow one to pretty-print Mathematica code, so that it is easy to read and understand. Some of the desirable features of such a program:

  • Works both on string and box level
  • Covers many or most common cases for Mathematica code
  • Is robust
  • Is customizable
  • Is extensible

Is this possible?

Leonid Shifrin

Posted 2012-03-11T12:30:31.260

Reputation: 108 027

Hi @Leonid Shifrin, I've tried it a little. Does this have a new version/update? CodeFormatterMakeBoxes has the HoldAllComplete attribute; how can I use it for a cell expression? For example, CodeFormatterMakeBoxes[cells[[1]]] works like CodeFormatterMakeBoxes[Plot[x,{x,0,10}]]. – HyperGroups – 2015-04-12T10:14:00.133

@HyperGroups What is the exact expression you want to convert to boxes? What is in cells variable? You don't give enough information to answer your question. – Leonid Shifrin – 2015-04-12T15:02:46.110

@LeonidShifrin Hi, I asked a question here. cells[[1]] may be Cell[BoxData[ RowBox[{"Plot", "[", RowBox[{"x", ",", RowBox[{"{", RowBox[{"x", ",", "1", ",", "5"}], "}"}]}], "]"}]], "Input"] – HyperGroups – 2015-04-12T15:32:34.250

@HyperGroups Well, first of all, CodeFormatterMakeBoxes takes an expression (just like MakeBoxes), not a boxed form, while your cells already contain the boxed form of an expression. So you still need to provide a wider context of what you need, preferably in that question (not here). As to overriding HoldAllComplete, one possibility is to do what Kuba suggested: CodeFormatterMakeBoxes[#]&[expr]. – Leonid Shifrin – 2015-04-12T16:03:27.777
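The reason the `#&` wrapper helps: a pure function has no hold attributes, so its argument is evaluated before the body runs, and only the resulting value reaches the HoldAllComplete function. A minimal self-contained illustration; `holdF` is a hypothetical stand-in for CodeFormatterMakeBoxes, not part of the package:

```mathematica
ClearAll[holdF, val];
SetAttributes[holdF, HoldAllComplete];
holdF[x_] := Hold[x];   (* hypothetical stand-in for CodeFormatterMakeBoxes *)

val = 1 + 1;

holdF[val]        (* Hold[val]: the symbol itself is passed in *)
holdF[#] &[val]   (* Hold[2]: # & evaluates val before calling holdF *)
```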

@LeonidShifrin hi, see my update at that post – HyperGroups – 2015-04-16T12:25:11.960

@Mr.Wizard Because the formatter is not ready. It still contains bugs, and needs a serious rewrite too. So, seeing this answer not being accepted is just another reminder for me to do this. – Leonid Shifrin – 2013-03-16T13:23:01.030

Today's forecast calls for light rain in the morning, followed by patchy fog and *flying pigs*. – Mr.Wizard – 2012-03-11T12:50:43.043

I smell an answer coming soon ;-) I've been looking forward to this! (You could give it some time though, maybe there'll be answers that give good ideas!) – Szabolcs – 2012-03-11T12:54:58.287

@Szabolcs I only had time to do this now, since that's a lot of stuff - so I posted the answer now. But this does not mean I am not waiting for more ideas! I am actually very much interested in the contributions of others, this was one of the main ideas of this project. – Leonid Shifrin – 2012-03-11T14:07:49.367

Welcome... Now you're one of us, one of the people :) – Rojo – 2012-03-11T15:56:44.347

You mean something that takes a string or box structure with MMA code and returns a string or box structure with the code nicely formatted, customizable, etc.? So that your package-generating package-generating packages create nice-to-read things? – Rojo – 2012-03-11T15:59:35.890

You had already self-answered... Too late – Rojo – 2012-03-11T16:05:45.100

@Rojo No it's not too late. I'd be very interested to see alternatives, other ideas, etc. While I spent a lot of time thinking about the problem, had a few attempts prior to this which turned out to be blind alleys, and feel that the way I've chosen at the end has certain advantages from different points of view, there may be other ways or ideas which I have missed completely. So, anything goes, and please do post if you feel like sharing. It does not have to be complete. – Leonid Shifrin – 2012-03-11T16:27:34.260

I like the design. To me at least it would be way better to shape it into a new form (and maybe even add it to context menus or give it a shortcut), and delegate a little bit to the stylesheets... – Rojo – 2012-03-11T16:43:32.540

@Rojo Thanks. Thanks also for the suggestions, I intend to do that soon. What new form do you have in mind? – Leonid Shifrin – 2012-03-11T16:48:02.610

let us continue this discussion in chat

– Leonid Shifrin – 2012-03-11T17:48:55.353

@Szabolcs that's just the bacon from the flying pigs you smell. – rcollyer – 2012-03-12T02:19:14.340

So, on track for two silvers: both Good Answer and Question from a single post. I'm a little jealous. I wish I had more time to spend here. – rcollyer – 2012-03-12T02:20:27.677

@rcollyer For this particular question, I spent most of the time offline :-). This problem had been bothering me for quite a while, and now that I have a rather satisfactory solution, I thought it might benefit others as well, while the feedback will help improve it. And, of course, I like silver :) – Leonid Shifrin – 2012-03-12T08:05:19.650

Answers


Update November 3, 2013

Finally, the formatter has been made much more robust by adding a custom MakeBoxes-like function, named CodeFormatterMakeBoxes, which constructs a simplified box representation. This solves the main problem, namely that the formatter did not support many box types, since CodeFormatterMakeBoxes constructs a pure RowBox-based representation.
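To see why a pure RowBox representation helps, note that the built-in MakeBoxes emits specialized boxes such as FractionBox for fairly ordinary input, and a formatter whose rules only know about RowBox has to deal with each of those separately. A small sketch that lists the non-RowBox box heads in a box expression (`nonRowBoxHeads` is a hypothetical helper, not part of CodeFormatter):

```mathematica
(* Hypothetical helper: collect the heads of all boxes other than RowBox (and lists) *)
ClearAll[nonRowBoxHeads];
nonRowBoxHeads[boxes_] :=
  DeleteDuplicates@Cases[boxes, (h : Except[RowBox | List])[___] :> h, {0, Infinity}];

nonRowBoxHeads[MakeBoxes[f[x, y]]]   (* {}: only RowBox and strings *)
nonRowBoxHeads[MakeBoxes[a + b/c]]   (* {FractionBox} in StandardForm *)
```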

I also added the functions CodeFormatterPrint, which prints the definitions for a given function/symbol, and CodeFormatterSpelunk, which makes system spelunking easier.

Some things to try:

Import["https://raw.github.com/lshifr/CodeFormatter/master/CodeFormatter.m"]

and then

CodeFormatterPrint[RunThrough]

CodeFormatterSpelunk[RunThrough]

The only difference is that CodeFormatterPrint retains long (fully qualified) symbol names, while CodeFormatterSpelunk does not. There are more details on spelunking in my answer here.

Update 19th of June, 2013

The code formatter has been extended to support spaces in place of tabs, variable-width tabs, and an overall offset.

This finally made it possible to start using it for code formatting here on SE. Check this out!

There are also several other improvements and bug fixes.
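On the plain-string level, spaces instead of tabs, a variable tab width and an overall offset amount to a simple text transformation. A minimal sketch under that assumption; `spaces` and `detab` are hypothetical names, not the package's API (the package itself works on boxes):

```mathematica
(* Hypothetical helpers, not part of CodeFormatter *)
ClearAll[spaces, detab];
spaces[n_Integer?NonNegative] := StringJoin[ConstantArray[" ", n]];

(* Replace each tab by tabWidth spaces, then prefix every line with offset spaces *)
detab[code_String, tabWidth_Integer?NonNegative, offset_Integer?NonNegative] :=
  StringRiffle[
    Map[spaces[offset] <> # &,
      StringSplit[StringReplace[code, "\t" -> spaces[tabWidth]], "\n", All]],
    "\n"];

detab["f[\n\tx,\n\ty\n]", 2, 4]
(* gives the string "    f[\n      x,\n      y\n    ]" *)
```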


Short answer

Yes.

Code formatter

While I will be most happy to see other answers to this question, and hope that there will be better answers than this one, I am also glad to announce the alpha version of a Mathematica code formatter that I have been working on for some time.

The project

The code formatter lives here, and the specific file (package) can be downloaded using this link. The code formatter resides in the package CodeFormatter.m, and currently has two public functions: FullCodeFormat and FullCodeFormatCompact. Both take a piece of code converted to boxes and return the box form of the formatted code. The README file in the project contains a brief description of how to use them, and the notebook included in the project contains many more examples.

Stealing from the README, the typical way to use this is to define a helper function like this one:

prn = CellPrint[Cell[BoxData[#], "Input"]] &

and then, use it like:

prn@FullCodeFormat@MakeBoxes@
   Module[{a, b}, a = 1; Block[{c, d}, c = a + 1; d = b + 2]; b]

Screenshots

These are some screenshots of code pieces processed by FullCodeFormat.

[Four screenshots of code formatted by FullCodeFormat]

Further plans/development

There are a number of things I plan to add to the formatter and/or develop based on it, such as

  • Develop a palette, based on the formatter, to paste code to SE (this will come out very soon)
  • Support more boxes
  • Refactor the code to eliminate the code duplication and better separate the DSL layer
  • Add more ways to format code

I will start accepting pull requests soon, perhaps after I do the main code refactoring I currently plan. Meanwhile, please do fork me on GitHub if you are interested in playing with this.

Comments, suggestions, bug reports

All welcome. For bug reports, I have not yet decided what the best place would be, but the Issues section of the GitHub project repository seems appropriate.

A simplified bare-bones code formatting engine

Here, I will try to explain my approach, and provide a minimal functioning code-formatting "engine", which is a simplified version of the one referred to above. The motivation for this section (and in fact, for placing the entire post here rather than, for example, on meta) is to make the code of the formatter accessible and explain it in simple terms. This would make it easier for you to fork the project and modify the formatter to suit your needs, should you wish to do so.

Design problems and choices

The main idea here is that while the parsed expression is a bit too high-level for formatting purposes (and also tends to evaluate all the time), the box level is a bit too low-level, and a formatter based on it is in danger of not being robust.

Therefore, I take the box input and create an intermediate inert Mathematica expression representation with preprocess and preformat (see below). The formatting procedure itself has two stages: format proper, which only decides where to put new lines and tabs, and the tabification stage (tabify), which "executes" the tabification instructions given by format in the previous stage.

The reason for this architecture is that there is a certain impedance mismatch between the statement "I want to move this block of several lines of code one or more tabs to the right" and the actual way to achieve that with boxes (I suspect this was one of the major obstacles to implementing a code formatter, since I am sure many people have tried). By separating these two stages, it was possible to make this tabification abstraction reasonably robust. The final stage is post-formatting (postformat). It takes the result produced by format and tabify, and converts it back to boxes.

By using this 3-layer architecture, I am able to give a high-level description of the actual formatting rules in the definitions of format only, and the rest is taken care of by the other layers. This makes the formatter both more robust (because, for example, the tabification engine is rather general and does not depend on specific formatting rules, including those I may wish to add later) and more easily extensible. In a sense, I implemented a very small DSL for code formatting.
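To make the mismatch concrete: in a box expression, line breaks are just "\n" strings that can sit at any depth, so indenting a block by one tab means inserting a tab at its start and after every "\n" inside it. A naive sketch of that single step (`indentBlock` is a hypothetical name; it ignores all the complications the real formatter handles):

```mathematica
ClearAll[indentBlock];

(* Indent every line of a box block by one tab: one tab at the start,
   and one after each "\n" string, wherever it sits in the box tree *)
indentBlock[boxes_] := RowBox[{"\t", boxes /. "\n" :> Sequence["\n", "\t"]}];

indentBlock[RowBox[{RowBox[{"a", "=", "1"}], ";", "\n", RowBox[{"b", "=", "2"}]}]]
(* RowBox[{"\t", RowBox[{RowBox[{"a", "=", "1"}], ";", "\n", "\t", RowBox[{"b", "=", "2"}]}]}] *)
```

Nesting such calls tabs each line once per level; keeping this bookkeeping correct while the formatting rules decide where new lines go is roughly the job that the format/tabify split described above takes care of.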

Settings and preprocessing

Now, the code. First comes the only setting we will have here:

$maxLineLength = 70;

This defines the maximal length of a line of code, and is used by the formatter to decide which long lines need to be split.

Next comes the pre-processing function, which serves to remove spaces and tabs possibly existing in the box expression:

ClearAll[preprocess];
preprocess[boxes_] :=
  boxes //.
      {RowBox[{("\t" | "\n") .., expr___}] :> expr} //.
      {
        s_String /; StringMatchQ[s, Whitespace] :> Sequence[],
            RowBox[{r_RowBox}] :> r
      };

Converting boxes to intermediate inert representation

Now, we will define the heads of our inert intermediate representation, to which we want to transform the original box expression:

ClearAll[$blocks, blockQ];
$blocks = {
   CompoundExpressionBlock, GeneralHeadBlock, GeneralBlock, 
   StatementBlock, NewlineBlock, FinalTabBlock, GeneralSplitHeadBlock,   
   SuppressedCompoundExpressionBlock, CommaSeparatedGeneralBlock     
};

blockQ[block_Symbol] :=    MemberQ[$blocks, block];

The following function translates the box expression into this intermediate language. It is very simplistic and misses many important cases (a more comprehensive version is in the code of the CodeFormatter` package), but it illustrates the general structure. Note that it is recursive, moving from outside to inside.

ClearAll[preformat];

preformat[RowBox[elems : {PatternSequence[_, ";"] ..}]] :=
  SuppressedCompoundExpressionBlock @@ Map[
      Map[preformat, StatementBlock @@ DeleteCases[#, ";"]] &,
      Split[elems, # =!= ";" &]];

preformat[RowBox[elems : {PatternSequence[_, ";"] .., _}]] :=
  CompoundExpressionBlock @@ Map[
      Map[preformat, StatementBlock @@ DeleteCases[#, ";"]] &,
      Split[elems, # =!= ";" &]];

preformat[RowBox[elems : {PatternSequence[_, ","] .., _}]] :=
  CommaSeparatedGeneralBlock @@ 
    Map[preformat, DeleteCases[elems, ","]];

preformat[RowBox[elems_List]] /; ! FreeQ[elems, "\n" | "\t", 1] :=
  preformat[RowBox[DeleteCases[elems, "\n" | "\t"]]];

preformat[RowBox[{head_, "[", elems___, "]"}]] :=
  GeneralHeadBlock[preformat@head, 
    Sequence @@ Map[preformat, {elems}]];

preformat[RowBox[elems_List]] :=
  GeneralBlock @@ Map[preformat, elems];

preformat[block_?blockQ[args_]] :=
  block @@ Map[preformat, {args}];

preformat[a_?AtomQ] := a;

preformat[expr_] :=
    Throw[{$Failed, expr}, preformat];

You can see that it treats only a few selected heads, like CompoundExpression, separately, and otherwise has rules for general heads.

Formatting

Next come two helper functions, used in formatting to determine whether or not a given line of code is too long and needs to be split. The first one, maxLen, determines the maximal length of a code line in an expression, accounting for the fact that the expression may already have been split into several lines (a tab counts as 4 characters).

Clear[maxLen];
maxLen[boxes : _RowBox ] :=
  Max@Replace[
      Split[Append[Cases[boxes, s_String, Infinity], "\n"], # =!= "\n" &],
      {s___, ("\t" | " ") ..., "\n"} :> 
        Total[{s} /. {"\t" -> 4, ss_ :> StringLength[ss]}],
      {1}];

maxLen[expr_] :=
  With[ {boxes = postformat@expr},
       maxLen[boxes] /; MatchQ[boxes, _RowBox ]
   ];

maxLen[expr_] :=
  Throw[{$Failed, expr}, maxLen];

Note that maxLen uses postformat, which is not yet defined. This is perhaps a stronger coupling between components than desirable, and is a design shortcut to be removed. The next one is a simple convenience function:

ClearAll[needSplitQ];
needSplitQ[expr_, currentTab_] :=
  maxLen[expr] > $maxLineLength - currentTab;
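For experimenting outside of boxes, the same measure can be stated on plain strings. A hypothetical string-level analogue of maxLen and needSplitQ (the names and the default limit of 70, which mirrors $maxLineLength, are assumptions); as in maxLen, a tab counts as 4 characters and trailing whitespace is ignored:

```mathematica
(* Hypothetical string-level analogues of maxLen / needSplitQ *)
ClearAll[maxLineWidth, needSplitStringQ];

(* Width of the longest line: tabs count as 4 characters, trailing spaces are ignored *)
maxLineWidth[code_String] :=
  Max[0, StringLength[
     StringReplace[
      StringSplit[StringReplace[code, "\t" -> "    "], "\n", All],
      (" " ..) ~~ EndOfString -> ""]]];

needSplitStringQ[code_String, currentTab_Integer, maxLineLength_Integer : 70] :=
  maxLineWidth[code] > maxLineLength - currentTab;

maxLineWidth["ab\n\tc  "]           (* 5: the tab counts as 4, trailing spaces dropped *)
needSplitStringQ["ab\n\tc", 0, 4]   (* True, since 5 > 4 *)
```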

Now comes the main formatting function, format. All specific formatting rules are included here. It takes an intermediate inert expression as its first argument, and the current indentation as its second one (measured in characters: each TabBlock level adds 4, matching the width maxLen assigns to a tab). It is also essentially recursive, processing an expression from outside to inside.

ClearAll[format];
format[expr_] :=  format[expr, 0];    

format[TabBlock[expr_], currentTab_] :=
  TabBlock[format[expr, currentTab + 4]];

format[NewlineBlock[expr_, flag_], currentTab_] :=
  NewlineBlock[format[expr, currentTab], flag];    

format[(ce : (CompoundExpressionBlock | 
    SuppressedCompoundExpressionBlock))[elems__], 
    currentTab_] :=
  With[ {formatted = Map[format[#, currentTab] &, {elems}]},
       (ce @@ Map[NewlineBlock[#, False] &, formatted]) /; 
         !FreeQ[formatted, NewlineBlock]
  ];

format[StatementBlock[el_], currentTab_] :=
    StatementBlock[format[el, currentTab]];

format[expr : GeneralHeadBlock[head_, elems___], currentTab_] :=
  With[ {splitQ = needSplitQ[expr, currentTab]},
       GeneralSplitHeadBlock[
           format[head, currentTab],
           Sequence @@ Map[
               format[If[ splitQ,
                             TabBlock@NewlineBlock[#, False],
                             #
                         ], 
                   currentTab] &,
               {elems}]] /; splitQ
   ];

(* For a generic block, it is not obvious that we have to tab, so we don't*)
format[expr : (block_?blockQ[elems___]), currentTab_] :=
  With[ {splitQ = needSplitQ[expr, currentTab]},
       block @@ Map[
           format[If[ splitQ,
                         NewlineBlock[#, False],
                         #
                     ], currentTab] &,
           {elems}]
   ];

format[a_?AtomQ, _] := a;

You can see that format uses two new block types: NewlineBlock and TabBlock, and the former also accepts a flag which can be True or False. When set to True, this flag forces the formatter to create a new line, while when set to False, it tells the formatter to propagate the new line request deeper into the expression. The TabBlock directive also accepts a similar flag. The reason the flags are needed in this approach is that it is not straightforward to implement an abstraction such as "move this piece of code one tab to the right" on the box level, for example because each new line in the boxes must be tabbed separately.
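For contrast, here is a toy formatter that works on plain strings instead of boxes. There the tabbing problem disappears, because each recursive call returns text that is already indented. It only mimics the split-or-not decision of format (split when the one-line form plus the current indentation exceeds the limit); the names, the 4-space indent and the string-level approach are assumptions, not how CodeFormatter works, and it expects inert (non-evaluating) expressions:

```mathematica
(* Toy string-level formatter; not part of CodeFormatter *)
ClearAll[spaces, toyFormat];
spaces[n_Integer?NonNegative] := StringJoin[ConstantArray[" ", n]];

(* Keep expr on one line if it fits; otherwise put each argument on its own line, 4 spaces deeper *)
toyFormat[expr_, tab_Integer : 0, max_Integer : 30] :=
  Module[{flat = ToString[expr, InputForm], args},
    If[AtomQ[expr] || StringLength[flat] + tab <= max,
      flat,
      (* recursive calls return text whose inner lines are already indented *)
      args = Map[spaces[tab + 4] <> toyFormat[#, tab + 4, max] &, List @@ expr];
      ToString[Head[expr], InputForm] <> "[\n" <> StringRiffle[args, ",\n"] <>
        "\n" <> spaces[tab] <> "]"
    ]
  ];

toyFormat[f[g[x, y], h[z]], 0, 12]
(* gives the string "f[\n    g[x, y],\n    h[z]\n]" *)
```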

In any case, format does only part of the job, because it only instructs what must be done. It has a companion, tabify, which actually executes the instructions of format:

ClearAll[tabify];
tabify[expr_] /; ! FreeQ[expr, TabBlock[_]] :=
    tabify[expr //. TabBlock[sub_] :> TabBlock[sub, True]];

tabify[(block_?blockQ /; ! MemberQ[{TabBlock, FinalTabBlock}, block])[
elems___]] :=
    block @@ Map[tabify, {elems}];

tabify[TabBlock[FinalTabBlock[el_, flag_], tflag_]] :=
  FinalTabBlock[tabify[TabBlock[el, tflag]], flag];

tabify[TabBlock[NewlineBlock[el_, flag_], _]] :=
  tabify[NewlineBlock[TabBlock[el, True], flag]];

tabify[TabBlock[t_TabBlock, flag_]] :=
  tabify[TabBlock[tabify[t], flag]];

tabify[TabBlock[(block_?blockQ /; ! MemberQ[{TabBlock}, block])[ 
     elems___], flag_]] :=
  FinalTabBlock[
    block @@ Map[tabify@TabBlock[#, False] &, {elems}],
    flag];

tabify[TabBlock[a_?AtomQ, flag_]] :=
  FinalTabBlock[a, flag];

tabify[expr_] :=  expr;

You can see that it introduces another tab-related block, FinalTabBlock, which signifies the need to tab a particular line by one tab. It is inert in the sense that once a TabBlock is converted to a FinalTabBlock, it no longer actively influences the work of tabify.

Post-formatting

The final stage of the formatting procedure is to take the expression processed with format and tabify, and convert it back to boxes. We need one helper function which serves to prevent the addition of several consecutive new lines ("gaps" in the formatted code), by determining whether or not the next line of code starts with a new line (if so, the NewlineBlock directive around it is ignored):

ClearAll[isNextNewline];
isNextNewline[_NewlineBlock] := True;

isNextNewline[block : (_?blockQ | TabBlock)[fst_, ___]] :=
  isNextNewline[fst];

isNextNewline[_] := False;

Here is finally the code for the inverse converter from the inert intermediate representation to boxes, postformat:

ClearAll[postformat];
postformat[GeneralBlock[elems__]] :=
  RowBox[postformat /@ {elems}];

postformat[CompoundExpressionBlock[elems__]] :=
  RowBox[Riffle[postformat /@ {elems}, ";"]];

postformat[SuppressedCompoundExpressionBlock[elems__]] :=
  RowBox[Append[Riffle[postformat /@ {elems}, ";"], ";"]];

postformat[GeneralHeadBlock[head_, elems___]] :=
  RowBox[{postformat@head, "[", 
      Sequence @@ Riffle[postformat /@ {elems}, ","], "]"}];

postformat[GeneralSplitHeadBlock[head_, elems___]] :=
  RowBox[{postformat@head, "[",
      (* commas go between all arguments, including before the last one *)
      Sequence @@ Riffle[postformat /@ {elems}, ","], "]"}];

postformat[GeneralBlock[elems___]] :=
  RowBox[Riffle[postformat /@ {elems}, ","]];

postformat[StatementBlock[elem_]] :=
  postformat[elem];

postformat[NewlineBlock[elem_?isNextNewline, False]] :=
  postformat@elem;

postformat[CommaSeparatedGeneralBlock[elems__]] :=
  RowBox[Riffle[postformat /@ {elems}, ","]];

postformat[NewlineBlock[elem_, _]] :=
  RowBox[{"\n", postformat@elem}];

postformat[FinalTabBlock[expr_, True]] :=
  RowBox[{"\t", postformat@expr}];

postformat[FinalTabBlock[expr_, False]] :=
  postformat@expr;

postformat[a_?AtomQ] :=  a;

postformat[arg_] :=
  Throw[{$Failed, arg}, postformat];

It is also necessarily recursive, and the code should be pretty much self-documenting.

The final function

The function which brings it all together is very simple:

ClearAll[fullCodeFormat];
fullCodeFormat[boxes_] :=
  postformat@tabify@format@preformat@preprocess@boxes;

Examples and limitations

This very simplified version of the formatter is quite limited. However, it can handle a few non-trivial examples. Here is a rather involved one to try:

prn@fullCodeFormat@MakeBoxes[
   Compile[{{data, _Real, 2}}, 
     Module[{means = Table[0., {maxIndex}], num = Table[0, {maxIndex}], 
        ctr = 0, i = 0, index = 0, resultIndices = Table[0, {maxIndex}], 
          indexHash = Table[0, {maxIndex}]}, 
      Do[index = IntegerPart[data[[i, 2]]];
       means[[index]] += data[[i, 1]];
       num[[index]]++;
       If[indexHash[[index]] == 0, indexHash[[index]] = 1;
       resultIndices[[++ctr]] = index];, {i, Length[data]}];
       resultIndices = Take[resultIndices, ctr];
       Transpose[{resultIndices, 
       means[[resultIndices]] + num[[resultIndices]]}]],(*Module*)
       CompilationTarget -> "C", RuntimeOptions -> "Speed"]]

Here is a screenshot of what you should see as a result (prn is defined as prn = CellPrint[Cell[BoxData[#], "Input"]] &):

[Screenshot of the formatted Compile code]

Because of the way the final function is written, it is very easy to inspect what is happening in the intermediate stages. For example, you can apply only

format@preformat@preprocess@boxes

to see what format is doing.
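Since fullCodeFormat is a plain composition, FoldList can collect all intermediate stages in a single call. A sketch, where `stages` is a hypothetical helper name:

```mathematica
(* Hypothetical helper: apply each function in turn and keep every intermediate result *)
ClearAll[stages];
stages[input_, fs_List] := FoldList[#2[#1] &, input, fs];

(* With the definitions above, something like:
   stages[boxes, {preprocess, preformat, format, tabify, postformat}] *)
stages[3, {# + 1 &, 2 # &}]   (* {3, 4, 8} *)
```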

This simplified formatter has a number of limitations, so don't expect it to work nicely on code involving e.g. function definitions through SetDelayed, some complex patterns, etc. The real code formatter in the CodeFormatter` package, while having the same core, has a number of additional rules to handle more cases.

Leonid Shifrin


A really nice package. However it seems that you use a fixed indentation. You really should make that customizable (it should be trivial to do, and people violently disagree on what indentation level is readable). – celtschk – 2012-04-05T09:17:58.710

@celtschk Thanks, this is one of the first things I plan to add. I've been pretty tied up recently, so no time for any serious work (adding this by itself is not difficult, but I want to rework the structure, add some hooks, and generally integrate variable-length indentation as a part of my formatting mini-language, and this would take a bit of time to add and test properly). But rest assured that this is on my to-do list, and I hope to add this very soon. – Leonid Shifrin – 2012-04-05T11:34:12.487

Excellent. The November 3, 2013 update handles all of the contributed TraceView implementations, which I've been trying to understand. BTW, fellow noobs, RunThrough is a built-in symbol.

– duozmo – 2013-11-03T20:52:41.780

@duozmo Thanks. Actually, for some of those functions, you can get better results via varying the $maxLineLength parameter, such as Block[{CodeFormatter`Private`$maxLineLength = 120}, CodeFormatterSpelunk[traceView2]]. – Leonid Shifrin – 2013-11-03T21:44:30.627

I posted some, ahem, acrobatic code which draws a tall vertical line under the mouse when viewing formatter output, to help the reader keep track of nested blocks. https://gist.github.com/duozmo/7313383. Sample result, with traceView2: http://i.stack.imgur.com/i3y3c.png.

– duozmo – 2013-11-05T03:57:18.527

@duozmo Thanks, this is nice. But it shows clearly that the formatter still has a long way to go, because its results on code like that are clearly sub-optimal. It splits lines too much there. The actual reason for that, I think, is that it decides about the split without knowing how the subsequent elements will get split. Anyway, something to think about. – Leonid Shifrin – 2013-11-05T04:43:11.463

I was working on a similar formatter recently when I remembered your Q/A, so I might be reinventing the wheel (oh, especially the intermediate representation part!). But there is one thing I would like to say: you're applying your preformat recursively, which I happened to try yesterday too, and I thought it would make the whole parser too slow. Then I thought that finding all RowBoxes with Position and then using ReplacePart on them would do the whole job in one pass. I'm not sure I understand your code correctly, but that is my current thought. Great job! (of course +1 already :) – Silvia – 2014-08-12T20:40:48.687

@Silvia Thanks a lot, I do appreciate! Re: recursive - in my experience, recursion in such problems is the most clean, flexible and powerful method. The speed has not been too critical for me so far, and even when it is, I am not at all sure that the bottleneck is due to the use of recursion. You might succeed in using rule replacements (ReplaceAll and ReplaceRepeated) instead, but in my experience they are inferior for harder problems, particularly those which are naturally recursive (like the case at hand). – Leonid Shifrin – 2014-08-15T07:48:04.877

Sorry for replying late. I was trying to make a minimal example for my question (please see "Another question" below). About the timing, you're absolutely right: on my machine the time is mainly consumed by format and tabify. Is it because of the complex pattern matching on the LHS? I wish the formatter were as fast as possible, so that we might be able to use it in real time during typing (like the auto-formatter in Visual Studio 2013's IDE). – Silvia – 2014-08-17T07:49:15.930

Another question: what do you think about in-code folding? I'm doing something like this, but it looks troublesome to make the nice structure editable, and I'm not so sure whether it is worth the effort. – Silvia – 2014-08-17T07:50:01.417

@Silvia Re: speed of formatter - the formatter is deeply recursive, and that recursion is rather non-trivial. It basically makes several passes through the code. I think the main slowness is due to the fact that the transformations related to tabification are non-trivial. We should think of them more like compilation. You can look at the output after format / tabify, but before postformat, for some longer pieces of code, and chances are that you'll see rather large expressions. That said, it may well be that the formatter can be seriously optimized; I just didn't look into it. – Leonid Shifrin – 2014-08-17T09:16:07.300

@Silvia Having an on-the-fly formatter is a different story, though; it may have its own issues to solve. I haven't thought about it yet, although I agree that it would be nice. Re: code folding - I also did experiments similar to the things I see in the screenshot, quite a few of them. The final conclusion I came to is that dressing editable code with such constructs will generally make it too fragile to always be sure that what you execute is what you meant. In other words, the robust way to do this would be to distinguish "read-only" code (which can be foldable) from editable code. But .. – Leonid Shifrin – 2014-08-17T09:20:11.360

@Silvia .. then it will make it less convenient. Generally, I am working on a full-fledged FE-based IDE, and I have made some good advances recently. The main idea is to make the notebook-based approach scalable for large code bases, by providing convenient navigation etc. in the notebook environment. Very soon I will have a version for "alpha-testing". If you'd like to test it, I will send you a copy (I intend this to be MIT-licensed). We could perhaps join forces at some point, as you seem to be interested in the same sort of things. – Leonid Shifrin – 2014-08-17T09:24:41.917

@LeonidShifrin Sorry, somehow I didn't get the message ping. I'm really excited just thinking about an FE-based IDE! And I'm very much interested in testing it! :D If there is anything I can contribute, it will be my pleasure. Regarding the formatter, I will read your code on GitHub closely. Thanks for the detailed response :) – Silvia – 2014-08-17T22:02:11.167

@Silvia Sounds good! By now I am fully convinced that such an IDE is possible, realistic, and is a good idea. I have been using some core parts of it to structure my work for almost a year now. The hardest problems are to get the design right on all levels, and to accumulate some critical mass of the functionality that, working together, can support really simple but powerful workflows. Doing this is a lot of work, and of a kind that needs continuous chunks of free time. Right now I feel I am very close to that critical mass. – Leonid Shifrin – 2014-08-18T15:05:40.407

+1 Probably one of the most used posts. Now, this isn't an entirely portable solution/answer (especially because I haven't truly thought it all through yet and still haven't read through all of your chat history), but if you simply worry about adding \[IndentingNewLine]s to a BoxData form of the expression, you can then pass it into a function like the following http://mathematica.stackexchange.com/a/63515/5615 to get a form you can copy and paste to StackExchange or insert into a code block. Again, I can't tell from just glancing through your chats whether you thought of doing that. – William – 2014-10-22T00:32:37.653

@LiamWilliam Thanks. I wanted to construct the formatter on a box level, and I wanted to avoid \[IndentingNewLine]s, which is one reason why I went with the design described here. I also wanted maximal flexibility to be able to override rules and add more specific formatting rules, for which my approach also seems quite fit. I'll have a closer look at your answer at some later point, thanks for the ref. – Leonid Shifrin – 2014-10-22T00:42:20.937

I just noticed in 10.0's GetFEKernelInit.tr file, there is a "Code Folding Routines" section. May I guess it's your work? :) Would like to see it coming out officially! – Silvia – 2014-11-25T12:08:00.400

@Silvia Actually, I have no relation to that work. I think this is probably John Fultz's, given that he was mentioning his desire to have this a few times here on SE. That's an interesting bit of information, though, I wasn't aware of that. Thanks for letting me know! – Leonid Shifrin – 2014-11-25T13:15:10.883

Amazing, this is a welcome capability! Thank you! – Cassini – 2012-03-11T14:14:29.773

@David Thanks, I am happy to share this. I hope it won't turn out to be unusable and full of bugs. – Leonid Shifrin – 2012-03-11T16:30:15.643

What if you had put \[IndentingNewLine]s instead of tabs and new lines? Maybe it would have avoided the need for the tab propagation? – Rojo – 2012-03-11T16:35:21.660

@Rojo Perhaps, but generally not, because for inner blocks I have to apply tabs several times. This is exactly the problem with boxes: there is no simple way to translate a simple-sounding request to move a block of code one or several tabs to the right into boxes, because in general each new line has to be tabbed separately. So we need to find the box right next to the new line, and apply tabs to that box. You can try typing, say, a, then on a new line pressing Tab twice and typing, say, b, and then look at the cell's contents: you'll see that the tabs are still there for b. – Leonid Shifrin – 2012-03-11T16:44:54.423

This is where people that understand SE say "let's continue this in chat" and post a link – Rojo – 2012-03-11T17:42:29.913

@Rojo I wanted to do that but got side-tracked. Here goes: http://chat.stackexchange.com/rooms/2745/discussion-between-leonid-shifrin-and-rojo

– Leonid Shifrin – 2012-03-11T17:49:55.553

@Rojo One of the points of having this is to avoid \[IndentingNewLine]s, so that the formatted code is copyable (copyable here, for example) and can exist in a code-style cell as well. – Szabolcs – 2012-03-11T18:04:21.247

Very nice! And congrats on your first ever question posted here. It looks like GitHub is indeed a better repository for code of this size. As per your request I posted a couple of smallish issues there. I wish we'd have an amalgam of SO and GitHub... – Sjoerd C. de Vries – 2012-03-11T23:29:37.323

@Sjoerd Thanks! As to the questions - I had to break my rules :). But in the SE format, there seems to be no other way to do this. And, thinking of it, it is not a bad option - I may get some other answers with good ideas. Thanks for testing also - I will look at those issues tomorrow (am off for today). The one with disappearance of // and @ is unfortunate, but this is done on the level of MakeBoxes already - so not related to the formatter proper (I noticed this effect before but forgot to mention here). It seems that the only path to keep those is to somehow feed the direct box... – Leonid Shifrin – 2012-03-11T23:37:40.583

@Sjoerd ...representation to the formatter, which is quite possible if it is grabbed from the input cell directly. So I expect that in the practically important case of formatting an existing code cell, this issue will not manifest itself. The second issue I will look at tomorrow. Thanks again for finding them. I suspect there will be a lot more coming; I released it pretty early, so the formatter is quite raw. As for GitHub, I see the main advantage not even in the ability to store large files, but in the versioning and collaboration model it offers for projects. – Leonid Shifrin – 2012-03-11T23:41:10.813


It is worth mentioning GeneralUtilities`PrintDefinitions.

Use it like <<GeneralUtilities` then GeneralUtilities`PrintDefinitions@f to get a pretty-printed, formatted version of ??f in a new notebook (with pink background).

The function returns a handle to the created notebook. One should be able to programmatically extract the plain formatted code from there.

masterxilo

Reputation: 5 447

I have to admit that the formatter implemented in GeneralUtilities impressed me a lot, and it is currently in many respects more complete and better at doing its job than my version. So, +1. At some point, I will try to find time to improve my version, and in particular address the recent GitHub issues that you opened. Alas, this won't happen in the next couple of months, though. – Leonid Shifrin – 2016-08-18T19:37:50.773

When I try: <<GeneralUtilities` ;f = Do[x = i; Do[y = j; Do[z = k;, {k, 1, 3}];, {j, 1, 3}];, {i, 1, 3}]; GeneralUtilities`PrintDefinitions@f then I get in the new notebook the message Attributes[Null] := {Protected};. What am I doing wrong? – mrz – 2016-09-26T11:43:52.843

@mrz Do returns Null, so your f is Null. Furthermore, you need Unevaluated to print definitions for symbols with own-values because PrintDefinitions doesn't have HoldAll. $f = 2; GeneralUtilities`PrintDefinitions@Unevaluated@$f works while $f = 2; GeneralUtilities`PrintDefinitions@$f doesn't. – masterxilo – 2016-09-27T14:58:19.970

The function you actually want is not GeneralUtilities`PrintDefinitions but rather GeneralUtilities`MakeFormattedBoxes, which does the actual formatting work. See for example GeneralUtilities`MakeFormattedBoxes@HoldForm[(expression1; expression2;)] // RawBoxes. If you drill down into the PrintDefinitions machinery far enough, it does this sort of thing. – b3m2a1 – 2016-12-08T18:09:47.747


t = CreateTemporary[];
Save[t, mySymbol];
Import[t, "Text"]

This applies some basic formatting to the definitions directly and indirectly associated with mySymbol.

masterxilo


I'm sorry but the question is indeed not about the default auto-wrapping in the "Package" files (which is ugly in many respects). It is about creating highly-readable-formatted code as opposed to the default formatting which is a mess. – Alexey Popkov – 2016-08-18T16:29:11.330

Alright. But maybe these built-in solutions can serve as a robust starting point. – masterxilo – 2016-08-18T17:21:21.597