## Performance tuning in Mathematica?

173

144

What performance tuning tricks do you use to make a Mathematica application faster? MATLAB has an amazing profiler, but from what I can tell, Mathematica has no similar functionality.

3What do I do if I want to accept two answers??? Never had that happen before. Thanks for the awesome responses! – John – 2011-01-20T04:56:52.840

2

Jon McLoone's article entitled 10 Tips for writing fast Mathematica code

– tlehman – 2012-01-05T03:54:44.687

3@Tobi While most of Jon's suggestions are very valid and many overlap with the suggestions in the answers below, his suggestion to use Block in place of Module can be a dangerous practice that I'd rather avoid (except possibly inside packages, but maybe even there). For a minimal efficiency boost, you get a danger of variable conflicts which can be a debugging nightmare in large projects. The situation is somewhat better when you use Block inside a package, since then the name conflicts can only happen with symbols from the same package (context). – Leonid Shifrin – 2012-01-05T15:37:04.707

@LeonidShifrin I didn't know that, I'll definitely keep that in mind. – tlehman – 2012-01-05T16:41:23.757

218

Since Mathematica is a symbolic system, with a symbolic evaluator much more general than MATLAB's, it is not surprising that performance tuning can be trickier here. There are many techniques, but they can all be understood from a single main principle:

Avoid full Mathematica symbolic evaluation process as much as possible.

All techniques seem to reflect some facet of it. The main idea is that, most of the time, a Mathematica program is slow because many Mathematica functions are very general. This generality is a great strength, since it enables the language to support better and more powerful abstractions, but in many places in a program such generality, used without care, can be (huge) overkill.

I won't be able to give many illustrative examples in the limited space, but they can be found in several places, including some WRI technical reports (Daniel Lichtblau's report on efficient data structures in Mathematica comes to mind), David Wagner's very good book on Mathematica programming, and, most notably, many Mathgroup posts. I also discuss a limited subset of them in my book. I will supply more references soon.

Here are a few of the most common ones (I only list those available within the Mathematica language itself, not mentioning CUDA / OpenCL or links to other languages, which are of course also possibilities):

1. Push as much work into the kernel at once as possible, work with as large chunks of data at a time as possible, without breaking them into pieces

1.1. Use built-in functions whenever possible. Since they are implemented in the kernel, in a lower-level language (C), they are typically (but not always!) much faster than user-defined ones solving the same problem. The more specialized version of a built-in function you are able to use, the more chances you have for a speed-up.

1.2. Use functional programming (Map, Apply, and friends). Also, use pure functions in #-& notation when you can; they tend to be faster than Functions with named arguments or functions based on patterns (especially for functions that are not computationally intensive and are mapped over large lists).

1.3. Use structural and vectorized operations (Transpose, Flatten, Partition, Part and friends), they are even faster than functional.

1.4. Avoid procedural programming (loops etc.), because this programming style tends to break large structures into pieces (array indexing etc.). This pushes a larger part of the computation outside of the kernel and makes it slower.
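As a rough illustration of points 1.1 to 1.4, here is the same computation written procedurally, functionally, and in vectorized form. This is only a sketch: the names `data`, `sumLoop`, `sumMap` and `sumVec` are made up for the example, and the actual timings depend on the version and the machine, so none are quoted here.

```mathematica
(* Sum of squares of the first n integers, written three ways. *)
n = 10^5;
data = Range[n];

(* Procedural: explicit indexing, one element at a time *)
sumLoop = Module[{s = 0},
   Do[s += data[[i]]^2, {i, n}];
   s];

(* Functional: map a pure function over the list, then add up the results *)
sumMap = Total[Map[#^2 &, data]];

(* Vectorized: Power is Listable, so it acts on the whole list at once *)
sumVec = Total[data^2];

(* All three give the same number *)
sumLoop == sumMap == sumVec
```

Wrapping each variant in AbsoluteTiming (or RepeatedTiming) is a simple way to compare them on your own system.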

2. Use machine-precision whenever possible

2.1. Be aware of, and use, the Listable attribute of built-in numerical functions, applying them to large lists of data rather than using Map or loops.

2.2. Use Compile when you can. Use the newer capabilities of Compile, such as CompilationTarget->"C", and making compiled functions parallel and Listable.

2.3. Whenever possible, use vectorized operations (UnitStep, Clip, Sign, Abs, etc) inside Compile, to realize "vectorized control flow" constructs such as If, so that you can avoid explicit loops (at least as innermost loops) also inside Compile. This can move you in speed from Mathematica byte-code to almost native C speed, in some cases.

2.4. When using Compile, make sure that the compiled function doesn't bail out to non-compiled evaluation. See examples in this MathGroup thread.
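A minimal sketch of points 2.2 to 2.4, under some assumptions: the function `cf` and its formula are invented for illustration, and CompilationTarget -> "C" is left out because it needs an external C compiler to be installed. UnitStep stands in for an If branch ("add x only when x >= 1"), and CompilePrint from the CompiledFunctionTools` package is one way to look for calls back to the main evaluator.

```mathematica
(* Compiled function of one real argument. RuntimeAttributes -> {Listable}
   makes it thread over lists of reals. *)
cf = Compile[{{x, _Real}},
   Sin[x]^2 + x*UnitStep[x - 1.],   (* same as If[x >= 1., Sin[x]^2 + x, Sin[x]^2] *)
   RuntimeAttributes -> {Listable}];

(* Apply to a whole list in one call *)
cf[{0., 0.5, 2.}]

(* Check for bail-outs: "MainEvaluate" in the printed byte-code
   signals a call back to the ordinary evaluator. *)
Needs["CompiledFunctionTools`"];
StringFreeQ[CompilePrint[cf], "MainEvaluate"]
```

Adding Parallelization -> True and CompilationTarget -> "C" to the options is the natural next step once the function is known to compile cleanly.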

3. Be aware that Lists are implemented as arrays in Mathematica

3.1. Pre-allocate large lists

3.2. Avoid Append, Prepend, AppendTo and PrependTo in loops, for building lists etc. (because they copy the entire list to add a single element, which leads to quadratic rather than linear complexity for list-building)

3.3. Use linked lists (structures like {1,{2,{3,{}}}} ) instead of plain lists for list accumulation in a program. The typical idiom is a = {new element, a}. Because a is a reference, a single assignment is constant-time.

3.4. Be aware that pattern-matching for sequence patterns (BlankSequence, BlankNullSequence) is also based on Sequences being arrays. Therefore, a rule {fst_,rest___}:>{f[fst],g[rest]} will copy the entire list when applied. In particular, don't use recursion in a way which may look natural in other languages. If you want to use recursion on lists, first convert your lists to linked lists.
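The difference between 3.2 and 3.3 can be sketched as follows. The names `slow` and `fast` are illustrative only. Note that this sketch nests as {acc, new} rather than {new, acc}, so that a single Flatten at the end returns the elements in insertion order; Flatten is only safe here because the collected elements are not lists themselves.

```mathematica
n = 10^4;

(* Quadratic: AppendTo copies the whole list on every step *)
slow = Module[{acc = {}},
   Do[AppendTo[acc, i^2], {i, n}];
   acc];

(* Linear: build a nested (linked) list, then flatten once at the end *)
fast = Module[{acc = {}},
   Do[acc = {acc, i^2}, {i, n}];
   Flatten[acc]];

(* Both approaches produce the same list *)
slow === fast
```

If the elements are themselves lists, one option is to nest with a dedicated head instead of List and flatten with respect to that head only.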

4. Avoid inefficient patterns, construct efficient patterns

4.1. Rule-based programming can be both very fast and very slow, depending on how you build your structures and rules, but in practice it is easier to inadvertently make it slow. It will be slow for rules that force the pattern-matcher to make many a priori doomed matching attempts, for example by under-utilizing each run of the pattern-matcher through a long list (expression). Sorting elements is a good example: list//.{left___,x_,middle___,y_,right___}/;x>y:>{left,y,middle,x,right} has cubic complexity in the size of the list (explanation is e.g. here).

4.2. Build efficient patterns, and corresponding structures to store your data, so that the pattern-matcher wastes as little time as possible on false matching attempts.

4.3. Avoid using patterns with computationally intensive conditions or tests. The pattern-matcher will give you the most speed when patterns are mostly syntactic in nature (testing structure, heads, etc.). Every time a condition (/;) or pattern test (?) is used, the pattern-matcher invokes the evaluator for every potential match, and this slows it down.
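To make 4.3 concrete, here is a small sketch with three equivalent ways of selecting integers from a list. The list `mixed` is invented for the example. The first form is purely syntactic (it checks the head only), while the other two call the evaluator once per candidate element.

```mathematica
mixed = {1, 2.5, "s", 3, 1/2, 4};

(* Syntactic pattern: the matcher only inspects the head *)
bySyntax = Cases[mixed, _Integer];

(* Pattern test: IntegerQ is evaluated for every element *)
byTest = Cases[mixed, _?IntegerQ];

(* Condition: again one evaluation per element *)
byCondition = Cases[mixed, x_ /; IntegerQ[x]];

(* All three select {1, 3, 4} *)
bySyntax === byTest === byCondition
```

On a list this short the difference is invisible; it tends to matter when the same pattern is tried against very many elements.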

5. Be aware of the immutable nature of most Mathematica built-in functions

Most Mathematica built-in functions which process lists create a copy of an original list and operate on that copy. Therefore, they may have a linear time (and space) complexity in the size of the original list, even if they modify a list in only a few places. One universal built-in function that does not create a copy, modifies the original expression and does not have this issue, is Part.

5.1. Avoid using most list-modifying built-in functions for a large number of small independent list modifications which cannot be formulated as a single step (for example, NestWhile[Drop[#,1]&,Range[1000],First[#]<500&] )

5.2. Use extended functionality of Part to extract and modify a large number of list (or more general expression) elements at the same time. This is very fast, and not just for packed numerical arrays (Part modifies the original list).

5.3. Use Extract to extract many elements at different levels at once, passing to it a possibly large list of element positions.
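A short sketch of 5.2 and 5.3, with made-up data: Part assigns to several positions of a stored list in one step, and Extract pulls out elements at different depths given a list of positions.

```mathematica
(* Part: modify many elements of the list stored in lst at once *)
lst = Range[10];
lst[[{2, 4, 6}]] = 0;            (* same value at several positions *)
lst[[7 ;; 9]] = {70, 80, 90};    (* a span of positions, one value each *)
lst
(* {1, 0, 3, 0, 5, 0, 70, 80, 90, 10} *)

(* Extract: elements at different levels, from a list of positions *)
expr = {{1, 2}, {3, {4, 5}}};
picked = Extract[expr, {{1, 2}, {2, 2, 1}}]
(* {2, 4} *)
```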

6. Use efficient built-in data structures

The following internal data structures are very efficient and can be used in many more situations than it may appear from their stated main purpose. Lots of such examples can be found by searching the Mathgroup archive, particularly contributions of Carl Woll.

6.1. Packed arrays

6.2. Sparse arrays
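A minimal sketch of how to inspect these two structures. The variable names are illustrative; the functions in the Developer` context are the documented ones for checking and converting packed arrays.

```mathematica
(* Range returns a packed array of machine integers *)
packed = Range[10^5];
Developer`PackedArrayQ[packed]

(* The same data as an ordinary (unpacked) list takes more memory *)
unpacked = Developer`FromPackedArray[packed];
Developer`PackedArrayQ[unpacked]
ByteCount[packed] < ByteCount[unpacked]

(* Pack it again explicitly *)
repacked = Developer`ToPackedArray[unpacked];

(* A sparse array stores only the explicitly specified entries *)
s = SparseArray[{{1, 1} -> 1., {1000, 1000} -> 2.}, {1000, 1000}];
Total[s, 2]
```

Operations that mix types or insert symbolic elements can silently unpack an array, so checking PackedArrayQ at a few points in a slow computation is often worthwhile.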

7. Use hash tables

Starting with version 10, immutable associative arrays (Associations) are available in Mathematica.

7.1. Associations

The fact that they are immutable does not prevent efficient insertion and deletion of key-value pairs (a cheap copy is made that differs from the original association by the presence, or absence, of a given key-value pair). They are the idiomatic associative arrays in Mathematica and have very good performance characteristics.

For earlier versions, the following alternatives work pretty well, being based on Mathematica's internal hash tables:

7.2. Hash-tables based on DownValues or SubValues

7.3. Dispatch
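The three options in 7.1 to 7.3 might look as follows in their simplest form. The names `assoc`, `hash` and `rules` and the sample keys are assumptions made for the example.

```mathematica
(* 7.1 Association (version 10 and later) *)
assoc = <|"a" -> 1, "b" -> 2|>;
assoc["c"] = 3;                    (* insert a key-value pair *)
assoc = KeyDrop[assoc, "a"];       (* delete a key *)
missing = Lookup[assoc, "z", 0];   (* lookup with a default value *)

(* 7.2 Hash table based on DownValues *)
ClearAll[hash];
hash["a"] = 1;
hash["b"] = 2;
hash["a"]

(* 7.3 Dispatch: an optimized table built once from a list of rules *)
rules = Dispatch[Thread[Range[1000] -> Range[1000]^2]];
replaced = {3, 10} /. rules
(* {9, 100} *)
```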

8. Use element-position duality

Often you can write faster functions to work with positions of elements rather than elements themselves, since positions are integers (for flat lists). This can give you up to an order of magnitude speed-up, even compared to generic built-in functions (Position comes to mind as an example).
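One possible reading of this point, sketched with invented data: instead of asking the general pattern-based Position for matching elements, compute the matching positions directly with vectorized integer operations and then use them with Part. This is an illustration of the idea, not necessarily the exact technique the answer has in mind.

```mathematica
data = {5, 1, 8, 3, 9, 2};

(* Generic approach: pattern test applied to each element *)
pos1 = Flatten[Position[data, _?(# > 4 &), {1}, Heads -> False]];

(* Position-based approach: for integers, x > 4 is the same as x - 5 >= 0,
   so UnitStep marks the matches and Pick selects their indices *)
pos2 = Pick[Range[Length[data]], UnitStep[data - 5], 1];

(* Both give {1, 3, 5}; the positions can then be reused with Part *)
pos1 === pos2
data[[pos2]]
```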

9. Use Reap-Sow

Reap and Sow provide an efficient way of collecting intermediate results, and generally "tagging" parts you want to collect, during the computation. These commands also go well with functional programming.
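A minimal sketch of the idiom, with illustrative names: Sow marks values inside any computation, and the enclosing Reap returns them in the order they were sown, without any explicit list-building.

```mathematica
(* Collect the primes encountered in a loop *)
{result, {collected}} = Reap[
   Do[If[PrimeQ[i], Sow[i]], {i, 20}]];
collected
(* {2, 3, 5, 7, 11, 13, 17, 19} *)

(* Tags group the collected values; here the tag is EvenQ of the value *)
grouped = Reap[Scan[Sow[#, EvenQ[#]] &, Range[6]], _, Rule][[2]]
```

Note that the destructuring on the left-hand side of the first assignment assumes that at least one value was sown.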

10. Use caching, dynamic programming, lazy evaluation

10.1. Memoization is very easily implemented in Mathematica, and can save a lot of execution time for certain problems.

10.2. In Mathematica, you can implement more complex versions of memoization, where you can define functions (closures) at run-time, which will use some pre-computed parts in their definitions and therefore will be faster.

10.3. Some problems can benefit from lazy evaluation. This seems more relevant to memory efficiency, but can also affect run-time efficiency. Mathematica's symbolic constructs make it easy to implement.
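The basic memoization idiom from 10.1 can be sketched like this (the function name `fib` is just the usual textbook example): the delayed definition stores each computed value as a new immediate definition, so every value is computed only once.

```mathematica
ClearAll[fib];
fib[0] = 0;
fib[1] = 1;

(* f[n_] := f[n] = ... caches the result as a specific DownValue *)
fib[n_Integer /; n > 1] := fib[n] = fib[n - 1] + fib[n - 2];

fib[30]
(* 832040 *)
```

The cached values accumulate in DownValues[fib]; calling ClearAll[fib] and re-entering the definitions resets the cache.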

A successful performance-tuning process usually employs a combination of these techniques, and you will need some practice to identify cases where each of them will be beneficial.

Does point 7 need to be updated now that we have Association? – bobthechemist – 2014-06-28T00:33:12.447

@bobthechemist Thanks, updated. – Leonid Shifrin – 2014-06-28T01:57:28.903

@LeonidShifrin Great post. Thanks for sharing knowledge. Could you be a little more analytical in 10.2? What do you mean by defining functions at runtime? – Fierce82 – 2016-03-16T12:12:04.540

1

@TomZinger Glad you found it useful. Re: 10.2: I meant memoizing entire patterns. Have a look at this answer for an example of that, and also this great answer. And also this one - in there, I list the generated definitions, so it is rather explicit in what you get memoized.

– Leonid Shifrin – 2016-03-16T15:48:21.523

@LeonidShifrin Unfortunately it is too difficult for me to follow your answers, and replicate them to my own problems with recursive functions. Nevertheless thank you for your quick reply. – Fierce82 – 2016-03-17T12:30:56.920

@TomZinger You can always ask a question on your specific problem on the main site. There is a good chance that you will get a decent and clear answer for your particular problem. – Leonid Shifrin – 2016-03-17T13:31:28.610

1

@TomZinger Actually, there is also a section in this answer, called "Memoization for functions of more than one parameter, and using patterns in memoized definitions", which contains a somewhat more accessible explanation of the same thing. See if you find it more clear.

– Leonid Shifrin – 2016-03-17T13:34:56.843

@LeonidShifrin Thank you for sharing your insight. Regarding your single main principle: in other programming languages, code optimization generally means efficient data structures, including optimal operations on their physical-layer representation. Your perspective on Mathematica's symbolic evaluation is giving me another insight into computational semiotics. Wolfram Language has elevated both symbolic and semantic computation, i.e. interpretation. I find it challenging to balance data structures (Association), functional operations (AssociateTo) and composite elements (Entity). – Athanassios – 2016-08-24T17:35:43.660

@Athanassios Glad you found it useful! – Leonid Shifrin – 2016-08-24T19:26:25.447

Re: 2.2 on compiling. As far as I can tell, when attempting to inline one compiled function into another, inlining will fail if the function one wants to inline has been compiled with RuntimeAttributes->{Listable}. See this question and answer

– FalafelPita – 2017-07-20T19:54:42.740

Section 3 "Lists are implemented as arrays in Mathematica". Is List special in that regard, or are your statements 3.1, 3.2, 3.3 true of any Head? After all, I could certainly have f[item1, item2, item3, ...] in place of {item1, item2, item3, ...}; and same story for 'linked' version f[item1,f[item2, f[item3, ...]]]. I'm just wondering if there's anything particularly special about Lists? – QuantumDot – 2017-07-22T01:12:14.333

@FalafelPita Thanks, an interesting point. I wasn't aware of this. No time to dig deeper into that at the moment, unfortunately. – Leonid Shifrin – 2017-07-22T14:25:56.700

@QuantumDot In this particular respect, List is not different from general expressions (heads), so 3.1 - 3.3. are true for any head. – Leonid Shifrin – 2017-07-22T14:55:10.543

Nothing if not thorough! Excellent and detailed answer. [guess you meant list//.{left___,x_,y_,right___}/;x>y:>{left,y,x,right} in 4.1, probably a cut and paste problem] – acl – 2011-01-19T11:22:42.707

@acl: Thanks! Indeed, I meant BlankNullSequence. This was a copy-paste problem, I changed the font from "text" to "code" and it worked. – Leonid Shifrin – 2011-01-19T13:11:35.950

1I cannot stress the use of Dispatch enough myself! I frequently go out of my way to use ReplaceAll with a dispatch table if at all possible. – Timo – 2011-01-20T07:52:50.867

1@Timo: Dispatch and DownValues-based hashes have somewhat different characteristics. Creating Dispatched definitions for large lists of key-value pairs seems generally faster than with DownValues, and application of Dispatched rules is sometimes also faster. However, once formed, Dispatched hash is not easily appended with new elements, so this is a kind of a one-time affair. With DownValues, you don't face this problem, you can add more key-value pairs at any time without a performance hit. DownValues can also be used to implement a kind of a classic cache, with Sort option set to False – Leonid Shifrin – 2011-01-20T17:44:07.643

1

@Leonid: A really good answer (+100)! Lots of useful hints. You should add a link to this answer in the What is in your Mathematica tool bag question.

– Simon – 2011-03-16T03:21:10.360

@Simon Thanks! I will add a link soon, this is a good suggestion. Hopefully, also soon I will find some time to add more links to this answer, as I promised there. – Leonid Shifrin – 2011-03-16T22:37:45.970

@Leonid Please, could you comment a little bit on your statement "This pushes larger part of the computation outside of the kernel and makes it slower." on 1.4? – Dr. belisarius – 2011-06-27T12:34:23.140

@belisarius What I meant is that, while functions like Map go to the main evaluator only to compute the value of the mapped function on a given list element, the list itself and its indexing etc. are in the kernel. So Map, Scan, etc. do more operations in the kernel, and fewer operations go through the main evaluator, than explicit array indexing in procedural loops. You can also view this as follows: Map etc. are more limited in what list operations can be done (you cannot jump over elements, etc.), and gain speed from that. Due to the symbolic nature of mma, such gains can be significant. – Leonid Shifrin – 2011-06-27T12:51:04.587

@belisarius Another thing is that often functional operations make subscripting unnecessary altogether, by operating on entire list at once - such as Plus@@list. Since the list list is in memory, and Plus is an internal function, we avoid the main evaluation of each list element and push even more inside the kernel. – Leonid Shifrin – 2011-06-27T12:55:42.757

@Leonid Thanks a lot! – Dr. belisarius – 2011-06-27T13:55:49.693

6+1 it seems now a scientist of physics should be also a very good programmer. :) – None – 2011-08-20T03:23:15.207

47

You may use the profiler included in the Wolfram Workbench

7

For the sake of completing the cross-reference: One can profile within Mathematica, too. See Profiling from Mathematica

– Michael E2 – 2014-03-16T12:50:44.997

I see workbench is free for premium subscribers. Unfortunately, wolfram customer service has been extremely difficult for us to work with. :( We can't get any premium services activated. sigh – John – 2011-01-20T04:57:58.113

3Where is the profiler in Workbench? – Eli Lansey – 2012-02-16T18:40:19.840

20

Take a look at the presentation Principles for Efficient Mathematica Programs from the Wolfram Technology Conference 2007.

Another useful presentation is Tips for Memory Efficient Coding in the Wolfram Language.

4

If I want to make my code faster, I check if I can use FunctionCompile to speed it up:

https://reference.wolfram.com/language/guide/CodeCompilation.html
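A minimal sketch of what such a call can look like, assuming a version that includes FunctionCompile (12.0 or later). The function body and the name `cfun` are invented for the example; the argument type is declared with Typed.

```mathematica
(* Compile a function of one machine real *)
cfun = FunctionCompile[
   Function[{Typed[x, "Real64"]}, x^2 + 1.]];

cfun[3.]
(* 10. *)
```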

This compiler translates WL code into LLVM byte-code, which can then be compiled to native machine code. There is a very good tutorial video on YouTube which explains what you can do with this compiler: