## Linked lists: general discussion

I would like to describe here a pattern which is IMO quite important yet seems to be generally under-appreciated and under-used in our community (and probably in Mathematica programming in general) - namely, *linked lists*.

Linked lists are important for several reasons. Here is a (partial) list of benefits they can bring to the table:

*Idiomatic recursive functional programming*. Linked lists allow one to use recursion in Mathematica in ways idiomatic for functional programming. Many people actually consider linked lists and recursion to be one of the corner stones of FP. Since main Mathematica data structures - lists, are actually arrays rather than linked lists, recursion using lists is not a very practical tool in Mathematica, because of performance reasons (we will discuss that in more detail below).

*Transparent, easy to guess complexity of algorithms* implemented using linked lists. This is quite important particularly in Mathematica, where a lot of internal details may be hidden behind concise programming constructs and built-in functions. In general, Mathematica is a two (or multi) scale language in terms of performance, where one performance scale is roughly given by built-in functions, while another one by the best-written top-level code (the real situation is yet more complex, but this schematic view will be good enough for us). Many algorithms which may be concisely implemented may be fast for smaller samples but then change their behavior, since their speed may be due to small time constants associated with built-in functions, rather than due to the asymptotic behavior of the resulting algorithm itself. And in any case, my point is that an reasonably efficient Mathematica solution may require an expert to quickly determine the complexity of the resulting algorithm.

The algorithms using linked lists typically don't use the "standard" machinery such as vectorization and working with large amounts of data at once, so they are much easier to understand since they use much less of the internal machinery.

*Predictable and usually quite reasonable speed*. While solutions based on linked lists may be slower than vectorized versions, they are still among the fastest solutions one can achieve purely with the top-level code. And sometimes, they can actually lead to some of the fastest solutions for a problem (one example is a Boggle problem).

*Memory efficiency*. The solutions using linked lists are typically quite memory efficient, often much more so that vectorized solutions.

*Generality*. Because solutions based on linked lists don't usually require to have or know all the data at once, they can be used for a more general class of problems than the techniques such as vectorization or structural operations. In particular, there are several classes of problems where linked lists provide by far the most natural and clear solutions.

One final comment here is that, to an extent, declaring linked lists as a design pattern means that here we face the language limitations. Were linked lists so natural to use within the language, and we would not feel the need to describe it as a pattern. To some extent also, it is because the dominant paradigm of practical Mathematica programming is to work with as much data at once as possible and use vectorized and structural operations, an approach which is not quite compatible with linked lists and the way one works with them.

## Basic linked lists

### Construction

A linked list is a construct of this sort:

```
llist = {1,{2,{3,{4,{5,{}}}}}}
```

It can be conveniently constructed from a flat list, using `Fold`

:

```
toLinkedList[l_List]:=Fold[{#2,#1}&,{},Reverse@l]
```

so that `llist`

could have been constructed as `toLinkedList[Range[5]]`

. The reverse operation of converting to a flat list can be also done very conveniently using `Flatten`

:

```
Flatten[llist, Infinity]
(* {1, 2, 3, 4, 5} *)
```

When elements are lists themselves, one needs to use generalized linked lists, which I will cover later.

The main structural elements of a linked lists are *head* (the first element), and *tail* (a list containing the rest of the elements). Note that tail is also a linked list. Extracting one or the other is very easy - just use `First`

for head and `Last`

or tail:

```
First@llist
(* 1 *)
Last@llist
(* {2,{3,{4,{5,{}}}}} *)
```

### The defining algorithmic property and an illustration - accumulation of results.

In the algorithmic aspect, the main, defining property of a linked list, which makes it very different from usual Mathematica list, is that it is cheap to copy. The reason is that any linked list is a list of two elements only - head and tail. And because expressions in Mathematica are immutable, the copying of `tail`

does not involve a deep copy, but rather (internally) a pointer to the list is copied. This is in contrast to normal Mathematica lists, whose copying will in most cases have a linear complexity in the size of the list.

Everything else follows from this property. For example, collecting of intermediate results is easy with linked lists. The main idiom here is

```
list = {new-elem, list}
```

One practical example of using this idiom can be found here. To illustrate this here, consider a toy implementation of `Select`

based in linked lists:

```
selectLL[l_List, pred_] :=
Module[{result = {}},
Do[If[pred[el], result = {el, result}], {el, l}];
Reverse@Flatten[result]];
```

Here is how it is used (note that it will only work correctly for list elements which are not themselves lists):

```
selectLL[Range[10], OddQ]
(* {1, 3, 5, 7, 9} *)
```

Now, the analog we may write using normal lists will be

```
selectNaive[l_List, pred_] :=
Module[{result = {}},
Do[If[pred[el], AppendTo[result, el]], {el, l}];
result];
```

But in this case, entire list is copied every time when a new element is added, which results in much worse performance even for relatively small lists:

```
selectLL[Range[20000],OddQ];//AbsoluteTiming
selectNaive[Range[20000],OddQ];//AbsoluteTiming
(*
{0.037109,Null}
{0.561523,Null}
*)
```

Note by the way that the built-in `Select`

is not dramatically faster than our implementation based on linked lists:

```
Select[Range[20000], OddQ]; // AbsoluteTiming
(* {0.014648, Null} *)
```

One can argue that `Sow`

and `Reap`

can be used for accumulation of results, and in fact may be more efficient in that than linked lists. This is true, but the `Sow`

- `Reap`

model is more limited because partial lists of results accumulated "so far" are not available to the user.

### Generalized linked lists

The general construction for linked lists is similar to the one described above, but the `List`

head is replaced by some (arbitrary) head (say,`ll`

). For certain reasons, it is better in many cases if `ll`

is `HoldAllComplete`

(basically, to prevent search for `UpValues`

, which may be expensive and slow things down). So, we end up with

```
ClearAll[ll];
SetAttributes[ll,HoldAllComplete];
toLinkedListGen[l_List]:= Fold[ll[#2,#1]&,ll[],Reverse@l]
```

for example

```
llgen = toLinkedListGen[Range[5]]
(* ll[1, ll[2, ll[3, ll[4, ll[5, ll[]]]]]] *)
```

and the code to convert this back to the flat list may look like

```
List @@ Flatten[llgen, Infinity, ll]
(* {1, 2, 3, 4, 5} *)
```

This construction is valid for lists with arbitrary elements (except those having themselves the head `ll`

). If you need to use linked lists inside other linked lists, you can define several such heads.

One thing to keep in mind here is that you have to make sure that the element you are adding has been fully evaluated (if that's essential) before writing

```
llist = ll[elem, llist]
```

if `ll`

is `HoldAllComplete`

.

## Typical use cases and examples

Here I will try to show some examples and give some links for the typical use cases of linked lists, to achieve various goals.

### Implementing algorithms requiring complex list iteration, possibly with look-ahead analysis.

A typical such problem is the merge sort algorithm. I will show here the implementation I posted here:

```
Clear[toLinkedList];
toLinkedList[x_List] := Fold[{#2, #1} &, {}, Reverse[x]];
Module[{h,lrev},
mergeLinked[x_h,y_h]:=
Last[
{x,y,h[]}//.
{
{fst:h[hA_,tA_h],sec:h[hB_,tB_h],e_h}:>
If[hA>hB,{tA,sec,h[hA,e]},{fst,tB,h[hB,e]}],
{fst:h[hA_,tA_h],h[],e_h}:>{tA,h[],h[hA,e]},
{h[],sec:h[hB_,tB_h],e_h}:>{h[],tB,h[hB,e]}
}
];
lrev[set_]:=
Last[h[set,h[]]//.
h[h[hd_,tl_h],acc_h]:>h[tl,h[hd,acc]]];
sort[lst_List]:=
Flatten[
(h[#1,h[]]&)/@lst//.
x_List:>
Flatten[
{toLinkedList[x],{}}//.
{{hd1_,{hd2_,tail_List}},accum_List}:>
{tail,{accum,lrev[mergeLinked[hd1,hd2]]}}
],
\[Infinity],
h
]
]
```

Here, two different linked lists are used. One is for a list of already merged lists, which at every iteration gets half its previous lengths, since already ordered sub-lists are merged pairwise. For this one, a normal `List`

head is used, and this "higher-level" merging procedure is implemented in `sort`

. Another one is used for individual sub-lists which are merged - they are first converted to linked lists with head `h`

. The lower-level merging procedure is implemented in `mergeLinked`

, while `lrev`

reverses such linked list.

This implementation is a bit unusual in that it uses exclusively linked lists, local rules and `ReplaceRepeated`

, instead of recursion. It was originally posted on MathGroup, where it is explained in detail, and the corrected version I use in this post was posted here. The main idea was to illustrate that `ReplaceRepeated`

can be just as efficient as other means, when used with appropriately constructed patterns. My implementations were certainly motivated by a similar discussion in the book of David Wagner.

In any case, the naive implementation using normal lists, which can look e.g. like

```
Clear[merge];
merge[x_List,y_List]:=
Block[{merge},
Flatten[
merge[x,y]//.
{
merge[{a_,b___},{c_,d___}]:>
If[a<c,{a,merge[{b},{c,d}]},{c,merge[{a,b},{d}]}],
merge[{},{a__}]:>{a},
merge[{a__},{}]:>{a}
}
]
]
```

Will have an unacceptable complexity due to the list copying at every stage.

### Notes on recursion

Note that while in the previous example recursion was not used explicitly, this was rather artificial, and normally the solutions involving linked lists are recursive. To finish with the merge sort example and illustrate this point, here is the recursive implementation of `merge`

, taken from here

```
ClearAll[merge];
merge[a_ll, ll[], s_, _] := List @@ Flatten[ll[s, a], Infinity, ll];
merge[ll[], b_ll, s_, _] := List @@ Flatten[ll[s, b], Infinity, ll];
merge[ll[a1_, atail_], b : ll[b1_, _], s_, f_: LessEqual] /;f[a1, b1] :=
merge[atail, b, ll[s, a1], f];
merge[a : ll[a1_, _], ll[b1_, brest_], s_, f_: LessEqual] :=
merge[a, brest, ll[s, b1], f];
merge[a_List, b_List, f_: LessEqual] :=
merge[toLinkedListGen@a, toLinkedListGen@b, ll[], f];
```

(where I replaced `toLinkedList`

with `toLinkedListGen`

, the latter defined previously in this post).

Four additional important things to note here: first, note how we can effectively combine pattern-matching on linked lists and function overloading (more than one definition / global rule), to end up with code which is quite expressive and self-documenting.

Second, note that the last definition is for normal lists, and automatically translates them to linked lists first, while the first two definitions automatically translate the result into the flat list again. This is a common technique, it allows the function to be self-contained, in that linked lists are only used internally, while both input and output is in terms of normal lists.

Third, you can see that the list of accumulated results (which is also a linked list), is being passed to the function as its third argument. This is also a common /standard technique - make the function pass all additional information that it changes in the process, as extra parameter(s). Not only does this allow one to avoid introducing mutable state into the function (to hold this changing information), but also, and perhaps more importantly, it allows the function to be tail-recursive in the Mathematica sense. This is an absolute requirement for recursive solutions to be practical. The main reason why the function won't be tail-recursive otherwise is explained in the link I just gave, but basically you will have something like

```
f[x_]:= (
Do-something-with-mutable-state;
f[x-1]
)
```

And the presence of `CompoundExpression`

around is what makes it non-tail-recursive.

Last thing to note is that, for large enough lists, one may need to increase the setting of `$IterationLimit`

, or set it to infinity altogether. This can always be done on the level of function's definitions. For example, for `merge`

, we could change the last definition to

```
merge[a_List, b_List, f_: LessEqual] :=
Block[{$IterationLimit = Infinity},
merge[toLinkedListGen@a, toLinkedListGen@b, ll[], f]
];
```

Note also that if the system complains about `$RecursionLimit`

being exceeded, this means that your function is not properly tail-recursive (as far as Mathematica is concerned). All properly tail-recursive Mathematica functions can only exceed `$IterationLimit`

, but not `$RecursionLimit`

.

### General list processing based on some look-ahead analysis

A typical class of use cases for linked lists consists of problems of splitting lists into various groups, according to their structure. I will show a simple example, takesn from here, where the built-in `Split`

would do, but in general `Split`

will not suffice, since the splitting condition may involve more than just two adjacent elements.

This function will count the lengths of sequence of adjacent elements, using linked lists:

```
Clear[cValsRec];
cValsRec[l_List] := cValsRec[toLinkedList@l, {{}, 0}];
cValsRec[{}, counts_] := Most@Flatten[counts];
cValsRec[{val_, tail : {val_, _List}}, {prev_List, count_}] :=
cValsRec[tail, {prev, count + 1}];
cValsRec[{val_, tail_List}, {prev_List, count_}] :=
cValsRec[tail, {{prev, count + 1}, 0}];
```

and it can be used e.g. as

```
test = Sort@RandomInteger[5, 15]
(* {0, 0, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5} *)
cValsRec[test]
(* {2, 3, 3, 4, 3} *)
```

As before, you can see the same pattern: the first definition is a "starting" one and converts initial list to a linked list. The last definition describes the final stage and conversion the result (which is also accumulated in a linked list) back to normal list. And the results are accumulated in a separate linked list which is passed to the next function call as a separate argument.

Sometimes, instead of splitting the list, one has to do something else, like e.g. deleting or adding some elements, but also based on near-by elements. All these cases are problems for which functional iterators such as `Fold`

are generally ill-suited. I also add such cases to the set of cases I cover in this section, since they are conceptually similar.

Some further links for similar problems solved using linked lists:

I particularly recommend to look into the code of the last two links, since those use a number of interesting techniques in combination.

Some general remarks for this section: first, I'd like to stress one clear advantage of linked lists for this type of problems: they are *completely straight-forward* to use. You don't have to figure out some clever vectorizing scheme or sequence of list-manipulation steps. You don't have to know all subtleties of various built-in functions involved, and their combinations. This means that linked lists provide a straight-forward implementation path for users who are not Mathematica experts.

Second, the code using linked lists and recursion tends to be readable and self-documenting. This is quite important, since cryptic code is hard to understand and maintain, and it is easier to create bugs when the code is cryptic.

Third, code based on linked lists can be completely generic, since it does not rely on vectorization, packed arrays etc. At the same time, it remains completely idiomatic, since linked lists are immutable, and no mutable state is used. Admittedly, for numerical problems where the structure of the data may allow (much) more efficient vectorized solutions, linked lists may not be the fastest. But in those cases, one can dispatch to such special cases automatically, as it has been done e.g. for the merge function here.

### Using linked lists for tree traversals and tree-building

This is a more advanced use of linked lists, and here they can fully reveal their potential. The thing is, problems involving trees are usually naturally recursive, and while for flat list creation / processing it is very often possible to get away by using standard Mathematica arsenal for list manipulation, here itis often not enough.

While I plan to extend this part of the post soon, right now I will illustrate this with one relatively simple example - a solution for a variant of the cutting stock problem in *Mathematica*, and then provide links to a few more involved cases.

So, the problem there was to list the ways in which one could sum elements from a list (with repetitions), so the sum is less-than-or-equal to some value. So, here are the values of the list and the maximum value, from the original question:

```
i = {7.25, 7.75, 15, 19, 22};
m = 22;
```

The solution using linked lists is fairly simple:

```
ClearAll[getCombs];
getCombs[l_List,lim_Integer]:=
Cases[
getCombs[{},toLinkedList[Reverse[Sort[l]]],lim],
getCombs[acc:Except[{}],__]:>Flatten[acc],
\[Infinity]
];
getCombs[accum_,l:{head_,tail_List},lim_?Positive]:=
{
If[lim>=head,getCombs[{accum,head},l,lim-head],Sequence@@{}],
getCombs[accum,tail,lim]
};
```

The first definition is both the "starting" and "ending" one, and it realizes the "wrap-around definition" pattern - that is, preprocess data (in this case, convert to linked list the original list), call the main function, and then post-process the data (`Cases`

here, I will explain what it does in a moment).

The second definition is the one doing the real work. Borrowing the explanation from the original answer, the way it works is to create a tree by going backwards, subtracting a given element from the current maximal for the sum, to get the maximal for the sums of remaining elements. The case of repetitions is accounted for by the line `If[lim>=head,...]`

. The resulting numbers are also accumulated in linked lists (`accum`

parameter), and finally they are found with `Cases`

and converted back to normal lists using `Flatten`

. In this way, we avoid unnecessary array copying.

As you can see, the most important difference here is that a tree is created as a result, rather than a list. That tree is then post-processed, and the results collected from it, using `Cases`

. This construction allows us to take the best from both worlds: linked lists + recursion, and pattern-based expression destructuring. And since the patterns are "syntactic" See my comment to a post here, it is really fast. Because the algorithm is essentially recursive, and the result is a tree rather than a more regular list, the standard list-manipulation approaches are at a loss here.

Here are some more links to applications of linked lists involving trees:

In some of the examples linked, the use of linked lists or associated data structures (and trie can be viewed as a generalization of a linked list) led to some quite dramatic performance boosts. You can read explanations in the linked posts for more details.

## Remarks

I tried to outline the usefulness of linked lists and illustrate the main ideas behind their use, and techniques associated with them. One of the main advantages they offer is the possibility of direct and straight-forward implementation of recursive algorithms in Mathematica, with quite reasonable performance.

This post is a work in progress. I plan to add more links, and make a special separate link section where all links will be collected. I also may add a couple of more sections, such as a section of using linked lists to trade speed for memory.

All remarks, suggestions and additions are welcome and encouraged - this is a community post as are all answers here.

5Great question, +1. – Leonid Shifrin – 2013-05-10T13:45:41.373

Stating the obvious, but I'm expecting a great answer from Leonid :) – Jacob Akkerboom – 2013-05-10T13:54:46.030

One thing to mention is that an

idiomis a pattern that has gained some acceptance in the community and is commonly used (thus it's somewhat a cultural thing, just like in a natural language). We can identify idioms by looking at what code people typically use. A design pattern can also gain acceptance in the community and become widely known and tested (tested in the sense that it's been proven to be a robust and convenient design by practical usage). But theoretically it's possible to invent completely new design patterns that are not commonly used. The inventor, if s/he posts here, – Szabolcs – 2013-05-12T21:08:51.117would have to make a case for why his design pattern is effective and why it will work well in a broad range of situation, how it will solve common problems, ect. This is going to be more difficult unless the inventor can present some convincing case studies. My point is that a pattern can be idiomatic (commonly used); it can also be an effective design. The are two independent things. – Szabolcs – 2013-05-12T21:11:37.833

3@JacobAkkerboom I have spent several hours on this, and identified a few dozens of patterns so far. Some of them have been mentioned by others (particularly recent post of @Szabolcs). However, to my mind, a huge amount of work is needed to write posts which really show people how to use those patterns and give enough links, rather than just stating a very brief description (which is what is being currently done for most of them, and most of which is understandable by experts only). As time allows, I will add descriptions for some of them, which may overlap with those already posted by others). – Leonid Shifrin – 2013-05-13T13:44:49.627