Quick multiple selections from a list

What is the fastest way to make multiple selections from a list? Answers using compiled methods are welcome.

For example, here are two methods for selecting a subset, compared:-

biglist = {{5, "e", 500}, {4, "d", 400},
   {3, "c", 300}, {2, "b", 200}, {1, "a", 100}};
subset = {2, 5, 4};

Cases[biglist, {#, __}] & /@ subset

{{{2, "b", 200}}, {{5, "e", 500}}, {{4, "d", 400}}}

Cases[biglist, {Apply[Alternatives, subset], __}]

{{5, "e", 500}, {4, "d", 400}, {2, "b", 200}}

Only the first method returns the items in the order of the subset list, but it is much slower:-

n = 10000;
biggerlist = Map[{#, FromCharacterCode[Mod[# - 1, 26] + 97], #*100} &,
   Range[n]];
unsortedbiglist = RandomSample[biggerlist, n];
unsortedsubset = RandomSample[Range[n], Round[n/10]];

Row[{First[Timing[selection1 = Map[Cases[unsortedbiglist, {#, __}] &,
       unsortedsubset];]], " seconds"}]

1.123 seconds

Row[{First[Timing[selection2 = Cases[unsortedbiglist,
       {Apply[Alternatives, unsortedsubset], __}];]], " seconds"}]

0.171 seconds

The selections are the same, but differently ordered:-

SameQ[Flatten[selection1, 1],
 Extract[selection2,
  Flatten[Map[Position[First /@ selection2, #] &,
    unsortedsubset], 1]]]

True

Including the sorting routine in the selection process still gives a better timing:-

Row[{First[Timing[selection3 = Function[selection2,
        Extract[selection2,
         Flatten[Map[Position[First /@ selection2, #] &,
           unsortedsubset], 1]]][Cases[unsortedbiglist,
        {Apply[Alternatives, unsortedsubset], __}]];]], " seconds"}]

0.359 seconds

SameQ[Flatten[selection1, 1], selection3]

True

Nevertheless, only the slow Cases method returns {} when there are unmatched subset items, which is sometimes useful.
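
To make the difference concrete, here is a small sketch on the five-row biglist from the top of the question. The key 9 and the names withMissing, perKey and oneSweep are made up for illustration; 9 has no matching row.

```mathematica
biglist = {{5, "e", 500}, {4, "d", 400},
   {3, "c", 300}, {2, "b", 200}, {1, "a", 100}};
withMissing = {2, 9, 4};  (* 9 is a hypothetical key with no matching row *)

(* One Cases call per key: keeps the key order and leaves {} for the missing key *)
perKey = Cases[biglist, {#, __}] & /@ withMissing
(* {{{2, "b", 200}}, {}, {{4, "d", 400}}} *)

(* One sweep with Alternatives: follows the order of biglist and drops the missing key silently *)
oneSweep = Cases[biglist, {Alternatives @@ withMissing, __}]
(* {{4, "d", 400}, {2, "b", 200}} *)
```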

Ideas for speedy selections would be great.

Chris Degnen

Posted 2013-02-07T11:30:24.780

Reputation: 27 033

Answers

14

This seems to give a rather decent performance (final version with improvements by jVincent):

Clear[getSubset];
getSubset[input_List,sub_List]:=
Module[{inSubQ,sowMatches},
    Scan[(inSubQ[#] := True)&,sub];
    sowMatches[x_/;inSubQ@First@x] := Sow[x,First@x];
    Apply[Sequence, Last@Reap[Scan[sowMatches, input], sub], {2}]
];
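
For readers unfamiliar with the tag-list form of Reap, the following is a minimal sketch of the mechanics the function appears to rely on. The names rows and reaped are illustrative and not part of the answer's code. Each row is sown under its first element as the tag, Reap collects the rows in the order of the requested tags, and Apply[Sequence, ..., {2}] removes one level of nesting.

```mathematica
rows = {{2, "b", 200}, {4, "d", 400}};  (* illustrative data *)

(* Sow each row with its first element as the tag; request tags 4, 7, 2 in that order *)
reaped = Reap[Scan[Sow[#, First[#]] &, rows], {4, 7, 2}]
(* {Null, {{{{4, "d", 400}}}, {}, {{{2, "b", 200}}}}} *)

(* Splice out the per-tag wrapper at level 2; the never-sown tag 7 stays as {} *)
Apply[Sequence, Last[reaped], {2}]
(* {{{4, "d", 400}}, {}, {{2, "b", 200}}} *)
```

The result follows the order of the tag list rather than the order of rows, which is what makes the single pass over the input sufficient.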

Benchmarks:

n = 10000;
biggerlist =  Map[{#, FromCharacterCode[Mod[# - 1, 26] + 97], #*100} &, Range[n]];
unsortedbiglist = RandomSample[biggerlist, n];
unsortedsubset = RandomSample[Range[n], Round[n/10]];

Row[{First[Timing[selection1=Map[Cases[unsortedbiglist,{#,__}]&,unsortedsubset];]]," seconds"}]

(* 1.170008 seconds  *)

(sel1 = getSubset[unsortedbiglist,unsortedsubset])//Short//Timing

(* {0.031,{{{8286,r,828600}},<<998>>,{{6420,x,642000}}}}  *)

selection1===sel1

(* True *)

Leonid Shifrin

Posted 2013-02-07T11:30:24.780

Reputation: 108 027

I updated the code and reran the test and timing. Feel free to revert if it's not to your liking. – jVincent – 2013-02-07T13:37:36.550

@jVincent Thanks, looks good to me. – Leonid Shifrin – 2013-02-07T13:44:55.043

Very nice. Also, adding ... sub] /. {} -> {{}} to the last line makes it handle non-matching sublist items the same as Cases. – Chris Degnen – 2013-02-07T14:19:59.373

@ChrisDegnen Great. Feel free to update the code in the answer if you feel this change is important. – Leonid Shifrin – 2013-02-07T14:27:42.560

6

 Timing[selection3 = Pick[unsortedbiglist, unsortedbiglist[[All, 1]], 
   Alternatives @@ unsortedsubset];]
 (* {0.218401, Null}  -- same as Cases[..., Alternatives@@ ..] *)
 selection3 == selection2
 (* True *)

kglr

Posted 2013-02-07T11:30:24.780

Reputation: 302 076

Yes, same result and timing. – Chris Degnen – 2013-02-07T12:41:04.067

6

Here is my take on Leonid's method. It's better because it's shorter and uses ~infix~. ;-)
(It's just a little bit faster, too: about 20% on his test.)

getSubset2[input_List, sub_List] := Module[{test},
  (test@# = True) & ~Scan~ sub;
  Apply[Sequence,
   Reap[Cases[input, x:{y_?test, ___} :> x ~Sow~ y], sub][[2]],
   {2}
  ]
]

getSubset2[Range@20 ~Partition~ 4, Prime ~Array~ 7]
{{}, {}, {{5, 6, 7, 8}}, {}, {}, {{13, 14, 15, 16}}, {{17, 18, 19, 20}}}

Although it is slower, I cannot pass by the more direct implementation without comment:

getSubset3[input_List, sub_List] :=
  Last @ Reap[# ~Sow~ #[[1]] & ~Scan~ input, sub, Sequence @@ #2 &]

Also slower than getSubset2, but pleasingly clean: Association can be applied to this problem in the form of GroupBy and Lookup.

getSubset4[set_, sub_] := Lookup[set ~GroupBy~ First, sub, {}]

getSubset4[Range@20 ~Partition~ 4, Prime ~Array~ 7]
{{}, {}, {{5, 6, 7, 8}}, {}, {}, {{13, 14, 15, 16}}, {{17, 18, 19, 20}}}
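
As a rough illustration of the two steps, assuming version 10 or later (where Association is available), the intermediate result can be inspected on a tiny list. The names data and byKey are made up for this sketch.

```mathematica
data = {{5, "e"}, {4, "d"}, {5, "z"}};  (* illustrative rows; key 5 occurs twice *)

(* Step 1: group the rows by their first element *)
byKey = GroupBy[data, First]
(* <|5 -> {{5, "e"}, {5, "z"}}, 4 -> {{4, "d"}}|> *)

(* Step 2: look the keys up in the requested order, with {} as the default for missing keys *)
Lookup[byKey, {4, 7, 5}, {}]
(* {{{4, "d"}}, {}, {{5, "e"}, {5, "z"}}} *)
```

Rows that share a key stay together in their original order, and a key with no match returns the default {}, which mirrors the behaviour of the per-key Cases method.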

Mr.Wizard

Posted 2013-02-07T11:30:24.780

Reputation: 259 163

Thanks, I was wondering how Sequence could be used. – Chris Degnen – 2013-02-11T13:02:27.543

This is so useful! Apply[Sequence, {{}, {{}}}, {2}] – Chris Degnen – 2013-02-28T11:21:16.687

@Chris I'm happy to have planted the seed. :-) – Mr.Wizard – 2013-02-28T11:23:31.327

@Chris Apply[Sequence, lis, {-3}] is a solution to this question that I don't believe anyone has posted. You should. – Mr.Wizard – 2013-02-28T11:28:07.420