## How do I obtain an intersection of two or more list of lists conditioned on the first element of each sub-list?

22

10

Given two lists like

list1 = {{1, 1}, {2, 4}, {3, 9}, {4, 16}};
list2 = {{2, 6}, {3, 9}, {4, 12}, {5, 15}};


I would like to produce an output like

listout = {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}}


14

Good question. Second try.

f1[a_List, b_List] :=
Reap[Sow[#2, #] & @@@ a ~Join~ b, a[[All, 1]] ⋂ b[[All, 1]], List][[2, All, 1]]

f2[a_List, b_List] :=
With[{aa = a[[All, 1]], bb = b[[All, 1]]},
{#[[1, 1]], #[[All, 2]]} & /@
Pick[a ~Join~ b, aa ~Join~ bb, Alternatives @@ (aa ⋂ bb)] ~GatherBy~ First
]


Edit: Here is another method using GatherBy. While this method did not come to mind when I first wrote this answer I have used related methods for some time. It works by preconditioning GatherBy so that we collect the expressions we want at the beginning of the results and then discarding the rest. This is the same principle I used for How to Delete Elements from List1 appearing in List2? and more recently Complement on pre-sorted lists, and which jVincent used for Counting the population of integers.

f3[a_List, b_List] :=
With[{pre = List /@ ( a[[All, 1]] ⋂ b[[All, 1]] )},
{pre[[All, 1]], GatherBy[Join[pre, a, b], First][[;; Length@pre, 2 ;;, 2]]}\[Transpose]
]


At the expense of greater code length this can be made faster by using Szabolcs's inversion method with GatherBy:

f4[a_List, b_List] :=
Module[{pre, first, all, n, a1, b1},
{a1, b1} = {a[[All, 1]], b[[All, 1]]};
n = Length[pre = a1 ⋂ b1];
first = Join[pre, a1, b1];
all = Join[a, b];
{pre, Map[all[[#, 2]] &,
GatherBy[Range@Length@first, first[[#]] &][[;; n, 2 ;;]] - n]}\[Transpose]
]


Test:

list1 = {{1, 1}, {2, 4}, {3, 9}, {4, 16}};
list2 = {{2, 6}, {3, 9}, {4, 12}, {5, 15}, {5, 7}};

f1[list1, list2]

{{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}}


I included {5, 7} in list2 to show that this is finding the intersection of the two lists and not merely repeats within a single list.

For those of us who cannot read infix: f2[a_, b_] := ({#1[[1, 1]], #1[[All, 2]]} & ) /@ GatherBy[ Cases[Join[a, b], {Alternatives @@ Intersection[a[[All, 1]], b[[All, 1]]], _}], First] – Mike Honeychurch – 2012-12-17T06:56:34.710

Looks fine to me. Would be interested to know how Apply Alternative compares with the MemberQ test for larger lists. I normally use Alternative but forgot all about it when I wrote my answer – Mike Honeychurch – 2012-12-17T06:57:58.767

@Mike Alternatives test quite a bit faster than MemberQ for me. For pure speed this seems best of what I've tried: Pick[#, First /@ #, Alternatives @@ inter] & @ Join[a, b] where inter is the first elements intersection. – Mr.Wizard – 2012-12-17T07:10:47.750

Yes Pick will generally provide a much faster solution than e.g. Cases/Select if you can develop a Pick method. – Mike Honeychurch – 2012-12-17T07:35:58.720

11

this works for the given example:

ReplaceList[{list1, list2}, {{___, {a_, b_}, ___}, {___, {a_, c_}, ___}} :> {a, {b, c}}]

(* {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}} *)


2Very clever! However, this runs into problems if there are duplicates (in terms of first element) internally in either list. It is also rather slow on longer lists. Enthusiastic +1 nevertheless! – Mr.Wizard – 2012-12-17T07:46:27.657

2An additional note: make sure you use RuleDelayed (:>) when working with named patterns on the right-hand side. This correctly localizes the symbols. I made this edit for you; I hope you don't mind. – Mr.Wizard – 2012-12-17T07:56:42.807

9

Update 3: A generalization for any number of lists and any column as the key:

 ClearAll[combineBy];
combineBy[lists : __List, col_Integer] /; (col <= Min[Length /@ # & /@ {lists}]) :=
With[{intNodes =  Alternatives @@ Intersection @@ (#[[col]] & /@ # & /@ {lists}),
joined = GatherBy[Join[lists], #[[col]] &],
othercols = DeleteCases[Range[Min[Length /@ # & /@ {lists}]], col]},
{#[[1, col]], Join @@ #[[All, othercols]]} & /@
Pick[joined, ! FreeQ[#[[1, col]], intNodes] & /@ joined]]


OP's example:

  list1 = {{1, 1}, {2, 4}, {3, 9}, {4, 16}};
list2 = {{2, 6}, {3, 9}, {4, 12}, {5, 15}, {5, 7}};
combineBy[list1, list2, 1]
(* {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}} *)
combineBy[list1, list2, 2]
(* {{9, {3, 3}}} *)


More examples:

  list3 = Table[RandomSample[Range[7], 3], {3}];
list4 = Table[RandomSample[Range[7], 3], {4}];
list5 = Table[RandomSample[Range[7], 3], {6}];

Prepend[Prepend[{SpanFromAbove, SpanFromAbove,  #,
Column[combineBy[list4, list5, #]]} & /@ {2, 3},
{Column@list4, Column@list5, 1, Column[combineBy[list4, list5, 1]]}],
{"list4", "list5", "key column", "result"}] //
Grid[#, Alignment -> {Center, Center}, Dividers -> All] &


  Prepend[Prepend[{SpanFromAbove, SpanFromAbove, #,
Column[combineBy[list3, list4, #]]} & /@ {2, 3},
{Column@list3, Column@list4, 1, Column[combineBy[list3, list4, 1]]}],
{"list3", "list4", "key column", "result"}] //
Grid[#, Alignment -> {Center, Center}, Dividers -> All] &


  Prepend[Prepend[{SpanFromAbove, SpanFromAbove,  #,
Column[combineBy[list3, list5, #]]} & /@ {2, 3},
{Column@list3, Column@list5, 1,  Column[combineBy[list3, list5, 1]]}],
{"list3", "list5", "key column", "result"}] //
Grid[#, Alignment -> {Center, Center}, Dividers -> All] &


  Prepend[Prepend[{SpanFromAbove, SpanFromAbove, SpanFromAbove,  #,
Column[combineBy[list3, list4, list5, #]]} & /@ {2, 3},
{Column@list3, Column@list4, Column@list5, 1,
Column[combineBy[list3, list4, list5, 1]]}],
{"list3", "list4", "list5", "key column", "result"}] //
Grid[#, Alignment -> {Center, Center}, Dividers -> All] &


  ClearAll[combine];
combine[list1_List, list2_List] :=
With[{intNodes = Intersection[First /@ list1, First /@ list2],
joined =  GatherBy[Join[list1, list2], First]},
{First[First@#],  Last[#]} & /@ (Transpose /@
Select[joined, MemberQ[intNodes, #[[1, 1]]] &])]

combine[list1, list2]
(* {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}} *)
combine[list1, {{2, 6}, {3, 9}, {4, 12}, {5, 15}, {5, 7}}]
(* {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}} *)


(Updated with correction thanks to @Mr.W's comment: the second argument of Select is changed from Length[#]>2& in the original post to the correct version that accounts for the intersection of the first columns of the two lists.)

Update 2: Using Pick instead of Select:

ClearAll[combine2];
combine2[list1_List, list2_List] :=
With[{intNodes = Intersection[First /@ list1, First /@ list2],
joined =  GatherBy[Join[list1, list2], First]},
{#[[1, 1]], #[[-1]]} & /@
(Transpose /@ Pick[joined, MemberQ[intNodes, #[[1, 1]]] & /@ joined])]


@Mr.W thank you ... updated with correction. – kglr – 2012-12-17T06:54:29.023

Gladly up-voted. – Mr.Wizard – 2012-12-17T06:59:00.093

Funny, I was just trying Pick myself. See the comments below my answer. Great minds and all that. – Mr.Wizard – 2012-12-17T07:12:13.307

@Mr.W, of the triplets (Cases, Select,Pick), Pick is almost always the first that I try ... and it rarely disappoints. – kglr – 2012-12-17T07:38:53.930

6

I like this one:

{#[[1,1]],#[[All,2]]}&/@Select[GatherBy[list1~Join~list2,First],Length[#]>1&]

(*{{2,{4,6}},{3,{9,9}},{4,{16,12}}}*)


5

Edit 1

processList[list1_, list2_] := Module[{intersection, tmp1, tmp2},

(* find the intersection of the all the first elements *)
intersection = Intersection[list1[[All, 1]], list2[[All, 1]]];

(* now find cases in each list in which the first element is one of the intersecting
elements *)
tmp1 = Cases[list1, {x_ /; MemberQ[intersection, x], __}];
tmp2 = Cases[list2, {x_ /; MemberQ[intersection, x], __}];

(* now gather all sub-lists based on the first element and map them to give the
desired output *)
{#[[1, 1]], ##[[All, 2]]} & /@ GatherBy[Join[tmp1, tmp2], First]
]


test:

processList[list1, {{2, 6}, {3, 9}, {4, 12}, {5, 15}, {5, 7}}]
(* {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}} *)


Edit 2

Since we're onto Pick methods :) ...this seems relatively concise. Two steps from above and then pick out the elements for output.

processList2[list1_, list2_] := Module[{intersection, tmp},

intersection = Intersection[list1[[All, 1]], list2[[All, 1]]];
tmp = {#[[1, 1]], ##[[All, 2]]} & /@ GatherBy[Join[list1, list2], First];
Pick[tmp, tmp[[All, 1]] /. Thread[Rule[intersection, True]]]
]

list2 = {{2, 6}, {3, 9}, {4, 12}, {5, 15}, {5, 7}};
processList2[list1, list2]
(* {{2, {4, 6}}, {3, {9, 9}}, {4, {16, 12}}}  *)


For all of these Pick methods I'm not sure how efficient creating the stencil can be for this particular problem. If the stencil is efficiently created then Pick is a fast way of picking out elements from lists.

How does this find an intersection of the two lists? – Mr.Wizard – 2012-12-17T06:08:02.407

Re: Edit, that looks like a more intelligent approach than the one I just posted. I hope you don't mind if I borrow from it. :-) – Mr.Wizard – 2012-12-17T06:21:22.603

@Mr.Wizard not sure how intelligent it is, or efficient. Just a stepwise approach. :) – Mike Honeychurch – 2012-12-17T06:24:13.137

Please take a look at my answer now and tell me if you see anything wrong. – Mr.Wizard – 2012-12-17T06:43:16.303

Perhaps I should insist on your removing infix notation before looking at it :) – Mike Honeychurch – 2012-12-17T06:58:25.367

5

a = Join[list1, list2]
n = Length[list1];
{a[[#, 1]], {a[[#, 2]], a[[# + n - 1, 2]]}} & /@ Range[2, n]

![Mathematica graphics](http://i.stack.imgur.com/FXKvd.png)


Explanation: Use linear indexing. We have 2 matrices as input. list1 and list2. Each is an n by 2 size matrix. Joining them results in one 2n by 2 matrix called a. This diagram explains the algorithm

Am I correct in observing that this only works if the two lists have a common ordered subset, e.g. {2, 3, 4}? – Andrew Cheong – 2013-08-25T03:34:28.187

2

Just for fun, a rule-based approach:

list2/.{{i_Integer/;!MemberQ[list1[[All,1]],i],x_}:>Sequence[],{i_Integer,x_}:>{i,{i/.(Rule@@@list1),x}}}