Combining values in one column (or part) when values in another column (or part) match

21

5

Consider the following:

data={{{3542313600, 175}, {3542918400, 175}}, {{3544128000, 450},
{3544128000, 450}, {3545337600, 450}, {3545942400, 450}, {3546547200, 450}}};

whereas the first element of each {_,_} represent a date in absolute time, i.e. {_,_} corresponds to the pattern {AbsoluteDate_,Value_}.

Now I would like to calculate the sum of all Value_ when the corresponding AbsoluteDate_ is equal, e.g.

{3545337600, 450}, {3545337600, 450}-> {3545337600, 900}.

John

Posted 2012-04-16T23:21:34.763

Reputation: 4 161

Do you mind if your list gets sorted? – rm -rf – 2012-04-16T23:28:26.523

No, I don't mind, because I can get the order back to the original one later. – John – 2012-04-16T23:29:29.323

2Looks like there are many ways to skin the cat. – DavidC – 2012-04-17T00:50:44.127

1@David: in Mathematica? Always. – J. M.'s ennui – 2012-04-17T11:58:39.150

3@John moving-target questions are generally discouraged. You should consider posting your update as a new question. – Mr.Wizard – 2012-04-17T21:22:08.390

1

I changed everything back and posted a new question: http://mathematica.stackexchange.com/questions/4379/list-operations-only-when-restrictions-are-fulfilled-part-2

– John – 2012-04-17T21:32:02.840

Answers

9

The following also works (though its a bit messy to look at) and will probably be faster than pattern matching if you have large input data.

Transpose[{#[[All, 1, 1]], 
      Total[#[[All, All, 2]], {2}]}] &[#] & /@ (GatherBy[#, First] & /@
    data)

EDIT:

I should point out that there are at least a couple of different problems being solved in the answers posted here. The results posted by myself and @RM attempt to preserve the structure of the original data and so do not gather like dates if they are present in multiple sublists.

Answers posted by @David Carraher and @rcollyer ignore the structure of the original data and gather all like dates.

I'm not sure which is the correct approach in this case but it felt worth pointing out the difference.

Andy Ross

Posted 2012-04-16T23:21:34.763

Reputation: 18 640

15

Here's another one to consider:

{#[[1, 1]], Total[#[[All, 2]]]} & /@ GatherBy[Flatten[data, 1], First]

GatherBy... gathers pairs with the same AbsoluteDate.

{#[[1, 1]], Total[#[[All, 2]]]} & /@ returns {date, totVal} for each date.

DavidC

Posted 2012-04-16T23:21:34.763

Reputation: 15 949

This is efficient, readable, and terse. +1 – Mr.Wizard – 2012-04-17T01:37:45.563

8

Assuming that you did not intend to add an extra layer of lists in your data, for this type of manipulation, I'm a fan of Reap and Sow:

Reap[ Sow[#2, #1]& @@@ Flatten[data,1], _, {#1,Total[#2]}&][[2]]
(* {{3542313600, 175}, {3542918400, 175}, {3544128000, 450}, 
    {3544732800, 450}, {3545337600, 900}, {3545942400, 450}, 
    {3546547200, 450}, {3547152000, 0}}
*)

If it wasn't mistaken, we need to Map the above onto each data set:

Reap[ Sow[#2, #1]& @@@ #, _, {#1,Total[#2]}&][[2]]& /@ data
(* {{{3542313600, 175}, {3542918400, 175}}, 
    {{3544128000, 450}, {3544732800, 450}, {3545337600, 900}, 
     {3545942400, 450}, {3546547200, 450}, {3547152000, 0}}}
*)

rcollyer

Posted 2012-04-16T23:21:34.763

Reputation: 32 561

You've got my vote. :-) – Mr.Wizard – 2012-04-17T01:19:51.000

@Mr.Wizard why? Reap and Sow? Or, the "excessive" use of shorthand? Or, better yet, it's a one liner? – rcollyer – 2012-04-17T01:21:26.397

Reap and Sow only. You can do better if you're going for terse (#1 -> #, Total -> Tr at least). – Mr.Wizard – 2012-04-17T01:26:13.737

@Mr.Wizard that's true, but sometimes it pays to include the 1 for clarity. :P – rcollyer – 2012-04-17T01:35:27.940

Agreed. I was simply explaining that if you want to gain my vote on the basis of shorthand it's going to have to be more thorough. ;-) – Mr.Wizard – 2012-04-17T01:39:13.377

7

Here is another way (partially similar to Andy's solution) :

Transpose[{#[[All, 1, 1]], Total[#[[All, All, 2]], {2}]}] &[#]& @ SplitBy[ Flatten[ data, 1], First]

or similar to David's solution :

{#[[1, 1]], Total[#[[All, 2]]]} & /@ SplitBy[ Flatten[ data, 1], First]

Artes

Posted 2012-04-16T23:21:34.763

Reputation: 51 831

6

For data that is not already grouped by the first element:

dat2 = {{baz, 16}, {foo, 65}, {baz, 41}, {bar, 88}, {foo, 2},
        {bar, 96}, {foo, 20}, {bar, 19}, {foo, 36}, {baz, 90}};

GroupBy (introduced in version 10) offers a concise syntax:

res = GroupBy[dat2, First -> Last, Total]
<|baz -> 147, foo -> 123, bar -> 203|>

Note that this provides the flexibility of specifying First and Last as the functions that yield the parts to compare and combine. For the specific format shown a shorter v10 method is to convert the expressions into a Rule list and Merge them:

Merge[Rule @@@ dat2, Total]
<|baz -> 147, foo -> 123, bar -> 203|>

The result is an Association. Keeping the data in this format has a number of benefits but it can also be converted back to a common List with Normal or KeyValueMap:

List @@@ Normal[res]

KeyValueMap[List, res]
{{baz, 147}, {foo, 123}, {bar, 203}}

{{baz, 147}, {foo, 123}, {bar, 203}}

Mr.Wizard

Posted 2012-04-16T23:21:34.763

Reputation: 259 163

2From Mathematica version 10.1, KeyValueMap can be used on the association: KeyValueMap[List, GroupBy[dat2, First -> Last, Total]] or KeyValueMap[List, Merge[Rule @@@ dat2, Total]]. – None – 2016-01-03T13:41:50.800

@Xavier Thanks for the note. I overlooked this comment before. – Mr.Wizard – 2016-01-15T00:19:51.120

5

This solution doesn't involve pattern matching and should be fast for large lists (pattern matching, especially with ___ slows down tremendously):

Block[{g},
    g[{x_?NumericQ, y_?NumericQ}] := {x, y};
    g[x_List, y__List] := {First@x, Total[{x, y}[[All, 2]]]};
    Apply[g, (GatherBy[#, First] & /@ data), {2}]
]

rm -rf

Posted 2012-04-16T23:21:34.763

Reputation: 85 395

I wrote (completely independently) almost the exact same function (even used g as the dummy function!) in a more recent question that I didn't realise was a duplicate of this one. I even got downvoted and annoyed cause I thought it was a decent answer! Naturally I will give you +1. I had defined g[x_?VectorQ] := x; in place of your g[{x_?NumericQ, y_?NumericQ}] := {x, y}; but I doubt this makes any difference here. – gpap – 2014-08-06T14:12:12.280

4

Here's a straightforward one using replacements:

data /. {h___, {x_, y_}, b___, {x_, z_}, t___} :> {h, {x, y + z}, b, t}
(* Out[1]= {{{3542313600, 175}, {3542918400, 175}}, 
    {{3544128000, 450}, {3544732800, 450}, {3545337600, 900}, {3545942400, 450}, 
     {3546547200, 450}, {3547152000, 0}}} *)

Change the /. or ReplaceAll to //. or ReplaceRepeated for lists with multiple repeated first elements. This method doesn't sort the list.

rm -rf

Posted 2012-04-16T23:21:34.763

Reputation: 85 395