Improving Map Function on Lists

8

2

I am working with spectral data (a 2D list) where an offset (a 1D list with one value per spectral point) must be subtracted from each individual spectrum. I am looking for feedback on improvements over my current method.

    offset = {53.0617, 52.5185, 53.2469, 52.8025, 53.2716, 53.284, 53.2716, 53.6049, 53.5062, 53.642};

    data = {{-0.0617284, -1.51852, -2.24691, -1.80247, -2.2716, -0.283951, -2.2716, -4.60494, -2.50617, -2.64198}, {0.938272, -1.51852, -0.246914, 0.197531, -0.271605, 0.716049, 1.7284, 0.395062, -0.506173, 1.35802}, {-0.0617284, 0.481481, -0.246914, 0.197531, -0.271605, -0.283951, -1.2716, -2.60494, -2.50617, -0.641975}, {-4.06173, -3.51852, -2.24691, -1.80247, -2.2716, -2.28395, -2.2716, -2.60494, -2.50617, -0.641975}, {-0.0617284, -1.51852, -2.24691, -1.80247, -2.2716, -2.28395, -2.2716, -2.60494, -2.50617, -4.64198}, {-2.06173, -1.51852, -0.246914, 0.197531, -2.2716, -2.28395, -1.2716, -0.604938, -0.506173, -2.64198}, {-4.06173, -3.51852, -2.24691, -3.80247, -4.2716, -3.28395, -2.2716, -3.60494, -4.50617, -4.64198}, {-3.06173, -3.51852, -2.24691, -3.80247, -3.2716, -2.28395, -2.2716, -3.60494, -4.50617, -2.64198}, {-2.06173, -3.51852, -2.24691, -3.80247, -3.2716, -2.28395, -2.2716, -2.60494, -2.50617, -4.64198}, {-1.06173, -3.51852, -2.24691, -1.80247, -4.2716, -3.28395, -1.2716, -0.604938, -0.506173, -0.641975}};

I currently always use the following Mathematica snippet for processing:

    data = (# - offset) & /@ data;

Is there a better use of Thread, Map, etc. that I should consider? The data sets typically include thousands of spectra, each 1000-2000 points long, so the 2D list holds a few million values.

OpticsMan

Posted 2019-12-14T19:47:41.230

Reputation: 191

What do you mean by "better"? More performant? – Carl Lange – 2019-12-14T20:03:08.737

In what sense better? Faster? – mikado – 2019-12-14T20:03:09.920

@mikado Jinx - we asked the same question within one second of each other! – Carl Lange – 2019-12-14T20:04:32.910

Faster is desired. Not concerned about memory usage, etc. – OpticsMan – 2019-12-14T20:16:26.513

Thanks for the suggestions. Never considered compiling such a simple expression. But the 5x speed-up is impressive. – OpticsMan – 2019-12-15T17:55:32.603

Related: (23395) – Mr.Wizard – 2019-12-15T18:41:12.387

Answers

7

    (* Test data: m spectra of n points each, and one offset per point *)
    {n, m} = {10^4, 10^4};
    offset = RandomReal[1, n];

    data = RandomReal[1, {m, n}];

    (* Compiled subtraction; Listable lets it thread over the rows of a matrix *)
    cf = Compile[{{v, _Real, 1}, {offset, _Real, 1}},
       Table[v[[i]] - offset[[i]], {i, Length[v]}],
       RuntimeAttributes -> {Listable}, CompilationTarget -> "C",
       RuntimeOptions -> "Speed"];

    (* r1: original Map, r2: explicit offset matrix, r3: Outer, r4: compiled *)
    r1 = (# - offset) & /@ data; // RepeatedTiming
    r2 = Plus[data, ConstantArray[-offset, m]]; // RepeatedTiming
    r3 = ArrayReshape[
        Outer[Plus, Developer`ToPackedArray@{-offset}, data, 1],
        {m, n}]; // RepeatedTiming
    r4 = cf[data, offset]; // RepeatedTiming

    (* All four results should agree *)
    r1 == r2 == r3 == r4

Output

{1.08, Null}

{0.557, Null}

{0.233, Null}

{0.20, Null}

True
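The benchmark above allocates 10^4 × 10^4 real matrices (roughly 800 MB each), so it may help to check the equivalence of the variants on a tiny example first. The following is only a sketch: the sizes, the seed, and the names `s1`, `s2`, `s3` are illustrative assumptions, and the `Transpose` variant is an extra idiom that is not part of the answer above.

```mathematica
(* Small stand-ins for the benchmark data; sizes and seed are assumptions *)
SeedRandom[1];
{n, m} = {5, 3};
offset = RandomReal[1, n];
data = RandomReal[1, {m, n}];

(* Original approach: subtract the offset vector from each row *)
s1 = (# - offset) & /@ data;

(* Build an m x n matrix of negated offsets and add it in one step *)
s2 = data + ConstantArray[-offset, m];

(* Extra variant (not from the answer): a vector subtracted from a matrix
   threads over the first level, so transpose, subtract, transpose back *)
s3 = Transpose[Transpose[data] - offset];

(* Should evaluate to True if the three variants agree *)
s1 == s2 == s3
```

Choosing `n` different from `m` makes a length mismatch show up immediately as a threading error if the offsets are applied along the wrong dimension.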

chyanog

Posted 2019-12-14T19:47:41.230

Reputation: 11 827

6

A slightly faster method uses KroneckerProduct to create a suitable matrix of offsets. Some data:

    {n, m} = {10^4, 10^4};
    offset = RandomReal[100, n];
    data = RandomReal[100, {m, n}];

Your method:

    r1 = (# - offset) & /@ data; // AbsoluteTiming

{1.80568, Null}

Using KroneckerProduct:

    r2 = data + KroneckerProduct[ConstantArray[-1., m], offset]; // AbsoluteTiming

{0.830738, Null}

Check:

    r1 == r2

True

Carl Woll

Posted 2019-12-14T19:47:41.230

Reputation: 112 778

While I love a good use of KroneckerProduct, why not just use ConstantArray[-offset, m]? – NonDairyNeutrino – 2019-12-14T22:29:02.933
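As a rough illustration of the point raised in this comment, the two constructions can be compared on a tiny input. This is a sketch under assumed values: the three-element `offset`, `m = 2`, and the names `kp` and `ca` are made up for the example.

```mathematica
(* Tiny illustrative inputs; these values are assumptions, not from the thread *)
offset = {1., 2., 3.};
m = 2;

(* Offset matrix as built in the answer: product of a vector of -1. with offset *)
kp = KroneckerProduct[ConstantArray[-1., m], offset];

(* Offset matrix as suggested in the comment: m copies of the negated offset *)
ca = ConstantArray[-offset, m];

(* Both should be the same m x Length[offset] matrix *)
kp == ca
```

If both forms give the same matrix, the choice between them comes down to timing and readability, which can be measured with `AbsoluteTiming` on data of realistic size.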

4

The Map version is efficient compared with MapThread. In the test below, which tiles the question's data to 10^6 rows, Map is roughly seven times faster.

    data2 = Flatten[ConstantArray[data, 100000], 1];
    First[Timing[data3 = (# - offset) & /@ data2;]]

0.384383

    First[Timing[
      data4 = MapThread[Plus,
         {data2, -ConstantArray[offset, Length[data2]]}];]]

2.80672

    data3 == data4

True

Chris Degnen

Posted 2019-12-14T19:47:41.230

Reputation: 27 033