## Modular arithmetic - efficiently calculating the remainders of factorials

11

When working on this question regarding the divisibility of the sum of factorials, I decided to write some code to test "small values" of the problem using the following code.

f[p_] := Total[Mod[#!, p] & /@ Range[p - 1]];
Table[Mod[f@Prime@i, Prime@i], {i, 1, 500}]


Basically, what the code does is sum up all the factorials $$1!+2!+3!+\dots+(p-1)!$$

and find the remainder modulo $p$, for prime $p$.

Unfortunately, my code as written takes a very long time to run. Checking the first 500 primes takes 88.280966 seconds on my computer, but checking the first 2000 primes took me about 4 hours.

Is there any way to improve the code, or is it already the best we can do?

As for optimizations not involving the code, I used Wilson's Theorem, which states that for all primes $p$,

$$(p-1)!\equiv-1 \bmod p$$

Using the above theorem, we can modify the code as follows.

h[p_] := Total@Flatten[{Mod[#!, p], PowerMod[(# - 1)!*(-1)^(#), -1, p]} & /@ Range[(p - 1)/2]];
Table[Mod[h@Prime@i, Prime@i], {i, 1, 500}]


This is considerably faster than the previous code, since checking the first 500 primes takes only 25.896166 seconds. However, checking the first 2000 primes still takes an inordinately long time.

14

This is bit faster:

toPrime = 500;
sums = Accumulate@FoldList[Times, 1, Range[2, Prime@toPrime - 1]];
primes = Prime[Range[toPrime]];
Mod[sums[[primes - 1]], primes]


Precompute factorial sums and primes. Mod is fast on lists.

2Is there any reason why you used #1 #2 & instead of Times? – J. M.'s ennui – 2015-06-15T12:53:18.480

@J. M. I can't remember...Probably because I think of Fold in a pedestrian manner in terms of #1 and #2 -- not seeing the forest for the trees, so to speak. Thanks. – Michael E2 – 2015-06-15T15:07:05.737

1you are way too humble! calculating your sum for even the first 2000 primes takes less than a second. However, is there a way to get around storing large numbers in sums in memory? It keeps crashing my computer when I try toPrime=5000. – Vincent Tjeng – 2013-03-26T06:03:25.503

+1 (that's freaking fast!) Can you explain why Accumulate@FoldList[#1 #2 &, 1, Range[n] is so much quicker to Accumulate@Array[#! &, n] + 1? I really don't get it. – gpap – 2013-03-26T11:23:20.127

@gpap Calculating factorial so many times costs a lot. Since we know we want all the factorials up to Prime[toPrime]-1, we ultimately gain a lot keeping the intermediate results with FoldList. – Michael E2 – 2013-03-26T12:04:08.090

@MichaelE2 Yes, worked it out myself in the meantime - you multiply the previous result and don't calculate a factorial at every step. Thanks – gpap – 2013-03-26T12:11:53.483

@VincentTjeng Not off hand. The speed up comes from storing the sums of factorials. If for instance you reduced the factorials mod $p$ before summing, you would save a lot of space but would have to computed the factorials over and over (for each prime $p$). The factorials take as much space to store as their sums. Well, maybe something will occur to somebody. – Michael E2 – 2013-03-26T12:16:29.117

@MichaelE2 I hope you don't mind me delaying awarding you the answer; I hope someone might come around and come up with an even better way to do so, and I think marking the question as answered will serve to discourage people from looking here! – Vincent Tjeng – 2013-03-26T13:46:38.200

@VincentTjeng It's a good idea to delay. More people are likely to look at the question, and we all would like to see a better answer, if there is one. – Michael E2 – 2013-03-26T14:05:54.107

5

Let $x \equiv r_1 \bmod p$ and $y \equiv r_2 \bmod p$. Then, $x y \equiv r_1 r_2 \bmod p$. So, we can compute the sum of the factorials mod p using:

f[p_] := Mod[ Total @ FoldList[ Mod[Times[##], p]&, Range[p-1]], p]


Let's compare this to the naive implementation:

t[p_] := Mod[Sum[k!, {k, p-1}], p]


For example:

f[Prime[500]] //AbsoluteTiming
t[Prime[500]] //AbsoluteTiming


{0.00058, 1813}

{0.085628, 1813}

The nice thing about using Mod[Times[##], p] as the FoldList function is that the output should be a machine number unless you are working with very large primes. That means that f can be compiled:

fc = Compile[{{p, _Integer}},
Mod[ Total @ FoldList[ Mod[Times[##], p]&, Range[p-1]], p],
RuntimeAttributes->{Listable}
];


Let's compare for a large prime:

f[Prime[10^6]] //AbsoluteTiming
fc[Prime[10^6]] //AbsoluteTiming


{1.83041, 9308538}

{1.05449, 9308538}

Faster, but the real difference is that fc is Listable. Hence, comparing timings on a list shows a much larger difference. For example:

r1 = f /@ Prime[Range[5000, 5500]]; //AbsoluteTiming
r2 = fc[Prime[Range[5000, 5500]]]; //AbsoluteTiming

r1 === r2


{2.06691, Null}

{0.395402, Null}

True

Finally, since the list of all factorials is not stored anywhere, the memory used is much more manageable. Here is the memory and timing for the first 5000 primes:

r1 = fc[Prime[Range[5000]]]; //MaxMemoryUsed //AbsoluteTiming


{3.03463, 462848}

This is quite a bit smaller than @MichaelE2's answer:

MaxMemoryUsed[
toPrime = 5000;
sums = Accumulate@FoldList[Times, 1, Range[2, Prime@toPrime - 1]];
primes = Prime[Range[toPrime]];
r2 = Mod[sums[[primes - 1]], primes]
] //AbsoluteTiming

r1 === r2


{2.40441, 4064607008}

True

The compiled answer uses .462KB while the approach where all the factorials are precomputed takes 4GB, and the timing is not too different.