## Fastest square number test

23

6

What is the fastest possible square number test in Mathematica, both for machine size and big integers?

I presume starting in version 8 the fastest will be a dedicated C LibraryLink function.

Even in version 7, you have MathLink. I don't know how much the MathLink overhead is, but on the local machine MathLink should be very fast as it uses memory mapped files for communication. – Szabolcs – 2012-01-21T14:45:04.300

@Szabolcs; you're right; I was curious what would be best in native Mathematica, but if someone cares to create a robust MathLink executable and share it here I certainly won't turn it down! – Mr.Wizard – 2012-01-21T15:01:01.747

2The overhead of messaging involved in MatheLink would kill it. Possibly Mathematica 8's ability to load a DLL would help if you used C. But consider if your application really warrants this compared to what I offered below. The speed up you may get is probably not worth the effort. – Sal Mangano – 2012-01-21T16:04:40.717

## Answers

7

Here's an idea similar to Carl Woll's that's a little faster:

sQ[n_] := FractionalPart@Sqrt[n + 01] == 0;


Here are some timing runs similar to @fgrieu's:

timeRun[f_] := Module[{a, m},
a = (2^1024 - 3^644)^2;
m = (2^1024 - 3^644)^2 + 9;
First@ AbsoluteTiming@ Do[f[n], {n, m - 200000, m}]
]

timeRun2[f_] :=
First@ AbsoluteTiming[
Do[
f /@ (n^2 + {-2, -1, 0, 1, 2}),
{n, 2^1357, 0, -Floor[2^1357/99]}]
];


Tests of a long sequence of consecutive integers about single large square number:

timeRun[sQ]
timeRun[SqQ]
timeRun[sqQ1]
timeRun[SquareQ2]
timeRun[SquareQ08]
(*
0.626601  sQ
0.789668  SqQ (@fgrieu)
1.11774   sqQ1 (@CarlWoll)
1.63489   SquareQ2 (@Mr.Wizard)
3.39258   SquareQ08 (@KennyColnago)
*)


Tests of short sequences of consecutive integers about many small to large square numbers:

timeRun2[sQ]
timeRun2[SqQ]
timeRun2[sqQ1]
timeRun2[SquareQ2]
timeRun2[SquareQ08]
(*
0.002639   sQ
0.003289   SqQ
0.0039     sqQ1
0.005791   SquareQ2
0.01749    SquareQ08
*)


A test of just smaller numbers:

aa = 1; bb = 10^6;
AbsoluteTiming@Do[sQ@(n), {n, aa, bb}]
AbsoluteTiming@Do[SqQ@(n), {n, aa, bb}]
AbsoluteTiming@Do[sqQ1@(n), {n, aa, bb}]
AbsoluteTiming@Do[SquareQ2@(n), {n, aa, bb}]
AbsoluteTiming@Do[SquareQ08@(n), {n, aa, bb}]
(*
{2.34658, Null}
{3.2571,  Null}
{3.18561, Null}
{3.42899, Null}
{3.25997, Null}
*)


If you want to verify its accuracy, you can test it against other solutions like this:

aa = 10^20 - 100; bb = aa + 10^3;
Table[sQ[n], {n, aa, bb}] === Table[IntegerQ@Sqrt[n], {n, aa, bb}]
(*  True  *)

aa = 1; bb = 10^6;
Table[sQ[n], {n, aa, bb}] === Table[IntegerQ@Sqrt[n], {n, aa, bb}]
(*  True  *)


Admirable simplicity!! To watch: sQ's correctness varies with Mathematica version. On M4.0 PowerPC, sQ[9] is False. On M5.2 x86, sQ[2] is True. On M7.0.1 x64 it passes my tests (negative of squares give True but that's a matter of convention). Same on M12.0 x64, except for a hiccup at -1. Inexact arithmetic scares me. Also: I tuned my SqQ by removing n>=0 && and it wins more benchmarks (negatives are still never squares). Incorporating your idea is next (imitation is the sincerest form of flattery). – fgrieu – 2019-11-01T14:52:57.363

12

## Update

Sorry for my ignorance not taking into account that the question specifically asked for a Mathematica 7 solution. I updated the complete post.

## Mathematica 7

In Mathematica 7 we don't have the option the compile code into a C-library which includes the thread parallelization which can be turned on when using RuntimeAttributes->Listable and Parallelization->True. Therefore, acl's solution will not run in Mathematica 7 because the RuntimeAttributes option for Compile was introduced in version 8.

This leaves the possibility to not compile the used function and make it a normal Mathematica function where you can set the attribute Listable. I tried this, but it was horrible slow.

After a bit of research I found a nice solution which uses some number-properties in base 16. Since (at least in V7) it seems somewhat hard to return lists of True|False, I use 0 and 1 where 0 means no square.

fPat = Compile[{{numbers, _Integer, 1}},
With[{l = Length[numbers]},
Module[{n = 0, i = 0, h = 0, test = 0.0, result = Table[0, {l}]},
For[i = 1, i <= l, ++i,
n = numbers[[i]];
h = BitAnd[15, n];
If[h > 9, Continue[]];
If[h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8,
test = Sqrt[n];
result[[i]] = test == Floor[test];
];
];
result
]
]
];


Comparing this with the almost one-liner of Sal gives

data = Table[i, {i, 1, 10^6}];

fSal = Compile[{{n, _Integer}},
With[{test = Sqrt[n]}, Floor[test] == test]];

BarChart[{Timing[fSal /@ data][[1]], Timing[fPat[data]][[1]]
}, ChartLabels -> {"Sal Mangano", "Patrick V7"},
ChartStyle -> {Gray, Green}]


I leave it to you to decide whether such a C-like programming style is worth the small speed up.

## Mathematica 8

The fastest way (using Mathematica only) I know is to compile a C-library and process all data in parallel. Since most computers these days have at least 2 cores, this gives a boost. In Mathematica 8 the compilation to a C-library does not copy the data when it is called.

To make the computation parallel you have to use the Parallization option and the compiled function must be Listable. If you are sure of your input-data, you can additionally switch off most of the data-checks by using RuntimeOptions set to "Speed".

Update I include here the parallelized version of the Mathematica 7 code above:

data = Table[i, {i, 1, 10^6}];

fSal = Compile[{{n, _Integer}},
With[{test = Sqrt[n]}, Floor[test] == test]];
fAcl = Compile[{{n, _Integer}},
With[{test = Sqrt[n]}, Floor[test] == test],
RuntimeAttributes -> {Listable}];
fPat = Compile[{{n, _Integer}},
With[{test = Sqrt[n]}, Floor[test] == test],
CompilationTarget -> "C", RuntimeAttributes -> {Listable},
Parallelization -> True, RuntimeOptions -> "Speed"];

fPat2 = Compile[{{numbers, _Integer, 1}},
With[{l = Length[numbers]},
Module[{n = 0, i = 0, h = 0, test = 0.0, result = Table[0, {l}]},
For[i = 1, i <= l, ++i,
n = numbers[[i]];
h = BitAnd[15, n];
If[h > 9, Continue[]];
If[h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8,
test = Sqrt[n];
result[[i]] = test == Floor[test];
];
];
result
]
], CompilationTarget -> "C", RuntimeAttributes -> {Listable},
Parallelization -> True, RuntimeOptions -> "Speed"
];

BarChart[{Timing[fSal /@ data][[1]], Timing[fAcl[data]][[1]],
Timing[fPat[data]][[1]],
Timing[fPat2[data]][[1]]},
ChartLabels -> {"Sal Mangano", "acl", "Patrick",
"Patrick V7 parallel"},
ChartStyle -> {Gray, Gray, Darker[Green], Green}]


The results here come from my MacBook in battery-save mode which has 2 Intel cores. The disadvantage is that you need a C-compiler installed on your system which is most likely not true for the majority of Mathematica users.

1

Here's some background info on the method employed in fPat and fPat2 for others. I was almost done implementing it, but I see you beat me to it :) I believe that will be the fastest of all approaches

– rm -rf – 2012-01-21T21:14:38.263

@R.M, sorry ;-) I was ashamed that I didn't read the question carefully and I felt that I should correct this. – halirutan – 2012-01-21T21:52:04.770

btw, really nice is that in a compiled function result[[i]] = test == Floor[test]; works as expected for integers and gives a speed-up compared to its if-then-else counterpart. – halirutan – 2012-01-21T21:53:57.747

Very nice, +1. Mind you, @Mr.W will probably not consider this mathematica code, though: http://stackoverflow.com/a/8891064/559318 [see comments] :)

– acl – 2012-01-21T23:40:33.783

1At least I didn't use Goto ;-) – halirutan – 2012-01-21T23:50:58.503

@acl in this case I specified fastest possible and I meant it. I'd like to use it for some Project Euler stuff. I don't know that I can make use of a Listable function. For machine integers the Compile solutions are all much faster than IntegerQ@Sqrt@# & but what about big integers? – Mr.Wizard – 2012-01-22T04:46:55.773

11

I voted for all three previous answer because they all taught me something. However they, being Compile solutions, are not helpful with big integers.

At least on my system, Sal Mangano's code appears reducible to this without loss of speed:

isSq2 = Compile[n, Floor@# == # & @ Sqrt @ n];


For big integers between about 2*10^9 and 2*10^11 I am currently using this code from Sasha:

SquareQ =
JacobiSymbol[#, 13] =!= -1 &&
JacobiSymbol[#, 19] =!= -1 &&
JacobiSymbol[#, 17] =!= -1 &&
JacobiSymbol[#, 23] =!= -1 &&
IntegerQ@Sqrt@# &;


For integers larger than that I am using code (modified) from Daniel Lichtblau:

SquareQ2 = # == Round@# & @ Sqrt @ N[#, Log[10, #] + $MachinePrecision] &;  @Mr.Wizard SquareQ2 = FractionalPart@Sqrt@N[#, 2 Log[10, #]] == 0 &; By removing the extra step you get marginal improvement. Of course, we both know that there is a big assumption there, which is probably not true for sufficiently large$N$. – mhp – 2014-11-06T08:17:05.117 1Also, this is all academic. No serious application would use this method. Mathematica is missing integer remainder square root function$(N-\lfloor \sqrt{N} \rfloor^2)$which would solve this problem. The computation of that function is more efficient than multiplication of two numbers of the size of$N$. – mhp – 2014-11-06T08:31:50.820 @mhp I haven't thought this through but could PowerMod be of any use here? – Mr.Wizard – 2014-11-06T08:38:38.880 2 Not really. It took me several weeks to develop the algorithm, just to find out that someone had already beat me to it. See: https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf – mhp – 2014-11-06T08:42:14.487 2@rm -rf JacobiSymbol[n,p]=1 for any odd prime$p$when$n$is square. JacobiSymbol[n,p]=1 for some odd primes$p$when$n$is not square. JacobiSymbol[n,p]=0 for any odd prime$p$that divides$n$, where$n$may or may not be square. JacobiSymbol[n,p]=-1 otherwise. Hence, JacobiSymbol[n,p]=-1 means$n$is not square. Evaluating JacobiSymbol[n,p] is much faster than Sqrt[n]. Use JacobiSymbol to filter candidate$n$before passing to Sqrt[n]. I tested with 8$p$near 541 on$n>10^{11}$and found it faster than SquareQ by @Sasha and SquareQ2 by @Daniel Lichtblau. – KennyColnago – 2012-10-18T16:41:45.417 @KennyColnago would you please post your complete function as an answer? – Mr.Wizard – 2012-10-19T00:24:55.787 Speaking of the code you say you got from Sasha: have a look at the definition of NumberTheory\PowersRepresentationsDumpProbablePerfectSquareQ[] sometime... – J. M.'s ennui – 2012-10-20T15:46:41.047 @J.M. what to I need to Get to load that function? – Mr.Wizard – 2012-10-20T16:00:53.433 It's supposed to be one of the subroutines used by PowersRepresentations[]; if you have that, you're supposed to have the Jacobi symbol-based test as well... (unless the implementation changed a bit between version seven and version eight) – J. M.'s ennui – 2012-10-20T16:08:09.740 @J.M. Ah yes, I just needed to use PowersRepresentations once to load it. Quite similar indeed... – Mr.Wizard – 2012-10-20T16:45:59.430 Could you add references or a short explanation as to how JacobiSymbol comes into play here. I don't know the underlying workings of this and I'd like to know... – rm -rf – 2012-02-20T14:21:37.630 @R.M you would do better to ask Sasha. – Mr.Wizard – 2012-02-20T15:22:14.687 9 I don't think there are any built-in functions for this but the following is probably fast enough for most purposes. isSq = Compile[{{n, _Integer}}, With[{test = Sqrt[n]}, Floor[test] == test]];  Does 1 million integers in under a second. Timing[Table[isSq[i], {i, 1, 1000000}]][[1]] (* 0.76195 *)  This is under 2 orders of magnitude faster than the un-compiled equivalent, by the way. 1The Sqrt operation is comparatively expensive; I believe one should filter out numbers that are destined to fail with a less expensive test before progressing to Sqrt. – Mr.Wizard – 2012-01-21T15:08:00.817 Adding a test for EvenQ gives a slight speed up but not much. If the numbers are very large it probably helps more. Testing for ending in 0,1,4,6,9, or 25 is possible but I have not tried it and seem like overkill. – Sal Mangano – 2012-01-21T15:23:30.953 1Ignore what I said about EvenQ! Need coffee! – Sal Mangano – 2012-01-21T15:30:15.750 7 More info as requested by @Mr.Wizard. For$n$below the$\approx 2*10^9$limit, Compile gives the fastest solutions. For larger$n$, Sasha used JacobiSymbol with four primes 13, 19, 17, and 23, before resorting to the expensive IntegerQ[Sqrt[n]]. The number of ambiguous cases where JacobiSymbol[n,p]=0 decreases as the size of the prime$p$increases. So using larger$p$helps filter out more candidates before Sqrt must be called. Similarly, using more primes filters more candidates. However, the computation of JacobiSymbol slows as the number of and size of$p$increases (no free lunch). As a rough balance, I used SquareQ08. SquareQ08[n_] := JacobiSymbol[n, 541] =!= -1 && JacobiSymbol[n, 547] =!= -1 && JacobiSymbol[n, 557] =!= -1 && JacobiSymbol[n, 563] =!= -1 && JacobiSymbol[n, 569] =!= -1 && JacobiSymbol[n, 647] =!= -1 && JacobiSymbol[n, 653] =!= -1 && JacobiSymbol[n, 659] =!= -1 && IntegerQ[Sqrt[n]] SetAttributes[SquareQ08, Listable]  @KennyColnago I just found this thread, and tried it myself. I tried isSq2 as well as SquareQ08 on sets of 100 random integers in various ranges (10^8 to 10^9, 10^9 to 10^10, 10^10 to 10^11, and 10^11 to 10^12) with RepeatedTiming. In each case, the results were remarkably consistent - isSq2 was about 7.4 x 10^{-5} seconds and SquareQ08 was about 4.5 x 10^{-4} seconds. This is on Mathematica 10.3; I wonder what version you were using and if the performance of isQq2 was somehow improved. – rogerl – 2015-12-24T17:09:33.147 @rogerl I don't recall which version was used. On version 10.3, I find isSq2 about 6 to 10 times faster than SquareQ08, which is written for integers of any size. I do not understand how isSq2, which is compiled, works with integers larger than 2 billion. Can you explain? – KennyColnago – 2015-12-25T01:09:07.037 @KennyColnago Well, my machine is a 64-bit machine, so my reading of the tea leaves is that Compile will work with 64-bit integers (or about$10^18$). Is that right? And, just to be clear, your result regarding isQq1 and SquareQ08 that you give in your comment above is at variance with your original answer, right? Do you know what's going on, or am I just confused? – rogerl – 2015-12-25T01:44:04.047 @rogerl Having failed before by using Compile on integers greater than 2 billion, my original answer never tested isSq2 on such large numbers. Therefore, the result in the comment is new, and not at odds with the original answer. However, before I trust any compiled result on large integers, I must see a documented source which explains that Compile on 64 bit machines works on integers above 2 billion. – KennyColnago – 2015-12-25T01:57:06.510 @KennyColnago You might want to look at this question that I posted this morning. I've done some more investigation of this issue. – rogerl – 2015-12-25T16:12:05.393 1Would it be advantageous to select the number of primes to use based on the size of the integer? – s0rce – 2012-10-24T20:31:13.033 @s0rce My rough tests showed that about 50% of candidate$n$were filtered out for each JacobiSymbol test, and that this percentage was independent of the size of$n$. Your mileage may vary... – KennyColnago – 2012-10-24T23:55:09.583 6 This is a variation of Daniel Lichtblau's contribution that avoids the need to compute logarithms: sqQ1[i_Integer] := Floor[Sqrt[i + If[i>10^16, .11, .1]]]^2 == i  It seems to be a bit faster than SquareQ2. For example: n = 432^2; sqQ1[n] //RepeatedTiming SquareQ2[n]//RepeatedTiming  {2.42*10^-6, True} {3.2*10^-6, True} and: n = 43212113212231231241334^2; sqQ1[n] //RepeatedTiming SquareQ2[n]//RepeatedTiming  {3.61*10^-6, True} {5.3*10^-6, True} But not always: n = 432121231231241334^2; sqQ1[n] //RepeatedTiming SquareQ2[n]//RepeatedTiming  {7.8*10^-6, True} {5.26*10^-6, True} A "listable" version appears to be faster than the compiled versions (at least when the maximum value is less than 10^16): sqQ2[x:{__Integer}] := With[{add = If[Max[x]>10^16, .11, .1]}, UnitStep[Floor[Sqrt[x+add]]^2 - x] ]  Comparison with fPat2: data = RandomInteger[10^15, 10^6]; r1 = sqQ2[data]; //RepeatedTiming r2 = fPat2[data]; //RepeatedTiming r1 === r2  {0.0075, Null} {0.023, Null} True Of course, sqQ2 works for any size integers, while the compile solutions only work for integers less than Developer$MaxMachineInteger.

5

The following is optimized for large values. The main idea is to reduce the integer tested modulo a product of small primes less than 64-bit, so that the cost is low and linear with the bit size of the argument, and filter the remainder using precomputed Jacobi tables to weed out all except very few (1/11595) of the non-squares.

SqQ::usage =
"SqQ[n] is True when n is an exact square, and False otherwise.";
(* We reduce n modulo a product of small primes and use *)
(* pre-computed tables of Jacobi symbols to filters out *)
(* most non-squares with a single multi-precision operation. *)
(* We use IntegerQ[Sqrt[n]] on less than 1/11595 integers. *)
(* Pre-computed variables starting in SqQ$$are for internal use; *) SqQ$$m = (SqQ$$0 = 59*13*7*5*3)*(SqQ$$1 = 23*19*17*11)*
(SqQ$$2 = 47*37*31) *(SqQ$$3 = 43*41*29);
SqQ$$u = SqQ$$v = SqQ$$w = SqQ$$x = 0;
Block[{j},
For[j = SqQ$$0, j-- > 0, SqQ$$u += SqQ$$u + If[ JacobiSymbol[j, 59] < 0 || JacobiSymbol[j, 13] < 0 || JacobiSymbol[j, 7] < 0 || JacobiSymbol[j, 5] < 0 || JacobiSymbol[j, 3] < 0, 1, 0]]; For[j = SqQ$$1, j-- > 0, SqQ$$v += SqQ$$v + If[
JacobiSymbol[j, 23] < 0 || JacobiSymbol[j, 19] < 0 ||
JacobiSymbol[j, 17] < 0 || JacobiSymbol[j, 11] < 0, 1, 0]];
For[j = SqQ$$2, j-- > 0, SqQ$$w += SqQ$$w + If[ JacobiSymbol[j, 47] < 0 || JacobiSymbol[j, 37] < 0 || JacobiSymbol[j, 31] < 0, 1, 0]]; For[j = SqQ$$3, j-- > 0, SqQ$$x += SqQ$$x + If[
JacobiSymbol[j, 43] < 0 || JacobiSymbol[j, 41] < 0 ||
JacobiSymbol[j, 29] < 0, 1, 0]]
];
(* The function itself starts here *)
SqQ[n_Integer] := Block[{m = Mod[n, SqQ$$m]}, BitGet[SqQ$$u, Mod[m, SqQ$$0]] == 0 && BitGet[SqQ$$v, Mod[m, SqQ$$1]] == 0 && BitGet[SqQ$$w, Mod[m, SqQ$$2]] == 0 && BitGet[SqQ$$x, Mod[m, SqQ\$3]] == 0 &&
IntegerQ[Sqrt[n]]]
(* Automatically thread over lists *)
SetAttributes[SqQ, Listable];


It comfortably beats sqQ1, SquareQ2 and SqareQ08 when benchmarked with large non-squares

m = (2^1024 - 3^644)^2 + 9;
Timing[s = 0;
For[n = m - 200000, n < m, ++n, If[SqQ[n], ++s]];
s == 1]


and more narrowly so when benchmarked/validated as

Timing[For[n = 2^1357,
n > 0 && SqQ[s = n^2] && ! SqQ[s + 1] && ! SqQ[s + 2], --n,
n -= Floor[n/99]]; n == 0]


5

I am not sure how to speed up each comparison (as in, I spent half an hour trying different things and didn't manage to), but making the compiled function listable speeds things up quite a bit.

If isSq is the direct implementation that Sal gave, simply make it listable and compare:

isSqL = Compile[
{{n, _Integer}}, With[{test = Sqrt[n]}, Floor[test] == test],
RuntimeAttributes -> {Listable}
];


and then compare:

Timing[Table[isSq[i], {i, 1, 10^6}]]; // Timing
isSq /@ Range[1, 10^6]; // Timing
isSqL[Range[1, 10^6]]; // Timing
(*
{0.697799, Null}
{0.545856, Null}
{0.150171, Null}
*)


ie, it's 3-4 times faster.

What makes you say Sqrt is expensive? (ie, compared to what?).

True, but this does not speed up the sq test per se it is just a faster way to apply the function across a list than Table is. Good idea though! – Sal Mangano – 2012-01-21T16:25:35.100

@Sal That's true, but I can't get the test faster. Looking at CompilePrint[isSq], it's hard to see anything more costly thatn Sqrt. – acl – 2012-01-21T16:34:31.623

Perhaps, you'd like to use AbsoluteTiming for these tests, especially if stuff is running in parallel. See the "How can I improve the speed of eigenvalue decompositions for large matrices?" thread. – None – 2012-01-21T17:49:32.687

@ruebenko that is a good point (in this case, though, they give the same answer, so I'll leave this the way it is--but your point about running them in parallel stands) – acl – 2012-01-21T20:17:00.603