For completeness, here is a way to extend the compiled or LibraryLink approaches to arbitrarily large integers. Since it comes so long after the original answer, I post it separately.

As explained in this answer, we can bridge the gap between arbitrary and machine precision at least somewhat efficiently by using `IntegerDigits` to express a large integer as a string of base-$2^{62}$ digits, which we are then free to process using efficient compiled code. The only additional trick we need here is to ensure that there are the *same number* of such digits for both of the integers, so that they have a 1:1 correspondence and yield the correct answer when compared.
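As a quick sanity check of the idea (my addition, using only built-in functions rather than the compiled code), the Hamming distance of two big integers equals the sum of the chunk-wise Hamming distances of their aligned base-$2^{62}$ digit lists:

```
{a, b} = {2^100 + 7, 2^90 + 5};
len = IntegerLength[Max[a, b], 2^62];        (* digits needed for the larger one *)
{da, db} = IntegerDigits[{a, b}, 2^62, len]; (* aligned, equal-length digit lists *)
Total@DigitCount[BitXor[da, db], 2, 1] == DigitCount[BitXor[a, b], 2, 1]
(* -> True *)
```

This works because `BitXor` acts independently on disjoint bit fields, so chunking at 62-bit boundaries commutes with the XOR.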

Let's take as our list of big integer pairs:

```
data = {{2^63, 2^24, 2^56 + 1, 2^84}, {2^64, 2^92, 2^33 - 1, 2^73 - 1}} // Transpose;
```

And this is what we want to get from it:

```
Tr /@ IntegerDigits[BitXor @@@ data, 2]
(* -> {2, 2, 33, 74} *)
```

It's difficult to know *a priori* how many digits the largest integer will have, and without this knowledge the digit lists produced by `IntegerDigits` are bound to be inconsistent in length between the inputs. We have two options for producing a full-rank array, i.e.

```
full = IntegerDigits[data, 2^62, IntegerLength[Max @@ data, 2^62]]
```

(Here `IntegerLength` is preferable to `Ceiling@Log[2^62, Max @@ data]`, which would come up one digit short, and hence silently truncate, whenever the maximum happened to be an exact power of $2^{62}$.)

or

```
full = PadLeft@IntegerDigits[data, 2^62]
```

of which my testing shows that the second is the better-performing formulation in terms of CPU time, with both being similar enough in their memory consumption as probably makes no difference. Why this should be is not clear, since in the first case `IntegerDigits` has the opportunity to produce a packed array, which however it fails to do. In both cases we are clearly wasting some memory to contain the padding digits, and if the integers differ drastically in size, that could be the primary limitation on performance. If so, then the first possibility is able to save 20% on memory consumption, at the cost of 20% more CPU time. The choice is yours.
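Either way, the padding is harmless: as a quick check (my addition), every padded digit list still reconstructs its original integer via `FromDigits`, confirming the 1:1 correspondence we need:

```
full = PadLeft@IntegerDigits[data, 2^62];
Map[FromDigits[#, 2^62] &, full, {2}] === data
(* -> True *)
```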

Finally, we need to take advantage of the listability and efficient (OpenMP) parallelization of the LibraryLink function over the longest axis of the digit array, and `Map` over the other, shorter axis. Which is the longest axis depends on whether there are many moderately sized integers, or a few very large ones. Then, we add up the results along the appropriate axis using `Total`.

When the number of integers exceeds the number of digits per integer:

```
Total[hammingDistanceListable /@ Transpose[full, {2, 3, 1}], {1}]
```

Or, when the number of digits exceeds the number of integers:

```
Total[hammingDistanceListable /@ Transpose[full, {1, 3, 2}], {2}]
```

Obviously, this decision, and hence the appropriate type of transpose, can be made at run time. Both give the right answer, of course:

```
(* -> {2, 2, 33, 74} *)
```

Putting it together:

```
hammingDistanceListableBigIntegers[nums : {{_Integer, _Integer} ..}] :=
  With[{digits = PadLeft@IntegerDigits[nums, 2^62]},
    With[{dims = Dimensions[digits]},
      If[First[dims] > Last[dims],
        Total[hammingDistanceListable /@ Transpose[digits, {2, 3, 1}], {1}],
        Total[hammingDistanceListable /@ Transpose[digits, {1, 3, 2}], {2}]
      ]
    ]
  ];
```

Now let's compare it with `Tr /@ IntegerDigits[BitXor @@@ data, 2]`:

```
dataLarge = RandomInteger[2^1024, {100000, 2}];
hammingDistanceListableBigIntegers[dataLarge]; // AbsoluteTiming (* -> 0.390625 seconds *)
MaxMemoryUsed[hammingDistanceListableBigIntegers[dataLarge];] (* -> 79.3467 MB *)
Tr /@ IntegerDigits[BitXor @@@ dataLarge, 2]; // AbsoluteTiming (* -> 1.015625 seconds *)
MaxMemoryUsed[Tr /@ IntegerDigits[BitXor @@@ dataLarge, 2];] (* -> 862.148 MB *)
```

So, in this particular case it is three times faster and an amazing ten times more memory-efficient than the alternative. Amazing, because in order to accomplish this we have had to produce a rather large, unpacked array containing a lot of padding.

Unfortunately, it is not as efficient in every case. Surprisingly, it actually takes three times longer if the integers are only 512 bits in length rather than 1024! As a result, it will be beaten by the top-level code. Thus far, I haven't been able to fully understand the curious performance characteristics of this approach.

*Comments:*

- If you are still interested in this, please see this question and my answer. I realised that you can deal with large integers simply by using `IntegerDigits` to partition them into machine-size chunks. – Oleksandr R. – 2015-07-04
- No idea why, but replacing `Total` with `Tr` in your definition of `m2` gives you a ~20% improvement in speed (on my machine). – gpap – 2013-04-16
- @gpap `Tr` is often a bit faster than `Total` on packed arrays, at least in version 7. I'm assuming that's the case here. – Mr.Wizard – 2013-04-16
- If you can easily transpose or concatenate your data to consist of a few really big integers (even hundreds of millions of binary digits) instead of many smaller ones, `Tr@IntegerDigits` becomes faster and `DigitCount` becomes entirely feasible. The speedup may be even 400-fold. – kirma – 2013-04-16
- @kirma Please consider posting your approach in light of my edit. – István Zachar – 2013-04-17
- @IstvánZachar Sadly enough, I don't believe this is generalizable to your situation. – kirma – 2013-04-17