## Can we speed up Grover's algorithm by running parallel processes?

In classical computing, we can parallelize a key search (for example, for AES) across as many computing nodes as are available.

Clearly we can run many instances of Grover's algorithm in parallel, too.

My question is: is it possible to get a speedup by running more than one instance of Grover's algorithm, as in classical computing?

Certainly! Imagine you have $$K=2^k$$ copies of the search oracle $$U_S$$ that you can use. Normally, you'd search by iterating the action $$H^{\otimes n}(\mathbb{I}_n-2|0\rangle\langle 0|^{\otimes n})H^{\otimes n}U_S,$$ starting from an initial state $$(H|0\rangle)^{\otimes n}$$. This takes time $$\Theta(\sqrt{N})$$. (I'm using $$\mathbb{I}_n$$ to denote the $$2^n\times 2^n$$ identity matrix.)
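As a quick numerical check on the $$\Theta(\sqrt{N})$$ claim, here is a small NumPy sketch (the function name `grover_search` is my own, not from the answer) that simulates the amplitude dynamics of a single Grover search: the oracle flips the sign of the marked amplitude, the diffusion step reflects all amplitudes about their mean, and the success probability peaks after about $$\frac{\pi}{4}\sqrt{N}$$ iterations.

```python
import numpy as np

def grover_search(n_qubits, marked):
    """Simulate Grover's algorithm on N = 2**n_qubits items.

    Returns the iteration count at which the marked item's success
    probability peaks, together with that peak probability."""
    N = 2 ** n_qubits
    state = np.full(N, 1.0 / np.sqrt(N))     # uniform superposition (H|0>)^{n}
    best_prob, best_iter = 0.0, 0
    for it in range(1, int(np.ceil(np.pi / 4 * np.sqrt(N))) + 1):
        state[marked] *= -1.0                # oracle U_S: flip the marked amplitude
        state = 2.0 * state.mean() - state   # diffusion: reflect about the mean
        prob = state[marked] ** 2
        if prob > best_prob:
            best_prob, best_iter = prob, it
    return best_iter, best_prob
```

For $$n=10$$ (so $$N=1024$$, $$\sqrt{N}=32$$), the peak occurs at 25 iterations, close to $$\frac{\pi}{4}\cdot 32\approx 25.1$$.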

You could replace this with $$2^k$$ parallel copies, each indexed by an $$x\in\{0,1\}^k$$, iterating $$\left(\mathbb{I}_k\otimes H^{\otimes (n-k)}\right)\left(\mathbb{I}_k\otimes(\mathbb{I}_{n-k}-2|0\rangle\langle 0|^{\otimes (n-k)})\right)\left(\mathbb{I}_k\otimes H^{\otimes (n-k)}\right)U_S$$ and starting from a state $$|x\rangle(H|0\rangle)^{\otimes(n-k)}$$. The time required for running these would be reduced to $$O(\sqrt{N/K})$$, at the cost of requiring $$K$$ times more space.
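To see the claimed $$O(\sqrt{N/K})$$ depth concretely, one can count Grover iterations per copy; this is a back-of-the-envelope sketch, and `grover_depth` is a name I have made up.

```python
import math

def grover_depth(num_items):
    # iterations a single Grover search needs over num_items elements
    return math.ceil(math.pi / 4 * math.sqrt(num_items))

N, K = 2 ** 20, 2 ** 4
serial_depth = grover_depth(N)         # one search over all N items
parallel_depth = grover_depth(N // K)  # each of the K copies searches N/K items
# the depth ratio is roughly sqrt(K) = 4, while space grew by a factor of K
```

With $$N=2^{20}$$ and $$K=16$$, the serial search needs 805 iterations and each parallel copy only 202, a factor of about $$\sqrt{K}=4$$.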

In a scaling sense, one might consider this an irrelevant result. If you have a fixed number of oracles, $$K$$, then you get a fixed ($$\sqrt{K}$$) improvement (just as, with $$K$$ parallel classical cores, the best improvement you can get is a factor of $$K$$), and that does not change the scaling. But it does change the fundamental running time. We know that Grover's algorithm is exactly optimal: it takes the absolute minimum time possible with a single oracle. So, knowing that you get a $$\sqrt{K}$$ improvement in time is useful with regard to that benchmark of a specific running time at a specific value of $$N$$.

But if you do this, the comparison with the classical performance loses some of its meaning, doesn't it? After all, you can also speed up the classical search by running the operation that checks if a given $x$ is the target in parallel over all the inputs. That clearly requires additional assumptions over the available resources, but the same kind of assumptions that are made in your argument – glS – 2018-10-25T11:55:07.470

$N$ goes to infinity but $K$ does not. Your problem gets bigger but your resources remain few. – AHusain – 2018-10-25T14:15:35.823

This answer is correct (though it may not be optimal, as DaftWullie does warn). This is the same attitude towards parallelization as one takes in classical circuit complexity. If you want a speed-up due to parallelization, then you look to the circuit depth (because co-ordinating multiple processes isn't going to reduce the total work). It doesn't even matter if $K$ is constant --- either you're interested in the depth improvement from parallelization, or you're not. As with quantum computation itself, merely throwing more computers at a problem doesn't magically make everything fast! – Niel de Beaudrap – 2018-10-26T08:17:30.263

In a sense, if we ran it in parallel on different nodes, you would save running time. But if we talk about complexity (which is what we generally mean by "speedup"), we need a bit of analysis.

You agree that we need about $$\sqrt{N}$$ operations for the non-parallel case. Say we have two nodes, and we split the list of $$N$$ elements into two lists of sizes $$N_1, N_2$$. Searching the sub-lists takes about $$\sqrt{N_1}, \sqrt{N_2}$$ operations respectively.

However, we have that $$\sqrt{N} = \sqrt{N_1+N_2} \le \sqrt{N_1} + \sqrt{N_2}$$

And you would still need to check which of the outputs returned by the parallel processes is the one you seek. This adds only a constant to the complexity, so it is generally hidden in the $$O$$ notation.
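The inequality above is easy to confirm numerically (a quick sketch; the variable names are mine): splitting the list lowers the per-node depth, but the work summed over both nodes exceeds the single-machine count.

```python
import math

N1 = N2 = 512
N = N1 + N2
per_node = math.sqrt(N1)                # depth on each of the two nodes
summed = math.sqrt(N1) + math.sqrt(N2)  # total work across both nodes
single = math.sqrt(N)                   # one machine searching the whole list
# per_node < single <= summed: depth improves, total work does not
```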

However, this would still be interesting, especially if we have to cluster hardware because we are limited in the number of qubits, or face other limitations.

For $N_1=N_2$ it's still an inequality: $\sqrt{2}\cdot\sqrt{N_1} < 2\sqrt{N_1}$ – Mariia Mykhailova – 2018-10-24T22:09:56.493

Oh indeed. In my head I thought $\sqrt{ab} = \sqrt{a}\sqrt{b}$ applied here. I should stop answering questions at midnight when I'm tired. Thanks for pointing that out. – cnada – 2018-10-25T05:54:26.570

@cnada: There are at least two different notions of complexity, both of which are relevant to speed-up. One is size complexity, and one is depth complexity. Unless otherwise specified, we often prefer to consider size complexity, but depth complexity is still something which is very much of interest in quantum computational complexity, for instance in MBQC [arXiv:quant-ph/0301052, arXiv:0704.1736] and recent results on unconditional depth separations [arXiv:1704.00690]. – Niel de Beaudrap – 2018-10-25T08:40:18.273

@NieldeBeaudrap I thought people look more at depth complexity. But for Grover, the size and depth complexity are about the same order, namely the square root of the size of the problem (generally seen as the size of a list of $N$ elements). Do you think my approach here is not right? – cnada – 2018-10-25T11:53:42.290

You're not saying anything that's wrong, I'm just pointing out that you're unduly emphasising size complexity and not really working out the benefit to depth complexity. Not much interesting happens if you only do $k \in O(1)$ parallel Grover processes, but as DaftWullie's answer suggests (and considering the classical post-processing), the depth complexity goes from $\sqrt N$ to $\log(k) \sqrt{N/k}$ for $k(N) \in \Omega(1)$ parallel Grover processes, which is an improvement by a factor of $\sqrt{k}/\log(k)$ (the log factor comes from identifying which if any process found a solution). – Niel de Beaudrap – 2018-10-25T12:21:04.950
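Niel de Beaudrap's $\log(k)\sqrt{N/k}$ depth estimate can be tabulated for a few values of $k$. This is a sketch under the comment's assumptions (the $\log_2(k)$ term models the classical tournament that selects the successful process; the function name is made up).

```python
import math

def parallel_grover_depth(N, k):
    # k parallel Grover searches of N/k items each, followed by a
    # log2(k)-depth classical tournament to pick out the hit
    return math.log2(k) * math.sqrt(N / k) if k > 1 else math.sqrt(N)

N = 2 ** 20
baseline = parallel_grover_depth(N, 1)   # 1024.0
d4 = parallel_grover_depth(N, 4)         # 2 * 512 = 1024.0 -- no depth gain yet
d64 = parallel_grover_depth(N, 64)       # 6 * 128 = 768.0  -- a genuine gain
```

Note that for small $k$ the $\log_2(k)$ factor eats the whole $\sqrt{k}$ improvement; the gain only kicks in once $k$ is large enough.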

Because for me, say we had for example 4 processes (so 4 lists of size $N/4$), each search would require a depth complexity of $\sqrt{N/4}$. So the overall complexity of the whole search for me is $4\sqrt{N/4} = \sqrt{4N}$. So I am a bit confused here. – cnada – 2018-10-26T07:59:35.523

No, that's right. Disregarding classical processing, depth complexity decreases by a factor of 2, and size complexity increases by a factor of 2. In the (perverse) extreme limit of having N processes, we can do it in depth 1 (where each process checks one assignment, which isn't even a 'quantum' algorithm) but require $O(N)$ size. This is one of the reasons why it is important to distinguish between depth complexity and size complexity: you can sometimes trade off one for the other. And when one considers 'parallelisation', by default I would assume that the target is depth complexity. – Niel de Beaudrap – 2018-10-26T08:08:54.017
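The trade-off described in this last comment can be made explicit (a small sketch with made-up names): as $k$ grows, the per-process depth $\sqrt{N/k}$ shrinks while the total size $k\sqrt{N/k} = \sqrt{kN}$ grows, reaching depth 1 and size $N$ in the extreme case $k = N$.

```python
import math

N = 2 ** 20
ks = (1, 4, 16, N)
depth = {k: math.sqrt(N / k) for k in ks}     # per-process Grover depth
size = {k: k * math.sqrt(N / k) for k in ks}  # total work = sqrt(k * N)
```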