## Is there a layman's explanation for why Grover's algorithm works?

29

13

This blogpost by Scott Aaronson is a very useful and simple explanation of Shor's algorithm.

I'm wondering if there is such an explanation for the second most famous quantum algorithm: Grover's algorithm to search an unordered database of size $n$ in $O(\sqrt{n})$ time.

In particular, I'd like to see some understandable intuition for the initially surprising result of the running time!

– Condo – 2020-09-30T17:08:48.783

## Answers

26

There is a good explanation by Craig Gidney here (he also has other great content, including a circuit simulator, on his blog).

Essentially, Grover's algorithm applies when you have a function which returns True for one of its possible inputs, and False for all the others. The job of the algorithm is to find the one that returns True.

To do this we express the inputs as bit strings, and encode these using the $$|0\rangle$$ and $$|1\rangle$$ states of a string of qubits. So the bit string 0011 would be encoded in the four qubit state $$|0011\rangle$$, for example.

We also need to be able to implement the function using quantum gates. Specifically, we need to find a sequence of gates that will implement a unitary $$U$$ such that

$$U | a \rangle = - | a \rangle, \,\,\,\,\,\,\,\,\,\,\,\,\, U | b \rangle = | b \rangle$$

where $$a$$ is the bit string for which the function would return True and $$b$$ is any for which it would return False.

If we start with a superposition of all possible bit strings, which is pretty easy to do by just Hadamarding everything, all inputs start off with the same amplitude of $$\frac{1}{\sqrt{2^n}}$$ (where $$n$$ is the length of the bit strings we are searching over, and therefore the number of qubits we are using). But if we then apply the oracle $$U$$, the amplitude of the state we are looking for will change to $$-\frac{1}{\sqrt{2^n}}$$.
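As a minimal sketch of this step, the state can be modelled classically as a plain list of amplitudes. The helper names `uniform_state` and `apply_oracle` and the marked string `0011` are illustrative assumptions, not part of any quantum library.

```python
import math

def uniform_state(n_qubits):
    """Amplitudes after a Hadamard on every qubit: all equal to 1/sqrt(2^n)."""
    size = 2 ** n_qubits
    return [1.0 / math.sqrt(size)] * size

def apply_oracle(amplitudes, marked):
    """Flip the sign of the marked amplitude and leave every other one alone."""
    return [-a if index == marked else a for index, a in enumerate(amplitudes)]

n_qubits = 4
marked = 0b0011  # hypothetical marked bit string, chosen only for illustration
state = apply_oracle(uniform_state(n_qubits), marked)

# Measurement probabilities are squared amplitudes, so the sign flip is invisible.
probabilities = [a * a for a in state]
print(state[marked], state[0], probabilities[marked])  # -0.25 0.25 0.0625
```

Every entry of `probabilities` is still 1/16, so measuring at this point would be no better than guessing.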

This is not an easily observable difference, so we need to amplify it. To do this we use the Grover Diffusion Operator, $$D$$. The effect of this operator is essentially to look at how each amplitude is different from the mean amplitude, and then invert this difference. So if a certain amplitude was a certain amount larger than the mean amplitude, it will become that same amount less than the mean, and vice-versa.

Specifically, if you have a superposition of bit strings $$b_j$$, the diffusion operator has the effect

$$D: \,\,\,\, \sum_j \alpha_j \, | b_j \rangle \,\,\,\,\,\, \mapsto \,\,\,\,\,\, \sum_j (2\mu \, - \, \alpha_j) \, | b_j \rangle$$

where $$\mu = \frac{1}{2^n} \sum_j \alpha_j$$ is the mean amplitude. So any amplitude $$\mu + \delta$$ gets turned into $$\mu - \delta$$. To see why it has this effect, and how to implement it, see these lecture notes.
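The inversion about the mean is easy to check numerically. The sketch below treats the state as a list of amplitudes and assumes four qubits with the oracle already applied to a hypothetical marked index 3; `diffuse` is an illustrative name, and here the mean is the sum of the amplitudes divided by their number.

```python
def diffuse(amplitudes):
    """Map every amplitude a to 2*mean - a, i.e. reflect it about the mean."""
    mean = sum(amplitudes) / len(amplitudes)
    return [2 * mean - a for a in amplitudes]

# Sixteen amplitudes of 1/4, with the marked one (index 3, assumed) sign-flipped.
marked = 3
state = [0.25] * 16
state[marked] = -0.25

after = diffuse(state)
print(after[marked], after[0])  # 0.6875 0.1875
```

The marked amplitude has nearly tripled in size while the others have shrunk slightly, and the squared amplitudes still sum to 1.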

Most of the amplitudes will be a tiny bit larger than the mean (due to the effect of the single $$-\frac{1}{\sqrt{2^n}}$$), so they will become a tiny bit less than the mean through this operation. Not a big change.

The state we are looking for will be affected more strongly. Its amplitude is a lot less than the mean, and so will become a lot greater than the mean after the diffusion operator is applied. The end effect of the diffusion operator is therefore to cause an interference effect on the states which skims a little amplitude from each of the wrong answers and adds an amount of order $$\frac{1}{\sqrt{2^n}}$$ to the right one. By repeating this process (oracle, then diffusion), we can quickly get to the point where our solution stands out from the crowd so much that we can identify it.
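To see the repetition at work, here is a small classical simulation of the amplitudes, offered as a sketch rather than as a quantum implementation. The choice of 8 qubits, the marked index 200, and the iteration count of about $\frac{\pi}{4}\sqrt{2^n}$ (a commonly quoted value) are assumptions made for illustration.

```python
import math

def grover_iteration(amplitudes, marked):
    """One oracle call (sign flip) followed by one inversion about the mean."""
    flipped = [-a if i == marked else a for i, a in enumerate(amplitudes)]
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]

n_qubits = 8
size = 2 ** n_qubits
marked = 200  # hypothetical marked input
state = [1.0 / math.sqrt(size)] * size

iterations = math.floor(math.pi / 4 * math.sqrt(size))
marked_probability = []
for _ in range(iterations):
    state = grover_iteration(state, marked)
    marked_probability.append(state[marked] ** 2)

print(iterations, marked_probability[0], marked_probability[-1])
```

With 256 candidate inputs, 12 iterations are enough to push the probability of measuring the marked input above 99%, whereas a single random guess succeeds with probability 1/256.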

Of course, this all goes to show that all the work is done by the diffusion operator. Searching is just an application that we can connect to it.

See the answers to other questions for details on how the functions and diffusion operator are implemented.

8

I find a graphical approach quite good for giving some insight without getting too technical. We need some inputs:

• we can produce a state $|\psi\rangle$ with non-zero overlap with the 'marked' state $|x\rangle$: $\langle x|\psi\rangle\neq 0$.
• we can implement an operation $U_1=-(\mathbb{I}-2|\psi\rangle\langle\psi|)$
• we can implement an operation $U_2=\mathbb{I}-2|x\rangle\langle x|$.

This last operation is the one that can mark our marked item with a -1 phase. We can also define a normalised state $|\psi^\perp\rangle$, orthogonal to $|x\rangle$, such that $\{|x\rangle,|\psi^\perp\rangle\}$ forms an orthonormal basis for the span of $\{|x\rangle,|\psi\rangle\}$. Both the operations that we have defined preserve this space: you start with some state in the span of $\{|x\rangle,|\psi^\perp\rangle\}$, and they return a state within the span. Moreover, both are unitary, so the length of the input vector is preserved.

A vector of fixed length within a two-dimensional space can be visualised as a point on the circumference of a circle. So, let's set up a circle with two orthogonal directions corresponding to $|\psi^\perp\rangle$ and $|x\rangle$. Our initial state $|\psi\rangle$ will have small overlap with $|x\rangle$ and large overlap with $|\psi^\perp\rangle$. If it were the other way around, search would be easy: we'd just prepare $|\psi\rangle$, measure, and test the output using the marking unitary, repeating until we got the marked item. It wouldn't take long. Let's call the angle between $|\psi\rangle$ and $|\psi^\perp\rangle$ the angle $\theta$.

Now let's take a moment to think about what our two unitary actions do. $U_2$ has a single -1 eigenvalue (for $|x\rangle$) and all other eigenvalues +1, while $U_1$ has a single +1 eigenvalue (for $|\psi\rangle$) and all other eigenvalues -1. In our two-dimensional subspace, each reduces to a +1 eigenvalue and a -1 eigenvalue. Such an operation is a reflection in the axis defined by the +1 eigenvector. So, $U_1$ is a reflection in the $|\psi\rangle$ axis, while $U_2$ is a reflection in the $|\psi^\perp\rangle$ axis.

Now, take an arbitrary vector in this space, and apply $U_2$ followed by $U_1$. The net effect is that the vector is rotated by an angle $2\theta$ towards the $|x\rangle$ axis. So, if you start from $|\psi\rangle$, you can repeat this sufficiently many times, and get to within an angle $\theta$ of $|x\rangle$. Thus, when we measure that state, we get the value $x$ with high probability.
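The two-reflections picture can be checked with ordinary plane geometry. In this sketch a vector is a pair of coordinates along $|\psi^\perp\rangle$ and $|x\rangle$, so $|\psi\rangle$ is $(\cos\theta, \sin\theta)$. The function names and the value $\theta = 0.1$ are illustrative assumptions.

```python
import math

def reflect_about_psi_perp(v):
    """U_2 restricted to the plane: flips the sign of the |x> component."""
    return (v[0], -v[1])

def reflect_about_psi(v, theta):
    """U_1 restricted to the plane: 2|psi><psi| - I, psi = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    overlap = c * v[0] + s * v[1]
    return (2 * overlap * c - v[0], 2 * overlap * s - v[1])

theta = 0.1  # illustrative angle between |psi> and |psi_perp>
v = (math.cos(theta), math.sin(theta))  # start in |psi>

angles = [math.atan2(v[1], v[0])]
for _ in range(3):
    v = reflect_about_psi(reflect_about_psi_perp(v), theta)
    angles.append(math.atan2(v[1], v[0]))

print([round(a, 6) for a in angles])  # [0.1, 0.3, 0.5, 0.7]
```

Each application of $U_2$ followed by $U_1$ advances the angle by exactly $2\theta$, which is the rotation described above.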

Now we need a little care to find the speed-up. Assume that the probability of finding $|x\rangle$ in $|\psi\rangle$ is $p\ll 1$. So, classically, we'd need $O(1/p)$ attempts to find it. In our quantum scenario, we have that $\sqrt{p}=\sin\theta\approx\theta$ (since $\theta$ is small), and we want a number of runs $r$ such that $\sin((2r+1)\theta)\approx 1$, i.e. $(2r+1)\theta\approx\frac{\pi}{2}$. So, $r\approx \frac{\pi}{4\theta}\approx \frac{\pi}{4\sqrt{p}}$. You can see the square-root speed-up right there.
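Plugging in numbers makes the comparison concrete. This sketch solves $(2r+1)\theta = \pi/2$ for $r$, rounds to the nearest whole number of runs, and evaluates the resulting success probability $\sin^2((2r+1)\theta)$. The function name `grover_repetitions` and the value $p = 10^{-6}$ are assumptions chosen for illustration.

```python
import math

def grover_repetitions(p):
    """Return (r, success probability) for an initial success probability p."""
    theta = math.asin(math.sqrt(p))
    # Solve (2r + 1) * theta = pi / 2 and round to the nearest whole run count.
    r = max(0, round(math.pi / (4 * theta) - 0.5))
    success = math.sin((2 * r + 1) * theta) ** 2
    return r, success

p = 1e-6  # illustrative: one marked item among a million
r, success = grover_repetitions(p)
print(r, success, round(1 / p))
```

For this $p$ the quantum procedure needs 785 runs to succeed with probability very close to 1, against an expected number of classical attempts on the order of $1/p = 10^6$.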

3

The simple explanation for how (and hence why) Grover's algorithm works is that a quantum gate can only reshuffle (or otherwise distribute) probability amplitudes. Using an initial state with equal probability amplitudes for all $N$ states of the computational basis, one starts with an amplitude of $1/\sqrt{N}$. Roughly this much can be "added" to the desired (solution) state in each iteration, such that after on the order of $\sqrt{N}$ iterations one arrives at a probability amplitude close to $1$, meaning the desired state has been distilled.

1

Grover's Algorithm uses 2 simple tricks to search an unordered database (like a phonebook that contains names and phone numbers but not in alphabetical order). It inputs an equal superposition of all possible entries and searches the database in one operation. When it finds the matching entry, it marks it by flipping the sign of the wavefunction of this entry. At this point you have a wavefunction that's an equal superposition of all but one entry with a positive sign and one entry with a negative sign.

Even though you've marked the entry, you haven't accomplished anything at this point, because you have to measure something to see your answer. Since the probability of picking any particular answer is determined by the square of its wavefunction, the fact that one particular part of the wavefunction has a minus sign does you no good. For example, if the wavefunction contains 9 entries with value $1/\sqrt{10}$ and one entry with value $-1/\sqrt{10}$, your probability of picking the correct entry (the one marked with the minus sign) is no better than the probability of picking any one of the incorrect answers.

So you need to do something to increase the wavefunction value for the correct answer. The trick you use is a simple mathematical operation called "inversion about the mean". If you have 9 values of $1/\sqrt{10}$ and 1 value of $-1/\sqrt{10}$ and you calculate the mean, it will be just a little less than $1/\sqrt{10}$. If you calculate the difference between each individual entry and the mean, it will be a very small number for 9 of the entries and a larger difference from the mean for the one entry. Inversion about the mean allows you to create a new wavefunction where 9 of the entries have small values and 1 entry's value is larger. It turns out that inversion about the mean can be written as a unitary matrix (i.e., it's an operation that can be implemented on a quantum computer).

Since a large database has many wrong answers and only one correct answer, most likely one pass through the "inversion about the mean" operation will not magnify the wavefunction of your desired answer enough to outweigh the total value of the probabilities of all the undesired answers. Therefore, rather than just going through this inversion process once, you create a loop that repeats the marking and inversion steps several times, each time increasing the wavefunction amplitude of the correct answer and decreasing the amplitude of the undesired answers. It turns out that if you go through the loop too many times, it starts to backfire on you; therefore, there's an optimal number of times through the loop to get the greatest chance of measuring the correct answer.
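The backfire effect shows up even in a ten-entry toy example. The sketch below tracks the amplitudes classically, starting from ten equal amplitudes normalised so that the probabilities sum to 1. The function names and the target index 7 are illustrative assumptions.

```python
import math

def mark(amplitudes, target):
    """Flip the sign of the target entry's amplitude."""
    return [-a if i == target else a for i, a in enumerate(amplitudes)]

def invert_about_mean(amplitudes):
    """Replace every amplitude a by 2*mean - a."""
    mean = sum(amplitudes) / len(amplitudes)
    return [2 * mean - a for a in amplitudes]

entries = 10
target = 7  # hypothetical position of the matching entry
state = [1.0 / math.sqrt(entries)] * entries

target_probability = [state[target] ** 2]
for _ in range(4):
    state = invert_about_mean(mark(state, target))
    target_probability.append(state[target] ** 2)

for loops, probability in enumerate(target_probability):
    print(loops, round(probability, 3))
```

The chance of measuring the target rises from 0.1 to 0.676 after one pass and to above 0.99 after two, then falls again on the third and fourth passes, so two passes is the optimum for this size.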