Certainly! Imagine you have $K=2^k$ copies of the search oracle $U_S$ that you can use. Normally, you'd search by iterating the action
$$
H^{\otimes n}(\mathbb{I}_n-2|0\rangle\langle 0|^{\otimes n})H^{\otimes n}U_S,
$$
starting from an initial state $(H|0\rangle)^{\otimes n}$. This takes time $\Theta(\sqrt{N})$. (I'm using $\mathbb{I}_n$ to denote the $2^n\times 2^n$ identity matrix.)
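Since the diffusion step $H^{\otimes n}(2|0\rangle\langle 0|^{\otimes n}-\mathbb{I}_n)H^{\otimes n}$ equals $2|s\rangle\langle s|-\mathbb{I}_n$ (up to an irrelevant global sign), where $|s\rangle$ is the uniform superposition, the iteration is easy to simulate with a plain state vector. A minimal numpy sketch (the function name and parameters are illustrative, not from the text):

```python
import numpy as np

def grover_search(n, target):
    """Simulate Grover search over N = 2**n items for one marked item."""
    N = 2 ** n
    state = np.full(N, 1 / np.sqrt(N))        # (H|0>)^{(x)n}, uniform |s>
    steps = int(np.floor(np.pi / 4 * np.sqrt(N)))
    for _ in range(steps):
        state[target] *= -1                   # oracle U_S flips the marked amplitude
        state = 2 * state.mean() - state      # diffusion 2|s><s| - I
    return steps, abs(state[target]) ** 2

steps, p = grover_search(10, target=123)
# about (pi/4)*sqrt(1024) = 25 iterations; success probability close to 1
```

The $\Theta(\sqrt{N})$ cost is visible directly: for $N=1024$ the loop runs 25 times and the marked amplitude is boosted to nearly 1.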

You could replace this with $2^k$ parallel copies, each indexed by an $x\in\{0,1\}^k$, using
$$
\left(\mathbb{I}_k\otimes H^{\otimes (n-k)}\right)\left(\mathbb{I}_k\otimes(\mathbb{I}_{n-k}-2|0\rangle\langle 0|^{\otimes (n-k)})\right)\left(\mathbb{I}_k\otimes H^{\otimes (n-k)}\right)U_S
$$
and starting from the state $|x\rangle(H|0\rangle)^{\otimes(n-k)}$.
The time required for running these would be reduced to $O(\sqrt{N/K})$, at the cost of requiring $K$ times more space.
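As a sanity check on the $O(\sqrt{N/K})$ claim, here is a hedged numpy sketch of the parallel scheme: each of the $K=2^k$ copies fixes a $k$-bit prefix $x$ and runs Grover over the remaining $n-k$ qubits, so only the copy whose prefix matches the target's ever sees a marked item. The function name and the bit-slicing convention are illustrative assumptions, not from the text:

```python
import numpy as np

def run_parallel_grover(n, k, target):
    """K = 2**k parallel Grover searches, each over the 2**(n-k) suffixes
    of one fixed k-bit prefix.  Returns the shared step count and each
    copy's probability of outputting the target's suffix."""
    K, M = 2 ** k, 2 ** (n - k)                # number of copies, items per copy
    steps = int(np.floor(np.pi / 4 * np.sqrt(M)))
    suffix = target & (M - 1)                  # low n-k bits of the target
    probs = []
    for x in range(K):                         # the K copies run in lock-step
        state = np.full(M, 1 / np.sqrt(M))
        for _ in range(steps):
            if x == target >> (n - k):         # oracle only marks in this copy
                state[suffix] *= -1
            state = 2 * state.mean() - state   # diffusion on the n-k qubits
        probs.append(abs(state[suffix]) ** 2)
    return steps, probs

steps, probs = run_parallel_grover(n=12, k=4, target=1234)
```

The copy holding the target's prefix reaches success probability near 1 after $\lfloor\frac{\pi}{4}\sqrt{N/K}\rfloor$ steps, a $\sqrt{K}$ reduction from the $\lfloor\frac{\pi}{4}\sqrt{N}\rfloor$ steps a single machine needs; the other copies stay at the uniform baseline $K/N$.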

In a scaling sense, one might consider this an irrelevant result. If you have a fixed number of oracles $K$, then you get a fixed ($\sqrt{K}$) improvement (just as, with $K$ parallel classical cores, the best improvement you can get is a factor of $K$), so the asymptotic scaling does not change. But it does change the fundamental running time. We know that Grover's algorithm is exactly optimal: it takes the absolute minimum time possible with a single oracle. So knowing that you get a $\sqrt{K}$ improvement in time is useful with regard to that benchmark of a specific running time at a specific value of $N$.

But if you do this, the comparison with the classical performance loses some of its meaning, doesn't it? After all, you can also speed up the classical search by running the operation that checks whether a given $x$ is the target in parallel over all the inputs. That clearly requires additional assumptions about the available resources, but the same kind of assumptions as are made in your argument – glS – 2018-10-25T11:55:07.470

$N$ goes to infinity but $K$ does not. Your problem gets bigger but your resources remain few. – AHusain – 2018-10-25T14:15:35.823

This answer is correct (though it may not be optimal, as DaftWullie does warn). This is the same attitude towards parallelization as one takes in classical circuit complexity. If you want a speed-up due to parallelization, then you look to the circuit depth (because coordinating multiple processes isn't going to reduce the total work). It doesn't even matter if $K$ is constant --- either you're interested in the depth improvement from parallelization, or you're not. As with quantum computation itself, merely throwing more computers at a problem doesn't magically make everything fast! – Niel de Beaudrap – 2018-10-26T08:17:30.263