Have there been any truly groundbreaking algorithms besides Grover's
and Shor's?

It depends on what you mean by "truly groundbreaking". Grover's and Shor's stand out because they were the first algorithms to demonstrate genuinely valuable speed-ups on a quantum computer (e.g. the presumed exponential improvement for Shor), and because they had killer applications for specific communities.
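
As a reminder of where Grover's quadratic speed-up comes from, here is a toy statevector simulation (a sketch only; the database size N = 8 and the marked index are arbitrary illustrative choices):

```python
import numpy as np

# Toy statevector simulation of Grover search on N = 8 items with one
# marked item. Roughly pi/4 * sqrt(N) oracle calls suffice, versus
# ~N classical queries - the quadratic speed-up.
N = 8
marked = 5                        # arbitrary marked item (assumption)
psi = np.full(N, 1 / np.sqrt(N))  # uniform superposition

iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
for _ in range(iterations):
    psi[marked] *= -1              # oracle: flip the marked amplitude
    psi = 2 * np.mean(psi) - psi   # diffusion: inversion about the mean

# After only 2 iterations the marked item's probability is above 0.9.
print(iterations, psi[marked] ** 2)
```

Measuring at this point returns the marked item with high probability, whereas a classical search would have examined most of the N entries.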

A number of quantum algorithms have been designed since, and I think three are particularly worth mentioning:

The BQP-complete algorithm for evaluating the Jones polynomial at particular points. I mention this because, aside from more obvious candidates like Hamiltonian simulation, I believe it was the first algorithm shown to be BQP-complete, so it really demonstrates the full power of a quantum computer.

The HHL algorithm for solving linear equations. This is a slightly funny one because it's more like a quantum subroutine, with quantum inputs and outputs. However, it is also BQP-complete, and it is receiving a lot of attention at the moment because of potential applications in machine learning and the like. I guess this is the best candidate for truly groundbreaking, but that's a matter of opinion.
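
To make the "quantum inputs and outputs" caveat concrete, here is a minimal classical sketch of the problem HHL addresses (the Hermitian matrix and right-hand side below are arbitrary illustrative choices): HHL prepares a quantum state proportional to the solution of Ax = b, so only the normalised solution vector is directly encoded in the output.

```python
import numpy as np

# HHL solves A|x> = |b> in the sense of preparing a quantum state
# proportional to the solution vector. Classically, the reference
# computation is just a linear solve; the quantum output only encodes
# the *normalised* solution - hence "quantum outputs".
A = np.array([[1.0, -1.0 / 3.0],
              [-1.0 / 3.0, 1.0]])  # small Hermitian example (assumption)
b = np.array([1.0, 0.0])           # right-hand side, playing the role of |b>

x = np.linalg.solve(A, b)
x_state = x / np.linalg.norm(x)    # what the HHL output state encodes

print(x_state)
```

Reading out all amplitudes of that state would cost as much as solving the system classically; the advantage only survives if the downstream computation also stays quantum, which is why it behaves like a subroutine.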

Quantum chemistry. I know very little about these algorithms, but they have developed substantially since the time you mention, and quantum chemistry has always been cited as one of the useful applications of a quantum computer.

Has there been any progress in defining BQP's relationship to P, BPP
and NP?

Essentially, no. We know that P ⊆ BPP ⊆ BQP ⊆ PSPACE, but none of those containments is known to be strict, and we don't know the relationship between BQP and NP.

Have we made any progress in understanding the nature of quantum
speed-up other than saying that "it must be because of entanglement"?

Even back when you were studying it originally, I would say it was more precisely understood than that. There are (and were) good comparisons between universal gate sets (potentially capable of giving exponential speed-up) and classically simulable gate sets. For example, recall that the Clifford gates produce entanglement but are nevertheless classically simulable. That said, it is not straightforward to state precisely what is required in a more pedagogical manner.
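
The Clifford example can be made concrete with a small statevector calculation (a sketch using standard gate matrices): H followed by CNOT, both Clifford gates, produces a maximally entangled Bell state, yet by the Gottesman-Knill theorem any circuit built from such gates can be simulated efficiently classically.

```python
import numpy as np

# Build the Bell state (|00> + |11>)/sqrt(2) using only Clifford gates.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# H on qubit 0, then CNOT controlled on qubit 0, starting from |00>.
psi = CNOT @ np.kron(H, I) @ np.array([1.0, 0.0, 0.0, 0.0])

# Check entanglement: the reduced state of qubit 0 is maximally mixed.
rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
rho_A = np.trace(rho, axis1=1, axis2=3)  # partial trace over qubit 1

print(psi)    # amplitudes 1/sqrt(2) on |00> and |11>
print(rho_A)  # [[0.5, 0], [0, 0.5]], i.e. maximally mixed
```

So entanglement is present, but it evidently isn't *sufficient* for quantum speed-up, which is the point of the comparison above.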

Perhaps where some progress has been made is in terms of other models of computation. For example, the one-clean-qubit model DQC1 is better understood - this is a model that appears to offer some speed-up over classical algorithms but is unlikely to be capable of BQP-complete computations (though before you get drawn into the hype that you might find online, there *is* entanglement present during the computation).

On the other hand, the "it's because of entanglement" sort of statement still isn't entirely resolved. Yes, for pure-state quantum computation there must be some entanglement, because otherwise the system is easy to simulate; but for mixed separable states, we don't know whether they can be used for computations or whether they can be efficiently simulated.

Also, one might try to ask a more insightful question: have we made any progress in understanding which problems will be amenable to a quantum speed-up? This is subtly different, because if you think of a quantum computer as giving you new logic gates that a classical computer doesn't have, then it's obvious that to get a speed-up you must use those new gates. However, it is not clear that every problem is amenable to such benefits. Which ones are? There are classes of problems where one might hope for a speed-up, but I think identifying them still relies on individual intuition. The same can probably be said about classical algorithms: you've written an algorithm x. Is there a better classical version? Maybe not, or maybe you're just not spotting it. That's why we don't know if P=NP.

It's a good question, Alex. It certainly isn't amateurish. – John Duffield – 2018-09-25T20:56:54.627