How can I rank paths through an HMM?

2

1

I have a profile hidden Markov model that I use to identify all instances of a user-defined pattern of symbols in a long sequence of symbols. I use the Viterbi algorithm to find the most probable path that generates that long sequence of symbols, and it all works very well to identify the user's pattern. But I'm now interested in extending this to identify the k most probable paths through my model that would generate some long sequence. Is there an existing algorithm to do this? I've come across something called the "1-best" or "k-best" algorithm, but I can't find much of anything describing it.

I've also considered what I'm sure would be a heuristic: finding the most probable path and then returning the neighbourhood around it. This would be found by iterating over each move between states in the most probable path, setting the probability of that particular transition (at that step) to 0, and re-solving with Viterbi. This would obviously increase the run-time by a factor of (length of the path ~= length of pattern), which I suspect will be reasonable for my purposes.
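To make the neighbourhood idea concrete, here is a minimal sketch on a plain two-state HMM rather than a profile HMM. The states `M` and `B`, all of the probabilities, and the function names `viterbi` and `neighbourhood` are made up for illustration. Each step of the best path is forbidden in turn and Viterbi is re-run.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit, banned=None):
    """Best (log_prob, path); `banned` is an optional (t, prev, cur) step to forbid."""
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []
    for t in range(1, len(obs)):
        new_score, ptr = {}, {}
        for cur in states:
            # Assumes at least one predecessor stays allowed (true with 2+ states).
            options = [(score[prev] + log_trans[prev][cur], prev)
                       for prev in states if (t, prev, cur) != banned]
            best_val, best_prev = max(options)
            new_score[cur] = best_val + log_emit[cur][obs[t]]
            ptr[cur] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], tuple(path)

def neighbourhood(obs, states, log_start, log_trans, log_emit):
    """Best path plus the distinct paths found by banning each of its steps."""
    best_lp, best = viterbi(obs, states, log_start, log_trans, log_emit)
    found = {}
    for t in range(1, len(obs)):
        banned = (t, best[t - 1], best[t])
        lp, path = viterbi(obs, states, log_start, log_trans, log_emit, banned)
        found[path] = lp
    ranked = sorted(found.items(), key=lambda item: item[1], reverse=True)
    return (best_lp, best), [(lp, path) for path, lp in ranked]

# Hypothetical toy model: "M" = match state, "B" = background state.
states = ["M", "B"]
start = {"M": 0.5, "B": 0.5}
trans = {"M": {"M": 0.7, "B": 0.3}, "B": {"M": 0.4, "B": 0.6}}
emit = {"M": {"a": 0.9, "b": 0.1}, "B": {"a": 0.2, "b": 0.8}}
log_start = {s: math.log(p) for s, p in start.items()}
log_trans = {a: {b: math.log(p) for b, p in row.items()} for a, row in trans.items()}
log_emit = {s: {x: math.log(p) for x, p in row.items()} for s, row in emit.items()}

(best_lp, best), others = neighbourhood("aab", states, log_start, log_trans, log_emit)
```

One property of this scheme: the second-best path always differs from the best path in at least one step, so it is always found. The third-best and later paths are not guaranteed to appear, because banning a step that the third-best path avoids may simply return the second-best path again.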

Has anybody come across something like this, or can anybody see anything obviously wrong in my neighbourhood idea? I'd appreciate any feedback. As a note, this has been cross-posted to the Theoretical Computer Science forum.

DaveTheScientist

Posted 2015-09-09T23:32:26.267

Reputation: 123

http://cstheory.stackexchange.com/q/32497/5038 – D.W. – 2017-01-04T23:00:58.327

Answers

2

Use log-probabilities, then use k-shortest paths.

The probability of a path is the product of the probabilities on its edges. If you log-transform everything (so that each edge is annotated with the log of the probability rather than the probability itself), the log-probability of a path becomes the sum of the log-probabilities on the edges.
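As a small illustration of the log transform (a sketch with made-up edge probabilities, not values from any real model): the sum of the logs recovers the product, and it stays finite on long paths where the raw product underflows to zero.

```python
import math

# Hypothetical edge probabilities along one short path.
edge_probs = [0.5, 0.9, 0.7, 0.9, 0.3, 0.8]

path_prob = math.prod(edge_probs)                     # product of probabilities
path_log_prob = sum(math.log(p) for p in edge_probs)  # sum of log-probabilities

# The two agree up to floating-point error.
assert math.isclose(math.exp(path_log_prob), path_prob)

# On a long path the raw product underflows, while the log-sum stays finite.
long_path = [0.1] * 400
underflowed = math.prod(long_path)
long_log_prob = sum(math.log(p) for p in long_path)
```

This is also why Viterbi implementations for long sequences are usually written in log space.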

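Now you want to rank the paths by decreasing log-probability, and find the $k$ paths with the highest log-probability. If you define the length of each edge to be the negative of its log-probability (which is non-negative, since probabilities are at most 1), then the length of a path (the sum of the lengths of its edges) is the negative of its log-probability. The paths with the highest log-probability are therefore exactly the shortest paths.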

Therefore, the problem becomes: given a graph with a length on each edge, find the $k$ shortest paths.

There are efficient algorithms for finding the $k$ shortest paths. See, e.g., https://cstheory.stackexchange.com/a/32501/5038.
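For an HMM, the graph in question is the trellis (one node per state per position), which is a DAG, so a simple option is to extend the Viterbi recurrence to keep the top $k$ candidates per state instead of one. The sketch below does that. It is not the algorithm from the linked answer, and the two-state model, its probabilities, and the name `k_best_paths` are invented for illustration.

```python
import heapq
import math

def k_best_paths(obs, states, log_start, log_trans, log_emit, k):
    """Return up to k (log_prob, path) pairs, highest log-probability first."""
    # best[s] holds up to k (log_prob, path) candidates that end in state s.
    best = {s: [(log_start[s] + log_emit[s][obs[0]], (s,))] for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for s in states:
            # Extend every surviving candidate by one step into state s.
            candidates = [
                (lp + log_trans[prev][s] + log_emit[s][symbol], path + (s,))
                for prev in states
                for lp, path in best[prev]
            ]
            new_best[s] = heapq.nlargest(k, candidates)
        best = new_best
    finals = [c for s in states for c in best[s]]
    return heapq.nlargest(k, finals)

# Hypothetical toy model: "M" = match state, "B" = background state.
states = ["M", "B"]
start = {"M": 0.5, "B": 0.5}
trans = {"M": {"M": 0.7, "B": 0.3}, "B": {"M": 0.4, "B": 0.6}}
emit = {"M": {"a": 0.9, "b": 0.1}, "B": {"a": 0.2, "b": 0.8}}
log_start = {s: math.log(p) for s, p in start.items()}
log_trans = {a: {b: math.log(p) for b, p in row.items()} for a, row in trans.items()}
log_emit = {s: {x: math.log(p) for x, p in row.items()} for s, row in emit.items()}

top3 = k_best_paths("aab", states, log_start, log_trans, log_emit, k=3)
```

Keeping $k$ candidates per state is sufficient because every prefix of a top-$k$ path must itself be among the top $k$ paths ending in its final state. Storing whole paths in each candidate keeps the sketch short; a real implementation would more likely store back-pointers. Zero probabilities would need to be mapped to `-math.inf` before taking logs, since `math.log(0)` raises an error.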

D.W.

Posted 2015-09-09T23:32:26.267

Reputation: 2 721