Apparently, I misunderstood your question.

There are several methods for finding the k best paths in an HMM by extending the Viterbi algorithm.

My first suggestion would be to look at this question on SO, which is similar to yours and has a well-illustrated answer.

Then, I would refer you to two publicly available articles/theses from which one can extend one's research. *(Disclaimer: these references may not be the "best" ones, but I chose them because they are publicly available and provide a good number of references to deepen research on the topic.)*

*The k-best paths in Hidden Markov Models: Algorithms and Applications to Transmembrane Protein Topology Recognition*, thesis by Golod, available here (a thesis like this gives countless references on the topic).

*Decoding HMMs using the k best paths: algorithms and applications* by Brown and Golod, available here (a condensed version of what you will find in the aforementioned thesis).
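To make the approach concrete, here is a minimal sketch of one such extension (often called "list Viterbi"): instead of keeping the single best partial path per state at each time step, keep the k best. This is my own illustrative sketch, not code from the references; the function name, the dense NumPy matrix layout, and the small epsilon guard against `log(0)` are all my choices.

```python
import heapq
import numpy as np

def k_best_viterbi(obs, pi, A, B, k):
    """Return the k most probable state sequences for an observation
    sequence, as (log_probability, state_sequence) pairs sorted from
    most to least probable.

    obs: sequence of observation indices
    pi:  (S,)   initial state probabilities
    A:   (S, S) transition matrix, A[i, j] = P(next = j | current = i)
    B:   (S, O) emission matrix,  B[s, o] = P(obs = o | state = s)
    """
    S = len(pi)
    eps = 1e-300  # guard against log(0) for impossible paths
    # paths[s] = list of (log_prob, path) for the best partial paths
    # ending in state s at the current time step
    paths = [[(np.log(pi[s] * B[s, obs[0]] + eps), [s])] for s in range(S)]
    for o in obs[1:]:
        new_paths = []
        for s in range(S):
            # Extend every surviving partial path into state s,
            # then keep only the k best extensions.
            cands = [(lp + np.log(A[p[-1], s] * B[s, o] + eps), p + [s])
                     for per_state in paths for lp, p in per_state]
            new_paths.append(heapq.nlargest(k, cands, key=lambda c: c[0]))
        paths = new_paths
    # Merge the per-state lists and return the overall k best paths.
    return heapq.nlargest(k, (c for per_state in paths for c in per_state),
                          key=lambda c: c[0])
```

With k = 1 this reduces to the usual Viterbi decoder; the price of larger k is a factor of roughly k in both time and memory.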

**My previous answer:**

For anyone coming across this question looking for a way to compute some of the typical state sequences of an HMM (which is what I first thought this question was about): the concept of a most probable sequence without specifying observations is not really used in HMM theory, as far as I know. However, one can follow these steps:

As a first try, I would implement something like this:

*Get the state at time $t=0$*

Draw an initial state $s_0$ from the initial state probability mass function (pmf)

*Get the state at time $t+1$*

Draw the new state $s_1$ from the pmf defined by the $s_0$-th row of the transition matrix

Repeat this step as many times as needed to draw states up to $s_N$

Then you can repeat the entire procedure $X$ times in order to get as many sample paths as you wish.

This is very fast and easy to implement and it will give you what you want. Many scientific libraries in many languages have a built-in function for drawing a random sample from a pmf.
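The steps above can be sketched as follows. This is a minimal NumPy version; the function name and the seeded generator are my own choices, and `numpy.random.Generator.choice` plays the role of the built-in pmf sampler mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def sample_state_path(pi, A, n_steps):
    """Draw one state sequence of length n_steps from an HMM's
    underlying Markov chain: the initial state from pi, then each
    following state from the row of the transition matrix A indexed
    by the current state."""
    states = [rng.choice(len(pi), p=pi)]
    for _ in range(n_steps - 1):
        states.append(rng.choice(A.shape[1], p=A[states[-1]]))
    return states
```

Calling `sample_state_path` in a loop then gives you as many sample paths as you wish, from which you can tabulate the most frequent ones.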

Thank you for the response. I don't understand what you mean by 'without specifying data' (observations?). The Viterbi algorithm decodes the MOST probable sequence given observations + model, that is, $\arg\max_x p(\text{state}_1,\dots,\text{state}_n \mid \text{obs}_1,\dots,\text{obs}_n)$. What I am asking is how to get the 'next argmax', with possibly slightly smaller probability. – Anton – 2017-10-10T15:02:34.413

We could of course sample sequences from the model the way you've described and take the most frequent ones, but that's going to take ages to get good estimates for some tasks. – Anton – 2017-10-10T15:04:16.117

@Anton yes I meant observations. I think you should edit your question to make it clearer with what you explained in the comments. I misunderstood it at first. I edited my answer giving you proper references. :) The downvote is a bit harsh given that the question could be understood both ways, hope you reconsider that too. – Eskapp – 2017-10-10T20:58:52.143

thanks for the update; it was not me who downvoted you, btw. – Anton – 2017-10-10T23:06:35.823

Upvoted. I still don't quite understand how I could improve my question, though, because the Viterbi-like stuff I've asked for cannot be done without observations. – Anton – 2017-10-10T23:18:59.420