How to set the parameters of a Hidden Markov Model that will be used to correct the mistakes made by a previous classifier?


Say we've previously trained a neural network or some other classifier C on $N$ training samples $I:=\{I_1,\dots,I_N\}$ belonging to $K$ classes. The samples come with a sequence or context, but C ignores it. Assume that, for some reason (probably a training problem or the way the classes were declared), C is confused and doesn't perform well. The way we assign a class to each test sample $I$ using C is: $class(I):= \arg\max_{1 \leq j \leq K} p_j(I)$, where $p_j(I)$ is C's probability estimate that $I$ belongs to the $j$-th class.
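To make the decision rule concrete, here is a minimal sketch of the arg-max assignment. The function name `assign_class` and the example probabilities are hypothetical; classes are numbered from 1 to match $1 \leq j \leq K$.

```python
def assign_class(probs):
    """Return the 1-based class index j that maximises p_j(I).

    probs: list [p_1(I), ..., p_K(I)] as produced by the classifier C.
    """
    best = max(range(len(probs)), key=lambda j: probs[j])
    return best + 1  # shift from 0-based list index to 1-based class label


# Hypothetical output of C for one sample with K = 3 classes.
print(assign_class([0.2, 0.5, 0.3]))  # 2
```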

Now, on top of this previous classifier C, I'd like to use a Hidden Markov Model (HMM) to "correct" the mistakes made by the previous context-free classifier C, by taking into account the contextual/sequential information not used by C.

Hence, in my HMM, let the hidden state $Z_i$ denote the true class of the $i$-th sample $I_i$, and let $X_i$ be the class predicted by C. My question is: how could we use the probabilistic information from C, $class(I):= \arg\max_{1 \leq j \leq K} p_j(I)$, to train this HMM? I understand that the confusion matrix of C can be used to define the emission probabilities of the HMM, but how do we define the transition and start/prior probabilities? I'm tempted to define the start/prior probability vector as $\pi:=(p_1(x_1), \dots, p_K(x_1))$, but I may be wrong. This is my main question.
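For illustration only, here is one possible way to set this up, assuming the training sequences come with their true labels: the emission matrix is the row-normalised confusion matrix of C, while the transition matrix and start vector are estimated by counting in the true label sequences. This is a sketch under that assumption, not an established answer to the question; the names `estimate_hmm`, `viterbi`, `row_normalise`, the add-one smoothing, and the toy numbers are all hypothetical. Classes are indexed from 0 here.

```python
import math


def row_normalise(counts, smoothing=1.0):
    """Turn rows of counts into probabilities, with additive smoothing."""
    out = []
    for row in counts:
        total = sum(row) + smoothing * len(row)
        out.append([(c + smoothing) / total for c in row])
    return out


def estimate_hmm(true_seqs, pred_seqs, K, smoothing=1.0):
    """Count-based estimates of (pi, A, B) from labelled sequences.

    true_seqs: true classes Z per sequence; pred_seqs: C's predictions X.
    """
    start = [0.0] * K
    trans = [[0.0] * K for _ in range(K)]
    confusion = [[0.0] * K for _ in range(K)]  # rows: true class, cols: prediction
    for z, x in zip(true_seqs, pred_seqs):
        start[z[0]] += 1
        for a, b in zip(z, z[1:]):
            trans[a][b] += 1
        for zi, xi in zip(z, x):
            confusion[zi][xi] += 1
    pi = row_normalise([start], smoothing)[0]
    return pi, row_normalise(trans, smoothing), row_normalise(confusion, smoothing)


def viterbi(obs, pi, A, B):
    """Most likely hidden (true-class) path given C's predictions obs.

    Assumes all probabilities are strictly positive (smoothing > 0).
    """
    K = len(pi)
    score = [math.log(pi[k]) + math.log(B[k][obs[0]]) for k in range(K)]
    back = []
    for x in obs[1:]:
        prev = score
        ptr = [max(range(K), key=lambda i: prev[i] + math.log(A[i][k]))
               for k in range(K)]
        score = [prev[ptr[k]] + math.log(A[ptr[k]][k]) + math.log(B[k][x])
                 for k in range(K)]
        back.append(ptr)
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hand-set toy parameters: "sticky" transitions, C is right 80% of the time.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
print(viterbi([0, 0, 1, 0, 0], pi, A, B))  # [0, 0, 0, 0, 0]
```

In the toy run, the isolated prediction `1` is overruled by its neighbours, which is the kind of correction the HMM is meant to provide. Note that in this sketch the start vector comes from label counts rather than from $p_j$ evaluated on a single sample.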

A follow-up question: one can define an HMM in the above way (using the confusion matrix and the probability information from C); call the resulting parameter set $\Theta_0$. After doing so, is it advisable to re-estimate the parameters to best fit the data $I$ used for C, initializing the estimation with $\Theta_0$?
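As a rough sketch of what such a re-estimation could look like, the block below runs one Baum-Welch (EM) step on a sequence of C's predictions, starting from an initial parameter set playing the role of $\Theta_0$. Baum-Welch is one standard choice for this, not something the question prescribes; the function names and the toy values of `pi0`, `A0`, `B0` and `obs` are hypothetical, and all probabilities are assumed strictly positive.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes for a discrete HMM."""
    K, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[k] * B[k][obs[0]] for k in range(K)]
        else:
            a = [sum(alpha[-1][i] * A[i][k] for i in range(K)) * B[k][obs[t]]
                 for k in range(K)]
        c = sum(a)
        alpha.append([v / c for v in a])
        scale.append(c)
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[k][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(K)) / scale[t + 1]
                   for k in range(K)]
    return alpha, beta, scale


def baum_welch_step(obs, pi, A, B):
    """One EM update. Returns (new_pi, new_A, new_B, loglik of the inputs)."""
    K, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    # Posterior state probabilities gamma_t(k) = P(Z_t = k | obs).
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    new_A = [[0.0] * K for _ in range(K)]
    for t in range(T - 1):
        for i in range(K):
            for j in range(K):
                # Expected number of i -> j transitions at time t.
                new_A[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                * beta[t + 1][j] / scale[t + 1])
    new_A = [[v / sum(row) for v in row] for row in new_A]
    new_B = [[0.0] * M for _ in range(K)]
    for t in range(T):
        for k in range(K):
            new_B[k][obs[t]] += gamma[t][k]
    new_B = [[v / sum(row) for v in row] for row in new_B]
    loglik = sum(math.log(c) for c in scale)
    return gamma[0], new_A, new_B, loglik


# Hypothetical Theta_0 and a toy sequence of C's predictions.
pi0 = [0.5, 0.5]
A0 = [[0.9, 0.1], [0.1, 0.9]]
B0 = [[0.8, 0.2], [0.2, 0.8]]
obs = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]

pi1, A1, B1, ll0 = baum_welch_step(obs, pi0, A0, B0)
_, _, _, ll1 = baum_welch_step(obs, pi1, A1, B1)
print(ll0 <= ll1 + 1e-9)  # EM never decreases the likelihood of obs
```

Two caveats on this design. The update only fits the observed predictions $X_i$, so nothing forces the hidden states to keep meaning "true class" after many iterations. And if the true labels of the training sequences are available, normalised counts are already the maximum-likelihood estimates for the fully observed model, so re-estimation mainly matters for unlabelled sequences.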


Posted 2018-04-20T10:54:05.070

Reputation: 11

Why can you not make the original classifier aware of the context? E.g. use a CNN with a window of time, or an RNN? – kbrose – 2018-04-20T14:16:37.293

@kbrose: I'm kind of new to the subject, but I've been instructed to use neighborhood information of the samples. The samples are retail products in supermarkets, and the context is not time: for a product P(t,s), it is the neighborhood consisting of all P(t+/-1, s+/-1). – Sus_Q – 2018-04-20T14:22:30.790

Ok, so a CNN with filter widths of 3? – kbrose – 2018-04-20T23:10:30.530



As far as I know, you can't say anything about the hidden class; the hidden class value at time t is 'some intermediate value of weighted values of all hidden classes'.

See point 2) (hidden state sequence) in the blog:

So your statement:

"Hence let in my HMM, the hidden state Zi denote the true class of the i-th sample Ii, and Xi be the predicted class by C" is incorrect.

How can you compare the hidden class value with the actual one? You could have compared the emission value with the actual one.

You can try ensembling the HMM and C. I wonder how exactly you are trying to use parameters from a classification problem for a time series/sequence model (HMM).

Arpit Sisodia

Posted 2018-04-20T10:54:05.070

Reputation: 365