A Hidden Markov Model (HMM) is a state-observation model with a single discrete state variable S and a single observation variable O
In an HMM, the transition model often encodes many structural constraints
-> the resulting model is sparse
Graphical representation of transition model
Hidden Markov Model (HMM)
Simple example
Simple example observation model
Complete specification
HMMs : common reasoning tasks :
computing the probability of an observation sequence
Filter (2)
observations are written as tuples, e.g. <humid, medium>
Forward-backward algorithm :
The forward-backward algorithm needs re-scaling (or working with logarithms) to avoid numerical underflow on long sequences
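The re-scaling idea can be sketched for the forward pass as follows. This is a minimal illustration, not a reference implementation: the function name `forward_scaled` and the two-state, two-symbol parameter values are assumptions made for the example, and it assumes every observation has non-zero probability under the model. Each forward vector is normalised to sum to 1 (which also gives the filtered state distribution), and the logarithms of the scaling factors are accumulated to recover the log-likelihood.

```python
import math

def forward_scaled(pi, A, B, obs):
    """Scaled forward pass.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observation symbol indices
    Returns (filtered, log_likelihood) with
    filtered[t][i] = P(S_t = i | o_1..o_t).
    """
    n = len(pi)
    filtered = []
    log_likelihood = 0.0
    prev = None
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            alpha = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        c = sum(alpha)                 # scaling factor P(o_t | o_1..o_{t-1})
        prev = [a / c for a in alpha]  # normalise so the vector sums to 1
        filtered.append(prev)
        log_likelihood += math.log(c)  # log P(o_1..o_t) accumulates here
    return filtered, log_likelihood

# Hypothetical parameters, chosen only for illustration.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

# On 2000 observations the unscaled probability would underflow to 0.0
# in double precision, while the log-likelihood stays finite.
filtered, log_lik = forward_scaled(pi, A, B, [0, 1] * 1000)
```

The same scaling factors can be reused in the backward pass, so that the smoothed distributions are obtained without any extra normalisation work.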
Decoding : the most likely state sequence is not the same as the sequence of most likely states as computed in filtering or smoothing
Naive decoding approach : enumerate all state sequences -> problem : combinatorial number of possible state sequences (N^T for N states and T time steps)
Viterbi Algorithm idea
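A possible sketch of the Viterbi recursion: for each time step and each state, keep only the best-scoring path ending in that state plus a back-pointer, then trace back from the best final state. The function name `viterbi` and the parameter layout (pi, A, B as nested lists, observations as symbol indices) are assumptions for this illustration. Log-probabilities are used so that long sequences do not underflow.

```python
import math

def viterbi(pi, A, B, obs):
    """Return (most likely state sequence, its log joint probability).

    pi[i], A[i][j], B[i][k] are initial, transition and observation
    probabilities; obs is a list of observation symbol indices.
    """
    n = len(pi)

    def log(p):
        # log(0) is treated as minus infinity so that impossible
        # transitions or emissions are never selected.
        return math.log(p) if p > 0 else float("-inf")

    # delta[i]: best log-probability of any path ending in state i.
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    backptr = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log(A[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log(A[best_i][j]) + log(B[j][o]))
        delta = new_delta
        backptr.append(ptr)

    # Trace back from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

The cost is proportional to T * N^2 instead of N^T, because each step only compares N predecessors for each of the N states.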
Computing the likelihood of an observation sequence under an HMM is used for classification
The probability of the model producing a sequence is equal to the sum, over all possible final states, of the probability of the model producing the sequence and ending in that state (law of total probability)
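A small sketch of how this could be used for classification, assuming one HMM per class: compute the likelihood of the sequence under each model with the forward recursion, sum over the final states, and pick the class whose model scores highest. The names `sequence_likelihood`, `classify` and the two toy models are hypothetical. The recursion is unscaled here, so it is only suitable for short sequences.

```python
def sequence_likelihood(pi, A, B, obs):
    """P(o_1..o_T | model), computed with the (unscaled) forward recursion."""
    n = len(pi)
    # alpha[i] = P(o_1..o_t, S_t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Law of total probability: sum over all possible final states.
    return sum(alpha)

def classify(models, obs):
    """models maps a class label to its (pi, A, B); returns the best label."""
    return max(models, key=lambda name: sequence_likelihood(*models[name], obs))

# Two hypothetical class models: "a" mostly emits symbol 0, "b" mostly symbol 1.
models = {
    "a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "b": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
label = classify(models, [0, 0, 0])
```

This picks the maximum-likelihood class; if class priors are known, they can be multiplied into the score before taking the maximum.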
HMMs come from :
structure is automatically defined
problem : find good parameters pi, A, B (initial state distribution, transition probabilities, observation probabilities)
model parameters must be learned
HMM parameter estimation is an optimisation problem
Learning HMMs from Observation Sequences
Central problem in learning HMMs:
contains unobservable variables
these cannot be observed in the training sequences
problem concerns all three sets of parameters
Summary of the dilemma :
if we knew the hidden counts, we could estimate the model parameters
if we knew the model parameters, we could estimate the hidden counts
The General re-estimation procedure
Computing the expected values of these hidden counts :
Baum-Welch is an instance of a very general algorithm schema for estimating hidden model parameters by iterating between :
aligning the training data to the current model
refitting the parameters of the model
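The two alternating steps can be sketched as one re-estimation iteration on a single observation sequence. This is a simplified illustration under stated assumptions: the function name `baum_welch_step` is hypothetical, the forward and backward passes are unscaled (so only short sequences are safe), only one training sequence is used, and every state is assumed to keep a non-zero expected occupancy.

```python
def baum_welch_step(pi, A, B, obs):
    """One re-estimation step; returns (new_pi, new_A, new_B, likelihood).

    The returned likelihood is that of obs under the INPUT parameters.
    """
    n, m, T = len(pi), len(B[0]), len(obs)

    # Step 1: align the data to the current model (forward and backward passes).
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    likelihood = sum(alpha[T - 1])

    # Expected hidden counts:
    # gamma[t][i] = P(S_t = i | obs), xi[t][i][j] = P(S_t = i, S_t+1 = j | obs).
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]

    # Step 2: refit the parameters from the expected counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood

# Hypothetical starting point and training sequence, for illustration only.
pi, A, B = [0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]], [[0.7, 0.3], [0.4, 0.6]]
obs = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1]
for _ in range(5):
    pi, A, B, lik = baum_welch_step(pi, A, B, obs)
```

Each iteration cannot decrease the likelihood of the training data, but the procedure only converges to a local optimum, so the result depends on the initial parameters.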
Constrained transition model structures :
If observations are vectors of feature values -> solutions : discretisation or vector quantisation
vector quantisation : discretise continuous observations into discrete intervals and use these as values -> possibly a huge number of values, impractical to represent
solution 2 : continuous probability density models -> may introduce large errors
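To make the size problem of the discretisation approach concrete, here is a small sketch that bins each feature separately and combines the bin indices into one discrete symbol. The helper names `discretise` and `num_symbols` and the cut points are assumptions for this example. The number of symbols is the product of the per-feature bin counts, which is why it grows so quickly with the number of features.

```python
from bisect import bisect_right

def discretise(vector, bin_edges):
    """Map a feature vector to a single discrete symbol index.

    bin_edges[d] is the sorted list of cut points for feature d,
    so feature d has len(bin_edges[d]) + 1 intervals.
    """
    symbol = 0
    for x, edges in zip(vector, bin_edges):
        # Mixed-radix encoding of the per-feature interval indices.
        symbol = symbol * (len(edges) + 1) + bisect_right(edges, x)
    return symbol

def num_symbols(bin_edges):
    """Size of the resulting observation alphabet."""
    total = 1
    for edges in bin_edges:
        total *= len(edges) + 1
    return total

# Hypothetical cut points: 3 intervals for feature 0, 2 for feature 1.
edges = [[0.0, 1.0], [10.0]]
symbol = discretise((0.5, 12.0), edges)

# With 10 features and 10 intervals each, the observation model would
# need one probability per state for each of 10**10 symbols.
large = num_symbols([[float(c) for c in range(9)]] * 10)
```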
Gaussian Observation Models
The multivariate case
problem with the multivariate case :
highly unlikely that the joint distribution of N variables will be well described by a single Gaussian
Gaussian mixture model : model the joint conditional distribution by a weighted combination of single Gaussians
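As an illustration, the observation density of one state under a Gaussian mixture could be evaluated as below. This sketch makes a simplifying assumption that is not in the notes: each component has a diagonal covariance matrix, so only per-dimension variances are stored. The function names and the argument layout are hypothetical. The density is computed in log-space with the log-sum-exp trick to stay numerically stable.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (per-dimension variances)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_logpdf(x, weights, means, variances):
    """Log of sum_k weights[k] * N(x; means[k], diag(variances[k])).

    weights must be positive and sum to 1.
    """
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp: factor out the largest term
    return top + math.log(sum(math.exp(l - top) for l in logs))

# Hypothetical two-component mixture over 2-dimensional observations.
log_b = gmm_logpdf(
    x=[0.2, -1.0],
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
```

In an HMM, each state would hold its own weights, means and variances, and the resulting value replaces the discrete observation probability in the forward, backward and Viterbi recursions.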