Real-world signals are generated by underlying stochastic processes, so they can in turn be modeled using parametric random processes. In most situations the Martingale assumption (that the expected observation depends only on some collection of past states) is reasonable, but it is in general computationally intractable when the correlation length of the observations is high. One can model these long-range dependencies more tractably by introducing latent variables that depend only on the previous latent state, i.e. the latent variables form a Markov Process.

A pair of Stochastic Processes $(Z_n, X_n)_{n \geq 1}$ is called a Hidden Markov Model (HMM) if:

  • $(Z_n)_{n \geq 1}$ is a Markov process and is not observed (it is hidden)
  • the distribution of each observation $X_n$ depends only on the current hidden state $Z_n$
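
To make this definition concrete, here is a minimal generative sketch in NumPy, assuming discrete state and observation spaces and purely illustrative (hypothetical) values for the initial, transition, and emission probabilities; in practice only the `X` sequence would be observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 3-symbol HMM (illustrative values only)
pi = np.array([0.6, 0.4])                # initial state distribution P(Z_1 = i)
A  = np.array([[0.7, 0.3],               # transition matrix P(Z_n = j | Z_{n-1} = i)
               [0.2, 0.8]])
B  = np.array([[0.5, 0.4, 0.1],          # emission matrix P(X_n = k | Z_n = i)
               [0.1, 0.3, 0.6]])

def sample_hmm(T):
    """Sample (Z, X) of length T; only X is observable."""
    Z = np.empty(T, dtype=int)
    X = np.empty(T, dtype=int)
    Z[0] = rng.choice(len(pi), p=pi)
    X[0] = rng.choice(B.shape[1], p=B[Z[0]])
    for n in range(1, T):
        Z[n] = rng.choice(A.shape[1], p=A[Z[n - 1]])   # Z_n depends only on Z_{n-1}
        X[n] = rng.choice(B.shape[1], p=B[Z[n]])       # X_n depends only on Z_n
    return Z, X

Z, X = sample_hmm(10)
```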

Assuming a discrete-time, discrete-space model (i.e. with $Z_n \in \{1, \cdots, K\}$), the observation model is given by the conditional distribution (or emission probability) $\mathbb{P}(X_n \space | \space Z_n)$, with joint distribution:

$$\begin{aligned}
\mathbb{P}(X_{\{1,\cdots,T\}}, Z_{\{1,\cdots,T\}} \space | \space \theta) &= \mathbb{P}(X_{\{1,\cdots,T\}} \space | \space Z_{\{1,\cdots,T\}}, \theta) \space \mathbb{P}(Z_{\{1,\cdots,T\}} \space | \space \theta) \\
&= \pi(Z_1) \prod_{n=2}^T \mathbb{P}(Z_n \space | \space Z_{n-1}, \theta_t) \prod_{n=1}^T \mathbb{P}(X_n \space | \space Z_n, \theta_o)
\end{aligned}$$

where the parameters $\theta = \{ \pi, \theta_t, \theta_o\}$ are the following quantities:

- The initial state distribution: $\pi_i = \mathbb{P}(Z_1 = i)$
- The parameters of the transition model: $\mathbb{P}(Z_n = j \space | \space Z_{n-1} = i, \theta_t)$
- The parameters of the emission distribution: $\mathbb{P}(X_n \space | \space Z_n = i, \theta_o)$
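
As a sanity check on this factorization, the sketch below (assuming the same kind of hypothetical discrete parameters, held in NumPy arrays `pi`, `A`, `B` for $\pi$, $\theta_t$, $\theta_o$) evaluates the joint log-probability of a labelled state/observation sequence by summing the initial, transition, and emission log-terms exactly as in the product above.

```python
import numpy as np

def joint_log_prob(pi, A, B, Z, X):
    """log P(X_{1..T}, Z_{1..T} | theta), following the factorization above."""
    logp = np.log(pi[Z[0]])                      # initial term: pi(Z_1)
    for n in range(1, len(Z)):
        logp += np.log(A[Z[n - 1], Z[n]])        # transition terms: P(Z_n | Z_{n-1}, theta_t)
    for n in range(len(Z)):
        logp += np.log(B[Z[n], X[n]])            # emission terms: P(X_n | Z_n, theta_o)
    return logp

# Using the toy pi, A, B from the sampling sketch above:
# joint_log_prob(pi, A, B, Z=np.array([0, 0, 1]), X=np.array([1, 0, 2]))
```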