#Algorithms #MachineLearning #ProbabilityTheory #Statistics

This is a message passing algorithm that utilizes the sum and product rules of probability for exact and efficient inference, i.e. computing marginal distributions, in a Probabilistic Graphical Model. The algorithm is as follows, where $X_k$ is any node along a path from an (arbitrary) root node to a leaf node:

  1. Pick an arbitrary node as the root.
  2. Compute recursive factorizations of the marginals (derived below), passing from the root node to the leaves: $\phi(x_k) = \int_{x_{k-1}} \phi_{k-1,k}(x_{k-1}, x_k) \space \phi(x_{k-1}) \space dx_{k-1}$, with base case $\phi(x_1) = 1$
  3. Compute recursive factorizations of the marginals (derived below), passing from the leaf nodes to the root: $\psi(x_k) = \int_{x_{k+1}} \phi_{k,k+1}(x_k, x_{k+1}) \space \psi(x_{k+1}) \space dx_{k+1}$, with base case $\psi(x_N) = 1$
  4. Compute the product of messages at each node required for the marginal: $p(x_k) \propto \phi(x_k) \space \psi(x_k)$
  5. Normalize: $p(x_k) = \frac1{\mathcal{Z}} \phi(x_k) \space \psi(x_k)$, where $\mathcal{Z} = \int_{x_k} \phi(x_k) \space \psi(x_k) \space dx_k$ (a code sketch of all five steps follows this list)
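A minimal sketch of the five steps for the discrete chain case derived below. The toy setup (`N`, `C`, `potentials`, and the random values) is a hypothetical example for illustration, not part of the original text; pairwise potentials $\phi_{k,k+1}$ are stored as $C \times C$ matrices.

```python
import numpy as np

# Hypothetical toy setup: a chain of N discrete nodes, each taking C states,
# with pairwise potentials phi_{k,k+1} stored as C x C matrices.
rng = np.random.default_rng(0)
N, C = 6, 3
potentials = [rng.random((C, C)) for _ in range(N - 1)]  # potentials[j] = phi_{j,j+1}

def marginal(k, potentials, C):
    """p(x_k) for 0-indexed node k, via forward message phi and backward message psi."""
    # Step 2: forward recursion with base case phi(x_1) = 1.
    phi = np.ones(C)
    for j in range(k):                               # propagate towards node k
        phi = potentials[j].T @ phi                  # sums over x_j: O(C^2) per edge
    # Step 3: backward recursion with base case psi(x_N) = 1.
    psi = np.ones(C)
    for j in range(len(potentials) - 1, k - 1, -1):  # propagate back towards node k
        psi = potentials[j] @ psi                    # sums over x_{j+1}: O(C^2) per edge
    # Steps 4-5: product of the two messages, then normalize.
    p = phi * psi
    return p / p.sum()
```

Each matrix-vector product is the discrete analogue of the integrals in steps 2 and 3, which is where the $O(C^2)$ per-edge cost in the complexity claims below comes from.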

Each path from the arbitrary root to a leaf node forms a Markov chain, which has the simplest Conditional Independence structure (a linear chain of pairwise interactions):

$$p(x_k \mid x_1, \ldots, x_{k-1}) = p(x_k \mid x_{k-1})$$

and corresponds to a joint factorization

$$p(x_1, \ldots, x_N) = \frac1{\mathcal{Z}} \prod_{k=1}^{N-1} \phi_{k,k+1}(x_k, x_{k+1})$$

Inference consists of computing the marginals

$$p(x_k) = \int_{x_1} \cdots \int_{x_{k-1}} \int_{x_{k+1}} \cdots \int_{x_N} p(x_1, \ldots, x_N) \space dx_1 \cdots dx_{k-1} \space dx_{k+1} \cdots dx_N$$

When the integrals above are sums over $C$ discrete values, this computation is $O(C^{N-1})$ (exponential in $N$); the brute-force sketch below makes the cost concrete.
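For contrast, here is a brute-force evaluation of a single marginal under the hypothetical toy setup sketched above: rather than pushing sums inward, it enumerates all $C^N$ joint configurations.

```python
import itertools
import numpy as np

def brute_force_marginal(k, potentials, C, N):
    """p(x_k) by summing the joint over all C**N configurations of the chain."""
    p = np.zeros(C)
    for x in itertools.product(range(C), repeat=N):
        # Unnormalized joint: product of the pairwise potentials along the chain.
        joint = np.prod([potentials[j][x[j], x[j + 1]] for j in range(N - 1)])
        p[x[k]] += joint
    return p / p.sum()  # dividing by the partition function Z
```

For the toy chain, `np.allclose(marginal(2, potentials, C), brute_force_marginal(2, potentials, C, N))` holds, but the loop body here runs $C^N$ times.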

Substituting in the corresponding joint factorization and using the sum and product rules, one derives the Chapman-Kolmogorov equations:

$$\begin{aligned} p(x_k) = \frac1{\mathcal{Z}} &\left( \int_{x_{k-1}} \phi_{k-1,k}(x_{k-1}, x_k) \cdots \Big( \int_{x_2} \phi_{2,3}(x_2, x_3) \Big( \int_{x_1} \phi_{1,2}(x_1, x_2) \space dx_1 \Big) \space dx_2 \Big) \cdots dx_{k-1} \right) \\ \cdot &\left( \int_{x_{k+1}} \phi_{k,k+1}(x_k, x_{k+1}) \cdots \Big( \int_{x_{N-1}} \phi_{N-2,N-1}(x_{N-2}, x_{N-1}) \Big( \int_{x_N} \phi_{N-1,N}(x_{N-1}, x_N) \space dx_N \Big) \space dx_{N-1} \Big) \cdots dx_{k+1} \right) \\ = \frac1{\mathcal{Z}} &\left( \int_{x_{k-1}} \phi_{k-1,k}(x_{k-1}, x_k) \Big( \phi(x_{k-1}) \Big) \space dx_{k-1} \right) \cdot \left( \int_{x_{k+1}} \phi_{k,k+1}(x_k, x_{k+1}) \Big( \psi(x_{k+1}) \Big) \space dx_{k+1} \right) \\ = \frac1{\mathcal{Z}} &\phi(x_k) \cdot \psi(x_k) \end{aligned}$$

where $\phi$ and $\psi$ are the recursively defined messages from steps 2 and 3, so that computing a single marginal requires only $O(NC^2)$ operations. The recursive functions can be understood as propagating the message/belief $\phi(x_k)$ to node $X_k$:

![[Message_passing.svg|500]]

Utilizing the recursive definitions, computing all $N$ marginals requires only two passes of the recursive evaluations (by caching the computed messages $\phi$, $\psi$) and still has $O(NC^2)$ complexity (as sketched below), whereas re-computing each of the $N$ marginals from scratch (without caching) would be $O(N^2 C^2)$.
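A sketch of the two-pass scheme, continuing the hypothetical toy setup above: one sweep caches every forward message $\phi$, one sweep caches every backward message $\psi$, and all $N$ marginals are then read off as products.

```python
import numpy as np

def all_marginals(potentials, C):
    """All N marginals in two cached passes: O(N C^2) total."""
    N = len(potentials) + 1
    # Forward pass: phis[k] is the message phi arriving at node k.
    phis = [np.ones(C)]
    for j in range(N - 1):
        phis.append(potentials[j].T @ phis[-1])
    # Backward pass, built from the last node towards the first.
    psis = [np.ones(C)]
    for j in range(N - 2, -1, -1):
        psis.append(potentials[j] @ psis[-1])
    psis.reverse()  # psis[k] is the message psi arriving at node k
    # Product of messages at each node, then normalize.
    return [p / p.sum() for p in (f * b for f, b in zip(phis, psis))]
```

Without the caches, each of the $N$ marginals would redo its own forward and backward sweeps, which is exactly the $O(N^2 C^2)$ cost noted above.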