Neural Autoregressive Density Estimator (NADE) is a neural autoregressive model in which the joint distribution of a collection of random variables is factorized into conditionals, each parameterized by a one-layer neural network:

$$\begin{aligned} h_k &= \sigma_\text{NN}(W_{0,k} X_{\{1,\cdots,k-1\}} + b_{0,k}) \\ \mathbb{P}(X_k \space | \space X_1, \cdots, X_{k-1}; \theta_k) &= \sigma(w_{1,k}^T h_k + b_{1,k}) \end{aligned}$$

where $\sigma_\text{NN}$ is a non-linear activation function and $\sigma$ depends on the domain of the data:

- Sigmoid for binary r.v.s
- Softmax for general discrete r.v.s
- A parameterized continuous distribution (e.g. $\sigma(w_{1,k}^T h_k + b_{1,k}) := (\mu, \Sigma)$ for a Gaussian)

One can reduce the number of parameters by sharing the weights ($W_{0,k}$'s) across the conditionals; the intuition is that features learned for an earlier conditional get reused at later stages of the computation. This mitigates overfitting and reduces the complexity to $O(n)$.

One can then sample from this distribution sequentially:

$$\begin{aligned} \tilde{X}_1 &\sim \mathbb{P}(X_1) \\ \tilde{X}_2 &\sim \mathbb{P}(X_2 \space | \space X_1 = \tilde{X}_1) \\ &\cdots \\ \tilde{X}_m &\sim \mathbb{P}(X_{m} \space | \space X_1 = \tilde{X}_1, \cdots, X_{m-1} = \tilde{X}_{m-1}) \end{aligned}$$