The masked autoencoder for distribution estimation (MADE) is a neural autoregressive model: the joint distribution of a collection of random variables is factorized into a product of conditionals, and those conditional distributions are parameterized by an autoencoder.

In the NADE model, parameters can be shared for efficiency, but the DAG structure required for a valid Bayesian network is encoded explicitly. By contrast, an autoencoder is fully connected, so each prediction depends on all inputs simultaneously. To enforce the autoregressive ordering, one can mask the autoencoder weights to block certain paths in the computation graph:

$$\begin{aligned} h_k &= \sigma_\text{NN}(W_{0,k} X_{\{1,\cdots,k-1\}} + b_{0,k}) \\ \mathbb{P}(X_k \mid X_1, \cdots, X_{k-1}; \theta_k) &= \sigma(w_{1,k}^T h_k + b_{1,k}) \end{aligned}$$
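
The masking idea can be made concrete with a small NumPy sketch. It assumes binary inputs, a single hidden layer, and the natural ordering $X_1, \cdots, X_d$, and it uses the degree-based mask construction from the MADE paper: each hidden unit gets a degree in $\{1, \cdots, d-1\}$ and may only connect to inputs with an equal or smaller degree, while output $k$ may only connect to hidden units with degree strictly below $k$. The names `made_masks` and `made_forward` are hypothetical, chosen for illustration only, and the sketch covers the forward pass, not training.

```python
import numpy as np


def made_masks(d, hidden, rng):
    """Build binary masks for a one-hidden-layer MADE over d variables (d >= 2)."""
    # Input/output degrees follow the natural ordering 1..d.
    m_in = np.arange(1, d + 1)
    # Each hidden unit gets a degree in {1, ..., d-1} (upper bound is exclusive).
    m_hid = rng.integers(1, d, size=hidden)
    # Hidden unit j may see input i only if m_hid[j] >= m_in[i].
    mask_in = (m_hid[:, None] >= m_in[None, :]).astype(float)   # shape (hidden, d)
    # Output k may see hidden unit j only if m_in[k] > m_hid[j].
    mask_out = (m_in[:, None] > m_hid[None, :]).astype(float)   # shape (d, hidden)
    return mask_in, mask_out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def made_forward(x, W0, b0, W1, b1, mask_in, mask_out):
    """Return a vector whose k-th entry models P(X_k = 1 | X_1, ..., X_{k-1})."""
    # Masks are applied elementwise to the weights, which removes the blocked paths.
    h = sigmoid((W0 * mask_in) @ x + b0)
    return sigmoid((W1 * mask_out) @ h + b1)
```

With these masks, the product `mask_out @ mask_in` is strictly lower triangular, so no path connects input $X_i$ to output $k$ unless $i < k$. One forward pass then yields all $d$ conditionals at once, and the first output depends only on its bias.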