An autoregressive model is a probabilistic model that parameterizes the conditional distributions in the chain rule factorization of the joint distribution:

$$\mathbb{P}(X_1, \cdots, X_N) = \mathbb{P}(X_1) \cdot \mathbb{P}(X_2 \space | \space X_1) \cdots \mathbb{P}(X_N \space | \space X_1, \cdots, X_{N-1})$$

where each conditional solves a learning sub-task (i.e. regression or classification of $X_i$ given $X_1, \cdots, X_{i-1}$).
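As a concrete instance of the chain rule, the following sketch factorizes a made-up joint distribution over two binary variables into a marginal and a conditional, then reassembles it (the probability values are purely illustrative):

```python
import numpy as np

# Hypothetical joint P(X1, X2) over two binary variables;
# the numbers are made up for illustration.
joint = np.array([[0.3, 0.1],   # rows: X1 = 0, 1
                  [0.2, 0.4]])  # cols: X2 = 0, 1

# Chain rule: P(X1, X2) = P(X1) * P(X2 | X1)
p_x1 = joint.sum(axis=1)               # marginal P(X1)
p_x2_given_x1 = joint / p_x1[:, None]  # conditional P(X2 | X1)

# Reassembling the factors recovers the joint exactly.
reconstructed = p_x1[:, None] * p_x2_given_x1
print(np.allclose(reconstructed, joint))
```

An autoregressive model keeps this factorization but learns each conditional factor from data instead of reading it off a known joint.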

**Example**: Fully Visible Sigmoid Belief Network (FVSBN)
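A minimal FVSBN sketch, assuming binary data of dimension 5 and random (untrained) parameters: each conditional $p(x_i = 1 \mid x_{<i})$ is a logistic regression over the preceding coordinates, and a strictly lower-triangular weight matrix enforces the autoregressive structure. The dimension and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5  # dimension of the binary vector (illustrative)

# One logistic regressor per coordinate. W is strictly lower-triangular,
# so x_i depends only on x_1, ..., x_{i-1}.
W = np.tril(rng.normal(size=(N, N)), k=-1)
b = rng.normal(size=N)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(x):
    """log p(x) = sum_i log p(x_i | x_{<i}) under the FVSBN."""
    p = sigmoid(W @ x + b)  # p(x_i = 1 | x_{<i}) for every i at once
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def sample():
    """Draw x_1, ..., x_N sequentially from the conditionals."""
    x = np.zeros(N)
    for i in range(N):
        x[i] = rng.random() < sigmoid(W[i] @ x + b[i])
    return x

x = sample()
print(x, log_likelihood(x))
```

Because the factors are exact conditionals, the implied density is properly normalized: summing $\exp(\log p(x))$ over all $2^N$ binary vectors gives 1.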

A deep autoregressive model is a deep generative model that parameterizes the conditionals using neural networks:

$$\begin{aligned} \mathbb{P}(X_1, \cdots, X_N) &= \mathbb{P}(X_1) \cdot \mathbb{P}(X_2 \space | \space X_1) \cdots \mathbb{P}(X_N \space | \space X_1, \cdots, X_{N-1}) \\ &\approx p_{NN}(X_1; \theta_1)\cdot p_{NN}(X_2 \space | \space X_1; \theta_2) \cdots p_{NN}(X_N \space | \space X_1, \cdots, X_{N-1}; \theta_N) \end{aligned}$$

Such an approximation is feasible due to [[Neural Network Approximation|universal approximation]] when $\mathbb{P}$ is [[Absolutely Continuous|absolutely continuous]] w.r.t. the Lebesgue measure (so that the probability density function $p$ exists). Autoregressive models are commonly used in Large Language Models (LLMs) and in audio (speech/music) generation.

**Example**: [[Neural Autoregressive Density Estimation|Neural Autoregressive Density Estimation (NADE)]], [[Masked Autoencoder for Distribution Estimation|Masked Autoencoder for Distribution Estimation (MADE)]]
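A deep autoregressive model can be sketched directly from this factorization: one small (untrained) MLP per conditional $p_{NN}(X_i \mid X_{<i}; \theta_i)$ over a toy token vocabulary, sampled left to right as an LLM would be. The vocabulary size, network widths, and prefix encoding below are illustrative assumptions, not any particular published architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
V, N, H = 4, 6, 16  # vocab size, sequence length, hidden width (all illustrative)

# One small MLP per position i, mapping the one-hot-encoded prefix x_{<i}
# to a categorical distribution over x_i -- the p_NN(X_i | X_{<i}; theta_i) above.
params = []
for i in range(N):
    W1 = rng.normal(scale=0.5, size=(H, i * V + 1))  # +1 constant input
    W2 = rng.normal(scale=0.5, size=(V, H))
    params.append((W1, W2))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def conditional(i, prefix):
    """p(X_i | X_{<i}) from the i-th network; prefix is a list of token ids."""
    onehot = np.zeros(i * V + 1)
    onehot[-1] = 1.0  # constant input so X_1 also gets a distribution
    for j, t in enumerate(prefix):
        onehot[j * V + t] = 1.0
    W1, W2 = params[i]
    return softmax(W2 @ np.tanh(W1 @ onehot))

def sample():
    """Ancestral sampling: draw X_1, then X_2 | X_1, and so on."""
    x = []
    for i in range(N):
        x.append(int(rng.choice(V, p=conditional(i, x))))
    return x

print(sample())
```

Since each network outputs a softmax (a valid categorical distribution), the product of the conditionals is automatically a normalized distribution over all $V^N$ sequences, with no partition function to compute.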