# Autoregressive (AR) Language Modeling

Autoregressive (AR) Language Modeling is one of the most well-known and used pertaining objectives in the Natural Language Processing sphere. Given its roots in time series modeling, it is quite difficult to grasp its functionality in language modeling at once. This article aims to provide a simple explanation of AR modeling.

# Regression Analysis

A set of statistical methods that are used to evaluate the relationship between a dependant variable and one or more independent variables is known as regression analysis. It is useful to determine the robustness of the relationship of the variables under consideration and for fine-tuning the future relationship between these variables.

Variations of this analysis include non-linear regression, linear regression, and multiple linear regression. Each variation has its own set of assumptions.

# What is Autoregression?

Autoregression is a time series model that uses results from prior time steps as input to a regression equation that predicts the value at the next time step. The term autoregression specifies that it is a regression of the variable against itself.

An autoregressive model of order ** p** can be written as follows:

where ** εt** is white noise. This is like a multiple regression but with lagged

*values*of

**as predictors. We refer to this as an**

*yt***AR(**, an autoregressive model of order

*p*) model**.**

*p*# Autoregressive (AR) Language Modeling

The autoregressive model is a feed-forward model, that predicts the future word from a set of words in a given context. The context word, in this model, is restrained to two directions, either backward *or* forward. Thus making it effective in NLP generative tasks that create context in the forward direction.

Howbeit, it does have a problem. This model can only utilize the forward context *or *the backward context, therefore implying that both contexts cannot be used simultaneously, this causes the model to curb itself in its understanding of prediction and context.

An autoregressive parametric model such as a neural network is trained to model the joint probability distribution of a text corpus, for either a forward product or a backward product conditioned on the words before or after the predicted token.

# Autoregressive (AR) Language Model Examples

The famed OpenAI GPT-2 and GPT-3 are both stock AR models. XLNet, a language representation model utilizes autoregression along with autoencoding while avoiding their limitations, thus enjoying the best of both worlds. Other examples include BERT (Bidirectional Encoder Representations), RoBERTa, ALBERT, XLM, DistilBERT, ELECTRA.

# References

- Autoregressive Models in Deep Learning: A Survey
- Forecasting Principals and Practice: 8.3. Autoregressive Models
- Summary of the Models: Huggingface.co
- Understanding Language using XLNet with Autoregressive pre-training: Maggie Xiao
- Lecture 15: Autoregressive and Reversible Models: Department of Computer Science, University of Toronto
- Autoregression Models for Time Series Forecasting with Python: Machine Learning Mastery
- WaveNet: A generative model for raw audio