Autoencoders: A Comprehensive Guide

Tony Jesuthasan
10 min read · Aug 6, 2021

The term “autoencoder” has appeared frequently in artificial intelligence research papers, journals, and dissertations in recent years. Since the 2010s, some of the most powerful AI models have had autoencoders stacked deep within. Introduced in the 1980s, autoencoders are an unsupervised learning technique for neural networks that can learn from an unlabeled training set.

Contents:

How do Autoencoders work?

Autoencoder Hyperparameters

Autoencoder Variants

Autoencoder Applications

Adversarial Autoencoders

Autoencoders in Natural Language Processing

Conclusion

This post serves as a guide on autoencoders, their variations, and their applications.

How do Autoencoders work?

Autoencoder networks learn to compress data from the input layer into a short code, and then uncompress that code into a representation that closely matches the original input. In this copying-like task, autoencoders are typically forced to reconstruct the input only approximately, preserving the most relevant aspects of the data in the copy. Multiple autoencoders can be involved in this process.

Let's consider this task using an example of stacked autoencoders processing images of computers. The first autoencoder learns to encode easily observed external features, such as the casing of a system unit, while the second analyzes the output of the first layer to encode less obvious features, such as the dimensions of the motherboard. The third encodes the monitor, and so on, until the final autoencoder encodes the whole image into a code that matches the concept of a “computer.”

Autoencoders can also be used for generative modeling. For example, if a system is manually given the codes it learned for wheels and plants, it could generate an image of a plant moving on wheels, even though it has never processed such an image.

Core Architecture

An autoencoder comprises three parts:

  • Encoder: maps the input into the code.
  • Code (Encoded Layer): the compressed representation of the input that is fed to the decoder.
  • Decoder: maps the code to a reconstruction of the input.

Stripped down to its basic form, an autoencoder is a feedforward, non-recurrent neural network resembling a multilayer perceptron (MLP): an input layer and an output layer connected by one or more hidden layers.

The output layer has the same number of nodes as the input layer. As mentioned previously, its purpose is to reconstruct its input by minimizing the difference between the input and the output, instead of predicting a target value Q given inputs P. Hence, autoencoders learn without supervision.
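The architecture above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation: a single tanh hidden layer as the code, a linear decoder, plain gradient descent on the reconstruction error, and made-up sizes and learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_code = 8, 3          # 8-dim input compressed into a 3-dim code
X = rng.random((64, n_inputs))   # toy unlabeled training set

W_enc = rng.normal(0, 0.1, (n_inputs, n_code))  # encoder weights
W_dec = rng.normal(0, 0.1, (n_code, n_inputs))  # decoder weights

def forward(X):
    code = np.tanh(X @ W_enc)    # encoder: input -> code
    recon = code @ W_dec         # decoder: code -> reconstruction
    return code, recon

lr = 0.1
losses = []
for _ in range(500):
    code, recon = forward(X)
    err = recon - X                              # reconstruction error
    losses.append((err ** 2).mean())             # MSE loss
    # Backpropagate the loss into decoder and encoder weights.
    grad_dec = code.T @ err / len(X)
    grad_code = err @ W_dec.T * (1 - code ** 2)  # tanh derivative
    grad_enc = X.T @ grad_code / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Note that the network never sees labels: the input itself is the training target, which is what makes the procedure unsupervised.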

Autoencoder Hyperparameters

Number of Layers

This pretty much speaks for itself. The number of layers may depend on the complexity of the task at hand.

Number of Nodes per Layer

As seen in the core architecture of the autoencoder above, the number of nodes reduces as it reaches the encoded layer (code), and increases once again on the decoder side, leading up to the output.

Code (Encoded Layer) Size

The number of nodes in the center layer.

Loss Function

Typically we use mean squared error (MSE) or binary cross-entropy (also known as log loss). If the input values are in the range between 0 and 1, we use binary cross-entropy; otherwise, mean squared error is used.
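The two losses mentioned above can be computed directly; here is a sketch on a toy input/reconstruction pair (the values are made up for illustration).

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0])        # original input, in [0, 1]
x_hat = np.array([0.1, 0.4, 0.9])    # reconstruction from the decoder

# Mean squared error: average squared difference per element.
mse = np.mean((x - x_hat) ** 2)

# Binary cross-entropy, with a small epsilon to avoid log(0).
eps = 1e-12
bce = -np.mean(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))

print(f"MSE: {mse:.4f}, BCE: {bce:.4f}")
```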

Autoencoder Variants

These variants have been created to force the learned representations to take on useful properties.

Contractive Autoencoder (CAE)

A contractive autoencoder aims to learn a robust representation that is less sensitive to small variations in the data. This robustness is achieved by adding a penalty term to the loss function. The contractive autoencoder is a regularization technique like the sparse and denoising autoencoders, which you will read about below, and it is often a better option than a denoising autoencoder for learning useful features.
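The penalty term added to the loss is the squared Frobenius norm of the Jacobian of the hidden layer with respect to the input. As a sketch (all values illustrative), for a sigmoid hidden layer h = sigmoid(W x + b) this Jacobian has a closed form, so the penalty can be computed without automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))   # 6-dim input -> 4 hidden units
b = np.zeros(4)
x = rng.random(6)

h = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # sigmoid activations

# Closed form: sum_j (h_j (1 - h_j))^2 * sum_i W[j, i]^2
penalty = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

# Sanity check: the same quantity computed from the Jacobian dh/dx.
J = (h * (1 - h))[:, None] * W           # shape (4, 6)
assert np.isclose(penalty, np.sum(J ** 2))
print(f"contractive penalty: {penalty:.4f}")
```

A small penalty means the code barely moves when the input is perturbed, which is exactly the robustness the contractive autoencoder is after.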

Convolutional Autoencoder

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

— How Do Convolutional Layers Work in Deep Learning Neural Networks? (machinelearningmastery.com)

Standard autoencoders do not account for the fact that a signal can be seen as a sum of other signals. Convolutional autoencoders use the convolution operator to overcome this: they learn to encode the input as a set of simple signals and then try to reconstruct the input from them, optionally modifying the geometry or the reflectance of the image along the way. They are trailblazing tools for the unsupervised learning of convolutional filters. Once learned, these filters can be applied to any input to extract features. These features, in turn, can be used for any task that requires a compact representation of the input, such as classification.
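The convolution operation in the quote above is easy to demonstrate by hand. This sketch (toy image and filter values of my choosing) slides a 3x3 vertical-edge filter over a 5x5 image and produces a feature map; note that, as in most deep learning libraries, this is technically cross-correlation.

```python
import numpy as np

image = np.zeros((5, 5))
image[:, 2:] = 1.0               # left half dark, right half bright

# A filter that responds to vertical edges (dark-to-bright transitions).
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

h = image.shape[0] - kernel.shape[0] + 1
w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        # Elementwise product of the filter with each 3x3 patch, summed.
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map)   # strong activations where the edge sits, zero elsewhere
```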

Deep Autoencoder

A deep belief network is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables, with connections between the layers but not between units within each layer.

A deep autoencoder comprises two symmetrical deep-belief networks: typically four to five shallow layers representing the encoding side of the network, and another set of four to five layers that make up the decoding side. The model is trained with unsupervised layer-by-layer pre-training.

The layers are Restricted Boltzmann Machines which are the building blocks of deep-belief networks. Deep autoencoders are useful in topic modeling, or statistically modeling abstract topics that are distributed across a collection of documents.

Denoising Autoencoder

When an autoencoder has more nodes in the hidden layer than there are inputs, the network can end up learning the identity function, which makes the output equal to the input and renders the autoencoder useless.

To overcome this issue, denoising autoencoders make a contaminated copy of the input by introducing some noise. This prevents the autoencoders from completely copying the input to the output without learning the features within. These autoencoders take a partially corrupted input while training to recover the original undistorted input.

Input, contaminated data & reconstructed data by opendeep.org, MNIST Dataset.
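The corruption step described above is simple to sketch. Two common schemes are additive Gaussian noise and masking noise; the noise levels here are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10)               # clean input, values in [0, 1]

# 1) Additive Gaussian noise, clipped back into the valid range.
x_gauss = np.clip(x + rng.normal(0, 0.1, x.shape), 0.0, 1.0)

# 2) Masking noise: randomly zero out roughly 30% of the entries.
mask = rng.random(x.shape) > 0.3
x_masked = x * mask

# The denoising autoencoder would then be trained to map the corrupted
# copy (x_gauss or x_masked) back to the clean x, e.g. by minimizing
# np.mean((decoder(encoder(x_gauss)) - x) ** 2).
print(x_masked)
```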

Sparse Autoencoder

In AI inference and machine learning, sparsity refers to a matrix of numbers that includes many zeros or values that will not significantly impact a calculation.

— How Sparsity Adds Umph to AI Inference (blogs.nvidia.com)

Sparse autoencoders have more hidden nodes than input nodes. Yes, this may seem strange, since denoising autoencoders were built to overcome exactly this issue. However, sparse autoencoders can still discover important features in the data: a sparsity constraint is introduced on the hidden layer, so that a node is considered significant only when its activation is high. This prevents the output layer from directly copying the input data. Some of the most powerful AIs in the 2010s involved sparse autoencoders stacked inside deep neural networks.
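One common way to impose the sparsity constraint is to add a KL-divergence penalty that pushes each hidden node's average activation toward a small target. A sketch, with an illustrative target sparsity and made-up measured activations:

```python
import numpy as np

rho = 0.05                                    # desired average activation
rho_hat = np.array([0.04, 0.10, 0.05, 0.30])  # measured mean activations
                                              # of four hidden nodes

def kl(p, q):
    # KL divergence between Bernoulli distributions with means p and q.
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# Nodes whose average activation strays from rho (here the 2nd and 4th)
# contribute most of the penalty; nodes at rho contribute nothing.
penalty = np.sum(kl(rho, rho_hat))
print(f"sparsity penalty: {penalty:.4f}")
```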

Under-complete Autoencoder

The under-complete autoencoder’s main task is to capture the key features present in the data. Under-complete autoencoders have a smaller dimension for the hidden layer compared to the input layer, which is useful when extracting key features.

Variational Autoencoder (VAE)

Variational autoencoders are utilized for learning latent representations.

The encoder functions as a compressor that condenses the data into a low-dimensional space (the low-dimensional space is stochastic, usually modeled with a Gaussian probability density). It encodes data into the latent space, where information is lost due to the lower dimensionality, after which the decoder tries to recover the original input.

The decoder, whose input is the output of the encoder, maps the latent code back to the original data space. Its function is to bring the data back to the original probability distribution, for example, to output an image similar to those in our dataset.

The error is backpropagated through the whole network, improving its ability to reconstruct the original inputs. Once training has converged to a stable level, we can sample from this distribution and create fresh samples, which is why variational autoencoders are referred to as generative models.

The probability distribution of the latent vector of a variational autoencoder typically matches that of the training data much closer than a standard autoencoder.

VAE by learnopencv.com
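The sampling step described above is usually implemented with the reparameterization trick, which keeps the network differentiable: the encoder outputs a mean and log-variance per latent dimension, and a sample is drawn as z = mu + sigma * eps with eps from a standard normal. A sketch with made-up encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0])        # encoder's predicted means
log_var = np.array([0.0, -2.0])   # encoder's predicted log-variances

# Reparameterization: randomness is isolated in eps, so gradients can
# flow through mu and log_var.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# The KL term that pushes the latent distribution toward N(0, I),
# added to the reconstruction loss during training.
kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
print(f"z = {z}, KL = {kl:.4f}")
```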

Autoencoder Applications

1. Data Compression

It is no surprise that autoencoders can be used for data compression, as it is in the definition itself. The surprise is that, of all the applications listed here, data compression is where autoencoders are used least. This is because other simple yet powerful data compression algorithms already exist, such as LZMA, LZMA2, and Huffman coding.

2. Dimensionality Reduction

In machine learning classification problems, there are often too many factors on the basis of which the final classification is done. These factors are basically variables called features. The higher the number of features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play. Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

— Introduction to Dimensionality Reduction (geeksforgeeks.org)

The autoencoder converts the input into a reduced representation stored in the middle layer, called the code. This is where the information from the input has been compressed, and by extracting this layer from the model, each node can be treated as a variable. Thus we can conclude that by removing the decoder, an autoencoder can be used for dimensionality reduction, with the code layer as the output.
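Extracting the code layer is then just a matter of keeping the encoder and dropping the decoder. A sketch, using random weights as stand-ins for trained encoder parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(10, 2))   # stand-in for a trained encoder:
                                   # 10 original features -> 2 code nodes

def encode(X):
    # The decoder is gone; the code layer itself is the output.
    return np.tanh(X @ W_enc)

X = rng.random((100, 10))          # 100 samples, 10 original features
X_reduced = encode(X)
print(X_reduced.shape)             # each code node is now a variable
```

Each column of `X_reduced` can then be fed to a downstream model exactly like a hand-engineered feature.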

Other methods used for dimensionality reduction:

  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA)
  • Generalized Discriminant Analysis (GDA)

An autoencoder teaches itself non-linear transformations using a non-linear activation function and multiple layers, producing a representation of each layer as the output.

3. Feature Extraction

The encoder learns vital hidden features present in the input data in the process of reducing the reconstruction error. During encoding, a new set of combinations of the original features is generated.

4. Image Colourization

Autoencoders can be used to transform a black-and-white picture into a coloured image, or to convert a coloured image into a grayscale one.

5. Image Denoising

Noisy Image and Noise Removed Image

Autoencoders are very good at denoising images. When an image gets corrupted or has a bit of noise in it, we call this image a noisy image. To obtain proper information about the content of the image, we perform image denoising.

6. Image Generation

Variational autoencoders, as mentioned earlier, are generative models, used to generate images that have not been seen by the model. The idea is that given input images, such as images of an animal or a beautiful background, the system will generate similar images.

Adversarial Autoencoders (AAE)

Adversarial autoencoders are a cross between variational autoencoders (VAEs) and generative adversarial networks (GANs). They use an adversarial loss to regularize the latent code instead of the KL-divergence that a traditional VAE uses.

With a GAN present, the architecture of this autoencoder differs from that of a standard autoencoder. It consists of the:

  • Encoder: takes the input and transforms it into a low-dimensional latent code z.
  • Decoder: takes the encoder's output and generates an image from it.
  • Discriminator: takes a random vector sampled from the prior distribution (real) and an encoded latent code z from the autoencoder (fake) as input, and checks whether its input is real or not.

One of the interesting applications of AAEs is in anomaly detection and localization tasks, where the adversarial loss improves the capability of the autoencoder.

AutoEncoders in Natural Language Processing

As you may have noticed, the aforementioned variants and applications rely heavily on images. The use of autoencoders in the NLP domain has been comparatively limited.

Variational autoencoders have been utilized in NLP to generate realistic sentences from a latent code space. Common sentence autoencoders fail at this task because their latent space is not structured for sampling; VAEs apply a prior distribution on the hidden latent space, enabling the model to generate proper samples.

The training objective is to maximize a variational lower bound on the log-likelihood of observed data under the generative model.

RNN-based VAE for sentence generation.

An RNN-based VAE generative model was proposed to create more varied and well-formed sentences. Other models allow structured variables such as sentiment and tense to be incorporated into the latent code, to generate believable sentences.

Conclusion

This article, which initially started out as a quick guide on autoencoders, quickly evolved into a full-blown explanation. Autoencoders have played a pivotal role in the advancement of artificial intelligence over the past decades and continue to do so.
