
AUTOENCODERS

Updated: Aug 25, 2022

In this blog, you will be introduced to autoencoders, their various types, and their implementations.


INTRODUCTION

An autoencoder is a feedforward neural network used for feature learning, trained with the backpropagation algorithm. It consists of two roughly symmetrical parts, an encoder and a decoder, each typically a few layers deep. The encoder takes the input and generates a lower-dimensional feature representation of it through its hidden layers. The decoder then regenerates the original input from that lower-dimensional representation.


Types of autoencoders that this blog will cover:

· a simple autoencoder based on a fully-connected layer

· a sparse autoencoder

· a deep fully-connected autoencoder

· a deep convolutional autoencoder

· a sequence-to-sequence autoencoder

· a variational autoencoder


SIMPLE AUTOENCODER BASED ON A FULLY-CONNECTED LAYER

First, import the essential libraries

import keras
from keras import layers
from keras.datasets import mnist
import numpy as np
# This is the size of our encoded representations
encoding_dim = 32
# 32 floats -> compression of factor 24.5, assuming the input is 784 floats

# This is our input image
input_img = keras.Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = layers.Dense(784, activation='sigmoid')(encoded)
# This model maps an input to its reconstruction
autoencoder = keras.Model(input_img, decoded)

Create a model for the encoder

# This model maps an input to its encoded representation
encoder = keras.Model(input_img, encoded)

Create a model for the decoder


# This is our encoded (32-dimensional) input
encoded_input = keras.Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))

Compile the model, then load the MNIST dataset.

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

(x_train, _), (x_test, _) = mnist.load_data()

Preprocess the images before training

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

Get the shape of the datasets

print(x_train.shape)
print(x_test.shape)

(60000, 784)

(10000, 784)

Train the model

autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
...
Epoch 48/50
235/235 [==============================] - 3s 11ms/step - loss: 0.0928 - val_loss: 0.0916
Epoch 49/50
235/235 [==============================] - 3s 11ms/step - loss: 0.0927 - val_loss: 0.0916
Epoch 50/50
235/235 [==============================] - 2s 11ms/step - loss: 0.0927 - val_loss: 0.0915
<keras.callbacks.History at 0x7fee29e9a2d0>

Now encode and decode some digit images.

# Encode and decode some digits
# Note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
# Use Matplotlib (don't ask)
import matplotlib.pyplot as plt

n = 10  # How many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Now, we will move on to the sparse autoencoder.

SPARSE AUTOENCODER

In a basic autoencoder, we try to generate output that is similar to the input. We achieve that with a constraint, such as keeping the hidden layer small; in the previous example it had only 32 units. Used this way, the result is approximately similar to PCA. But even when the number of hidden units is large, we can achieve a similar effect by adding other constraints. One such constraint is sparsity, which ensures that only a few units fire at a time. This can be done by adding an activity_regularizer to our Dense layer.

from keras import regularizers

encoding_dim = 32

input_img = keras.Input(shape=(784,))
# Add a Dense layer with a L1 activity regularizer
encoded = layers.Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(10e-5))(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = keras.Model(input_img, decoded)
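
As with the simple autoencoder above, the sparse model still has to be compiled and trained before use. A minimal sketch, reusing the preprocessed x_train and x_test from earlier:

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

With the L1 penalty in place, the encoded representations tend to be sparser, typically at the cost of a slightly higher reconstruction loss.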

DEEP AUTOENCODER

In deep autoencoder, we use multiple hidden layers instead of only one.

input_img = keras.Input(shape=(784,))
encoded = layers.Dense(128, activation='relu')(input_img)
encoded = layers.Dense(64, activation='relu')(encoded)
encoded = layers.Dense(32, activation='relu')(encoded)

decoded = layers.Dense(64, activation='relu')(encoded)
decoded = layers.Dense(128, activation='relu')(decoded)
decoded = layers.Dense(784, activation='sigmoid')(decoded)
autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
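
If you also want a standalone encoder for the deep model, you can wrap the bottleneck tensor in its own Model, just as we did for the simple autoencoder. A minimal sketch (deep_encoder is an illustrative name, not from the original code):

# 'encoded' still refers to the 32-dimensional bottleneck tensor defined above
deep_encoder = keras.Model(input_img, encoded)
deep_encoder.summary()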

CONVOLUTIONAL AUTOENCODER

Since our inputs are images, it makes sense to use a convolutional neural network. It performs better on images and gives better results.


The encoder will be a stack of Conv2D and MaxPooling2D layers, and the decoder will be a stack of Conv2D and UpSampling2D layers.


Before building the convolutional model, we will load the MNIST dataset and normalize the images' pixel values to lie between 0 and 1.

import keras
from keras import layers
from keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

Create a CNN autoencoder model.

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (4, 4, 8) i.e. 128-dimensional

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)  # no padding here, so 16x16 -> 14x14 (upsampled to 28x28 below)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test))
decoded_imgs = autoencoder.predict(x_test)

import matplotlib.pyplot as plt

n = 10
plt.figure(figsize=(20, 4))
for i in range(1, n + 1):
    # Display original
    ax = plt.subplot(2, n, i)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
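
If you want to inspect the 128-dimensional encoded representations as well, you can build a separate encoder model over the same graph. A minimal sketch (conv_encoder is an illustrative name, not from the original code):

# Maps an input image to its (4, 4, 8) encoded representation
conv_encoder = keras.Model(input_img, encoded)
encoded_imgs = conv_encoder.predict(x_test)
print(encoded_imgs.shape)  # expected: (10000, 4, 4, 8)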

SEQUENCE-TO-SEQUENCE AUTOENCODER

If the input data consists of sequences, we use layers with temporal memory, such as LSTM, in our encoder and decoder models. First, the encoder converts an input sequence into a single vector. We then repeat this vector n times, where n is the number of timesteps in the output sequence, and run the decoder to turn it into the target sequence.

timesteps = ...  # Length of your sequences
input_dim = ... 
latent_dim = ...

inputs = keras.Input(shape=(timesteps, input_dim))
encoded = layers.LSTM(latent_dim)(inputs)

decoded = layers.RepeatVector(timesteps)(encoded)
decoded = layers.LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
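
For concreteness, the placeholders above might be filled in like this before building the model; all of the values below are illustrative assumptions, not from the original post:

# Hypothetical dimensions -- replace with the shape of your own sequence data
timesteps = 20   # length of each input sequence
input_dim = 10   # number of features per timestep
latent_dim = 32  # size of the encoded vector

# The model can then be compiled and trained like any other autoencoder,
# e.g. with mean squared error for real-valued sequences
sequence_autoencoder.compile(optimizer='adam', loss='mse')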

VARIATIONAL AUTOENCODER

Latent space is a representation of compressed data in which similar data points are clustered together.


A variational autoencoder (VAE) provides a probabilistic way to describe an observation in latent space: rather than outputting a single value, the encoder describes a probability distribution for each latent attribute.


The encoder projects the high-dimensional input onto a low-dimensional latent space, and the decoder maps a vector from that low-dimensional space back to a high-dimensional reconstruction of the input.

First, we will import the MNIST dataset.

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

Now, we will map the input to the parameters of the latent distribution. We use a 2-dimensional latent space here so that we can visualize it later.

original_dim = 28 * 28
intermediate_dim = 64
latent_dim = 2

inputs = keras.Input(shape=(original_dim,))
h = layers.Dense(intermediate_dim, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_sigma = layers.Dense(latent_dim)(h)

We can use these parameters to sample new points from the latent space. The sampling is done with the reparameterization trick, so gradients can still flow back through z_mean and z_log_sigma.

from keras import backend as K

def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0., stddev=0.1)
    return z_mean + K.exp(z_log_sigma) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_sigma])

We can map these sampled points in the latent space back to reconstructed inputs.

# Create encoder
encoder = keras.Model(inputs, [z_mean, z_log_sigma, z], name='encoder')

# Create decoder
latent_inputs = keras.Input(shape=(latent_dim,), name='z_sampling')
x = layers.Dense(intermediate_dim, activation='relu')(latent_inputs)
outputs = layers.Dense(original_dim, activation='sigmoid')(x)
decoder = keras.Model(latent_inputs, outputs, name='decoder')

# instantiate VAE model
outputs = decoder(encoder(inputs)[2])
vae = keras.Model(inputs, outputs, name='vae_mlp')

Compile the model with a custom loss function: the sum of a reconstruction term and the KL divergence regularization term.

reconstruction_loss = keras.losses.binary_crossentropy(inputs, outputs)
reconstruction_loss *= original_dim
kl_loss = 1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='adam')
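
For reference, the kl_loss lines above implement the closed-form KL divergence between the approximate posterior and the standard normal prior, reading z_log_sigma as the log variance:

\[ D_{KL} = -\frac{1}{2} \sum_{j} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right) \]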

We train the network on the MNIST dataset.

vae.fit(x_train, x_train,
        epochs=100,
        batch_size=32,
        validation_data=(x_test, x_test))

Finally, we will scan the 2-D latent plane and decode a grid of sampled points to visualize the generated digits.

# Display a 2D manifold of the digits
n = 15  # figure with 15x15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# We will sample n points within [-15, 15] standard deviations
grid_x = np.linspace(-15, 15, n)
grid_y = np.linspace(-15, 15, n)

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure)
plt.show()
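
Since we kept the labels when loading the data, we can also look at how the encoder arranges the test digits in the 2-D latent plane. A minimal sketch that plots the z_mean output of the encoder:

# encoder.predict returns [z_mean, z_log_sigma, z]; we plot the means
x_test_encoded, _, _ = encoder.predict(x_test)
plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test)
plt.colorbar()
plt.show()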

If you need an implementation of any of the topics mentioned above, or assignment help on any of their variants, feel free to contact us.
