Updated: Aug 18, 2022


As Convolutional Neural Network models have become more popular in Computer Vision, a number of attempts have been made to significantly improve the accuracy of these models. One of those successful attempts is VGG. In this blog, you will be introduced to VGG and its architecture.

VGG is a convolutional neural network model proposed by Karen Simonyan and Andrew Zisserman. It was designed to investigate how the depth of a convolutional network affects its accuracy in large-scale image recognition. To do so, it uses small filters of size 3x3 to extract features from the images. VGG shows that a significant improvement can be achieved by increasing the depth of ConvNets.
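One reason small 3x3 filters pair well with depth: a stack of n stride-1 3x3 convolutions covers the same input region as a single (2n+1) x (2n+1) filter, but with fewer weights. The sketch below works through that arithmetic. The helper names and the channel count of 256 are illustrative choices, not taken from any VGG code.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels):
    """Weights (biases ignored) of one conv layer with `channels` in and out."""
    return kernel * kernel * channels * channels


channels = 256  # illustrative channel count
for n in (2, 3):
    field = stacked_receptive_field(n)
    stacked = n * conv_weights(3, channels)
    single = conv_weights(field, channels)
    print(f"{n} x (3x3): field {field}x{field}, "
          f"{stacked:,} weights vs {single:,} for one {field}x{field} layer")
```

With equal input and output channels, three stacked 3x3 layers need 27 weights per channel pair, against 49 for a single 7x7 layer, and they add two extra non-linearities along the way.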


VGG takes input images of a fixed size: 224 x 224 RGB images. The only preprocessing is subtracting the mean RGB value, computed on the training set, from each pixel. The image is then passed through a series of convolution layers with (3 x 3) kernels. One of its configurations also uses (1 x 1) kernels.
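As a rough illustration of the mean-subtraction step, here is a pure-Python sketch on a tiny image. The mean values are the per-channel ImageNet means in RGB order that are commonly used with VGG (an assumption of this example), and the function name is made up. In Keras this step is handled by preprocess_input, which for VGG16 also reorders the channels from RGB to BGR.

```python
# Per-channel ImageNet means in RGB order (assumed values, commonly used with VGG).
IMAGENET_MEAN_RGB = (123.68, 116.779, 103.939)


def subtract_mean(image, mean=IMAGENET_MEAN_RGB):
    """Subtract a per-channel mean from every pixel.

    `image` is a nested list shaped [height][width][3] holding RGB values.
    """
    return [
        [[value - m for value, m in zip(pixel, mean)] for pixel in row]
        for row in image
    ]


# A hypothetical 1 x 2 "image": one white pixel and one black pixel.
tiny_image = [[[255.0, 255.0, 255.0], [0.0, 0.0, 0.0]]]
centered = subtract_mean(tiny_image)
print(centered[0][1])  # the black pixel becomes the negated mean
```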

The convolution stride is fixed at 1 pixel. Spatial pooling is performed by five max-pooling layers, which follow some of the convolution layers. Max-pooling is carried out over a (2 x 2) pixel window with a stride of 2.
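To see how these settings shrink the image, the sketch below traces the width/height of the feature maps through the five pooling stages. It assumes a padding of 1 pixel for the 3x3 convolutions (so they preserve the spatial size); the helper functions are written just for this example.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Output width/height of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Output width/height of max-pooling without padding."""
    return (size - window) // stride + 1


size = 224
sizes = [size]
for _ in range(5):  # five conv blocks, each ending in a max-pooling layer
    size = conv_output_size(size)  # 3x3 conv, stride 1, padding 1: unchanged
    size = pool_output_size(size)  # 2x2 pool, stride 2: halved
    sizes.append(size)

print(sizes)  # [224, 112, 56, 28, 14, 7]
```

Only one convolution per block is applied here, since every 3x3 convolution with these settings leaves the size unchanged.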

The convolution layers are followed by three fully-connected layers. The first two have 4096 channels each, and the third has 1000 channels, one for each class label. The final layer is a soft-max layer. All the hidden layers use ReLU as the activation function.
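The soft-max layer turns the 1000 raw scores into probabilities that sum to 1. A minimal pure-Python sketch of that operation, using three made-up scores instead of 1000, might look like this:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes (VGG produces 1000 of them).
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

The largest score always receives the largest probability, which is why the predicted class can be read off directly from the soft-max output.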


First, import the essential libraries:

from keras.applications import vgg16
from keras.applications.vgg16 import preprocess_input, decode_predictions
# load_img, img_to_array and plot_model are available from keras.utils in recent Keras releases
from keras.utils import load_img, img_to_array, plot_model

Download the VGG16 model

vgg_model = vgg16.VGG16(weights='imagenet')

Visualize the structure of VGG16 model

plot_model(vgg_model, to_file='vgg_model.png')

Print the layer-by-layer structure of the model with vgg_model.summary():

Model: "vgg16"
_________________________________________________________________
Layer (type)                Output Shape              Param #
=================================================================
input_1 (InputLayer)        [(None, 224, 224, 3)]     0
block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792
block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928
block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0
block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856
block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584
block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0
block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168
block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080
block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080
block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0
block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160
block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808
block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808
block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0
block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808
block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808
block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808
block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0
flatten (Flatten)           (None, 25088)             0
fc1 (Dense)                 (None, 4096)              102764544
fc2 (Dense)                 (None, 4096)              16781312
predictions (Dense)         (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
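The parameter counts in this summary can be checked by hand: each convolution layer has kernel x kernel x input channels x output channels weights plus one bias per output channel, and each dense layer has inputs x outputs weights plus one bias per output. The following sketch (helper names are my own) recomputes the total from the layer shapes listed above.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


# (output channels, number of conv layers) for the five VGG16 blocks.
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

total = 0
in_channels = 3  # RGB input
for out_channels, num_layers in blocks:
    for _ in range(num_layers):
        total += conv_params(in_channels, out_channels)
        in_channels = out_channels

flattened = 7 * 7 * 512  # output of block5_pool, flattened to 25088 values
for in_units, out_units in [(flattened, 4096), (4096, 4096), (4096, 1000)]:
    total += dense_params(in_units, out_units)

print(f"{total:,}")  # 138,357,544
```

Most of the parameters sit in the first fully connected layer (fc1), not in the convolution layers.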

Load the sample image. I used an image of a snow bear. You can download the image using this link:

sample_image = load_img('/content/snow_bear.jpg', target_size=(224, 224))
# convert the image to numpy array
sample_image = img_to_array(sample_image)

# reshape the image for the model
sample_image = sample_image.reshape((1, sample_image.shape[0], sample_image.shape[1], sample_image.shape[2]))

# preprocessing the image
sample_image = preprocess_input(sample_image)

# predict the probability of the image belonging to each class
y_pred = vgg_model.predict(sample_image)

# converting the probability to class label
label = decode_predictions(y_pred)

# retrieve the result with highest probability
label = label[0][0]

# classification result
print('%s (%.2f%%)' % (label[1], label[2]*100))


ice_bear (99.99%)
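decode_predictions returns the highest-scoring classes sorted by probability (five by default), which is why label[0][0] picks the best match. The selection step itself is simple; below is a rough pure-Python sketch of it. The class names and scores are made up for illustration, and the function top_k is hypothetical, not part of Keras.

```python
def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up labels and probabilities, for illustration only.
labels = ["ice_bear", "arctic_fox", "samoyed", "white_wolf"]
scores = [0.90, 0.04, 0.05, 0.01]

for name, score in top_k(scores, labels, k=3):
    print('%s (%.2f%%)' % (name, score * 100))
```

Looking at more than the single best class is a quick way to judge how confident the model really is about an image.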
