INTRODUCTION
As Convolutional Neural Network models have become more popular in Computer Vision, a number of attempts have been made to improve their accuracy. One of the successful attempts is VGG. In this blog post, you will be introduced to VGG and its architecture.
VGG is a convolutional neural network model proposed by Karen Simonyan and Andrew Zisserman. It was designed to investigate how the depth of a convolutional network affects its accuracy in the large-scale image recognition setting. To do so, it uses small filters of size 3 x 3 to extract features from the images. VGG shows that a significant improvement can be achieved by increasing the depth of ConvNets.
ARCHITECTURE OF ConvNet
VGG takes fixed-size 224 x 224 RGB images as input. The only preprocessing is subtracting the mean RGB value, computed on the training set, from each pixel. The image is then passed through a stack of convolution layers with kernels of size (3 x 3). One of the configurations also uses kernels of size (1 x 1).
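As a rough illustration of the mean-subtraction step, the sketch below removes a per-channel mean from every pixel with NumPy. The mean values are the ImageNet channel means commonly used with the pretrained VGG16 weights (an assumption, since the text above does not list them), and `subtract_channel_mean` is a hypothetical helper. Keras's `preprocess_input` for VGG16 also reorders the channels from RGB to BGR, which this sketch leaves out.

```python
import numpy as np

# ImageNet per-channel means in RGB order (assumed values, not given in the post)
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def subtract_channel_mean(image, mean=IMAGENET_MEAN_RGB):
    """Return a float copy of an (H, W, 3) RGB image with the channel means removed."""
    # The mean has shape (3,), so it broadcasts over height and width
    return image.astype(np.float32) - mean

# Hypothetical input: a uniform mid-gray 224 x 224 RGB image
gray = np.full((224, 224, 3), 128, dtype=np.uint8)
centered = subtract_channel_mean(gray)

print(centered.shape)   # (224, 224, 3)
print(centered[0, 0])   # roughly 4.32, 11.22, 24.06
```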
The convolution stride is fixed at 1 pixel. Spatial pooling is carried out by five max-pooling layers, which follow some of the convolution layers. Max-pooling is performed over a (2 x 2) pixel window with a stride of 2.
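The effect of these settings on the feature-map size can be checked with a little arithmetic. The sketch below assumes a padding of 1 pixel for the (3 x 3) convolutions (not stated above, but it is what keeps the spatial size unchanged), and the two helper functions are hypothetical names used only for illustration.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Spatial size after max-pooling without padding."""
    return (size - window) // stride + 1

size = 224
sizes_after_pool = []
for _ in range(5):  # five blocks, each ending in one max-pooling layer
    size = conv_output_size(size)  # 3 x 3, stride 1, padding 1: size unchanged
    size = pool_output_size(size)  # 2 x 2, stride 2: size halved
    sizes_after_pool.append(size)

print(sizes_after_pool)  # [112, 56, 28, 14, 7]
```

Starting from 224 x 224, the five pooling layers leave a 7 x 7 feature map before the fully-connected layers.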
After the stack of convolution layers come three fully-connected layers. The first two have 4096 channels each, and the third has 1000 channels, one for each class label. The final layer is a soft-max layer. All the hidden layers use ReLU as the activation function.
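To make the two activation functions concrete, here is a minimal NumPy sketch of ReLU and soft-max. The three scores are made-up values for illustration (the real network produces 1000), and the function names are hypothetical rather than part of Keras.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)

def softmax(logits):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical scores for three classes
scores = np.array([2.0, -1.0, 0.5])

hidden = relu(scores)    # the negative score is clipped to zero
probs = softmax(scores)  # the largest score gets the largest probability

print(hidden)
print(probs)
print(probs.argmax())    # 0
```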
IMPLEMENTATION
First import the essential libraries
import keras
import numpy as np
from keras.applications import vgg16
from keras.preprocessing import image
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
# plot_model is also exposed from keras.utils; the vis_utils path is missing in newer Keras versions
from keras.utils import plot_model
Download the VGG16 model
vgg_model = vgg16.VGG16(weights='imagenet')
Visualize the structure of VGG16 model
plot_model(vgg_model, to_file='vgg_model.png')
Print the structure of the model with vgg_model.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
flatten (Flatten)            (None, 25088)             0
fc1 (Dense)                  (None, 4096)              102764544
fc2 (Dense)                  (None, 4096)              16781312
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
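The parameter counts in the summary can be reproduced by hand. The sketch below assumes each layer has one bias per output channel or unit, and reads the channel sizes off the summary above; `conv_params` and `dense_params` are hypothetical helper names.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per output channel for a kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully-connected layer."""
    return in_units * out_units + out_units

# (input channels, output channels) for the 13 convolution layers
conv_layers = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (input units, output units) for fc1, fc2 and predictions; 25088 = 7 * 7 * 512
dense_layers = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

total = sum(conv_params(i, o) for i, o in conv_layers)
total += sum(dense_params(i, o) for i, o in dense_layers)

print(conv_params(3, 64))  # 1792, matching block1_conv1
print(f"{total:,}")        # 138,357,544
```

Most of the parameters sit in the first fully-connected layer, which alone accounts for 102,764,544 of them.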
Load the sample image. I used an image of a snow bear, which you can download from this link: https://unsplash.com/photos/qQWV91TTBrE
sample_image = load_img('/content/snow_bear.jpg', target_size=(224, 224))
# convert the image to numpy array
sample_image = img_to_array(sample_image)
# reshape the image for the model
sample_image = sample_image.reshape((1, sample_image.shape[0], sample_image.shape[1], sample_image.shape[2]))
# preprocessing the image
sample_image = preprocess_input(sample_image)
# predict the probability that the image belongs to each class
y_pred = vgg_model.predict(sample_image)
# converting the probability to class label
label = decode_predictions(y_pred)
# retrieve the result with highest probability
label = label[0][0]
# classification result
print('%s (%.2f%%)' % (label[1], label[2]*100))
RESULT
ice_bear (99.99%)
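The reshape call in the code above adds a leading batch axis, because the model expects input of shape (batch, height, width, channels). The sketch below uses an all-zero array as a stand-in for the loaded image (an assumption, so that it runs without the image file) and shows that `np.expand_dims` gives the same result in a shorter form.

```python
import numpy as np

# Stand-in for the array returned by img_to_array (hypothetical all-zero image)
sample = np.zeros((224, 224, 3), dtype=np.float32)

# Explicit reshape, as in the walkthrough
batched_reshape = sample.reshape(
    (1, sample.shape[0], sample.shape[1], sample.shape[2])
)

# Equivalent: insert a new axis at position 0
batched_expand = np.expand_dims(sample, axis=0)

print(batched_reshape.shape)  # (1, 224, 224, 3)
print(batched_expand.shape)   # (1, 224, 224, 3)
```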