Handwritten digit recognition is a classic problem in the field of computer vision and machine learning. The goal is to develop a model that can correctly identify digits (0-9) from images of handwritten numbers. This blog walks you through the process of building a Convolutional Neural Network (CNN) to recognize digits using the MNIST dataset. The provided code is structured in a Jupyter Notebook, and we will explain each part in detail.
Introduction
The MNIST dataset is a large database of handwritten digits that is commonly used for training various image processing systems. In this project, we will load the MNIST dataset, preprocess the data, build a CNN model, train it, and evaluate its performance. Additionally, we will explore predictions on random test images and test custom images to check the model's accuracy.
What You Will Learn
By the end of this tutorial, you will learn:
How to load and preprocess the MNIST dataset.
How to build a CNN model using TensorFlow and Keras.
How to train and evaluate the model.
How to make predictions on test images and custom images.
Prerequisites
Before starting, ensure you have the following installed:
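Python 3
TensorFlow and Keras
NumPy and Matplotlib
scikit-learn (used for the validation/test split)
OpenCV (the opencv-python package, used for the custom-image step)
Jupyter Notebook with ipywidgets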
Understanding the Provided Code
Let’s go through the provided code step by step.
1. Importing Required Libraries
import tensorflow as tf
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical
import matplotlib.pyplot as plt
import os
from ipywidgets import interact,fixed, interact_manual, IntSlider
import ipywidgets as widgets
TensorFlow and Keras: Used for building and training the neural network.
NumPy: Used for numerical operations on data arrays.
Matplotlib: Used for data visualization.
IPyWidgets: Provides interactive widgets for the Jupyter Notebook.
2. Loading and Analyzing the Dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
Loading the Dataset: The MNIST dataset is loaded, which consists of 60,000 training images and 10,000 test images, all of size 28x28 pixels.
Analyzing the Data: The shapes of the training and testing datasets are printed to understand their structure.
Output:
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)
Visualizing Random Images from the Dataset
def show_images():
    # Pick 400 random indices from the whole training set
    array = np.random.randint(low=0, high=len(x_train), size=400)
    fig = plt.figure(figsize=(30, 35))
    for i in range(400):
        fig.add_subplot(20, 20, i + 1)
        plt.xticks([])
        plt.yticks([])
        # Show the true label above each image
        plt.title(y_train[array[i]], color='red', fontsize=20)
        plt.imshow(x_train[array[i]], cmap="gray")

show_images()
Random Image Display: The show_images function randomly selects 400 images from the training set and displays them in a grid. This helps in visualizing the diversity of the handwritten digits.
3. Preprocessing the Data
One-Hot Encoding the Labels
y_train_enc = to_categorical(y_train)
y_test_enc = to_categorical(y_test)
One-Hot Encoding: The labels are converted to one-hot encoded vectors. For example, the label 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0].
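To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does to integer labels. The helper name one_hot is hypothetical and exists only for illustration; in the notebook, to_categorical does this work for you.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

print(one_hot([5, 0]))
```

The first printed row has its 1 at index 5, matching the example for label 5 above.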
Normalizing the Images
x_train_norm = x_train / 255.
x_test_norm = x_test / 255.
Normalization: The pixel values of the images are normalized to the range [0, 1] by dividing by 255. This helps in faster convergence during training.
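As a quick check of what the division does, the sketch below applies it to a tiny stand-in array instead of the real images. The array fake_images is made up for illustration; the point is that uint8 pixels become floating-point values between 0 and 1.

```python
import numpy as np

# Stand-in for MNIST pixels: unsigned 8-bit integers in [0, 255]
fake_images = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a float promotes the result to floating point
normalized = fake_images / 255.

print(normalized.dtype, normalized.min(), normalized.max())
```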
4. Building the CNN Model
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, BatchNormalization
from keras.optimizers import SGD

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'),
    Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(100, activation='relu', kernel_initializer='he_uniform'),
    BatchNormalization(),
    Dense(10, activation='softmax'),
])

# learning_rate replaces the deprecated lr argument
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01, momentum=0.9),
              metrics=['accuracy'])
model.summary()
Model Architecture: The CNN model is built using Sequential from Keras. The architecture consists of:
Convolutional layers with ReLU activation for feature extraction.
MaxPooling layers to reduce the spatial dimensions of the feature maps.
Flatten layer to convert the 2D matrices into a 1D vector.
Dense layers for classification.
BatchNormalization layer to normalize the activations.
The final layer uses Softmax activation to output probabilities for each digit class.
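It can help to trace how the feature-map size and parameter count follow from this architecture. The sketch below does the arithmetic by hand, assuming the Keras defaults of 'valid' padding and stride 1 for Conv2D; the helper names conv_out and pool_out are hypothetical. The totals should agree with what model.summary() reports.

```python
def conv_out(size, kernel=3):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling."""
    return size // pool

s = conv_out(28)   # Conv2D(32):   28 -> 26
s = pool_out(s)    # MaxPooling2D: 26 -> 13
s = conv_out(s)    # Conv2D(64):   13 -> 11
s = conv_out(s)    # Conv2D(64):   11 -> 9
s = pool_out(s)    # MaxPooling2D:  9 -> 4
flat = s * s * 64  # Flatten: 4 * 4 * 64 values

# Weights plus biases per layer
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "conv3": 3 * 3 * 64 * 64 + 64,
    "dense1": flat * 100 + 100,
    "batchnorm": 4 * 100,  # gamma, beta, moving mean, moving variance
    "dense2": 100 * 10 + 10,
}
total = sum(params.values())
print(flat, total)
```

Most of the parameters sit in the first Dense layer, which is typical for small CNNs that flatten before classifying.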
Compilation: The model is compiled using categorical_crossentropy as the loss function and SGD as the optimizer.
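For intuition about the loss, here is a small NumPy sketch of softmax followed by categorical cross-entropy for a single sample. The function names and the logits are made up for illustration, and Keras's own implementation differs in details such as batching; treat this as a sketch of the formula rather than the library code.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_prob)))

# Hypothetical scores that favor digit 5
logits = np.zeros(10)
logits[5] = 4.0
probs = softmax(logits)
target = np.eye(10)[5]  # one-hot label for digit 5

print(np.argmax(probs), categorical_crossentropy(target, probs))
```

The loss shrinks toward 0 as the probability of the correct class approaches 1, and equals ln(10) when the model is uniformly unsure across the ten digits.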
5. Reshaping and Splitting Data
x_train_norm = x_train_norm.reshape((x_train_norm.shape[0], 28, 28, 1))
x_test_norm = x_test_norm.reshape((x_test_norm.shape[0], 28, 28, 1))
from sklearn.model_selection import train_test_split as tts
x_val, x_test_, y_val, y_test_ = tts(x_test_norm, y_test_enc, test_size=0.5)
Reshaping: The training and testing images are reshaped to include a channel dimension, converting them from (28, 28) to (28, 28, 1).
Validation-Test Split: The original test data is split in half into a validation set and a final test set of 5,000 images each. Because no random_state is passed to train_test_split, the split differs from run to run.
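The reshape and the 50/50 split can be illustrated without the real data. The sketch below uses a small zero-filled stand-in array and a NumPy permutation instead of scikit-learn; the fixed seed and the variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is repeatable (an assumption)

images = np.zeros((10, 28, 28))                        # stand-in for x_test_norm
images = images.reshape((images.shape[0], 28, 28, 1))  # add the channel axis

# Shuffle the indices, then cut them in half
order = rng.permutation(len(images))
half = len(images) // 2
val_idx, test_idx = order[:half], order[half:]

val_images, held_out_images = images[val_idx], images[test_idx]
print(val_images.shape, held_out_images.shape)
```

Passing random_state to train_test_split has the same effect as the fixed seed here: the validation and test sets stay the same between runs.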
print(x_train_norm.shape)
print(x_test_norm.shape)
print(x_val.shape)
print(y_val.shape)
print(x_test_.shape)
print(y_test_.shape)
Output:
(60000, 28, 28, 1)
(10000, 28, 28, 1)
(5000, 28, 28, 1)
(5000, 10)
(5000, 28, 28, 1)
(5000, 10)
6. Training the Model
history = model.fit(x=x_train_norm, y=y_train_enc,
                    batch_size=64,
                    validation_data=(x_val, y_val),
                    epochs=15)
Training: The model is trained for 15 epochs using the training data. The validation data is used to monitor the model's performance during training.
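To get a feel for how much work these settings imply, the sketch below counts the weight updates, using the 60,000 training images, batch size 64, and 15 epochs from the call above. The variable names are only for illustration.

```python
import math

train_size = 60_000
batch_size = 64
epochs = 15

# The last batch of each epoch is smaller when the sizes do not divide evenly
steps_per_epoch = math.ceil(train_size / batch_size)
last_batch = train_size - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)  # 938 32 14070
```

The 938 steps per epoch should match the progress counter Keras prints while training.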
Plotting Training History
val_loss = history.history['val_loss']
val_accuracy = history.history['val_accuracy']
loss = history.history['loss']
accuracy = history.history['accuracy']
fig = plt.figure(figsize=(15, 15))
fig.add_subplot(2, 1, 1)
plt.title('Cross Entropy Loss')
plt.plot(loss, color='blue', label='train')
plt.plot(val_loss, color='red', label='validation')
plt.legend()
fig.add_subplot(2, 1, 2)
plt.title('Classification Accuracy')
plt.plot(accuracy, color='blue', label='train')
plt.plot(val_accuracy, color='red', label='validation')
plt.legend()
Visualization: The training and validation loss and accuracy are plotted to visualize the model's learning process.
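Besides plotting, the same history values can be inspected numerically, for example to find the epoch with the lowest validation loss or to spot a widening gap between training and validation loss. The sketch below uses made-up numbers, not results from this model, and the helper best_epoch is hypothetical.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Made-up values for illustration only
example_history = {
    "loss": [0.30, 0.10, 0.05, 0.03],
    "val_loss": [0.12, 0.08, 0.09, 0.11],
}

epoch = best_epoch(example_history["val_loss"])
# Training loss keeps falling while validation loss rises: a sign of overfitting
gap = example_history["val_loss"][-1] - example_history["loss"][-1]
print(epoch, round(gap, 2))
```

With the real run, history.history can be passed in place of example_history.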
7. Evaluating the Model
metrics = model.evaluate(x_test_, y_test_)
print("Test Accuracy is : {:.2f}".format(metrics[1] * 100))
print("Test Loss is : {:.2f}".format(metrics[0]))
# Keras 3 requires a .keras (or .h5) file extension when saving
model.save('my_model.keras')
loaded_model = tf.keras.models.load_model('my_model.keras')
metrics = loaded_model.evaluate(x_test_, y_test_)
print("Test Accuracy is : {:.2f}".format(metrics[1] * 100))
print("Test Loss is : {:.2f}".format(metrics[0]))
Evaluation: The model is evaluated on the test set, and the accuracy and loss are printed.
Model Saving and Loading: The trained model is saved to disk and then reloaded to ensure that it can be reused.
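The accuracy figure reported by evaluate can be reproduced by hand: take the arg-max of each predicted probability row and compare it with the arg-max of the one-hot label. The sketch below does this on a toy three-class example; the arrays and the function name are made up for illustration.

```python
import numpy as np

def accuracy_from_probs(y_true_onehot, y_prob):
    """Fraction of rows where the most probable class equals the true class."""
    actual = np.argmax(y_true_onehot, axis=1)
    predicted = np.argmax(y_prob, axis=1)
    return float(np.mean(actual == predicted))

# Toy data: four samples, three classes
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],  # wrong: predicts class 1, true class is 2
    [0.1, 0.6, 0.3],
])

acc = accuracy_from_probs(y_true, y_prob)
print(acc)  # 0.75
```

In the notebook, the same function could be applied to y_test_ and the output of loaded_model.predict(x_test_).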
8. Making Predictions
def test_images(n=10):
    # Sample n random indices from the held-out test split
    index = np.random.randint(low=0, high=len(x_test_), size=n)
    fig = plt.figure(figsize=(n, 4))
    cols = (n + 1) // 2  # two rows; round up so an odd n still fits
    for i in range(n):
        [pred] = model.predict(x_test_[index[i]].reshape(1, 28, 28, 1))
        pred = np.argmax(pred)
        actual = np.argmax(y_test_[index[i]])
        fig.add_subplot(2, cols, i + 1)
        plt.xticks([])
        plt.yticks([])
        # Green title for a correct prediction, red for a wrong one
        plt.title(pred, color='green' if actual == pred else 'red')
        plt.imshow(x_test_[index[i]].reshape(28, 28), cmap="gray")

test_images(10)
Test Predictions: The test_images function randomly selects test images, predicts their labels using the trained model, and compares them to the actual labels. Correct predictions are displayed in green, while incorrect ones are in red.
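Because a well-trained model gets most random samples right, it can be more informative to look only at the errors. The sketch below finds the indices of misclassified samples from a batch of predictions; the toy arrays and the function name are hypothetical, and in the notebook the probabilities would come from a single model.predict(x_test_) call.

```python
import numpy as np

def misclassified_indices(y_true_onehot, y_prob):
    """Return the indices where the predicted class differs from the true class."""
    actual = np.argmax(y_true_onehot, axis=1)
    predicted = np.argmax(y_prob, axis=1)
    return np.flatnonzero(actual != predicted)

# Toy data: three samples, three classes; only the last one is wrong
toy_labels = np.eye(3)[[2, 0, 1]]
toy_probs = np.array([
    [0.1, 0.2, 0.7],
    [0.9, 0.05, 0.05],
    [0.6, 0.3, 0.1],
])

wrong = misclassified_indices(toy_labels, toy_probs)
print(wrong.tolist())  # [2]
```

The returned indices can then be used to plot just the misclassified images.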
9. Testing with Custom Images
import cv2

def number_recognize(filepath):
    image = cv2.imread(filepath)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(filepath)
    # Denoise, convert to grayscale, and binarize so digits are bright on a dark background
    image = cv2.medianBlur(image, 7)
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.adaptiveThreshold(grey, 200, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 33, 25)
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        print("No digits found in", filepath)
        return
    preprocessed_digits = []
    # Sort contours left to right by the x coordinate of their bounding boxes
    boundingBoxes = [cv2.boundingRect(c) for c in contours]
    (contours, boundingBoxes) = zip(*sorted(zip(contours, boundingBoxes), key=lambda b: b[1][0], reverse=False))
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), color=(255, 0, 0), thickness=2)
        # Crop the digit, resize it to 18x18, and pad to 28x28
        digit = thresh[y:y + h, x:x + w]
        resized_digit = cv2.resize(digit, (18, 18))
        padded_digit = np.pad(resized_digit, ((5, 5), (5, 5)), "constant", constant_values=0)
        preprocessed_digits.append(padded_digit)
    plt.imshow(image, cmap="gray")
    plt.title("Contoured Image", color='red')
    plt.show()
    inp = np.array(preprocessed_digits)
    figr = plt.figure(figsize=(len(inp), 4))
    nums = []
    for i, digit in enumerate(preprocessed_digits):
        # Scale to [0, 1] as in training before predicting
        [prediction] = loaded_model.predict(digit.reshape(1, 28, 28, 1) / 255.)
        pred = np.argmax(prediction)
        nums.append(pred)
        figr.add_subplot(1, len(inp), i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(digit.reshape(28, 28), cmap="gray")
        plt.title(pred)
    print("The Recognized Numbers are:", *nums)

number_recognize('2.jpg')
number_recognize('3.jpg')
number_recognize('4.jpg')
number_recognize('5.jpg')
number_recognize('6.jpg')
Custom Image Testing: The number_recognize function processes a custom image containing handwritten digits. The image is preprocessed, contours are detected, and each digit is isolated, resized, and passed through the model for prediction. The recognized digits are displayed along with the processed image.
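Two small steps in this pipeline are easy to check in isolation: ordering the detected digits left to right, and padding an 18x18 crop to the 28x28 input the model expects. The sketch below uses made-up bounding boxes and a constant-valued stand-in for a digit crop, so it runs without OpenCV or an image file.

```python
import numpy as np

# Hypothetical bounding boxes as (x, y, w, h), in arbitrary detection order
boxes = [(120, 10, 30, 40), (15, 12, 28, 41), (70, 9, 31, 39)]

# Sorting by x gives the digits in reading order
left_to_right = sorted(boxes, key=lambda b: b[0])

# Stand-in for one digit crop already resized to 18x18
digit = np.full((18, 18), 200, dtype=np.uint8)

# Five pixels of black border on every side: 18 + 5 + 5 = 28
padded = np.pad(digit, ((5, 5), (5, 5)), "constant", constant_values=0)

print([b[0] for b in left_to_right], padded.shape)
```

The black border loosely mimics MNIST, where each digit sits inside a blank margin, which is likely why the crop is not resized straight to 28x28.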
Running the Code
To run this notebook:
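Ensure you have the required libraries installed (TensorFlow, Keras, NumPy, Matplotlib, scikit-learn, OpenCV, and ipywidgets).
Place your custom images (for example 2.jpg through 6.jpg, as used above) in the notebook's working directory, or pass their full paths to number_recognize.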
Run each cell sequentially in a Jupyter Notebook environment.
Conclusion
This blog has walked you through the process of building a handwritten digit recognition model using the MNIST dataset. The steps included loading and preprocessing the dataset, building and training a CNN model, evaluating the model, and making predictions on both test and custom images. This project demonstrates the effectiveness of CNNs in image classification tasks and provides a foundation for more complex computer vision projects.
If you require any assistance with this project or Machine Learning projects, please do not hesitate to contact us. We have a team of experienced developers who specialize in Machine Learning and can provide you with the necessary support and expertise to ensure the success of your project. You can reach us through our website or by contacting us directly via email or phone.