
Optical Character Recognition Using Convolutional Neural Networks

Updated: Jul 30, 2021



In this article you will learn what optical character recognition (OCR) is and how it works. Let's start.


What is OCR?


OCR stands for optical character recognition. A great deal of information exists in the form of printed documents, handwritten scripts, and images. OCR is the process of recognizing handwritten and printed characters in scanned images and converting them into a machine-readable digital format. There are three main aspects of the OCR approach:


  • Preprocessing

  • Character recognition

  • Character segmentation and presentation of data


[Figure: main aspects of OCR]


OCR can be implemented using convolutional neural networks (CNNs). The CNN is a popular deep neural network architecture.


How Does Optical Character Recognition Work?


Techniques


Image Processing in OCR


The aim of pre-processing is to improve the quality of the image data so that the OCR model produces accurate output. OCR models generally work best on images at around 300 DPI, so a lower-resolution image may need to be rescaled; image scaling refers to resizing a digital image. Scanned documents are also sometimes misaligned. A skewed image is one in which the text is not straight. Skew directly affects the line segmentation step of the OCR model, which reduces accuracy. Such an image may need to be rotated a few degrees clockwise or counterclockwise to make the lines of text perfectly horizontal or vertical.
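As a rough illustration of the scaling and deskewing steps described above, here is a minimal sketch using only NumPy. The function names, the 300 DPI default, and the projection-profile scoring are assumptions made for this example and are not taken from any particular OCR library. The idea is to try a range of candidate angles, rotate the coordinates of the ink pixels, and keep the angle at which the ink collapses into the fewest, densest rows.

```python
import numpy as np

def scale_factor_for_dpi(current_dpi, target_dpi=300):
    """Factor by which to resize an image so it reaches the target DPI."""
    return target_dpi / current_dpi

def estimate_skew_degrees(binary, max_angle=10.0, step=0.5):
    """Estimate the rotation (in degrees) that best aligns text lines with rows.

    `binary` is a 2D array whose nonzero entries are ink pixels.
    The sign of the result follows the coordinate convention used below.
    """
    ys, xs = np.nonzero(binary)  # coordinates of the ink pixels
    if ys.size == 0:
        return 0.0  # blank page: nothing to align
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        theta = np.deg2rad(angle)
        # Row coordinate of every ink pixel after rotating by theta.
        rotated_y = xs * np.sin(theta) + ys * np.cos(theta)
        bins = np.round(rotated_y).astype(int)
        bins -= bins.min()
        profile = np.bincount(bins).astype(np.float64)
        # Well-aligned lines give a few tall peaks, so the sum of squares is large.
        score = float(np.sum(profile ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle

# Synthetic "page": four thin text lines drawn with a 3 degree slope.
page = np.zeros((200, 200), dtype=bool)
cols = np.arange(200)
for y0 in (40, 80, 120, 160):
    rows = np.round(y0 + cols * np.tan(np.deg2rad(3.0))).astype(int)
    page[rows, cols] = True

estimated = estimate_skew_degrees(page)  # expected to be close to -3 under this sign convention
print("scale factor for a 150 DPI scan:", scale_factor_for_dpi(150))
print("estimated correction angle:", estimated)
```

A real pipeline would then rotate the image by the estimated angle with an image library; that step is left out here to keep the sketch dependency-free.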


Character Recognition in OCR


There are two types of OCR algorithms.


  1. Matrix matching compares each scanned character image with a library of character matrices, or templates. When an image matches one of these matrices of dots within a given level of similarity, the computer labels the image as the corresponding ASCII character.


  2. Feature extraction is OCR without strict matching to prescribed templates. Also known as Intelligent Character Recognition (ICR) or topological feature analysis, this method varies by how much "computer intelligence" is applied by the manufacturer. The computer looks for general features such as open areas, closed shapes, diagonal lines, and line intersections. This method is more versatile than matrix matching. Matrix matching works best when the OCR encounters a limited repertoire of type styles, with little or no variation within each style. Where the characters are less predictable, feature (topological) analysis is superior.
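Matrix matching can be sketched in a few lines. The example below is only an illustration: the tiny 5x3 templates, the `match_character` name, and the 0.8 similarity threshold are all assumptions chosen for this example. Each glyph is compared pixel by pixel with every template, and the best label is accepted only if the fraction of agreeing pixels reaches the threshold.

```python
import numpy as np

# Hypothetical 5x3 dot-matrix templates for three characters.
TEMPLATES = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]]),
    "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]),
}

def match_character(glyph, templates=TEMPLATES, min_similarity=0.8):
    """Return (label, score); label is None when no template is similar enough."""
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        # Similarity = fraction of pixels on which glyph and template agree.
        score = float(np.mean(glyph == template))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score

# An "L" with one noisy pixel in the top-right corner.
noisy_l = np.array([[1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]])
label, score = match_character(noisy_l)
print(label, round(score, 3))
```

This also shows the weakness noted above: the templates only work for glyphs of the same size and style, which is why feature-based methods are preferred for less predictable characters.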


Post-Processing in OCR


Post-processing is an error-correction step that helps ensure high accuracy of the OCR model. OCR accuracy can be increased if the output is constrained by a lexicon, that is, a list of words that are allowed to occur in the scanned document.
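One simple way to constrain output by a lexicon is to replace each recognized word with its closest lexicon entry. The sketch below uses `difflib.get_close_matches` from the Python standard library; the `correct_word` and `correct_text` helpers, the sample lexicon, and the 0.75 cutoff are assumptions made for this example, and production systems typically use more elaborate language models.

```python
import difflib

def correct_word(word, lexicon, cutoff=0.75):
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_text(text, lexicon, cutoff=0.75):
    """Apply lexicon correction to every whitespace-separated word."""
    return " ".join(correct_word(w, lexicon, cutoff) for w in text.split())

# Hypothetical lexicon for an invoice-like document.
LEXICON = ["invoice", "total", "amount", "date"]

# Typical OCR confusion: the letter "o" read as the digit "0".
corrected = correct_text("inv0ice t0tal am0unt", LEXICON)
print(corrected)
```

Words that are far from every lexicon entry are left untouched, so names and numbers are not forced into the word list.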


Convolutional Neural Network


CNNs are made of a large number of interconnected neurons that have learnable weights and biases. In a CNN architecture the neurons are organized in layers: an input layer, hidden layers, and an output layer. A network with a large number of hidden layers is generally called a deep neural network. The hidden-layer neurons of a CNN are connected only to a small region of the output of the previous layer, instead of to all of it as in a fully connected network such as a multilayer perceptron (MLP). This local connectivity reduces the number of connection weights in a CNN compared with an MLP, so a CNN takes less time to train than an MLP of similar size. The input to a typical CNN is a two-dimensional array of data, such as an image. Unlike a regular neural network, the neurons in each layer of a CNN are arranged in three dimensions (width, height, and depth).
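To make the reduction in connection weights concrete, the following sketch counts the learnable parameters of a single fully connected layer and a single convolutional layer. The layer sizes (a 28x28 grayscale input, 100 hidden units, eight 3x3 filters) are arbitrary assumptions chosen for illustration, and the comparison covers one layer only, not a whole network.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a convolutional layer (filters are shared across positions)."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# Fully connected: every one of the 28*28 input pixels connects to each of 100 units.
mlp_layer = dense_params(28 * 28, 100)

# Convolutional: 8 filters of size 3x3 over a single input channel.
conv_layer = conv_params(3, 3, 1, 8)

print("fully connected layer parameters:", mlp_layer)  # 78500
print("convolutional layer parameters:", conv_layer)   # 80
```

The convolutional layer stays small because each filter looks at a 3x3 region and the same filter weights are reused at every position of the image.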


  • The input layer is essentially a buffer that holds the input and passes it to the next layer.

  • The convolutional layer performs the core operation of feature extraction by applying the convolution operation to the input data.

  • ReLU (Rectified Linear Unit) is an activation function used to introduce non-linearity. It replaces negative values with zero, which can speed up the learning process. Every output of the convolutional layer is passed through the activation function.

  • The pooling layer reduces the spatial size of each feature map, which reduces the computation in the network. It uses a sliding window that moves in strides across the feature map and transforms each window into a representative value, such as its maximum or average.

  • Fully connected layers connect every neuron in the layer to all the neurons in the previous layer. They learn non-linear combinations of features and are used to classify or estimate the output. For classification problems, the fully connected layer is followed by a softmax layer, which produces the probability of each class for the given input. For regression problems, it is followed by a regression layer that predicts the output.
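The layers listed above can be chained into a single forward pass. The sketch below is a toy NumPy version with random, untrained weights, so its output probabilities are meaningless; it only shows how data flows from one layer to the next. The function names, the 8x8 input, the single 3x3 filter, and the four output classes are all assumptions made for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2)."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return e / e.sum()

def forward(image, kernel, dense_w, dense_b):
    """Input -> convolution -> ReLU -> pooling -> fully connected -> softmax."""
    feature_map = relu(conv2d_valid(image, kernel))
    pooled = max_pool2x2(feature_map)
    logits = dense_w @ pooled.ravel() + dense_b
    return softmax(logits)

rng = np.random.default_rng(0)
image = rng.random((8, 8))              # stand-in for a character image
kernel = rng.standard_normal((3, 3))    # one untrained 3x3 filter
dense_w = rng.standard_normal((4, 9))   # 3x3 pooled map -> 4 classes
dense_b = np.zeros(4)

probs = forward(image, kernel, dense_w, dense_b)
print("class probabilities:", probs)
```

With an 8x8 input, the 3x3 convolution produces a 6x6 feature map, pooling reduces it to 3x3, and the fully connected layer maps those 9 values to 4 class scores. In practice a framework such as TensorFlow or PyTorch would provide these layers and learn the weights from labeled character images.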



Thank you

