How to make a simple Optical Character Recognition script.

Published Apr 26, 2018 | Last updated Oct 23, 2018

Optical character recognition (OCR) is the recognition of typed, handwritten or printed text in images and its conversion into machine-encoded text. OCR can automate many tasks that would otherwise need a human: in banking, it is used to process checks without human involvement; it can generate the text content of documents from their scanned images; and it can help visually impaired people read printed material.
For this OCR script we'll use Microsoft's Computer Vision API. We'll make the API call from Python with a POST request, and the API will return its output in JSON format.
To get started, you need a Microsoft account; with it you can get a free 30-day subscription to the Computer Vision API. You then have to acquire your secret subscription key, which looks similar to this: 98f714r6vb2e193018b28fg1u9b3b0d7e7
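Hard-coding the key in the script makes it easy to leak when you share the code. A minimal sketch, assuming you store the key in an environment variable named COMPUTER_VISION_KEY (a hypothetical name; any name works), reads it with Python's standard os module:

```python
import os


def load_subscription_key(env_var="COMPUTER_VISION_KEY"):
    """Return the Computer Vision subscription key stored in an environment variable.

    COMPUTER_VISION_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("Set the %s environment variable to your subscription key" % env_var)
    return key
```

With this helper, the hard-coded `sub = "..."` line in the next snippet could become `sub = load_subscription_key()`.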

#Defining base url for API call.
base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/"
ocr_url = base_url + "ocr"

#Defining subscription key (replace it with your own) and the header that carries it.
sub = "98f714r6vb2e193018b28fg1u9b3b0d7e7"
headers = {'Ocp-Apim-Subscription-Key': sub}

The Microsoft OCR API is quite flexible, and we can set many parameters depending on our use case. Here we define two: the language, which we set to 'unk' so that the API detects it automatically (the text in our image is English), and detectOrientation, which we set to true so that the API detects the orientation of the text. We also need the URL of the image we want to run OCR on (a local image can also be uploaded instead), so we define that too.

#Defining parameters: language and orientation detection
params = {'language': 'unk', 'detectOrientation': 'true'}
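requests turns the params dictionary into the URL's query string. A small stdlib sketch with urllib.parse.urlencode, which produces the same encoding as requests for simple string values like these, shows what the API actually receives; it also shows why a key with a stray trailing space, such as 'detectOrientation ', would not match the API's parameter name:

```python
from urllib.parse import urlencode

# Same endpoint and parameters as in the tutorial.
ocr_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr"
params = {'language': 'unk', 'detectOrientation': 'true'}

full_url = ocr_url + "?" + urlencode(params)
print(full_url)
# https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr?language=unk&detectOrientation=true

# A trailing space in the key is encoded as '+', so the server sees a different name.
print(urlencode({'detectOrientation ': 'true'}))
# detectOrientation+=true
```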

#Defining image url
img = "https://quotefancy.com/download/18846/original/wallpaper.jpg"
data = {'url': img}
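The parameters paragraph mentions that a local image can be uploaded instead of a URL. A minimal sketch, assuming the endpoint accepts the raw image bytes with a Content-Type of application/octet-stream (as the Computer Vision REST API documents for binary uploads), builds the headers and body; build_local_image_request is a hypothetical helper name:

```python
def build_local_image_request(image_path, subscription_key):
    """Return (headers, body) for sending a local image file to the OCR endpoint.

    Assumes the API takes the raw image bytes as an application/octet-stream
    body instead of a JSON {'url': ...} body.
    """
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/octet-stream',
    }
    # Read the file in binary mode; the bytes are sent unchanged.
    with open(image_path, 'rb') as image_file:
        body = image_file.read()
    return headers, body
```

You would then send it with `requests.post(ocr_url, headers=local_headers, params=params, data=body)`, passing `data=` rather than `json=` because the body is binary.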

The image at the URL defined above (img) is a quote wallpaper; its text appears in the OCR output below.

Next, we import the requests library and make a POST request, passing ocr_url, headers, params and the image URL as the JSON body.

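import requests

# Send the image URL as a JSON body; params go into the query string.
response = requests.post(ocr_url, headers=headers, params=params, json=data)
response.raise_for_status()  # raise an exception for 4xx/5xx responses
analysis = response.json()
print(analysis)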

The JSON output of the above script contains the detected language, the orientation and the text angle, and, for each region, line and word, the bounding box coordinates and the recognized text. Here's the output:

{  
   'language':'en',
   'orientation':'Up',
   'textAngle':0.0,
   'regions':[  
      {  
         'boundingBox':'689,768,2462,1049',
         'lines':[  
            {  
               'boundingBox':'689,768,2462,180',
               'words':[  
                  {  
                     'boundingBox':'689,768,541,158',
                     'text':'Work'
                  },
                  {  
                     'boundingBox':'1293,768,450,158',
                     'text':'hard'
                  },
                  {  
                     'boundingBox':'1816,768,158,156',
                     'text':'in'
                  },
                  {  
                     'boundingBox':'2041,768,771,180',
                     'text':'silence,'
                  },
                  {  
                     'boundingBox':'2889,768,262,158',
                     'text':'Let'
                  }
               ]
            },
            {  
               'boundingBox':'689,1037,2454,181',
               'words':[  
                  {  
                     'boundingBox':'689,1075,399,143',
                     'text':'your'
                  },
                  {  
                     'boundingBox':'1135,1074,722,103',
                     'text':'success'
                  },
                  {  
                     'boundingBox':'1918,1037,217,140',
                     'text':'be'
                  },
                  {  
                     'boundingBox':'2184,1075,399,143',
                     'text':'your'
                  },
                  {  
                     'boundingBox':'2638,1037,505,140',
                     'text':'noise.'
                  }
               ]
            },
            {  
               'boundingBox':'1717,1358,408,52',
               'words':[  
                  {  
                     'boundingBox':'1717,1359,173,51',
                     'text':'Frank'
                  },
                  {  
                     'boundingBox':'1913,1358,212,52',
                     'text':'Ocean'
                  }
               ]
            },
            {  
               'boundingBox':'1782,1765,276,52',
               'words':[  
                  {  
                     'boundingBox':'1782,1765,276,52',
                     'text':'@quoteßancu'
                  }
               ]
            }
         ]
      }
   ]
}
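Each boundingBox string in this output holds four integers: the x-coordinate of the left edge, the y-coordinate of the top edge, the width and the height. A short sketch (parse_bounding_box and extract_lines are hypothetical helper names) that rebuilds the recognized text line by line and turns a bounding box into numbers:

```python
def parse_bounding_box(box):
    """Convert a 'left,top,width,height' string into a tuple of ints."""
    left, top, width, height = (int(value) for value in box.split(','))
    return left, top, width, height


def extract_lines(analysis):
    """Rebuild each recognized line of text from the OCR JSON.

    Walks regions -> lines -> words and joins the words of each line with spaces.
    """
    lines = []
    for region in analysis.get('regions', []):
        for line in region.get('lines', []):
            lines.append(' '.join(word['text'] for word in line.get('words', [])))
    return lines
```

Run on the analysis above, extract_lines(analysis) gives the four lines 'Work hard in silence, Let', 'your success be your noise.', 'Frank Ocean' and '@quoteßancu'.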

Enjoy!

P.S. If you need any clarification, just post a comment.

Akhand Pratap Mishra

Software Developer | AI | Android | Deep Learning

I'm a software developer and have experience in various fields in computer science. Some of my expertise includes Android Application Development, Deep Learning, Algorithms and Data Structures, Competitive Programming, OS, compi...

2 Replies

Andy Mohan

7 years ago

I really liked your post… I am new to Python, basically a layman.
So would this API help in scanning handwritten text as well?

Akhand Pratap Mishra

7 years ago

Hi Andy,
If you want to read handwritten text from an image, just make the following changes:
1: Replace
ocr_url = base_url + "ocr" with
ocr_url = base_url + "RecognizeText"

2: Redefine the params as
params = {'handwriting': True}

Keep in mind that the handwriting API does not return the text directly. Instead, it returns an "Operation-Location" URL in the response headers, and you have to poll that URL to get the result of the operation. Also, instead of a bounding box, the handwriting API returns a bounding polygon.

Here's the Python code for polling:

import time

# Poll the Operation-Location URL until the recognition result is ready.
analysis = {}
while "recognitionResult" not in analysis:
    response_final = requests.get(response.headers["Operation-Location"], headers=headers)
    analysis = response_final.json()
    time.sleep(1)

Generating the result may take some time, which is why the code above uses a loop.
The result is stored in the analysis variable, which contains the text and the polygon coordinates.
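The polling loop above never stops if the operation fails. A sketch of a bounded version, assuming the polled JSON carries a status field that becomes 'Failed' on error, and that each line of recognitionResult has a text field and a flat boundingBox list of x, y pairs (the bounding polygon mentioned above). poll_recognition_result and handwritten_lines are hypothetical helper names, and fetch_json stands in for the GET request so the logic can be tried without a network call:

```python
import time


def poll_recognition_result(fetch_json, max_attempts=30, delay_seconds=1.0):
    """Poll until the result is ready, then return the final JSON.

    fetch_json is a zero-argument callable that performs the GET on the
    Operation-Location URL and returns the decoded JSON.
    """
    for _ in range(max_attempts):
        analysis = fetch_json()
        # Assumed status value; stop immediately instead of polling forever.
        if analysis.get('status') == 'Failed':
            raise RuntimeError('Text recognition failed: %r' % analysis)
        if 'recognitionResult' in analysis:
            return analysis
        time.sleep(delay_seconds)
    raise TimeoutError('No result after %d attempts' % max_attempts)


def handwritten_lines(analysis):
    """Return (text, polygon) pairs, where polygon is a list of (x, y) points."""
    result = []
    for line in analysis['recognitionResult'].get('lines', []):
        box = line['boundingBox']  # assumed flat list [x1, y1, x2, y2, ...]
        points = list(zip(box[0::2], box[1::2]))
        result.append((line['text'], points))
    return result
```

For example: `analysis = poll_recognition_result(lambda: requests.get(response.headers["Operation-Location"], headers=headers).json())`.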