Extract Text from Image Left-to-Right and Top-to-Bottom with Keras-OCR
Make computers read text in a more ‘human’ way.
What is OCR? OCR stands for Optical Character Recognition. It recognizes text within a digital image. In this article, I will discuss how I made improvements to a library called Keras-OCR in order to return text in an ordered, human-readable format (left to right, top to bottom).
Here is the documentation to the library below:
Simple implementation following the source documents:
import keras_ocr

def detect_w_keras(image_path):
    """Function returns detected text from image"""
    # Initialize pipeline
    pipeline = keras_ocr.pipeline.Pipeline()
    # Read in image path
    read_image = keras_ocr.tools.read(image_path)
    # prediction_groups holds one list of (word, box) tuples per input image
    prediction_groups = pipeline.recognize([read_image])
    # Only one image was passed in, so return its list of predictions
    return prediction_groups[0]
Sample prediction with bounding box coordinates (word, box) tuple:
# In the order of...
# (word, ([[top-left], [top-right], [bottom-right], [bottom-left]]))
('those',
 array([[299.41794 ,  82.824036],
        [483.91843 ,  86.465485],
        [482.73495 , 146.42897 ],
        [298.23447 , 142.78752 ]], dtype=float32))
A typical bounding box usually looks like…
After using Keras-OCR to extract any detectable text in an image, I used the Pythagorean Theorem (hello middle-school) to order the bounding boxes. Each bounding box’s center has a distance from the origin at (0,0). The detections are first split into rows, and the words within each row are then sorted by that distance. Note: Matplotlib displays images with the y-axis inverted, so the origin sits in the top-left corner and y increases downward. This is normal in computer vision.
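As a quick check of the arithmetic, here is a minimal sketch that works through the sample 'those' box by hand. The `box_center` helper is hypothetical and not part of Keras-OCR; it takes the center as the mean of the four corners, which is an assumption that should land very close to the midpoint of two opposite corners for a near-rectangular box.

```python
import math

# Sample box from the prediction above, in Keras-OCR corner order:
# top-left, top-right, bottom-right, bottom-left
box = [
    (299.41794, 82.824036),
    (483.91843, 86.465485),
    (482.73495, 146.42897),
    (298.23447, 142.78752),
]

def box_center(corners):
    """Hypothetical helper: (x, y) center as the mean of the corner points."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return sum(xs) / len(xs), sum(ys) / len(ys)

center_x, center_y = box_center(box)

# Pythagorean Theorem: hypotenuse of the triangle formed by the origin (0, 0),
# the point (center_x, 0), and the box center
distance_from_origin = math.hypot(center_x, center_y)

print(center_x, center_y, distance_from_origin)
```

The center comes out at roughly (391, 115), so the distance from the origin is a little over 407 pixels.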
Now to put these yellow triangles into code… If a triangle gets wider, the word is in the same row; if it gets taller past the specified threshold, the word starts a new row. First, get the distance from the origin for each bounding box. Results are stored in a list of dictionaries, one per detection, each with multiple (key, value) pairs.
import math

def get_distance(predictions):
    """
    Function returns a list of dictionaries with (key,value):
        * text : detected text in image
        * center_x : center of bounding box (x)
        * center_y : center of bounding box (y)
        * distance_from_origin : hypotenuse
        * distance_y : distance between y and origin (0,0)
    """
    # Point of origin
    x0, y0 = 0, 0
    # Generate list of dictionaries
    detections = []
    for group in predictions:
        # Get center point of bounding box
        # Corners are ordered top-left, top-right, bottom-right, bottom-left,
        # so the bottom-right corner is at index 2 (index 1 is top-right)
        top_left_x, top_left_y = group[1][0]
        bottom_right_x, bottom_right_y = group[1][2]
        center_x = (top_left_x + bottom_right_x) / 2
        center_y = (top_left_y + bottom_right_y) / 2
        # Use the Pythagorean Theorem to solve for distance from origin
        # (math.dist requires Python 3.8+)
        distance_from_origin = math.dist([x0, y0], [center_x, center_y])
        # Calculate difference between y and origin to get unique rows
        distance_y = center_y - y0
        # Append all results
        detections.append({
            'text': group[0],
            'center_x': center_x,
            'center_y': center_y,
            'distance_from_origin': distance_from_origin,
            'distance_y': distance_y
        })
    return detections
Next, distinguish and split detections by rows and columns. Each sublist is a new row. The threshold determines when a new row starts: if the vertical gap between one detection and the next exceeds it, the next detection begins a new row. It may need to be adjusted depending on how spaced out the text is in the original image. The default value of 15 (in pixels) is a good starting point for most text within images.
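def distinguish_rows(lst, thresh=15):
    """Function to help distinguish unique rows"""
    # Nothing to split if there are no detections
    if not lst:
        return
    # Start the first row with the first detection so it is never dropped
    sublists = [lst[0]]
    for i in range(0, len(lst)-1):
        if lst[i+1]['distance_y'] - lst[i]['distance_y'] <= thresh:
            # Vertical gap within threshold: same row
            sublists.append(lst[i+1])
        else:
            # Gap exceeds threshold: emit the finished row and start a new one
            yield sublists
            sublists = [lst[i+1]]
    yield sublists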
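To see how the threshold changes the grouping, here is a minimal sketch on made-up detections. The `split_rows` helper and the `toy` data are hypothetical and only illustrate the idea; unlike the function above, this sketch sorts by `center_y` first, on the assumption that the detections may not arrive in top-to-bottom order.

```python
def split_rows(detections, thresh=15):
    """Hypothetical helper: group detections into rows by vertical gap."""
    # Sort top to bottom so each detection is compared with its nearest neighbor
    ordered = sorted(detections, key=lambda d: d["center_y"])
    rows = []
    for det in ordered:
        # Same row if the gap to the previous detection is within the threshold
        if rows and det["center_y"] - rows[-1][-1]["center_y"] <= thresh:
            rows[-1].append(det)
        else:
            rows.append([det])
    return rows

# Made-up detections: two words near y=20 and two words near y=60
toy = [
    {"text": "hello", "center_x": 40, "center_y": 20},
    {"text": "world", "center_x": 120, "center_y": 24},
    {"text": "second", "center_x": 45, "center_y": 60},
    {"text": "line", "center_x": 130, "center_y": 58},
]

rows_default = split_rows(toy)           # threshold of 15 pixels
rows_loose = split_rows(toy, thresh=50)  # threshold larger than the line spacing

for row in rows_default:
    print([d["text"] for d in row])
```

With the default threshold the two lines stay separate, while a threshold larger than the line spacing merges everything into a single row. That is the symptom to look for when the value is set too high for a given image.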
Final results:
def main(image_path, thresh=15, order='yes'):
    """
    Function returns predictions in human readable order
    from left to right & top to bottom
    """
    predictions = detect_w_keras(image_path)
    predictions = get_distance(predictions)
    # Sort top to bottom so consecutive detections can be compared row by row
    predictions = sorted(predictions, key=lambda x: x['distance_y'])
    predictions = list(distinguish_rows(predictions, thresh))
    # Remove all empty rows
    predictions = list(filter(lambda x: x != [], predictions))
    # Order text detections in human readable format
    ordered_preds = []
    ylst = ['yes', 'y']
    for pr in predictions:
        # Sort each row left to right when ordering is requested;
        # otherwise keep the row in its existing order
        row = sorted(pr, key=lambda x: x['distance_from_origin']) if order in ylst else pr
        for each in row:
            ordered_preds.append(each['text'])
    return ordered_preds
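The ordering logic can also be tried without loading the OCR model. The sketch below is a self-contained approximation of the steps above, run on made-up (word, box) tuples; `reading_order`, `box_center`, `make_box`, and `fake_predictions` are hypothetical names introduced for illustration, not part of Keras-OCR or the original source code.

```python
import math

def box_center(box):
    """Hypothetical helper: center as the mean of the four corner points."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def reading_order(predictions, thresh=15):
    """Return words top to bottom, then left to right within each row."""
    detections = []
    for word, box in predictions:
        center_x, center_y = box_center(box)
        detections.append({
            "text": word,
            "center_y": center_y,
            "distance": math.hypot(center_x, center_y),  # distance from (0, 0)
        })
    # Sort top to bottom, then start a new row whenever the vertical gap
    # between consecutive detections exceeds the threshold
    detections.sort(key=lambda d: d["center_y"])
    rows = []
    for det in detections:
        if rows and det["center_y"] - rows[-1][-1]["center_y"] <= thresh:
            rows[-1].append(det)
        else:
            rows.append([det])
    # Within each row, sort by distance from the origin
    return [d["text"] for row in rows
            for d in sorted(row, key=lambda d: d["distance"])]

def make_box(x, y, w=60, h=20):
    """Hypothetical helper: axis-aligned box in Keras-OCR corner order."""
    return [[x, y], [x + w, y], [x + w, y + h], [x, y + h]]

# Made-up predictions, deliberately out of reading order
fake_predictions = [
    ("bottom", make_box(10, 60)),
    ("right", make_box(100, 12)),
    ("row", make_box(100, 58)),
    ("top", make_box(10, 10)),
]

print(reading_order(fake_predictions))
```

Feeding in shuffled boxes like this is a cheap way to check the threshold and sort keys before running the full pipeline on a real image.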
Source code and Jupyter Notebook can be accessed at: https://github.com/shegocodes/keras-ocr. Let me know if you run into any issues with my code. Feel free to contact me here.
Thank you so much for making it to the end of this page. If you found me helpful, please feel free to support me and Shegocodes by giving me a follow here on Medium and/or buying me a cup of coffee so I can continue to contribute to open source work and build. Happy Coding!