Poor man's OCR: need performance improvement tips
tvrtko.sokolovski at gmail.com
Sun Sep 25 04:10:25 EDT 2005
Imagine a large matrix with dimensions [W,H], and a lot of smaller
matrices with dimensions [p,q1], [p,q1], [p,q2], [p,q1], ... I have to
slide a small window [p,q] horizontally over the larger matrix. After
each slide I have to compare the smaller matrices with the data from the
larger matrix (as defined by the sliding window).
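For reference, the straightforward version of that comparison can be sketched as below. This is only an illustration, not my actual code: it assumes p is the window height and q its width, that the large matrix has already been cut down to a strip p rows high, and that pixels are stored as strings of "0"/"1". The name match_at is made up for the example.

```python
def match_at(big, x, templates):
    """Return (name, width) of the first template equal to the window at column x.

    big       -- list of p strings, one per pixel row, each W characters long
    templates -- dict mapping a name to a tuple of p strings, each q characters long
    """
    strip_width = len(big[0])
    for name, tpl in templates.items():
        q = len(tpl[0])
        if x + q > strip_width:
            continue  # window would run past the right edge
        # compare the window row by row against the template
        if all(row[x:x + q] == trow for row, trow in zip(big, tpl)):
            return name, q
    return None, 0
```

Every slide compares up to p rows for every template, which is why this gets slow with a large alphabet.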
I'm currently trying to use other kinds of optimizations (linearizing
the data by columns), but the program no longer works, and it is very
hard to debug. But it runs very fast :)
Here is an example of linearization by columns that I'm currently using:
# setup: convert to 1 bit image
img = Image.open(file_name)
img2 = img.point([0]*255 + [1], "1")
# find ocr lines, and for each do ...
# extract OCR line
region = img2.crop((0, ocrline.base_y-13, width, ocrline.base_y+3))  # h=16
region.load()
# clean up upper two lines which should be empty but
# sometimes contain pixels from other ocr line directly above
draw = ImageDraw.Draw(region)
draw.line((0,0,width,0), fill=1)
draw.line((0,1,width,1), fill=1)
# transpose data so I get columns as rows
region = region.transpose(Image.FLIP_LEFT_RIGHT)
region = region.transpose(Image.ROTATE_90)
ocrline.data = region.tostring() # packs 8 pixels into 1 octet
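To make the layout of the packed data easier to check while debugging, here is a small pure-Python sketch of what the transpose-and-pack step above is meant to produce. It is an assumption-laden stand-in, not PIL itself: pack_columns is a hypothetical helper, and the bit order (top pixel in the most significant bit, partial bytes padded with zeros) is assumed rather than taken from the PIL source.

```python
def pack_columns(rows):
    """Pack a bitmap column by column, 8 pixels per byte.

    rows -- list of equal-length lists of 0/1, one list per pixel row.
    Returns bytes: each column contributes ceil(height / 8) bytes.
    """
    height = len(rows)
    width = len(rows[0])
    out = bytearray()
    for x in range(width):
        byte = 0
        nbits = 0
        for y in range(height):
            # assumed bit order: topmost pixel ends up in the highest bit
            byte = (byte << 1) | rows[y][x]
            nbits += 1
            if nbits == 8:
                out.append(byte)
                byte, nbits = 0, 0
        if nbits:
            # pad a partial final byte of the column with zero bits
            out.append(byte << (8 - nbits))
    return bytes(out)
```

Under these assumptions a 16-pixel-high line gives two bytes per column, so offsets into the packed data count bytes, not pixel columns.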
I do the same for known letters/codes (alphabet). Then I compare like
this:
def recognize(self, ocrline, x):
    for code_len in self.code_lengths:  # descending order
        code = ocrline.data[x:x+code_len]
        ltr = self.letter_codes.get(code, None)
        if ltr is not None:
            return ltr, code_len  # So I can advance x
This currently "works" two orders of magnitude faster.
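A stand-alone sketch of the same longest-match lookup, driven over a whole line, might look like this. It is written for Python 3 with bytes keys, and the names recognize_line and step are invented for the example; skipping ahead by step bytes when nothing matches is an assumption about how unmatched columns should be handled.

```python
def recognize_line(data, letter_codes, step=1):
    """Greedy longest-match decoding of column-packed line data.

    data         -- bytes for one OCR line, packed column by column
    letter_codes -- dict mapping packed letter bitmaps (bytes) to letters
    step         -- bytes to skip when nothing matches (assumed: one column)
    """
    # try the longest codes first so wide letters win over narrow ones
    lengths = sorted({len(code) for code in letter_codes}, reverse=True)
    out = []
    x = 0
    while x < len(data):
        for n in lengths:
            if x + n > len(data):
                continue  # not enough data left for a code this long
            ltr = letter_codes.get(data[x:x + n])
            if ltr is not None:
                out.append(ltr)
                x += n
                break
        else:
            x += step  # no letter starts here, move on
    return "".join(out)
```

For example, with codes {b"\x01\x02": "a", b"\x01": "b"} the data b"\x01\x02\x01" decodes to "ab", since the two-byte code is tried before the one-byte one.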