[PYTHON IMAGE-SIG] OCR information

Andrew Kuchling amk@magnet.com
Fri, 21 Mar 1997 18:44:04 -0500 (EST)

```
David Ascher wrote:
> I don't want to discourage such a worthy endeavor, but I think writing a
> competent OCR package from scratch is hardly worth the effort.  If you can
> steal an established algorithm without too much work (e.g. from NIST),
> then by all means do it.

Well, this is also for my own amusement and instruction, and
I'll try to get a few tutorial articles out of it.  I found a copy of
the NIST OCR system at ftp://ftp.cygnus.com/pub/, which seems to aim
at handwriting (and not typeset character) recognition, but it's
fearsome stuff, with code to do dictionary searches, neural
networks...eek.  Without a better understanding of the algorithms
involved, I'm quite unlikely to be able to use that code.

There's another package, xocr, that does something much
simpler.  According to INFO.ENGLISH, there are various heuristics to
guess where the next letter is, and then, to quote:

WSA-algorithm: (Degree-cut-analysis)
Every character is zoomed to a fixed size. (here: 16x16 pixel)
Parallel lines are layered over the picture. (here: 16 lines)
Now, all Pixels which are set in the picture and placed on a
line are counted.  (here: 0..24 Points) After that the lines
will be turned by a fixed degree-value and again calculated
like above.  All lines will be turned step by step until 180
degrees are reached.  We have 128 values calculated.  These
values are corresponding with a 128 dimensional space.

Now, all trained characters are points in this space. The
lowest distance between the character we want to know and all
of the trained characters will be calculated.  If this
distance is very small the character will be accepted as
well-recognized, otherwise the user is consulted if it was
right detected !

This looks fairly simple, and not out of the reach of PIL and the
Numeric extension, but how well does it work in practice?  Again, I
have no way to tell...  So, any suggestions for good pattern
recognition books?

Andrew Kuchling
amk@magnet.com
http://people.magnet.com/%7Eamk/
Save the Gutenberg Project! http://www.promo.net/pg/nl/pgny_nov96.html

_______________
IMAGE-SIG - SIG on Image Processing with Python

send messages to: image-sig@python.org
administrivia to: image-sig-request@python.org
_______________

```
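The quoted WSA description leaves several details open, so the following is only a rough sketch of one way to read it, not xocr's actual code. It assumes the glyph has already been scaled to a 16x16 grid of 0/1 values, that the 16 parallel lines are one pixel apart and rotate about the glyph's centre in 8 steps of 22.5 degrees (8 x 16 = 128 values), and that "distance" means Euclidean distance. The names `wsa_features`, `distance`, `classify`, and the `threshold` value are all made up for illustration. It is written in plain Python for clarity; an array library would let the inner loops be vectorised.

```python
import math

SIZE = 16      # glyphs are assumed to be already scaled to SIZE x SIZE
N_LINES = 16   # parallel lines per orientation
N_ANGLES = 8   # 8 orientations * 16 lines = 128 values


def wsa_features(glyph):
    """Return 128 counts: set pixels per line, for each line orientation."""
    centre = (SIZE - 1) / 2.0
    features = [0] * (N_ANGLES * N_LINES)
    for step in range(N_ANGLES):
        theta = math.pi * step / N_ANGLES            # 0 up to (not including) 180 degrees
        nx, ny = -math.sin(theta), math.cos(theta)   # unit normal of the lines
        for y, row in enumerate(glyph):
            for x, pixel in enumerate(row):
                if not pixel:
                    continue
                # Signed distance from the centre, measured across the lines.
                d = (x - centre) * nx + (y - centre) * ny
                line = int(math.floor(d + N_LINES / 2.0))
                if 0 <= line < N_LINES:  # corner pixels can fall outside (assumption: drop them)
                    features[step * N_LINES + line] += 1
    return features


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify(glyph, trained, threshold=10.0):
    """Find the nearest trained character.

    `trained` maps a label to its feature vector.  Returns
    (label, distance, accepted); when `accepted` is False the caller
    would ask the user, as the description suggests.  The threshold
    is a hypothetical value, not one taken from xocr.
    """
    feats = wsa_features(glyph)
    label, dist = min(
        ((lab, distance(feats, vec)) for lab, vec in trained.items()),
        key=lambda item: item[1],
    )
    return label, dist, dist <= threshold
```

Training under this reading is just storing `wsa_features(glyph)` for each known character in a dictionary; how well the nearest-neighbour match holds up on real scans is exactly the open question raised in the message.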