[PYTHON IMAGE-SIG] OCR information

David Ascher da@maigret.cog.brown.edu
Fri, 21 Mar 1997 17:54:05 -0500 (EST)

I don't want to discourage such a worthy endeavor, but I think writing a
competent OCR package from scratch is hardly worth the effort.  If you can
steal an established algorithm without too much work (e.g. from NIST),
then by all means do it.  But doing OCR is hard, and I suspect it would be
more cost-effective for you to work for a few hours and buy an OCR package
with the proceeds than to write something which comes even close in
performance in less than a decade or so.  As for helping the Gutenberg
project, I suspect that's a more efficient use of your skills...

Now, if what you really want is to *learn* about OCR, by all means, go


PS: I believe that most OCR packages use templates of various kinds which
    are derived from very large statistical analyses of huge text
    corpora.  Just that step takes a huge amount of computational

