[Mailman-Developers] Re: Opening up a few can o' worms here...

Jerry Stratton jerry@sandiego.edu
Tue, 30 Jul 2002 14:05:46 -0700


>  > Have you seen what the off the shelf OCR systems like OmniPage do these
>  > days?
>
>Yes -- the performance is awful.  And that's on ordinary printed text
>that's supposed to be readable, not on text that has been intentionally
>obfuscated.

My experience is otherwise; I use OmniPage 7.0--an old version on our
Macs here at the school--to OCR out-of-copyright texts for placement 
on-line. All of these books are old, and many are dirty, ripped, 
and/or faded. Some are in strange fonts, tiny font sizes, and multiple 
styles. Sometimes I can even see the text on the other side of the 
paper. OmniPage not only gets the correct text (sometimes text that I 
wasn't even sure about until I saw OmniPage's "guess"), but it also 
keeps the italicization, bolding, subscripts, and superscripts. It 
recognizes columns, and even detects and automatically reorients the 
page when I accidentally put the book in upside down. And this is 
technology from 1997 or earlier!

Scanning has come a *long* way from the old Kurzweil washing-machine 
that we used to scan Freud's text back in the eighties.

Jerry
-- 
jerry@sandiego.edu
http://www.sandiego.edu/~jerry/
Serra 188B/x8773
--
The more restrictions there are, the poorer the people become. The 
greater the government's power, the more chaotic the nation would 
become. The more the ruler imposes laws and prohibitions on his 
people, the more frequently evil deeds would occur.
--The Silence of the Wise: The Sayings of Lao Zi