[spambayes-dev] Tesseract OCR
Mark Hammond
mhammond at skippinet.com.au
Tue Feb 20 00:16:44 CET 2007
> I just discovered the existence of Tesseract OCR, whose
> homepage[1] says:
>
> A commercial quality OCR engine originally developed at HP between
> 1985 and 1995. In 1995, this engine was among the top 3 evaluated by
> UNLV. It was open-sourced by HP and UNLV in 2005.
>
> I thought some of you (Skip, Mark) might be interested if you hadn't
> heard about this software yet.
You could help us out here too by running some of your image spam through
the various engines and manually checking the accuracy of the extracted
text against what you actually see in the image. My quick experiments show
that Tesseract's results are very close to those I get from gocr, and
significantly better than those from ocrad.
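
If eyeballing gets tedious, a rough score can supplement the manual check.
The sketch below is only an illustration, not something used in the
experiments above: it assumes you have typed out by hand what each image
says, and that you have already captured each engine's text output as a
string. The names normalize, similarity and rank_engines are hypothetical
helpers, and the similarity measure (difflib's ratio) is just one
convenient choice.

```python
import difflib


def normalize(text):
    # Lower-case and collapse runs of whitespace so that layout
    # differences between engines do not count as OCR errors.
    return " ".join(text.lower().split())


def similarity(ocr_text, expected_text):
    # Ratio in [0.0, 1.0]; 1.0 means the normalized strings are identical.
    # autojunk is disabled so long texts are compared in full.
    matcher = difflib.SequenceMatcher(
        None, normalize(ocr_text), normalize(expected_text), autojunk=False
    )
    return matcher.ratio()


def rank_engines(outputs, expected_text):
    # outputs maps an engine name to the text it produced for one image.
    # Returns (name, score) pairs, best score first.
    scores = {
        name: similarity(text, expected_text)
        for name, text in outputs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_engines with a dict of engine name to output text, plus your
hand-typed transcription, gives a per-image ordering. It is still worth
reading the text yourself, since a high ratio can hide errors in exactly
the words a spam filter cares about.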
Mark