[spambayes-dev] Tesseract OCR

Mark Hammond mhammond at skippinet.com.au
Tue Feb 20 00:16:44 CET 2007


> I just discovered the existence of Tesseract OCR, whose
> homepage[1] says:
>
>   A commercial quality OCR engine originally developed at HP between
>   1985 and 1995. In 1995, this engine was among the top 3 evaluated by
>   UNLV. It was open-sourced by HP and UNLV in 2005.
>
> I thought some of you (Skip, Mark) might be interested if you hadn't
> heard about this software yet.

You could help us out here too, by running some of your image spam through
the various engines and manually comparing the text each one produces with
what you actually see in the image.  My quick experiments show that
tesseract's results are very close to what I get from gocr, and significantly
better than ocrad's.
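
If eyeballing gets tedious, one way to put a rough number on the comparison
is to type out what the image really says and score each engine's output
against that transcription.  The sketch below is only an illustration: the
function names (normalize, ocr_accuracy, rank_engines) are hypothetical, the
similarity measure is Python's difflib ratio and not any standard OCR
metric, and it assumes you have already captured each engine's text output
as a string.

```python
import difflib


def normalize(text):
    """Lowercase and collapse whitespace so layout differences are not counted as errors."""
    return " ".join(text.lower().split())


def ocr_accuracy(ocr_text, expected_text):
    """Return a 0.0-1.0 similarity ratio between OCR output and a hand transcription."""
    a = normalize(ocr_text)
    b = normalize(expected_text)
    return difflib.SequenceMatcher(None, a, b).ratio()


def rank_engines(outputs, expected_text):
    """Return (engine_name, score) pairs sorted from most to least similar."""
    scores = [(name, ocr_accuracy(text, expected_text))
              for name, text in outputs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up placeholder strings, NOT real output from any OCR engine.
    expected = "Buy cheap meds now"
    outputs = {
        "engine_a": "Buy cheap meds now",
        "engine_b": "8uy chea p m3ds n0w",
    }
    for name, score in rank_engines(outputs, expected):
        print("%-10s %.2f" % (name, score))
```

A character-level ratio like this is crude, and spam images are built to
confuse OCR, so treat the scores as a first pass and not a replacement for
actually looking at the output.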

Mark


