[Spambayes] Image spam

Amedee Van Gasse amedee at amedee.be
Mon Jun 12 01:18:53 CEST 2006


On Mon, June 12, 2006 1:07, skip at pobox.com said:
>
>     >> I found an interesting program that might be exactly what you are
>     >> looking for: ocrad. This is GNU software, can accept pbm files or
>     >> standard input, and outputs text to standard output. So this is a
>     >> command-line OCR program that can be used in a script. Don't worry
>     >> about the pbm files, the ocrad manual describes how to convert other
>     >> image formats to pbm (jpeg, png, ps, pdf,...)
>
> I gave this a whirl.  Not so good for the samples I pulled out of my current
> spam database.  The first image yielded:

*snip*

> Doesn't look so useful to me.  According to the ocrad README file:
>
>     Caveats.
>     For better results the characters should be at least 20 pixels high.
>     Merged characters are always a problem. Try to avoid them.
>     Very bold or very light (broken) characters are also a problem.
>     Always see with your own eyes the pnm file before blaming ocrad for the
>     results. Remember the saying, "garbage in, garbage out".
>
> Maybe a more mature OCR program would help, but ocrad seems to have a ways
> to go.

Skip,

Thanks for testing.
Well, it's better than nothing at all, and at least one of your tests
revealed a well-known blue pill :)
If anyone knows of a better OCR program that works without a GUI (and
runs on Linux), we could try some more.
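
For anyone who wants to repeat this kind of test from a script, here is a
minimal Python sketch. It assumes ocrad is on the PATH and reads a PNM image
from standard input when given no file arguments, as described above. The
names ocr_image and tokens_from_text are made up for illustration and are not
part of SpamBayes, and the latin-1 decoding is only a guess at a safe default.

```python
import subprocess


def ocr_image(pnm_bytes, ocr_cmd=("ocrad",)):
    """Pipe PNM image bytes to an OCR command and return its stdout as text.

    ocr_cmd is an assumption: by default it runs "ocrad" with no arguments,
    relying on it reading the image from standard input.
    """
    result = subprocess.run(
        list(ocr_cmd),
        input=pnm_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raises CalledProcessError if the command exits non-zero
    )
    # latin-1 maps every byte to a character, so decoding never fails.
    return result.stdout.decode("latin-1")


def tokens_from_text(text, min_len=3):
    """Split OCR output into lowercase words, dropping very short fragments."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if len(w) >= min_len]


def tokens_from_pnm_file(path, ocr_cmd=("ocrad",)):
    """Read a PNM file from disk, OCR it, and return the extracted words."""
    with open(path, "rb") as fh:
        return tokens_from_text(ocr_image(fh.read(), ocr_cmd))
```

Calling tokens_from_pnm_file("sample.pbm") on an already converted image
(the file name is just a placeholder) gives a word list that can be checked
by eye to see whether the OCR output carries any useful signal. Another
command-line OCR program that reads standard input could be swapped in via
ocr_cmd.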

-- 
Amedee


