[Spambayes] Windows compatibility - OCR [was: Unwanted
grevsen at gmail.com
Sat Nov 4 14:19:56 CET 2006
>> OCR code's now been tweaked and tested to work in both WinXP and
>> This should work in unix as well.
>> Here is a summary:
>> 1. Put ocrad 0.16 in the path
> As a note, for Windows you need a copy of ocrad with skip patch that
> opens pnm files in binary mode otherwise ocrad will fail on a lot of
Actually you're probably referring to my "patch"? (Ocrad/CygWin1.dll)
If you have MinGW experience - which I don't - I think you can compile
an exe-only build which doesn't need the dll. But then I don't know whether it
actually works because of the POSIX emulation or because they changed the source.
(I did not...)
You're right, Skip pointed it out in the ocrad forum, but the developer was
reluctant to change this at the time, so I don't actually know why 0.16 is working...
Just know it is, which is fine for me.
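For anyone wondering why binary mode matters here: on Windows, a file opened in text mode gets its CR/LF bytes translated, and the C runtime may treat a 0x1A byte as end of file, so raw PNM pixel data can come back damaged. Below is a minimal Python sketch of reading and writing a binary PGM (P5) safely. It assumes a header without comment lines and a maxval of 255; the helper names write_pgm and read_pgm are made up for illustration and are not part of ocrad or SpamBayes.

```python
import os
import tempfile


def write_pgm(path, width, height, pixels):
    """Write a binary PGM (P5) file; 'wb' prevents newline translation."""
    header = ("P5\n%d %d\n255\n" % (width, height)).encode("ascii")
    with open(path, "wb") as f:
        f.write(header + bytes(pixels))


def read_pgm(path):
    """Read a comment-free binary PGM (P5); return (width, height, raster)."""
    with open(path, "rb") as f:  # 'rb', not 'r': the raster is raw bytes
        data = f.read()
    # Three header lines, then the raster; maxsplit=3 keeps any newline
    # bytes that happen to occur inside the pixel data.
    magic, dims, maxval, raster = data.split(b"\n", 3)
    if magic != b"P5":
        raise ValueError("not a binary PGM file")
    width, height = (int(n) for n in dims.split())
    if len(raster) != width * height:
        raise ValueError("raster size does not match header")
    return width, height, raster


if __name__ == "__main__":
    # Pixel values chosen to include LF, CR and 0x1A, the bytes that
    # text mode on Windows is most likely to mangle.
    path = os.path.join(tempfile.mkdtemp(), "sample.pgm")
    write_pgm(path, 2, 2, [0x0A, 0x0D, 0x1A, 0xFF])
    print(read_pgm(path))
```

On unix the two modes behave the same for bytes like these, which may be why the problem only shows up on Windows.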
> Have you tried other ocr programs?
No, not yet.
Tony Meyer suggested Tesseract:
but there seemed to be build issues... I haven't tried it.
I mailed with NoSpam Today! Support (SpamAssassin-based) before I chose SB.
They were doing research on FuzzyOcr and ImageInfo. Maybe we could ask
again about their results. I believe FuzzyOcr is gocr-based?
> I tried gocr and I think that its result are somewhat better but version
> 0.41 + pgm patch almost hangs
Ok, probably needs some tweaking then.
Since the OCR is working with ocrad and - as you see below - I get very
good results, I will be moving on to the next area now.
I think it is far more beneficial to do more research into the actual processing
as you commented elsewhere than to start the whole testing/tweaking all over
again with a new ocr engine. Of course that is just my opinion...
>> 5. Finally I sugest you change the default scale from 1 to 2 like in
>> this line
>> scale = options["Tokenizer", "ocrad_scale"] or 2
> changing this surely doesn't hurt but ocrad_scale it's already set to 2
> in Options.py
Ok, I missed that. I don't know which one takes precedence:
ImageStripper.py, Options.py or bayescustomize.ini.
With 2 you should get image tokens of this quality:
That is about 90% recognition or so.
> probably should be removed (or set to 2 as you suggest)
Then I suggest removal as you say. Better to avoid redundancy (clutter :) )
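To show why the fallback is redundant: the "or 2" only takes effect when the option lookup returns a falsy value such as 0 or None. Here is a small sketch that uses a plain dict as a stand-in for the SpamBayes options object. The dict and the get_scale helper are hypothetical and only mimic the lookup syntax; they are not the real Options API.

```python
def get_scale(options):
    """Mimic `options["Tokenizer", "ocrad_scale"] or 2` with a plain dict."""
    return options["Tokenizer", "ocrad_scale"] or 2


KEY = ("Tokenizer", "ocrad_scale")

# The default is already 2 (as in Options.py), so the fallback is never used.
print(get_scale({KEY: 2}))  # 2
# A user override is passed through unchanged.
print(get_scale({KEY: 3}))  # 3
# Only a falsy value (0 or None) triggers the "or 2" fallback.
print(get_scale({KEY: 0}))  # 2
```

So as long as the default of 2 is defined in one place, dropping the "or 2" changes nothing except for a value of 0.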
Happy coding :)