[Spambayes] Analyzing text in image spam

Wed Aug 23 02:33:04 CEST 2006

I am not getting any funny tokens with underscores when I run spamcounts.
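
A quick way to scan for such tokens, rather than eyeballing the output, is sketched below. It assumes the spamcounts output uses the comma-separated "token,nspam,nham,spam prob" layout shown in the quoted message further down; the function name underscore_tokens is made up for illustration and is not part of SpamBayes.

```python
def underscore_tokens(lines):
    """Yield (token, nspam, nham, prob) for rows whose token contains '_'.

    Assumes each data row looks like 'token,nspam,nham,spam prob'.
    Header lines and unrelated output are skipped.
    """
    for line in lines:
        # Split from the right so a token that itself contains commas survives.
        parts = line.strip().rsplit(",", 3)
        if len(parts) != 4:
            continue
        token, nspam, nham, prob = parts
        try:
            row = (token, int(nspam), int(nham), float(prob))
        except ValueError:
            continue  # header row or some other non-data line
        if "_" in token:
            yield row
```

Feeding it the saved spamcounts output line by line, for example list(underscore_tokens(open("counts.txt"))), gives an empty list when no such tokens were generated (counts.txt is a placeholder file name).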

I tried converting my image with giftopnm, which gives a warning, but I can 
view the resulting image as a portable bitmap image without problems. I then 
ran ocrad on it, which said "bad magic number - not a pbm file".
I looked in ImageStripper.py to see what options to use with ocrad, and saw 
that ocrad is being called with the -s option. My ocrad (version 0.9) says -s 
is an invalid option.
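
One thing that might be worth checking, though it is only a guess at the cause: giftopnm can write PBM, PGM or PPM depending on the input image, and the ocrad message mentions pbm specifically. The sketch below reads the two-byte Netpbm magic number to show which format the converted file really is. The helper names pnm_kind and pnm_kind_of_file are hypothetical and not part of SpamBayes or ocrad.

```python
# Netpbm magic numbers: the first two bytes of the file.
PNM_MAGIC = {
    b"P1": "pbm (plain)",
    b"P2": "pgm (plain)",
    b"P3": "ppm (plain)",
    b"P4": "pbm (raw)",
    b"P5": "pgm (raw)",
    b"P6": "ppm (raw)",
}


def pnm_kind(data):
    """Return the Netpbm format named by the leading magic number, or None."""
    return PNM_MAGIC.get(bytes(data[:2]))


def pnm_kind_of_file(path):
    """Read only the first two bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return pnm_kind(f.read(2))
```

If pnm_kind_of_file() reports pgm or ppm for the giftopnm output, that would at least be consistent with the "not a pbm file" message; if it reports pbm, the problem lies elsewhere.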

When I set the globals/verbose option in .spambayesrc, spamcounts reported:
saving 23 items to /home/peterb/.image_cache.pickle 100.00% hit rate. 

I have attached the gif image I used for testing.

Thanks,
Peter Barker

>     Peter> I have been running the code from CVS for a couple of days, and I
>     Peter> am not sure if analyzing text in images is making a
>     Peter> difference. Can I tell from the Evidence header (or by other
>     Peter> means) if the image analyzer is actually being used, and what
>     Peter> evidence it is finding?
>
> Sure, you'll probably see lots of tokens with runs of underscore
> characters, such as (from spamcounts output):
>
>     token,nspam,nham,spam prob
>     yn__,1,0,0.844827586207
>     _ol__,2,0,0.908163265306
>     __leht,1,0,0.844827586207
>     _omo____,1,0,0.844827586207
>     rpo_la_o__,1,0,0.844827586207
>     _lo__,4,0,0.949438202247
>     __a_,1,0,0.844827586207
>
> Those correspond to characters it could tell were there, but didn't
> recognize.
>
> Did you start training from scratch?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 19184 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20060823/22a9cf4a/attachment-0001.gif