[Spambayes] Analyzing text in image spam (was: Spam in Images)

skip at pobox.com
Fri Nov 3 16:56:07 CET 2006

    >> Once you're ready to go, add the following to your SpamBayes options:
    >> x-lookup_ip: True
    >> lookup_ip_cache: ~/.dnscache

    Luigi> Is anyone using this option?  It seems to me that this option alone
    Luigi> does nothing.  You have to enable both x-lookup_ip and
    Luigi> x-pick_apart_urls.  Is that right, or am I missing something?

Perhaps.  I can't recall.  Do you have PyDNS installed?

    Luigi> Once both are enabled it seems to work but the mail processing is
    Luigi> very very slow.

First time through, yes.  After that, it should (in theory) rely on its
cache of IP address information.  I may have some pending checkins for that
though (*).  Note also that a fairly small training database works for me (fewer
than 100 hams, 250-300 spams).  If you have a massive training database,
then, yes, this will slow things down dramatically.  The IP lookup and image
OCR stuff changes the properties of your database enough that I think it's
worth retraining from scratch.
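The caching behaviour described above (slow on the first pass, fast once the
addresses are cached) can be illustrated with a minimal Python sketch.  This
is NOT SpamBayes' actual implementation: the class name PersistentIPCache,
the pickle file format, and the injectable "resolver" argument are
hypothetical, chosen only to show why repeat lookups become cheap.  The real
on-disk format of the file named by lookup_ip_cache is not described here.

```python
import os
import pickle
import socket
from typing import Callable, Dict, List, Optional


class PersistentIPCache:
    """Hypothetical hostname -> IP address cache persisted to disk.

    Illustrative sketch only; not the code behind SpamBayes' x-lookup_ip.
    """

    def __init__(self, path: str,
                 resolver: Optional[Callable[[str], List[str]]] = None) -> None:
        self.path = os.path.expanduser(path)      # e.g. "~/.dnscache"
        self.resolver = resolver or self._dns_lookup
        self.misses = 0                           # count of real lookups performed
        self.cache: Dict[str, List[str]] = {}
        # Reload results saved by an earlier run, so later runs skip DNS.
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                self.cache = pickle.load(f)

    @staticmethod
    def _dns_lookup(host: str) -> List[str]:
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        try:
            return socket.gethostbyname_ex(host)[2]
        except OSError:                           # covers gaierror / herror
            return []

    def lookup(self, host: str) -> List[str]:
        # Only the first lookup of a host pays the network cost.
        if host not in self.cache:
            self.misses += 1
            self.cache[host] = self.resolver(host)
        return self.cache[host]

    def save(self) -> None:
        # Persist the cache so the next run starts "warm".
        with open(self.path, "wb") as f:
            pickle.dump(self.cache, f)
```

With a cache like this, the first pass over a large training database issues
one DNS query per new hostname, which is where the slowness comes from; later
passes mostly hit the saved cache.  Note that the size of the database still
matters, since every message is re-tokenized on each pass.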


(*) Alas, I didn't get around to checking stuff in last night.  Maybe over
the weekend.


