[Spambayes] Analyzing text in image spam (was: Spam in Images)
skip at pobox.com
Sun Nov 5 20:24:05 CET 2006
Luigi> Once both are enabled it seems to work but the mail processing is
Luigi> very very slow.
>> First time through, yes. After that, it should (in theory) rely on
>> its cache of IP address information. I may have some pending
>> checkins for that though (*). Note also that a fairly small training
>> database works for me (fewer than 100 hams, 250-300 spams). If you
>> have a massive training database, then, yes, this will slow things
>> down dramatically. The IP lookup and image OCR stuff changes the
>> properties of your database enough that I think it's worth retraining
>> from scratch.
Luigi> I tried it on a sample of 5000 emails but stopped it because it
Luigi> hadn't finished after more than half an hour. From tcpdump I
Luigi> could see a request every 1,2 seconds (or something like that).
Luigi> Even considering that not every mail contains a URL, it was
Luigi> very slow. As a note, I tried it on Windows XP with OCR scanning
Luigi> enabled, but OCR alone was much faster.
I can't imagine a scenario where I would need 5000 emails to get decent
results with SpamBayes. If that were the common case, everyone would give up
on it long before it was of any use. I still suggest you try starting from
scratch.
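
To illustrate why only the first pass should be slow, here is a minimal
sketch of a hostname lookup cache. It is not the actual SpamBayes code:
the class name LookupCache, the max_age parameter, and the pickle file
format are all assumptions made for this example. The idea is just that
each hostname costs one network round trip, and later lookups are
answered from memory (or from a file saved by an earlier run).

```python
import pickle
import socket
import time


class LookupCache:
    """Hypothetical hostname -> IP cache; not SpamBayes's actual code."""

    def __init__(self, resolver=socket.gethostbyname, max_age=7 * 24 * 3600):
        self.resolver = resolver   # function that does the slow network lookup
        self.max_age = max_age     # seconds before a cached entry goes stale
        self.entries = {}          # hostname -> (ip or None, timestamp)
        self.misses = 0            # number of real lookups performed

    def lookup(self, hostname, now=None):
        now = time.time() if now is None else now
        cached = self.entries.get(hostname)
        if cached is not None and now - cached[1] < self.max_age:
            return cached[0]       # cache hit: no network traffic
        self.misses += 1
        try:
            ip = self.resolver(hostname)
        except OSError:            # socket.gaierror is a subclass of OSError
            ip = None              # remember failures so they are not retried
        self.entries[hostname] = (ip, now)
        return ip

    def save(self, path):
        # Persist the cache so a later run starts warm.
        with open(path, "wb") as f:
            pickle.dump(self.entries, f)

    def load(self, path):
        with open(path, "rb") as f:
            self.entries = pickle.load(f)
```

With something like this in place, a run over 5000 messages would pay
the lookup cost once per distinct hostname rather than once per message,
provided the cache is saved between runs and the entries have not expired.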
Skip