[Spambayes] slow training and x-lookup-ip

David Abrahams dave at boostpro.com
Fri Feb 12 21:23:21 CET 2010


Hi,

I'm using the train-to-exhaustion script, and it seems to be taking a
seriously long time to process some of the messages.  

It turns out that using -o Tokenizer:x-lookup_ip:True causes a serious
hit to training speed.  I do have Tokenizer:lookup_ip_cache set.  Not
only that, but it seems to go slowly even on the second and subsequent
training passes, by which time I'd think the cache would be full.  So
I'm wondering if the cache is really working, or if it's size-limited
so that it's blown by my large training set, or if there's some other
issue.
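
For illustration only, here is a minimal sketch of the size-limit
hypothesis. It assumes a hypothetical LRU-style bounded cache;
BoundedCache and simulate are made-up names, and this is not the
SpamBayes cache class, whose eviction policy may well differ. It shows
that if each training pass touches more distinct addresses than the
cache can hold, later passes may get no benefit from the cache at all.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical size-limited LRU cache (not the SpamBayes class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key, resolve):
        if key in self.data:
            # Cache hit: mark the entry as most recently used.
            self.data.move_to_end(key)
            self.hits += 1
            return self.data[key]
        # Cache miss: this is where a slow DNS query would happen.
        self.misses += 1
        value = resolve(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            # Evict the least recently used entry.
            self.data.popitem(last=False)
        return value


def simulate(capacity, distinct_keys, passes):
    """Run several passes over the same keys; return (hits, misses)."""
    cache = BoundedCache(capacity)
    for _ in range(passes):
        for i in range(distinct_keys):
            # Stand-in resolver; a real one would query DNS.
            cache.lookup("10.0.0.%d" % i, lambda k: "host-for-" + k)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # Working set larger than the cache: every lookup misses.
    print(simulate(capacity=100, distinct_keys=200, passes=3))
    # Working set fits: only the first pass misses.
    print(simulate(capacity=200, distinct_keys=200, passes=3))
```

Under these assumptions, comparing hit and miss counts between the
first and second training passes would be one way to tell a thrashing
cache from one that is not being consulted at all.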

I notice that the cache, as integrated into SpamBayes, doesn't support
selecting a timeout other than "10", nor does it support choosing the
DNS server, even though the cache class itself allows both to be
tuned.  I don't have any good reason to think either of these is the
problem.

Any insight you can offer would be very much appreciated.

Thanks,

-- 
Dave Abrahams           Meet me at BoostCon: http://www.boostcon.com
BoostPro Computing
http://www.boostpro.com




