[spambayes-dev] spoof detector

David Abrahams dave at boost-consulting.com
Sat Jul 7 04:51:35 CEST 2007


on Fri Jul 06 2007, David Abrahams <dave-UB3wUj7V41K5azolltMz9laTQe2KTcn/-AT-public.gmane.org> wrote:

> on Fri Jul 06 2007, skip-AT-pobox.com wrote:
>
>> Try these two settings
>>
>>     x-pick_apart_urls:True
>>     x-lookup_ip:True
>>
>> and see if they help.

Oh, and these go in the [Tokenizer] section, right?

> Well, they sure make training slow to a crawl!
> Is there any effective way of caching those DNS lookups?

I did eventually find the lookup_ip_cache option, but frankly the
results are disappointing.  I would have expected one slow round in my
train-to-exhaustion regime, with all following rounds going very
quickly, but that doesn't appear to be the case.  The first round took
18.5 minutes and it doesn't look like the 2nd round is going to be
much faster.  Oh, and right now the dnscache file is 414 bytes long
and mostly full of stuff that doesn't look relevant to DNS lookups.
I realize I shouldn't expect to be able to read a pickle by eye, but
one string in there looks like a domain name, so I would expect to
see the others as well.

Aha!  spambayes is relying on atexit to close the cache and write it
out to disk, and tte obviously goes many rounds before the process
exits, so the cache never gets saved between rounds.
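
To illustrate the difference, here is a minimal sketch of a
pickle-backed cache that is flushed after every round instead of only
at interpreter exit.  This is not the spambayes code: PickleCache,
train_rounds, the file name and the placeholder address are all
hypothetical, and no real DNS query is made.

```python
import atexit
import os
import pickle


class PickleCache:
    """Hypothetical dict-backed cache persisted with pickle.

    Illustration only; this is not the cache class spambayes uses.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.data = pickle.load(f)
        # Safety net: still write the cache on a normal interpreter exit.
        atexit.register(self.save)

    def save(self):
        # Write to a temporary file and rename it into place, so an
        # interrupted run never leaves a half-written pickle behind.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.data, f)
        os.replace(tmp, self.path)


def train_rounds(cache, rounds, names):
    """Stand-in for a multi-round training loop; lookups are faked."""
    for _ in range(rounds):
        for name in names:
            if name not in cache.data:
                # Placeholder result; a real run would do a DNS lookup here.
                cache.data[name] = "192.0.2.1"
        # Flush after every round rather than waiting for process exit.
        cache.save()
```

With an explicit save() at the end of each round, whatever was looked
up in round one is on disk before round two starts, and it survives
even if the process is killed before the atexit handlers get to run.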

Problem is, my ssh connection to the server always drops before
training completes; the connections seem to time out, and I'm not
sure why.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com


