[spambayes-dev] spoof detector

skip at pobox.com skip at pobox.com
Fri Jul 6 17:46:50 CEST 2007


    David> Something that comes up over and over in spam is a link of the
    David> form:

    David>     <a href="http://url/of/spammers/site">
    David>        http://url/of/some/legit/site
    David>     </a>

    David> Does SpamBayes have a token that represents that information and
    David> an option I can set that will use it?

The SpamBayes tokenizer essentially splits the message at word boundaries,
so the two URLs are tokenized separately.  Their physical and structural
proximity is not noted, so no token records that the link text and the href
disagree.  Synthetic tokens based on the hostname or IP address in the URLs
will be generated if you add x-pick_apart_urls:True to the Tokenizer section
of your config file.  For completeness, here is my current set of tokenizer
settings (I haven't changed them in a long while):

    [Tokenizer]
    record_header_absence:True
    summarize_email_prefixes:True
    summarize_email_suffixes:True
    mine_received_headers:True
    x-pick_apart_urls:True
    x-fancy_url_recognition:False
    x-lookup_ip:True
    lookup_ip_cache:~/tmp/dnscache.pck
    x-image_size:True
    x-crack_images:True
    x-ocr_engine:gocr
    max_image_size:100000
    crack_image_cache:~/tmp/imagecache.pck
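
If you wanted a token for the mismatch itself, it would have to be generated
separately from the word-splitting described above.  Here is a rough sketch
of the idea using only the Python standard library.  It is not part of
SpamBayes: the class name, the spoof_tokens() function and the
"spoof:href-text-mismatch" token are all made up for illustration, and it
only compares hostnames when the visible link text itself looks like a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def _host(url):
    """Return the lowercased hostname of url, or None if it has none."""
    try:
        return urlparse(url).hostname
    except ValueError:
        # urlparse can reject malformed input such as a broken IPv6 literal.
        return None


class SpoofedLinkFinder(HTMLParser):
    """Hypothetical helper: collect links whose text names a different host."""

    def __init__(self):
        super().__init__()
        self.mismatches = []   # list of (href_host, text_host) pairs
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        text = "".join(self._text).strip()
        href_host = _host(self._href)
        # Only treat the link text as a URL if it carries a scheme.
        text_host = _host(text) if "://" in text else None
        if href_host and text_host and href_host != text_host:
            self.mismatches.append((href_host, text_host))
        self._href = None


def spoof_tokens(html):
    """Return one illustrative token per mismatched link found in html."""
    finder = SpoofedLinkFinder()
    finder.feed(html)
    finder.close()
    return ["spoof:href-text-mismatch" for _ in finder.mismatches]
```

Feeding it the HTML body of a message gives back a list with one token per
suspicious link.  Whether such a token would actually help classification
is something that would need testing against real ham and spam.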

Skip
