[spambayes-dev] 1070 spam, 1 false positive

Greg Ward greg at python.org
Fri Jun 20 09:03:17 EDT 2003


On 19 June 2003, Tim Peters said:
> spambayes considers all email from AOL to be spam, you know <wink>.

Try <0.01 wink> -- wait till you see the incriminating clues!

> It would be interesting to see the whole "clue" list -- I'm guessing there
> must have been more damaging stuff in the HTML part (spambayes looks at all
> text/* parts).

Yup, forgot to post that last night.  First, here's my config file -- or
at least the [Tokenizer] section:

basic_header_tokenize: True
basic_header_skip: received envelope-to delivered-to delivery-date x-spam-flag x-spam-status content-type list-*
record_header_absence: True
address_headers: from to cc sender reply-to
mine_received_headers: True

And here's the complete token list:

Y 0.996 save/ham/cur/1056075169.22878_58.mail:2,S
        '*H*': 0.007
        '*S*': 1.000
        'date:EDT': 0.016
        'message-id:skip:b 20': 0.172
        'charset:us-ascii': 0.270
        'content-type:text/plain': 0.273
        'reply-to:none': 0.380
        'date:Wed': 0.384
        'header:Received:2': 0.391
        'date:2003': 0.630
        'header:MIME-Version:1': 0.647
        'date:Jun': 0.655
        'to:addr:python.org': 0.686
        'to:python.org': 0.686
        'x-mailer:Windows': 0.689
        'please': 0.718
        'unsubscribe': 0.781
        'email addr:aol.com': 0.845
        'to:addr:tutor': 0.845
        'content-type:multipart/alternative': 0.916
        'x-mailer:for': 0.924
        'received:aol.com': 0.965
        'received:mx.aol.com': 0.965
        'return-path:aol.com': 0.973
        'content-type:text/html': 0.978
        'x-mailer:sub': 0.978
        'from:addr:aol.com': 0.983
        'from:aol.com': 0.983

...so your flippant remark about AOL was not all that far off!
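
For anyone who wants to check how those clues add up to 0.996, here is a
minimal sketch.  It assumes the chi-squared combining scheme (spam evidence
'*S*' and ham evidence '*H*' computed separately, then the score taken as
(S - H + 1) / 2).  The names chi2q, combine and CLUES are made up for this
illustration and are not the spambayes API; the probabilities are the
rounded values from the list above, so the result is only approximate.

```python
import math

def chi2q(x2, dof):
    """Probability that a chi-squared variate with `dof` degrees of
    freedom exceeds x2.  This closed form only holds for even dof."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Return (H, S, score) for a list of per-clue spam probabilities."""
    n = len(probs)
    # Many clues near 1.0 make the first statistic large, pushing S toward 1.
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    # Many clues near 0.0 make the second statistic large, pushing H toward 1.
    ham_stat = -2.0 * sum(math.log(p) for p in probs)
    S = 1.0 - chi2q(spam_stat, 2 * n)
    H = 1.0 - chi2q(ham_stat, 2 * n)
    return H, S, (S - H + 1.0) / 2.0

# Clue probabilities copied (rounded to 3 places) from the list above.
CLUES = [0.016, 0.172, 0.270, 0.273, 0.380, 0.384, 0.391, 0.630, 0.647,
         0.655, 0.686, 0.686, 0.689, 0.718, 0.781, 0.845, 0.845, 0.916,
         0.924, 0.965, 0.965, 0.973, 0.978, 0.978, 0.983, 0.983]

H, S, score = combine(CLUES)
print("H=%.3f S=%.3f score=%.3f" % (H, S, score))
```

The last step alone can be checked by hand from the reported values:
(1.000 - 0.007 + 1) / 2 is about 0.996.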

Oh yeah, here's what I get using the "list-misc" training DB -- i.e. the
DB that would have been used if this message had been sent (correctly)
to tutor-request at python.org:

N 0.024 save/ham/cur/1056075169.22878_58.mail:2,S
        '*H*': 0.957
        '*S*': 0.004
        'date:EDT': 0.029
        'message-id:@aol.com': 0.081
        'message-id:aol.com': 0.081
        'message-id:skip:b 20': 0.081
        'x-mailer:for': 0.141
        'to:Tutor': 0.183
        'x-mailer:8.0': 0.183
        'charset:us-ascii': 0.222
        'content-type:text/plain': 0.228
        'received:mx.aol.com': 0.230
        'email addr:aol.com': 0.268
        'subject:tutor': 0.268
        'please': 0.297
        'x-mailer:Windows': 0.324
        'received:com': 0.335
        'header:Received:2': 0.349
        'unsubscribe': 0.373
        'from:no real name:2**0': 0.622
        'date:2003': 0.663
        'date:Jun': 0.681
        'content-type:multipart/alternative': 0.743
        'content-type:text/html': 0.931
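
To see which clues changed most between the two training DBs, here is a
small sketch that lines up the tokens appearing in both lists and sorts
them by how far the spam probability moved.  The dict names first_list and
list_misc and the helper shifts are hypothetical, chosen only for this
example; the numbers are copied from the two dumps in this message.

```python
# Tokens that appear in both dumps: probability under each training DB.
first_list = {
    'date:EDT': 0.016, 'message-id:skip:b 20': 0.172,
    'charset:us-ascii': 0.270, 'content-type:text/plain': 0.273,
    'header:Received:2': 0.391, 'date:2003': 0.630, 'date:Jun': 0.655,
    'x-mailer:Windows': 0.689, 'please': 0.718, 'unsubscribe': 0.781,
    'email addr:aol.com': 0.845,
    'content-type:multipart/alternative': 0.916, 'x-mailer:for': 0.924,
    'received:mx.aol.com': 0.965, 'content-type:text/html': 0.978,
}
list_misc = {
    'date:EDT': 0.029, 'message-id:skip:b 20': 0.081,
    'charset:us-ascii': 0.222, 'content-type:text/plain': 0.228,
    'header:Received:2': 0.349, 'date:2003': 0.663, 'date:Jun': 0.681,
    'x-mailer:Windows': 0.324, 'please': 0.297, 'unsubscribe': 0.373,
    'email addr:aol.com': 0.268,
    'content-type:multipart/alternative': 0.743, 'x-mailer:for': 0.141,
    'received:mx.aol.com': 0.230, 'content-type:text/html': 0.931,
}

def shifts(before, after):
    """Return (token, before, after) rows, biggest drop in spamminess first."""
    common = before.keys() & after.keys()
    rows = [(tok, before[tok], after[tok]) for tok in common]
    return sorted(rows, key=lambda row: row[2] - row[1])

for tok, old, new in shifts(first_list, list_misc):
    print("%-38s %.3f -> %.3f (%+.3f)" % (tok, old, new, new - old))
```

The AOL-related and "unsubscribe"-style clues drop the most, while the
HTML content-type clues stay spammy under both DBs.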

All very interesting, no doubt.

        Greg
-- 
Greg Ward <gward at python.net>                         http://www.gerg.ca/
Time flies like an arrow; fruit flies like a banana.


