[spambayes-dev] 1070 spam, 1 false positive
Greg Ward
greg at python.org
Fri Jun 20 09:03:17 EDT 2003
On 19 June 2003, Tim Peters said:
> spambayes considers all email from AOL to be spam, you know <wink>.
Try <0.01 wink> -- wait till you see the incriminating clues!
> It would be interesting to see the whole "clue" list -- I'm guessing there
> must have been more damaging stuff in the HTML part (spambayes looks at all
> text/* parts).
Yup, forgot to post that last night. First, here's my config file -- or
at least the [Tokenizer] section:
basic_header_tokenize: True
basic_header_skip: received envelope-to delivered-to delivery-date x-spam-flag x-spam-status content-type list-*
record_header_absence: True
address_headers: from to cc sender reply-to
mine_received_headers: True
And here's the complete token list:
Y 0.996 save/ham/cur/1056075169.22878_58.mail:2,S
'*H*': 0.007
'*S*': 1.000
'date:EDT': 0.016
'message-id:skip:b 20': 0.172
'charset:us-ascii': 0.270
'content-type:text/plain': 0.273
'reply-to:none': 0.380
'date:Wed': 0.384
'header:Received:2': 0.391
'date:2003': 0.630
'header:MIME-Version:1': 0.647
'date:Jun': 0.655
'to:addr:python.org': 0.686
'to:python.org': 0.686
'x-mailer:Windows': 0.689
'please': 0.718
'unsubscribe': 0.781
'email addr:aol.com': 0.845
'to:addr:tutor': 0.845
'content-type:multipart/alternative': 0.916
'x-mailer:for': 0.924
'received:aol.com': 0.965
'received:mx.aol.com': 0.965
'return-path:aol.com': 0.973
'content-type:text/html': 0.978
'x-mailer:sub': 0.978
'from:addr:aol.com': 0.983
'from:aol.com': 0.983
...so your flippant remark about AOL was not all that far off!
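For anyone wondering how the score on the first line of the dump relates to the '*H*' and '*S*' pseudo-clues: the numbers are consistent with chi-squared combining, where the final score is (S - H + 1) / 2. Here (1.000 - 0.007 + 1) / 2 = 0.9965, which agrees with the printed 0.996 to within the rounding of the displayed values. Below is a minimal sketch under that assumption. The names `chi2q`, `chi_combine` and `final_score` are illustrative rather than taken from the spambayes source, and the code assumes every clue probability lies strictly between 0 and 1.

```python
import math


def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of
    freedom (v must be even) is at least x2."""
    if v % 2 != 0:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Guard against floating-point overshoot.
    return min(total, 1.0)


def final_score(h, s):
    """Fold the ham (H) and spam (S) indicators into one score in [0, 1]."""
    return (s - h + 1.0) / 2.0


def chi_combine(probs):
    """Return (H, S, score) for a list of per-clue spam probabilities.

    Assumes 0 < p < 1 for every p, so the logarithms are finite.
    """
    n = len(probs)
    if n == 0:
        return 0.5, 0.5, 0.5
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return h, s, final_score(h, s)


if __name__ == "__main__":
    # H and S as printed in the dump above.
    print(final_score(0.007, 1.000))
```

The sketch makes no claim that feeding in the listed clue probabilities reproduces the printed '*H*' and '*S*' exactly; only the last step, from H and S to the score, is checked against the dump.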
Oh yeah, here's what I get using the "list-misc" training DB -- i.e. the
DB that would have been used if this message had been sent (correctly)
to tutor-request at python.org:
N 0.024 save/ham/cur/1056075169.22878_58.mail:2,S
'*H*': 0.957
'*S*': 0.004
'date:EDT': 0.029
'message-id:@aol.com': 0.081
'message-id:aol.com': 0.081
'message-id:skip:b 20': 0.081
'x-mailer:for': 0.141
'to:Tutor': 0.183
'x-mailer:8.0': 0.183
'charset:us-ascii': 0.222
'content-type:text/plain': 0.228
'received:mx.aol.com': 0.230
'email addr:aol.com': 0.268
'subject:tutor': 0.268
'please': 0.297
'x-mailer:Windows': 0.324
'received:com': 0.335
'header:Received:2': 0.349
'unsubscribe': 0.373
'from:no real name:2**0': 0.622
'date:2003': 0.663
'date:Jun': 0.681
'content-type:multipart/alternative': 0.743
'content-type:text/html': 0.931
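To see which clues moved most between the two training DBs, the dumps can be compared mechanically. This is a rough sketch that assumes each clue line has exactly the `'token': probability` shape shown above; `parse_clues` and `biggest_shifts` are hypothetical helper names, and the sample data is a handful of lines copied from the two lists in this message.

```python
import re

# Matches lines such as:  'received:mx.aol.com': 0.965
CLUE_RE = re.compile(r"^'(?P<token>.*)': (?P<prob>\d\.\d+)$")


def parse_clues(text):
    """Map token -> spam probability for every clue line in a dump."""
    clues = {}
    for line in text.splitlines():
        match = CLUE_RE.match(line.strip())
        if match:  # header lines like "Y 0.996 save/..." are skipped
            clues[match.group("token")] = float(match.group("prob"))
    return clues


def biggest_shifts(first, second, top=5):
    """Tokens present in both dumps, largest probability change first."""
    shared = set(first) & set(second)
    rows = [(tok, first[tok], second[tok]) for tok in shared]
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows[:top]


# A few lines from the first dump (my own training DB).
PERSONAL = parse_clues("""\
'unsubscribe': 0.781
'email addr:aol.com': 0.845
'received:mx.aol.com': 0.965
'content-type:text/html': 0.978
""")

# The same tokens from the "list-misc" dump.
LIST_MISC = parse_clues("""\
'received:mx.aol.com': 0.230
'email addr:aol.com': 0.268
'unsubscribe': 0.373
'content-type:text/html': 0.931
""")

if __name__ == "__main__":
    for token, before, after in biggest_shifts(PERSONAL, LIST_MISC):
        print(f"{token!r}: {before:.3f} -> {after:.3f}")
```

On this sample, the AOL-related clues shift the most, while 'content-type:text/html' stays strongly spammy in both DBs.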
All very interesting, no doubt.
Greg
--
Greg Ward <gward at python.net> http://www.gerg.ca/
Time flies like an arrow; fruit flies like a banana.