[spambayes-dev] default to mine_received_headers=True, "may be forged"
Skip Montanaro
skip at pobox.com
Mon Dec 22 21:17:07 EST 2003
Richie> Your script didn't define 'pat' - I've assumed you meant:
Richie> pat = re.compile(r'\(\w+(?:\s+\w+)+\)')
Whoops. I was cutting-n-pasting from an interpreter session. 'pat' was
actually
pat = re.compile(r'\([a-z]+(?:\s+[a-z]+)+\)', re.I)
but yours is close enough. Thanks for the input/output.
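
To make the counting concrete, here is a minimal sketch of how tallies like
the ones quoted below might be produced with that pattern. The function name
count_received_tags, the lowercasing of matches, and the sample message are
illustrative assumptions, not the script actually used in this thread.

```python
import re
from collections import Counter
from email import message_from_string

# The pattern from the message: two or more alphabetic words in parentheses.
pat = re.compile(r'\([a-z]+(?:\s+[a-z]+)+\)', re.I)

def count_received_tags(messages):
    """Count parenthesized phrases found in Received: headers.

    `messages` is an iterable of raw RFC 2822 message strings.
    (Hypothetical helper, named here only for illustration.)
    """
    counts = Counter()
    for raw in messages:
        msg = message_from_string(raw)
        for header in msg.get_all('Received', []):
            # Lowercasing is an assumption, so that case variants are merged.
            counts.update(tag.lower() for tag in pat.findall(header))
    return counts

# Made-up sample message, for illustration only.
sample = (
    "Received: from foo.example.com (bar.example.com [192.0.2.1] (may be forged))\n"
    "\tby mx.example.org with ESMTP; Mon, 22 Dec 2003 21:17:07 -0500\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
tally = count_received_tags([sample])
print(sorted((n, tag) for tag, n in tally.items()))
```

With this sample only '(may be forged)' is counted. The outer parenthesis
starting at "(bar.example.com" is skipped because the pattern requires purely
alphabetic words separated by whitespace right after the opening parenthesis.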
Richie> Here's what I get from my corpus of 20,000 verified spams:
...
Richie> (3, '(untrusted sender)'),
...
Richie> (149, '(may be forged)')]
Richie> And these from the 12,000 or so messages in the spambayes and
Richie> spambayes-dev archives - not 100% spam-free, but very very
Richie> nearly:
...
Richie> (51, '(may be forged)'),
...
Richie> (158, '(untrusted sender)'),
...
Richie> "(untrusted sender)".... Ah - all are either attbi.com or
Richie> comcast.net. Here's an example of an attbi.com one:
Yup, this tag is almost certainly added by Comcast's MTA (they bought AT&T's
cable internet business not that long ago).
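
Since the two corpora differ in size, the raw counts are easier to compare as
rates. The following is a rough sketch using only the numbers quoted above.
The ham total of 12,000 is the approximate figure from Richie's message, so
the ham rates are estimates, and the variable and function names are
illustrative assumptions.

```python
SPAM_TOTAL = 20000   # "20,000 verified spams"
HAM_TOTAL = 12000    # "12,000 or so" messages: approximate

# (spam count, ham count) for the two tags quoted in the message.
counts = {
    '(may be forged)': (149, 51),
    '(untrusted sender)': (3, 158),
}

def rates(spam_n, ham_n):
    """Return (spam rate, ham rate) as fractions of each corpus."""
    return spam_n / SPAM_TOTAL, ham_n / HAM_TOTAL

for tag, (spam_n, ham_n) in counts.items():
    spam_rate, ham_rate = rates(spam_n, ham_n)
    print(f"{tag:22} spam {spam_rate:.3%}  ham {ham_rate:.3%}")
```

On these figures "(may be forged)" shows up somewhat more often in the spam
corpus, while "(untrusted sender)" appears almost only in the ham, which fits
it being a tag from one provider's MTA rather than a spam indicator.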
It's interesting that you seem to have a lot of HELOs with the same value.
Frequent correspondents perhaps? I don't see that many HELOs (some from
localhost). Are they generated close to your machine (in a late Received:
header)?
Skip