[spambayes-dev] default to mine_received_headers=True, "may be forged"

Skip Montanaro skip at pobox.com
Mon Dec 22 21:17:07 EST 2003


    Richie> Your script didn't define 'pat' - I've assumed you meant:

    Richie> pat = re.compile(r'\(\w+(?:\s+\w+)+\)')

Whoops.  I was cutting-n-pasting from an interpreter session.  'pat' was
actually

    pat = re.compile(r'\([a-z]+(?:\s+[a-z]+)+\)', re.I)

but yours is close enough.  Thanks for the input/output.
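
For anyone who wants to reproduce this sort of tally, here is a minimal
sketch of how the pattern above might be applied to Received: headers.  It
is not the script from the interpreter session.  The function name
count_received_tags and the sample message are made up for illustration,
and restricting the search to Received: headers is an assumption.

```python
import re
from collections import Counter
from email import message_from_string

# The pattern quoted above: a parenthesized run of two or more
# alphabetic words, matched case-insensitively.
pat = re.compile(r'\([a-z]+(?:\s+[a-z]+)+\)', re.I)

def count_received_tags(messages):
    """Tally parenthesized multi-word tags found in Received: headers.

    `messages` is an iterable of raw message strings (hypothetical input;
    a real run would read them from a corpus on disk).
    """
    counts = Counter()
    for text in messages:
        msg = message_from_string(text)
        # get_all returns every Received: header, or the default if none.
        for header in msg.get_all('Received', []):
            counts.update(pat.findall(header))
    return counts

# Made-up sample message, for illustration only.
sample = (
    "Received: from foo.example.com (bar.example.net [192.0.2.1] (may be forged))\n"
    "\tby mx.example.org with SMTP; Mon, 22 Dec 2003 21:00:00 -0500\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

counts = count_received_tags([sample])
# Sort into (count, tag) pairs, smallest count first.
report = sorted((n, tag) for tag, n in counts.items())
```

Note that the [a-z] version skips tags containing digits or underscores,
which the \w version would pick up, so the two patterns can give slightly
different counts on the same corpus.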

    Richie> Here's what I get from my corpus of 20,000 verified spams:

    ...
    Richie>  (3, '(untrusted sender)'),
    ...
    Richie>  (149, '(may be forged)')]

    Richie> And these from the 12,000 or so messages in the spambayes and
    Richie> spambayes-dev archives - not 100% spam-free, but very very
    Richie> nearly:

    ...
    Richie>  (51, '(may be forged)'),
    ...
    Richie>  (158, '(untrusted sender)'),
    ...

    Richie> "(untrusted sender)".... Ah - all are either attbi.com or
    Richie> comcast.net.  Here's an example of an attbi.com one:

Yup, this tag is almost certainly added by Comcast's MTA (they bought AT&T's
cable internet business not that long ago).

It's interesting that you seem to have a lot of HELOs with the same value.
Frequent correspondents perhaps?  I don't see that many HELOs (some from
localhost).  Are they generated close to your machine (in a late Received:
header)?

Skip



