[spambayes-dev] default to mine_received_headers=True, "may be forged"
Skip Montanaro
skip at pobox.com
Mon Dec 22 12:10:13 EST 2003
>> While I was messing with the received header regular expressions
>> today I also noticed that Sendmail sometimes adds "may be forged" to
>> a header....
>> I'm inclined to trust sendmail on this one and just add it. It seems
>> like a very objective feature.
Tim> I agree -- it's extremely unlikely to lose. The ones to worry
Tim> about are things spammers could inject to push things in the ham
Tim> direction, but they're not gonna get far forging "may be forged"
Tim> unless I have a *very* weird idea of ham <wink>.
I just checked in tokenizer.py with this change. Note that it's guarded by
options["Tokenizer", "mine_received_headers"].
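A minimal sketch of what an option-guarded check like this could look like. The function name `forged_tokens` and the token string `received:may be forged` are hypothetical, and a tuple-keyed dict stands in for the real options object. This is not the code that was checked into tokenizer.py.

```python
# Hypothetical stand-in for the tokenizer's options object. The message
# above shows the real one being indexed by a (section, option) pair,
# which a tuple-keyed dict mimics here.
options = {("Tokenizer", "mine_received_headers"): True}

def forged_tokens(received_headers):
    """Yield a hypothetical 'received:may be forged' token for each
    Received header Sendmail flagged, only when the option is enabled."""
    if not options["Tokenizer", "mine_received_headers"]:
        return  # option off: yield nothing
    for header in received_headers:
        if "may be forged" in header:
            yield "received:may be forged"

headers = [
    "from unknown (HELO mail.example.net) [192.0.2.7] (may be forged) "
    "by mx.example.org",
    "from relay.example.org by mx.example.org",
]
print(list(forged_tokens(headers)))  # ['received:may be forged']
```

Checking the option inside the generator means that turning off mine_received_headers suppresses this feature together with the rest of the Received-header mining.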
Skip
Tim> I noticed this in the headers of a spam today:
Tim> Received: from shawmail-cg-shawcable-net
Tim> (c-24-9-163-244.client.comcast.net[24.9.163.244](untrusted sender))
Tim> by rwcrmxc11.comcast.net (rwcrmxc11) with SMTP
Tim> id <20031220054919r1100n4pj1e>; Sat, 20 Dec 2003 05:49:20 +0000
Tim> It's the "(untrusted sender)" part that's interesting. I'd suggest
Tim> *not* folding that in with "may be forged", though. There probably
Tim> aren't a lot of strings of this nature, so the database burden
Tim> should be trivial, and I *bet* different strings will prove to have
Tim> different spamprobs.
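Tim's point about not folding markers together can be sketched by giving each marker string its own token. The marker list and the `received:` token prefix here are illustrative assumptions, not SpamBayes' actual names.

```python
# Hypothetical marker list; each marker maps to its own token so the
# classifier learns a separate spamprob for it rather than a shared one.
RECEIVED_MARKERS = ("may be forged", "untrusted sender")

def marker_tokens(received_header):
    """Return one distinct token per marker found in a Received header."""
    return ["received:" + marker for marker in RECEIVED_MARKERS
            if marker in received_header]

# The Comcast header quoted above, joined onto one line.
header = ("from shawmail-cg-shawcable-net "
          "(c-24-9-163-244.client.comcast.net[24.9.163.244](untrusted sender)) "
          "by rwcrmxc11.comcast.net (rwcrmxc11) with SMTP")
print(marker_tokens(header))  # ['received:untrusted sender']
```

A single shared token would force both markers to carry one spamprob. Distinct tokens cost only a few database entries and let the training data decide whether they really differ.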
You're probably right. In this case it may just be that an ident lookup
failed (many servers don't run identd), so the assertion that the message is
spam would be much weaker.
Poking around Google a bit suggests "(untrusted sender)" is specific to
Comcast. I'm happy to add it if you'd like, but in the mail I've saved it
actually seems to turn up more often in ham (six messages) than in spam (one
message), and not at all in my current training database. All such lines
also match "client2?\.attbi\.com".
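A survey like the one described above could be sketched as follows. It assumes the saved mail is available as raw message strings. The function name `survey` and the sample messages are invented for illustration and are not the mail behind the six-versus-one count. Only the `client2?\.attbi\.com` pattern comes from the message.

```python
import email
import re

ATTBI = re.compile(r"client2?\.attbi\.com")  # pattern quoted in the message
MARKER = "(untrusted sender)"

def survey(raw_messages):
    """Return (count of messages with MARKER in a Received header,
    whether every such header also matches ATTBI)."""
    hits, all_attbi = 0, True
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        # Unfold continuation lines so a marker split by folding still matches.
        received = [" ".join(h.split()) for h in msg.get_all("Received", [])]
        flagged = [h for h in received if MARKER in h]
        if flagged:
            hits += 1
            all_attbi = all_attbi and all(ATTBI.search(h) for h in flagged)
    return hits, all_attbi

# Invented sample messages, not the saved mail the counts above refer to.
ham = ["Received: from c-1.client2.attbi.com [192.0.2.1](untrusted sender)\n"
       "\tby mx.example.com with SMTP\nSubject: hi\n\nbody\n"]
spam = ["Received: from host.example.net [192.0.2.9]\n"
        "\tby mx.example.com\nSubject: buy\n\nbody\n"]
print(survey(ham), survey(spam))  # (1, True) (0, True)
```

Run over separate ham and spam collections, the two hit counts give the kind of comparison made above before deciding whether a marker is worth its own token.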
Skip