[Spambayes] better Received header tokens
Neil Schemenauer
nas at python.ca
Sun Mar 9 12:08:09 EST 2003
I wasted some time today trying to improve the mine_received_headers
option. The goal was to generate fewer, more useful tokens. Also,
I wanted to be resistant to Received header forgery. For the sake of
posterity, here's what I came up with:
    # Fragment of a generator body; needs `import re` and a message object `msg`.
    ippat = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    received_re = re.compile(r"from .*\b(%s)[)\]].*\b"
                             r"by (\S+)\s+([^;]*)" % ippat, re.M|re.S)
    hops = 0
    network = None
    for hdr in msg.get_all("received", []):
        m = received_re.search(hdr)
        if m:
            ip = m.group(1)
            # Reduce the address to its /16 network (first two octets).
            n = '.'.join(ip.split('.')[:2])
            if n != network:
                # Count a hop only when the network changes.
                hops += 1
                network = n
                yield 'received:%d:%s' % (hops, network)
    yield 'received:%d' % hops
I expected this to do better than the current code. Testing shows
otherwise. Perhaps using a more specific or more general network
(instead of /16) would help.
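
To try other network widths, the number of octets kept could be made a
parameter. What follows is only a sketch: the function name
received_tokens, the octets argument and the sample headers (private
10.x addresses, example.com hosts) are made up for illustration and are
not part of the Spambayes tokenizer. It takes a plain list of header
strings instead of a message object so it can be run on its own.

```python
import re

IPPAT = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
RECEIVED_RE = re.compile(r"from .*\b(%s)[)\]].*\b"
                         r"by (\S+)\s+([^;]*)" % IPPAT, re.M | re.S)


def received_tokens(headers, octets=2):
    """Yield hop tokens, keeping the first `octets` octets of each IP.

    octets=2 corresponds to a /16 network, octets=3 to a /24.
    (Hypothetical helper, not the Spambayes API.)
    """
    hops = 0
    network = None
    for hdr in headers:
        m = RECEIVED_RE.search(hdr)
        if m:
            n = '.'.join(m.group(1).split('.')[:octets])
            if n != network:
                # A new network counts as a new hop.
                hops += 1
                network = n
                yield 'received:%d:%s' % (hops, network)
    # Always emit the total number of distinct-network hops seen.
    yield 'received:%d' % hops


# Invented sample headers: same /16, different /24.
sample = [
    "from a.example.com (a.example.com [10.1.2.3]) "
    "by b.example.com (Postfix) with ESMTP id 1; "
    "Sun, 9 Mar 2003 12:00:00 -0500",
    "from c.example.com (c.example.com [10.1.9.9]) "
    "by a.example.com (Postfix) with ESMTP id 2; "
    "Sun, 9 Mar 2003 11:59:00 -0500",
]

if __name__ == "__main__":
    for width in (2, 3):
        print(width, list(received_tokens(sample, octets=width)))
```

With these two headers, octets=2 sees a single network (10.1) and so
counts one hop, while octets=3 treats 10.1.2 and 10.1.9 as separate
hops. Whether either setting actually classifies better would still
have to be tested.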
Neil