[Spambayes] better Received header tokens

Neil Schemenauer nas at python.ca
Sun Mar 9 12:08:09 EST 2003


I wasted some time today trying to improve the mine_received_headers
option.  The goal was to generate fewer, more useful tokens.  I also
wanted it to be resistant to Received header forgery.  For the sake of
posterity, here's what I came up with:

    import re

    # Dotted-quad IPv4 address (octet ranges are not validated).
    ippat = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    received_re = re.compile(r"from .*\b(%s)[)\]].*\b"
                             r"by (\S+)\s+([^;]*)" % ippat, re.M|re.S)
    hops = 0
    network = None
    for hdr in msg.get_all("received", []):
        m = received_re.search(hdr)
        if m:
            ip = m.group(1)
            # Treat the first two octets (a /16) as the network.
            n = '.'.join(ip.split('.')[:2])
            # Count a hop only when the network changes.
            if n != network:
                hops += 1
                network = n
                yield 'received:%d:%s' % (hops, network)
    yield 'received:%d' % hops

I expected this to do better than the current code.  Testing shows
otherwise.  Perhaps using a more specific or more general network
(instead of /16) would help.
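
For anyone who wants to try other network sizes, here is a minimal
sketch of the same loop with the prefix length made adjustable.  The
function name received_tokens, the octets parameter, and the sample
headers are made up for illustration and are not part of the Spambayes
tokenizer.  It only handles whole-octet prefixes (/8, /16, /24, /32).

```python
import re
from email import message_from_string

IPPAT = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
RECEIVED_RE = re.compile(r"from .*\b(%s)[)\]].*\b"
                         r"by (\S+)\s+([^;]*)" % IPPAT, re.M | re.S)

def received_tokens(msg, octets=2):
    """Yield hop tokens; `octets` (hypothetical knob) sets the prefix size.

    octets=2 corresponds to the /16 used in the post, octets=3 to a /24.
    """
    hops = 0
    network = None
    for hdr in msg.get_all("received", []):
        m = RECEIVED_RE.search(hdr)
        if m:
            n = '.'.join(m.group(1).split('.')[:octets])
            # Count a hop only when the network prefix changes.
            if n != network:
                hops += 1
                network = n
                yield 'received:%d:%s' % (hops, network)
    yield 'received:%d' % hops

# Made-up headers using documentation address ranges.
SAMPLE = (
    "Received: from mail.example.com (mail.example.com [192.0.2.10])\n"
    "\tby mx.example.org with ESMTP id ABC; Sun, 9 Mar 2003 12:00:00 -0500\n"
    "Received: from relay.example.net (relay.example.net [192.0.2.77])\n"
    "\tby mail.example.com with SMTP; Sun, 9 Mar 2003 11:59:00 -0500\n"
    "Received: from client.example.net (unknown [198.51.100.5])\n"
    "\tby relay.example.net with SMTP; Sun, 9 Mar 2003 11:58:00 -0500\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    msg = message_from_string(SAMPLE)
    for octets in (1, 2, 3, 4):
        print(octets, list(received_tokens(msg, octets)))
```

With these sample headers, any prefix shorter than /32 merges the two
192.0.2.x relays into a single hop, while octets=4 counts three hops.
Whether any of these settings actually beats the current code would
still have to be measured on a real corpus.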

  Neil


