[Spambayes] Another software in the field

Justin Mason jm@jmason.org
Tue Nov 19 14:14:11 2002


(a bit late in replying! I suffered from inbox overload ;)

T. Alexander Popiel said:
> If the received parser were a little smarter about parsing iPlanet
> received lines, it would have "pcp736393pcs.reston01.va.comcast.net"
> instead of "cj569191b" as the first element in the sequence, and
> the match list would have been 2 -> 1 -> 2 -> 0 -> 0, yielding:
> 
>   message-id-generation:skipped 0
> 
> I suspect that high skipped numbers would be a strong spam indicator,
> showing where message ids were omitted in the sent mail and/or received
> headers naively forged to prevent backtracking.

It would be interesting to test this; we do something similar in
SpamAssassin to find possibly-forged hostnames in the Received
headers, and we do try to figure out where in the Received chain
the Message-id was added.

Two problems we've seen:

  - some totally-legit senders, especially auto-generated mails, have a
    bad habit of leaving out the Message-Id until it gets to *your* MX.
    Annoying, but allowed by the RFCs.  This test would have to figure
    this out in some way; maybe by adding the sender's hostname or domain
    to the token, so the legit folks gain ham hits, but spammers remain
    as 1-spam 0-ham hapaxes?

  - some senders use e.g. hostname "mylittlecompany.com" on their desktop
    machine or home LAN, then connect via a commodity-DSL connection,
    resulting in a reverse-lookup of "dsl43-234.bigisp.net".  In other
    words, the rDNS does not match what the sender wishes it did ;)
    Not a problem in this case, but worth noting when talking about
    Received-header parsing.
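
To make the first point concrete, here is a rough sketch in Python of
what "adding the sender's domain to the token" might look like.  The
function names, the "from:" suffix and the crude two-label domain
reduction are all made up for illustration; this is not what Spambayes
or SpamAssassin actually do, and the From: address is of course
trivially forgeable, so a real version would probably want a hostname
taken from a trusted Received line instead.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domain(msg):
    """Return the lower-cased domain of the From: address, or 'unknown'."""
    _name, addr = parseaddr(msg.get("From", ""))
    if "@" not in addr:
        return "unknown"
    host = addr.rsplit("@", 1)[1].lower()
    # Assumption: keep only the last two labels.  This is deliberately
    # crude and gets multi-part TLDs such as .co.uk wrong.
    return ".".join(host.split(".")[-2:])


def skipped_token(msg, skipped):
    """Build a hypothetical 'skipped' token qualified by sender domain.

    `skipped` is the count produced by whatever Received/Message-Id
    matching is in use; computing it is outside this sketch.
    """
    return "message-id-generation:skipped %d from:%s" % (
        skipped, sender_domain(msg))


if __name__ == "__main__":
    raw = "From: Bob <bob@lists.example.com>\nSubject: hi\n\nbody\n"
    print(skipped_token(message_from_string(raw), 2))
```

The idea is that a legit sender who always skips the Message-Id keeps
producing the same domain-qualified token and so collects ham counts,
while spam from ever-changing domains produces tokens seen only once.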

--j.


