[Spambayes] Beyond Spambayes

Seth Goodman sethg at GoodmanAssociates.com
Thu Feb 23 00:27:27 CET 2006

On Wednesday, February 22, 2006 12:21 PM -0600,
spambayes-bounces at python.org wrote:

>    From: "Seth Goodman" <sethg at GoodmanAssociates.com>


> A problem is that with the rise of botnet armies, we're seeing the majority
> of spam actually coming from bots, not "bulletproof" servers or open
> relays.  That is, a majority of spam is identical spam (indicating it
> was sent at the behest of one individual), but was sent from a large
> number of different sources via different paths.  In short, a
> "perfect" RBL (one that had 100% perfect input and propagated it at
> superluminal velocity) would still only get about 40% of the spammers.

You're right that spammers now favor trojaned Windows machines for
message delivery.  Fortunately, the great majority of those are on
dynamic IP's, while virtually no legitimate mailers are.  You can use a
dynamic IP RBL and/or a Perl regexp to weed those out.  You can get well
over 80% reduction with a combination of a couple of RBL's and some
heuristics.  That's before running a single instance of SpamAssassin.
For the oddball dynamic IP from which you need to receive messages, add
them to a whitelist.  Static IP's that are repeat offenders tend to
remain listed longer.  If they are businesses, they usually do something
about it quickly and avoid recurrences.  If it is an elementary school
in Korea with no sysadmin, they just may wind up blacklisted for a long
time.
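
As a rough illustration of the two checks mentioned above, here is a
minimal Python sketch.  It only builds the DNS name you would query in a
DNSBL (reversed octets plus the list's zone) and applies a hostname
heuristic to the reverse DNS name.  The zone dnsbl.example.org, the
keyword list in DYNAMIC_HINT and both function names are assumptions
made up for the example, not any particular list's or product's rules.

```python
import ipaddress
import re

# Hypothetical keyword pattern: rDNS names containing labels such as
# "dyn", "dhcp", "dsl", "ppp" or "pool" often denote dynamic address space.
DYNAMIC_HINT = re.compile(
    r"(^|[.\-])(dyn(amic)?|dhcp|dsl|adsl|ppp|pool|dialup|cable)([.\-\d]|$)",
    re.IGNORECASE,
)


def dnsbl_query_name(ip, zone):
    """Return the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return ".".join(reversed(octets)) + "." + zone


def looks_dynamic(rdns_name, ip):
    """Guess whether a reverse DNS name belongs to a dynamic IP."""
    numbers = re.split(r"\D+", rdns_name)  # every run of digits in the name
    # Heuristic 1: all four octets of the IP are embedded in the hostname.
    embeds_ip = all(octet in numbers for octet in ip.split("."))
    # Heuristic 2: the hostname contains a "dynamic-looking" keyword.
    return embeds_ip or bool(DYNAMIC_HINT.search(rdns_name))
```

An A-record lookup of the name returned by dnsbl_query_name() that
succeeds means the address is listed; NXDOMAIN means it is not.  Hosts
that trip looks_dynamic() but that you need mail from go on the
whitelist.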

Some people hate DNSBL's because they or someone they know has at one
time or another been falsely listed (i.e. one of their own users
mistakenly reports them).  Or perhaps they were listed for cause and
removed the spammer, but then had trouble getting delisted fast enough
to suit them or had to pay a fine.  Despite what some detractors would
have you believe, a well-run MTA rarely winds up on a DNSBL.  If you
reject rather than discard, the sender knows immediately since it's a
5xx permanent error that should occur before any greylisting delay.

> > Similarly, there are a number of heuristics that can catch this
> > type of spammer early:  put in a delay after the connection request
> > before you send the banner.  Anyone who doesn't wait for the end of
> > banner can be safely disconnected and blacklisted for the future.
> > If you want to perform a public service, tarpit them instead of
> > merely rejecting and blacklisting.
> I was under the impression that a pipelining MTA doesn't care what
> happens after the port opens successfully.  In that case, tarpitting
> doesn't matter; they're not waiting for the ACK packets.

You're right, you can't tarpit an MTA that abuses pipelining the first
time around.  Once you detect that behavior, and many of them will fall
for the delayed banner bait right at the beginning, there's no need to
examine anything else sitting in the input buffer for that socket.
Clear the buffer, add the IP to your local blacklist and either block or
tarpit at their next connection attempt.  Many of these hosts will make
more than one connection attempt, and then you've got them.
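
The delayed-banner check described above could look roughly like this
in Python.  It is a sketch under assumptions, not a production
listener: the 5 second delay, the banner text, the in-memory
local_blacklist set and the function names are all invented for the
example, and a real server would tarpit or log instead of simply
closing.

```python
import select
import socket

local_blacklist = set()  # hypothetical in-memory store of offending IPs


def greet_with_delay(conn, banner=b"220 mx.example.org ESMTP\r\n", delay=5.0):
    """Send the banner after a delay; return False if the client talked first."""
    readable, _, _ = select.select([conn], [], [], delay)
    if readable:
        # Data (or EOF) showed up before our banner: an early talker.
        return False
    conn.sendall(banner)
    return True


def handle_connection(conn, peer_ip, delay=5.0):
    """Block known offenders, blacklist early talkers, greet everyone else."""
    if peer_ip in local_blacklist:
        conn.sendall(b"554 5.7.1 Access denied\r\n")
        conn.close()
        return "blocked"
    if not greet_with_delay(conn, delay=delay):
        # Ignore whatever is sitting in the input buffer and drop the client.
        local_blacklist.add(peer_ip)
        conn.close()
        return "early-talker"
    return "greeted"
```

The first connection from a pipelining abuser only gets it listed; the
"blocked" branch is what catches it on the next attempt, which is where
you would substitute a tarpit if you wanted one.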

> It's all one big mess, if you ask me.  :(

If it were easy, there'd be no spam, but you can keep the great majority
of it out.

> Adding an answerback at the end of DATA (like three-phase commit)
> would have been a nice thing, but it's a little late for that.

You can accept or reject at the end of DATA and you are theoretically
supposed to wait for the SMTP client to close the connection, so for
sane MTA's, this amounts to a three-way handshake of sorts.  Spammers
may not wait around for your response, but a compliant MTA will return a
non-delivery notice to the sender for rejections at the end of DATA and
will hopefully return the error information you gave it.  By rejecting
at the end of DATA, you've completed your responsibilities under SMTP.
If the sending MTA doesn't report the non-delivery to their user, that
MTA is broken, not yours.
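
For concreteness, the decision at the end of DATA boils down to which
reply line you send after the terminating dot.  The sketch below
assumes a hypothetical is_spam verdict from whatever filter you run;
the function name and the reply texts are illustrative, only the 250
and 554 reply codes come from SMTP itself.

```python
def end_of_data_reply(is_spam, reason="message rejected as spam"):
    """Return the SMTP reply line to send after the client's final '.'."""
    if is_spam:
        # 5xx is a permanent failure: a compliant sending MTA must bounce
        # the message back to its own user, ideally including this text.
        return "554 5.7.1 " + reason + "\r\n"
    # 250 means we have accepted responsibility for delivery.
    return "250 2.0.0 OK: queued\r\n"
```

Either way, you never generate a bounce yourself; the sending MTA is
the one that has to tell its user.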

Seth Goodman
