[Spambayes] Beyond Spambayes

Seth Goodman sethg at GoodmanAssociates.com
Sun Feb 26 05:13:13 CET 2006

On Saturday, February 25, 2006 12:04 AM -0600, Allen wrote:

> Seth Goodman wrote:
> > On Thursday, February 23, 2006 3:01 PM -0600,
> > netsecurity at sound-by-design.com wrote:
> >
> > > From this point of view it is my opinion that we'd all be better
> > > off if the mail did not get delivered in the first place. This is
> > > why I suggest that adding an additional layer of robustness might
> > > prove useful.
> >
> > I agree with you, but with a caveat.


> > The answer is to reject, i.e. not accept in the first place, spam
> > messages during the SMTP transaction.


> If routers on the backbone deleted messages with forged headers, then
> there would be a lot less for mail servers to do. Are routers obligated
> to pass along all packets, both forged and malicious?

Routers on large pipes are moving packets at wire speed, which means
gigabits/second.  They don't have the resources to do much besides
figure out where to send each packet next.  They cost hundreds of
thousands of USD and they don't get replaced every other year.  Keep in
mind that these routers are not aware of protocols at a level as high as
SMTP.  They generally are concerned only about individual packets,
unless the pipe is running ATM or frame relay, where the routers have to
guarantee bandwidth for virtual circuits.

Email is transmitted by SMTP, which in turn sits on top of TCP/IP.  The
packets making up a message therefore do not necessarily travel via the
same route from source to destination, nor do they have to arrive in the
correct order.  Individual packets can have errors, and the recipient
can request retransmission of bad packets at any point.  There is no way
that an individual router, which may only see some of the packets in a
particular TCP transaction, possibly out of order and with errors, can
determine what is going on.  TCP is complicated and has long timeouts
(minutes).  A backbone router cannot afford to be very aware of TCP, and
it has no chance of being able to deal with something like SMTP, which
is layered on top of TCP.
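
To make the reordering point concrete, here is a minimal sketch in Python. It is a toy model, not real TCP: the 16-byte segment size, the sample SMTP dialogue, and the idea that a mid-path router sees exactly every other segment are all assumptions made for illustration.

```python
import random

def split_into_segments(data: bytes, size: int):
    """Return (sequence_number, payload) pairs, like simplified TCP segments."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]

def reassemble(segments):
    """Rebuild a byte stream by ordering segments on their sequence number."""
    return b"".join(payload for _, payload in sorted(segments))

# Hypothetical SMTP dialogue; addresses are placeholders.
message = (b"MAIL FROM:<a@example.com>\r\nRCPT TO:<b@example.org>\r\n"
           b"DATA\r\nSubject: hello\r\n\r\nbody text\r\n.\r\n")
segments = split_into_segments(message, 16)

# The receiving endpoint eventually gets every segment, in arbitrary order,
# and can still restore the original stream from the sequence numbers.
arrived = segments[:]
random.shuffle(arrived)
endpoint_view = reassemble(arrived)

# Assumption: a router on one of several paths only sees every other segment.
router_view = reassemble(segments[::2])
```

The endpoint recovers the full dialogue because it holds all segments; the router's partial view has gaps, so it cannot evaluate the message, let alone apply any policy to it.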

Even if each router did magically obtain orders of magnitude more CPU
and memory at no cost, many of the authentication schemes available
require that you have the entire message.  This means the router becomes
an SMTP relay.  Now you have the problem that whoever owns the relay
decides the policy for what mail to reject and what to deliver.  That
would probably be someone like AT&T, Sprint, Comcast, BT, the government
of the PRC, etc.  Need I say more?


> > Since the classifier that can avoid false positives has yet
> > to be invented, it is very important to not silently delete
> > suspected spam, especially at the server level.
>
> The second point is about the obligation to deliver the mail
> regardless. To use the snail mail parallel, they have no obligation
> to deliver letter bombs, why should email servers be any different?

Because a parcel in snail mail travels in one piece, it is relatively
easy to X-ray it and analyze the image to determine if it contains a
bomb.  An SMTP email transaction consists of dozens of TCP packets that
can take different routes, arrive out of order and have errors in some
packets.  Examining an email in transit is virtually impossible.  Even
if it were possible, who is going to be the universal arbiter of what is
spam?

> Spam or "legitimate" mail that carries a hostile payload is in the
> same category as letter bombs. Dump 'em in a bucket of water then
> dispose of them in a way that does no harm.

If that were practical, we wouldn't have spam.

> If you really wanted absolute delivery of every bit of mail regardless
> of source or content, then what do you do about routers along the way
> that get overloaded and dump masses of packets?

Since the SMTP session runs over TCP, the recipient will request
retransmission of bad or missing packets.  TCP is designed so that the
recipient can detect either of these conditions.  The internet was
designed for survivability, not efficiency.  The one thing that TCP,
combined with the routing protocols, does very well is to complete
transactions.  It may take a long time and the packets may travel a
circuitous path, but if there is any route to reach a node, the system
will find it.  Even temporary outages don't affect final delivery.  Most
reasonably configured MTAs queue messages that fail temporarily and
retry delivery at regular intervals for up to five days.
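
The queue-and-retry behavior can be sketched roughly as follows. The five-day lifetime comes from the paragraph above; the one-hour retry interval and the function names are assumptions for illustration, and real MTAs typically use their own configurable schedules.

```python
from datetime import timedelta

MAX_QUEUE_LIFETIME = timedelta(days=5)  # figure taken from the text above
RETRY_INTERVAL = timedelta(hours=1)     # assumption: actual intervals vary by MTA

def retry_offsets(lifetime=MAX_QUEUE_LIFETIME, interval=RETRY_INTERVAL):
    """Times after the first temporary failure at which delivery is retried."""
    offsets = []
    elapsed = interval
    while elapsed <= lifetime:
        offsets.append(elapsed)
        elapsed += interval
    return offsets

def next_action(age, lifetime=MAX_QUEUE_LIFETIME):
    """Decide what to do with a queued message that has waited `age`."""
    # Once the lifetime is exhausted, the sender is told via a DSN
    # instead of the message being dropped silently.
    return "retry" if age < lifetime else "bounce with DSN"
```

With these assumed values, a message that hits a temporary outage gets many more delivery attempts before the sender is notified of failure, which is why short outages rarely cost any mail.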

> According to some figures I've seen (I'm not sure I believe them
> myself) about 30% of all e-mail is never delivered.

That would probably include mail that can't be delivered due to
addressing problems, and a lot of that is spam.

> While I wouldn't put the figure this high I do know about 2-3% of
> my mail is never delivered and I'd say the incoming loss is in the
> same range.

I'd suggest that you look for a new provider.  A loss rate like that
would make doing business via email virtually impossible, and it is
something I would surely have noticed over the years.  I don't know
about the situation in other countries, but my guess is that in the
U.S., the loss rate is far lower than that.  The only real reasons for a
message not getting delivered without a DSN showing up at the sender are
a server-side spam filter that silently deletes the message or a badly
misconfigured MTA (I would argue the former is a subset of the latter).
Either would only create a problem at a specific site and is not general
to email delivery.  If you
are experiencing 2-3% incoming and outgoing non-delivery without
appropriate notification, it sounds like your provider has a badly
configured MTA.
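
As a rough illustration of why rejecting during the SMTP transaction keeps the sender informed while silent deletion does not, consider the sketch below. The threshold value, the function names, and the reply text are hypothetical; only the general meaning of 2xx (accepted) and 5xx (permanent failure) reply codes is taken from SMTP itself.

```python
SPAM_THRESHOLD = 0.9  # hypothetical cutoff, not a SpamBayes default

def smtp_reply_after_data(spam_score, threshold=SPAM_THRESHOLD):
    """Pick the reply a receiving server sends once the message body is in."""
    if spam_score >= threshold:
        # Permanent failure: the sending MTA still holds the message and is
        # responsible for telling its user that delivery failed.
        return 550, "Message rejected as suspected spam"
    return 250, "OK"

def sender_gets_notified(reply_code):
    """True if the reply is a 5xx permanent failure the sender will see."""
    return 500 <= reply_code <= 599

# A server that replies 250 and then deletes the message gives the sender
# no signal at all, which is the silent-loss case described above.
```

A false positive under this scheme costs the sender a bounce rather than a lost message, so a human can notice and retry through another channel.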

Seth Goodman
