[Spambayes] Hi - just got on the list....

Greg Ward gward@python.net
Mon, 16 Sep 2002 11:40:36 -0400


On 14 September 2002, John Draper said:
> Allow me to introduce myself...

You forgot to introduce yourself!  Luckily Google helped out.  I'm not
sure if it's depressing or heartening that even the old-guard phone
phreaks are overwhelmed by spam.

> My partner (who is a very good programmer) wrote an MTA in Python (in
> just a half hour), then I modified it to handle multiple threads,
> and wish to use this as a base of a development of a "proof of
> concept" to see how well this might work.

I have been having a load of fun with Python embedded in Exim, my
MTA-of-choice.  I imagine I would have even more fun if the whole MTA
were written in Python.  But I have to admit I'm skeptical that a
half-hour job can approach the stability, robustness, performance, and
flexibility of something like postfix, Exim, or qmail.

> Right now, I'm just trying to determine what's out there, so want to
> participate in this list to see what the current anti-spam technology
> level is at, and what you guys are doing.

If I were a spammer, I'd be worried.  For the first couple years, the
main force behind anti-spam technology was anger.  That hasn't worked --
look at how many open relays there still are.  Look at how many ISPs
wink and nod at spammers, as long as they pay the bills.  Now the tide
is shifting towards intelligence.  Spammers are mean, they're spiteful,
and above all they're persistent -- but intelligent they are not.

> As a result of my efforts to eliminate spam, I have 33 megs of spam
> I've collected (About 6000 spam messages) over the past year.  I
> didn't save ALL of it, as I've eliminated redundancy, and can make
> them available if anyone wants it.

That *could* be a useful contribution.  Probably the best thing you can
do with it is put it on a web site somewhere and let us know the URL.
However, the fact that all of this spam is from your inbox -- as opposed
to a front-line MTA somewhere -- means that it will be indelibly tagged
with the path from various MTAs to your inbox.  Lots of it will have
"To" headers pointing at you (OK, some spammers have figured out that
one), and all of it will have "Received" headers outlining the path from
your ISP/employer/web host/email host/whatever to your inbox.  All of
those can provide clues for a content-scanner to latch onto, which is
precisely what we don't want: e.g., a large body of known spam, 50% of
which has "To: crunch@shopip.com", means that "To: crunch@shopip.com" is
an excellent spam clue *in the context of that corpus*.  Not exactly
what you want.
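
To make the corpus-bias point concrete, here is a minimal sketch in Python using only the standard library `email` package. It is not Spambayes code; the function names `header_clue_rates` and `strip_delivery_headers`, and the choice of which headers count as delivery-specific, are assumptions made for illustration. The first function measures how often a single header value recurs across a corpus, and the second drops recipient-specific headers before the messages are handed to a content scanner.

```python
from collections import Counter
from email import message_from_string


def header_clue_rates(raw_messages, header="To"):
    """Return {header value: fraction of messages carrying that value}.

    A value near 1.0 (or 0.5, as in the example above) is a clue that
    describes the collector's mailbox rather than spam in general.
    """
    counts = Counter()
    for raw in raw_messages:
        value = message_from_string(raw).get(header)
        if value is not None:
            counts[value.strip().lower()] += 1
    total = len(raw_messages)
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def strip_delivery_headers(raw, headers=("To", "Received", "Delivered-To")):
    """Remove headers that mostly record the path to one inbox.

    The default header list is an assumption; adjust it for the corpus.
    """
    msg = message_from_string(raw)
    for name in headers:
        del msg[name]  # removes every occurrence; no error if absent
    return msg.as_string()
```

Whether stripping such headers is the right trade-off depends on the corpus: it removes the mailbox-specific clues, but it also throws away header information that might be genuinely useful on a front-line MTA.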

Anyways, you could do much worse than put your spam collection on the
web somewhere.

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
God made machine language; all the rest is the work of man.