> We have to be careful with this.  It would be relatively simple to
> stymie, by simply adding two urls, the spam one, and an unrelated
> innocent site.  Or three urls, or whatever...

Spammers are simple folk. They won't be putting any innocent URLs in 
these spams...

> We definitely should NOT crawl the site, just in case it really is an
> innocent url.  The load can crush a site, particularly if it's hosted.

Nah. You need to throw thousands of requests at a half-decent web 
server before it gives up the ghost. And if they're sending out 10 
million mail pieces, they should expect their HTTP server to take 
some load. These are definitely NOT innocent emails. They come from 
bogus senders, have minimal headers (deliberately), and contain 
*nothing* but a URL. Which points, via redirect naturally, to an 
incest porn or get-a-huge-penis site, etc.
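
For what it's worth, the "nothing but a URL" clue is easy to test for. A rough sketch in Python, not anything that exists in SpamBayes today: URL_RE and is_url_only are names I made up for illustration, and the regex only covers plain http/https links.

```python
import re

# Hypothetical pattern: matches plain http/https links only.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def is_url_only(body):
    """Return True if the body holds at least one URL and no other text."""
    urls = URL_RE.findall(body)
    # Whatever remains after deleting the URLs is the "real" text.
    leftover = URL_RE.sub("", body).strip()
    return bool(urls) and not leftover
```

A body that passes this check is a candidate for fetching; anything with real text in it can go through the classifier as usual.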

> Spambayes is superb at recognizing spam based solely upon the payload
> received.  If these mails are slipping through, then we need to
> examine the clues and see why.

I couldn't agree more! Here's one which got a resounding "unsure" 
(p=0.5130) from my classifier the first time through. After slurping 
the page behind its URL, it shot up to p=0.9893, exactly where it 
belongs!

Return-Path: <tkeamou at kerchunk.com>
Received: from kerchunk.com ([]) by 
www1.kc.aoindustries.com (8.11.6/8.11.6) with SMTP id h2V3DST27976 
for <richard at jowsey.com>;
Date: 29 Mar 2003 04:44:15 -0400 
From: Ella Bunton <tkeamou at kerchunk.com> 
To: richard at jowsey.com 
Subject: inside Daughter 
Message-ID: <20030330002313.YhnvhVSzPGVA at kerchunk.com> 
Content-type: text/plain; charset="us-ascii" 
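
To make the "slurping" idea concrete, here is a minimal sketch of what I mean, under some loud assumptions: one capped GET per URL (no crawling), and the page text handed back as extra tokens. None of these names (slurp_tokens, fetch_page, the "url-text:" prefix, MAX_BYTES) are SpamBayes APIs; they are hypothetical, and the real tokenizer hookup would look different.

```python
import re
import urllib.request
from html.parser import HTMLParser

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
MAX_BYTES = 20000  # assumed cap: read at most this much of each page


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page(url, timeout=5):
    # One request per URL; urlopen follows redirects by default.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(MAX_BYTES).decode("latin-1")


def slurp_tokens(body, fetch=fetch_page):
    """Yield prefixed tokens from the pages behind each URL in body."""
    for url in URL_RE.findall(body):
        try:
            page = fetch(url)
        except (OSError, ValueError):
            continue  # dead or malformed link: no extra clues
        for word in html_to_text(page).lower().split():
            yield "url-text:" + word
```

The fetch argument is there so the function can be tried without touching the network, e.g. list(slurp_tokens(body, fetch=lambda url: "<b>test</b>")). The prefix keeps words from the fetched page separate from words in the mail itself, so the classifier can score them independently.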



