[Spambayes] Is this a sign of future problems ?

Tim Peters tim.one at comcast.net
Tue Dec 16 14:50:50 EST 2003

[Chuck Lewis]
> Got this from a friend that runs another mailing list:
> =========================================================================
> Interesting trend ... Garbage spam
> I've noticed an interesting trend recently ... a lot of the 'spam' I'm
> receiving lately is totally garbage.  No content whatsoever ... not
> even hidden in HTML.
> Here's an example ...
>> Subject: Re: AMUKZGO, was beyond even
>> From: "Oliver" <tdjrryefyyy at canada.com>
>> Date: Wed, 17 Dec 2003 09:44:46 +0600
>> To: midrange-jobs at midrange.com, midrange-l-admin at midrange.com,
>> midrange-l-owner at midrange.com, midrange-l-request at midrange.com,
>> midrange-l-sub at midrange.com, midrange-l-unsub at midrange.com
>> papaw bedimmed prophetic cocky
>> farfetched conceive auction ergodic robbin lullaby omaha
>> manslaughter pea celanese florentine assure depressible bowl cannel
>> ewe gertrude
> The only reason I can think of is that the spammers are trying to
> poison the Bayesian statistics that are being gathered, so more of the
> legitimate spam will be let through.
> David
> =========================================================================
> So is he on to something here ? This sounds plausible from my
> admittedly limited understanding of these tools.
> Thought/Comments ?

The point of inserting random gibberish is to frustrate fingerprinting
schemes (if no two spam are the same, comparing new spam simple-mindedly to
a database of known spam won't catch anything new).  Spam has always done
this.  The modern variation is inserting random dictionary words instead of
completely random strings, because some fingerprinting schemes have grown
smart enough to ignore non-dictionary strings, or even to penalize their
presence.

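Here is a minimal Python sketch of why the padding works against fingerprinting. Both fingerprint functions and the tiny DICTIONARY word list are hypothetical illustrations, not any real filter's algorithm. The naive scheme hashes the whole normalized body, so any random salt changes the hash. The "smarter" scheme hashes only dictionary words, which undoes random-string salt but not dictionary-word salt.

```python
import hashlib
import re

# Hypothetical tiny word list; a real scheme would use a full dictionary.
DICTIONARY = {"buy", "cheap", "pills", "now", "papaw", "bedimmed",
              "lullaby", "omaha", "ewe"}

def naive_fingerprint(body: str) -> str:
    """Hash the whole normalized body: any change to the text changes the hash."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def dictionary_fingerprint(body: str) -> str:
    """Hash only dictionary words, so random non-word strings are ignored."""
    words = re.findall(r"[a-z]+", body.lower())
    kept = " ".join(w for w in words if w in DICTIONARY)
    return hashlib.sha1(kept.encode("utf-8")).hexdigest()

payload = "buy cheap pills now"
spam_a = payload + " xqzvk trwplm"        # random-string salt
spam_b = payload + " bnmqz plokij"
spam_c = payload + " papaw bedimmed"      # dictionary-word salt
spam_d = payload + " lullaby omaha ewe"

print(naive_fingerprint(spam_a) == naive_fingerprint(spam_b))            # False
print(dictionary_fingerprint(spam_a) == dictionary_fingerprint(spam_b))  # True
print(dictionary_fingerprint(spam_c) == dictionary_fingerprint(spam_d))  # False
```

Random strings defeat the naive hash, the word-list hash sees through them, and dictionary-word salt defeats the word-list hash again.
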
Neither variation is much use against Bayesian filters (e.g., is bedimmed a
strong ham word for you?  heh).  More effective against those is to include
the text of a randomly chosen contemporary news story (then it stands a
decent chance of sneaking past the filters of people whose normal
correspondence covers the same topics as the news story).

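To see why dictionary-word padding does little against a Bayesian filter, here is a simplified Graham-style scorer in Python. It is an assumption-laden sketch, not SpamBayes's actual chi-squared combining. The training counts, the 0.5 probability for never-seen words, the 0.1 minimum clue strength, and the 15-clue cutoff are all illustrative choices. Words the filter has never seen score 0.5 and carry no information, so random padding doesn't move the score, while words the recipient uses in ordinary mail pull it down.

```python
from math import prod

# Hypothetical training data: how many spam/ham messages contained each word.
SPAM_COUNTS = {"viagra": 40, "pills": 35, "cheap": 30}
HAM_COUNTS = {"meeting": 50, "python": 45, "cheap": 5}
NSPAM, NHAM = 50, 50

UNKNOWN_PROB = 0.5   # assumed: never-seen words are neutral
MIN_STRENGTH = 0.1   # assumed: ignore clues within 0.1 of neutral
MAX_CLUES = 15       # assumed: keep only the 15 strongest clues

def word_prob(word: str) -> float:
    """Estimate P(spam | word) from the training counts (simplified)."""
    s = SPAM_COUNTS.get(word, 0) / NSPAM
    h = HAM_COUNTS.get(word, 0) / NHAM
    if s + h == 0:
        return UNKNOWN_PROB
    return min(0.99, max(0.01, s / (s + h)))  # clamp: no single word is decisive

def score(words: list) -> float:
    """Combine the strongest clues naive-Bayes style; 0.0 = ham, 1.0 = spam."""
    probs = [word_prob(w) for w in set(words)]
    clues = [p for p in probs if abs(p - 0.5) >= MIN_STRENGTH]
    clues = sorted(clues, key=lambda p: abs(p - 0.5), reverse=True)[:MAX_CLUES]
    if not clues:
        return 0.5
    spamminess = prod(clues)
    hamminess = prod(1.0 - p for p in clues)
    return spamminess / (spamminess + hamminess)

spam_msg = ["cheap", "viagra", "pills"]
padded_msg = spam_msg + ["papaw", "bedimmed", "prophetic", "cocky", "lullaby", "omaha"]
news_msg = spam_msg + ["meeting", "python"]

print(score(spam_msg) == score(padded_msg))  # True: padding words are never clues
print(score(news_msg) < score(spam_msg))     # True: ham-ish words pull the score down
```

The padded message scores exactly like the bare one because none of the padding words survive clue selection.  The third score drops noticeably because "meeting" and "python" (hypothetical stand-ins for the recipient's usual topics) are strong ham clues, which is the effect the news-story trick relies on.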
