[Spambayes] spam designed to defeat Bayesian filters

Ryan Malayter rmalayter at bai.org
Wed Nov 19 14:46:27 EST 2003


> From: Seth Goodman

> 1) What is this thing?  Does it harvest addresses when rendered?

Exactly. It's called a web bug. Opening this email in a program that
automatically downloads pictures would tell them that your address is
valid, and you viewed their spam.
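To show what a mail client that auto-loads pictures would actually fetch, here is a minimal sketch using only Python's standard `html.parser`. It lists `<img>` sources that point at a remote server. The names `RemoteImageFinder` and `remote_images` are hypothetical, and the example URL in the usage note is invented. It is an assumption that such a URL carries a per-recipient tag (e.g. a query-string id) that lets the sender tie the fetch to your address.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that a mail client would fetch over the network."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Only http(s) sources trigger a network request; cid: parts ship inside the message.
        if src.lower().startswith(("http://", "https://")):
            self.urls.append(src)


def remote_images(html):
    """Hypothetical helper: return the remote image URLs found in an HTML body."""
    finder = RemoteImageFinder()
    finder.feed(html)
    finder.close()
    return finder.urls
```

For example, `remote_images('<img src="http://example.com/t.gif?id=abc123" width="1" height="1">')` returns the one tracking URL, while an embedded `cid:` image is ignored.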
 
> 2) Are there any approaches that have been discussed to 
> ignore the "almost white" text during parsing?

Yes, and most have been rejected, because they have not shown an
increase in capture rate. For the most part, the URLs and the headers
used often condemn this type of spam despite the random words. However,
that doesn't seem to be the case with this one. If you can propose a
method, get someone to code it into the tokenizer for testing (or do
it yourself), and let us know.
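As a starting point for anyone who wants to try it, here is a rough sketch of how a pre-tokenizer pass might drop "almost white" text. It is not SpamBayes code; the names `visible_text`, `hides_text` and the threshold `NEAR_WHITE_MIN` (every RGB channel at or above 0xF0) are hypothetical. It only handles `#RRGGBB` colors in a `<font color>` attribute or a CSS `color:` declaration. Because it assumes well-nested tags, unclosed or mismatched tags in messy spam HTML can make it hide or show the wrong text.

```python
import re
from html.parser import HTMLParser

# Hypothetical threshold: a color counts as "almost white" if every channel is >= 0xF0.
NEAR_WHITE_MIN = 0xF0

# Tags with no closing tag; they must not be pushed on the nesting stack.
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link", "area", "base", "col", "wbr"}

FONT_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")
# Matches "color:" only as its own declaration, so "background-color:" is not caught.
STYLE_COLOR = re.compile(r"(?:^|;)\s*color\s*:\s*#([0-9a-fA-F]{6})", re.IGNORECASE)


def _near_white_hex(hexstr):
    channels = [int(hexstr[i:i + 2], 16) for i in (0, 2, 4)]
    return all(c >= NEAR_WHITE_MIN for c in channels)


def hides_text(attrs):
    """True if a tag's attributes set a near-white foreground color."""
    m = FONT_COLOR.fullmatch((attrs.get("color") or "").strip())
    if m and _near_white_hex(m.group(1)):
        return True
    m = STYLE_COLOR.search(attrs.get("style") or "")
    return bool(m and _near_white_hex(m.group(1)))


class VisibleTextParser(HTMLParser):
    """Collect text, skipping anything nested inside a near-white element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open non-void tag: does it hide its contents?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(hides_text(dict(attrs)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not any(self.stack):
            self.chunks.append(data)


def visible_text(html):
    """Return the text a reader would plausibly see, whitespace-normalized."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

The tokenizer would then see only `visible_text(body)`, so the hidden "innocent" words never reach the classifier. Whether that actually improves the capture rate is exactly what would need testing.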

The really strange thing is that most of the random words had a very
high ham probability with my corpus as well as yours. The message scored
as a 0% on my SpamBayes installation as well. It looks like someone is
analyzing a bunch of mail with a Bayesian filter and coming up with
their own list of "universal" innocent words.
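To see why a pile of hammy words can pull a message down to 0%, here is a simplified sketch of chi-squared (Fisher) combining, the scheme SpamBayes uses by default to merge per-token probabilities. This is not the classifier's actual code: it omits clue selection (only the strongest clues are used) and other details. The function names are hypothetical, and every token probability is assumed to lie strictly between 0 and 1.

```python
import math


def chi2q(x2, dof):
    """P(chi-squared with `dof` degrees of freedom >= x2), for even `dof`."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combined_score(probs):
    """Combine per-token spam probabilities into one score (0 = ham, 1 = spam)."""
    n = len(probs)
    if n == 0:
        return 0.5
    ln_spam = sum(math.log(1.0 - p) for p in probs)  # very negative if many tokens look spammy
    ln_ham = sum(math.log(p) for p in probs)         # very negative if many tokens look hammy
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # strength of spam evidence
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # strength of ham evidence
    return (s - h + 1.0) / 2.0
```

With two strong spam clues alone, `combined_score([0.99, 0.99])` is close to 1. Padding the same message with twenty words the filter has only seen in ham, `combined_score([0.99, 0.99] + [0.01] * 20)`, drives it close to 0. The random words are there for exactly that effect.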

Since this message is malformed (the bad headers prevent the HTML from
showing up correctly), and it doesn't really contain any sales pitch,
this is probably a test a spammer is using to see how many of this style
of spam he/she can get through. Of course, once sales-pitch words are
added to it, it will score much higher.

Regards,
	Ryan
