[Spambayes] Randomized Spam Beating SpamBayes
Amedee Van Gasse
amedee at amedee.be
Wed Oct 18 00:24:44 CEST 2006
On Tuesday 17-10-2006 at 16:12 [timezone -0600], Quinn wrote:
> > Sounds like something a "dissociated press" or other random text
> > generator created. Perhaps you know about the monkeys with a
> > typewriter? If you let a thousand monkeys press random keys on
> > a typewriter, eventually one of them will by accident write a
> > few lines from a Shakespeare sonnet. These random text
> > generators work in a similar way.
> Interesting. I hadn't realized that was being used to actually do anything;
> that's kind of cool. Not sure if these are coming from that sort of thing,
> though. There are references to specific websites and publications
> scattered around self-referentially. I really think they're somehow farming
> real source and taking strings of variable length and just stringing them
> together. It's a pretty good way to produce coherent-ish body text that
> doesn't read as gibberish from an electronic standpoint.
> So, does this sort of thing defeat SpamBayes? They're making it through the
> filter with great regularity, and have been for quite a while, so the
> algorithms haven't figured it out in several hundred messages. Is there
> _any_ way to deal with it, in SB or any other filter other than sender
> black- or white lists?
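
For illustration, the stitching approach guessed at above could look
roughly like the sketch below. This is only a guess at the technique, not
anything recovered from the actual spam; the function name, its
parameters and the sample texts are all made up for the example.

```python
import random


def stitch_fragments(sources, n_fragments=5, min_words=3, max_words=8, seed=None):
    """Join randomly chosen word runs of variable length taken from real texts.

    Hypothetical helper: every name and default here is an assumption.
    """
    rng = random.Random(seed)
    pieces = []
    for _ in range(n_fragments):
        # Pick one source text and cut a run of consecutive words out of it.
        words = rng.choice(sources).split()
        length = min(rng.randint(min_words, max_words), len(words))
        start = rng.randint(0, len(words) - length)
        pieces.append(" ".join(words[start:start + length]))
    return " ".join(pieces)


# Made-up placeholder texts standing in for "farmed" real source material.
sources = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "quarterly earnings rose as the publisher announced a new website for subscribers",
]
print(stitch_fragments(sources, seed=1))
```

Because every fragment is a run of real words in their original order,
each token on its own looks like ordinary prose to a word-based filter,
even though the message as a whole says nothing.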
I suppose there must be some way, because I don't get them.
Your message with the example scored as unsure:
X-Spambayes-Classification: unsure; 0.79
If it didn't include the typical spambayes mailing list headers, I'm
sure it would have gotten an even higher spam score.
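
To show why 0.79 comes out as "unsure" rather than "spam", here is a
minimal sketch of a three-way split on the score. It assumes the cutoffs
I believe are the SpamBayes defaults (ham below 0.20, spam above 0.90);
your configuration may differ, and the function is illustrative, not the
actual SpamBayes code.

```python
# Assumed default cutoffs; check your own SpamBayes configuration.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    # Everything between the two cutoffs is left for a human to judge.
    return "unsure"


print(classify(0.79))
```

With these cutoffs, a message only has to stay at or below 0.90 to avoid
the spam folder, so a score of 0.79 is close but still lands in unsure.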