[Spambayes] Spam clues for 0% spam

Tim Peters tim.peters at gmail.com
Fri Dec 10 23:08:10 CET 2004


[Kenny Pitt]
> Well, I guess it was bound to happen eventually. A spammer has
> finally managed to include a significant enough number of words
> that are hammy in my SpamBayes database to produce a 0%
> spam score.

Congratulations to "Winter Savings 4 Men (G)"!  That's quite an accomplishment.

> A snippet from a Google FAQ is an interesting choice of "random"
> content. I hope this isn't a harbinger of things to come <0.5 wink>.

Well, it *wouldn't* have been, except you just announced how effective
it was <wink>.

The layers of transformations that occurred between your receiving this
and my seeing the forward make it hard to tell, so maybe you can tell
more easily on your end (provided you saved the email, which I hope
you did):  it *looks* to me like the ripped-off Google text was
entirely contained in an ill-formed HTML comment.  By which I mean I
saw an HTML comment start:

    <!--

and then a bunch of ripped-off Google text, and then the msg ended
abruptly.  There was no HTML comment end (nor any other proper HTML
ending invocations).

The tokenizer intends to throw away text in HTML comments, so this may
be significant.  If it really was an unterminated HTML comment, the
tokenizer would not have thrown away any of it.  If that's the case,
then someone should investigate making the tokenizer more robust
against unterminated HTML comments.
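To show why an unterminated comment slips through, here is a minimal sketch. It assumes comment stripping is done with a regular expression that requires a closing `-->`. The patterns and function names below (`CLOSED_COMMENT`, `ANY_COMMENT`, `strip_closed_only`, `strip_robust`) are hypothetical illustrations, not the actual SpamBayes tokenizer code. The "robust" variant treats a `<!--` with no matching `-->` as running to the end of the text:

```python
import re

# Hypothetical patterns for illustration only; not SpamBayes' tokenizer code.
# Matches "<!--" up to the nearest "-->"; needs a terminator to match at all.
CLOSED_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Also matches an unterminated "<!--" by letting it extend to end of text (\Z).
ANY_COMMENT = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)


def strip_closed_only(text):
    """Replace only properly terminated HTML comments with a space."""
    return CLOSED_COMMENT.sub(" ", text)


def strip_robust(text):
    """Replace terminated comments, and an unterminated one running to EOF."""
    return ANY_COMMENT.sub(" ", text)


msg = "Buy now!\n<!--\nGoogle FAQ decoy text"
print(repr(strip_closed_only(msg)))  # decoy text survives: no "-->" to match
print(repr(strip_robust(msg)))       # decoy text is dropped
```

The lazy `.*?` stops at the first `-->` when one exists, so well-formed comments are handled exactly as before; only the missing-terminator case changes behavior.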

My first guess is not that a clever spammer did this on purpose, but
that the spammer software screwed up, perhaps confused by the
unusually large amount of "decoy text" in the comment.  So I'm not
particularly worried about this unless it shows up a lot more.