[Spambayes] Messages not moving / Sneaky HTML spam

Gray Norton gray at stanfordalumni.org
Tue Oct 28 12:44:52 EST 2003


> -----Original Message-----
> From: Tony Meyer [mailto:tameyer at ihug.co.nz]
> Sent: Monday, October 27, 2003 4:40 PM
> To: 'Gray Norton'; spambayes at python.org
> Subject: RE: [Spambayes] Messages not moving / Sneaky HTML spam
> 
> > or are entirely new tokens not considered in calculating the score?
> 
> They're not (by default).  If a token hasn't been seen before, then it
> will score 0.5.  Any tokens that score between 0.4 and 0.6 aren't
> included in calculating the message's score, so any new tokens won't
> be used.  So it's only if the spam contains tokens that you've already
> trained as good that this technique will have any effect.
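
To make the rule in the quoted reply concrete, here is a minimal Python
sketch of the filtering step. It is not the Spambayes source: the function
and constant names are made up for illustration, and only the 0.5 score for
unseen tokens and the 0.4 to 0.6 exclusion band come from the reply above.

```python
# Hypothetical names; only the numeric values come from the quoted reply.
UNKNOWN_TOKEN_SCORE = 0.5   # score given to a token never seen in training
MIN_DISTANCE = 0.1          # scores closer than this to 0.5 are ignored


def significant_clues(token_scores):
    """Return only the tokens that would count toward the message score.

    token_scores maps each token to its spam probability (0.0 to 1.0).
    Tokens scoring inside the 0.4 to 0.6 band are dropped, which is why
    brand-new tokens (score 0.5) have no effect.
    """
    return {
        token: score
        for token, score in token_scores.items()
        if abs(score - UNKNOWN_TOKEN_SCORE) >= MIN_DISTANCE
    }


clues = significant_clues(
    {"never-seen": 0.5, "weak": 0.45, "viagra": 0.99, "python": 0.01}
)
print(sorted(clues))  # the neutral and weak tokens are gone
```

Under these assumptions, padding a spam with random never-seen words
changes nothing; only words already trained as ham can pull the score down.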

OK, that's very helpful to know. Unfortunately, by the time I got your
response I had trained against the new message, making it impossible (as
far as I know) to view the clues as they had appeared when the message
arrived. However, I got two new copies of the mail today, with different
white-on-white text. What I discovered was very interesting...

A look at the clues for the new messages revealed that there were a
bunch of tokens with a #ham value of 1 and a #spam value of 0; it is
apparently by virtue of these tokens that the messages are slipping
through. The odd thing is that I am 99.99% certain that these tokens
have never appeared in a piece of ham (I did a full-text search for a
few of the tokens on my previously trained ham and confirmed that they
were not there...besides, many of them are extremely obscure words or
typos).
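
For reference, here is a rough sketch of why a token with #ham 1 and
#spam 0 is enough to count as a ham clue. It assumes a Robinson-style
adjusted probability, which is my understanding of how Spambayes scores
tokens; the function name, the parameter names, and the 0.45 strength
value are assumptions, not taken from the Spambayes source.

```python
def token_score(ham_count, spam_count, n_ham, n_spam,
                strength=0.45, unknown_prob=0.5):
    """Estimate a token's spam probability (illustrative sketch only).

    ham_count / spam_count: trained messages of each kind containing the token.
    n_ham / n_spam: total trained ham and spam messages.
    strength, unknown_prob: assumed prior weight and prior score.
    """
    n = ham_count + spam_count
    if n == 0:
        return unknown_prob  # never-seen token: neutral score

    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)

    # Blend the raw estimate with the prior, weighted by how often
    # the token has actually been seen.
    return (strength * unknown_prob + n * raw) / (strength + n)


# One ham sighting, no spam sightings (training-set sizes are made up).
score = token_score(ham_count=1, spam_count=0, n_ham=100, n_spam=100)
print(round(score, 3))
```

With these assumed values the score comes out at roughly 0.155, well
below 0.4, so a single ham sighting is enough for the token to be used
as a ham clue.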

It is AS IF this message somehow fooled Spambayes into training on it as
ham prior to scoring it. I know this makes no sense, but that's really
how it looks. Does anyone have a clue how to explain this?

Thanks,

Gray




