[Spambayes] Spam clues for 0% spam

Kenny Pitt kennypitt at hotmail.com
Mon Dec 13 17:02:16 CET 2004


Tim Peters wrote:
> The layers of transformations that occurred between you receiving
> this and my seeing the forward make it hard to tell, so maybe you can
> tell more easily on your end (provided you saved the email, which I
> hope you did):

Well, I intended to.  Unfortunately, when I trained it as spam it of course
went to my "Junk E-mail" folder and a thoughtless Empty "Junk E-mail" Folder
command then sent it the way of the dinosaurs.

> it *looks* to me like the ripped-off Google text was
> entirely contained in an ill-formed HTML comment.  By which I mean I
> saw an HTML comment start:     
> 
>     <!--
> 
> and then a bunch of ripped-off Google text, and then the msg ended
> abruptly.  There was no HTML comment end (nor any other proper HTML
> ending invocations).

I did look at the message enough to know that that was the case.

> The tokenizer intends to throw away text in HTML comments, so this
> may be significant.  If it really was an unterminated HTML comment,
> the tokenizer would not have thrown away any of it.  If that's the
> case, then someone should investigate making the tokenizer more
> robust against unterminated HTML comments.

I had forgotten that we attempted to throw away the comments at all.  In
most cases, browsers seem to treat the end of the data as an implied end to
any open tags.  Interestingly, though, when Outlook displayed the original
message it showed the malformed comment text as if it were normal content.

Since I doubt that a botched comment like this would ever appear in a ham
message, though, I'll look into discarding unterminated comments as well as
properly terminated ones.
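Here is a minimal sketch of the idea in Python, the language Spambayes is written in. It assumes comments are stripped with a regular expression. The pattern treats the end of the text as an implied end to an open comment, the same way browsers treat the end of the data as closing any open tags. The names `terminated_only_re`, `html_comment_re` and `strip_html_comments` are hypothetical and are not taken from the actual Spambayes tokenizer.

```python
import re

# Hypothetical pattern: only strips comments that have a closing "-->".
# An unterminated "<!--" is left in place, so its text still gets tokenized.
terminated_only_re = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical pattern: "<!--" followed by the nearest "-->" OR the end of
# the string (\Z), so an unterminated comment is discarded as well.
html_comment_re = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)

def strip_html_comments(text):
    """Remove HTML comments, including one left open at the end of text."""
    return html_comment_re.sub("", text)

msg = "Buy now <p><!-- ripped-off search-engine text"
print(terminated_only_re.sub("", msg))  # prints: Buy now <p><!-- ripped-off search-engine text
print(strip_html_comments(msg))         # prints: Buy now <p>
```

The non-greedy `.*?` stops at the first `-->` it finds, so several properly closed comments in one message are removed one at a time rather than merged into a single match. The trade-off is that a ham message with a stray `<!--` would lose everything after it, which seems an acceptable risk given how rarely that should happen.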

-- 
Kenny Pitt


