[Spambayes] Eliminating many malformed spams
T.A.Meyer at massey.ac.nz
Fri Jun 20 12:09:56 EDT 2003
> The specific problem that concerned
> me most was when a message's headers would contain an HTML
> comment, as this appears to be the most frequent
> malformation, leading to the following typical error trace:
> email.Errors.HeaderParseError: Not a header, not a
> continuation: ``<!--/ad/2/CD7-->''
I'm fairly sure that the malformation here is *not* due to the comment.
"Not a header, not a continuation" usually means that the empty line
between the headers and the body is missing (so the parser thinks that
the first line of the body is a header line, and tries to treat it as one).
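
To illustrate that diagnosis, here is a minimal sketch of a check that walks the header block and reports the first line that is neither a header nor a continuation. It is not the email package's parser: the function name `first_bad_header_line` and the simplified header pattern are assumptions made for this example, and an mbox-style "From " envelope line is not handled.

```python
import re

# Simplified header syntax (an assumption for this sketch): a field name
# made of printable ASCII characters other than ':' and space, then ':'.
HEADER_RE = re.compile(r'^[!-9;-~]+:')

def first_bad_header_line(raw):
    """Return the first line of the header block that is neither a
    header nor a continuation, or None if a blank line ends the block."""
    seen_header = False
    for line in raw.splitlines():
        if line == '':
            return None  # blank separator found: headers ended cleanly
        if line[0] in ' \t' and seen_header:
            continue  # folded continuation of the previous header
        if HEADER_RE.match(line):
            seen_header = True
            continue
        return line  # body text has run into the header block
    return None
```

Run on a message where the body follows the headers with no blank line in between, it returns the comment line itself; with the blank line restored it returns None, even though the comment is unchanged:

```python
broken = "Subject: hi\nFrom: a@example.com\n<!--/ad/2/CD7-->\n<html>\n"
fixed = "Subject: hi\nFrom: a@example.com\n\n<!--/ad/2/CD7-->\n<html>\n"
print(first_bad_header_line(broken))
print(first_bad_header_line(fixed))
```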
> For now, though, I've found that it's at
> least helpful to be able to ignore such HTML comments.
Spambayes should be stripping everything in comments when it tokenizes
anyway (so that spammers can't put lots of 'good' words in comments).
IMO, it would be wrong for the parser itself to ignore comments -
sometimes there are valid reasons for them to be there, especially when
it is valid markup (as in the above case). As I said above, I'm pretty
sure that it's not the comment, but the position of the comment, that is
causing the problem anyway.
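
For what stripping comments at tokenization time (rather than in the parser) might look like, here is a minimal sketch. It is not the actual Spambayes tokenizer code; the name `strip_html_comments` and the choice to replace each comment with nothing are assumptions for illustration.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL lets a
# comment span several lines.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)

def strip_html_comments(text):
    """Remove HTML comments from body text before it is tokenized."""
    return COMMENT_RE.sub('', text)
```

The parser still sees the message exactly as it was sent; only the text handed to the tokenizer loses the comments, so 'good' words hidden inside them never become tokens. Replacing with a space instead of the empty string would be the other reasonable choice, at the cost of not rejoining words that a comment was used to split.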
(But as Barry said, hopefully the new parser will help with these things.)