[Spambayes] Suggestion for HTML analysis
Matthew Dixon Cowles
matt at mondoinfo.com
Sun Sep 14 18:02:27 EDT 2003
> I'm new to the list.
Hello and welcome.
> Recently I've gotten HTML-formatted spam that attempts
> to circumvent recognition by inserting copious amounts of HTML
> garbage tags between letters.
> I think Spambayes is fooled by this technique, because I don't see
> any of the operative words in the analysis.
Tim Peters added a fix for that trick in May. From the CVS checkin comment:
I dug into a small collection of Unsures that looked like blatant
spam, and discovered they were all using this kind of trick:
Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion
That is, disguising words by inserting HTML nonsense tags. We
replaced each tag with a blank, yielding the pretty useless
tokens "Wr", "inkle", "Reduc" and "tion". We previously fixed a
similar problem using embedded HTML comments. I should have
fixed this other one then.
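
To make the effect concrete, here is a rough sketch of the difference between replacing a nonsense tag with a blank and removing it outright. It is not the actual Spambayes tokenizer code: the TAG_RE pattern and both function names are made up for illustration, and the real checkin may handle tags differently.

```python
import re

# Hypothetical pattern (not taken from Spambayes): treat anything between
# '<' and the next '>' as a tag.
TAG_RE = re.compile(r"<[^>]*>")

def tokens_with_blank(text):
    """Replace each tag with a blank, then split on whitespace."""
    return TAG_RE.sub(" ", text).split()

def tokens_without_blank(text):
    """Remove each tag outright, then split on whitespace."""
    return TAG_RE.sub("", text).split()

sample = "Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion"

print(tokens_with_blank(sample))     # ['Wr', 'inkle', 'Reduc', 'tion']
print(tokens_without_blank(sample))  # ['Wrinkle', 'Reduction']
```

With the blank, the classifier only ever sees the fragments; without it, the operative words come back as whole tokens.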