[Spambayes] Proposing to drop ignore_redundant_html
Tim Peters
tim.one@comcast.net
Sat Oct 26 03:29:34 2002
Proposing to drop the option
ignore_redundant_html
This has been False by default for a long time, and there are no known
users of it. I used it early in the project, before we stripped HTML tags,
because (at the time) there was otherwise no way to get any
multipart/alternative msg with a text/html part to score as ham in the
c.l.py tests.
Since then,
A. We strip HTML tags by default (and character entities --
that's a change I made recently and probably didn't announce here,
although I mentioned it often enough <wink>).
B. We know that sometimes multipart/alternative msgs have different
content in the text/plain and text/html parts, and in particular
that some spam can be identified only by staring at the HTML part.
C. We no longer count multiple instances of a word in a msg multiple
times during training. So if text/html and text/plain parts are
in fact redundant, training isn't affected by seeing the content
twice. It used to be.
IOW, ignore_redundant_html has nothing going for it anymore.
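
To make point C concrete, here is a minimal sketch of why redundant parts no longer matter for training. It is not the actual Spambayes code: tokenize() and train() are hypothetical stand-ins, and the tokenizer is just a lowercase whitespace split. The only assumption carried over from above is that each distinct word is counted at most once per msg.

```python
from collections import Counter

def tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase, split on whitespace."""
    return text.lower().split()

def train(counts, parts):
    """Add one msg to the word counts, counting each distinct word once.

    `parts` is a list of already-decoded body texts (e.g. the text/plain
    part and the tag-stripped text/html part of a multipart/alternative msg).
    """
    words = set()
    for part in parts:
        words.update(tokenize(part))
    counts.update(words)  # Counter.update adds 1 per distinct word
    return counts

plain = "Buy cheap pills now"
html_stripped = "Buy cheap pills now"  # redundant text/html part, tags removed

once = train(Counter(), [plain])
twice = train(Counter(), [plain, html_stripped])

# Seeing the same content twice leaves the training counts unchanged.
assert once == twice
```

If the text/html part carries extra words that the text/plain part lacks (point B), those words simply join the set and get counted once too, so looking at both parts costs nothing in the redundant case and helps in the other.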