[Spambayes] More HTML strippage.

Tim Peters tim.one@comcast.net
Thu, 26 Sep 2002 21:28:49 -0400


I beefed up the HTML stripping.  This actually redeemed one of my marginal
false positives under the f(w) scheme, leaving it with 2 fp (out of 20,000)
and 18 fn (out of 14,000).  It reduced both the ham and the spam mean scores
a little, increased the ham score variance, and decreased the spam score
variance.
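
For scale, those counts can be turned into rates. This is only arithmetic on the figures quoted above, not output from the test driver, and the variable names are made up for illustration:

```python
# Counts quoted in the message: 2 fp out of 20,000 ham, 18 fn out of 14,000 spam.
false_positives, ham_total = 2, 20_000
false_negatives, spam_total = 18, 14_000

fp_rate = false_positives / ham_total    # fraction of ham scored as spam
fn_rate = false_negatives / spam_total   # fraction of spam scored as ham

print(f"fp rate: {fp_rate:.4%}")   # fp rate: 0.0100%
print(f"fn rate: {fn_rate:.4%}")   # fn rate: 0.1286%
```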

Laugh of the day.  Even an idiot could have identified one of the fn just
by looking at who it was addressed to:

To: <bait@blast.net>
Cc: <bait@accsoft.com.au>,
        <bait@cac.net>,
        <bait@agrc.com>,
        <bait@acenet.com.au>,
        <bait@europe.com>,
        <bait@access.net.au>,
        <bait@fishhunt.com>,
        <bait@eclipse.net>,
        <bait@australis.net.au>,
        <bait@enterprise.net>,
        <bait@ix.netcom.com>,
        <bait@lakemichiganangler.com>,
        <bait@em.ca>,
        <bait@goldrush.net>,
        <bait@addr.com>,
        <bait@lycosmail.com>,
        <bait99@yahoo.com>,
        <bait@charleston.net>,
        <bait@adnet.aust.com>,
        <bait@dragon.acadiau.ca>
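
The giveaway in those headers, the same local part repeated across unrelated domains, is easy to check mechanically.  What follows is a rough sketch only, not anything the Spambayes tokenizer is known to do: `most_common_local_part` is a hypothetical helper, and the sample headers are a shortened copy of the list above.

```python
from collections import Counter
from email.utils import getaddresses


def most_common_local_part(header_values):
    """Return (local_part, count) for the most frequent recipient local part.

    header_values is a list of raw To/Cc header strings.  Hypothetical
    helper for illustration; returns (None, 0) if no addresses are found.
    """
    addresses = [addr for _name, addr in getaddresses(header_values) if "@" in addr]
    # Split on the last '@' so only the part before the domain is counted.
    local_parts = Counter(addr.rsplit("@", 1)[0].lower() for addr in addresses)
    if not local_parts:
        return None, 0
    return local_parts.most_common(1)[0]


# Shortened sample of the To and Cc headers quoted above.
headers = [
    "<bait@blast.net>",
    "<bait@accsoft.com.au>, <bait@cac.net>, <bait99@yahoo.com>",
]
local, count = most_common_local_part(headers)
print(local, count)   # bait 3
```

Whether a count like that is worth generating a token for is an open question; it would have to earn its keep in testing like any other clue.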

I leave you to guess the nature of this spam by quoting a misleading part of
it <wink>:

"""
Q) One of his favourite expressions was, "Fuck him! I hope he
dies!" Who was he?

A) I'll give you another clue: The former part of the expression
proved all too apt when we finally found out, at his death in 1985,
that he was gay. Yup! You got it! It was Rock Hudson.
"""

prob('favourite') = 0.00556242, btw.  The British aren't known for spamming.