[Spambayes] "Lindsey Carter": Re: [Zope-Annce] New
Mon Nov 4 21:26:17 2002
> At least the last 4 are probably unique to this particular spam, so
> you must've trained on it.
Read my reply -- *all* the words here were hapaxes. No exceptions.
> That should explain why it's now considered spam. Unfortunately you've
> also made zope-announce posts look more spammy! :-(
As soon as he trains on just one ham from zope-announce, the spamprob will
fall to 0.5. Scoring that relies on hapaxes (words seen just once in
training) is brittle, despite the instant gratification it supplies; the
correct cure is to train regularly on a random sample of all your email,
whether or not it's been correctly classified. I got a dozen
stronger-than-hapax spam clues out of your email
example (all from the spam part of it), because I keep training even on spam
that scores 1.0 and ham that scores 0.0; this moves spamprobs out of the
brittle hapax range into a reflection of what email *really* looks like.
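
To make the brittleness concrete, here is a minimal sketch of a
Robinson-style adjusted word probability. The function name, the equal
ham/spam training counts (100 each), and the constants s=0.45 and x=0.5 are
assumptions chosen for illustration, not a copy of the classifier's code.
Under those assumptions, a word seen in one spam scores about 0.84, and a
single ham containing it pulls it back to 0.5, while a word seen in a dozen
spams barely moves.

```python
def spamprob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Adjusted spam probability for one word (illustrative sketch).

    spam_count / ham_count: number of trained spams / hams containing the word.
    nspam / nham: total number of trained spams / hams.
    s: strength given to the prior (assumed value).
    x: probability assigned to a never-seen word (assumed value).
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # no evidence at all: fall back to the prior
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Blend the raw estimate with the prior; the prior matters less as n grows.
    return (s * x + n * p) / (s + n)


# A hapax: seen in exactly one spam and no ham.
hapax = spamprob(1, 0, nspam=100, nham=100)            # about 0.84
# The same word after training on a single ham that contains it.
hapax_after_ham = spamprob(1, 1, nspam=100, nham=100)  # 0.5

# A word seen in a dozen spams, before and after that same single ham.
strong = spamprob(12, 0, nspam=100, nham=100)
strong_after_ham = spamprob(12, 1, nspam=100, nham=100)

print(hapax, hapax_after_ham)
print(strong, strong_after_ham)
```

With unequal ham and spam totals the hapax would land near 0.5 rather than
exactly on it, since the ratios are taken against each corpus size.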