[Spambayes] "Lindsey Carter": Re: [Zope-Annce] New zope.org development

Tim Peters tim.one@comcast.net
Mon Nov 4 21:26:17 2002


[Guido]
> ...
> At least the last 4 are probably unique to this particular spam, so
> you must've trained on it.

Read my reply -- *all* the words here were hapaxes.  No exceptions.

> That should explain why it's now considered spam.  Unfortunately you've
> also made zope-announce posts look more spammy! :-(

As soon as he trains on just one ham from zope-announce, the spamprob will
fall to 0.5.  Scoring that relies on hapaxes is brittle, despite the instant
gratification it supplies; the correct cure is to train regularly over a
random sampling of all your email, whether or not it's been correctly
classified.  I got a dozen stronger-than-hapax spam clues out of your email
example (all from the spam part of it), because I keep training even on spam
that scores 1.0 and ham that scores 0.0; this moves spamprobs out of the
brittle hapax range into a reflection of what email *really* looks like.
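
To make the brittleness concrete, here is a rough sketch of a per-word
spamprob with a Robinson-style adjustment toward a prior.  The function name,
the prior strength s=0.45, the prior x=0.5, and the corpus sizes of 100 ham
and 100 spam are all assumptions made for illustration, not a copy of the
classifier's code.  The fall to exactly 0.5 also assumes equal numbers of
trained ham and spam.

```python
def spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Illustrative per-word spam probability (hypothetical helper).

    spamcount/hamcount: number of trained spam/ham messages containing the word.
    nspam/nham: total number of trained spam/ham messages.
    s, x: assumed strength and value of the prior for unseen words.
    """
    n = spamcount + hamcount
    if n == 0:
        return x  # never seen: fall back to the prior
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    p = spamratio / (spamratio + hamratio)
    # Shrink the raw estimate toward the prior; the pull weakens as n grows.
    return (s * x + n * p) / (s + n)


# A hapax seen once, in spam only: looks like a strong clue (about 0.84).
hapax = spamprob(1, 0, nspam=100, nham=100)

# Train on a single ham containing the same word: the clue vanishes (0.5).
hapax_after_one_ham = spamprob(1, 1, nspam=100, nham=100)

# A word seen in a dozen spam: one ham barely dents it (about 0.98 -> 0.91).
solid = spamprob(12, 0, nspam=100, nham=100)
solid_after_one_ham = spamprob(12, 1, nspam=100, nham=100)
```

Under these assumptions a single counterexample wipes out a hapax clue
entirely, while a clue backed by a dozen messages moves only slightly.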



