[Spambayes] RE: Watch out for digests...
Robert K. Coe
bob at 1776.com
Fri Dec 12 13:20:46 EST 2003
What's a "hapax"?
> -----Original Message-----
> From: Skip Montanaro [mailto:skip at pobox.com]
> Sent: Wednesday, December 10, 2003 9:23 PM
> To: Tony Meyer
> Cc: spambayes at python.org; spambayes-dev at python.org
> Subject: RE: [Spambayes] Watch out for digests...
> >> Big mistake. Stuff started getting wacky real fast.... Guess what?
> >> One of the messages in the digest was an obvious spam.
> Tony> This is perhaps a drawback of the minimalist database size
> Tony> training strategy. I'm guessing that if you had a larger
> Tony> database, the effect wouldn't have been as pronounced?
> Maybe. At the moment, I have 9768 tokens in my database and 7731 of them
> are hapaxes. As you suggest, it would appear mistakes can throw things off
> more dramatically, but they are also easier to detect.
> I'd be interested to see what others' hapax fractions are:
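
For anyone wanting to compare numbers, here is a minimal sketch of the arithmetic behind a "hapax fraction". It assumes a hapax is a token that has been seen exactly once in training (from "hapax legomenon"), and that you can get your token counts into a plain token -> count mapping. The function name hapax_fraction and the toy data are made up for illustration; this is not the Spambayes API.

```python
from collections import Counter


def hapax_fraction(token_counts):
    """Return (hapaxes, total, fraction) for a token -> count mapping.

    Assumption: a hapax is a token whose total count is exactly 1.
    """
    total = len(token_counts)
    hapaxes = sum(1 for count in token_counts.values() if count == 1)
    fraction = hapaxes / total if total else 0.0
    return hapaxes, total, fraction


# Toy data, invented for illustration: "free" appears three times,
# every other token appears once.
counts = Counter("viagra free free meeting agenda free lunch".split())
print(hapax_fraction(counts))  # (4, 5, 0.8)

# The figures quoted above: 7731 hapaxes out of 9768 tokens.
print(f"{7731 / 9768:.1%}")  # 79.1%
```

By that measure, the database described above is roughly 79% hapaxes.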