[spambayes-dev] Hapaxes? (was: How low can you go?)
Eli Stevens (WG.c)
listsub at wickedgrey.com
Thu Dec 18 21:33:55 EST 2003
Tony Meyer wrote:
> Wrt SpamBayes, 'word' is a token, and 'corpus' is the token database.
> Is this enough information?
Yes, thank you. :) Heh, glossary. Who'd have thunk?
Another newbie Q: were hapaxes not stored at one time? Some of the
recent discussion implies that a change (storing them?) has increased
the DB size considerably. Was excluding hapaxes the only heuristic, or
did it apply to any token seen fewer than N times...?
Just trying to get up to speed. :)