[spambayes-dev] Hapaxes? (was: How low can you go?)

Eli Stevens (WG.c) listsub at wickedgrey.com
Thu Dec 18 21:33:55 EST 2003


Tony Meyer wrote:

> 
> <http://spambayes.sourceforge.net/docs.html#glossary>
> 
> Wrt SpamBayes, 'word' is a token, and 'corpus' is the token database.
> 
> Is this enough information?


Yes, thank you.  :)  Heh, glossary.  Who'd have thunk?

Another newbie Q: were hapaxes not stored at one time?  Some of the 
recent discussion implies that a change (storing them?) has increased 
the DB size considerably.  Was skipping hapaxes the only heuristic, or 
did it apply to any token seen fewer than N times...?

Just trying to get up to speed.  :)
Eli




