[spambayes-dev] How much tokenizer improvement is enough to justify a change?

Meyer, Tony <T.A.Meyer at massey.ac.nz>
Mon Aug 4 15:24:48 EDT 2003


> I ran the current tokenizer on timcv.py on 10000 hams and 
> 10000 spams split into 10 buckets.
> Here's the base set ... Comments?
[...]
> After change (breaking up compound words > maxwordlen into 
> smaller words)

Have you got a patch for this so that we can see what results we get
too? (Well, the two or three people still interested in testing,
anyway!)
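
I haven't seen the patch, of course, so this is just a guess at the idea,
not your actual code (the maxwordlen cutoff of 12 and the splitting rule
below are my assumptions):

    import re

    maxwordlen = 12   # guessed cutoff; the real tokenizer may use another value

    # Split long "words" on runs of non-alphanumerics, e.g.
    # "free-viagra-now.example" -> "free", "viagra", "now", "example".
    _split_re = re.compile(r"[^A-Za-z0-9]+")

    def tokenize_long_word(word):
        """Yield sub-tokens for a word longer than maxwordlen."""
        if len(word) <= maxwordlen:
            yield word
            return
        for piece in _split_re.split(word):
            if piece and len(piece) <= maxwordlen:
                yield piece

If it's roughly that, it should be easy enough for the rest of us to drop
into tokenizer.py and re-run timcv.py over our own corpora to compare.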

As for the "how much" question, I'll leave that to Tim ;)

=Tony Meyer


