[spambayes-dev] Evaluating a training corpus

Neil Schemenauer nas at python.ca
Sun Jun 8 15:13:23 EDT 2003


Greg Ward wrote:
> I'm mulling ways to evaluate the quality of a training corpus, and was
> I know there's code lurking in there somewhere (timcv.py?) for training
> on 90% of the corpus, and then evaluating the other 10% under the
> resulting database.

mboxtest.py is probably the easiest to get going.  I think timcv.py
gives better results, but it's a little more trouble to set up your
test data.  See README.txt for a short explanation of the tools.  If
you want to use timcv.py, you can use splitndirs.py to create the test
data.
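
To make the "train on 90%, test on the other 10%" idea concrete, here
is a rough sketch of an n-fold split in plain Python.  This is not the
code from timcv.py or splitndirs.py; the function names and the idea of
holding messages in a list are assumptions made only for illustration.

```python
import random


def split_into_folds(items, n_folds=10, seed=0):
    """Shuffle items and deal them round-robin into n_folds lists.

    Hypothetical helper for illustration; not part of spambayes.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cross_validation_rounds(folds):
    """Yield (train, test) pairs, using each fold as the test set once."""
    for i, test in enumerate(folds):
        # Everything outside fold i becomes the training set for this round.
        train = [msg for j, fold in enumerate(folds) if j != i for msg in fold]
        yield train, test
```

With ten folds, each round trains on roughly 90% of the messages and
scores the remaining 10%, and every message gets tested exactly once.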

HTH,

  Neil


