[spambayes-dev] Evaluating a training corpus
Neil Schemenauer
nas at python.ca
Sun Jun 8 15:13:23 EDT 2003
Greg Ward wrote:
> I'm mulling ways to evaluate the quality of a training corpus, and was
> I know there's code lurking in there somewhere (timcv.py?) for training
> on 90% of the corpus, and then evaluating the other 10% under the
> resulting database.
mboxtest.py is probably the easiest to get going. I think timcv.py
gives better results, but it's a little more trouble to set up your test
data. See README.txt for a short explanation of the tools. If you
want to use timcv.py, you can use splitndirs.py to create the test
data.
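
In case it helps, here is a rough sketch in Python of the general idea
(split the corpus into N sets, train on N-1 of them, test on the one left
out, and repeat). This is not the actual splitndirs.py or timcv.py code;
the function names and the placeholder message ids are made up for
illustration, and the training and scoring steps are left out.

```python
import random


def split_into_folds(items, n_folds, seed=0):
    """Randomly distribute items into n_folds roughly equal lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items out round-robin, one fold per slice.
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cross_validation_rounds(folds):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [msg for j, fold in enumerate(folds) if j != i for msg in fold]
        yield train, test


# Hypothetical corpus: 100 placeholder message ids.
messages = ["msg%03d" % k for k in range(100)]
folds = split_into_folds(messages, 10)
rounds = list(cross_validation_rounds(folds))
```

With 10 sets, each round trains on 90% of the messages and tests on the
other 10%, and every message gets tested exactly once over the 10 rounds.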
HTH,
Neil