[spambayes-dev] auto-training w/ small db seems like a bad idea

Tony Meyer tameyer at ihug.co.nz
Wed Jan 14 19:43:40 EST 2004

> "small DB/non-edge training" may very well be a great idea. 

[Eli Stevens]
> Is there a regime that simulates what you had been doing manually? 

The "non-edge" part is definitely there (the 'nonedge' regime).  To keep the
db small, you could either modify the regime to have bigger edges, or do
something fancier.  It usually keeps the db pretty small as is, though: on
the testing set I've been using, nonedge trained on an average of 2.3
messages/day, whereas 'perfect' (train on everything) was something like
30/day.
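
To make "bigger edges" concrete, here is a rough sketch of the idea, not the
actual regime code.  It assumes that non-edge training means "train on a
message unless its score already sits within `edge` of the correct end of
the 0.0 (ham) to 1.0 (spam) scale".  The names should_train and
count_trained, and the 0.05 default, are made up for illustration.

```python
def should_train(score, is_spam, edge=0.05):
    """Decide whether to train on a message (hypothetical helper).

    score is the classifier's spam probability, 0.0 (ham) to 1.0 (spam).
    A message is skipped only when it is already scored correctly AND
    lies inside the edge region; everything else is trained on.
    """
    if is_spam:
        # Confidently-correct spam scores within `edge` of 1.0.
        return score < 1.0 - edge
    # Confidently-correct ham scores within `edge` of 0.0.
    return score > edge


def count_trained(scored_messages, edge=0.05):
    """Count how many (score, is_spam) pairs would be trained on."""
    return sum(1 for score, is_spam in scored_messages
               if should_train(score, is_spam, edge))
```

Under that assumption, widening the edge (say edge=0.2 instead of 0.05)
means more messages count as "already sure enough", so fewer get trained
and the db grows more slowly.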

BTW, if you're using the incremental testing stuff, you might want to use
the versions in CVS - the sort+group.py script was improved by Tim a while
back, and the scripts have also grown docstrings, which might help.

=Tony Meyer

