[spambayes-dev] Reduced training test results

T. Alexander Popiel popiel at wolfskeep.com
Mon Dec 29 12:51:22 EST 2003

In message:  <3FEFF5F6.1090004 at hooft.net>
             Rob Hooft <rob at hooft.net> writes:
>T. Alexander Popiel wrote:
>> Training on just those messages whose score isn't 0.00 or 1.00
>> (rounded) seems to be a huge win over training on everything.
>Told you:
>See the section "Train on Errors, Unsures, and non-obvious correct 
>decisions" at http://www.entrian.com/sbwiki/TrainingIdeas

Hrm.  I suppose that I ought to actually look at the wiki. ;-)

Is there any way for me to upload my plots to go along with any
discussion that I might add to the above page?  I could just
reference them on my machine, but it seems better to keep the
wiki content all in one place.

>> Not so much because the accuracy is better (though accuracy
>> does seem to be improved by neglecting those messages that it's
>> already certain about), but because of a hugely reduced training
>> set (and thus database). 
>Both are effects I can feel in practice!

FWIW, using this training style with my nightly retrains cut my
database size in half (from 21 meg to 10 meg).  This is with a
4-month horizon, too, so the difference would likely be even
greater with a longer span.
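For concreteness, here is a minimal sketch of the selection rule quoted above. It skips a message only when its score, rounded to two decimals, is 0.00 or 1.00. It is not SpamBayes code. `ScoredMessage` and `select_for_training` are hypothetical names, and the `keep_errors` option is an assumption drawn from the wiki section title ("Train on Errors, Unsures, and non-obvious correct decisions"). With that option on, a confident verdict that turned out wrong is still trained on.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScoredMessage:
    # Hypothetical record: an id, the true label, and the classifier's
    # spam score in [0.0, 1.0]. Not the SpamBayes API.
    msg_id: str
    is_spam: bool
    score: float


def select_for_training(messages: Iterable[ScoredMessage],
                        keep_errors: bool = True) -> List[ScoredMessage]:
    """Keep only the messages worth training on (hypothetical helper)."""
    selected = []
    for m in messages:
        rounded = round(m.score, 2)
        if rounded not in (0.0, 1.0):
            # Unsure or merely "non-obvious" score: always train.
            selected.append(m)
        elif keep_errors and (rounded == 1.0) != m.is_spam:
            # Confident but wrong (e.g. ham scored 1.00): train on the error.
            selected.append(m)
        # Otherwise the message is a confident, correct 0.00/1.00: skip it.
    return selected


msgs = [
    ScoredMessage("a", False, 0.001),  # confident ham, correct -> skipped
    ScoredMessage("b", True, 0.999),   # confident spam, correct -> skipped
    ScoredMessage("c", True, 0.62),    # unsure -> trained
    ScoredMessage("d", False, 0.998),  # ham scored 1.00, an error -> trained
]
print([m.msg_id for m in select_for_training(msgs)])
```

Because confidently correct messages never enter the training set, the database grows only with the messages that actually moved the classifier. That is consistent with the size reduction reported above, though this sketch does not measure it.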

- Alex

