[spambayes-dev] Incremental training results

Fri Jan 9 12:58:27 EST 2004

In message:  <16382.48364.409795.370110 at montanaro.dyndns.org>
             Skip Montanaro <skip at pobox.com> writes:
>
>Thanks for the extra info.  Where do I find understandable definitions of
>the different training regimes ("perfect", "nonedge", "expire4months",
>"corrected", etc)?  Even after reading incremental.HOWTO.txt and regimes.py
>in the testtools directory I don't understand what the different regimes
>mean.  For instance, what is "perfect" training?  How is it different from
>"nonedge"?  What does "properly classified with extreme confidence" mean?

Argh.  Most of the confusion arises from a complete lack of
documentation on the interface to the regimes: what their
parameters mean, what the return code means, etc.  I'll try
to get to that soon... unless someone beats me to it.  Reading
incremental.py is pretty much required until such docs get
written.

'perfect' and 'corrected' are both train-on-everything regimes.
With 'perfect', the trainer is given perfect and immediate knowledge
of the proper classification (as defined by location in the Data
directory tree).  With 'corrected', the trainer trusts the classifier
result until end-of-group, at which point all mistrained (or
non-trained) items (fp, fn, and unsure) are corrected to be trained
with their proper classification.
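
As a rough illustration of the difference, here is a minimal Python
sketch.  It is not the interface in regimes.py: the function names, the
(msg_id, is_spam, verdict) tuples, and the action tuples are all
hypothetical.  It also assumes that a 'corrected' fix-up means untraining
the wrong class and then training the right one.

```python
# Hypothetical sketch; NOT the actual regimes.py interface.
# Each message in a group is (msg_id, is_spam, verdict), where verdict is
# the classifier's call: "ham", "spam", or "unsure".

def perfect_actions(group):
    """'perfect': train every message at once with its true class."""
    return [("train", msg_id, is_spam) for msg_id, is_spam, _verdict in group]

def corrected_actions(group):
    """'corrected': trust the classifier now, fix mistakes at end-of-group."""
    actions = []   # training done as messages arrive
    pending = []   # corrections deferred until end-of-group
    for msg_id, is_spam, verdict in group:
        if verdict == "unsure":
            # Assumed: unsure messages are not trained until end-of-group.
            pending.append(("train", msg_id, is_spam))
            continue
        guessed_spam = (verdict == "spam")
        actions.append(("train", msg_id, guessed_spam))
        if guessed_spam != is_spam:
            # fp or fn: undo the wrong training, then train correctly.
            pending.append(("untrain", msg_id, guessed_spam))
            pending.append(("train", msg_id, is_spam))
    return actions + pending
```

By the end of a group both regimes have trained every message under its
true class; they differ only in what the classifier has seen while the
group is still in progress.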

'expire4months' is like 'perfect', except that messages are
untrained after 120 groups have passed.
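
The expiry bookkeeping could look something like the sketch below.  The
class and method names are made up for illustration, and the exact
boundary (untraining once the age reaches 120 groups rather than once it
exceeds 120) is an assumption.

```python
from collections import deque

class ExpiryQueue:
    """Hypothetical helper: remembers when each message was trained."""

    def __init__(self, max_age_groups=120):
        self.max_age_groups = max_age_groups
        self.trained = deque()  # (group_number, msg_id, is_spam), oldest first

    def record(self, group_number, msg_id, is_spam):
        # Call after training a message, in group order.
        self.trained.append((group_number, msg_id, is_spam))

    def expired(self, current_group):
        """Pop and return the messages that should now be untrained."""
        out = []
        while (self.trained and
               current_group - self.trained[0][0] >= self.max_age_groups):
            out.append(self.trained.popleft())
        return out
```

The caller would untrain each returned message from the classifier at the
start of the new group.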

'nonedge', 'fpfnunsure', and 'fnunsure' are all partial-training
regimes, where some messages are never trained on at all.

'nonedge' trains on every message except those that are properly
classified with a score of 1.00 or 0.00 (rounded), i.e. spam scoring
1.00 and ham scoring 0.00.  False positives at 1.00 and false
negatives at 0.00 _are_ trained.

'fpfnunsure' only trains on fp, fn, and unsure.  'fnunsure' only
trains on fn and unsure.
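
The train-or-skip decision for these regimes can be sketched in Python as
follows.  This is only an illustration, not the code in regimes.py:
should_train() and outcome() are hypothetical names, the 0.20/0.90
cutoffs are assumed values, and "rounded" is taken to mean rounding to
two decimal places.

```python
# Hypothetical sketch; NOT the actual regimes.py interface.

def outcome(is_spam, score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return 'correct', 'fp', 'fn', or 'unsure' (cutoffs are assumed)."""
    if score < ham_cutoff:
        return "fn" if is_spam else "correct"
    if score >= spam_cutoff:
        return "correct" if is_spam else "fp"
    return "unsure"

def should_train(regime, is_spam, score):
    """Would this regime train on a message with this score?"""
    result = outcome(is_spam, score)
    if regime == "perfect":
        return True
    if regime == "fpfnunsure":
        return result in ("fp", "fn", "unsure")
    if regime == "fnunsure":
        return result in ("fn", "unsure")
    if regime == "nonedge":
        # Skip only correct classifications whose score rounds to 0.00 or 1.00.
        at_edge = round(score, 2) in (0.0, 1.0)
        return not (result == "correct" and at_edge)
    raise ValueError("unknown regime: %r" % (regime,))
```

For example, should_train("nonedge", False, 0.999) is true because that
message is a false positive at 1.00, while the same score on an actual
spam is skipped.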

- Alex