[Spambayes] train on error - to exhaustion?

Bill Yerazunis wsy at merl.com
Mon Dec 2 19:43:18 2002


   From: Greg Louis <glouis@dynamicro.on.ca>

   Training on error means "classify messages from the training corpus in
   random order; if the classifier errs or is uncertain, submit that
   message (once?) for training."  Has anyone tried either of:

   1) when the classifier errs or is uncertain, train on that message
   until the classifier gets it right,

I've looked into that on CRM114; the situation never arises.
I typically submit the erroneous message three times in rapid
succession:

  - once to get a "before training" value to confirm the misclassification;
  - once with "train this message as" turned on (*);
  - and once again to get an "after training" result and verify the learning.

It has never misclassified a message on the "after training"
verification, so I don't know whether re-training again and again
until it gets the classification correct would change anything.
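Strategy 1, with the before/train/after check described above, can be sketched in Python. This is a minimal sketch under stated assumptions: `ToyScorer` is a hypothetical perceptron-style token scorer standing in for a real classifier (it is neither the Spambayes nor the CRM114 classifier), and `train_until_correct` and its `max_rounds` cap are illustrative names, not part of either project.

```python
class ToyScorer:
    """Hypothetical perceptron-style token scorer, for illustration only.

    Not the Spambayes or CRM114 classifier: each token has a weight, a
    message scores the sum of its token weights, and training nudges every
    token's weight toward the true label.
    """

    def __init__(self, margin=1.0):
        self.weights = {}
        self.margin = margin  # scores strictly inside (-margin, margin) are "unsure"

    def classify(self, tokens):
        score = sum(self.weights.get(t, 0.0) for t in tokens)
        if score >= self.margin:
            return "spam"
        if score <= -self.margin:
            return "ham"
        return "unsure"

    def train(self, tokens, label):
        step = 1.0 if label == "spam" else -1.0
        for t in tokens:
            self.weights[t] = self.weights.get(t, 0.0) + step


def train_until_correct(scorer, tokens, label, max_rounds=10):
    """Strategy 1: retrain one message until it is classified correctly.

    Returns the number of training rounds used (0 if it was already right).
    max_rounds is an arbitrary cap so the loop always ends, e.g. for an
    empty message, whose score can never leave the "unsure" band.
    """
    rounds = 0
    while scorer.classify(tokens) != label and rounds < max_rounds:
        scorer.train(tokens, label)
        rounds += 1
    return rounds


if __name__ == "__main__":
    scorer = ToyScorer()
    msg = ["cheap", "pills", "now"]
    print(scorer.classify(msg))                      # before training: unsure
    print(train_until_correct(scorer, msg, "spam"))  # rounds needed: 1
    print(scorer.classify(msg))                      # after training: spam
```

The three `print` calls mirror the three submissions listed above: classify, train, then classify again to verify.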

   2) train once on each error, but then repeat the whole training process
   until all messages are classified correctly?

   I'd think the latter might be beneficial, but haven't tried it yet
   myself.


Hmmm... that would be a good way to do regression checking, to
verify that every message classified correctly once stays
classified correctly forevermore.
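Strategy 2, together with that regression check, can be sketched in Python. This assumes a hypothetical perceptron-style `ToyScorer` standing in for a real classifier (not the Spambayes or CRM114 one); `misclassified`, `train_to_exhaustion`, `max_passes`, and `seed` are illustrative names. A pass that trains nothing leaves the scorer unchanged, so ending on an error-free pass means every message in the corpus is classified correctly at that point.

```python
import random


class ToyScorer:
    """Hypothetical perceptron-style token scorer (not Spambayes or CRM114)."""

    def __init__(self, margin=1.0):
        self.weights = {}
        self.margin = margin  # scores strictly inside (-margin, margin) are "unsure"

    def classify(self, tokens):
        score = sum(self.weights.get(t, 0.0) for t in tokens)
        if score >= self.margin:
            return "spam"
        if score <= -self.margin:
            return "ham"
        return "unsure"

    def train(self, tokens, label):
        step = 1.0 if label == "spam" else -1.0
        for t in tokens:
            self.weights[t] = self.weights.get(t, 0.0) + step


def misclassified(scorer, corpus):
    """Regression check: the (tokens, label) pairs the scorer currently gets wrong."""
    return [(tokens, label) for tokens, label in corpus
            if scorer.classify(tokens) != label]


def train_to_exhaustion(scorer, corpus, max_passes=20, seed=0):
    """Strategy 2: one train-on-error pass in random order, training once on
    each wrong or unsure message, repeated until a whole pass makes no
    mistakes or max_passes (an arbitrary cap) is reached.

    Returns (passes_run, converged).
    """
    rng = random.Random(seed)
    order = list(corpus)
    for passes in range(1, max_passes + 1):
        rng.shuffle(order)
        errors = 0
        for tokens, label in order:
            if scorer.classify(tokens) != label:
                scorer.train(tokens, label)
                errors += 1
        if errors == 0:
            # Nothing was trained this pass, so every message is now correct.
            return passes, True
    return max_passes, False


if __name__ == "__main__":
    corpus = [
        (["cheap", "pills", "now"], "spam"),
        (["win", "cash", "now"], "spam"),
        (["meeting", "agenda", "tomorrow"], "ham"),
        (["lunch", "agenda", "tomorrow"], "ham"),
    ]
    scorer = ToyScorer()
    print(train_to_exhaustion(scorer, corpus))  # (2, True)
    print(misclassified(scorer, corpus))        # []
```

In this toy corpus the two classes share no tokens, so training one message never pushes another into error and the loop finishes after two passes. On real mail, where spam and ham share tokens, training on one message can flip another, which is why the whole corpus is rechecked every pass and why the cap is needed: a corpus with the same tokens labeled both ways never converges.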

   -Bill Y.

(*) This is the step that does seem to be running the LEARN, but
for some reason sendmail is getting upset at me and returning an
error message _as well as_ the confirmation message.  Bizarre.

I'm working on it.


