[Spambayes] Question about training via the web interface

Katz, Amir Amir_Katz at bmc.com
Tue Apr 13 01:17:27 EDT 2004


Tony,
First, thanks for the explanation. Now, if I try to convert theory into
practice, this is what I understand:
1) I should train SB only on its mistakes or failures to categorize.
2) The way to do it would be to change the web interface default to
'discard' (so that all messages that were identified correctly would not be
used to train).
3) Click on the appropriate radio button only for those messages that SB
either mis-categorized or was unsure about.

Please confirm.

Amir

-----Original Message-----
From: Tony Meyer [mailto:tameyer at ihug.co.nz]
Sent: Tuesday, April 13, 2004 06:20
To: 'Katz, Amir'; 'Spambayes mailing list (E-mail)'
Subject: RE: [Spambayes] Question about training via the web interface


> I RTFMed and could not find an explanation for this.

You need to RTFW: <http://entrian.com/sbwiki>, <wink>.

> 3. SB was 100% correct in its analysis, and I do not
> need to click on any message to change its category.
>
> My question is: in this case, does it make any sense
> to click on the 'train' button? As I understand it,
> for those messages, SB does not need any further training.
>
> Am I right or am I right (:-) ?

Maybe.  There hasn't really been enough testing on different training
methods to be able to make a conclusive statement.  However, IMO (based on
testing and reading mail here):

  1.  Training on everything is not a good idea.
  2.  The three best training methods (so far) are "mistake based training"
(train on false positives, false negatives, and unsures), "non-edge training"
(train on everything that scores between certain edges, say 0.05 and 0.95),
and "train to exhaustion" (complex; there's a Robinson blog about it).  From
what I've seen, I'd say that "train to exhaustion" gives the best results, but
is very slow, and the other two are about even - non-edge tends to just win for me
personally.
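
As a rough illustration of how the first two methods differ, here is a
minimal Python sketch. It does not use the SpamBayes code or API: the
function names are hypothetical, the ham/spam cutoffs (0.20 and 0.90) are
assumed values, and the edges are the 0.05 and 0.95 mentioned above. Each
message is reduced to its score plus the label the user says is correct.

```python
# Illustrative only: not SpamBayes code. Cutoffs and names are assumptions.
HAM_CUTOFF = 0.20   # assumed: scores below this are classified as ham
SPAM_CUTOFF = 0.90  # assumed: scores at or above this are classified as spam


def classify(score: float) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure', or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def train_on_mistake(score: float, true_label: str) -> bool:
    """Mistake-based training: train only when the classifier's answer
    differs from the true label (false positive, false negative, unsure)."""
    return classify(score) != true_label


def train_on_nonedge(score: float, low_edge: float = 0.05,
                     high_edge: float = 0.95) -> bool:
    """Non-edge training as described above: train on any message whose
    score falls strictly between the two edges, right or wrong."""
    return low_edge < score < high_edge


# (score, true label) pairs for a handful of made-up messages.
messages = [(0.01, "ham"), (0.10, "ham"), (0.50, "spam"),
            (0.93, "spam"), (0.99, "spam"), (0.97, "ham")]

mistake_based = [m for m in messages if train_on_mistake(*m)]
non_edge = [m for m in messages if train_on_nonedge(m[0])]
```

In this toy data, mistake-based training picks up the unsure spam (0.50)
and the false positive (0.97), while non-edge training picks up the three
messages scoring between the edges, including two that were already
classified correctly. Note that this strict reading of non-edge training
skips the 0.97 false positive; whether such extreme-scoring mistakes should
also be trained on is a detail the description above does not settle.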

With the web interface, you don't really (yet) have a convenient way to do
"train to exhaustion", which leaves the other alternatives.  The defaults
really point towards "train on everything" (which may change) - but in the
Advanced Configuration you can set the buttons to default to other
categories ('discard', for example), to lean towards "mistake based
training".  IIRC, there was something added in 1.0a9 to help with non-edge
training, but I can't recall what it was - I'm sure it's there somewhere.

Hope this helps!

=Tony Meyer


