[spambayes-dev] Another incremental training idea...

Seth Goodman nobody at spamcop.net
Tue Jan 13 19:01:18 EST 2004


>     Seth> Another related idea is to dynamically move the edge thresholds
>     Seth> until the training ratio averages 1:1.
>
[Skip Montanaro]
> I think you'll quickly wind up moving the ham edge threshold to 0.00 and
> the spam edge threshold would wind up very near to your spam cutoff.
> That's not necessarily a bad thing, but it has to be considered.

Good point.  Given an unbalanced input mail flow, which most of us seem to
have, 1:1 training makes this inevitable on one side or the other: the edge
threshold for the scarcer class gets pushed to its limit, so every message of
that class is trained.  (The alternative is to randomly sample a subset of
nonedge spam to train on, and that scares me as well.)  We might as well
list it as another variation on nonedge; I suggest calling it one-edge.  It
doesn't give me a particularly good feeling to train on all ham but only
nonedge spam, but maybe the 1:1 training ratio will let it perform well
despite the unsatisfying way the balance is achieved?
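
To make the idea concrete, here is a rough sketch of how the thresholds might
be nudged.  None of this is SpamBayes code: should_train, adjust_edges, the
step size and the cutoff defaults are all names and values I made up for
illustration, and a real version would presumably work from a running average
of the training ratio rather than raw counts.

```python
def should_train(score, is_spam, ham_edge, spam_edge):
    """Nonedge rule (hypothetical helper): skip messages whose score is
    already beyond the edge threshold for their own class."""
    if is_spam:
        return score <= spam_edge
    return score >= ham_edge


def adjust_edges(ham_edge, spam_edge, n_ham_trained, n_spam_trained,
                 ham_cutoff=0.20, spam_cutoff=0.90, step=0.01):
    """Move the edge thresholds one step toward a 1:1 training ratio.

    The cutoffs and step size are assumed values for this sketch.
    """
    if n_spam_trained > n_ham_trained:
        # Too much spam trained: first admit more ham, and once the ham
        # edge has hit 0.0, start admitting less spam.
        if ham_edge > 0.0:
            ham_edge = max(0.0, ham_edge - step)
        else:
            spam_edge = max(spam_cutoff, spam_edge - step)
    elif n_ham_trained > n_spam_trained:
        # Mirror image for a ham-heavy flow.
        if spam_edge < 1.0:
            spam_edge = min(1.0, spam_edge + step)
        else:
            ham_edge = min(ham_cutoff, ham_edge + step)
    return ham_edge, spam_edge


if __name__ == "__main__":
    # A spam-heavy flow: spam keeps outnumbering ham in the training set.
    ham_edge, spam_edge = 0.10, 0.99
    for _ in range(200):
        ham_edge, spam_edge = adjust_edges(ham_edge, spam_edge,
                                           n_ham_trained=10,
                                           n_spam_trained=50)
    print(ham_edge, spam_edge)
```

With a persistently spam-heavy flow, the sketch ends up where Skip predicts:
the ham edge pinned at 0.00 (train on all ham) and the spam edge sitting at
the spam cutoff, which is the one-edge case.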

--
Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above



