[spambayes-dev] A spectacular false positive

T. Alexander Popiel popiel at wolfskeep.com
Mon Nov 17 14:08:26 EST 2003


In message:  <E1ALlWk-0007h5-EU at mail.python.org>
             "Kenny Pitt" <kennypitt at hotmail.com> writes:
>Tim Peters wrote:
>> Sigh -- we need solid research on training disciplines that work
>> great in real-life use, respecting that anything requiring human
>> input will barely get used except by geeks who never tire of watching
>> the training process.

FWIW, this sort of research is what I built the incremental harness
for.  It really ought to be named something like the time-sequence
harness, but I didn't think of that at the time.

In any case, with the harness you can specify (in regimes.py) any
particular training behaviour you want, and then run cv-esque tests
to check its effectiveness.

Unfortunately, after building the harness, I lost all will to actually
use it. :-/

>To try to work around the problem, I implemented two experimental
>options to train on all certain ham and train on all certain spam.
>Since I can turn them on or off independently, I can use them to get my
>ratio back in balance and then turn them off.  What I'd like to
>implement is a way to do this automatically.  I'd like to say something
>like, "If my spam count reaches twice my ham count then train on all
>certain hams until the counts are within 5% of each other again."  These
>cutoffs would of course be configurable.

This is a training behaviour which is easily emulated using the harness
above.  I'd love to see some hard numbers on it vs. training on
everything or training on just mistakes and unsures (both of which are
preexisting regimes).
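
As a rough sketch only, the rule quoted above might look something like
this in Python.  This is not the actual regimes.py interface: the
function names, the (is_spam, is_certain) message pairs, and the default
thresholds are all made up for illustration.  It assumes the base regime
trains on mistakes and unsures, and that "certain" means the classifier
scored the message confidently and correctly.

```python
def update_catch_up(nham, nspam, catching_up, trigger_ratio=2.0, tolerance=0.05):
    """Return True if we should (still) train on all certain ham.

    Hypothetical helper, not part of spambayes.  Catch-up mode starts
    when the spam count reaches trigger_ratio times the ham count, and
    ends once the ham count is within `tolerance` of the spam count.
    """
    if catching_up:
        # Keep going until ham has nearly caught up with spam.
        return nham < (1.0 - tolerance) * nspam
    # Not catching up yet: check whether the imbalance trigger is hit.
    return nspam > 0 and nspam >= trigger_ratio * nham


def replay(messages, trigger_ratio=2.0, tolerance=0.05):
    """Replay (is_spam, is_certain) pairs in arrival order.

    Returns the (nham, nspam) training counts at the end of the run.
    """
    nham = nspam = 0
    catching_up = False
    for is_spam, is_certain in messages:
        # Assumed base regime: train on mistakes and unsures only.
        train = not is_certain
        # Rebalancing: while catching up, also train on certain ham.
        if is_certain and not is_spam and catching_up:
            train = True
        if train:
            if is_spam:
                nspam += 1
            else:
                nham += 1
        catching_up = update_catch_up(nham, nspam, catching_up,
                                      trigger_ratio, tolerance)
    return nham, nspam


if __name__ == "__main__":
    # Four uncertain spams followed by ten certain hams.
    stream = [(True, False)] * 4 + [(False, True)] * 10
    print(replay(stream))
```

Using separate start and stop thresholds keeps the behaviour from
flapping on and off around a single cutoff, which is what the "twice"
and "within 5%" wording seems to be after.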

>It will take me a little while to get around to implementing this and
>even longer to see if it is effective, but I'll report results (or at
>least perceptions) when I have them.

Cool.

- Alex


