[Spambayes] incremental testing with CL2/CL3?

Sun, 06 Oct 2002 18:10:51 -0400

[Brad Clements]
> ...
> I notice in the TestDriver, comments like:
>
>     # CAUTION:  this just doesn't work for incremental training when
>     # options.use_central_limit is in effect.
>     def train(self, ham, spam):
>
> I'm not planning on using untrain(), so does this comment still apply?

Yes, afraid so.  A do-something compute_population_stats() is unique to the
central limit schemes, and all it knows about the world is the ham and spam
passed to train().  If you had trained on 20000 ham and 20000 spam, and then
passed 10 of each to train() in another call, the population statistics for
the previous 20000 of each would be lost, overwritten by the stats for the
new 20 msgs.
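
To make the overwrite concrete, here's a toy sketch.  ToyDriver is a
hypothetical stand-in, not the real TestDriver:  it assumes each msg has
already been boiled down to a single float score, and that the "population
stats" are nothing more than a count, mean and variance per class.  The real
central limit code computes different things, but the shape of the problem
is the same.

```python
from statistics import mean, pvariance


class ToyDriver:
    """Hypothetical stand-in for a driver with per-call population stats."""

    def __init__(self):
        self.ham_stats = None   # (count, mean, variance) of ham scores
        self.spam_stats = None  # (count, mean, variance) of spam scores

    def compute_population_stats(self, ham_scores, spam_scores):
        # All this knows about the world is the two batches it was handed,
        # so it replaces whatever stats were stored by an earlier call.
        self.ham_stats = (len(ham_scores), mean(ham_scores),
                          pvariance(ham_scores))
        self.spam_stats = (len(spam_scores), mean(spam_scores),
                           pvariance(spam_scores))

    def train(self, ham_scores, spam_scores):
        self.compute_population_stats(ham_scores, spam_scores)


driver = ToyDriver()

# First call: a big batch (made-up scores, for illustration only).
driver.train([0.1] * 20000, [0.9] * 20000)
first_ham_n = driver.ham_stats[0]

# Second call: a tiny batch.  The stats from the first call are gone.
driver.train([0.4] * 10, [0.6] * 10)
print(first_ham_n, driver.ham_stats[0])
```

After the second call the stored ham stats describe only the 10 new msgs,
which is why calling train() incrementally doesn't do what you'd hope.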

I don't see an obvious way to fix this, alas.  It would be easiest to fix
under clt1.  You could train on every previous msg every time, but that's a
quadratic-time proposition overall.
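
A back-of-the-envelope sketch of why retraining on everything is quadratic.
The function name and the fixed batch size are assumptions made for
illustration; it just counts how many msgs get processed in total if every
new batch forces a pass over all msgs seen so far.

```python
def total_msgs_processed(num_batches, batch_size):
    """Count msgs touched when each new batch triggers a full retrain."""
    total = 0
    seen = 0
    for _ in range(num_batches):
        seen += batch_size   # the corpus grows by one batch
        total += seen        # and the retrain touches all of it
    return total


# 4 batches of 10 msgs: 10 + 20 + 30 + 40
print(total_msgs_processed(4, 10))  # 100
```

With k batches of b msgs each the total is b*k*(k+1)/2, so doubling the
number of batches roughly quadruples the work.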