[Spambayes] spamprob combining

Tim Peters tim.one@comcast.net
Wed, 09 Oct 2002 23:29:38 -0400


[T. Alexander Popiel]
> Oooh, goodie!  Another thing to consume CPU-hours!

Yup, that's the only idea here <wink>.

> I'll run this one after I get done with my initial clt tests
> (which are taking about 4.5 hours each :-/ ).

Use less data?

> I can't really say anything else, yet, but clt seems _much_ slower
> than the default classifier.

I haven't really noticed that.  If you're using your "--trainstyle full"
patch with timcv, then, yes, it would be enormously slower: timcv gets big
*efficiency* benefits (in both instruction count and temporal cache
locality) from incremental learning and unlearning.
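To make the cost difference concrete, here's a minimal Python sketch under stated assumptions. `CountingClassifier`, `incremental_cv` and `full_retrain_cv` are hypothetical names, not Spambayes or timcv APIs, and "work" only counts messages learned or unlearned (it ignores cache effects entirely). With k folds of m messages each, the incremental scheme does about 3km learn/unlearn operations, while retraining from scratch for every fold does k(k-1)m.

```python
from collections import Counter

class CountingClassifier:
    """Hypothetical stand-in for a classifier: keeps per-class token counts
    and tallies how many messages it has processed (a proxy for cost)."""
    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # is_spam -> counts
        self.work = 0  # number of learn/unlearn calls so far

    def learn(self, tokens, is_spam):
        self.counts[is_spam].update(tokens)
        self.work += 1

    def unlearn(self, tokens, is_spam):
        self.counts[is_spam].subtract(tokens)
        self.work += 1

def incremental_cv(folds):
    """Train on all folds once; then for each fold unlearn it, test, relearn."""
    c = CountingClassifier()
    for fold in folds:
        for tokens, is_spam in fold:
            c.learn(tokens, is_spam)
    for fold in folds:
        for tokens, is_spam in fold:
            c.unlearn(tokens, is_spam)
        # ... score the held-out fold here ...
        for tokens, is_spam in fold:
            c.learn(tokens, is_spam)
    return c.work

def full_retrain_cv(folds):
    """Build a fresh classifier from the other k-1 folds for every test fold."""
    work = 0
    for i in range(len(folds)):
        c = CountingClassifier()
        for j, fold in enumerate(folds):
            if j != i:
                for tokens, is_spam in fold:
                    c.learn(tokens, is_spam)
        # ... score folds[i] here ...
        work += c.work
    return work
```

With 10 folds of 2 messages, the incremental scheme does 60 operations and full retraining does 180; the gap widens linearly as the number of folds grows.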

The "third training pass" unique to the clt methods also doubles the
training time (each msg in the training data is tokenized once to update the
wordprobs, and then a second time to compute the clt ham and spam population
statistics).