[Spambayes] Better optimization loop
Thu Nov 21 00:13:44 2002
So then, "T. Alexander Popiel" <email@example.com> is all like:
> Argh. I was working on it, too... hence the patch I just sent out.
> Oh, well... no big deal. It looks like our implementations are
> significantly different, though. Might be worth looking at both
> and seeing which is better.
I think what you did is a little closer to what Rob suggested to me in
response. It sounds like a pretty good idea to me. What I've been
doing in my idle time for the past few hours is playing around with
having the WordInfo class compute its own probability. I did this by
defining two new methods:
    if not self.spamprob:
    def update_probability(self, nham, nspam):
        [basically the same code as Bayes.update_probabilities]
My idea was that you'd have to compute the probability for each word
the first time you use it, but after that the probability is cached.
Long-running things like the pop proxy will get the benefit of the
cached probabilities, and short-lived things like hammiefilter get much
faster training, and only slightly slower scoring. At least, that's
what I expect. I haven't tested this yet.
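
A minimal sketch of the caching idea described above, not the actual
patch. The accessor name probability(), the record() helper, the None
sentinel, and the s/x smoothing defaults are all assumptions made for
illustration; the real Bayes code may compute the value differently.

```python
class WordInfo:
    """Per-word counts plus a lazily computed, cached spam probability."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount
        # None means "not computed yet". Using None rather than a falsy
        # check avoids recomputing a legitimately cached 0.0.
        self.spamprob = None

    def record(self, is_spam):
        """Hypothetical training hook: bump a count, drop the cached value."""
        if is_spam:
            self.spamcount += 1
        else:
            self.hamcount += 1
        self.spamprob = None

    def probability(self, nham, nspam):
        """Return the cached probability, computing it on first use."""
        if self.spamprob is None:
            self.update_probability(nham, nspam)
        return self.spamprob

    def update_probability(self, nham, nspam, s=0.45, x=0.5):
        """Recompute and cache the probability (assumed formula).

        s and x are assumed smoothing parameters: the weight given to
        the prior, and the prior probability for an unseen word.
        """
        hamratio = self.hamcount / (nham or 1)
        spamratio = self.spamcount / (nspam or 1)
        total = hamratio + spamratio
        raw = spamratio / total if total else x
        n = self.hamcount + self.spamcount
        self.spamprob = (s * x + n * raw) / (s + n)
```

In this sketch training only touches the counts and clears the cache,
which is where the faster training would come from. One thing it does
not handle: a cached value also goes stale when nham or nspam change
because of training on other messages, so a long-running process would
need some way to invalidate every word's cache after training.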