[Spambayes] proposed changes to hammie & co.
T. Alexander Popiel
popiel@wolfskeep.com
Thu Nov 21 17:18:00 2002
In message: <w53y97nxxof.fsf@woozle.org>
Neale Pickett <neale@woozle.org> writes:
>So then, "T. Alexander Popiel" <popiel@wolfskeep.com> is all like:
>
>> In message: <w53d6ozzhyt.fsf@woozle.org>
>> Neale Pickett <neale@woozle.org> writes:
>> >
>> >I'm currently entwined with mucking the heck out of WordInfo. I've got
>> >a neato scheme based on Alex's patch and comments where the WordInfo
>> >classes still compute their own probabilities, but also keep a revision
>> >number which is compared against a MetaInfo class.
>>
>> Eww, do we gotta? I thought I was trying to make the DB smaller. ;-)
>
>Ah, but the only thing *stored* is (spamcount, hamcount). The
>probability is calculated the first time you ask for it. If you don't
>update nspam or nham, the next time you ask for it it gives the cached
>value. So the database is small, but you still get the in-memory
>probability caching if you're using a pickle or ZODB.
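
A minimal Python sketch of the scheme Neale describes. Only (spamcount, hamcount) is persisted, and the probability is cached in memory until nspam or nham changes. The class names WordInfo and MetaInfo come from the thread. Everything else is a hypothetical assumption for illustration, not the actual Spambayes code: the `revision` counter, the `learn()` method, the `__getstate__`/`__setstate__` persistence, and the Robinson-style formula with illustrative defaults `x=0.5`, `s=0.45`. The sketch also assumes that every training step goes through `MetaInfo.learn()`.

```python
import pickle


class MetaInfo:
    """Corpus-wide message counts plus a revision number (hypothetical)."""

    def __init__(self, nspam=0, nham=0):
        self.nspam = nspam
        self.nham = nham
        self.revision = 0

    def learn(self, is_spam):
        # Training a message changes nspam or nham, so bump the revision;
        # every cached WordInfo probability becomes stale.
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        self.revision += 1


class WordInfo:
    """Persists only (spamcount, hamcount); caches the probability in memory."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount
        self._prob = None       # in-memory cache, never persisted
        self._revision = None   # MetaInfo.revision the cache was computed at

    def __getstate__(self):
        # Only the two counts go into the pickle, keeping the database small.
        return (self.spamcount, self.hamcount)

    def __setstate__(self, state):
        # A freshly loaded record starts with an empty cache.
        self.spamcount, self.hamcount = state
        self._prob = None
        self._revision = None

    def probability(self, meta, x=0.5, s=0.45):
        if self._revision == meta.revision:
            return self._prob  # nspam/nham unchanged: reuse the cached value
        # Assumed Robinson-style estimate; x and s are illustrative defaults.
        spamratio = self.spamcount / meta.nspam if meta.nspam else 0.0
        hamratio = self.hamcount / meta.nham if meta.nham else 0.0
        total = spamratio + hamratio
        p = spamratio / total if total else x
        n = self.spamcount + self.hamcount
        self._prob = (s * x + n * p) / (s + n)
        self._revision = meta.revision
        return self._prob


meta = MetaInfo(nspam=10, nham=10)
word = WordInfo(spamcount=3, hamcount=1)
first = word.probability(meta)   # computed from the counts
again = word.probability(meta)   # cache hit: same revision
meta.learn(is_spam=True)         # nspam changes, revision bumps
fresh = word.probability(meta)   # recomputed against the new nspam
# Simulate a new one-shot process: only the counts survive pickling,
# so the first lookup after loading always recomputes.
reloaded = pickle.loads(pickle.dumps(word))
```

The cache pays off only while the same objects stay in memory across many lookups, as with a long-lived pickle or ZODB session.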

Sounds like there is no caching benefit for one-message-per-invocation
situations like running out of procmail, then. Ouch. Unless I'm
mistaken, by the time that the probability is being computed in your
scheme, the identity of the word has been lost, and thus the probability
can't be stored in a secondary database like I had written, either.
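
A probability cache that outlives a single procmail run has to be keyed by the word itself, which is why losing the word's identity rules it out. The following is a hypothetical sketch of such a secondary database, not the code from the patch mentioned above. It assumes a `dbm.dumb` file, a `"revision:probability"` value encoding, and a caller-supplied `compute` function that stands in for the real count-based calculation. `cached_probability` and `compute` are illustrative names.

```python
import dbm.dumb
import os
import tempfile


def cached_probability(db, word, revision, compute):
    """Return word's probability, reusing a stored value computed at the
    same corpus revision (hypothetical helper)."""
    key = word.encode("utf-8")
    stored = db.get(key)
    if stored is not None:
        rev, prob = stored.decode("ascii").split(":", 1)
        if int(rev) == revision:
            return float(prob)  # cache hit, even in a brand-new process
    prob = compute(word)
    # repr() of a float round-trips exactly through float().
    db[key] = f"{revision}:{prob!r}".encode("ascii")
    return prob


calls = []


def compute(word):
    calls.append(word)  # record how often the expensive path runs
    return 0.75         # stand-in for the real count-based calculation


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "probs")
    for _ in range(2):  # two simulated procmail invocations
        db = dbm.dumb.open(path, "c")
        try:
            result = cached_probability(db, "viagra", revision=7, compute=compute)
        finally:
            db.close()  # dbm.dumb writes its index on close
```

Only the first simulated invocation calls `compute`. The second reads the stored value back from disk, at the cost of keeping a second file in sync with the counts.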

I suppose that there are enough performance penalties in the procmail
scenario (Python startup, options loading, various other overhead)
that computing all the probabilities from counts is small change.

- Alex (overly critical)