[Spambayes] proposed changes to hammie & co.

Tim Stone - Four Stones Expressions tim@fourstonesExpressions.com
Wed Nov 20 05:28:55 2002


11/19/2002 11:09:35 PM, Neale Pickett <neale@woozle.org> wrote:

>So then, Tim Stone - Four Stones Expressions <tim@fourstonesExpressions.com> is all like:
>
>> Neale, I just checked in dbdict and Bayes.  Lemme know what you think.  
>
>Okay, so you're just copying the file and then renaming it later.  It
>looks like you're trying to wrap the dbm file with a transactional
>model.  Copying isn't an atomic operation though, so locking will be a
>problem.  
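The copy-then-rename pattern Neale describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was checked in: `commit_copy` is a hypothetical helper, and it assumes the dbm database lives in a single file (some dbm backends split it across several). The copy itself still isn't atomic, but copying into a temporary file and then renaming it over the live file means a reader sees either the old file or the new one, never a half-written one. It does nothing about two writers committing at the same time, so locking is still needed.

```python
import os
import shutil
import tempfile


def commit_copy(working_path, live_path):
    """Publish a working copy over the live file (hypothetical helper).

    shutil.copyfile() is not atomic: a reader could see a half-written file.
    Copying to a temp file in the same directory and then calling
    os.replace() means the live path is swapped in one step, because the
    rename is atomic on POSIX. Concurrent writers still need a lock.
    """
    directory = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".commit-")
    os.close(fd)
    try:
        shutil.copyfile(working_path, tmp_path)  # slow, non-atomic step
        os.replace(tmp_path, live_path)          # atomic swap of the live path
    except BaseException:
        # Don't leave a stray temp file behind if the copy or rename fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```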

See below... I don't think Richie is so much after transactionality as a 'forget 
it' mode.

I agree that locking is a problem.  I don't like the implementation too 
much... I experimented with keeping an in-memory cache, but then memory 
consumption gets hard to manage.  These bayes databases might get kinda large... So 
I figured I'd let the dbm implementation manage memory.  If it's too stupid to 
do a good job, then we (someone) should fix that.

Perhaps in the long run, ZODB is the final answer.  But pickles in particular 
are so portable... dbm files are so fast... different strokes for different 
folks, I guess.
 
>
>I still don't understand why a DBDict needs load/store.  It'd be so much
>easier to just have store() call self.db.sync() and make load() a noop.  Is
>there something out there which depends on the disk version being
>different from the memory version?
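A minimal sketch of Neale's suggestion, assuming the standard library's `shelve` module as the dbm wrapper (the class name `SyncingDBDict` is hypothetical, not a Spambayes class). `store()` only calls `sync()` and `load()` does nothing, because the dbm file already holds the state. Note that `Shelf.sync()` only synchronizes the file "if feasible", so how much actually reaches disk still depends on the dbm backend underneath.

```python
import shelve


class SyncingDBDict:
    """A dbm-backed mapping where the on-disk file *is* the state.

    Hypothetical sketch: store() just flushes and load() is a no-op, so the
    disk and memory versions are never meant to diverge on purpose.
    """

    def __init__(self, path):
        # writeback=False: every assignment goes straight to the dbm layer.
        self.db = shelve.open(path, flag="c", writeback=False)

    def __getitem__(self, key):
        return self.db[key]

    def __setitem__(self, key, value):
        self.db[key] = value

    def __delitem__(self, key):
        del self.db[key]

    def __contains__(self, key):
        return key in self.db

    def load(self):
        pass  # nothing to do: reads already come from the dbm file

    def store(self):
        self.db.sync()  # ask the dbm layer to flush, if it supports that

    def close(self):
        self.db.close()
```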

As nearly as I can tell, the dbm implementations vary on when they write stuff 
to persistent storage.  Sync only offers the guarantee that the memory and 
persistent versions match.  Richie has presented the requirement that the 
dictionary be able to forget what has happened...
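One way to get that 'forget' behaviour regardless of when a given dbm backend writes is to keep uncommitted changes in a separate in-memory buffer and only push them into the dbm file on `store()`. This is a hypothetical sketch: `ForgetfulDBDict`, `forget()`, and the use of `shelve` are assumptions for illustration, not Spambayes code. Its memory cost grows with the changes made since the last `store()` rather than with the size of the whole database, which is the trade-off against a full in-memory cache.

```python
import shelve

_DELETED = object()  # marker for committed keys deleted since the last store()


class ForgetfulDBDict:
    """Hypothetical sketch: changes live in memory until store().

    The dbm file only ever holds committed state, so forget() can discard
    everything since the last store(), no matter when the underlying dbm
    implementation chooses to write.
    """

    def __init__(self, path):
        self.db = shelve.open(path, flag="c")
        self.pending = {}  # key -> new value, or _DELETED

    def __getitem__(self, key):
        if key in self.pending:
            value = self.pending[key]
            if value is _DELETED:
                raise KeyError(key)
            return value
        return self.db[key]

    def __setitem__(self, key, value):
        self.pending[key] = value

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        if key in self.db:
            self.pending[key] = _DELETED  # hide the committed value
        else:
            del self.pending[key]         # never committed; just drop it

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def store(self):
        # Push pending changes into the dbm file, then flush.
        for key, value in self.pending.items():
            if value is _DELETED:
                del self.db[key]
            else:
                self.db[key] = value
        self.pending.clear()
        self.db.sync()

    def forget(self):
        self.pending.clear()  # drop everything since the last store()

    def close(self):
        self.db.close()
```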

>
>> Also, I tried pop3proxy with the playground branch, and it doesn't
>> work.  It looks like we got an older (back-level) Options.py.  I'm not sure
>> how to get it up to snuff...
>
>There was a thinko in pop3proxy, but now I'm getting a weird
>AssertionError.  Is this something with the Persistence classes, maybe?
>It looks like nspam isn't getting updated:
>
>Traceback (most recent call last):
>  File "/usr/lib/python2.3/threading.py", line 410, in __bootstrap
>    self.run()
>  File "/usr/lib/python2.3/threading.py", line 398, in run
>    apply(self.__target, self.__args, self.__kwargs)
>  File "./pop3proxy.py", line 1306, in runProxy
>    state.bayes.learn(tokenizer.tokenize(spam1), True)
>  File "classifier.py", line 298, in learn
>    self.update_probabilities()
>  File "classifier.py", line 345, in update_probabilities
>    assert spamcount <= nspam
>AssertionError
>
>Workin' on it.
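The assertion encodes an invariant: no token can have appeared in more spam messages than the total number of spam messages trained (`nspam`). The sketch here is a hypothetical, heavily simplified model of that check; `ToyClassifier` and `learn_spam` are illustrative names, not the real `classifier.py` API, and the scenario is one possible cause rather than a confirmed diagnosis. If the per-token counts are persisted but `nspam` comes back from a stale value, the next training call trips the assertion.

```python
class ToyClassifier:
    """Hypothetical stand-in that tracks only the counts behind the assert."""

    def __init__(self, nspam=0, wordinfo=None):
        self.nspam = nspam
        # token -> number of spam messages the token appeared in
        self.wordinfo = wordinfo if wordinfo is not None else {}

    def learn_spam(self, tokens):
        self.nspam += 1
        for token in set(tokens):  # count each token once per message
            self.wordinfo[token] = self.wordinfo.get(token, 0) + 1
        self.update_probabilities()

    def update_probabilities(self):
        for token, spamcount in self.wordinfo.items():
            # Same invariant as the failing line in the traceback.
            assert spamcount <= self.nspam, (token, spamcount, self.nspam)


# Token counts survive a reload, but nspam is reset: the invariant breaks.
trained = ToyClassifier()
trained.learn_spam(["free", "money", "free"])
trained.learn_spam(["money"])
stale = ToyClassifier(nspam=0, wordinfo=dict(trained.wordinfo))
```

Calling `stale.learn_spam(["hello"])` raises `AssertionError`, because "money" has a count of 2 while `nspam` is only 1.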
>
>
- Tim
www.fourstonesExpressions.com 



