[spambayes-dev] comment assertion error? revisit DBDictClassifier assumptions?

Kenny Pitt kennypitt at hotmail.com
Wed Dec 24 08:44:48 EST 2003


Tim Peters wrote:
> [Kenny Pitt]
>> With the caching and optimization in the database engines being what
>> it is today, it seems that we might be better off to always write
>> changes to the DB immediately and dispense with the whole
>> self.changed_words thing altogether.
> 
> This should be measured; it's not (or shouldn't be) a religious
> issue.  I have no experience with general-purpose database engines
> that are actually fast; only some that aren't as slow as others
> <0.5 wink>. 

As always, never assume anything without thorough testing, right? <wink>
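
For what it's worth, here is a rough sketch of the kind of measurement
I have in mind.  It is not the real classifier code: WriteThroughStore,
DeferredStore and time_store are hypothetical names, the workload is
synthetic, and it uses whatever backend the standard dbm module picks
rather than our actual storage layer, so the numbers only show how one
might compare the two strategies.

```python
import dbm
import os
import tempfile
import time


class WriteThroughStore:
    """Hypothetical store that writes every change to the database at once."""

    def __init__(self, path):
        self.db = dbm.open(path, "c")

    def set_count(self, word, count):
        self.db[word.encode("utf-8")] = str(count).encode("ascii")

    def close(self):
        self.db.close()


class DeferredStore(WriteThroughStore):
    """Hypothetical store that buffers changes, like a changed_words cache."""

    def __init__(self, path):
        super().__init__(path)
        self.changed_words = {}

    def set_count(self, word, count):
        # Nothing reaches the database until close() flushes the buffer.
        self.changed_words[word] = count

    def close(self):
        for word, count in self.changed_words.items():
            super().set_count(word, count)
        self.changed_words.clear()
        super().close()


def time_store(store_class, n_words=500, n_updates=4):
    """Return (seconds, record count) for a synthetic training run."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "words")
        start = time.perf_counter()
        store = store_class(path)
        for update in range(1, n_updates + 1):
            for i in range(n_words):
                store.set_count("word%d" % i, update)
        store.close()
        elapsed = time.perf_counter() - start
        # Reopen read-only to confirm both strategies stored the same keys.
        db = dbm.open(path, "r")
        try:
            n_records = len(db.keys())
        finally:
            db.close()
    return elapsed, n_records


if __name__ == "__main__":
    for cls in (WriteThroughStore, DeferredStore):
        seconds, n_records = time_store(cls)
        print("%-18s %d records in %.3f s" % (cls.__name__, n_records, seconds))
```

A real test would need to run against the actual database files and a
realistic training corpus before drawing any conclusion.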

>> When there are multiple processes that could be using the database
>> at the same time, any caching (read or write) that we do ourselves
>> outside the database engine has the potential to generate
>> inconsistencies in the data anyway.
> 
> A conclusion there, one way or the other, depends on specific details.
> Concurrent read-write access is never simple, and I'm not sure anyone
> uses spambayes that way anyway.

As far as I can tell, this should only happen with
sb_filter/sb_mboxtrain.  All the other solutions that I know about
(Outlook, sb_server, sb_imapfilter, sb_xmlrpcserver) have a single
server process that handles all database access.  I also suspect that
any remaining solutions are rarely used, since I hardly ever see them
mentioned on any of the mailing lists.
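
To make the inconsistency concern concrete, here is a toy sketch of a
lost update.  It is only an illustration: CachingClassifier and
lost_update_demo are hypothetical names, a plain dict stands in for the
shared database, and two objects in one process stand in for two
separate processes.

```python
class CachingClassifier:
    """Hypothetical stand-in for a classifier with a private write cache."""

    def __init__(self, shared_db):
        self.db = shared_db        # stands in for the on-disk database
        self.changed_words = {}    # private updates not yet written out

    def get(self, word):
        if word in self.changed_words:
            return self.changed_words[word]
        return self.db.get(word, 0)

    def increment(self, word):
        self.changed_words[word] = self.get(word) + 1

    def flush(self):
        self.db.update(self.changed_words)
        self.changed_words.clear()


def lost_update_demo():
    """Train the same word from two 'processes' and return the stored count."""
    shared_db = {}
    a = CachingClassifier(shared_db)
    b = CachingClassifier(shared_db)
    a.increment("viagra")
    b.increment("viagra")   # b cannot see a's unflushed change
    a.flush()
    b.flush()               # overwrites a's count with b's own value
    return shared_db["viagra"]


if __name__ == "__main__":
    print(lost_update_demo())   # two increments were made, but 1 is stored
```

Writing every change through immediately narrows that window, but it
does not remove the need for locking around each read-modify-write.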

This leads to a question regarding the proposed direct BerkeleyDB
storage.  If we never access the database from more than one process at
the same time, do we really need a full-fledged multi-process
environment for Berkeley?  You can create private, multi-threaded
environments that provide sufficient locking for a single process, with
less overhead.
Any guesses from anyone as to what cases would require cross-process
locking?

-- 
Kenny Pitt



