[spambayes-dev] Re: [Spambayes] fatal error?
Skip Montanaro
skip at pobox.com
Tue Aug 26 13:32:38 EDT 2003
>> I'll check with Sleepycat, but it seems to me that the most expedient
>> course would be to acquire a lock around database accesses.
Tim> Brrrr. Running a Berkeley backend is already soooooo much slower
Tim> than running from a dict. I didn't really notice that until the
Tim> SoBig worm turds started swamping my inbox, but after a few days
Tim> of that I switched back to using a pickled dict. Adding a lock
Tim> around each stinkin' access is a good way to soak up excess cycles,
Tim> anyway <wink>.
I suspect that the Outlook plugin simply makes it easier to find problems
(more users, more worm mail, more concurrent threads, whatever). I think
the same (or a similar) problem would exist were two instances of
hammiefilter running at the same time, both trying to update the file. I'm
just fortunate enough to have never encountered that problem. Even using a
pickle, you really ought to use some sort of lock protocol when reading or
writing the pickle file if there's any chance of concurrent access by
another process or thread. That you only read it at the beginning and write
it at the end only limits the opportunity for collision.
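As a rough illustration of the kind of lock protocol meant above, here is a minimal sketch. It assumes modern Python 3 on a Unix-like system (fcntl is not available on Windows). The helper names (locked, load_db, save_db) and the ".lock" sidecar file are hypothetical and not part of SpamBayes.

```python
import fcntl
import os
import pickle
import tempfile
from contextlib import contextmanager


@contextmanager
def locked(lock_path, exclusive):
    """Hold an advisory flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Shared lock for readers, exclusive lock for writers.
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def load_db(path):
    """Read the pickle while holding a shared lock."""
    with locked(path + ".lock", exclusive=False):
        with open(path, "rb") as f:
            return pickle.load(f)


def save_db(db, path):
    """Write the pickle while holding an exclusive lock."""
    with locked(path + ".lock", exclusive=True):
        # Write to a temporary file in the same directory, then rename it
        # over the original, so a crash never leaves a half-written pickle.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(db, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)
```

This only keeps a reader from seeing a partial write and keeps two writers from interleaving. A process that loads, trains, and saves can still overwrite another process's update unless it holds the exclusive lock across the whole read-modify-write cycle.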
I just (re)ran a little experiment. (I'm sure we've done this in the past.)
I took my current hammie.db (153685 keys, no hapaxes, the result of
processing 11,000+ hams and 8,000+ spams) and converted it to a pickle using
dbExpImp. Startup time is dramatically different:
% time python -c 'import pickle ; db = pickle.load(open("hammie.pck"))'
real 0m32.193s
user 0m22.850s
sys 0m0.430s
% time python -c 'import cPickle ; db = cPickle.load(open("hammie.pck"))'
real 0m5.650s
user 0m3.720s
sys 0m0.350s
% time python -c 'import shelve ; db = shelve.open("hammie.db")'
real 0m0.155s
user 0m0.050s
sys 0m0.050s
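For anyone wanting to repeat the comparison on their own data, the sketch below times the same two operations from inside Python. It assumes modern Python 3, where there is no separate cPickle module (pickle uses the C implementation automatically), and it builds a small synthetic database rather than a real hammie.db. The function names, file names, and token keys are made up for illustration, and the timings it produces are not comparable to the figures above.

```python
import os
import pickle
import shelve
import tempfile
import time


def build_samples(directory, n_keys=1000):
    """Write the same synthetic mapping as a pickle and as a shelf."""
    data = {"token%d" % i: (i, i + 1) for i in range(n_keys)}
    pck_path = os.path.join(directory, "sample.pck")
    with open(pck_path, "wb") as f:
        pickle.dump(data, f)
    shelf_path = os.path.join(directory, "sample_shelf")
    with shelve.open(shelf_path) as shelf:
        shelf.update(data)
    return pck_path, shelf_path


def time_pickle_load(path):
    """Return (seconds, key count) for loading the whole pickle."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        db = pickle.load(f)
    return time.perf_counter() - start, len(db)


def time_shelve_open(path, probe_key):
    """Return (seconds, value) for opening the shelf and reading one key."""
    start = time.perf_counter()
    shelf = shelve.open(path, flag="r")
    try:
        value = shelf[probe_key]
    finally:
        shelf.close()
    return time.perf_counter() - start, value


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    pck_path, shelf_path = build_samples(workdir)
    print("pickle load: %.4fs" % time_pickle_load(pck_path)[0])
    print("shelve open: %.4fs" % time_shelve_open(shelf_path, "token7")[0])
```

The pickle has to be read and rebuilt in memory in full before the first lookup, while opening the shelf only touches as much of the file as the lookup needs, which is the startup difference the timings above reflect.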
This is not to imply that my huge database is typical or that my usage of
hammiefilter is either. Using pickles for moderately sized training
databases would probably work, regardless of the application. With
long-running SB apps like the Outlook plugin or pop3proxy, pickles are
probably the way to go, since the load time is paid only once at startup.
(Maybe it's time to give up on hammiefilter
altogether.)
Skip