[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat

Richie Hindle richie at entrian.com
Wed Dec 31 20:54:51 EST 2003


As part of trying to reproduce the DBRunRecoveryError problems (a task
that I'm giving up on for now - see below) I've written a script to hammer
the core SpamBayes code, repeatedly training and classifying using
faked-up messages.  It manages about 40 train-and-classify loops per
second on my 2.4GHz P4, *except* between about 100 and 400 messages, when
the performance drops to about a tenth of that and then recovers.

I've done enough investigation to know that the time is being spent in the
core SpamBayes code and not my script, that it's only the occasional
message that takes a long time (around a second in a few cases) and that
it can be either training or classifying that slows down.
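
A minimal sketch of that kind of per-operation timing, for anyone who wants to look for the slow messages themselves. This is not the code in testtools/hammer.py; it is written for modern Python 3, and `time_operations` and `slow_threshold` are names assumed here for illustration. Each operation is any zero-argument callable, such as one wrapping a single train or classify call.

```python
import time


def time_operations(operations, slow_threshold=0.25):
    """Time each zero-argument callable and pick out the slow ones.

    Returns (timings, slow), where both are lists of (index, seconds)
    and `slow` holds only the entries at or above `slow_threshold`.
    """
    timings = []
    for index, operation in enumerate(operations):
        start = time.perf_counter()
        operation()  # e.g. train on, or classify, one message
        elapsed = time.perf_counter() - start
        timings.append((index, elapsed))

    # Keep only the operations that took suspiciously long.
    slow = [(index, secs) for index, secs in timings if secs >= slow_threshold]
    return timings, slow
```

Timing training and classifying as separate callables shows which of the two is responsible for any given slow iteration.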

I've committed the script as testtools/hammer.py, and I offer this as a
curiosity to anyone interested.  I'm not going to pursue this myself
because I've never seen a similar complaint about real-world SpamBayes
use.

The script includes code to build fake emails that look similar to
real-world ones, but which are all unique and include random elements.
Maybe this will be useful to someone in the future.  It works by taking a
small collection of real emails and chopping pieces out of them at random,
then stitching them back together.
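
The chop-and-stitch idea can be sketched roughly as follows. This is an illustration only, not the code in testtools/hammer.py: `make_fake_message`, `pieces` and the trailing random token are assumptions made here, and it treats each source email as a plain non-empty string rather than a parsed message.

```python
import random


def make_fake_message(sources, pieces=4, rng=None):
    """Build one fake message by stitching random slices of real ones.

    `sources` is a list of non-empty strings (the real emails).
    """
    if rng is None:
        rng = random.Random()

    chunks = []
    for _ in range(pieces):
        text = rng.choice(sources)
        # Pick a random non-empty slice of the chosen source.
        start = rng.randrange(len(text))
        end = rng.randrange(start, len(text)) + 1
        chunks.append(text[start:end])

    # A random token makes it very unlikely that two messages are identical.
    chunks.append("token-%032x" % rng.getrandbits(128))
    return "\n".join(chunks)
```

Passing a seeded `random.Random` makes a run repeatable, which helps when trying to reproduce a slowdown at a particular message count.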

I don't think the script is going to be a lot of use in tracking down
DBRunRecoveryErrors - it *will* reproduce them as it is, but only by
mimicking a bug that was fixed in 1.0a6, and people have still been
complaining about DBRunRecoveryErrors in 1.0a6 and 1.0a7.

Having read up on full-mode bsddb, and bsddb-backed ZODB (including the
phrases "The underlying Berkeley database technology requires maintenance,
careful system resource planning, and tuning for performance." and
"BerkeleyDB never deletes "old" log files. Eventually, if you do not
maintain your Berkeley database by deleting "old" log files, you will run
out of disk space") I've given up - for the moment at least - on trying to
use full-mode bsddb (with or without ZODB).  sb_server users should use a
pickle and be done with it.  Maybe we should change the default.  Maybe
it's five to two and I should be in bed.

-- 
Richie Hindle
richie at entrian.com



