[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat

Tim Peters tim.one at comcast.net
Thu Jan 1 22:41:03 EST 2004


[Tony Meyer]
> ...
> Maybe this is one of the causes for the RUNRECOVERY errors - the user
> doesn't close sb_server properly, so the db isn't properly
> closed/saved. Some time later the error occurs.  It would explain why
> they are less frequent with the plug-in, because the plug-in saves
> the db much more often (after every "delete as spam"/"recover from
> spam" event, and IIRC after any incremental training).

Well, yes and no <wink>.  It depends on the backend database.  If it's a
giant pickled dict, the addin almost never saves it (just at "full retrain"
and "shutdown" times).  But if it's a Berkeley backend, the db is saved
after every training event, via DBDictClassifier.store(), which calls the
db's sync() method at the end.
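
A minimal sketch of that save path, for illustration only.  This is not the
actual spambayes code: the stdlib shelve module stands in for the Berkeley
backend, and TinyDBClassifier, learn() and the `changed` dict are made-up
names.  Buffering changes in memory is also an assumption of the sketch.  The
only point is the shape: store() writes out what changed and ends with sync().

```python
import shelve


class TinyDBClassifier:
    """Hypothetical stand-in for a classifier backed by an on-disk db."""

    def __init__(self, path):
        # A stdlib shelf plays the role of the Berkeley db here (assumption).
        self.db = shelve.open(path)
        # word -> (spamcount, hamcount) for changes not yet written to the db.
        self.changed = {}

    def learn(self, words, is_spam):
        """Update counts in memory only; nothing is written to the db yet."""
        for word in words:
            if word in self.changed:
                spam, ham = self.changed[word]
            else:
                spam, ham = self.db.get(word, (0, 0))
            if is_spam:
                spam += 1
            else:
                ham += 1
            self.changed[word] = (spam, ham)

    def store(self):
        """Write pending changes, then sync() as the last step."""
        for word, counts in self.changed.items():
            self.db[word] = counts
        self.changed.clear()
        self.db.sync()
```

Calling store() right after each learn() gives the "saved after every
training event" behavior described above; calling it only at shutdown gives
the "almost never saves" behavior.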

I think you may be on to something here!  It's always baffled me that I
couldn't provoke a DB corruption problem from Outlook even when I
deliberately power-cycled the box *while* spambayes was scoring new
messages.  I damned near lost my main .pst file doing crap like that, but
the Berkeley DB was never bothered.  But the db is never in an unsync'ed
state during scoring, it's only in an unsync'ed state after
DBDictClassifier._wordinfoset() writes out a hapax, or _wordinfodel()
removes a key from the database, during learning or unlearning, and then the
addin syncs again right after the (single) message is learned or unlearned.

> Does anyone see how it could hurt to have sb_server save the db after
> doing a page of training?  (This would just be a one line addition to
> onReview in ProxyUI.py).

+1 on trying it.  The corruption problems are critical, and this may well
help.  Hell, sync it after each message gets trained.
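
A rough sketch of the two choices, for illustration only.  train_page() and
its arguments are hypothetical, and a stdlib shelf stands in for the real
database; this is not the onReview code from ProxyUI.py.  It just shows
"sync after each message" next to "sync once per page".

```python
import shelve


def train_page(db, page, sync_every_message=True):
    """Train on one page of judged messages and return the number of syncs.

    `db` is an open shelf mapping word -> (spamcount, hamcount).
    `page` is a list of (words, is_spam) pairs.  Both are assumptions.
    """
    syncs = 0
    for words, is_spam in page:
        for word in words:
            spam, ham = db.get(word, (0, 0))
            if is_spam:
                db[word] = (spam + 1, ham)
            else:
                db[word] = (spam, ham + 1)
        if sync_every_message:
            # The db is unsynced only while one message is being learned.
            db.sync()
            syncs += 1
    if not sync_every_message:
        # One sync for the whole page: cheaper, but a longer unsynced window.
        db.sync()
        syncs += 1
    return syncs
```

Syncing per message costs more disk traffic, but it keeps the unsynced
window down to a single message, which is what the addin already does.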

> ...
> * Everything crashes sooner or later on this machine - python.exe,
> gcc.exe, IE, ... I'm sure that it's unrelated to spambayes or python.

Ya, that's a well-known Linux bug <wink>.  Computers suck, you know.



