[spambayes-dev] A new and altogether different bsddb breakage
Tim Peters
tim.one at comcast.net
Wed Dec 17 20:16:30 EST 2003
[Richie]
> Fantastic. So in theory at least...
>
> o All the SpamBayes programs could use BDB-backed ZODB instead of
> directly using bsddb.
Yes, but then they also have to use persistent objects in ZODB's sense of
the word. That's not scary to me, because SpamBayes was originally designed
with ZODB's BTrees in mind as the mapping data structure. There is no
*direct* access to bsddb via ZODB: you interact with ZODB's view of the
world, and BDB is just a (mostly) invisible, and wholly inaccessible,
implementation detail.
> o They would automatically work nicely together with a single writer
> (eg. sb_server is training while sb_filter is classifying),
Surprise! Nope. The reader will suffer a ReadConflictError if it tries to
access anything that's been modified by the writer since the reader began
its current transaction. This protects the reader from seeing inconsistent
data. The reader is always in *some* transaction, so you can't worm around
this.
ZODB 3.3 will support "multiversion concurrency control", which will give
the reader the state of the data as it was *at* the time the reader's
transaction began, so there are no ReadConflictErrors then. But that hasn't
been released yet.
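A toy model may make the difference concrete. This is only a sketch of the
two behaviors described above, not ZODB code: ToyStore, Reader, the mvcc
flag, and the local ReadConflictError class are all made up for
illustration.

```python
class ReadConflictError(Exception):
    """Stand-in for the error described above; not ZODB's own class."""


class ToyStore:
    """Keeps every committed revision of each key, tagged with a commit number."""

    def __init__(self):
        self.tid = 0          # number of the latest commit
        self.revisions = {}   # key -> list of (tid, value), oldest first

    def commit(self, changes):
        self.tid += 1
        for key, value in changes.items():
            self.revisions.setdefault(key, []).append((self.tid, value))


class Reader:
    """A reader whose transaction begins when the object is created."""

    def __init__(self, store, mvcc):
        self.store = store
        self.mvcc = mvcc
        self.start_tid = store.tid   # latest commit visible when we began

    def get(self, key):
        history = self.store.revisions[key]
        latest_tid, latest_value = history[-1]
        if latest_tid <= self.start_tid:
            return latest_value            # untouched since we began
        if not self.mvcc:
            raise ReadConflictError(key)   # refuse to mix old and new state
        # MVCC: hand back the newest revision that existed when we began.
        for tid, value in reversed(history):
            if tid <= self.start_tid:
                return value
        raise KeyError(key)                # key did not exist at our start


store = ToyStore()
store.commit({"spam": 1, "ham": 1})
old_style = Reader(store, mvcc=False)
new_style = Reader(store, mvcc=True)
store.commit({"spam": 2})   # the writer trains while both readers are open
```

Here old_style can still read "ham", which the writer left alone, but
reading "spam" raises ReadConflictError. new_style reads "spam" as it was
when its transaction began.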
> and with a bit more work catching ConflictErrors, we could even have
> multiple writers.
ConflictErrors can only be guaranteed not to happen now if there are no
writers.
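The "bit more work catching ConflictErrors" would presumably boil down to
a retry loop. A minimal sketch, assuming a locally defined ConflictError
and a hypothetical run_with_retries helper (neither is ZODB's API, and a
real version would also have to abort the failed transaction before
trying again):

```python
class ConflictError(Exception):
    """Hypothetical stand-in for a storage layer's conflict error."""


def run_with_retries(work, attempts=3):
    """Call work() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except ConflictError:
            if attempt == attempts:
                raise   # out of attempts: let the caller see the conflict


calls = []


def flaky_train():
    """Pretend training step that loses the race twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConflictError("someone else committed first")
    return "trained"


result = run_with_retries(flaky_train)
```

Retrying only helps if the work can safely be redone from scratch, which
is the case when it re-reads its inputs on every attempt.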
> o The database wouldn't get significantly bigger than with direct
> use of bsddb.
That one's hard to guess in advance. The BDB back end creates a number of
distinct database tables to support ZODB's ideas of object identity, object
revisions, and how objects all tie together. That's all metadata, on top of
the application data we work with directly. But BTrees are a pretty
space-efficient structure, and there are builtin flavors of BTree that are
especially compact for mappings having integers as keys or values.
> o Since BDB uses bsddb in transaction mode rather than single-file
> mode, we can say goodbye to those nasty little DBRunRecovery
> errors. Yay!
That would be great -- although I still haven't seen one of these, despite
running 3 different Outlooks on 3 different bsddb3's for a loooong time now!
>> I'm half ready to declare that ZODB is the only database anyone
>> should ever use
> apply to BDB-backed ZODB, or only to ZODB's native storage?
ZODB's BTrees rock. The backend storage format is just a detail. ZODB
doesn't have a native format, BTW -- you get the kind of storage you
explicitly ask for (there is no default), and I bet there are at least 10
flavors of storage by now. FileStorage is by far the most frequently used.
We should all be aware that BDB-backed ZODB is a pretty new thing, and isn't
yet used in production anywhere that I'm aware of. FileStorage has been
through the wringer at sites with enormous loads for years, so is easier to
trust -- and its pragmatics are much better understood too. Tuning BDB
appears to be a major undertaking even on a tuning-friendly platform like
Linux.
> Unless there's something I'm missing (licensing problems, deployment
> problems, portability problems...?)
ZODB is OSI-certified Open Source, like Python. You can even piss on it and
sell the result as art, if you want to <wink>.
> it could be that we should replace our current DBDictClassifier (which
> suffers from DBRunRecovery errors and isn't multiprocess-safe) with a
> ZODBClassifier using a BDB back end. From a position of complete
> ignorance, I'd hazard a guess that the implementation would end up a
> lot simpler than rewriting DBDictClassifier to use bsddb in full-on
> transactional mode - the hassles of doing that have already been
> sorted out in ZODB.
Having never written anything myself using bsddb3's "real" interface, I
can't say how hard that would be. I *expect* it would actually be easy for
someone with a non-trivial understanding of BDB. The only use we have for
BDB now is to use it as if it were a giant dict -- it probably doesn't get
any simpler than that.
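For a sense of what "a giant dict" means in practice, here is a sketch
using the standard library's dbm.dumb module as a stand-in for bsddb. The
file name, the tokens, and the "spamcount hamcount" value layout are all
assumptions made up for the example.

```python
import dbm.dumb
import os
import tempfile

# Create a throwaway database file; "c" means create it if missing.
path = os.path.join(tempfile.mkdtemp(), "wordinfo")
db = dbm.dumb.open(path, "c")

# Store and fetch by key, exactly as with a dict (keys and values are bytes).
db[b"viagra"] = b"120 3"
db[b"python"] = b"2 95"
db.close()

# Reopen read-only; the data has survived on disk.
db = dbm.dumb.open(path, "r")
spam_count, ham_count = (int(n) for n in db[b"viagra"].split())
keys = sorted(db.keys())
db.close()
```

Nothing here involves transactions, locking, or recovery, which is the
part that full-on transactional mode would add.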
> Am I in cloud cuckoo land?
Na, talk is cheap and always sane <wink>.