[spambayes-dev] A new and altogether different bsddb breakage
Tim Peters
tim.one at comcast.net
Wed Dec 17 17:16:44 EST 2003
[Richie Hindle]
> This is probably a hopelessly naive question, but can I have the best
> of both worlds? If I use ZODB with a BerkeleyDB back end, will that
> be process- and thread-safe (without using ZEO)?
My understanding is that, regardless of back end, ZODB is thread-safe among
the threads in a single process, but that you cannot open a connection to a
ZODB database from more than one process simultaneously without using ZEO.
Don't consider ZEO to be such a big deal, though: code using ZEO looks
exactly the same as code not using ZEO, except for the lines that initially
open the database. Where a direct use of ZODB may open a FileStorage, for
example, the same code wishing to use ZEO would open a ClientStorage
instead, and that's it. Once you are using ZEO, you get distributed access
for free (you can connect to the ZEO server via an arbitrary <hostname,
port> pair, so you can access a ZODB database living anywhere your network
can reach).
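Here is a minimal sketch of that point, assuming a ZODB/ZEO install. Only the storage constructor changes between the two cases, and everything from `ZODB.DB(storage)` on is identical. The function name `open_database`, the file name `spambayes.fs`, and any example address are hypothetical choices for illustration, not anything in spambayes.

```python
def open_database(zeo_address=None, path="spambayes.fs"):
    """Return a ZODB DB backed by a local FileStorage or, if zeo_address is
    given, by a ZEO ClientStorage talking to a ZEO server.

    open_database and its defaults are hypothetical, for illustration only.
    """
    import ZODB  # imported lazily so this module loads without ZODB installed

    if zeo_address is None:
        # Direct access to a local file: one process at a time.
        from ZODB.FileStorage import FileStorage
        storage = FileStorage(path)
    else:
        # Shared access through a ZEO server at a (hostname, port) pair.
        from ZEO.ClientStorage import ClientStorage
        storage = ClientStorage(zeo_address)

    # From here on, application code is the same either way.
    return ZODB.DB(storage)
```

For example, `open_database(("localhost", 8100))` would connect to a ZEO server assumed to be listening on that port. In both cases callers continue with `conn = db.open()` and `root = conn.root()`.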
Note that Jeremy already wrote code to run spambayes via ZEO, in the
project's pspam/ directory. I don't know how much bitrot that's suffered.
Note too that in addition to getting the best of both worlds, you may also
get the worst of both worlds. For example, if BDB really does suffer
corruption problems, then it would be something of a miracle if ZODB-on-BDB
were somehow immune.
Also note that the full ZODB back ends (like FileStorage and Berkeley)
support unlimited undo, so the physical database keeps every revision ever
made to every object. They therefore need a 'pack' step from time to time:
you specify a time and promise never to care about revisions from before
it, and the physical database can then reclaim their space.
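The pack trade-off can be modeled with a toy in-memory revision history. This is not ZODB's implementation or API; `pack` and `state_at` are hypothetical helpers, and real storages handle details this sketch ignores. It shows the essential effect: after packing to time T, the state as of T and everything later survive, but older revisions (and so undo back past T) are gone.

```python
from bisect import bisect_right


def pack(history, pack_time):
    """Toy pack: history maps oid -> list of (timestamp, state), oldest first.

    Keeps, for each object, the newest revision at or before pack_time (so the
    state as of pack_time survives) plus every later revision.
    """
    packed = {}
    for oid, revisions in history.items():
        times = [t for t, _ in revisions]
        first_after = bisect_right(times, pack_time)  # first revision newer than pack_time
        keep_from = max(first_after - 1, 0)           # include the state current at pack_time
        packed[oid] = revisions[keep_from:]
    return packed


def state_at(history, oid, when):
    """Return oid's state as of time `when`, or None if no kept revision is that old."""
    revisions = history[oid]
    i = bisect_right([t for t, _ in revisions], when)
    return revisions[i - 1][1] if i else None


history = {"x": [(1, "a"), (3, "b"), (7, "c")]}
packed = pack(history, 5)
print(packed["x"])               # [(3, 'b'), (7, 'c')]
print(state_at(packed, "x", 5))  # b     (state as of the pack time is kept)
print(state_at(packed, "x", 2))  # None  (revision 'a' was reclaimed)
```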
Finally, note that any form of concurrent modification can end up creating
inconsistent data. ZODB solves this by raising ConflictError whenever
inconsistency is possible, and the app has to be prepared to catch that (the
usual response then is to try the transaction again, and on the second
attempt it will *start* with the data successfully committed by the other
transaction(s) involved in the conflict). That could be a real problem if
many threads or processes keep modifying the same info simultaneously (like
the counts attached to, say, "the").
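The catch-and-retry pattern can be sketched with a toy optimistic store. `ConflictError` here is a local stand-in for ZODB's `ZODB.POSException.ConflictError`, and `ToyStore` and `run_with_retries` are hypothetical. The toy also detects conflicts with a single store-wide version number, which is coarser than ZODB's per-object checks. The demo at the end forces one conflict on the count for "the", so the retry is visible: the second attempt starts from the other transaction's committed value.

```python
import threading


class ConflictError(Exception):
    """Local stand-in for ZODB.POSException.ConflictError."""


class ToyStore:
    """Hypothetical optimistic store guarded by one store-wide version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = {}
        self.version = 0

    def begin(self):
        # Start a transaction: remember which version we read, work on a copy.
        with self._lock:
            return self.version, dict(self.counts)

    def commit(self, read_version, new_counts):
        # Refuse the commit if anyone else committed after we began.
        with self._lock:
            if read_version != self.version:
                raise ConflictError("data changed since this transaction began")
            self.counts = new_counts
            self.version += 1


def run_with_retries(store, work, max_attempts=5):
    """Run work(snapshot) and commit it, retrying on ConflictError.

    Returns the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        version, snapshot = store.begin()  # each retry starts from fresh data
        work(snapshot)
        try:
            store.commit(version, snapshot)
            return attempt
        except ConflictError:
            continue  # someone else won; try again on top of their commit
    raise ConflictError("gave up after %d attempts" % max_attempts)


# Demo: force exactly one conflict on the shared count for "the".
store = ToyStore()
first_call = [True]


def bump_the(counts):
    if first_call[0]:
        first_call[0] = False
        # Simulate another thread committing its own increment mid-transaction.
        version, other = store.begin()
        other["the"] = other.get("the", 0) + 1
        store.commit(version, other)
    counts["the"] = counts.get("the", 0) + 1


attempts = run_with_retries(store, bump_the)
print(attempts, store.counts)  # 2 {'the': 2}
```

With many writers hammering the same key, each of them can lose repeatedly, which is the cost the paragraph above warns about.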