[Spambayes] The database question that would not die

Skip Montanaro skip@pobox.com
Mon Dec 2 13:03:52 2002


    Richie>  o There may be platforms on which anydbm defaults to bsddb
    Richie>    1.85, but for which installing bsddb3 is a pain.  Any takers?

I think there are some misunderstandings still out there about the various
incarnations of the bsddb module and the underlying Berkeley DB code.  Even
if everyone understands what's what, the language I see used suggests they
might not.  Let me try to make sure everyone has a similar grasp of the issues
and terminology.

There has been a bsddb or dbhash module in Python for quite a while (five
years at least).  It requires the Berkeley DB library, originally available
from UC Berkeley, but now from Sleepycat (whose founders were grad students
at Berkeley when they wrote the earliest versions).  The original bsddb
module was written against Berkeley DB 1.85.  That version defined two
interfaces, a C API (the version 1.85 API) and a file format (the version
1.85 file format).  If you ask file(1) about a file created with it, the
version number it reports will likely differ from the library version.  File
format versions and library release versions have no obvious correspondence
to the untrained observer.
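
To make the API-versus-file-format distinction concrete, here is a rough
sketch of telling the two hash file format families apart by looking at the
file header.  It assumes the same check that Python's whichdb module uses:
the hash magic number 0x00061561 sits in the first four bytes of a
1.85-format hash file, and after a 12-byte pad in files written by later
library versions.  The function name is made up for illustration, and btree
files are not handled.

```python
import struct

# Hash magic number, as checked by Python's whichdb module (assumption:
# only hash files are considered here, not btree or recno files).
HASH_MAGIC = 0x00061561


def guess_hash_format(header):
    """Guess the hash file format family from the first 16 bytes of a file."""
    if len(header) < 16:
        return "unknown"
    # 1.85-format hash files carry the magic number at offset 0.
    # Try both byte orders, since files may come from either kind of machine.
    for fmt in ("<L", ">L"):
        if struct.unpack(fmt, header[0:4])[0] == HASH_MAGIC:
            return "1.85-format hash"
    # Later file formats put a 12-byte pad in front of the magic number.
    for fmt in ("<L", ">L"):
        if struct.unpack(fmt, header[12:16])[0] == HASH_MAGIC:
            return "later-format hash"
    return "unknown"
```

You would feed it the first 16 bytes of a file opened in binary mode, e.g.
guess_hash_format(open(path, "rb").read(16)), where path is whatever
database file you are curious about.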

There were various bugs in the code in db 1.85.  To correct (some of) those
bugs, file format changes were necessary.  This originally happened in
version 1.86, which, unfortunately, was never widely adopted (licensing
issues?).  The C API didn't change.

When version 2.x of Berkeley DB was released (I think by Sleepycat shortly
after its founding), they changed the file formats again and added a new C
API.  The old version 1.85 C API was still available (and still is even in
the most recent versions).  That 1.85 API is what the original bsddb module
was written against.

When version 3.x of Berkeley DB was released, Sleepycat added another C API
(or at least extended the version 2 API significantly).  Pybsddb (aka
bsddb3, aka the current bsddb module in CVS) was written against this richer
API.  This API remained current through version 4.0.x of Sleepycat's
offerings.  Unfortunately, in version 4.1.x, they changed some aspects of it
which cause problems for Pybsddb.  Consequently, you can't build Pybsddb
against the 4.1.x library.

So, here's a summary of what works with what:

    The historic bsddb module (bsddb185 in CVS now) works with any version
    of the Berkeley DB library as long as the 1.85 C API is enabled.  If you
    use it with version 1.85 of the library you may experience data
    corruption problems because of bugs in the code and file structure (not
    the 1.85 API).  You can use it safely with later versions of the library
    as long as the 1.85 API was enabled during configuration.

    The current bsddb module (Pybsddb, bsddb3, bsddb in CVS) works with
    versions 3.x and 4.0.x of the Berkeley DB library.
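
The summary above can be written down as a small lookup, which may be easier
to scan than prose.  This is only a sketch of the rules as stated in this
message; the function names and the (major, minor) tuple convention are
invented for illustration and are not part of any bsddb module.

```python
def module_can_use(module, lib_version, api_185_enabled=False):
    """Return True if `module` should work with Berkeley DB `lib_version`.

    module          -- "bsddb185" (historic module) or "bsddb" (Pybsddb/bsddb3)
    lib_version     -- (major, minor) tuple, e.g. (1, 85) or (4, 0)
    api_185_enabled -- whether a 2.x or later library was configured with the
                       1.85 compatibility API
    """
    major, minor = lib_version[:2]
    if module == "bsddb185":
        # The 1.x libraries provide the 1.85 API natively; later ones only
        # provide it when it was enabled at configuration time.
        return major == 1 or api_185_enabled
    if module == "bsddb":
        # Pybsddb needs the 3.x API, which stayed current through 4.0.x.
        return major == 3 or (major, minor) == (4, 0)
    raise ValueError("unknown module: %r" % (module,))


def has_known_corruption_risk(lib_version):
    """The 1.85 library itself has bugs that may corrupt data."""
    return tuple(lib_version[:2]) == (1, 85)
```

For example, module_can_use("bsddb", (4, 1)) is False, which is the 4.1.x
build problem described earlier, while module_can_use("bsddb185", (4, 1),
api_185_enabled=True) is True.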

Skip


