[Spambayes] The database question that would not die

Skip Montanaro skip at pobox.com
Tue Dec 3 18:10:28 2002

    richie> Are there any platforms on which, when you ask anydbm to create
    richie> a database, it uses version 1.85 of the underlying Berkeley DB
    richie> library to do that?

Yes, unfortunately the Python Windows installer is distributed with Berkeley
DB 1.85.  On other platforms it's a hit-or-miss proposition.  I don't
believe any Linux vendors ship with db1 as the default anymore, but I could
easily be disabused of that notion.  I don't know about the commercial Unix
vendors.
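
The quickest way to find out what a particular installation uses is to create a throwaway database and ask whichdb what format it wrote.  Below is a minimal sketch.  It assumes Python 3, where anydbm became dbm, whichdb.whichdb became dbm.whichdb, and the bsddb modules are gone, so a Berkeley DB file typically comes back as "" rather than a named format.  Under Python 2 the same check was whichdb.whichdb().  default_backend is a hypothetical helper name:

```python
import dbm
import os
import tempfile


def default_backend():
    """Create a throwaway database the way dbm.open() would and report
    which backend actually wrote it (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe")
        # Flag "c" makes dbm.open() use the first backend available on this
        # interpreter, much as anydbm.open() did in Python 2.
        with dbm.open(path, "c") as db:
            db[b"key"] = b"value"
        # whichdb() inspects the files on disk and returns a module name
        # such as "dbm.gnu" or "dbm.dumb", "" for an unrecognized format,
        # or None if nothing can be opened at that path.
        return dbm.whichdb(path)


if __name__ == "__main__":
    print(default_backend())
```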

Has anyone considered Sleepycat's caveats about using 1.85?  The relevant
page is here:


The Q&A entry about 1.85 reads:

    Are there known problems with the 1.85 and 1.86 versions? 

    Yes. Specifically, we recommend that you avoid the following operations
    when using versions 1.85 and 1.86:

    * Btree cursor (seq and put using a cursor) operations. 
    * Large numbers of btree duplicates (specifically, avoid migrating
      duplicate keys to internal pages).
    * Large numbers of btree deletes (you should periodically dump and
      rebuild the database if you delete large numbers of records).
    * Overwriting or deleting overflow hash key/data pairs (pairs with items
      larger than the page size).
    * Intermixing hash cursor operations with deletes. 

    In addition: 

    * As there was no locking support in version 1.85, you cannot perform
      concurrent read/write operations in the database.
    * As there was no logging or transaction support in version 1.85, you
      must re-create your database whenever abnormal application termination
      occurs (e.g., either the application or the system crashes) as the
      database may have been left in a corrupted state.

    Finally, you should not upgrade your GNU gcc or Solaris
    compiler. Optimizations in versions of gcc 2 that were in alpha test in
    the summer of 1997, and a version of the standard Solaris WorkShop
    Compiler that was in beta test in the fall of 1997, trigger bugs in
    versions 1.85 and 1.86 that will cause sporadic core dumps.

It seems to me the most important issues for us are the last two bullets in
the first section and the last bullet in the second section.  How close can
we come to avoiding them?  I don't think we should have any overflow hash
key/data pairs.  The largest item in my current hammie.db file is only 108
bytes.  Does the code do things like

    foo = db.next()
    if someprop(foo):
        del db[foo[0]]

?  If not, that may not be a problem either.  The "abnormal termination" bit
bothers me some, based on historical prejudices about Windows'
(in)stability.  I imagine others can speak to that.
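
Both questions can be checked against a live database.  Below is a minimal sketch.  It assumes Python 3's dbm interface rather than the anydbm/bsddb objects of the day, and largest_item and delete_matching are hypothetical helpers.  The first reports the biggest key+value pair so it can be compared with the page size, which varies by backend and is not checked here.  The second sidesteps intermixing cursor operations with deletes: it collects the doomed keys in one pass and deletes them in a second:

```python
import dbm
import os
import tempfile


def largest_item(db):
    """Return the size in bytes of the largest key + value pair in db
    (0 for an empty database).  Pairs larger than the page size would
    be overflow pairs."""
    return max((len(k) + len(db[k]) for k in db.keys()), default=0)


def delete_matching(db, predicate):
    """Delete every record whose (key, value) satisfies predicate.

    All matching keys are gathered before the first delete, so no record
    is removed while a walk over the database is still in progress.
    Returns the number of records deleted."""
    doomed = [k for k in db.keys() if predicate(k, db[k])]
    for k in doomed:
        del db[k]
    return len(doomed)
```

Listing every key before deleting anything costs some memory.  That seems a fair trade for never deleting under an open cursor.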

