[Patches] [ python-Patches-553108 ] Deprecate bsddb

Tue, 02 Jul 2002 14:52:16 -0700

Patches item #553108, was opened at 2002-05-07 05:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470

Category: Modules
Group: Python 2.3
>Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Garth T Kidd (gtk)
Assigned to: Skip Montanaro (montanaro)
Summary: Deprecate bsddb

Initial Comment:
Large numbers of inserts break bsddb, as first 
discovered in Python 1.5 (bug 408271). 

According to Barry Warsaw, "trying to get the bsddb 
module that comes with Python to work is a hopeless 
cause." 

If it's broken, let's discourage people from using it. 
In particular, let's ensure that people importing 
shelve or anydbm don't end up using it by default. 

The submitted patch adds a DeprecationWarning to the 
bsddb module and removes bsddb from the list of db 
module candidates in anydbm. 
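
As a rough illustration of the two changes described above (this is not the submitted patch), the sketch below emits a DeprecationWarning and picks the first importable backend from a candidate list that omits the deprecated module. The names `warn_deprecated`, `pick_backend`, and `CANDIDATES` are hypothetical. The module names are those of current Python rather than 2.3.

```python
import importlib
import warnings

# Hypothetical candidate list: the deprecated backend is simply left out,
# so it can never be picked by default.
CANDIDATES = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def warn_deprecated(name):
    """Emit the kind of warning a deprecated module could raise on import."""
    warnings.warn("the %s module is deprecated" % name,
                  DeprecationWarning, stacklevel=2)


def pick_backend(candidates=CANDIDATES):
    """Return the first importable module among the candidates."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no db modules could be imported")
```

Calling `pick_backend()` with the default list returns whichever of the listed modules imports first on the local installation.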

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-02 23:52

Message:
Logged In: YES 
user_id=45365

Skip,
I'm reopening this bug report: the fix breaks builds on Mac OS X, and I
haven't a clue how to fix it, so I hope you can help. Mac OS X has
/usr/include/ndbm.h (implemented with Berkeley DB, I think) but none of
the corresponding libraries (I assume everything needed is in libc).

Everything worked fine until last week, when configure still took care of defining HAVE_NDBM_H.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-14 22:32

Message:
Logged In: YES 
user_id=44345

Implemented in
  setup.py 1.93
  README 1.147
  configure 1.315
  configure.in 1.325
  pyconfig.h.in 1.42
  Modules/dbmmodule 2.30

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-14 09:16

Message:
Logged In: YES 
user_id=21627

The patch looks good, please apply it.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-14 05:33

Message:
Logged In: YES 
user_id=44345

A couple more tweaks... I forgot to include dbmmodule.c in
the previous patches.  This version of the patch also includes a
modified README file that adds a section about building the
bsddb and dbm modules.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-13 09:35

Message:
Logged In: YES 
user_id=44345

Here's an updated patch.  It's different in a couple ways:

  * Support for Berkeley DB 4.x was added.  You will need to
    configure Berkeley DB with the 1.85 compatibility stuff.

  * I cleaned up the dbm build code a bit.

  * I added a diff for the configure file for people who don't
    have autoconf handy.

Skip

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-11 18:09

Message:
Logged In: YES 
user_id=44345

I think deprecating bsddb is too drastic.  In the first place, the problems
you refer to are in the underlying Berkeley DB library, not in the bsddb
code itself.  In the second place, later versions of the library fix the
problem.

The attached patch attempts to modify setup.py and configure.in to
solve the problem.  It does a couple things differently than the current
CVS version:

  1. It only searches for versions 2 and 3 of the Berkeley DB library by
     default.  People who know what they are doing can uncomment the
     information relevant to version 1.

  2. It moves all the checking code into setup.py.  The header file checks
     in configure.in were deleted.

  3. The ndbm lookalike stuff for the dbm module is done differently.  This
     has not really been tested yet.  I anticipate further changes will be
     necessary with this code.

I'm sure it's not perfect.  Please give it a try and let me know how it
works for you.

All that said, I think a better migration path is to replace the current
module with the bsddb3/pybsddb stuff.  I think that would effectively
restrict you to versions 3 or 4 of the underlying Berkeley DB library, so
it probably couldn't be done with impunity. 

Skip
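
The version search described in item 1 above might look roughly like the sketch below. It is not the code from the attached patch. The directory lists, the `db2`/`db3` subdirectory names, and the function name `find_db_header` are assumptions made for illustration.

```python
import os

# Hypothetical search locations; a real build script would derive these
# from the platform and from user-supplied options.
INCLUDE_DIRS = ["/usr/local/include", "/usr/include"]

# Version-specific subdirectories to try, newest first. Version 1 is left
# out on purpose, mirroring the "only versions 2 and 3 by default" idea.
VERSION_SUBDIRS = ["db3", "db2"]


def find_db_header(include_dirs=INCLUDE_DIRS, subdirs=VERSION_SUBDIRS,
                   header="db.h"):
    """Return (directory, subdir) of the first header found, or None."""
    for subdir in subdirs:
        for base in include_dirs:
            candidate = os.path.join(base, subdir, header)
            if os.path.isfile(candidate):
                return os.path.join(base, subdir), subdir
    return None
```

Someone who needs version 1 would add its location to the lists by hand, which keeps the known-problematic library from being picked up silently.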

----------------------------------------------------------------------

Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-20 20:14

Message:
Logged In: YES 
user_id=276840

#!/bin/python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.

# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.

# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to
# 90,909 records
# (an average of 44 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212
import unittest, sys, os, math, time

LIMIT=4000000
DISPLAY_AT_END=1

# If set, number of keys is approximately LIMIT/USE_RANDOM
USE_RANDOM=100
AUTO_RANDOM=1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM=int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2
##  The format of the value string is
##      count|hash|hash...|b
##  Where
##      count is an 8 byte hexadecimal count of the number of times
##          this record has been written.
##      hash is the md5 hash of the random value that created this record.
##          It is the key for this record. It is appended once for each
##          time the record is written (that is, it occurs count times).
##      b is 129 '!'
## if USE_RANDOM is set, its value should be >= 2

class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os
        if USE_RANDOM:
            import random
            random.seed()
            max_key=int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129       # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            self.db = db
            for count in xrange(1, LIMIT+1):
                if count % 100==0:
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s'% (old_count,
                                                  ((key+'|')*old_count), b)
                        if rec != should_be:
                            self.fail("Mismatched data: db["+repr(key)+"]="+
                                repr(db[key])+". Should be "+repr(should_be))
                            return 1
                    else: # New record
                        rec = '00000000|'+b
                        old_count = 0
                    new_count = old_count+1
                    new_rec = '%08X|%s%s'% (new_count, key, rec[8:], )
                    db[key] = new_rec
                else:
                    m.update(str(count))
                    db[m.digest()] = b
            try:
                db.sync()
            except:
                pass
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, "  count = %6i db[%s]=%s" % (
                        count, rec[0], rec[1], )
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        self.db.close()
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))
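
The record format used by the program above can be checked on its own, without a database. The sketch below is not part of the original test. The helper names `make_key`, `expected_record`, and `next_record` are hypothetical, and it is written for current Python (`hashlib` in place of the old `md5` module).

```python
import hashlib

PAD = "!" * 129  # trailing filler, as in the test program


def make_key(r):
    """Key is the hex MD5 digest of the decimal string of r."""
    return hashlib.md5(str(r).encode("ascii")).hexdigest()


def expected_record(key, count):
    """Record as it should look after `count` writes of `key`."""
    return "%08X|%s%s" % (count, (key + "|") * count, PAD)


def next_record(key, rec=None):
    """Apply one read-modify-write step to a record (None means new)."""
    if rec is None:
        rec = "00000000|" + PAD
    count = int(rec[0:8], 16) + 1
    # rec[8:] starts with '|', so the key lands right after the new count.
    return "%08X|%s%s" % (count, key, rec[8:])
```

Applying `next_record` repeatedly to the same key should always produce a string equal to `expected_record(key, n)` after the n-th write, which is the invariant the test program checks against the database contents.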

----------------------------------------------------------------------

Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-17 01:10

Message:
Logged In: YES 
user_id=276840

I am not sure there is a reason to deprecate bsddb. The 
btopen format appears to be stable enough for normal work. 
Maybe 2.3 should change dbhash to use btopen?

----------------------------------------------------------------------

Comment By: Garth T Kidd (gtk)
Date: 2002-05-09 05:12

Message:
Logged In: YES 
user_id=59803

Let's not turn a simple patch into something requiring a 
PEP, compulsory thrashing on comp.lang.python, SleepyCat 
being willing to change their distribution model, lawyers 
(to make sure the licences are compatible), and so on. 

I'd hate it if other people spent the kind of time I did 
trying to get shelve to work only to find that a known-
broken bsddb was causing all the problems, and that a patch 
was there to gently guide them to gdbm, but it got jammed 
because of scope-creep. 

Let's get this one, very simple and necessary (bsddb IS 
broken) change out of the way, and THEN start negotiating, 
thrashing, and integrating. :) 

I firmly believe bsddb3 should be one of the included 
batteries. Let's do it, but let's guide people away from 
broken code first. 

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-08 11:01

Message:
Logged In: YES 
user_id=21627

I'm in favour of this change, but I'd like to incorporate bsddb3
at the same time.

----------------------------------------------------------------------
