[Patches] [ python-Patches-553108 ] Deprecate bsddb
noreply@sourceforge.net
Tue, 02 Jul 2002 15:17:37 -0700
Patches item #553108, was opened at 2002-05-06 22:46
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470
Category: Modules
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Garth T Kidd (gtk)
Assigned to: Skip Montanaro (montanaro)
Summary: Deprecate bsddb
Initial Comment:
Large numbers of inserts break bsddb, as first
discovered in Python 1.5 (bug 408271).
According to Barry Warsaw, "trying to get the bsddb
module that comes with Python to work is a hopeless
cause."
If it's broken, let's discourage people from using it.
In particular, let's ensure that people importing
shelve or anydbm don't end up using it by default.
The submitted patch adds a DeprecationWarning to the
bsddb module and removes bsddb from the list of db
module candidates in anydbm.
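The patch described above makes two changes: it warns when bsddb is used, and it removes bsddb from the modules anydbm tries. Both can be sketched generically. The first helper performs an ordered search and returns the first importable backend, which is the idea behind anydbm's candidate list. The second emits a DeprecationWarning when called. This is a minimal sketch, not the submitted patch. The function names are hypothetical, and it uses importlib from later Python versions.

```python
import importlib
import warnings

def first_available(candidates):
    """Return the first module in `candidates` that can be imported.

    Illustrates an anydbm-style ordered search; dropping a name from
    `candidates` (as the patch does for bsddb) removes it from the search.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed; try the next candidate
    raise ImportError("none of %r could be imported" % (candidates,))

def open_legacy_db(path):
    """Hypothetical stand-in for an entry point of a deprecated module."""
    warnings.warn("bsddb is deprecated; consider gdbm or dbm instead",
                  DeprecationWarning, stacklevel=2)
    return path
```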
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-02 17:17
Message:
Logged In: YES
user_id=44345
Jack,
Sorry to hear you're having trouble. Alas, my MacOS X system is with
my wife at the moment, so I can't dig into the problem much. Can you
provide me with some background info? If you can send me your copy
of ndbm.h (I doubt it's using Berkeley DB) and figure out which library
dbm_open resides in, that would be great. Also, can you provide me
with the output of the build process so I can see just what errors are
being generated?
Skip
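To find out which library provides dbm_open, you could probe from Python with ctypes, as sketched below. The more usual route is a configure-style link test or running nm over the candidate libraries. The library names passed in are guesses, and the helper itself is hypothetical.

```python
import ctypes
import ctypes.util

def find_symbol_library(symbol, libnames):
    """Return a label for the first library in `libnames` that exports `symbol`.

    A None entry means "whatever is already loaded into this process"; on
    most Unix systems that includes libc. Returns None when nothing matches.
    """
    for name in libnames:
        if name is None:
            label, path = "<process>", None
        else:
            path = ctypes.util.find_library(name)
            if path is None:
                continue  # no library installed under that name
            label = path
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue  # found on disk but could not be loaded
        if hasattr(lib, symbol):  # CDLL raises AttributeError for missing symbols
            return label
    return None
```

For example, `find_symbol_library("dbm_open", [None, "ndbm", "db", "gdbm_compat"])` returns `"<process>"` if dbm_open is already available in libc, which is what Jack suspects is the case on Mac OS X.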
----------------------------------------------------------------------
Comment By: Jack Jansen (jackjansen)
Date: 2002-07-02 16:52
Message:
Logged In: YES
user_id=45365
Skip,
I'm reopening this bug report: the fix breaks builds on Mac OS X, and I have no idea how to fix it, so I hope you can help. Mac OS X has /usr/include/ndbm.h (implemented with Berkeley DB, I think) but doesn't have any of the corresponding libraries (I assume everything needed is in libc).
Everything worked fine until last week, while configure still took care of defining HAVE_NDBM_H.
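Below is one way a setup.py-style probe could handle the situation Jack describes, where ndbm.h exists but no separate ndbm library does. This is a hypothetical sketch, not the code in the patch. It assumes the library would appear as libndbm with one of a few common suffixes. When no library is found, it links nothing extra and relies on dbm_open being in libc.

```python
import os

def dbm_extension_settings(inc_dirs, lib_dirs):
    """Hypothetical probe for building the dbm module against ndbm.h.

    Returns None when ndbm.h is absent; otherwise the macro to define and
    the libraries to link.
    """
    if not any(os.path.isfile(os.path.join(d, "ndbm.h")) for d in inc_dirs):
        return None
    lib_files = ("libndbm.a", "libndbm.so", "libndbm.dylib")  # assumed names
    has_libndbm = any(os.path.isfile(os.path.join(d, f))
                      for d in lib_dirs for f in lib_files)
    return {
        "define_macros": [("HAVE_NDBM_H", None)],
        # No libndbm found: assume dbm_open lives in libc (as on Mac OS X).
        "libraries": ["ndbm"] if has_libndbm else [],
    }
```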
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-06-14 15:32
Message:
Logged In: YES
user_id=44345
Implemented in
setup.py 1.93
README 1.147
configure 1.315
configure.in 1.325
pyconfig.h.in 1.42
Modules/dbmmodule.c 2.30
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-06-14 02:16
Message:
Logged In: YES
user_id=21627
The patch looks good, please apply it.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-06-13 22:33
Message:
Logged In: YES
user_id=44345
A couple more tweaks: I forgot to include dbmmodule.c in
previous patches. This version of the patch also includes a
modified README file that adds a section about building the
bsddb and dbm modules.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-06-13 02:35
Message:
Logged In: YES
user_id=44345
Here's an updated patch. It differs in a couple of ways:
* Support for Berkeley DB 4.x was added. You will need to
configure Berkeley DB with its 1.85 compatibility interface enabled.
* I cleaned up the dbm build code a bit.
* I added a diff for the configure file for people who don't
have autoconf handy.
Skip
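A build script could confirm that a Berkeley DB installation was built with the 1.85 compatibility interface. When that interface is enabled, which Berkeley DB's configure script does with --enable-compat185, Berkeley DB installs the db_185.h header. The helper below is a hypothetical sketch that looks for that header alongside db.h. It is not the check the patch performs.

```python
import os

def compat185_include_dir(inc_dirs):
    """Return the first directory holding both db.h and db_185.h, else None.

    Hypothetical helper: db_185.h is only present when Berkeley DB was
    built with its 1.85 compatibility interface.
    """
    for d in inc_dirs:
        if (os.path.isfile(os.path.join(d, "db.h"))
                and os.path.isfile(os.path.join(d, "db_185.h"))):
            return d
    return None
```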
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-06-11 11:09
Message:
Logged In: YES
user_id=44345
I think deprecating bsddb is too drastic. In the first place, the problems
you refer to are in the underlying Berkeley DB library, not in the bsddb
code itself. In the second place, later versions of the library fix the
problem.
The attached patch attempts to modify setup.py and configure.in to
solve the problem. It does a couple things differently than the current
CVS version:
1. It only searches for versions 2 and 3 of the Berkeley DB library by
default. People who know what they are doing can uncomment the
information relevant to version 1.
2. It moves all the checking code into setup.py. The header file checks
in configure.in were deleted.
3. The ndbm lookalike stuff for the dbm module is done differently. This
has not really been tested yet. I anticipate further changes will be
necessary with this code.
I'm sure it's not perfect. Please give it a try and let me know how it
works for you.
All that said, I think a better migration path is to replace the current
module with the bsddb3/pybsddb stuff. I think that would effectively
restrict you to versions 3 or 4 of the underlying Berkeley DB library, so
it probably couldn't be done with impunity.
Skip
----------------------------------------------------------------------
Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-20 13:14
Message:
Logged In: YES
user_id=276840
#!/usr/bin/env python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.
# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.
# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to 90,909 records
# (an average of 44 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212
import unittest, sys, math

LIMIT = 4000000
DISPLAY_AT_END = 1
USE_RANDOM = 100  # If set, number of keys is approximately LIMIT/USE_RANDOM
AUTO_RANDOM = 1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM = int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2

## The format of the value string is
##     count|hash|hash...|b
## where
##     count is an 8-character hexadecimal count of the number of times
##           this record has been written.
##     hash  is the md5 hash of the random value that created this record.
##           It is the key for this record. It is appended once for each
##           time the record is written (that is, it occurs count times).
##     b     is a string of 129 '!' characters.
## If USE_RANDOM is set, its value should be >= 2.

class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb
        if USE_RANDOM:
            import random
            random.seed()
            max_key = int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129  # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            for count in xrange(1, LIMIT + 1):
                if count % 100 == 0:
                    # Progress indicator; the trailing comma suppresses the newline
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    # Pick a random record and do a read-modify-write on it
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s' % (old_count,
                                                   (key + '|') * old_count, b)
                        if rec != should_be:
                            self.fail("Mismatched data: db[" + repr(key) + "]=" +
                                      repr(rec) + ". Should be " + repr(should_be))
                    else:  # New record
                        rec = '00000000|' + b
                        old_count = 0
                    new_count = old_count + 1
                    # Bump the count and insert one more copy of the key
                    new_rec = '%08X|%s%s' % (new_count, key, rec[8:])
                    db[key] = new_rec
                else:
                    # Sequential mode: every write creates a new binary key
                    m.update(str(count))
                    db[m.digest()] = b
            try:
                db.sync()
            except Exception:
                pass  # sync errors are deliberately ignored
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, " count = %6i db[%s]=%s" % (
                        count, rec[0], rec[1])
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        # runTest already closes the database in its finally clause,
        # so only the file needs to be removed here.
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))
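The record invariant that the test verifies on every read can be isolated into two small functions. One builds the expected value after a given number of writes. The other performs a single read-modify-write step. The function names are hypothetical, but the string formats are the ones the test uses. The functions run on Python 2 or 3 and need no bsddb, so the invariant can be checked in isolation.

```python
PAD = "!" * 129  # the trailing filler string `b` from the test

def expected_record(count, key, pad=PAD):
    """Value after `count` writes: 8 hex digits, `key|` repeated count times, pad."""
    return "%08X|%s%s" % (count, (key + "|") * count, pad)

def rewrite(rec, key):
    """One read-modify-write step: bump the count, insert one more copy of key."""
    if rec is None:  # new record, as in the test's else branch
        rec = "00000000|" + PAD
    old_count = int(rec[0:8], 16)
    return "%08X|%s%s" % (old_count + 1, key, rec[8:])
```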
----------------------------------------------------------------------
Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-16 18:10
Message:
Logged In: YES
user_id=276840
I am not sure there is a reason to deprecate bsddb. The
btopen format appears to be stable enough for normal work.
Maybe 2.3 should change dbhash to use btopen?
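In Python 2, dbhash's open(file, flag='r', mode=0666) simply returned bsddb.hashopen(file, flag, mode). The suggestion would amount to routing that call to bsddb.btopen instead, roughly as in the hypothetical sketch below. The backend is a parameter only so the routing can be exercised without bsddb installed. bsddb itself exists only in Python 2.

```python
def btree_open(file, flag="r", mode=0o666, backend=None):
    """Hypothetical dbhash-style open() that uses a B-tree file, not a hash file."""
    if backend is None:
        import bsddb as backend  # Python 2 only; bsddb was removed in Python 3
    return backend.btopen(file, flag, mode)
```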
----------------------------------------------------------------------
Comment By: Garth T Kidd (gtk)
Date: 2002-05-08 22:12
Message:
Logged In: YES
user_id=59803
Let's not turn a simple patch into something requiring a
PEP, compulsory thrashing on comp.lang.python, SleepyCat
being willing to change their distribution model, lawyers
(to make sure the licences are compatible), and so on.
I'd hate it if other people spent the kind of time I did
trying to get shelve to work only to find that a known-
broken bsddb was causing all the problems, and that a patch
was there to gently guide them to gdbm, but it got jammed
because of scope-creep.
Let's get this one very simple and necessary change (bsddb IS
broken) out of the way, and THEN start negotiating,
thrashing, and integrating. :)
I firmly believe bsddb3 should be one of the included
batteries. Let's do it, but let's guide people away from
broken code first.
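Users can sidestep the default backend search entirely by opening a shelf on a backend they choose. The sketch below uses the Python 3 module names: dbm.gnu is the successor of gdbm, and dbm.dumb is the successor of dumbdbm. It prefers GNU dbm and falls back to the pure-Python implementation. The function name is hypothetical.

```python
import shelve

def open_shelf(path):
    """Open a shelf on an explicitly chosen dbm backend (Python 3 names)."""
    try:
        import dbm.gnu as backend   # GNU dbm, formerly the gdbm module
    except ImportError:
        import dbm.dumb as backend  # pure-Python fallback, formerly dumbdbm
    return shelve.Shelf(backend.open(path, "c"))
```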
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-05-08 04:01
Message:
Logged In: YES
user_id=21627
I'm in favour of this change, but I'd like to incorporate
bsddb3 at the same time.
----------------------------------------------------------------------