[Python-bugs-list] [ python-Bugs-445862 ] bsddb fails for larger amount of data

noreply@sourceforge.net noreply@sourceforge.net
Sat, 04 Aug 2001 19:59:19 -0700


Bugs item #445862, was opened at 2001-07-30 00:21
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=445862&group_id=5470

Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: bsddb fails for larger amount of data

Initial Comment:
The attached script fails after approx. 72500 insert 
operations. If you vary the size of the keys and/or 
the values, the bug occurs earlier or later, but even 
with a value size of 1 the bug will occur. This probably 
also explains bug #408271 ("crash in shelve module").

Platform: W2K



----------------------------------------------------------------------

>Comment By: Barry Warsaw (bwarsaw)
Date: 2001-08-04 19:59

Message:
Logged In: YES 
user_id=12800

if you can live with the licensing for sleepycat's db3, do 
yourself a huge favor and go to pybsddb.sf.net.  robin 
dunn's got a very excellent, stable, new python binding, 
which i would like to integrate into the standard distro 
for the py2.2 release.  it claims to support db1.85, 
although i've only tried it with a very recent v3.9.x.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2001-08-04 19:42

Message:
Logged In: YES 
user_id=44345

   I don't know anything about the history, present, or 
   prospects for bsddb -- like, is there a more recent 
   unencumbered version we could use?

Ya got me.  I've been using lib db 2 for quite a while.  They
recently released lib db 3 (again, with file format
incompatibilities).  I don't know the details of their license.
It just comes with whatever version of Linux I happen to be
running.

Saw this on the Sleepycat website:

   The Berkeley DB 3.0 source code is available for download
   at no charge from Sleepycat Software's Web site, at
   www.sleepycat.com. It runs on all common versions of
   UNIX, and on Windows 95, Windows 98 and Windows NT.
   Berkeley DB is an Open Source product, and may be
   redistributed without charge in many circumstances.
   Licensing and pricing information are available from
   the company. 

My guess would be that you can distribute lib db 3 with 
the binary version of Python.  I am, as they say, "not
a lawyer", so YMMV.  For a definitive answer I think
you'll have to ask Sleepycat.

Skip


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-04 17:02

Message:
Logged In: YES 
user_id=31435

Skip, I reran the test after changing the open line to

db = bsddb.btopen("test.dbm", "n")

I killed it by hand at this point:

Last i: 326577, last key:abcdef4387101.63608

because Win98SE gets mondo unstable when it starts 
thrashing madly to disk, and it became impossible to get 
any work done while this was running.

I don't know anything about the history, present, or 
prospects for bsddb -- like, is there a more recent 
unencumbered version we could use?  It looks like Sam's 
1.85 Windows port is over 5 years old.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-08-04 15:59

Message:
Logged In: NO 

According to www.sleepycat.com/historic.html, 
talking about bsd db:
"we recommend that you avoid the following operations when 
using versions 1.85 and 1.86: 

o Btree cursor (seq and put using a cursor) operations. 
o Large numbers of btree duplicates (specifically, avoid 
migrating duplicate keys to internal pages). 
o Large numbers of btree deletes (you should periodically 
dump and rebuild the database if you delete large numbers 
of records). 
o Overwriting or deleting overflow hash key/data pairs 
(pairs with items larger than the page size). 
o Intermixing hash cursor operations with deletes. "


My problem arises, I think, because I have been doing the 
fourth of these operations - i.e. overwriting long items in 
a hash. The problems others are experiencing perhaps have a 
similar cause, though the original problem summary 
says "even with a value size of 1 the bug will occur", so 
perhaps not.

I'm now using a workaround which involves writing several 
shorter items, each containing a slice of the data formerly 
held in the one long item. For keys I use my old key with a 
subscript number appended. It isn't nice, but it seems to 
be working.

Martin Gradwell.
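
A minimal sketch of the slicing workaround described above, written for
Python 3 and exercised against a plain dict standing in for the database.
The names put_sliced, get_sliced and CHUNK, the "." separator, and the idea
of keeping a slice count under the original key are all assumptions made
for illustration; the message does not say how the slices are counted or
how large they are.

```python
# Sketch only: put_sliced, get_sliced and CHUNK are hypothetical names,
# not part of bsddb, anydbm or shelve.
CHUNK = 200  # assumed slice size; the thread does not give a figure


def put_sliced(db, key, value, chunk=CHUNK):
    """Store value as several short items instead of one long item."""
    pieces = [value[i:i + chunk] for i in range(0, len(value), chunk)]
    # Assumption: the original key holds the number of slices.
    db[key] = str(len(pieces))
    for n, piece in enumerate(pieces):
        # Old key with a subscript number appended, as in the message.
        db["%s.%d" % (key, n)] = piece


def get_sliced(db, key):
    """Reassemble a value written by put_sliced."""
    count = int(db[key])
    return "".join(db["%s.%d" % (key, n)] for n in range(count))
```

Rewriting a key with a shorter value leaves the old higher-numbered slices
in the database; get_sliced ignores them because it only reads as many
slices as the stored count says.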

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2001-08-04 08:12

Message:
Logged In: YES 
user_id=44345

Based upon the traceback Tim reported, my guess is that 
the exception is being raised near the end of bsddb_ass_sub.
Tim, can you give it a try changing anydbm.open to
bsddb.btopen?  As I recall, the significant bug(s) in libdb
were in the hash file implementation.  It's unfortunate
that anydbm has used the hash file all these years, but
it's a bit late to spring that change on unsuspecting
users now without going through a significant transition
period.

Skip


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-03 14:40

Message:
Logged In: YES 
user_id=31435

Thanks for taking a look, Skip!  On Win98SE it dies for me 
like so:

...
70000
71000
72000
Last i: 72758, last key:abcdef1691515.8934
Traceback (most recent call last):
  File "ka.py", line 15, in ?
    db[key] = val
bsddb.error: (0, 'Error')

test.dbm is 37,778,944 bytes at the end.  I assume 
Anonymous has the same problem (if not, he/she should say 
so).

On Windows we use the ancient db.1.85.win32.zip, from 
the "bsd db" (not "bsddb"!) link at

http://www.nightmare.com/software.html

I doubt Sam has done any maintenance on that in years, and 
I'm afraid I don't know anything else about this.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2001-08-03 13:25

Message:
Logged In: YES 
user_id=44345

What version of libdb are you using?  I'm running your
script on Linux at the moment.  I had to change it slightly
because the only machine I have available with the spare 
cojones to run that script is running 1.5.2 (so I call
random.uniform instead of using a Random instance).  On that
machine I'm sort of ashamed to say I'm still running the
known buggy libdb 1.85.  So far I'm up to 680,000 keys with
a db file of over 166MB with no problem.  On my laptop
running 2.1 and libdb3 (and a much more modestly performing
disk drive) I gave up after about 287,000 keys.
I then changed the db open call to bsddb.btopen and watched
it march (slowly) up to 183,000 keys and a 32MB file on
disk before I killed it.  Aside from the grief it gives my
disk drives, I don't see anything particularly bad
happening.

You didn't include a traceback with your bug report.  What
was printed?  Perhaps it's something simple like running
out of disk space.  In any case, I think trying to create a
libdb database of 1,000,000 sort of random keys is going to
strain that package and most disk drives, bugs or no bugs.

My guess is that if there's a bug it's in libdb, not the
bsddb module.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-08-03 00:50

Message:
Logged In: NO 

Here it is:

import anydbm
import bsddb
import random

MAX = 1000000
r = random.Random(42)
r.seed(1017)
db = anydbm.open("test.dbm", "n")
#db = bsddb.hashopen("test.dbm", "n")
try:
    for i in xrange(0, MAX):
        if i % 1000 == 0: print i
        key = "abcdef" + str(r.uniform(0, 10 * MAX))
        val = "a" * 80 + str(i)
        db[key] = val
finally:
    db.close()
    print "Last i: %s, last key:%s" % (i,key) 
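
To see how the failure point moves with the value size, the loop above
could be wrapped in a function. What follows is a rough sketch in Python 3,
not the script that was attached: run_inserts and its parameters are
hypothetical names, db is assumed to be any dict-like object (a plain dict
works for a dry run), and a broad Exception is caught because bsddb.error
may not exist on the interpreter running it.

```python
import random


def run_inserts(db, max_items, value_size=80, seed=1017):
    """Insert pseudo-random keys into db; return (inserted, error).

    error is None if all max_items inserts succeeded, otherwise the
    exception raised by the failing insert.
    """
    rng = random.Random(seed)
    inserted = 0
    error = None
    try:
        for i in range(max_items):
            key = "abcdef" + str(rng.uniform(0, 10 * max_items))
            db[key] = "a" * value_size + str(i)
            inserted += 1
    except Exception as exc:  # the thread reports bsddb.error here
        error = exc
    return inserted, error
```

Calling it with a freshly opened database for each value_size and comparing
the returned counts would show whether the failure comes earlier or later.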

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-02 12:41

Message:
Logged In: YES 
user_id=31435

Alas, there's no script attached -- please attach one, so 
we have something concrete to investigate.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-08-02 03:08

Message:
Logged In: NO 

I was getting crashes in the shelve module, using NT4 (Python 
2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on 
win32). I've changed my program to re-read previously 
written keys fairly frequently, and I get KeyErrors for 
keys that have definitely been written, and that gave no 
error a little earlier in the same program. The program 
doesn't contain any delete statements.

The same program works when using dumbdbm instead of bsddb 
(but produces huge indexes), so there definitely appears to 
be a problem with bsddb on Windows NT.
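
The re-read check described above might look roughly like this. It is a
sketch in Python 3 under stated assumptions, not the program from the
report: write_and_verify, find_bad_keys and check_every are hypothetical
names, and db is assumed to be a dict-like object that hands back exactly
the values it was given (a real dbm object may return bytes).

```python
def find_bad_keys(db, written):
    """Return keys from written that are missing from db or hold a different value."""
    bad = []
    for key, expected in written.items():
        try:
            if db[key] != expected:
                bad.append(key)
        except KeyError:
            bad.append(key)
    return bad


def write_and_verify(db, items, check_every=100):
    """Write (key, value) pairs, re-reading everything written so far
    after every check_every writes. Stops at the first failed check and
    returns the offending keys; returns [] if nothing went wrong."""
    written = {}
    for count, (key, value) in enumerate(items, 1):
        db[key] = value
        written[key] = value
        if count % check_every == 0:
            bad = find_bad_keys(db, written)
            if bad:
                return bad
    # Final pass covers any writes made since the last periodic check.
    return find_bad_keys(db, written)
```

Keeping a second copy of every value in memory limits how far this scales,
but it separates a lost key from a key that was never written.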



