[Python-Dev] Re: test_bsddb blocks testing popitem - reason

Mon Oct 27 04:40:45 EST 2003

> > It is unfortunately entirely possible that various berkeleydb libraries
> > have bugs.  Since the BerkeleyDB db->del() call isn't returning it is
> > presumably stuck in a lock waiting for who knows what.
> 
> Right.  But the SAME berkeley db library is being used for my build of
> both Python 2.4 alpha 0, and 2.3 maintenance branch, both from cvs,
> and I can't see any difference in what they're doing with bsddb -- so
> clearly I must be missing something because it's hanging on EVERY
> attempt to run the unittest w/2.4, but never w/2.3.

The big difference I see between 2.3cvs and 2.4cvs that could "explain"
it is that Lib/bsddb/__init__.py has been updated to use a private
(in memory, single process only) DBEnv with locking and thread support
enabled.  That explains why db->del() would be doing locking.  But not
why it would deadlock.

This is also easily reproducible here.  No special platform or berkeleydb
version should be required.

Looking closer, I suspect what is happening is that the Lib/bsddb/__init__.py
implementation is not threadsafe.  It wants to maintain the current
iterator location using a DBCursor object.  However, having an active
DBCursor holds a lock in the database.  DictMixin's popitem() is
effectively:

    k, v = self.iteritems().next()
    del self[k]
    return (k, v)

The iteritems() call creates an internal DBCursor object for the iterator.
The next() call on the iterator (DBCursor) fetches the first key/value
pair, leaving the cursor open and its lock held.  The following delete
then tries to remove that record without using the DBCursor and waits
on the lock the cursor still holds; thus the deadlock.
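
To make the sequence concrete, here is a toy model of it.  This is only
a sketch: ToyDB and its methods are made-up stand-ins, not the bsddb
API, and a single non-reentrant threading.Lock plays the role of the
lock an open cursor holds.  The delete uses a timeout so the example
reports the blockage instead of hanging the way the unit test does.

```python
import threading


class ToyDB:
    """Hypothetical stand-in for a locking database; NOT the bsddb API."""

    def __init__(self, data):
        self._data = dict(data)
        # Non-reentrant lock standing in for the lock an open cursor holds.
        self._lock = threading.Lock()

    def cursor_first(self):
        # "Opening a cursor": take the lock and keep it until close_cursor().
        self._lock.acquire()
        key = min(self._data)
        return key, self._data[key]

    def close_cursor(self):
        self._lock.release()

    def delete(self, key, timeout=0.1):
        # A non-cursor delete needs the same lock.  The real call would wait
        # forever; the timeout here lets the sketch return False instead.
        if not self._lock.acquire(timeout=timeout):
            return False
        try:
            del self._data[key]
            return True
        finally:
            self._lock.release()


db = ToyDB({"a": 1, "b": 2})
k, v = db.cursor_first()             # like self.iteritems().next()
blocked = not db.delete(k)           # like del self[k] with the cursor open
db.close_cursor()
deleted_after_close = db.delete(k)   # succeeds once the cursor is closed
```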

If we implement our own popitem() for the bsddb dictionary object
(_DBWithCursor) to perform the delete using the cursor this deadlock in
the unit tests would go away.  That won't stop users from intermixing
iteration over a database with modifications to the database, causing
their own deadlocks (very unexpected in single threaded code).
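
A rough sketch of what such a popitem() could look like follows.  It
assumes the cursor offers first(), delete() and close(), as DBCursor
does, and that first() returns None on an empty database (a real cursor
may raise instead, depending on how it is configured).  FakeDB and
FakeCursor are hypothetical in-memory stand-ins so the sketch runs
without Berkeley DB; this is not the code in Lib/bsddb/__init__.py.

```python
def popitem_via_cursor(db):
    """Remove and return one (key, value) pair using a single cursor."""
    cursor = db.cursor()
    try:
        item = cursor.first()
        if item is None:  # assumption: None signals an empty database
            raise KeyError("popitem(): database is empty")
        cursor.delete()   # delete the record the cursor is positioned on
        return item
    finally:
        cursor.close()    # always release whatever the cursor holds


class FakeCursor:
    """Hypothetical in-memory cursor, just enough to exercise the sketch."""

    def __init__(self, data):
        self._data = data
        self._key = None

    def first(self):
        if not self._data:
            return None
        self._key = min(self._data)
        return self._key, self._data[self._key]

    def delete(self):
        del self._data[self._key]

    def close(self):
        pass


class FakeDB:
    """Hypothetical in-memory database exposing only cursor()."""

    def __init__(self, data):
        self.data = dict(data)

    def cursor(self):
        return FakeCursor(self.data)


db = FakeDB({"b": 2, "a": 1})
popped = popitem_via_cursor(db)
```

Because the lookup and the delete go through the same cursor, no second
locker is involved.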

Proposed fix: It should be possible for the bsddb object to maintain
internal state of its own about what key it is on and close any
internal DB cursor on all non-cursor database accesses, leaving the
iteration functions to detect this and reopen and reposition the cursor.
Since the basic bsddb interface doesn't allow databases with duplicate
keys it shouldn't be too difficult.
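
Roughly, the idea could look like the sketch below.  RepositioningDB is
a hypothetical name, and a plain dict plus a boolean flag stand in for
the database and its cursor; with a real cursor the repositioning step
would presumably be something like a set_range() on the remembered key.
It relies on keys being unique, as noted above.

```python
import bisect


class RepositioningDB:
    """Hypothetical wrapper illustrating the idea; not the bsddb code."""

    def __init__(self, data=None):
        self._data = dict(data or {})
        self.cursor_open = False  # stands in for a live DBCursor
        self._last_key = None     # remembered iteration position

    def _close_cursor(self):
        # Called on every non-cursor access so no lock stays held.
        self.cursor_open = False

    def __setitem__(self, key, value):
        self._close_cursor()
        self._data[key] = value

    def __delitem__(self, key):
        self._close_cursor()
        del self._data[key]

    def first(self):
        keys = sorted(self._data)
        if not keys:
            raise KeyError("database is empty")
        self.cursor_open = True
        self._last_key = keys[0]
        return keys[0], self._data[keys[0]]

    def next(self):
        if self._last_key is None:
            return self.first()
        keys = sorted(self._data)
        # Reopen and reposition: first key strictly after the remembered one.
        # This works even if the remembered key has since been deleted.
        i = bisect.bisect_right(keys, self._last_key)
        if i >= len(keys):
            raise KeyError("no more records")
        self.cursor_open = True
        self._last_key = keys[i]
        return keys[i], self._data[keys[i]]


db = RepositioningDB({"a": 1, "b": 2, "c": 3})
first = db.first()
del db["a"]                      # non-cursor access closes the cursor
closed_after_delete = not db.cursor_open
second = db.next()               # iteration reopens and repositions
```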

It's not efficient, but a user who cares about efficient use of berkeleydb
should use the real DB/DBEnv interface directly.

How do Python dictionaries deal with modifications to the dictionary
intermixed with iteration?
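
For comparison, as far as I know a built-in dict does not try to keep
the iterator valid: if the dictionary changes size while it is being
iterated over, the iterator raises RuntimeError.  A small sketch of
that behaviour, together with the usual workaround of iterating over a
snapshot of the keys:

```python
d = {"a": 1, "b": 2, "c": 3}

error = None
try:
    for k in d:
        del d[k]          # changes the dict's size during iteration
except RuntimeError as exc:
    error = str(exc)      # the iterator notices on its next step

# Iterating over a copy of the keys avoids the problem.
for k in list(d):
    del d[k]
```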

Greg