
On Mon, Oct 27, 2003 at 11:25:16AM +0100, Alex Martelli wrote:
I still don't quite see how the lock ends up being "held", but, don't mind me -- the intricacy of mixins and wrappings and generators and delegations in those modules is making my head spin anyway, so it's definitely not surprising that I can't quite see what's going on.
BerkeleyDB internally always grabs a read lock (I believe at the page level; I don't think BerkeleyDB does record-level locking) for any database read when the database is opened with the DB_THREAD | DB_INIT_LOCK flags. I believe the problem is that a DBCursor object holds this lock as long as it is open/exists. Other reads can go on happily, but writes must wait for the read lock to be released before they can proceed.
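As a minimal illustration of that behaviour, assuming the standard bsddb module (the file name is arbitrary, and whether a write actually blocks depends on the BerkeleyDB version and how the environment was opened):

    import bsddb

    db = bsddb.btopen('/tmp/demo.db', 'c')
    db['a'] = '1'
    db['b'] = '2'

    # first() creates the internal DBCursor and positions it on the
    # first record; the cursor pins a read lock on that page for as
    # long as it stays open.
    k, v = db.first()

    # A write from another thread now blocks until the cursor is
    # closed -- and with a single shared cursor, a write from this
    # same code path can deadlock against it, as described above.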
How do Python dictionaries deal with modifications to the dictionary intermixed with iteration?
In general, Python doesn't deal well with modifications to any iterable in the course of a loop using an iterator on that iterable.
The one kind of "modification during the loop" that does work is:
    for k in somedict:
        somedict[k] = ...whatever...
i.e. one can change the values corresponding to keys, but not change the set of keys in any way -- any change to the set of keys can cause endless loops or other such misbehavior (though not deadlocks or crashes...).
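Concretely, for a plain dict (the exact failure mode for key changes varies by version: recent CPython raises a RuntimeError rather than looping):

    d = {'a': 1, 'b': 2}

    # Changing values while iterating is fine:
    for k in d:
        d[k] = d[k] * 10

    # Changing the set of keys is not; depending on the version this
    # raises "RuntimeError: dictionary changed size during iteration"
    # or misbehaves in other ways:
    #     for k in d:
    #         del d[k]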
However, on a real Python dict, k, v = thedict.iteritems().next() doesn't constitute "a loop" -- the iterator object returned by the iteritems call is dropped immediately, since there are no outstanding references to it right after this statement. So a following del thedict[k] is quite all right -- the dictionary isn't being "looped on" at that point.
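So, on a plain dict, the pattern under discussion is perfectly safe:

    thedict = {'a': 1, 'b': 2}

    # The anonymous iterator is garbage-collected as soon as this
    # statement completes, so no iteration is "in progress" here.
    k, v = thedict.iteritems().next()

    del thedict[k]    # fine -- nothing is looping over thedict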
What about the behaviour of multiple iterators over the same dict being used at once (either interleaved or from multiple threads; it shouldn't matter)? I expect that works fine in Python. It is something the _DBWithCursor iteration interface does not currently support, due to its use of a single DBCursor internally. _DBWithCursor is currently written such that the cursor is never closed once created. That leaves tons of potential for deadlock even in single-threaded apps. One solution is to rework _DBWithCursor into a _DBThatUsesCursorsSafely, where each iterator creates its own cursor in an internal pool, and the non-cursor methods that write to the db close all cursors after saving their current() positions, so that the iterators can later reopen and reposition them (see the sketch below).
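A rough sketch of that rework (helper names like _new_cursor and _close_cursors_saving_positions are hypothetical, and error handling plus the actual reopen/reposition step are elided):

    class _DBThatUsesCursorsSafely:
        """Sketch: one cursor per iterator, pooled and closable."""

        def __init__(self, db):
            self.db = db
            self._cursors = []    # pool of live DBCursor objects

        def _new_cursor(self):
            # Each iterator gets its own cursor from the pool.
            dbc = self.db.cursor()
            self._cursors.append(dbc)
            return dbc

        def _close_cursors_saving_positions(self):
            # Called before any write: close every cursor so its read
            # lock is released, remembering where each one stood
            # (assumes each pooled cursor has already been positioned).
            saved = [dbc.current()[0] for dbc in self._cursors]
            for dbc in self._cursors:
                dbc.close()
            self._cursors = []
            return saved

        def __delitem__(self, key):
            saved = self._close_cursors_saving_positions()
            del self.db[key]    # no cursor holds a read lock now
            # The iterators would later call _new_cursor() and
            # set_range() back to the positions in 'saved'.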
Given that in bsddb's case that iteritems()'s first [and only] next() boils down to a self.first(), which in turn does a self.dbc.first(), I _still_ don't see exactly what's holding the lock. But the simplest fix would appear to be in __delitem__: if we have a cursor, we should delete through it:
    def __delitem__(self, key):
        self._checkOpen()
        if self.dbc is not None:
            # Delete through the cursor, so the delete doesn't
            # block against the cursor's own read lock.
            self.dbc.set(key)
            self.dbc.delete()
        else:
            del self.db[key]
...but this doesn't in fact remove the deadlock on the unit-test for popitem, which just confirms I don't really grasp what's going on, yet!-)
Hmm, I would've expected your __delitem__ to work. Regardless, using the debugger I can stop the deadlock from occurring if I do "self.dbc.close(); self.dbc = None" just before popitem's "del self[k]".

Greg
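As a sketch, that workaround amounts to something like the following (the surrounding popitem body is reconstructed from the description in this thread, not quoted from the actual source):

    def popitem(self):
        k, v = self.iteritems().next()
        # Workaround: close the internal cursor so its read lock is
        # released before the write below.
        if self.dbc is not None:
            self.dbc.close()
            self.dbc = None
        del self[k]    # no longer deadlocks against our own cursor
        return (k, v)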