[Python-Dev] OrderedDict.values() behavior for modified instance

Nikolaus Rath Nikolaus at rath.org
Sat Oct 26 05:22:53 CEST 2013


Hello,

The documentation says the following about modifying a dict while
iterating through its view:

| Iterating views while adding or deleting entries in the dictionary may
| raise a RuntimeError or fail to iterate over all entries.
(http://docs.python.org/3/library/stdtypes.html#dict-views)

The OrderedDict documentation doesn't have anything to say on the
subject. In practice, its implementation actually mostly correponds to
naive expectations:

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for i in range(5):
...    d[i] = i
... 
>>> i = iter(d.values())
>>> next(i)
0
>>> del d[0]
>>> next(i)
1
>>> del d[2]
>>> next(i)
3
>>> d.move_to_end(1)
>>> next(i)
4
>>> next(i)
1
>>> 

etc.

However, there is one case that results in a rather confusing error:

>>> a = OrderedDict()
>>> a[0] = 0
>>> a[1] = 1
>>> a[2] = 2
>>> i = iter(a.values())
>>> next(i)
0
>>> del a[0]
>>> del a[1]
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/collections/abc.py", line 495, in __iter__
    yield self._mapping[key]
KeyError: 1


What happens here is that a[0] is returned from the linked list, but it
still contains links to a[1]. When a[1] is deleted, the links of its
predecessor and successor are updated, but a[0] still points to
a[1]. The OrderedList then hands a non-existing key to the values()
implementation in collections.abc.

The result is not only mightily confusing, but it is also not covered by
the documentation (KeyError isn't a RuntimeError).


I would very much like to see this fixed, but I'm not sure if there's a
good way to do that.

I see the following options:

(a) When deleting an element from an OrderedList, update the pointers in
    the corresponding linked list element to the end of the list. Then
    removing the currently "active" element during the iteration would
    automatically end the iteration.

    For that, we'd just have to add two lines to __delitem__:

    def __delitem__(self, key, dict_delitem=dict.__delitem__):
        dict_delitem(self, key)
        link = self.__map.pop(key)
        link_prev = link.prev
        link_next = link.next
        link_prev.next = link_next
        link_next.prev = link_prev

        link.prev = root # new 
        link.next = root # new
    

    The new behavior is explicitly covered by the documentation
    (changing the dict may result in incomplete iteration), but it's a
    change to what happened before.

(b) When iterating through an OrderedDict, check that the next element
    is actually still in the hash, i.e. change

    def __iter__(self):
        root = self.__root
        curr = root.next
        while curr is not root:
            yield curr.key
            curr = curr.next
    to

    def __iter__(self):
        root = self.__root
        curr = root.next
        while curr is not root and curr.key in self:
            yield curr.key
            curr = curr.next

    that would terminate the iteration only in the special case, but
    incur an extra dict lookup at every iteration.

    Alternatively, one could try very hard to not stop the iteration:
    
        while curr is not root:
            yield curr.key
            while curr is not root:
                curr = curr.next
                if curr.key in self:
                    break

(c) Catch the KeyError in values(), and re-raise the proper exception in
    class ValuesView:

    def __iter__(self):
        for key in self._mapping:
            try:
                yield self._mapping[key]
            except KeyError:
                raise RuntimeError("underlying Mapping instance modified during iteration")


    I consider this a bit ugly, because it may mask other problems in a
    custom Mapping implementation (that may raise a KeyError because of
    a bug in the Mapping implementation itself).
    
(d) Extend the OrderedDict documentation to explicity document this
    case.

    This has the drawback that it would probably be nicer to just have
    the behavior be consistent and correspond to intuitive expectations.

    
Would any of those fixes be acceptable? Or is there an even better option?    


Best,
Nikolaus


-- 
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

             »Time flies like an arrow, fruit flies like a Banana.«


More information about the Python-Dev mailing list