OrderedDict.values() behavior for modified instance
Hello, The documentation says the following about modifying a dict while iterating through its view: | Iterating views while adding or deleting entries in the dictionary may | raise a RuntimeError or fail to iterate over all entries. (http://docs.python.org/3/library/stdtypes.html#dict-views) The OrderedDict documentation doesn't have anything to say on the subject. In practice, its implementation actually mostly correponds to naive expectations:
from collections import OrderedDict d = OrderedDict() for i in range(5): ... d[i] = i ... i = iter(d.values()) next(i) 0 del d[0] next(i) 1 del d[2] next(i) 3 d.move_to_end(1) next(i) 4 next(i) 1
etc. However, there is one case that results in a rather confusing error:
a = OrderedDict() a[0] = 0 a[1] = 1 a[2] = 2 i = iter(a.values()) next(i) 0 del a[0] del a[1] next(i) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/python3.3/collections/abc.py", line 495, in __iter__ yield self._mapping[key] KeyError: 1
What happens here is that a[0] is returned from the linked list, but it still contains links to a[1]. When a[1] is deleted, the links of its predecessor and successor are updated, but a[0] still points to a[1]. The OrderedList then hands a non-existing key to the values() implementation in collections.abc. The result is not only mightily confusing, but it is also not covered by the documentation (KeyError isn't a RuntimeError). I would very much like to see this fixed, but I'm not sure if there's a good way to do that. I see the following options: (a) When deleting an element from an OrderedList, update the pointers in the corresponding linked list element to the end of the list. Then removing the currently "active" element during the iteration would automatically end the iteration. For that, we'd just have to add two lines to __delitem__: def __delitem__(self, key, dict_delitem=dict.__delitem__): dict_delitem(self, key) link = self.__map.pop(key) link_prev = link.prev link_next = link.next link_prev.next = link_next link_next.prev = link_prev link.prev = root # new link.next = root # new The new behavior is explicitly covered by the documentation (changing the dict may result in incomplete iteration), but it's a change to what happened before. (b) When iterating through an OrderedDict, check that the next element is actually still in the hash, i.e. change def __iter__(self): root = self.__root curr = root.next while curr is not root: yield curr.key curr = curr.next to def __iter__(self): root = self.__root curr = root.next while curr is not root and curr.key in self: yield curr.key curr = curr.next that would terminate the iteration only in the special case, but incur an extra dict lookup at every iteration. Alternatively, one could try very hard to not stop the iteration: while curr is not root: yield curr.key while curr is not root: curr = curr.next if curr.key in self: break (c) Catch the KeyError in values(), and re-raise the proper exception in class ValuesView: def __iter__(self): for key in self._mapping: try: yield self._mapping[key] except KeyError: raise RuntimeError("underlying Mapping instance modified during iteration") I consider this a bit ugly, because it may mask other problems in a custom Mapping implementation (that may raise a KeyError because of a bug in the Mapping implementation itself). (d) Extend the OrderedDict documentation to explicity document this case. This has the drawback that it would probably be nicer to just have the behavior be consistent and correspond to intuitive expectations. Would any of those fixes be acceptable? Or is there an even better option? Best, Nikolaus -- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C »Time flies like an arrow, fruit flies like a Banana.«
Nikolaus, Good write-up. Please submit it to the bug tracker: http://bugs.python.org -- ~Ethan~
Ethan Furman
Nikolaus,
Good write-up. Please submit it to the bug tracker: http://bugs.python.org
Submitted as http://bugs.python.org/issue19414. If someone gives me the go-ahead for one of the proposed solutions, I'd be happy to create a full patch. Best, Nikolaus -- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C »Time flies like an arrow, fruit flies like a Banana.«
On 10/26/2013 08:32 PM, Nikolaus Rath wrote:
Ethan Furman
writes: Good write-up. Please submit it to the bug tracker: http://bugs.python.org
Submitted as http://bugs.python.org/issue19414.
If someone gives me the go-ahead for one of the proposed solutions, I'd be happy to create a full patch.
That would be Raymond's call. Give him a few days to respond. -- ~Ethan~
participants (2)
-
Ethan Furman
-
Nikolaus Rath