set and dict iteration

Aaron Brady castironpi at gmail.com
Thu Aug 16 20:00:01 CEST 2012


Hello,

I observed an inconsistency in the behavior of 'set' and 'dict' iterators.  It is "by design" according to the docs.

'''
http://docs.python.org/dev/library/stdtypes.html#dict-views

iter(dictview).  Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
'''

The 'set' has the same behavior.  Iteration might also complete successfully.

The inconsistency is, if we remove an element from a set and add another during iteration, the new element might appear later in the iteration, and might not, depending on the hash code; therefore comparing the size of the set between iterations isn't adequate.  Example:

http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.py  

'''
# py: { 'ver': '3' }

set0= set( ( 1, 2 ) )

iter0= iter( set0 )
print( next( iter0 ) )

set0.add( 3 )
set0.remove( 2 )
print( next( iter0 ) )


print( )

set0= set( ( 6, 7 ) )

iter0= iter( set0 )
print( next( iter0 ) )

set0.add( 8 )
set0.remove( 7 )
print( next( iter0 ) )
'''

Output:

'''
1
3

6
Traceback (most recent call last):
  File [...] line 22, in <module>
    print( next( iter0 ) )
StopIteration
'''

Iteration should behave the same regardless of the contents of the set.  Continuing iteration over sets and dicts after a modification isn't defined; it should unconditionally raise an error.

What's going on, is '8' is added before the position of the iterator due to hashing in the second part, but the size doesn't change, so the iterator reaches the end of the set after '7' is removed.

The inconsistency isn't easily solved.  One possibility is to use a timestamp or other serial index in the object and iterators, and compare them on every iteration to determine if a modification has occurred.

Another possibility which the author prefers, is to maintain a secondary collection of the iterators of an object, and invalidate them upon modification.  The applicable collection structure is a doubly-linked linked list, informally depicted:

http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png

Upon modification, the set traverses its iterators, setting an 'invalid' flag on each; and subsequent calls to any of them raise an 'IterationError'.  Adding and removing iterators to and from the secondary list is performed in O( 1 ) time with no penalty.

The above example depicted a 'Set'.  'Dicts' have the same anomaly, but the solution is ambiguous, since dict values can be changed meaningfully without altering the structure of the object.  In the author's opinion, the dict should not raise an 'IterationError' on value changes, only key changes like the set, but the argument isn't conclusive.



More information about the Python-list mailing list