Invalidate list iterators after size changes?

Right now, the following code is valid in Python 3.9 (and infinitely loops): ``` lst = [1, 2, 3] for i in lst: lst.append(i) ``` IMO this is a source of bugs, and even though this behavior is technically documented (https://bugs.python.org/issue32767), I don't know if it should belong in the core language. For example, you cannot change the size of a set or dict during iteration; you will get a RuntimeError. Allegedly this is due to the underlying implementation of the dict/set iterators (https://bugs.python.org/issue22084) but I think it would be beneficial this was made part of the public API for list, set, and dict iterators. Concretely, I'm proposing the following set of changes to Python: 1. If the size of a list, set, or dict changes, invalidate all existing iterators to that containers. Document this behavior as part of the public API. 2. Attempting to call next() on an invalidated iterator raises a RuntimeError. 3. Add a key_iter function to itertools (name up for debate), which replicates the existing functionality for lists. It could also make a copy of the keys of a set/dict, so you could use it to mutate those as well during iteration (to allow users to write functions that mutate an arbitrary duck-typed collection) 4. Gate (1, 2) for lists behind a __future__ flag with a long migration window (e.g. ~3.16). Benefits: 1. Reduce a source of bugs (deleting while iterating forward through a list often means that your logic skips elements; appending can lead to infinite loops or unexpected behavior). 2. Make the iterator invalidation consistent between lists, sets, and dicts. 3. More consistent behavior with collections in other languages (C++, Java). Drawbacks: 1. Migration costs. But mutating during iteration should be relatively rare, given that most people discourage mutating while iterating (https://google.github.io/styleguide/pyguide.html#284-decision, https://stackoverflow.com/a/1637875/1625484) __future__ feature flag plus deprecation warnings should go along way to help people migrate their code. We would also offer an easy in-place fix (key_iter). 2. You can't write algorithms that append to a list (e.g. BFS) with the normal list iterator. Again, you could use key_iter (or use a deque with block size == 1; see Open Questions). Open Questions Is (3) necessary? It's not hard to write an implementation yourself. What should be done about deque for block sizes > 1? If this change gets made, should we add way to create a deque with block size == 1 whose iterators are never invalidated?

09.10.21 01:24, Tyler Hou via Python-ideas пише:
I see two drawbacks: 1. It is a breaking change. I do not think there is much code which modify a list during iteration, but still. 2. It adds an overhead to iteration and mutating operations. And this is more severe drawback, because it affects a much of code. And the only argument for this change is consistency with set and dict. It is very weak argument, because list differs from set and dict in so many ways. Iteration of lists after mutating is deterministic and well predictable. This is why it can be used in code and no RuntimeError is raised. In contrary, iteration of sets and dicts after mutating is unpredicable. You can miss some items, get some items multiple times, and it depends on the history of set and dict, types and values of keys, and just be random. We try to detect this and raise RuntimeError because there is no way to use it correctly.

Deleting is quite error prone, but appending during iteration is very useful in implementing an iterative form of recursive algorithms. It effectively gives a FIFO queue which remembers all items after you're done. In general the behaviour of a list iterator can be understood completely if you explain that it simply remembers the current index.

On 9/10/21 11:24 am, Tyler Hou via Python-ideas wrote:
This would break Plex. I use a loop like that as part of the NFA-to-DFA conversion to calculate the epsilon-closure of a state (although obviously there are conditions on adding things to the list that ensure it terminates). -- Greg

Is this behavior of list iteration guaranteed by documentation anywhere? A quick search didn't yield any results for me. If not, I suggest either: a) it gets documented so developers can rely on it, or b) the lack of guarantee is documented so that we can reserve a change of behavior later. Paul On Sun, 2021-10-10 at 13:17 +1300, Greg Ewing wrote:

I've known and relied on the behavior since Python 1.4. I mean, not often, but it's been consistent. In the case where I want to loop over what the loop was at start, rather than what it might become, a slice comes in handy: for item in mylist[:]: # do stuff, maybe mutate list On Sat, Oct 9, 2021, 9:37 PM Paul Bryan <pbryan@anode.ca> wrote:

09.10.21 01:24, Tyler Hou via Python-ideas пише:
I see two drawbacks: 1. It is a breaking change. I do not think there is much code which modify a list during iteration, but still. 2. It adds an overhead to iteration and mutating operations. And this is more severe drawback, because it affects a much of code. And the only argument for this change is consistency with set and dict. It is very weak argument, because list differs from set and dict in so many ways. Iteration of lists after mutating is deterministic and well predictable. This is why it can be used in code and no RuntimeError is raised. In contrary, iteration of sets and dicts after mutating is unpredicable. You can miss some items, get some items multiple times, and it depends on the history of set and dict, types and values of keys, and just be random. We try to detect this and raise RuntimeError because there is no way to use it correctly.

Deleting is quite error prone, but appending during iteration is very useful in implementing an iterative form of recursive algorithms. It effectively gives a FIFO queue which remembers all items after you're done. In general the behaviour of a list iterator can be understood completely if you explain that it simply remembers the current index.

On 9/10/21 11:24 am, Tyler Hou via Python-ideas wrote:
This would break Plex. I use a loop like that as part of the NFA-to-DFA conversion to calculate the epsilon-closure of a state (although obviously there are conditions on adding things to the list that ensure it terminates). -- Greg

Is this behavior of list iteration guaranteed by documentation anywhere? A quick search didn't yield any results for me. If not, I suggest either: a) it gets documented so developers can rely on it, or b) the lack of guarantee is documented so that we can reserve a change of behavior later. Paul On Sun, 2021-10-10 at 13:17 +1300, Greg Ewing wrote:

I've known and relied on the behavior since Python 1.4. I mean, not often, but it's been consistent. In the case where I want to loop over what the loop was at start, rather than what it might become, a slice comes in handy: for item in mylist[:]: # do stuff, maybe mutate list On Sat, Oct 9, 2021, 9:37 PM Paul Bryan <pbryan@anode.ca> wrote:
participants (6)
-
David Mertz, Ph.D.
-
Greg Ewing
-
Paul Bryan
-
Serhiy Storchaka
-
Spencer Brown
-
Tyler Hou