Warn when iterating over an already exhausted generator

This is my first post to this list; apologies in advance if this is the wrong place or it's a known topic that my search failed to find. As a Python user since 2.6, I make stupid mistakes with exhausted generators more often than I'd like to admit:

```python
numbers = (i for i in range(5))
assert 5 not in numbers
sorted(numbers)  # []
```

Then the empty list creates hard-to-track bugs. I'm familiar with the iterator protocol and why the behavior above happens, but couldn't it be prevented? A naive example of the behavior I wish it had:

```python
class SafeGenerator:
    def __init__(self, generator):
        self.generator = generator
        self.is_exhausted = False

    def __iter__(self):
        if self.is_exhausted:
            raise ValueError("can't iterate over an already exhausted generator")
        return self

    def __next__(self):
        try:
            return next(self.generator)
        except StopIteration:
            self.is_exhausted = True
            raise

safe_generator = SafeGenerator(i for i in range(5))
assert 5 not in safe_generator
sorted(safe_generator)  # ValueError: can't iterate over an already exhausted generator
```

Note that the error is raised on `__iter__`, not `__next__`, so idioms like `zip(generator, generator)` would still work. I can't imagine any sane code that would break under this change, but even if there is, what's the downside of at least emitting a warning when calling `__iter__` on an exhausted generator?

On 12 Jun 2023, at 16:55, BoppreH via Python-ideas <python-ideas@python.org> wrote:
Then the empty list creates hard-to-track bugs. I'm familiar with the iterator protocol and why the behavior above happens, but couldn't it be prevented?
I don't think so. It is not always a bug that an iterator is empty. For example this pattern:

```python
args = iter(sys.argv)
progname = next(args)
mandatory_arg = next(args)
for arg in args:
    print('optional arg', arg)
```

Your proposal will traceback in the for loop if there are no optional args. Barry

That's a good scenario, but it doesn't traceback with the SafeGenerator implementation above.

```python
args = SafeGenerator(iter(['mkdir.py', 'dir']))
progname = next(args)
mandatory_arg = next(args)
# Generator is empty, but "we don't know it yet" (no StopIteration),
# so is_exhausted == False.
for arg in args:
    print('optional arg', arg)
# StopIteration was raised, so is_exhausted == True. Still no traceback.

# Added mistake:
print('Parsed args:', ' '.join(args))
# ValueError: can't iterate over an already exhausted generator
```

And even if there are rare false positives, I'd still love a flag to enable this warning. I hit this bug every few weeks and it's a pain to troubleshoot. Worse than stray NaN, None, or JavaScript's "undefined", because logging it shows a normal-looking empty list. BoppreH On Mon, Jun 12, 2023, at 10:11 PM, Barry wrote:

On 12/06/2023 21:11, Barry wrote:
I don't think so. Using the OP's implementation, the is_exhausted flag has not been set when the for-loop is entered. It is of course set after the for-loop has been exited, so a subsequent attempt to iterate over "args" will raise the Exception. (The OP's post confirming this arrived while I was typing this e-mail. 😁) Also the OP's request was for generators, not for any iterator. Nonetheless, I do not support the proposal, for backward compatibility reasons. I *can* imagine sane code that relies on the existing behaviour. Best wishes Rob Cliffe

On 13/06/23 9:59 am, Rob Cliffe via Python-ideas wrote:
Also the OP's request was for generators, not for any iterator.
IMO it would be a bad idea to make generators behave differently from other iterators in this regard. And it's far too late to redefine the iterator protocol in general, because that would require changing all existing iterators and all existing code that relies on the current behaviour. -- Greg

The original example is:

```python
numbers = (i for i in range(5))
assert 5 not in numbers
sorted(numbers)
```

I like this example. It provides an opportunity to improve the documentation. The problem goes away if we write any of the following:

```python
numbers = [i for i in range(5)]
numbers = tuple(i for i in range(5))
numbers = tuple(range(5))
numbers = list(range(5))
```

It also goes away if we write either of:

```python
numbers = [0, 1, 2, 3, 4]
numbers = (0, 1, 2, 3, 4)
```

In fact, the various ways of making the problem go away produce either a list or a tuple containing 0, 1, 2, 3, 4. A simpler way to create the problem is:

```python
x = iter(range(5))
4 in x
sorted(x)  # returns an empty list
```

If instead we write `x = range(5)` then the problem goes away. I hope this helps. -- Jonathan

On Tue, 13 Jun 2023 at 21:03, BoppreH via Python-ideas <python-ideas@python.org> wrote:
Any thoughts on logging a warning, perhaps behind an opt-in flag? I could not find a single false positive scenario, so I expect the signal-to-noise ratio to be high and prevent lots of bugs.
Shadow the iter() function and make your own. That's your opt-in flag. ChrisA
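For readers who want to try that locally, here is a minimal sketch of what a shadowed iter() could look like; the wrapper class and its details are illustrative, mirroring the SafeGenerator idea earlier in the thread:

```python
import builtins

class CheckedIterator:
    """Illustrative wrapper: raises if __iter__ is called after exhaustion."""
    def __init__(self, iterator):
        self._iterator = iterator
        self._exhausted = False

    def __iter__(self):
        if self._exhausted:
            raise ValueError("can't iterate over an already exhausted iterator")
        return self

    def __next__(self):
        try:
            return next(self._iterator)
        except StopIteration:
            self._exhausted = True
            raise

def iter(*args):
    # Shadows the builtin iter() in this module only.
    return CheckedIterator(builtins.iter(*args))
```

With this in scope, `it = iter(data)` behaves normally until exhausted; a second `sorted(it)` or `for` loop then raises instead of silently yielding nothing.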

@ChrisA: Shadowing "iter()" would only help with Barry's example. @Jonathan: Updating the documentation is helpful, but I find an automated check better. Too often the most obvious way to accomplish something silently triggers this behavior:

```python
strings = ['aa', '', 'bbb', 'c']
strings = filter(bool, strings)  # Adding this step makes n_unique always 0.
longest = max(strings, key=len)
n_unique = len(set(strings))
```

I feel like a warning here would save time and prevent bugs, and that my is_exhausted proposal, if implemented directly in generators, is an easy way to accomplish this. And I have to say I'm surprised by the responses. Does nobody else hit bugs like this and wish they were automatically detected? To be clear, raising ValueError is just an example; logging a warning would already be helpful, like Go's race condition detector. -- BoppreH
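To make the warning variant concrete, here is a hypothetical wrapper (the class name and use of RuntimeWarning are my own choices, not part of any existing API) applied to the filter example above:

```python
import warnings

class WarnOnReuse:
    """Hypothetical wrapper: warns, instead of raising, when an
    already-exhausted iterator is iterated again."""
    def __init__(self, iterator):
        self._iterator = iterator
        self._exhausted = False

    def __iter__(self):
        if self._exhausted:
            warnings.warn("iterating over an already exhausted iterator",
                          RuntimeWarning, stacklevel=2)
        return self

    def __next__(self):
        try:
            return next(self._iterator)
        except StopIteration:
            self._exhausted = True
            raise

strings = WarnOnReuse(filter(bool, ['aa', '', 'bbb', 'c']))
longest = max(strings, key=len)   # consumes the iterator; longest == 'bbb'
n_unique = len(set(strings))      # emits a RuntimeWarning and yields 0
```

The buggy result (n_unique == 0) is unchanged, but the warning points at the exact line that reused the iterator.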

On Wed, 14 Jun 2023 at 01:07, BoppreH via Python-ideas <python-ideas@python.org> wrote:
And I have to say I'm surprised by the responses. Does nobody else hit bugs like this and wish they were automatically detected?
Nope, I've never had that happen to me, and I *have* made use of calling iter() on potentially-exhausted iterators (usually implicitly). ChrisA

In close to 10 years of experience with Python I have never encountered anything like this. If I need to use a list later I never do ANY assignments to it. Why would I? In the last example I would:

```python
strings = ['aa', '', 'bbb', 'c']
longest = max(filter(bool, strings), key=len)
n_unique = len(set(strings))
```

And in the initial example I don't see why I would ever do this. It is very unclear what the scenario is here:

```python
numbers = (i for i in range(5))
assert 5 not in numbers
sorted(numbers)
```

1. If I wanted sorted numbers, then ValueError wouldn't help, because I do not get sorted numbers.
2. If I wanted an unmodified list, and it being modified is an error, your solution doesn't work either.
3. If sorting is ok only on a non-empty iterator, then just `assert sorted` after sorting.

If you could give a full real-life scenario, then it might expose the problem (if it exists) better. "There should be one-- and preferably only one --obvious way to do it." There is either something to be improved, or you are not using that "one obvious" way.

In close to 10 years of experience with python I have never encountered anything like this.
Here's a small selection of the StackOverflow questions from people who encountered this exact issue:

https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-t...
https://stackoverflow.com/questions/10255273/iterating-on-a-file-doesnt-work-the-second-time?noredirect=1&lq=1
https://stackoverflow.com/questions/3906137/why-cant-i-call-read-twice-on-an...
https://stackoverflow.com/questions/17777219/zip-variable-empty-after-first-...
https://stackoverflow.com/questions/42246819/loop-over-results-from-path-glo...
https://stackoverflow.com/questions/21715268/list-returned-by-map-function-d...
https://stackoverflow.com/questions/14637154/performing-len-on-list-of-a-zip...
https://stackoverflow.com/questions/44420135/filter-object-becomes-empty-aft...

Note that questions usually get few votes, and "what's wrong with my code" questions are especially poorly received, so getting even a couple of votes is a strong signal. The questions above range from 10 to 124 (!) votes, and have a combined 250k+ views. These are the people I'd like to help.
If you could give a full real-life scenario, then it might expose the problem (if it exists) better.
Open a log file, count the number of lines, then find both the longest and the number of unique "error" entries. Implemented in the most obvious way I can, using builtin functions, it has *two* such bugs (reusing the exhausted "f" and "error_lines"):

```python
import re

error_regex = re.compile('^ERROR: ')
with open('logs.txt') as f:
    n_lines = len(list(f))
    error_lines = filter(error_regex.match, f)
    longest_error = max(error_lines, key=len, default='')
    n_unique_errors = len(set(error_lines))
print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}')
```

Is it hard to fix? No, not at all: just store "list(f)" and replace "filter" with a list comprehension. Is it easy to spot? For an experienced developer, in this short example, with all the parts introduced together, yes. But having a natural solution silently give wrong answers is dangerous. At least having a warning would break the false sense of security.
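For reference, a fixed version of that sketch, materialising each intermediate result exactly once. The sample log lines are made up so the snippet is self-contained; swap `log_lines` for `open('logs.txt')` in real use:

```python
import re

error_regex = re.compile('^ERROR: ')

# In-memory stand-in for the log file (illustrative data).
log_lines = [
    'INFO: starting\n',
    'ERROR: disk full\n',
    'ERROR: disk full\n',
    'INFO: retrying\n',
    'ERROR: timeout while writing\n',
]

lines = list(log_lines)                                    # materialise once
n_lines = len(lines)
error_lines = [l for l in lines if error_regex.match(l)]   # a list, safely reusable
longest_error = max(error_lines, key=len, default='')
n_unique_errors = len(set(error_lines))
print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}')
```

Because `lines` and `error_lines` are lists rather than iterators, each can be consumed any number of times without silently going empty.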
If I wanted sorted numbers, then ValueError wouldn’t help, because I do not get sorted numbers.
I do want sorted numbers, but what can Python do in the face of broken code? There's a reason it raises errors for 1/0, str.invalid, and len(None). It's not "helpful" to the program, but it stops execution from continuing with a bad state. I understand that backwards compatibility will probably prevent us from raising a new error. But a warning could help a lot of people. I'm tempted to patch the Python interpreter and test some popular packages, to verify if doing this on purpose is as rare as I think it is. On Tue, Jun 13, 2023, at 6:50 PM, Dom Grigonis wrote:

On Wed, 14 Jun 2023 at 07:02, BoppreH via Python-ideas <python-ideas@python.org> wrote:
In close to 10 years of experience with python I have never encountered anything like this.
Here's a small selection of the StackOverflow questions from people who encountered this exact issue:
But now try to find people who would be adversely affected by your proposed change. Unless you do it in a purely backward compatible way such as the local shadowing of iter(), you WILL break other people's code. What you've shown is that a small handful of people have wondered at the reiterability of generators, which is NOT the same as wanting a warning in these situations. Even if we consider that every single upvote represents a person who wants this feature, you've shown, what, a thousand people total? Across the whole world? That's not exactly an overwhelming number of people, and hardly enough to make a backward-incompatible language change. Let's go back to your earlier incredulity:
And I have to say I'm surprised by the responses. Does nobody else hit bugs like this and wish they were automatically detected?
You've found a dozen questions that have been upvoted by a maximum of 124 people, by your own count (I didn't bother going through all the questions to check). Let's make some VERY generous estimates:

1) Every upvote represents a unique person (pretending that nobody browses multiple questions and upvotes them all)
2) Each of those people agrees with your proposal
3) The total upvote count is 1000 (feel free to go and sum them for me, I can't be bothered)
4) For everyone who upvotes, nine others don't bother to upvote

That'll give an incredibly generous figure of 10,000 Stack Overflow users who might support your proposal. Stack Overflow has 21 million users [1]. If we assume that those who answer their survey are representative (impossible to prove, but the best we can do), about half of those are Python users [2]. That's roughly 10,000,000 Stack Overflow users who use Python. Even if we assume that Stack Overflow users are representative of the internet at large (they're definitely not, but again, it's good to at least have some figures), that's 0.1% of people. So..... yeah, I'm not surprised that none of us here has run into a problem. I strongly recommend reconsidering the "shadow iter() in your own code" solution, as it is entirely backward compatible. ChrisA [1] https://stackexchange.com/sites and select Stack Overflow - it says "21m" [2] https://stackoverflow.blog/2023/06/13/developer-survey-results-are-in/

On Wed, 14 Jun 2023 at 07:52, BoppreH via Python-ideas <python-ideas@python.org> wrote:
Sorry, I'm new to the list and was not aware the burden of proof was so high. Can you point me to one or two successful posts in Python-ideas where I can learn how to show there's a real need for a feature?
It's more a matter of the level of need required to justify backward incompatibility. Have a look through the mailing list archives; although, something with no controversy will be hard to find, since it's just a couple of posts and then it goes to the tracker and that's that, it's implemented. ChrisA

@ChrisA: There are already flags for enabling warnings on dangerous bytearray comparisons[1] or relying on locale-dependent encodings[2], not to mention a whole Development Mode flag[3] that adds extra checks. Some of those checks affect fewer people than my proposal. A "warn on reused generators" flag would fit right in, maintain backwards compatibility, and help some people (admittedly less than an opt-out one). [1] https://docs.python.org/3/using/cmdline.html#cmdoption-b [2] https://docs.python.org/3/using/cmdline.html#cmdoption-X [3] https://docs.python.org/3/library/devmode.html#devmode @Dom: Thank you for your last message. That was genuinely kind and helpful, and the type of feedback I was hoping for. I also appreciate the others who said they haven't experienced this; maybe I'm doing something wrong, or misremembering how often it happens. I'd love to stay and improve the points you mentioned, but to be honest I started regretting this thread when the first reply was a counter-example that didn't work. And then Chris' messages started rolling in. If anybody wants to take the idea forward, be my guest.

On Wed, 14 Jun 2023 at 09:05, BoppreH via Python-ideas <python-ideas@python.org> wrote:
@ChrisA: There are already flags for enabling warnings on dangerous bytearray comparisons[1] or relying on locale-dependent encodings[2], not to mention a whole Development Mode flag[3] that adds extra checks. Some of those checks affect fewer people than my proposal. A "warn on reused generators" flag would fit right in, maintain backwards compatibility, and help some people (admittedly less than an opt-out one).
Citation needed, how do you calculate that they affect fewer people than your proposal? The byte/str comparison warning relates to Py2/Py3 changes, which affected huge numbers of people; locale-dependent encodings affect everyone whose systems default to those encodings; and I don't know what the debug build does exactly, but it's specifically meant to cover things that are too expensive to check for normally. Also, question: Would your proposal even be useful to people if they had to run Python with a special parameter to get it? Wouldn't it be just as easy to shadow iter(), like I have already suggested multiple times as a much better way to do this?
And then Chris' messages started rolling in.
Backward compatibility is WAY more important than a lot of proposals seem to acknowledge. You're welcome to hate me for saying it, but frankly, you're also welcome to ignore my posts. I don't have to prove anything to you; the onus is on you to demonstrate the value of your proposal, and at the moment, you've shown a benefit in a very small number of cases, contrasted with a potentially huge number of situations where this would create a spurious warning. Blaming me for the pushback is a tad unfair. But that's your prerogative. ChrisA

I know this is not the point of this thread, but when I saw:

```python
numbers = (i for i in range(10))
assert 3 in numbers
next(numbers)  # 4!
```

I was totally surprised that generator expressions support `in` -- WTF? They very much are NOT containers! And, in fact, all iterators support `in` -- wow, just wow. I know this is not going to change, but it strikes me as a greater source of mysterious bugs than not getting an error from iterating an exhausted iterator. Getting back OT: the distinction between iterators and containers is simply something that you need to learn -- it's an inevitable consequence of the benefits of iterators. And the original example is an easy one -- Python supports both list comprehensions and generator expressions -- you really need to know which to use when. -CHB On Tue, Jun 13, 2023 at 4:18 PM Chris Angelico <rosuav@gmail.com> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

I've also been a Python user for 24 years now, since long before iterators were a feature of Python. I wrote quite a few widely read articles about iterators when they were introduced, including the first about leveraging iterators for coroutines. I can't say I've NEVER encountered a glitch with exhausted iterators. But it's been rare, and it never motivated me to want the behavior changed. On Tue, Jun 13, 2023, 5:52 PM BoppreH via Python-ideas < python-ideas@python.org> wrote:

numbers = (i for i in range(10))
assert 3 in numbers
next(numbers) # 4!
This wouldn’t raise an error with your fix, but still would be a bug.
Here's a small selection of the StackOverflow questions from people who encountered this exact issue:
This raises a question: how many people have made the same mistake more than once? I.e. are all of those just Google searches on first encounter of an iterator, or is it genuinely difficult to memorise never to use an iterator twice? Regardless, Python has largely adopted iterators in many places so that memory consumption is reduced. It has served me numerous times when I did not have enough memory to hold the full list. This comes at the expense of having to be aware of it.

The full issue as I see it is "warn/error if `__next__` has been called at least once", while the solution you are proposing only covers half of the issue, which is "warn/error if the iterator has been FULLY consumed". So, in my opinion, the solution to this bug prevention, given the situation, should be more comprehensive and well thought out, so that:

1. It breaks a minimal amount of existing code.
2. It has enough positive impact for it to be worthwhile.

Your current proposal, as I see it, is lacking on both points. Some of the thoughts:

1. A general `dummy` global Python option, which raises an error or issues a warning if the `__iter__` method has been called a second time, so that the ones who want it can set it to true. But this would cause library cross-compatibility issues and would generally prevent adopting the coding style which was intended when creating iterators.
2. Using another paradigm, e.g. piping via functional programming. Something like (a hypothetical API, not real Python):

```
import re
error_regex = re.compile('^ERROR: ')
with open('logs.txt') as f:
    n_lines, longest_error, n_unique_errors = f.multiplex(
        len,
        filter(error_regex.match).max(key=len, default='').multiplex(
            PIPE,
            len().set()
        )
    )
print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}')
```

It actually would be nice to have a standard library with a comprehensive list of functional programming tools. I think iterators are powerful, and joining them with a flexible multi-input/multi-output framework would bring them to the next level. Its hello-world would be "pub/sub pattern in 5 minutes". I went off track a bit, but I am working on similar stuff now so couldn't resist. :)

But it does seem that it's either/or:

1. You work with iterators, get the benefits, but be aware of how they work.
2. Get used to pre-converting to lists and don't care.

I think your proposal is trying to mix things up here and I am not very positive about it. Maybe optional arguments, whether to return a list or an iterator... but the sensible default I think is an iterator anyway, so that doesn't solve anything. In general, I see your point. If it took care of partial-exhaustion prevention as well, I might be more positive about it. As your current proposition goes, I would choose to leave things as they are.

I think the discussion is sort of missing a very common use case: when a user calls func(iterator) and the function is expecting an iterable but not an iterator. The author of the called code might be thinking that the input is a list, but that doesn't mean the caller thinks that. Even worse is a case like this:

```python
def func(data_list):
    if VERBOSE:
        log(... [... for i in data_list ...] ...)
    for i in data_list:
        # do stuff with i
        ...
```

The behavior of this code changes (significantly!) when VERBOSE is set. That's going to make debugging hard. [Yes, I know this can be safely written using itertools.tee. That doesn't mean that everyone will do that.] Here's a simple way to describe what the behavior could be (although obviously this doesn't work):

```python
def safer_iter_behavior(iterator):
    yield from iterator
    raise StopIteration
    raise RuntimeError
```

That is, when you iterate the first time, it acts as expected, raising StopIteration when the list is exhausted. If you continue to iterate after the list is exhausted, it raises RuntimeError. In addition to the double-iteration case above, it guards against other cases like:

```python
for x in iterator1:
    y = next(iterator2)
    # do stuff
```

when iterator2 is unexpectedly shorter than iterator1. Of course, naively guarding against that by adding `if len(iterator1) != len(iterator2): ...` would be a bad idea and would also result in RuntimeError. --- Bruce
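For completeness, the itertools.tee fix mentioned above could look like this sketch (the function body is illustrative; `i * 2` stands in for "do stuff with i"):

```python
import itertools

def func(data_list, verbose=False):
    # tee gives two independent iterators over the same underlying data,
    # so the logging pass no longer steals items from the main loop.
    log_view, work_view = itertools.tee(data_list, 2)
    if verbose:
        print('about to process:', [i for i in log_view])
    return [i * 2 for i in work_view]  # stand-in for "do stuff with i"
```

With this, `func(iter(range(3)), verbose=True)` produces the same result as with verbose off, because tee buffers items consumed by one view until the other catches up. The usual caveat applies: if one view runs far ahead, tee's internal buffer can grow as large as the whole input.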

On 12 Jun 2023, at 16:55, BoppreH via Python-ideas <python-ideas@python.org> wrote:
Then the empty list creates hard-to-track bugs. I'm familiar with the iterator protocol and why the behavior above happens, but couldn't it be prevented?
I don’t think so. It is not always a bug that an iterator is empty. For example this pattern: args = iter(sys.argv) progname = next(args) mandatory_arg = next(args) for arg in args: print(‘optional arg’, next(arg)) Your proposal will traceback for the for loop if there are no optional args. Barry

That's a good scenario, but it doesn't traceback with the SafeGenerator implementation above. args = SafeGenerator(iter(['mkdir.py', 'dir'])) progname = next(args) mandatory_arg = next(args) # Generator is empty, but "we don't know it yet" (no StopIteration), so is_exhausted == False for arg in args: print('optional arg', next(arg)) # StopIteration is raised, so is_exhausted == True. Still no traceback. # Added mistake: print('Parsed args:', ' '.join(args)) # ValueError: can't iterate over an already exhausted generator And even if there are rare false positives, I'd still love a flag to enable this warning. I hit this bug every few weeks and it's a pain to troubleshoot. Worse than strays NaN, None, or Javascript's "undefined", because logging it shows a normal-looking empty list. BoppreH On Mon, Jun 12, 2023, at 10:11 PM, Barry wrote:

On 12/06/2023 21:11, Barry wrote:
I don't think so. Using the OP's implementation, the is_exhausted flag has not been set when the for-loop is entered. It is of course set after the for-loop has been exited, so a subsequent attempt to iterate over "args" will raise the Exception. (The OP's post confirming this arrived while I was typing this e-mail. 😁) Also the OP's request was for generators, not for any iterator. Nonetheless, I do not support the proposal, for backward compatibility reasons. I *can* imagine sane code that relies on the existing behaviour. Best wishes Rob Cliffe

On 13/06/23 9:59 am, Rob Cliffe via Python-ideas wrote:
Also the OP's request was for generators, not for any iterator.
IMO it would be a bad idea to make generators behave differently from other iterators in this regard. And it's far too late to redefine the iterator protocol in general, because that would require changing all existing iterators and all existing code that relies on the current behaviour. -- Greg

The original example is: numbers = (i for i in range(5)) assert 5 not in numbers sorted(numbers) I like this example. It provides an opportunity to improve the documentation. The problems goes away if we write any of the following numbers = [i for i in range(5)] numbers = tuple(i for i in range(5)) numbers = tuple(range(5)) numbers = list(range(5)) It also goes away if we write any of numbers = [0, 1, 2, 3, 4] numbers = (0, 1, 2, 3, 4) In fact, the various ways of making the problem go away produce either a list or a tuple 0, 1, 2, 3, 4. A simpler way to create the problem is x = iter(range(5)) 4 in x sorted(x) # returns an empty tuple. If instead we write x = range(5) then the problem goes away. I hope this helps. -- Jonathan

On Tue, 13 Jun 2023 at 21:03, BoppreH via Python-ideas <python-ideas@python.org> wrote:
Any thoughts on logging a warning, perhaps behind an opt-in flag? I could not find a single false positive scenario, so I expect the signal-to-noise ratio to be high and prevent lots of bugs.
Shadow the iter() function and make your own. That's your opt-in flag. ChrisA

@ChrisA: Shadowing "iter()" would only help with Barry's example. @Jonathan: Updating documentation is helpful, but I find an automated check better. Too often the most obvious way to accomplish something silently triggers this behavior: strings = ['aa', '', 'bbb', 'c'] strings = filter(bool, strings) # Adding this step makes n_unique always 0. longest = max(strings, key=len) n_unique = len(set(strings)) I feel like a warning here would save time and prevent bugs, and that my is_exhausted proposal, if implemented directly in the generators, is an easy way to accomplish this. And I have to say I'm surprised by the responses. Does nobody else hit bugs like this and wish they were automatically detected? To be clear, raising ValueError is just an example; logging a warning would already be helpful, like Go's race condition detector. -- BoppreH

On Wed, 14 Jun 2023 at 01:07, BoppreH via Python-ideas <python-ideas@python.org> wrote:
And I have to say I'm surprised by the responses. Does nobody else hit bugs like this and wish they were automatically detected?
Nope, I've never had that happen to me, and I *have* made use of calling iter() on potentially-exhausted iterators (usually implicitly). ChrisA

In close to 10 years of experience with python I have never encountered anything like this. If I need to use a list later I never do ANY assignments to it. Why would I? In the last example I would: ``` strings = ['aa', '', 'bbb', 'c’] longest = max(filter(bool, strings), key=len) n_unique = len(set(strings)) ``` And in initial example I don’t see why would I ever do this. It is very unclear what is the scenario here: ```??? numbers = (i for i in range(5)) assert 5 not in numbers sorted(numbers) ``` 1. If I wanted sorted numbers, then ValueError wouldn’t help, because I do not get sorted numbers. 2. If I wanted unmodified list and if it was modified then it is an error, your solution doesn’t work either. 3. If sorting is ok only on non-empty iterator, then just `assert sorted` after sorting. If you could give a full real-life scenario, then it might expose the problem (if it exists) better. "There should be one-- and preferably only one --obvious way to do it.” There is either: something to be improved or you are not using that "one obvious" way.

In close to 10 years of experience with python I have never encountered anything like this.
Here's a small selection of the StackOverflow questions from people who encountered this exact issue: https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-t... https://stackoverflow.com/questions/10255273/iterating-on-a-file-doesnt-work-the-second-time?noredirect=1&lq=1 https://stackoverflow.com/questions/3906137/why-cant-i-call-read-twice-on-an... https://stackoverflow.com/questions/17777219/zip-variable-empty-after-first-... https://stackoverflow.com/questions/42246819/loop-over-results-from-path-glo... https://stackoverflow.com/questions/21715268/list-returned-by-map-function-d... https://stackoverflow.com/questions/14637154/performing-len-on-list-of-a-zip... https://stackoverflow.com/questions/44420135/filter-object-becomes-empty-aft... Note that questions usually get few votes, and "what's wrong with my code" questions are especially poorly received, so getting even a couple of votes is a strong signal. The questions above range from 10 to 124 (!) votes, and have a combined 250k+ views. These are the people I'd like to help.
If you could give a full real-life scenario, then it might expose the problem (if it exists) better.
Open a log file, count the number of lines, then find both the longest and number of unique "error" entries. Implemented in the most obvious way I can, using builtin functions, it has *two* such bugs (reusing the exhausted "f" and "error_lines"). import re error_regex = re.compile('^ERROR: ') with open('logs.txt') as f: n_lines = len(list(f)) error_lines = filter(error_regex.match, f) longest_error = max(error_lines, key=len, default='') n_unique_errors = len(set(error_lines)) print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}') Is it hard to fix? No, not all, just store "list(f)" and replace "filter" with a longer list comprehension. Is it easy to spot? For an experienced developer, in this short example, with all the parts introduced together, yes. But having a natural solution silently give wrong answers is dangerous. At least having a warning would break the false sense of security.
If I wanted sorted numbers, then ValueError wouldn’t help, because I do not get sorted numbers.
I do want sorted numbers, but what can Python do in the face of broken code? There's a reason it raises errors for 1/0, str.invalid, and len(None). It's not "helpful" to the program, but it stops execution from continuing with a bad state. I understand that backwards compatibility will probably prevent us from raising a new error. But a warning could help a lot of people. I'm tempted to patch the Python interpreter and test some popular packages, to verify if doing this on purpose is as rare as I think it is. On Tue, Jun 13, 2023, at 6:50 PM, Dom Grigonis wrote:

On Wed, 14 Jun 2023 at 07:02, BoppreH via Python-ideas <python-ideas@python.org> wrote:
In close to 10 years of experience with python I have never encountered anything like this.
Here's a small selection of the StackOverflow questions from people who encountered this exact issue:
But now try to find people who would be adversely affected by your proposed change. Unless you do it in a purely backward compatible way such as the local shadowing of iter(), you WILL break other people's code. What you've shown is that a small handful of people have wondered at the reiterability of generators, which is NOT the same as wanting a warning in these situations. Even if we consider that every single upvote represents a person who wants this feature, you've shown, what, a thousand people total? Across the whole world? That's not exactly an overwhelming number of people, and hardly enough to make a backward-incompatible language change. Let's go back to your earlier incredulity:
And I have to say I'm surprised by the responses. Does nobody else hit bugs like this and wish they were automatically detected?
You've found a dozen questions that have been upvoted by a maximum of 124 people, by your own count (I didn't bother going through all the questions to check). Let's make some VERY generous estimates: 1) Every upvote represents a unique person (pretending that nobody browses multiple questions and upvotes them all) 2) Each of those people agrees with your proposal 3) The total upvote count is 1000 (feel free to go and sum them for me, I can't be bothered) 4) For everyone who upvotes, nine others don't bother to upvote That'll give an incredibly generous figure of 10,000 Stack Overflow users who might support your proposal. Stack Overflow has 21 million users [1]. If we assume that those who answer their survey are representative (impossible to prove, but the best we can do), about half of those are Python users [2]. That's roughly 10,000,000 Stack Overflow users who use Python. Even if we assume that Stack Overflow users are representative of the internet at large (they're definitely not, but again, it's good to at least having some figures), that's 0.1% of people. So..... yeah, I'm not surprised that none of us here has run into a problem. I strongly recommend reconsidering the "shadow iter() in your own code" solution, as it is entirely backward compatible. ChrisA [1] https://stackexchange.com/sites and select Stack Overflow - it says "21m" [2] https://stackoverflow.blog/2023/06/13/developer-survey-results-are-in/

On Wed, 14 Jun 2023 at 07:52, BoppreH via Python-ideas <python-ideas@python.org> wrote:
Sorry, I'm new to the list and was not aware the burden of proof was so high. Can you point me to one or two successful posts in Python-ideas where I can learn how to show there's a real need for a feature?
It's more a matter of the need required to justify backward incompatibility. Have a look through the mailing list archives, although an uncontroversial proposal will be hard to find: it's just a couple of posts, then it goes to the tracker, and that's that, it's implemented.

ChrisA

@ChrisA: There are already flags for enabling warnings on dangerous bytearray comparisons[1] or relying on locale-dependent encodings[2], not to mention a whole Development Mode flag[3] that adds extra checks. Some of those checks affect fewer people than my proposal. A "warn on reused generators" flag would fit right in, maintain backwards compatibility, and help some people (admittedly fewer than an opt-out one would).

[1] https://docs.python.org/3/using/cmdline.html#cmdoption-b
[2] https://docs.python.org/3/using/cmdline.html#cmdoption-X
[3] https://docs.python.org/3/library/devmode.html#devmode

@Dom: Thank you for your last message. That was genuinely kind and helpful, and the type of feedback I was hoping for. I also appreciate the others who said they haven't experienced this; maybe I'm doing something wrong, or misremembering how often it happens.

I'd love to stay and improve the points you mentioned, but to be honest I started regretting this thread when the first reply was a counter-example that didn't work. And then Chris' messages started rolling in. If anybody wants to take the idea forward, be my guest.

On Wed, 14 Jun 2023 at 09:05, BoppreH via Python-ideas <python-ideas@python.org> wrote:
@ChrisA: There are already flags for enabling warnings on dangerous bytearray comparisons[1] or relying on locale-dependent encodings[2], not to mention a whole Development Mode flag[3] that adds extra checks. Some of those checks affect fewer people than my proposal. A "warn on reused generators" flag would fit right in, maintain backwards compatibility, and help some people (admittedly less than an opt-out one).
Citation needed, how do you calculate that they affect fewer people than your proposal? The byte/str comparison warning relates to Py2/Py3 changes, which affected huge numbers of people; locale-dependent encodings affect everyone whose systems default to those encodings; and I don't know what the debug build does exactly, but it's specifically meant to cover things that are too expensive to check for normally.

Also, question: Would your proposal even be useful to people if they had to run Python with a special parameter to get it? Wouldn't it be just as easy to shadow iter(), like I have already suggested multiple times as a much better way to do this?
And then Chris' messages started rolling in.
Backward compatibility is WAY more important than a lot of proposals seem to acknowledge. You're welcome to hate me for saying it, but frankly, you're also welcome to ignore my posts. I don't have to prove anything to you; the onus is on you to demonstrate the value of your proposal, and at the moment, you've shown a benefit in a very small number of cases, contrasted with a potentially huge number of situations where this would create a spurious warning. Blaming me for the pushback is a tad unfair. But that's your prerogative. ChrisA

I know this is not the point of this thread, but when I saw:

```
numbers = (i for i in range(10))
assert 3 in numbers
next(numbers)  # 4!
```

I was totally surprised that generator expressions support `in` -- WTF? they very much are NOT containers! And, in fact, all iterators support `in` -- wow, just wow. I know this is not going to change, but it strikes me as a greater source of mysterious bugs than not getting an error from iterating an exhausted iterator.

Getting back OT: The distinction between iterators and containers is simply something that you need to learn -- it's an inevitable consequence of the benefits of iterators. And the original example is an easy one -- Python supports both list comprehensions and generator comprehensions -- you really need to know which to use when.

-CHB

On Tue, Jun 13, 2023 at 4:18 PM Chris Angelico <rosuav@gmail.com> wrote:
--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
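The reason `in` works on plain iterators is that membership testing falls back to iteration when an object defines no `__contains__`. A small sketch of that fallback, with a hypothetical helper name, shows why the test silently consumes elements up to (and including) the match:

```python
# Roughly what `value in iterable` does when there is no __contains__:
# iterate and compare, consuming elements as it goes.
def fallback_contains(iterable, value):
    for item in iterable:
        if item == value:
            return True
    return False  # the whole iterator is consumed on a miss

numbers = (i for i in range(10))
assert fallback_contains(numbers, 3)  # consumes 0, 1, 2, 3
assert next(numbers) == 4             # iteration resumes after the match
```

The same fallback means `value in iterator` can loop forever on an infinite iterator that never yields the value being searched for.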

I've also been a Python user for 24 years now. Since long before iterators were a feature of Python. I wrote quite a few widely read articles about iterators when they were introduced, including the first about leveraging iterators for coroutines. I can't say I've NEVER encountered a glitch with exhausted iterators. But it's been rare, and it never motivated me to want the behavior changed. On Tue, Jun 13, 2023, 5:52 PM BoppreH via Python-ideas < python-ideas@python.org> wrote:

```
numbers = (i for i in range(10))
assert 3 in numbers
next(numbers)  # 4!
```
This wouldn’t raise an error with your fix, but still would be a bug.
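To make the gap concrete, here is the SafeGenerator class from the first post applied to this example: because `in` stops before triggering StopIteration, `is_exhausted` stays False and the partial-consumption bug raises nothing.

```python
# Reproducing the SafeGenerator from the original post to show that it
# only catches FULL exhaustion, not partial consumption.
class SafeGenerator:
    def __init__(self, generator):
        self.generator = generator
        self.is_exhausted = False

    def __iter__(self):
        if self.is_exhausted:
            raise ValueError("can't iterate over an already exhausted generator")
        return self

    def __next__(self):
        try:
            return next(self.generator)
        except StopIteration:
            self.is_exhausted = True
            raise

numbers = SafeGenerator(i for i in range(10))
assert 3 in numbers        # consumes 0..3; StopIteration never raised
assert next(numbers) == 4  # still a likely bug, but no error from the wrapper
```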
Here's a small selection of the StackOverflow questions from people who encountered this exact issue:
This raises a question - how many people have made the same mistake more than once? I.e. are all of those just Google searches on a first encounter with an iterator, or is it difficult to memorise never to use an iterator twice?

Regardless, Python has largely adopted iterators in many places so that memory consumption is reduced. It has served me numerous times when I did not have enough memory to hold the full list. This comes at the expense of having to be aware of it.

The full issue as I see it is "warn/error if `__next__` has been called at least once", while the solution you are proposing covers only half of the issue, which is "warn/error if the iterator has been FULLY consumed".

So, in my opinion, the solution to this bug prevention, given the situation, should be more comprehensive and well thought out, so that:
1. It breaks a minimal amount of existing code.
2. It has enough positive impact for it to be worthwhile.

Your current proposal, as I see it, is lacking on both points. Some of the thoughts:

1. A general `dummy` global Python option, which raises an error or issues a warning if the `__iter__` method has been called a second time, so that the ones who want it can set it to true. But this would cause library cross-compatibility issues and would generally prevent adopting the coding style that was intended when creating iterators.

2. Using another paradigm, e.g. piping via functional programming. Something like:

```
import re

error_regex = re.compile('^ERROR: ')

with open('logs.txt') as f:
    n_lines, longest_error, n_unique_errors = f.multiplex(
        len,
        filter(error_regex.match).max(key=len, default='').multiplex(
            PIPE,
            len().set()
        )
    )

print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}')
```

It actually would be nice to have a standard library with a comprehensive list of functional programming tools. I think iterators are powerful, and joining them with a flexible multi-input/multi-output framework would bring them to the next level.
Its hello world would be "pub/sub pattern in 5 minutes". I went off the track a bit, but I am working on similar stuff now so couldn't resist. :)

But it does seem that it's one or the other:
1. You work with iterators, get the benefits, but be aware of how they work.
2. You get used to pre-converting to lists and don't care.

I think your proposal is trying to mix things up here and I am not very positive about it. Maybe optional arguments for whether to return a list or an iterator... but the sensible default I think is an iterator anyway, so that doesn't solve anything.

In general, I see your point. If it took care of partial-exhaustion prevention as well, I might be more positive about it. As your current proposal goes, I would choose to leave things as they are.
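The hypothetical `multiplex` above can be roughly approximated today with itertools.tee, which hands each consumer its own copy of the stream. The helper name and signature below are invented for illustration, and tee buffers items internally, so the memory benefit over a list only materialises when consumers advance roughly in step:

```python
# Rough approximation of the multiplex idea using itertools.tee:
# apply several consumers to one iterator without rebuilding it.
from itertools import tee

def multiplex(iterable, *consumers):
    # One independent tee copy per consumer; each consumer drains its own.
    copies = tee(iterable, len(consumers))
    return tuple(f(copy) for f, copy in zip(consumers, copies))

lines = iter(["ERROR: disk full", "ok", "ERROR: out of memory"])
n_lines, n_errors = multiplex(
    lines,
    lambda it: sum(1 for _ in it),
    lambda it: sum(1 for line in it if line.startswith("ERROR: ")),
)
assert (n_lines, n_errors) == (3, 2)
```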

I think the discussion is sort of missing a very common use case: when a user calls func(iterator) and the function is expecting an iterable but not an iterator. The author of the called code might be thinking that the input is a list, but that doesn't mean the caller thinks that. Even worse is a case like this:

```
def func(data_list):
    if VERBOSE:
        log(... [ ... for i in data_list ... ] ...)
    for i in data_list:
        do stuff with i
```

The behavior of this code changes (significantly!) when VERBOSE is set. That's going to make debugging hard. [Yes, I know this can be safely written using itertools.tee. That doesn't mean that everyone will do that.]

Here's a simple way to describe what the behavior could be (although obviously this doesn't work):

```
def safer_iter_behavior(iterator):
    yield from iterator
    raise StopIteration
    raise RuntimeError
```

That is, when you iterate the first time, it acts as expected, raising StopIteration when the list is exhausted. If you continue to iterate after the list is exhausted, it raises RuntimeError.

In addition to the double-iteration case above, it guards against other cases like:

```
for x in iterator1:
    y = next(iterator2)
    do stuff
```

when iterator2 is unexpectedly shorter than iterator1. Of course, naively guarding against that by adding

```
if len(iterator1) != len(iterator2): ...
```

would be a bad idea and would also result in RuntimeError.

--- Bruce
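The itertools.tee fix that Bruce alludes to can be sketched like this: the logging branch consumes its own copy of the stream, so the main loop still sees every item. VERBOSE, log() and the loop body are stand-ins taken from the pseudocode above, not a real API:

```python
# Sketch of the VERBOSE example rewritten with itertools.tee so that
# logging does not exhaust the caller's iterator.
from itertools import tee

VERBOSE = True
log = print  # stand-in for a real logger

def func(data_list):
    if VERBOSE:
        # Split the stream; log() drains one copy, the loop gets the other.
        data_list, for_log = tee(data_list)
        log("items:", [i for i in for_log])
    result = []
    for i in data_list:       # still sees every item, VERBOSE or not
        result.append(i * 2)  # "do stuff with i"
    return result
```

Note that after tee() the original iterator must not be touched directly, which is why the sketch rebinds data_list to one of the copies.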
participants (11): Barry, BoppreH, Bruce Leban, Chris Angelico, Christopher Barker, David Mertz, Ph.D., Dom Grigonis, Greg Ewing, Jonathan Fine, MRAB, Rob Cliffe