PEP 479 and take()

I've been away from Python-related mailing lists for almost a year now and only a few days ago started following them again just in time to see the "hopefully final" text of PEP 479 posted on python-dev. I had been intending to just lurk around for a while but felt compelled to post something after seeing that. It's taken a while for me to find the time to read through what's already been written but I see that PEP 479 is apparently a done deal so I won't try to argue with it (except to register a quick -1 here).

Somehow the discussion of PEP 479 became obsessed with the idea that leaked StopIteration is a problem for generators when it is a problem for all iterators. I haven't seen this pointed out yet so I'll demonstrate that map() is susceptible to the same problem:

    $ cat tmp.py
    people = [
        ['John Cleese', 1, 0, 1],
        ['Michael Palin', 123, 123],
        [],  # Whoops!
        ['Terry Gilliam', 12, False, ''],
    ]

    def first_name(person):
        return next(iter(person)).split()[0]

    for name in map(first_name, people):
        print(name)

    $ python3.4 tmp.py
    John
    Michael

(i.e. Terry was not printed and no error was raised.)

There's nothing hacky about the use of map above: the mistake is just the bare next call. The same thing happens with filter, takewhile, etc. Essentially any of the itertools-style functions that takes a user-defined function allows StopIteration to pass from the user function to the parent iterator-consumer. I believe this is by design since apparently the author Raymond Hettinger (like me) considered StopIteration fall-through a deliberate design feature of the iterator protocol.

Fixing this problem so that a leaked StopIteration turns into a loud error message has been deemed important enough that a partial fix (applying only to generators) warrants breaking the backward compatibility of the core language in a minor release. So what should happen with all the other places that are susceptible?
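The same silent truncation can be reproduced with filter() — a minimal sketch (not from the original post) where a StopIteration raised inside the predicate ends the outer loop instead of surfacing as an error:

```python
def has_data(person):
    # Bare next() call: raises StopIteration for an empty person list.
    return bool(next(iter(person)))

people = [
    ['John Cleese', 1],
    [],                    # Whoops!
    ['Terry Gilliam', 12],
]

printed = []
for person in filter(has_data, people):
    printed.append(person[0])

# The StopIteration from has_data([]) propagates through filter's
# __next__ and silently terminates the for-loop.
print(printed)  # ['John Cleese'] -- Terry is dropped without any error
```

Because filter is a C-level iterator rather than a generator, PEP 479 does not help here: the predicate's StopIteration still passes straight through to the consuming loop.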
Is StopIteration fall-through to be considered an anti-pattern that anyone implementing the iterator protocol should avoid?

With or without PEP 479 the root of the problem is simply in the careless use of next(). The PEP proposes to make this easier to track down but once located the problem will be an unguarded next call that needs to be fixed. (It will also force people to "fix" other things like "raise StopIteration" but these were not actually problematic before.)

Clearly people want a function like next() that isn't susceptible to this problem or they wouldn't use next in this way and the problem wouldn't exist. So I propose a new function called take() with the following semantics:

    class TakeError(Exception):
        pass

    def take(iterator, n=None):
        if n is None:
            try:
                return next(iterator)
            except StopIteration:
                raise TakeError
        else:
            return tuple(take(iterator) for _ in range(n))

The idea is that take(iterator) is the generic way to get the next item from the iterator and assert that the item should exist. When you use take(iterator) your intention that the item should exist is self-documenting whereas a bare next() is ambiguous without a comment:

    x = next(iterator)  # Never raises StopIteration
    x = next(iterator)  # Propagate StopIteration
    x = next(iterator)  # Haven't considered StopIteration

This gives users a clear and un-ugly fix to any code that uses next inappropriately (s/next/take) so that there is no excuse for not fixing that code to:

    x = take(iterator)  # Either I get an item or a proper Error is raised.

Similarly take(iterator, n) is like islice except that it immediately advances the iterator and raises if the required number of items was not found.
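A quick usage sketch of the proposed semantics (the definition is repeated from the post so the snippet is self-contained; take() is a proposal here, not an existing builtin):

```python
class TakeError(Exception):
    pass

def take(iterator, n=None):
    # take(it) -> next item, or TakeError if exhausted.
    # take(it, n) -> tuple of the next n items, or TakeError if fewer exist.
    if n is None:
        try:
            return next(iterator)
        except StopIteration:
            raise TakeError
    else:
        return tuple(take(iterator) for _ in range(n))

it = iter([1, 2, 3])
print(take(it))     # 1
print(take(it, 2))  # (2, 3)
try:
    take(it)        # exhausted -> TakeError, never StopIteration
except TakeError:
    print('TakeError raised')
```

Note that because TakeError is not StopIteration, it propagates cleanly out of the inner generator expression and out of any enclosing for-loop instead of being swallowed.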
Essentially this is a safer version of:

    firstn = [next(iterator) for _ in range(n)]       # Leaks StopIteration
    firstn = tuple(next(iterator) for _ in range(n))  # Terminates silently
    firstn = list(islice(iterator, n))                # Terminates silently

(Actually the second example would raise RuntimeError with PEP 479.)

With take it becomes:

    firstn = take(iterator, n)  # n items returned or an Error

Oscar
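The silent truncation of the islice variant is easy to see in isolation — a minimal sketch:

```python
from itertools import islice

it = iter([1, 2])
firstn = list(islice(it, 5))  # asked for 5 items, only 2 exist
print(firstn)  # [1, 2] -- truncated silently, no error raised
```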

Hm... The type of take() seems schizophrenic: take() with one argument returns a single item, while take() with a count returns a tuple of that many items. It would be better if these were two separate functions. Other than that, it's a simple function that people can easily code up themselves (or perhaps there's already a variant in itertools :-).

BTW did you know that next(iterator, default) returns default if the iterator is exhausted? IOW this will never raise StopIteration. It's similar to dict.get(key, default) or getattr(obj, attrname, default).

On Wed, Dec 10, 2014 at 4:05 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
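The two-argument form of next() mentioned above is an existing builtin; a quick sketch of its behaviour:

```python
it = iter(['a'])
print(next(it, 'missing'))  # 'a'  -- item available, default unused
print(next(it, 'missing'))  # 'missing' -- exhausted, no StopIteration
```

The caveat (raised later in the thread) is that detecting the default only works when you know the iterator can never legitimately yield that value.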

On 10 December 2014 at 21:42, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
No because TakeError is an Error whereas StopIteration is a flow-control construct (as Ethan pointed out in another thread). On one level it really doesn't matter what exception it is as long as it's not StopIteration since StopIteration is caught in unexpected places e.g. a for-loop. Of course introducing a completely new error mitigates the possibility of masking some other error from the underlying iterator. Oscar

On 10 December 2014 at 18:35, Guido van Rossum <guido@python.org> wrote:
Fair enough. It seemed like a natural extension to me but since it can easily be done with a list comprehension it's not really necessary.
It's not hard to write your own take(). It's also not hard to call next() properly and catch the StopIteration provided you're aware of the danger that a loose StopIteration poses (and never make any mistakes!). I've made the mistake before in real code and I've seen plenty of experienced pythoneers making it when posting snippets on this list. I've tried to point it out where possible e.g.:

https://mail.python.org/pipermail/python-ideas/2013-February/019627.html

I guess it's just easy to reach into builtins looking for take() and then pull out next() because take() isn't there. The point is that while next() makes perfect sense as the canonical way to call __next__() it is often inappropriate as an end-user function where the user simply wants to get items from an iterator. i.e. when you're just using iterators and not consciously implementing them next() does the wrong thing.

Consider the different uses of next():

1) You want to leak StopIteration.
2) You want to catch StopIteration.
3) You know that StopIteration will never be raised.
4) You haven't considered the StopIteration.

If you had both next() and take() to choose from then the only time next() would be preferable is when you want to leak StopIteration (a pattern that is now largely broken by PEP 479).
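Case 2 — catching StopIteration deliberately at the call site — can be sketched like this (a hypothetical first() helper, not from the thread):

```python
def first(iterable):
    # Catch StopIteration immediately and convert it into an error
    # that cannot be swallowed by an enclosing for-loop or map().
    try:
        return next(iter(iterable))
    except StopIteration:
        raise ValueError('first() of empty iterable') from None

print(first([10, 20]))  # 10
```

This is the defensive pattern that take() would make the default rather than something each caller must remember.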
I more often find that I want an error than a default value and of course being able to detect the default requires knowing that it cannot be returned by the iterator. You can't always use next() this way but when you do of course there is no problem. I don't really need the dict.get method if I want an error since I can simply write dict[key]. The point of PEP 479 is about silently masking an error where loudly reporting it is considered preferable: in this case the error is easy to avoid for someone who understands everything perfectly and is explicitly thinking about it. Oscar

On 12/13/2014 7:45 AM, Oscar Benjamin wrote:
On 10 December 2014 at 18:35, Guido van Rossum <guido@python.org> wrote:
I once proposed, either here or on python-list, and propose again, that the signature of next be expanded so that the user could specify the ending exception. If possible, the stop object could either be an exception class, which would be called with a generic message, or an exception instance. Then the awkward

    try:
        item = next(it)
    except StopIteration:
        raise ValueError('iterable must not be empty') from None

would simply be, for instance,

    item = next(it, stop=ValueError('iterable must not be empty'))

The doc for next could say that, except in __next__ methods, the default stop exception, StopIteration, should be either overridden or caught, to prevent it from leaking out of the containing function. No need for a new builtin to change the exception.

-- Terry Jan Reedy
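The proposed stop= keyword does not exist in any Python release; a rough user-level approximation of the suggested semantics (the helper name next_or_stop is hypothetical):

```python
def next_or_stop(it, stop=None):
    # Sketch of the proposed next(it, stop=...) semantics: on
    # exhaustion, raise the given exception instead of StopIteration.
    try:
        return next(it)
    except StopIteration:
        if stop is None:
            raise
        if isinstance(stop, type) and issubclass(stop, BaseException):
            # An exception class: call it with a generic message.
            raise stop('iterator is exhausted') from None
        # An exception instance: raise it as given.
        raise stop from None

try:
    next_or_stop(iter([]), stop=ValueError('iterable must not be empty'))
except ValueError as exc:
    print(exc)  # iterable must not be empty
```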

I've also never needed or desired this. Several people on the list have become obsessed with the notion of "leaking StopIteration" and there seems to be no end of proposals that fight the basic mechanics of "exceptions propagate upwards until caught".

Everyone, please, please stop directing so much effort to revise the fundamentals of the language -- it's NOT broken. The iterator protocol and exceptions have been around for a long time. They have been one of Python's greatest success stories (not just in Python, but in many other languages as well).

Raymond

On 12/13/2014 5:36 PM, Antoine Pitrou wrote:
Others do, whenever the first item of an iterable needs special treatment. And others *have* forgotten to catch StopIteration when calling next(it). The code for reduce *is* written with such code.
But the current equivalent code in the doc is buggy because it was not.

    def reduce(function, iterable, initializer=None):
        it = iter(iterable)
        if initializer is None:
            value = next(it)
        else:
            value = initializer
        for element in it:
            value = function(value, element)
        return value
The equivalent code now would be

    try:
        value = next(it)
    except StopIteration:
        raise TypeError("reduce() of empty sequence "
                        "with no initial value") from None

http://bugs.python.org/issue23049?

That a core developer would miss this illustrates to me why the addition is needed. With this proposal, the correct equivalent would be

    value = next(it, stop=TypeError(
        "reduce() of empty sequence with no initial value"))

-- Terry Jan Reedy
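Putting the try/except fix into the doc's pure-Python equivalent gives the following sketch (the real functools.reduce is implemented in C; this only illustrates the corrected equivalent):

```python
import operator

def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        # Catch StopIteration here so an empty iterable becomes a
        # loud TypeError rather than a leaked StopIteration.
        try:
            value = next(it)
        except StopIteration:
            raise TypeError(
                'reduce() of empty sequence with no initial value') from None
    else:
        value = initializer
    for element in it:
        value = function(value, element)
    return value

print(reduce(operator.add, [1, 2, 3]))  # 6
```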

On Sat, Dec 13, 2014 at 07:39:25PM -0500, Terry Reedy wrote: [...]
And if they forget that, what makes you think they will remember to write next(it, stop=ValueError)?
One has to expect that pedagogical examples often avoid complicating the code with error handling, and consequently are buggy compared to the real production code. It's almost a truism that an algorithm that takes 5 lines in text books might take 50 lines in production code :-)
Seems perfectly reasonable to me. Many user-written functions don't bother to catch exceptions and convert them to a standard exception, instead they just let them bubble up. That's not a bug, it's a feature of Python. Generators are rather special, but for non-generator functions like reduce, StopIteration is just another exception, no different from ValueError, IndexError, UnicodeDecodeError, etc.
I don't think that this example needs to be fixed, but I don't object if you do fix it.
That a core developer would miss this [...]
You're assuming that it was an oversight. It's more likely that the author of that code snippet simply didn't care to complicate the code just for the sake of raising the same exception type as the built-in reduce. If you read the docs for reduce, it doesn't make any promises about what exceptions it will raise. Raising ValueError is not a documented part of the API. https://docs.python.org/3/library/functools.html#functools.reduce So the Devil's Advocate might argue that the bug is not in the pure Python code but in the built-in reduce, since it wrongly raises ValueError when it should raise StopIteration. -- Steven

...
That a core developer would miss this illustrates to me why the addition is needed. With this proposal, the correct equivalent would be
A number of people seem to take the "python equivalent" code far too literally. Its purpose is primarily documentation to show the overall logic of reducing. In some cases you can make the "equivalent" more exact, but make the documentation worse because it starts to lose its explanatory value.

I'm feeling upset by the never-ending flurry of proposals on python-ideas. At their root is doing battle with the core concept that StopIteration is an exception that can propagate upward like all other exceptions. There seems to be a not-so-secret wish that StopIteration was a sentinel value that can only be checked immediately and can't be propagated. In other words, the proponents seem to reject a core idea in the iterator protocol as implemented in Python for over a decade and as implemented in a number of other languages (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_...) or as specified in Design Patterns.

In rejecting the notion of StopIteration for control flow, a number of people on this list are going nuts and coming up with one random suggestion after another, ignoring the years of success or the experiences of devs in other languages. You guys seem to be just making up new rules about how other people should write programs to express their reasoning. Some of the discussed proposals seem like no more than stylistic pedantry.

I wish more people on this list had experience teaching Python. It would give greater appreciation for language simplicity and not introducing twists in the road. At one time, all we had was it.next() which either returned a value or raised StopIteration. Then, next() was added as a shortcut for it.__next__(). The relationship was no more special than the relation between len(s) and s.__len__(). Then, next() grew a default argument. This wasn't bad because it paralleled default arguments in dict.get() and dict.pop() -- also it simplified a common pattern for supplying a default.
But now, you want to add stop=TypeError("reduce() of empty sequence with no initial value"), which has no precedent and must be accompanied by explanations and admonitions about how you want people to always catch StopIteration immediately. It stops being easily teachable and becomes one more piece of arcana different from the rest of Python, different from the last 15 years of Python iterator thinking, and different from every other iterator protocol implementation in other languages. I do not look forward to teaching this.

For Christmas, it would be a nice gift if Guido could put an end to this line of thinking. The conversations are difficult because several of the participants are deeply convinced that there is something fundamentally wrong with StopIteration. I don't think any of them can easily be convinced to leave the iterator protocol alone and to continue enjoying one of our best success stories.

sometimes-the-cure-is-worse-than-the-disease-ly,

Raymond

P.S. "Leaking StopIteration" is a phrase that I've only heard here -- this seems to be a villain of legend rather than something that arises in the lives of ordinary Python programmers.

On Sat, Dec 13, 2014 at 6:08 PM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
It ends with PEP 479. I have no intention to let this go any further. I am now going to mute any further threads on the topic, and I recommend you do the same, Raymond. Happy holidays! -- --Guido van Rossum (python.org/~guido)

On 14 December 2014 at 12:31, Guido van Rossum <guido@python.org> wrote:
Hear, hear! For myself, I've made one last contribution to a couple of the threads (in an attempt to convey how PEP 479 manages to make the language *smaller* rather than larger when it comes to *writing* generator functions). I'll monitor replies to those to see if further clarifications may be useful, but that's specifically aimed at helping folks to understand the design decision, rather than encouraging further debate about it. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

participants (8)

- Antoine Pitrou
- Greg Ewing
- Guido van Rossum
- Nick Coghlan
- Oscar Benjamin
- Raymond Hettinger
- Steven D'Aprano
- Terry Reedy