[Python-ideas] PEP 479 and take()

Wed Dec 10 19:35:35 CET 2014

Hm... The type of take() seems schizophrenic: take() with one argument
returns a single item, while take() with a count returns a tuple of that
many items. It would be better if these were two separate functions. Other
than that, it's a simple function that people can easily code up themselves
(or perhaps there's already a variant in itertools :-).
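
A minimal sketch of that split, assuming the hypothetical names take_one()
and take_n() (neither appears in the proposal) and reusing the TakeError
exception from the quoted message below:

```python
class TakeError(Exception):
    """Raised when the iterator has fewer items than requested."""


def take_one(iterator):
    # Hypothetical name: always returns a single item, or raises TakeError.
    try:
        return next(iterator)
    except StopIteration:
        raise TakeError("iterator is exhausted") from None


def take_n(iterator, n):
    # Hypothetical name: always returns a tuple of exactly n items,
    # or raises TakeError if the iterator runs out first.
    return tuple(take_one(iterator) for _ in range(n))
```

Each function then has a single return type, so callers never need to check
whether they got an item or a tuple.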

BTW did you know that next(iterator, default) returns default if the
iterator is exhausted? IOW this will never raise StopIteration. It's
similar to dict.get(key, default) or getattr(obj, attrname, default).
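
For illustration, a short sketch of the three lookups side by side (the
variable names and fallback values are made up for the example):

```python
it = iter([])  # an already-exhausted iterator

# With a default, exhaustion yields the default instead of raising.
first = next(it, None)

# The analogous lookups that fall back to a default value.
missing_key = {}.get("key", "fallback")
missing_attr = getattr(object(), "attrname", "fallback")
```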

On Wed, Dec 10, 2014 at 4:05 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com>
wrote:

> I've been away from Python-related mailing lists for almost a year now
> and only a few days ago started following them again just in time to
> see the "hopefully final" text of PEP 479 posted on python-dev. I had
> been intending to just lurk around for a while but felt compelled to
> post something after seeing that. It's taken a while for me to find
> the time to read through what's already been written but I see that
> PEP 479 is apparently a done deal so I won't try to argue with it
> (except to register a quick -1 here).
>
> Somehow the discussion of PEP 479 became obsessed with the idea that
> leaked StopIteration is a problem for generators when it is a problem
> for all iterators. I haven't seen this pointed out yet so I'll
> demonstrate that map() is susceptible to the same problem:
>
> $ cat tmp.py
>
> people = [
>     ['John Cleese', 1, 0, 1],
>     ['Michael Palin', 123, 123],
>     [],  # Whoops!
>     ['Terry Gilliam', 12, False, ''],
> ]
>
> def first_name(person):
>     return next(iter(person)).split()[0]
>
> for name in map(first_name, people):
>     print(name)
> $ python3.4 tmp.py
> John
> Michael
>
> (i.e. Terry was not printed and no error was raised.)
>
> There's nothing hacky about the use of map above: the mistake is just
> the bare next call. The same thing happens with filter, takewhile,
> etc. Essentially any of the itertools-style functions that take a
> user-defined function allows StopIteration to pass from the user
> function to the parent iterator-consumer. I believe this is by design
> since apparently the author Raymond Hettinger (like me) considered
> StopIteration fall-through a deliberate design feature of the iterator
> protocol.
>
> Fixing this problem so that a leaked StopIteration turns into a loud
> error message has been deemed important enough that a partial fix
> (applying only to generators) warrants breaking the backward
> compatibility of the core language in a minor release. So what should
> happen with all the other places that are susceptible? Is
> StopIteration fall-through to be considered an anti-pattern that
> anyone implementing the iterator protocol should avoid?
>
> With or without PEP 479 the root of the problem is simply in the
> careless use of next(). The PEP proposes to make this easier to track
> down but once located the problem will be an unguarded next call that
> needs to be fixed. (It will also force people to "fix" other things
> like "raise StopIteration" but these were not actually problematic
> before).
>
> Clearly people want a function like next() that isn't susceptible to
> this problem or they wouldn't use next in this way and the problem
> wouldn't exist. So I propose a new function called take() with the
> following semantics:
>
> class TakeError(Exception):
>     pass
>
> def take(iterator, n=None):
>     if n is None:
>         try:
>             return next(iterator)
>         except StopIteration:
>             raise TakeError
>     else:
>         return tuple(take(iterator) for _ in range(n))
>
> The idea is that take(iterator) is the generic way to get the next
> item from the iterator and assert that the item should exist. When you
> use take(iterator) your intention that the item should exist is
> self-documenting, whereas a bare next() is ambiguous without a comment:
>
> x = next(iterator)  # Never raises StopIteration
> x = next(iterator)  # Propagate StopIteration
> x = next(iterator)  # Haven't considered StopIteration
>
> This gives users a clear and un-ugly fix to any code that uses next
> inappropriately: s/next/take so that there is no excuse for not fixing
> that code to:
>
> x = take(iterator)  # Either I get an item or a proper Error is raised.
>
> Similarly take(iterator, n) is like islice except that it immediately
> advances the iterator and raises if the required number of items was
> not found. Essentially this is a safer version of:
>
> firstn = [next(iterator) for _ in range(n)]  # Leaks StopIteration
> firstn = tuple(next(iterator) for _ in range(n))  # Terminates silently
> firstn = list(islice(iterator, n))  # Terminates silently
>
> (Actually, the second example would raise RuntimeError under PEP 479.)
> With take it becomes:
>
> firstn = take(iterator, n) # n items returned or an Error
>
>
> Oscar

-- 
--Guido van Rossum (python.org/~guido)