[Python-ideas] PEP 479 and take()

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Dec 10 13:05:06 CET 2014


I've been away from Python-related mailing lists for almost a year now
and only a few days ago started following them again just in time to
see the "hopefully final" text of PEP 479 posted on python-dev. I had
been intending to just lurk around for a while but felt compelled to
post something after seeing that. It's taken a while for me to find
the time to read through what's already been written but I see that
PEP 479 is apparently a done deal so I won't try to argue with it
(except to register a quick -1 here).

Somehow the discussion of PEP 479 became obsessed with the idea that
leaked StopIteration is a problem for generators when it is a problem
for all iterators. I haven't seen this pointed out yet so I'll
demonstrate that map() is susceptible to the same problem:

$ cat tmp.py

people = [
    ['John Cleese', 1, 0, 1],
    ['Michael Palin', 123, 123],
    [],  # Whoops!
    ['Terry Gilliam', 12, False, ''],
]

def first_name(person):
    return next(iter(person)).split()[0]

for name in map(first_name, people):
    print(name)
$ python3.4 tmp.py
John
Michael

(i.e. Terry was not printed and no error was raised.)

There's nothing hacky about the use of map above: the mistake is just
the bare next call. The same thing happens with filter, takewhile,
etc. Essentially any of the itertools-style functions that take a
user-defined function allows StopIteration to pass from the user
function to the parent iterator-consumer. I believe this is by design
since apparently the author Raymond Hettinger (like me) considered
StopIteration fall-through a deliberate design feature of the iterator
protocol.
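
A minimal sketch of the same fall-through with filter and takewhile as
well as map. The helper first_item and the rows data are made up for
illustration and are not from tmp.py; the behaviour shown is what
CPython 3 does as far as I know.

```python
from itertools import takewhile

def first_item(seq):
    # Bare next(): raises StopIteration when seq is empty.
    return next(iter(seq))

rows = [[1], [2], [], [3]]  # the empty list plays the role of the bad record

# In each case the StopIteration raised inside first_item() escapes through
# the iterator's __next__ and list() takes it to mean "no more items".
mapped = list(map(first_item, rows))
filtered = list(filter(first_item, rows))
taken = list(takewhile(first_item, rows))

print(mapped)    # [1, 2]
print(filtered)  # [[1], [2]]
print(taken)     # [[1], [2]]
```

In all three cases [3] is never reached and no error is reported.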

Fixing this problem so that a leaked StopIteration turns into a loud
error message has been deemed important enough that a partial fix
(applying only to generators) warrants breaking the backward
compatibility of the core language in a minor release. So what should
happen with all the other places that are susceptible? Is
StopIteration fall-through to be considered an anti-pattern that
anyone implementing the iterator protocol should avoid?

With or without PEP 479 the root of the problem is simply in the
careless use of next(). The PEP proposes to make this easier to track
down but once located the problem will be an unguarded next call that
needs to be fixed. (It will also force people to "fix" other things,
like "raise StopIteration", that were not actually problematic
before.)
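
For reference, a sketch of what guarding the next call by hand looks
like today, using the people data from tmp.py. The choice of ValueError
and its message are assumptions made for illustration; any exception
other than StopIteration would do.

```python
people = [
    ['John Cleese', 1, 0, 1],
    ['Michael Palin', 123, 123],
    [],  # Whoops!
    ['Terry Gilliam', 12, False, ''],
]

def first_name(person):
    try:
        full_name = next(iter(person))
    except StopIteration:
        # Re-raise as something map() cannot mistake for exhaustion.
        raise ValueError("empty person record") from None
    return full_name.split()[0]

names = []
error = None
try:
    for name in map(first_name, people):
        names.append(name)
except ValueError as exc:
    error = exc

print(names)  # ['John', 'Michael']
print(error)  # empty person record
```

The loop still stops at the empty record, but now it stops with an
error instead of silently.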

Clearly people want a function like next() that isn't susceptible to
this problem or they wouldn't use next in this way and the problem
wouldn't exist. So I propose a new function called take() with the
following semantics:

class TakeError(Exception):
    pass

def take(iterator, n=None):
    if n is None:
        try:
            return next(iterator)
        except StopIteration:
            raise TakeError
    else:
        return tuple(take(iterator) for _ in range(n))

The idea is that take(iterator) is the generic way to get the next
item from the iterator and assert that the item should exist. When you
use take(iterator) your intention that the item should exist is
self-documenting, whereas a bare next() is ambiguous without a comment:

x = next(iterator)  # Never raises StopIteration
x = next(iterator)  # Propagate StopIteration
x = next(iterator)  # Haven't considered StopIteration

This gives users a clear and un-ugly fix to any code that uses next
inappropriately: s/next/take so that there is no excuse for not fixing
that code to:

x = take(iterator)  # Either I get an item or a proper Error is raised.
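
As a sketch of that substitution applied to the tmp.py example above:
take and TakeError are the proposed names defined earlier, not existing
builtins, and are repeated here only so the snippet runs on its own.

```python
class TakeError(Exception):
    pass

def take(iterator, n=None):
    if n is None:
        try:
            return next(iterator)
        except StopIteration:
            raise TakeError
    else:
        return tuple(take(iterator) for _ in range(n))

people = [
    ['John Cleese', 1, 0, 1],
    ['Michael Palin', 123, 123],
    [],  # Whoops!
    ['Terry Gilliam', 12, False, ''],
]

def first_name(person):
    # s/next/take: an empty record now raises TakeError, not StopIteration.
    return take(iter(person)).split()[0]

printed = []
caught = False
try:
    for name in map(first_name, people):
        printed.append(name)
except TakeError:
    caught = True

print(printed)  # ['John', 'Michael']
print(caught)   # True
```

map() does not treat TakeError as exhaustion, so the empty record
surfaces as an error at the point where the loop runs.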

Similarly take(iterator, n) is like islice except that it immediately
advances the iterator and raises if the required number of items was
not found. Essentially this is a safer version of:

firstn = [next(iterator) for _ in range(n)]  # Leaks StopIteration
firstn = tuple(next(iterator) for _ in range(n))  # Terminates silently
firstn = list(islice(iterator, n))  # Terminates silently

(Actually the second example would raise RuntimeError with PEP 479.)

With take it becomes:

firstn = take(iterator, n) # n items returned or an Error
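
A rough side-by-side of the four spellings when the iterator runs short
(two items available, three requested). The outcome helper is
hypothetical, and the RuntimeError result assumes Python 3.7 or later,
where the PEP 479 behaviour is always on; take and TakeError are the
proposed names from earlier in this message.

```python
from itertools import islice

class TakeError(Exception):
    pass

def take(iterator, n=None):
    if n is None:
        try:
            return next(iterator)
        except StopIteration:
            raise TakeError
    else:
        return tuple(take(iterator) for _ in range(n))

def outcome(func):
    # Call func on a 2-item iterator asking for 3 items; return the result,
    # or the name of whatever exception comes out.
    try:
        return func(iter("ab"), 3)
    except Exception as exc:
        return type(exc).__name__

results = {
    "listcomp": outcome(lambda it, n: [next(it) for _ in range(n)]),
    "genexpr": outcome(lambda it, n: tuple(next(it) for _ in range(n))),
    "islice": outcome(lambda it, n: list(islice(it, n))),
    "take": outcome(take),
}

print(results["listcomp"])  # StopIteration
print(results["genexpr"])   # RuntimeError (Python 3.7+)
print(results["islice"])    # ['a', 'b']
print(results["take"])      # TakeError
print(take(iter("abc"), 2)) # ('a', 'b')
```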


Oscar

