[Python-ideas] Membership of infinite iterators

Nick Coghlan ncoghlan at gmail.com
Wed Oct 18 06:22:47 EDT 2017


On 18 October 2017 at 03:39, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Tue, Oct 17, 2017 at 5:26 PM, Serhiy Storchaka <storchaka at gmail.com>
> wrote:
>
>> 17.10.17 17:06, Nick Coghlan wrote:
>>
>>> Keep in mind we're not talking about a regular loop you can break out of
>>> with Ctrl-C here - we're talking about a tight loop inside the interpreter
>>> internals that leads to having to kill the whole host process just to get
>>> out of it.
>>>
>>
>> And this is the root of the issue. Just let more tight loops be
>> interruptible with Ctrl-C, and this will fix the more general issue.
>>
>>
> ​Not being able to interrupt something with Ctrl-C in the repl or with the
> interrupt command in Jupyter notebooks is definitely a thing I sometimes
> encounter. A pity I don't remember when it happens, because I usually
> forget it very soon after I've restarted the kernel and continued working.
> But my guess is it's usually not because of an infinite iterator.
>

Fixing the general case is hard, because the assumption that signals are
only checked between interpreter opcodes is a pervasive one throughout the
interpreter internals.  We certainly *could* redefine the affected C APIs as
potentially raising KeyboardInterrupt (adjusting the signal management
infrastructure accordingly), and if someone actually follows through and
implements that some day, the argument could be made that, given such a
change, it would be reasonable to drop any a priori guards we have put in
place for particular *detectable* uninterruptible infinite loops.

However, that's not the design question being discussed in this thread. The
design question here is "We have 3 known uninterruptible infinite loops
that are easy to detect and prevent. Should we detect and prevent them?".
"We shouldn't allow anyone to do this easy thing, because it would be
preferable for someone to instead do this hard and complicated thing that
nobody is offering to do" isn't a valid design argument in that situation.

And I have a four-step check for that, which prompts me to say "Yes, we
should detect and prevent them":

1. Uninterruptible loops are bad, so having fewer of them is better
2. These particular cases can be addressed locally using existing
protocols, so the chances of negative side effects are low
3. The total amount of code involved is likely to be small (a dozen or so
lines of C, a similar number of lines of Python in the tests) in
well-isolated protocol functions, so the chances of introducing future
maintainability problems are low
4. We have a potential contributor who is presumably offering to do the
work (if that's not the case, then the question is moot anyway until a
sufficiently interested volunteer turns up)

As an alternative implementation approach, the case could also be made that
these iterators should be raising TypeError in __length_hint__, as that
protocol method is explicitly designed to be used for finite container
pre-allocation. That way things like "list(itertools.count())" would fail
immediately (similar to the way "list(range(10**100))" already does) rather
than attempting to consume all available memory before (hopefully) finally
failing with MemoryError.
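As a rough sketch of that alternative (InfiniteCount is a hypothetical
stand-in here; itertools.count() does not currently define
__length_hint__ this way):

```python
class InfiniteCount:
    """Hypothetical iterator that marks itself as infinite by raising
    TypeError from __length_hint__, the PEP 424 protocol method used
    for finite container pre-allocation."""

    def __init__(self, start=0):
        self._next = start

    def __iter__(self):
        return self

    def __next__(self):
        value = self._next
        self._next += 1
        return value

    def __length_hint__(self):
        # A pre-allocation size hint makes no sense for an
        # unbounded iterator
        raise TypeError("infinite iterator has no length hint")
```

Note that today the C-level length hint machinery swallows a TypeError
from __length_hint__ and falls back to plain iteration, so list() and
friends would need to be changed to treat it as "refuse to consume".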

If we were to do that, then we *could* make the solution to the reported
problem more general by having all builtin and standard library operations
that expect to be working with finite iterators (the containment testing
fallback, min, max, sum, any, all, functools.reduce, etc) check for a
length hint, even if they aren't actually pre-allocating any memory. Then
the general purpose marker for "infinite iterator" would be "Explicitly
defines __length_hint__ to raise TypeError", and it would prevent, a
priori, all operations that attempted to fully consume the iterator.
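A Python-level sketch of that general check (checked_min is a
hypothetical name; the real change would live in the C implementations
of min, max, sum, and the rest):

```python
def checked_min(iterable, *args, **kwargs):
    """Hypothetical wrapper illustrating the proposed guard: consult
    the iterator's __length_hint__ without pre-allocating anything,
    and refuse to run if it raises TypeError, the proposed marker
    for "this iterator is infinite"."""
    hint = getattr(type(iterable), '__length_hint__', None)
    if hint is not None:
        try:
            hint(iterable)
        except TypeError:
            raise TypeError(
                "cannot consume an iterator that reports itself "
                "as infinite via __length_hint__")
    # Finite (or unmarked) iterables proceed as usual
    return min(iterable, *args, **kwargs)
```

Iterables without a __length_hint__ at all (generators, plain lists)
pass through untouched, which keeps the chances of negative side
effects low.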

That more general approach would cause some currently "working" code (like
"any(itertools.count())" and "all(itertools.count())", both of which
consume at most 2 items from the iterator) to raise an exception instead,
and hence would require the introduction of a DeprecationWarning in 3.7
(where the affected APIs would start calling the length hint, but suppress any
exceptions from it), before allowing the exception to propagate in 3.8+.
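The "at most 2 items" behaviour is easy to verify interactively:

```python
import itertools

# any() stops at the first truthy item: count() yields 0 (falsy),
# then 1 (truthy), so exactly two items are consumed.
c = itertools.count()
assert any(c) is True
assert next(c) == 2

# all() stops at the first falsy item: 0 is falsy, so exactly one
# item is consumed.
c = itertools.count()
assert all(c) is False
assert next(c) == 1
```

Under the more general approach, both calls would warn in 3.7 and
raise in 3.8+ despite terminating today.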

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia