[Python-ideas] Membership of infinite iterators

MRAB python at mrabarnett.plus.com
Wed Oct 18 14:24:42 EDT 2017

On 2017-10-18 15:48, Nick Coghlan wrote:
> On 18 October 2017 at 22:36, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>     On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>         That one can only be fixed in count() - list already checks
>         operator.length_hint(), so implementing
>         itertools.count.__length_hint__() to always raise an exception
>         would be enough to handle the container constructor case.
>
>     While that may be a convenient hack to solve some of the cases,
>     maybe it's possible for list(..) etc. to give Ctrl-C a chance every
>     now and then? (Without a noticeable performance penalty, that is.)
>     That would also help with *finite* C-implemented iterables that are
>     just slow to turn into a list.
>
>     If I'm not mistaken, we're talking about C-implemented functions
>     that iterate over C-implemented iterators. It's not at all obvious
>     to me that it's the iterator that should handle Ctrl-C.
>
> It isn't, it's the loop's responsibility. The problem is that one of the 
> core design assumptions in the CPython interpreter implementation is 
> that signals from the operating system get handled by the opcode eval 
> loop in the main thread, and Ctrl-C is one of those signals.
>
> This is why "for x in itertools.cycle(): pass" can be interrupted, while 
> "sum(itertools.cycle())" can't: in the latter case, the opcode eval loop 
> isn't running, as we're inside a tight loop inside the sum() implementation.
>
> It's easy to say "Well those loops should all be checking for signals 
> then", but I expect folks wouldn't actually like the consequences of 
> doing something about it, as:
>
> 1. It will make those loops slower, due to the extra overhead of 
> checking for signals (even the opcode eval loop includes all sorts of 
> tricks to avoid actually checking for new signals, since doing so is 
> relatively slow)
>
> 2. It will make those loops harder to maintain, since the high cost of 
> checking for signals means the existing flat loops will need to be 
> replaced with nested ones to reduce the per-iteration cost of the more 
> expensive checks

The re module increments a counter on each iteration and checks for 
signals when the bottom 12 bits are 0, i.e. once every 4096 iterations.

The regex module increments a 16-bit counter on each iteration and 
checks for signals when it wraps around to 0, i.e. once every 65536 
iterations.
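
To illustrate the two counter schemes, here is a minimal sketch in Python. 
It is not the actual code of either module (both are written in C, where 
the check would be a call such as PyErr_CheckSignals()). The function 
names and the check callback are made up for the example, and pure Python 
code doesn't need this at all since the eval loop already checks for 
signals. It only shows how a single flat loop can amortise an expensive 
check with a cheap counter test.

```python
def loop_with_mask(iterable, check, mask=0xFFF):
    """Call check() whenever the low bits of the counter are all 0.

    With the default 12-bit mask that is once every 4096 iterations.
    Returns the number of items consumed.
    """
    counter = 0
    for _ in iterable:
        counter += 1
        if (counter & mask) == 0:
            check()
    return counter


def loop_with_wraparound(iterable, check, bits=16):
    """Keep a fixed-width counter and call check() when it wraps to 0.

    With 16 bits that is once every 65536 iterations.
    Returns the number of items consumed.
    """
    limit_mask = (1 << bits) - 1
    counter = 0   # emulates a fixed-width unsigned counter
    consumed = 0
    for _ in iterable:
        consumed += 1
        counter = (counter + 1) & limit_mask
        if counter == 0:
            check()
    return consumed


if __name__ == "__main__":
    calls = []
    loop_with_mask(range(10000), lambda: calls.append("mask"))
    loop_with_wraparound(range(200000), lambda: calls.append("wrap"))
    print(calls.count("mask"), calls.count("wrap"))
```

Either way the per-iteration cost is one increment and one comparison, 
and the loop stays flat rather than nested.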

> 3. It means making the signal checking even harder to reason about than 
> it already is, since even C implemented methods that avoid invoking 
> arbitrary Python code could now still end up checking for signals
>
> It's far from being clear to me that making such a change would actually 
> be a net improvement, especially when there's an opportunity to mitigate 
> the problem by having known-infinite iterators report themselves as such.
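
As a rough sketch of what "report themselves as such" could look like, 
the class below is a hypothetical stand-in for itertools.count() (it is 
not the real implementation, and as of this thread count() has no 
__length_hint__). Raising OverflowError is just one possible choice of 
exception. It relies on the CPython behaviour mentioned above, namely 
that list() consults the length hint before it starts iterating, and 
that only a TypeError from __length_hint__ is swallowed.

```python
import operator


class EndlessCounter:
    """Hypothetical infinite iterator that refuses to give a length hint."""

    def __init__(self, start=0):
        self.value = start

    def __iter__(self):
        return self

    def __next__(self):
        value = self.value
        self.value += 1
        return value

    def __length_hint__(self):
        # Assumption for this sketch: any exception other than TypeError
        # propagates out of operator.length_hint() and out of list().
        raise OverflowError("iterator is infinite")


def try_list(iterable):
    """Return list(iterable), or a message if the iterable refuses."""
    try:
        return list(iterable)
    except OverflowError as exc:
        return f"refused: {exc}"


if __name__ == "__main__":
    print(try_list(range(3)))
    print(try_list(EndlessCounter()))
```

This only covers consumers that ask for a length hint up front. Something 
like sum() never asks, so it would still loop forever.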
