[Python-ideas] Membership of infinite iterators
MRAB
python at mrabarnett.plus.com
Wed Oct 18 14:24:42 EDT 2017
On 2017-10-18 15:48, Nick Coghlan wrote:
> On 18 October 2017 at 22:36, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>
> On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> That one can only be fixed in count() - list already checks
> operator.length_hint(), so implementing
> itertools.count.__length_hint__() to always raise an exception
> would be enough to handle the container constructor case.
>
>
> While that may be a convenient hack to solve some of the cases,
> maybe it's possible for list(..) etc. to give Ctrl-C a chance every
> now and then? (Without a noticeable performance penalty, that is.)
> That would also help with *finite* C-implemented iterables that are
> just slow to turn into a list.
>
> If I'm not mistaken, we're talking about C-implemented functions
> that iterate over C-implemented iterators. It's not at all obvious
> to me that it's the iterator that should handle Ctrl-C.
>
>
> It isn't, it's the loop's responsibility. The problem is that one of the
> core design assumptions in the CPython interpreter implementation is
> that signals from the operating system get handled by the opcode eval
> loop in the main thread, and Ctrl-C is one of those signals.
>
> This is why "for x in itertools.count(): pass" can be interrupted, while
> "sum(itertools.count())" can't: in the latter case, the opcode eval loop
> isn't running, as we're inside a tight loop inside the sum() implementation.
>
> It's easy to say "Well those loops should all be checking for signals
> then", but I expect folks wouldn't actually like the consequences of
> doing something about it, as:
>
> 1. It will make those loops slower, due to the extra overhead of
> checking for signals (even the opcode eval loop includes all sorts of
> tricks to avoid actually checking for new signals, since doing so is
> relatively slow)
> 2. It will make those loops harder to maintain, since the high cost of
> checking for signals means the existing flat loops will need to be
> replaced with nested ones to reduce the per-iteration cost of the more
> expensive checks

The re module increments a counter on each iteration and checks for
signals when the bottom 12 bits are 0.

The regex module increments a 16-bit counter on each iteration and
checks for signals when it wraps around to 0.

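As a rough sketch of that counter trick, here it is in Python rather than the C those modules are written in. The function name, the mask parameter and the check_signals callback are all made up for illustration; in C the callback would be a call to PyErr_CheckSignals(). Pure Python code doesn't need this, since the eval loop already handles signals; the point is only to show how rarely the expensive check runs.

```python
def consume_with_periodic_check(iterable, check_signals, mask=0xFFF):
    """Sum an iterable, calling check_signals() only occasionally.

    Hypothetical helper: mask=0xFFF checks when the bottom 12 bits of the
    counter are 0 (every 4096 items); mask=0xFFFF mimics a 16-bit counter
    wrapping around to 0 (every 65536 items).
    """
    counter = 0
    total = 0
    for item in iterable:
        counter += 1
        # Cheap test on every iteration; the costly check runs rarely.
        if (counter & mask) == 0:
            check_signals()
        total += item
    return total


if __name__ == "__main__":
    checks = []
    result = consume_with_periodic_check(range(10000), lambda: checks.append(1))
    print(result, len(checks))
```

The loop stays flat: the per-iteration cost is one increment and one mask test, and the signal check itself happens once per 4096 (or 65536) items.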
> 3. It means making the signal checking even harder to reason about than
> it already is, since even C implemented methods that avoid invoking
> arbitrary Python code could now still end up checking for signals
>
> It's far from being clear to me that making such a change would actually
> be a net improvement, especially when there's an opportunity to mitigate
> the problem by having known-infinite iterators report themselves as such.
>
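For the "report themselves as such" idea, a minimal sketch of what it could look like for a Python-level iterator, assuming CPython's list() behaviour of consulting the length hint before it starts consuming items. The class name and the choice of OverflowError are my own assumptions, not anything itertools.count does today. Note that a TypeError raised from __length_hint__ would be swallowed and treated as "no hint", so some other exception type is needed.

```python
class Forever:
    """Hypothetical infinite iterator that declares itself unbounded."""

    def __iter__(self):
        return self

    def __next__(self):
        return 0  # never raises StopIteration

    def __length_hint__(self):
        # Assumption: any exception other than TypeError propagates out of
        # the length-hint lookup and aborts list() before it starts looping.
        raise OverflowError("iterator is infinite")


def try_list(iterable):
    """Return the error message if list() refuses the iterable, else None."""
    try:
        list(iterable)
    except OverflowError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_list(Forever()))
```

This only helps consumers that actually ask for a length hint, such as list(); something like sum() would still loop forever.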