[Python-ideas] Membership of infinite iterators

Wed Oct 18 11:27:56 EDT 2017

On Wed, Oct 18, 2017 at 5:48 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 18 October 2017 at 22:36, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>
>> On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>>> That one can only be fixed in count() - list already checks
>>> operator.length_hint(), so implementing itertools.count.__length_hint__()
>>> to always raise an exception would be enough to handle the container
>>> constructor case.
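A minimal sketch of the length-hint approach. It uses a hypothetical pure-Python `InfiniteCount` class rather than patching `itertools.count` itself. CPython's `operator.length_hint()` and `list()` swallow a `TypeError` raised by `__length_hint__` and fall back to a default estimate, but let any other exception propagate. So raising `ValueError` (an arbitrary choice here) makes `list(InfiniteCount())` fail immediately instead of consuming memory until it is interrupted.

```python
import itertools
import operator


class InfiniteCount:
    """Hypothetical pure-Python stand-in for itertools.count() that
    reports itself as infinite via __length_hint__."""

    def __init__(self, start=0, step=1):
        self._next = start
        self._step = step

    def __iter__(self):
        return self

    def __next__(self):
        value = self._next
        self._next += self._step
        return value

    def __length_hint__(self):
        # A TypeError here would be swallowed and replaced by a default
        # estimate; any other exception propagates to the caller.
        raise ValueError("cannot materialize an infinite iterator")


# Lazy consumption still works: islice never asks InfiniteCount for a hint.
print(list(itertools.islice(InfiniteCount(5), 3)))  # [5, 6, 7]

# Eager consumption fails fast instead of running until memory is exhausted.
for consumer in (list, operator.length_hint):
    try:
        consumer(InfiniteCount())
    except ValueError as exc:
        print(f"{consumer.__name__}: {exc}")
```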
>>>
>>
>> While that may be a convenient hack to solve some of the cases, maybe
>> it's possible for list(..) etc. to give Ctrl-C a chance every now and then?
>> (Without a noticeable performance penalty, that is.) That would also help
>> with *finite* C-implemented iterables that are just slow to turn into a
>> list.
>>
>> If I'm not mistaken, we're talking about C-implemented functions that
>> iterate over C-implemented iterators. It's not at all obvious to me that
>> it's the iterator that should handle Ctrl-C.
>>
>
> It isn't, it's the loop's responsibility. The problem is that one of the
> core design assumptions in the CPython interpreter implementation is that
> signals from the operating system get handled by the opcode eval loop in
> the main thread, and Ctrl-C is one of those signals.
>
> This is why "for x in itertools.count(): pass" can be interrupted, while
> "sum(itertools.count())" can't: in the latter case, the opcode eval loop
> isn't running, as we're inside a tight loop inside the sum() implementation.
>
> It's easy to say "Well those loops should all be checking for signals
> then", but I expect folks wouldn't actually like the consequences of doing
> something about it, as:
>
> 1. It will make those loops slower, due to the extra overhead of checking
> for signals (even the opcode eval loop includes all sorts of tricks to
> avoid actually checking for new signals, since doing so is relatively slow)
> 2. It will make those loops harder to maintain, since the high cost of
> checking for signals means the existing flat loops will need to be replaced
> with nested ones to reduce the per-iteration cost of the more expensive
> checks
>

Taking points 1 and 2 together, I don't believe nested loops are strictly
required.
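To make the trade-off concrete, here is a pure-Python sketch of the two loop shapes. A hypothetical `check` callback stands in for the C-level `PyErr_CheckSignals()` call; in real Python code the eval loop already handles signals, so this only models the structure a C implementation might use. `chunked_list` is the nested-loop form described in point 2. `countdown_list` keeps a single flat loop and pays one decrement-and-compare per item, which is one possible reading of the claim that nesting is not strictly required. Both function names are illustrative only.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def chunked_list(iterable: Iterable[T], check: Callable[[], None],
                 chunk_size: int = 4096) -> List[T]:
    """Nested loops: a flat inner loop with no checks, and an outer loop
    that calls check() once per chunk_size items."""
    it = iter(iterable)
    result: List[T] = []
    while True:
        for _ in range(chunk_size):
            try:
                result.append(next(it))
            except StopIteration:
                return result
        check()  # may raise (e.g. KeyboardInterrupt) to abort the build


def countdown_list(iterable: Iterable[T], check: Callable[[], None],
                   interval: int = 4096) -> List[T]:
    """Single flat loop: decrement a counter per item and call check()
    only when it reaches zero."""
    result: List[T] = []
    countdown = interval
    for item in iterable:
        result.append(item)
        countdown -= 1
        if countdown == 0:
            countdown = interval
            check()
    return result
```

With `range(10)` and a chunk size or interval of 4, both variants call `check` twice (after the 4th and 8th items). If `check` raises, either one aborts even on an infinite input such as `itertools.count()`.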

> 3. It means making the signal checking even harder to reason about than it
> already is, since even C implemented methods that avoid invoking arbitrary
> Python code could now still end up checking for signals
>

So you're talking about code that would make a C-implemented Python
iterable of strictly C-implemented Python objects and then pass this to
something C-implemented like list(..) or sum(..), while expecting no Python
code to be run or signals to be checked anywhere while doing it. I'm not
really convinced that such code exists. But if such code does exist, it
sounds like the code is heavily dependent on implementation details.

>
> It's far from being clear to me that making such a change would actually
> be a net improvement, especially when there's an opportunity to mitigate
> the problem by having known-infinite iterators report themselves as such.
>
>

I'm not against that, per se. I just don't think that solves the quite
typical case of *finite* but very tedious or memory-consuming loops that
one might want to break out of. And raising an exception from
.__length_hint__() might also break some obscure, but completely valid,
operations on *infinite* iterables.
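One hypothetical example of such valid but obscure code: a helper that only ever takes a bounded prefix, and calls `operator.length_hint()` purely to size a preallocated buffer. It works on any infinite iterator that simply lacks a hint (such as a plain generator), but would start raising if the iterator's `__length_hint__` raised instead. `take_prefix`, `naturals` and `StrictNaturals` are illustrative names, not part of any proposal in this thread.

```python
import itertools
import operator


def take_prefix(iterable, n):
    """Hypothetical helper: return at most n items. The length hint is only
    used to size a preallocated buffer; the input is never fully consumed."""
    it = iter(iterable)
    buf = [None] * min(n, operator.length_hint(it, n))
    filled = 0
    for item in itertools.islice(it, n):
        if filled < len(buf):
            buf[filled] = item
        else:
            buf.append(item)  # the hint underestimated the size
        filled += 1
    del buf[filled:]  # drop unused slots if the hint overestimated
    return buf


def naturals():
    """An ordinary infinite generator: it has no __length_hint__."""
    n = 0
    while True:
        yield n
        n += 1


class StrictNaturals:
    """Hypothetical infinite iterator whose __length_hint__ raises, as
    proposed above for itertools.count()."""

    def __init__(self):
        self._gen = naturals()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)

    def __length_hint__(self):
        raise ValueError("infinite iterator")


print(take_prefix(naturals(), 3))  # [0, 1, 2]
try:
    take_prefix(StrictNaturals(), 3)
except ValueError as exc:
    print("broken by the raising hint:", exc)
```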

––Koos

> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +