[Python-ideas] Membership of infinite iterators

Thu Oct 19 03:54:36 EDT 2017

On Thu, Oct 19, 2017 at 3:42 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 19 October 2017 at 08:34, Greg Ewing <greg.ewing at canterbury.ac.nz>
> wrote:
>
>> Nick Coghlan wrote:
>>
>>> since breaking up the current single level loops as nested loops would
>>> be a pre-requisite for allowing these APIs to check for signals while
>>> they're running while keeping the per-iteration overhead low
>>>
>>
>> Is there really much overhead? Isn't it just checking a flag?
>>
>
> It's checking an atomically updated flag, so it forces CPU cache
> synchronisation, which means you don't want to be doing it on every
> iteration of a low level loop.
>
>
Even just that it's a C function call makes me not want to recommend doing
it in a lot of tight loops. Who knows what the function does anyway, let
alone what it might or might not do in the future.

> However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals()
> already encapsulates the "Should this thread even be checking for signals
> in the first place?" logic, which means the code change to make the
> itertools iterators inherently interruptible with Ctrl-C is much smaller
> than I thought it would be.
>

And if it didn't encapsulate that, you would probably have written a
wrapper that does. Good thing it's the wrapper that's exposed in the API.

> That approach is also clearly safe from an exception handling point of
> view, since all consumer loops already need to cope with the fact that
> itr.__next__() may raise arbitrary exceptions (including KeyboardInterrupt).
>
>
> So that change alone already offers a notable improvement, and combining it
> with a __length_hint__() implementation that keeps container constructors
> from even starting to iterate would go even further towards making the
> infinite iterators more user friendly.
>
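
To make the __length_hint__() idea concrete, here is a rough pure-Python
sketch. InfiniteCounter is a made-up class, not anything in itertools, and
the choice of OverflowError is my own assumption. As far as I can tell,
CPython treats a TypeError from __length_hint__() as "no hint available",
so some other exception type is needed for the error to reach the caller.

```python
import operator


class InfiniteCounter:
    """Hypothetical infinite iterator (illustration only, not an itertools API)."""

    def __init__(self):
        self.value = 0

    def __iter__(self):
        return self

    def __next__(self):
        current = self.value
        self.value += 1
        return current

    def __length_hint__(self):
        # Assumption: signal "this never ends" by raising instead of
        # returning a number, so that consumers which ask for a hint
        # fail before they start iterating.
        raise OverflowError("iterator is infinite")


def hint_or_error(obj):
    """Return the length hint, or the name of the exception it raised."""
    try:
        return operator.length_hint(obj)
    except OverflowError as exc:
        return type(exc).__name__
```

With this, hint_or_error(InfiniteCounter()) reports the exception instead of
a number, and in CPython list(InfiniteCounter()) should fail the same way
before calling __next__() even once, since list() asks for the hint first.
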
> Similar signal checking changes to the consumer loops would also be
> possible, but I don't think that's an either/or decision: changing the
> iterators means they'll be interruptible for any consumer, while changing
> the consumers would make them interruptible for any iterator, and having
> checks in both the producer & the consumer merely means that you'll be
> checking for signals twice every 65k iterations, rather than once.
>
>
Indeed it's not strictly an either/or decision, but more about where we
might spend time executing C code. But I'm leaning a bit towards doing it
on the consumer side, because there it's more obvious that the code might
take some time to run.

If the consumer ends up iterating over pure-Python objects, there are no
concerns about the overhead. But if it *does* call a C-implemented
__next__, then that's the case where we actually need the whole thing.
Adding the check in both places would double the (small) overhead. And
nested (wrapped) iterators are also a thing.
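
As a rough illustration of the consumer-side variant, here is a pure-Python
sketch of the nested-loop structure discussed above. The names count_items
and check_signals are made up for this example; check_signals stands in
for what PyErr_CheckSignals() would do in the C code, and the interval of
65536 is just the "65k" figure from this thread.

```python
from itertools import islice

CHECK_INTERVAL = 1 << 16  # 65536, the "65k" figure mentioned in the thread


def count_items(iterable, check_signals, interval=CHECK_INTERVAL):
    """Count the items of `iterable`, checking for signals once per batch.

    `check_signals` is a hypothetical stand-in for PyErr_CheckSignals();
    it may raise (for example KeyboardInterrupt) to abort the loop.
    """
    it = iter(iterable)
    total = 0
    while True:
        # Inner loop: consume up to `interval` items with no check at all.
        batch = 0
        for _ in islice(it, interval):
            batch += 1
        total += batch
        if batch < interval:
            # The iterator ran out before the batch was full, so we are done.
            return total
        # Outer loop: one (comparatively expensive) check per full batch.
        check_signals()
```

The consumer pays for the check once per batch regardless of which iterator
it is fed, and if the producer did its own check too, the cost would simply
be paid twice per batch.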

––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +