On Thu, Oct 19, 2017 at 3:42 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 19 October 2017 at 08:34, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Nick Coghlan wrote:

since breaking up the current single level loops as nested loops would be a pre-requisite for allowing these APIs to check for signals while they're running while keeping the per-iteration overhead low

Is there really much overhead? Isn't it just checking a flag?

It's checking an atomically updated flag, so it forces CPU cache synchronisation, which means you don't want to be doing it on every iteration of a low level loop.

Even just that it's a C function call makes me not want to recommend doing it in a lot of tight loops. Who knows what the function does anyway, let alone what it might or might not do in the future. 

However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals() already encapsulates the "Should this thread even be checking for signals in the first place?" logic, which means the code change to make the itertools iterators inherently interruptible with Ctrl-C is much smaller than I thought it would be.

​And if it didn't encapsulate that, you would probably have written a wrapper that does.​ Good thing it's the wrapper that's exposed in the API. 

That approach is also clearly safe from an exception handling point of view, since all consumer loops already need to cope with the fact that itr.__next__() may raise arbitrary exceptions (including KeyboardInterrupt).

So that change alone already offers a notable improvement, and combining it with a __length_hint__() implementation that keeps container constructors from even starting to iterate would go even further towards making the infinite iterators more user friendly.

Similar signal checking changes to the consumer loops would also be possible, but I don't think that's an either/or decision: changing the iterators means they'll be interruptible for any consumer, while changing the consumers would make them interruptible for any iterator, and having checks in both the producer & the consumer merely means that you'll be checking for signals twice every 65k iterations, rather than once.

​Indeed it's not strictly an either/or decision, but more about where we might spend time executing C code. But I'm leaning a bit towards doing it on the consumer side, because there it's more obvious that ​the code might take some time to run.  

If the consumer ends up iterating over pure-Python objects, there are no concerns about the overhead. But if it *does* call a C-implemented __next__, then that's the case where we actully need the whole thing. Adding the check in both places would double the (small) overhead. And nested (wrapped) iterators are also a thing.

​––Koos

+ Koos Zevenhoven + http://twitter.com/k7hoven +