Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.
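To make the "no locks inside the pure world" idea concrete, here is a minimal sketch. It assumes a hypothetical single-threaded `Loop` with a `call_soon` method and a hypothetical `LoopFuture`; neither comes from the post or the standard library. The loop runs callbacks strictly one at a time, and the Future schedules its callbacks through the loop instead of calling them inline, so its state can be plain attributes with no threading lock.

```python
import collections


class Loop:
    """Minimal single-threaded event loop (illustrative only)."""

    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, fn, *args):
        # Only code already running on this loop calls this, so no lock is needed.
        self._ready.append((fn, args))

    def run(self):
        # Callbacks run one after another; two of them never run in parallel.
        while self._ready:
            fn, args = self._ready.popleft()
            fn(*args)


class LoopFuture:
    """A Future that relies on the loop's guarantees instead of a threading lock."""

    def __init__(self, loop):
        self._loop = loop
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def result(self):
        if not self._done:
            raise RuntimeError("result not ready")
        return self._result

    def add_done_callback(self, fn):
        if self._done:
            # Even for an already-finished future, run the callback via the loop.
            self._loop.call_soon(fn, self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        if self._done:
            raise RuntimeError("result already set")
        self._result = value
        self._done = True
        # Schedule callbacks on the loop rather than invoking them inline.
        for fn in self._callbacks:
            self._loop.call_soon(fn, self)
        self._callbacks.clear()


loop = Loop()
fut = LoopFuture(loop)
log = []
fut.add_done_callback(lambda f: log.append(f.result()))
loop.call_soon(fut.set_result, 42)
loop.run()
print(log)  # [42]
```

The whole thing is only safe as long as nothing outside the loop touches `_ready` or the Future, which is exactly the "purity" restriction discussed next.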
Unfortunately, a "pure" system means that no async operation can ever have an OS-provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from (waited on, blocked on, yielded to, etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.
I'm curious what the Twisted folks have to say about this. Or the folks using gevent.
So am I, but my guess would be that as long as you stay within their 'world' everything is fine (I haven't seen any Twisted code to make me believe otherwise, but I'm happy to accept examples - I have no experience with it directly, though I believe I've used similar concepts before). This is fine for a library or framework, but I don't think it's appropriate for a standard library - maybe this is where our views differ?
I think your world view is colored by Windows; that's fine, we need input from experienced Windows users. But I can certainly imagine other ways of dealing with this.
Coloured by threads is probably more accurate, but then again, throwing threads around wildly is definitely a Windows thing :). I also have a background in microcontrollers, including writing my own pre-emptive and cooperative schedulers that worked with external devices, so I'm trying to draw on that as much as my Windows experience.
For example, in CPython, at least, a callback that is called directly by the OS cannot call straight into Python anyway -- you have to acquire the GIL first. In practice this means that an unconstrained callback directly from the OS has to put something into a queue, and the bytecode interpreter will eventually call it (possibly in another thread). This is how signal handlers are invoked too.
I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on holding the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).
That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code
If interrupt-scheduled means what I think it means, this can only be C code. For the Python callback, see above.
I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.
queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.
I don't actually believe we should avoid all locks. I do believe that there should be a separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" threading world can hand off data to each other. It is probably unavoidable that the implementation of this mechanism touches a threading lock. But this does not mean that the rest of the "pure" world should need to use a Future class that touches threading locks.
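A rough sketch of such a hand-off mechanism follows. `Loop`, `call_from_thread` and `run_until` are hypothetical names used only for illustration, though asyncio later provided a comparable entry point as `loop.call_soon_threadsafe()`. The only synchronized structure is a `queue.SimpleQueue` that other threads write into. Everything they hand over is moved into the loop's private deque and run on the loop's own thread, so the continuation itself needs no locks.

```python
import collections
import queue
import threading


class Loop:
    """Single-threaded loop with one lock-protected entry point for other threads."""

    def __init__(self):
        self._ready = collections.deque()       # touched only by the loop thread
        self._incoming = queue.SimpleQueue()     # the only synchronized structure

    def call_soon(self, fn, *args):
        # For code already running on the loop thread.
        self._ready.append((fn, args))

    def call_from_thread(self, fn, *args):
        # Safe from any thread; SimpleQueue does the locking internally.
        self._incoming.put((fn, args))

    def run_until(self, predicate, timeout=5.0):
        while not predicate():
            if not self._ready:
                # Nothing local to do: block until another thread hands work over.
                try:
                    self._ready.append(self._incoming.get(timeout=timeout))
                except queue.Empty:
                    raise TimeoutError("no work was handed to the loop") from None
            # Pull in anything else that has been handed over, without blocking.
            while True:
                try:
                    self._ready.append(self._incoming.get_nowait())
                except queue.Empty:
                    break
            fn, args = self._ready.popleft()
            fn(*args)


loop = Loop()
loop_thread = threading.current_thread()
seen = {}


def continuation(value):
    # Runs on the loop thread, so it may touch loop-only state without locks.
    seen["value"] = value
    seen["thread"] = threading.current_thread()


def worker():
    # Simulates an OS or threadpool completion arriving on a foreign thread.
    loop.call_from_thread(continuation, 6 * 7)


threading.Thread(target=worker).start()
loop.run_until(lambda: "value" in seen)
print(seen["value"], seen["thread"] is loop_thread)  # 42 True
```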
We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.
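A small sketch of "the Future class is a property of the scheduler". The proposed `concurrent.eventloop` module was never part of the standard library, so `get_current()`, `set_current()`, `LocalFuture` and the two loop classes below are hypothetical stand-ins. Only `concurrent.futures.Future` is a real class. User code calls `get_current().Future()` and gets whichever implementation the current loop chose.

```python
import concurrent.futures
import threading


class LocalFuture:
    """Hypothetical non-thread-safe future: plain attributes, no lock."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def result(self):
        if not self._done:
            raise RuntimeError("result not ready")
        return self._result

    def add_done_callback(self, fn):
        # Kept minimal: callbacks run inline here rather than via a loop.
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self._result, self._done = value, True
        for fn in self._callbacks:
            fn(self)
        self._callbacks.clear()


class SingleThreadedLoop:
    # The loop, not the user, decides which Future class tasks get.
    Future = LocalFuture


class ThreadSafeLoop:
    Future = concurrent.futures.Future   # uses a lock internally


_current = threading.local()


def set_current(loop):
    _current.loop = loop


def get_current():
    # Stand-in for the proposed 'concurrent.eventloop.get_current()'.
    return _current.loop


set_current(SingleThreadedLoop())
f = get_current().Future()
f.set_result("no lock taken")
print(type(f).__name__, f.result())  # LocalFuture no lock taken

set_current(ThreadSafeLoop())
g = get_current().Future()
g.set_result("locked internally")
print(type(g).__name__, g.result())  # Future locked internally
```

The calling code is identical in both cases; only the loop's choice of class differs, which is what keeps the optimization invisible to the user.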
(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)
This sounds like a potentially fatal clash in terminology. In the way I use 'async', Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API where there is no non-deterministic switching in sight -- in fact, they all carefully fence off the part that does interact with threads. For example, the Twisted folks have argued that one of the big advantages of using Twisted's Deferred class is that while a callback is running, the state of the world remains constant (except for actions made by the callback itself, obviously).
What other term should we use to encompass this world view (which IMO is a perfectly valid abstraction for a lot of I/O-related concurrency)?
It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometimes use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
It does sound like this may be unique to Windows, or at least not shared with most of the UNIX world (UNIX ports of IOCP notwithstanding).
IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.
However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.submit(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).
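Here is a minimal sketch of how an @async-style decorator can drive a generator that yields executor futures, so that an unmodified blocking library call runs on a thread pool. It is named `async_` because `async` is a reserved keyword in Python 3.7+, and it is a hypothetical illustration, not the decorator from the proposal. `ThreadPoolExecutor.submit` and `Future.add_done_callback` are real `concurrent.futures` APIs. For brevity, this sketch resumes the generator on whichever thread completed the future (roughly the `callback_context=None` case). The design described above would instead schedule each continuation back onto the caller's loop.

```python
import concurrent.futures
import functools


def async_(gen_func):
    """Drive a generator that yields concurrent.futures.Future objects.

    Returns a Future for the generator's return value. Assumes every yielded
    value is a concurrent.futures.Future.
    """
    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        outer = concurrent.futures.Future()
        gen = gen_func(*args, **kwargs)

        def step(value=None, exc=None):
            try:
                fut = gen.throw(exc) if exc is not None else gen.send(value)
            except StopIteration as stop:
                outer.set_result(stop.value)
                return
            except Exception as e:
                outer.set_exception(e)
                return
            # Resume the generator once the yielded future completes.
            fut.add_done_callback(on_done)

        def on_done(fut):
            # Runs on the thread that completed 'fut' (see the note above).
            exc = fut.exception()
            if exc is not None:
                step(exc=exc)
            else:
                step(fut.result())

        step()
        return outer
    return wrapper


def slow_square(x):
    # Stands in for a long library call that knows nothing about coroutines.
    return x * x


executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)


@async_
def compute(n):
    a = yield executor.submit(slow_square, n)
    b = yield executor.submit(slow_square, a)
    return a + b


answer = compute(3).result(timeout=5)
print(answer)  # 90
executor.shutdown()
```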
I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add a primitive operation to the YF scheduler that says "run this in a threadpool and wake me up when it produces a result". The public API for that primitive can certainly use YF itself -- the messy interface with threads can be completely hidden from view. IMO any YF scheduler worth using for real work must provide such a primitive (it was one of the first things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
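A rough sketch of such a primitive in a toy yield-from scheduler follows. `Scheduler`, `run_in_threadpool` and `_RunInPool` are hypothetical names, not from the post or any prototype; asyncio later shipped a comparable primitive as `loop.run_in_executor()`. Tasks only ever write `yield from run_in_threadpool(fn, *args)`. Pool threads merely enqueue their finished future, and the scheduler thread resumes the task, so the thread handling stays hidden from task code.

```python
import collections
import concurrent.futures
import queue
import time


class _RunInPool:
    """Request a task yields to ask the scheduler for a threadpool call."""

    def __init__(self, fn, args):
        self.fn, self.args = fn, args


def run_in_threadpool(fn, *args):
    # The public API is itself a generator, so callers just use 'yield from'.
    result = yield _RunInPool(fn, args)
    return result


class Scheduler:
    def __init__(self, workers=4):
        self._ready = collections.deque()          # (task, future-or-None), loop only
        self._completions = queue.SimpleQueue()    # filled by pool threads
        self._pool = concurrent.futures.ThreadPoolExecutor(workers)
        self._pending = 0

    def spawn(self, gen):
        self._ready.append((gen, None))

    def run(self):
        while self._ready or self._pending:
            if not self._ready:
                # Block until a pool thread reports a finished call.
                self._ready.append(self._completions.get())
                self._pending -= 1
            task, fut = self._ready.popleft()
            try:
                if fut is None:
                    request = task.send(None)
                elif fut.exception() is not None:
                    request = task.throw(fut.exception())
                else:
                    request = task.send(fut.result())
            except StopIteration:
                continue   # task finished; other exceptions propagate from run()
            if isinstance(request, _RunInPool):
                self._pending += 1
                pool_fut = self._pool.submit(request.fn, *request.args)
                # The pool thread only enqueues; the task resumes on this thread.
                pool_fut.add_done_callback(
                    lambda f, t=task: self._completions.put((t, f)))
            else:
                self._ready.append((task, None))   # a bare 'yield' just reschedules
        self._pool.shutdown()


def blocking_call(x):
    time.sleep(0.01)   # stands in for e.g. socket.getaddrinfo()
    return x * 10


results = []


def task(n):
    value = yield from run_in_threadpool(blocking_call, n)
    results.append(value)


sched = Scheduler()
for n in range(3):
    sched.spawn(task(n))
sched.run()
print(sorted(results))  # [0, 10, 20]
```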
Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these). Cheers, Steve