On Mon, Oct 22, 2012 at 12:18 PM, Steve Dower <Steve.Dower@microsoft.com> wrote: [Quoting me]
For example, in CPython, at least, a callback that is called directly by the OS cannot call straight into Python anyway -- you have to acquire the GIL first. This pretty much means that an unconstrained callback directly from the OS cannot call straight into Python -- it has to put something into a queue, and the bytecode interpreter will eventually call it (possibly in another thread). This is how signal handlers are invoked too.
I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).
Ok, forget the GIL (though PyPy has one). Anyway, the existing mechanism I was referring to does *not* guarantee that the callback keeps the GIL as long as it runs. The GIL is used to emulate preemptive scheduling while still protecting CPython's internal data structures from concurrent access. It makes no guarantees for user data. Even "x = d[key]" may release the GIL if the dict contains keys whose __eq__ is implemented in Python. But the crucial point of the mechanism is that you don't call straight into Python from the OS-level callback (which is written in C or some other low-level language). You arrange for the interpreter to call the Python-level callback at some later time. So you might as well use this to enforce single-threading, if that's the way of your world.
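To illustrate the "x = d[key]" remark, here is a minimal sketch. The Key class and its call counter are made up for this example; the point is only that a dict lookup can run Python-level code, and any point where Python bytecode runs is a point where the interpreter may switch threads.

```python
class Key:
    """Hypothetical key type whose hash and equality are written in Python."""

    calls = 0  # counts how often __eq__ ran

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        # Python bytecode runs here, in the middle of the dict lookup.
        Key.calls += 1
        return isinstance(other, Key) and self.value == other.value


d = {Key("spam"): 1}
probe = Key("spam")   # equal to the stored key, but a different object
x = d[probe]          # same hash, different identity: the dict must call __eq__
```

After the lookup, Key.calls is no longer zero, so the innocent-looking subscript executed user code.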
That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code
If interrupt-scheduled means what I think it means, this can only be C code. For the Python callback, see above.
I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.
At least in CPython, it's not just the GIL. The queue I'm talking about above must exist even in a CPython version that has no threading support (and hence no GIL). You still cannot call into Python from a signal handler or other callback called directly by the OS kernel. You must delay it until the bytecode interpreter is at a good stopping point. Check out this code: http://hg.python.org/cpython/file/daad150b4670/Python/ceval.c#l496 (AddPendingCall and friends).
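The shape of that mechanism can be sketched in pure Python. This is only a simulation under stated assumptions: the names os_level_callback and run_pending are invented here, and the real code lives in C in ceval.c. The handler records work and returns; the "interpreter" runs the recorded calls later, at a point of its own choosing.

```python
import collections

# Work recorded by low-level handlers. The collections docs describe
# deque appends and pops as thread-safe, so no extra lock is used here.
pending = collections.deque()


def os_level_callback(func, arg):
    """Stand-in for the C-level handler: it never runs func itself."""
    pending.append((func, arg))


def run_pending():
    """Called by the 'interpreter' when it reaches a safe stopping point."""
    while pending:
        func, arg = pending.popleft()
        func(arg)


log = []
os_level_callback(log.append, "signal 2")
os_level_callback(log.append, "io done")
assert log == []   # nothing has run yet; the calls are only queued
run_pending()      # now the Python-level callbacks run, in order
```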
queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.
I don't actually believe we should avoid all locks. I do believe that there should be a separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" threading world can hand off data to each other. It is probably unavoidable that the implementation of this mechanism touches a threading lock. But this does not mean that the rest of the "pure" world should need to use a Future class that touches threading locks.
We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.
Yes, this sounds fine. I note that the existing APIs already encourage leaving the creation of the Future to library code -- you don't construct a Future, typically, but call an executor's submit() method.
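A minimal sketch of that idea follows. SimpleFuture, SingleThreadedLoop, ThreadAwareLoop and start_operation are hypothetical names invented for this example (there is no concurrent.eventloop module); only concurrent.futures.Future and ThreadPoolExecutor.submit() are existing APIs. Library code asks the loop for a future, so a single-threaded loop can hand out one that never touches a lock.

```python
import concurrent.futures


class SimpleFuture:
    """Hypothetical lock-free future: only safe inside one loop thread."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def add_done_callback(self, fn):
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, result):
        self._result = result
        self._done = True
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(self)

    def result(self):
        if not self._done:
            raise RuntimeError("result is not ready yet")
        return self._result


class SingleThreadedLoop:
    Future = SimpleFuture               # no threads, so no locks needed


class ThreadAwareLoop:
    Future = concurrent.futures.Future  # existing thread-safe class


def start_operation(loop, value):
    """Library code: it never names a concrete Future class."""
    future = loop.Future()
    future.set_result(value)  # a real operation would complete later
    return future


plain = start_operation(SingleThreadedLoop(), 1)
locked = start_operation(ThreadAwareLoop(), 2)

# The existing API follows the same pattern: submit() creates the Future.
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    submitted = pool.submit(pow, 2, 10)
    value = submitted.result()
```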
(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)
This sounds like a potentially fatal clash in terminology. In the way I use 'async', Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API where there is no non-deterministic switching in sight -- in fact, they all carefully fence off the part that does interact with threads. For example, the Twisted folks have argued that one of the big advantages of using Twisted's Deferred class is that while a callback is running, the state of the world remains constant (except for actions made by the callback itself, obviously).
What other term should we use to encompass this world view (which IMO is a perfectly valid abstraction for a lot of I/O-related concurrency)?
It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometimes use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)
Hm. I guess you are talking about the low-level (or should I say OS-kernel-called) callback; most event frameworks for Python (except perhaps gevent?) use user-level callbacks extensively -- in fact that's where Twisted wants you to do all the work. So, again a clash of terminology... (Aside: please don't use 'continuation' for 'task'. The use of this term in Scheme has forever tainted the word for me.)
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
It does sound like this may be unique to Windows, or at least not shared with most of the UNIX world (UNIX ports of IOCP notwithstanding).
IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.
So maybe IOCP is not all that relevant. Very early on in this discussion, IOCP was brought up as an important example of a system for async I/O that had a significantly *different* API than the typical select/poll/etc.-based systems found on UNIX platforms. But its relevance may well decompose into a few separable concerns:

- Don't assume everything is a file descriptor.
- On some systems, the natural way to do async I/O is *not* to wait until the socket (or other event source) is ready, but to ask it to perform a specific operation in "overlapping" (or async) mode, and you will get an event back when it is done.
- Event queues are powerful.
- You cannot ignore threads everywhere.
However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).
I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add a primitive operation to the YF scheduler that says "run this in a threadpool and wake me up when it produces a result". The public API for that primitive can certainly use YF itself -- the messy interface with threads can be completely hidden from view. IMO any YF scheduler worth using for real work must provide such a primitive (it was one of the first things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
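As a rough sketch of such a primitive (not the prototype mentioned above): InThread, in_thread and run_all are names invented for this example, and it assumes Python 3.3+ for yield from with return values. The only place that touches threading machinery is the queue inside the scheduler; the tasks themselves just use yield from.

```python
import collections
import concurrent.futures
import queue


class InThread:
    """Request a task yields to ask for func(*args) in a worker thread."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def in_thread(func, *args):
    """The public primitive: used as `result = yield from in_thread(...)`."""
    result = yield InThread(func, *args)
    return result


def run_all(tasks, pool):
    """Run generator-based tasks to completion; return {name: result}."""
    finished = queue.Queue()  # the hand-off point between threads and the loop
    runnable = collections.deque((name, gen, None) for name, gen in tasks.items())
    waiting = 0
    results = {}
    while runnable or waiting:
        if not runnable:
            # Every task is waiting on a thread: sleep until one completes.
            name, gen, future = finished.get()
            waiting -= 1
            # result() re-raises here if the worker function failed.
            runnable.append((name, gen, future.result()))
            continue
        name, gen, value = runnable.popleft()
        try:
            request = gen.send(value)
        except StopIteration as stop:
            results[name] = stop.value
            continue
        future = pool.submit(request.func, *request.args)
        # The done-callback may run in a worker thread; it only enqueues.
        future.add_done_callback(
            lambda f, name=name, gen=gen: finished.put((name, gen, f)))
        waiting += 1
    return results


def task(x):
    squared = yield from in_thread(pow, x, 2)       # blocking call, off-loop
    total = yield from in_thread(sum, [squared, 1])
    return total


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = run_all({"a": task(3), "b": task(4)}, pool)
```

A call such as socket.getaddrinfo() could be passed to in_thread() in the same way as pow and sum are here.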
Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these).
But @async must be imported from *somewhere*, and that's where the decisions are made on how the scheduler works. If you want to use a different scheduler you still have to import a different @async. (TBH I don't understand your with_options() thing. If that's how you propose switching scheduler implementations, there's still a default behavior that you'd have to change on a per-call basis.) And about threads and futures: I am making a principled stance that you shouldn't have to use threads, and you shouldn't have to use a future implementation that's tied to threads. But maybe we should hear from some Twisted folks...

--
--Guido van Rossum (python.org/~guido)