Re: [Python-ideas] The async API of the future: Reactors
On Oct 14, 2012, at 9:22 PM, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum wrote:
At the high level (Python), basically what you need is for queue.get() to handle:
1) Python objects (as today)
2) timeout (as today, maybe in milliseconds instead of seconds)
3) Network (socket input/state change)
4) File descriptor input/state change
5) Other I/O changes like serial comm, etc.
6) Maybe also yield-based coroutine support?
This requires support from the underlying OS, support which is probably not there today?
As far as I can see, having this one extended queue.get() would nicely address all high-level concurrency issues in Python.
[...]
I believe a "super" queue.get() would solve all use cases.
I have no idea how difficult it would be to implement in a cross-platform manner.
Hm. I know that a common (and often right!) recommendation for thread communication is to use the queue module. But that module is meant to work with threads. I think that the correct I/O primitives are more likely to come by looking at what Tornado and Twisted have done than by trying to "pimp up" the queue module -- it's good for what it does, but trying to add all that new functionality to it doesn't sound like a good fit.
You are probably right about the queue class. Maybe it should be a new class, but I still believe it would be an excellent fit for doing concurrent stuff if Python had a multiplexing message queue. Python is high-level enough to be able to hide thread/select/read etc.
A while ago I implemented pyworks (bitbucket.org/raindog/pyworks), which is a kind of Erlang implementation for Python, making objects concurrent and return values Futures, without adding much new code. Method calls are sent asynchronously, simply by doing a standard obj.method(). obj is a proxy for the real object, sending method() as a message to the real object running in a separate thread. The return value is a Future. So you can do
val = obj.method() … continue async with method() … and do some other stuff, until: print val
which will block waiting for the Future to complete, if it has not completed yet.
It has been used in a couple of projects, making it much easier to build concurrent systems. But it would be great if the object/task could wait for more kinds of events than queue.get() supports.
br
/Rene
-- --Guido van Rossum (python.org/~guido)
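The proxy-and-Future pattern Rene describes can be sketched roughly as follows. This is not the pyworks code, which is not shown in the thread; `ActorProxy`, `Counter`, and the `stop()` method are hypothetical names invented for illustration, and the sketch uses `concurrent.futures.Future` from the standard library and waits for the result explicitly with `result()`, where pyworks is described as making that wait implicit.

```python
import queue
import threading
from concurrent.futures import Future


class ActorProxy:
    """Hypothetical proxy: forwards method calls to a target object that
    runs on its own thread, and returns a Future for each call."""

    _STOP = object()

    def __init__(self, target):
        self._target = target
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Messages are handled one at a time, so the target needs no locks.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            future, name, args, kwargs = message
            try:
                future.set_result(getattr(self._target, name)(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself,
        # i.e. the methods of the target object.
        def send(*args, **kwargs):
            future = Future()
            self._mailbox.put((future, name, args, kwargs))
            return future
        return send

    def stop(self):
        self._mailbox.put(self._STOP)
        self._thread.join()


class Counter:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


counter = ActorProxy(Counter())
val = counter.add(5)   # returns immediately with a Future
# ... do other work here ...
print(val.result())    # explicit wait for the Future
counter.stop()
```

Because every call goes through the mailbox, calls on one proxy are processed in the order they were sent.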
On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum wrote:
On Oct 14, 2012, at 9:22 PM, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum wrote:
At the high level (Python), basically what you need is for queue.get() to handle:
1) Python objects (as today)
2) timeout (as today, maybe in milliseconds instead of seconds)
3) Network (socket input/state change)
4) File descriptor input/state change
5) Other I/O changes like serial comm, etc.
6) Maybe also yield-based coroutine support?
This requires support from the underlying OS, support which is probably not there today?
As far as I can see, having this one extended queue.get() would nicely address all high-level concurrency issues in Python.
[...]
I believe a "super" queue.get() would solve all use cases.
I have no idea how difficult it would be to implement in a cross-platform manner.
Hm. I know that a common (and often right!) recommendation for thread communication is to use the queue module. But that module is meant to work with threads. I think that the correct I/O primitives are more likely to come by looking at what Tornado and Twisted have done than by trying to "pimp up" the queue module -- it's good for what it does, but trying to add all that new functionality to it doesn't sound like a good fit.
You are probably right about the queue class. Maybe it should be a new class, but I still believe it would be an excellent fit for doing concurrent stuff if Python had a multiplexing message queue. Python is high-level enough to be able to hide thread/select/read etc.
I believe that the Twisted and Tornado event loops have APIs to push work into a thread and/or process, and it will be a requirement for the new stdlib event loop. However, the main focus of the current effort is not making the distinction between processes, threads and tasks (or microthreads or coroutines) disappear -- it is simply to have the most useful API for tasks.
A while ago I implemented pyworks (bitbucket.org/raindog/pyworks), which is a kind of Erlang implementation for Python, making objects concurrent and return values Futures, without adding much new code. Method calls are sent asynchronously, simply by doing a standard obj.method(). obj is a proxy for the real object, sending method() as a message to the real object running in a separate thread. The return value is a Future. So you can do
val = obj.method() … continue async with method() … and do some other stuff, until: print val
which will block waiting for the Future to complete, if it has not completed yet.
That sounds like implicit futures (to use the Wikipedia article's terminology). I'm not a big fan of that. In fact, I'm proposing an API where all task switching is explicit, using the yield keyword (or yield from), and accessing the value of a future is also explicit in such a system.
It has been used in a couple of projects, making it much easier to build concurrent systems. But it would be great if the object/task could wait for more kinds of events than queue.get() supports.
I still think you're focused more on concurrent CPU activity than async I/O. These are quite different fields, even though they often use similar terminology (like future, task/thread/process, concurrent/parallel, spawn/join, queue). I think the keyword that most distinguishes them is "event". If you hear people talk about events they are probably multiplexing I/O, not CPU activities.
-- --Guido van Rossum (python.org/~guido)
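To make the contrast with implicit futures concrete, here is a minimal sketch of the explicit style Guido describes, where a task can only be suspended at a visible `yield from` and the value of a future is obtained through that same expression. The `Future` and `Loop` classes below are toy, single-threaded stand-ins written for this illustration; they are assumptions, not the API that was being designed in this thread.

```python
from collections import deque


class Future:
    """Toy single-threaded future (hypothetical, for illustration only)."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def add_done_callback(self, fn):
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self._done = True
        self._result = value
        for fn in self._callbacks:
            fn(self)
        self._callbacks.clear()

    def __iter__(self):
        # `yield from future` lands here: suspend only if not yet resolved.
        if not self._done:
            yield self
        return self._result


class Loop:
    """Toy scheduler: runs generator-based tasks until none are runnable."""

    def __init__(self):
        self._ready = deque()

    def spawn(self, task):
        self._ready.append(task)

    def run(self):
        while self._ready:
            task = self._ready.popleft()
            try:
                future = next(task)
            except StopIteration:
                continue
            # Park the task until the future it yielded is resolved.
            future.add_done_callback(lambda _f, t=task: self._ready.append(t))


log = []


def consumer(future):
    log.append("consumer: waiting")
    value = yield from future   # explicit switch point and explicit access
    log.append(f"consumer: got {value}")


def producer(future):
    log.append("producer: resolving")
    future.set_result(42)
    yield from ()               # no switch points; only makes this a generator


loop = Loop()
shared = Future()
loop.spawn(consumer(shared))
loop.spawn(producer(shared))
loop.run()
print(log)
```

Reading `consumer` top to bottom shows exactly where another task may run, which is the property that implicit futures give up.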
On Oct 15, 2012, at 12:05 AM, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum wrote:
On Oct 14, 2012, at 9:22 PM, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum wrote:
At the high level (Python), basically what you need is for queue.get() to handle:
1) Python objects (as today)
2) timeout (as today, maybe in milliseconds instead of seconds)
3) Network (socket input/state change)
4) File descriptor input/state change
5) Other I/O changes like serial comm, etc.
6) Maybe also yield-based coroutine support?
This requires support from the underlying OS, support which is probably not there today?
As far as I can see, having this one extended queue.get() would nicely address all high-level concurrency issues in Python.
[...]
I believe a "super" queue.get() would solve all use cases.
I have no idea how difficult it would be to implement in a cross-platform manner.
Hm. I know that a common (and often right!) recommendation for thread communication is to use the queue module. But that module is meant to work with threads. I think that the correct I/O primitives are more likely to come by looking at what Tornado and Twisted have done than by trying to "pimp up" the queue module -- it's good for what it does, but trying to add all that new functionality to it doesn't sound like a good fit.
You are probably right about the queue class. Maybe it should be a new class, but I still believe it would be an excellent fit for doing concurrent stuff if Python had a multiplexing message queue. Python is high-level enough to be able to hide thread/select/read etc.
I believe that the Twisted and Tornado event loops have APIs to push work into a thread and/or process, and it will be a requirement for the new stdlib event loop. However, the main focus of the current effort is not making the distinction between processes, threads and tasks (or microthreads or coroutines) disappear -- it is simply to have the most useful API for tasks.
A while ago I implemented pyworks (bitbucket.org/raindog/pyworks), which is a kind of Erlang implementation for Python, making objects concurrent and return values Futures, without adding much new code. Method calls are sent asynchronously, simply by doing a standard obj.method(). obj is a proxy for the real object, sending method() as a message to the real object running in a separate thread. The return value is a Future. So you can do
val = obj.method() … continue async with method() … and do some other stuff, until: print val
which will block waiting for the Future to complete, if it has not completed yet.
That sounds like implicit futures (to use the Wikipedia article's terminology). I'm not a big fan of that. In fact, I'm proposing an API where all task switching is explicit, using the yield keyword (or yield from), and accessing the value of a future is also explicit in such a system.
You are right, it's implicit. And I think I understand your concern: how much should be hidden/implicit and how much should be left to the programmer. IMHO Python is such an excellent tool mainly because it hides a lot of details. Things like memory management, GC, threads and concurrency should be (and, I believe, can be) hidden from the developer.
It has been used in a couple of projects, making it much easier to build concurrent systems. But it would be great if the object/task could wait for more kinds of events than queue.get() supports.
I still think you're focused more on concurrent CPU activity than async I/O. These are quite different fields, even though they often use similar terminology (like future, task/thread/process, concurrent/parallel, spawn/join, queue). I think the keyword that most distinguishes them is "event". If you hear people talk about events they are probably multiplexing I/O, not CPU activities.
Yes and no. My field of concurrency and I/O is process control, like controlling high-speed sorting machines with a lot of I/O from 24V inputs, scanners, scales, OCR, serial ports, etc. So for me it's a combination of concurrent I/O, state and parallelism (concurrent CPU). When you have an async (I/O) event, you need some kind of concurrency to handle it at the next level. It is difficult to do concurrent CPU activity without events, even if they are only signal events on a semaphore. One difference from, e.g., web servers is that we know at design time exactly how many tasks we need and what the maximum load is going to be: typically between 50 and 100 tasks/threads sending messages to each other.
br
/Rene
-- --Guido van Rossum (python.org/~guido)
On Sun, Oct 14, 2012 at 4:08 PM, Rene Nejsum wrote:
On Oct 15, 2012, at 12:05 AM, Guido van Rossum wrote:
[...]
That sounds like implicit futures (to use the Wikipedia article's terminology). I'm not a big fan of that. In fact, I'm proposing an API where all task switching is explicit, using the yield keyword (or yield from), and accessing the value of a future is also explicit in such a system.
You are right, it's implicit. And I think I understand your concern: how much should be hidden/implicit and how much should be left to the programmer. IMHO Python is such an excellent tool mainly because it hides a lot of details. Things like memory management, GC, threads and concurrency should be (and, I believe, can be) hidden from the developer.
I don't think you can hide threads or concurrency. You can offer different APIs to work with them that have different advantages and disadvantages, but I don't think you can *hide* them any more than you can hide language constructs like classes or sequences.
I still think you're focused more on concurrent CPU activity than async I/O. These are quite different fields, even though they often use similar terminology (like future, task/thread/process, concurrent/parallel, spawn/join, queue). I think the keyword that most distinguishes them is "event". If you hear people talk about events they are probably multiplexing I/O, not CPU activities.
Yes and no. My field of concurrency and I/O is process control, like controlling high-speed sorting machines with a lot of I/O from 24V inputs, scanners, scales, OCR, serial ports, etc. So for me it's a combination of concurrent I/O, state and parallelism (concurrent CPU). When you have an async (I/O) event, you need some kind of concurrency to handle it at the next level. It is difficult to do concurrent CPU activity without events, even if they are only signal events on a semaphore.
Can you do it with threads? Because if threads serve your purpose, they are probably easier to use than the async API we're considering here, especially given your desire to hide unnecessary details. The async APIs under consideration (Twisted, Tornado, coroutines) all intentionally make task switching explicit. You may also consider greenlets/gevent, which is a compromise that makes task-switching semi-explicit -- only certain calls cause task switches, but those calls may be hidden inside other calls (or even overloaded operations like __getattr__).
One difference from, e.g., web servers is that we know at design time exactly how many tasks we need and what the maximum load is going to be: typically between 50 and 100 tasks/threads sending messages to each other.
That does sound like threads are just fine for you. Of course you may have to craft your own synchronization primitives out of the lower-level locks and queues offered by the stdlib...
-- --Guido van Rossum (python.org/~guido)
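One way to approximate the "super" queue.get() from the start of the thread using only threads and stdlib primitives is to let a small helper thread turn socket input into ordinary queue messages, so that a worker blocks on a single `get()` for plain objects, socket data, and timeouts. The sketch below is an assumption-laden illustration: `forward_socket_events`, `worker`, the `(kind, payload)` message shape, and the sample payloads are all made up here, and it uses a local `socket.socketpair()` in place of real hardware or network input.

```python
import queue
import selectors
import socket
import threading


def forward_socket_events(sock, mailbox):
    """Hypothetical helper: post incoming socket data to a queue."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while True:
            for _key, _mask in sel.select():
                data = sock.recv(4096)
                if not data:          # peer closed the connection
                    return
                mailbox.put(("socket", data))


def worker(mailbox, results):
    # A single blocking get() covers objects, socket input and timeouts.
    while True:
        try:
            kind, payload = mailbox.get(timeout=1.0)
        except queue.Empty:
            results.append(("timeout", None))
            continue
        if kind == "stop":
            break
        results.append((kind, payload))


mailbox = queue.Queue()
results = []
left, right = socket.socketpair()

forwarder = threading.Thread(target=forward_socket_events, args=(right, mailbox))
consumer = threading.Thread(target=worker, args=(mailbox, results))
forwarder.start()
consumer.start()

mailbox.put(("object", {"weight_g": 120}))   # a plain Python object
left.sendall(b"scan:4711")                   # input arriving on a socket
left.close()                                 # EOF ends the forwarder thread
forwarder.join()
right.close()
mailbox.put(("stop", None))
consumer.join()
print(results)
```

With a task count fixed at design time, as in the process-control case described above, one forwarder per input source plus one mailbox per task keeps the design simple, at the cost of an extra thread per source.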
On Mon, Oct 15, 2012 at 1:26 AM, Guido van Rossum wrote:
I don't think you can hide threads or concurrency. You can offer different APIs to work with them that have different advantages and disadvantages, but I don't think you can *hide* them any more than you can hide language constructs like classes or sequences.
+1. Nice APIs to put padding on the sharp edges, yes. Hiding them? IMHO, usually a mistake.
lvh
participants (3)
- Guido van Rossum
- Laurens Van Houtven
- Rene Nejsum