[Python-ideas] The async API of the future: Reactors

Mon Oct 15 00:05:26 CEST 2012

On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum <rene at stranden.com> wrote:
>
> On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido at python.org> wrote:
>
>> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene at stranden.com> wrote:
>>> On the high level (Python) basically what you need is that the queue.get()
>>> can handle:
>>> 1) Python objects (as today)
>>> 2) timeout (as today, maybe in milliseconds instead of seconds)
>>> 3) Network (socket input/state change)
>>> 4) File desc input/state change
>>> 5) Other I/O changes like serial comm, etc.
>>> 6) Maybe also yield based coroutine support ?
>>>
>>> This requires support from the underlying
>>> OS, support which is probably not there today?
>>>
>>> As far as I can see, having this one extended queue.get() would nicely enable
>>> all high level concurrency issues in Python.
>>
>> [...]
>>
>>> I believe a "super" queue.get() would solve all use cases.
>>>
>>> I have no idea on how difficult it would be to implement in
>>> a cross platform manner.
>>
>> Hm. I know that a common (and often right!) recommendation for thread
>> communication is to use the queue module. But that module is meant to
>> work with threads. I think that the correct I/O primitives are more
>> likely to come by looking at what Tornado and Twisted have done than
>> by trying to "pimp up" the queue module -- it's good for what it does,
>> but trying to add all that new functionality to it doesn't sound like
>> a good fit.
>
> You are probably right about the queue class. Maybe it should be a new class,
> but I still believe it would be an excellent fit for doing concurrent stuff if Python
> had a multiplexing message queue. Python is high-level enough to be able to
> hide thread/select/read etc.

I believe that the Twisted and Tornado event loops have APIs to push
work into a thread and/or process, and it will be a requirement for
the new stdlib event loop. However, the main focus of the current
effort is not making the distinction between processes, threads and
tasks (or microthreads or coroutines) disappear -- it is simply to
have the most useful API for tasks.
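
To make "push work into a thread" concrete, here is a minimal sketch.
It uses asyncio, the stdlib event loop that eventually grew out of
this effort and did not exist when this message was written, so treat
it as an illustration rather than the API under discussion here.
blocking_work and the "worker" thread name prefix are hypothetical
names chosen for the example.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_work(n):
    # Hypothetical stand-in for a blocking call (disk I/O, a C library, ...).
    # It also reports which thread it ran on.
    return n * n, threading.current_thread().name


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        # Hand the blocking call to the pool. The event loop stays free to
        # run other tasks until the result comes back.
        return await loop.run_in_executor(pool, blocking_work, 7)


result, thread_name = asyncio.run(main())
print(result, thread_name)
```

The task still sees an ordinary awaitable, so the thread is an
implementation detail of that one call and not a different
programming model.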

> A while ago I implemented pyworks (bitbucket.org/raindog/pyworks), which
> is a kind of Erlang implementation for Python, making objects concurrent and return
> values Futures, without adding much new code. Methods are sent asynchronously, simply
> by doing a standard obj.method(). obj is a proxy for the real object; it sends method() as a
> message to the real object running in a separate thread. The return value is a Future. So
> you can do
>
>         val = obj.method()
>         … continue async with method()
>         … and do some other stuff, until:
>         print val
>
> which will block waiting for the Future to complete, if it has not completed yet.

That sounds like implicit futures (to use the Wikipedia article's
terminology). I'm not a big fan of that. In fact, I'm proposing an API
where all task switching is explicit, using the yield keyword (or
yield from), and accessing the value of a future is also explicit in
such a system.
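
A rough sketch of what "explicit" means here, using only plain
generators: every point where a task may be suspended is marked by
yield or yield from, and the value of a future is obtained by an
explicit yield from. The Future class, the round-robin run()
scheduler, and the task names are all hypothetical and much simpler
than anything a real event loop would use (this scheduler polls
instead of waiting for events).

```python
class Future:
    """Hypothetical minimal future: holds a result once one is set."""

    def __init__(self):
        self.done = False
        self.result = None

    def set_result(self, value):
        self.result = value
        self.done = True

    def wait(self):
        # Explicit switch point: give control back to the scheduler
        # until some other task has set the result.
        while not self.done:
            yield
        return self.result


def producer(fut, log):
    log.append("producer: working")
    yield  # explicit task switch
    fut.set_result(42)
    log.append("producer: done")


def consumer(fut, log):
    log.append("consumer: waiting")
    value = yield from fut.wait()  # explicit access to the future's value
    log.append("consumer: got %d" % value)


def run(tasks):
    # Naive round-robin scheduler: resume each generator in turn until
    # all of them have finished.
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)


log = []
fut = Future()
run([consumer(fut, log), producer(fut, log)])
print(log)
```

Nothing in consumer() can be suspended except at the yield from line,
which is the property an implicit future gives up: there, any use of
the value (such as printing it) may block.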

> It has been used in a couple of projects, making it much easier to build concurrent systems.
> But it would be great if the object/task could wait for more events than queue.get()

I still think you're focused more on concurrent CPU activity than
async I/O. These are quite different fields, even though they often
use similar terminology (like future, task/thread/process,
concurrent/parallel, spawn/join, queue). I think the keyword that most
distinguishes them is "event". If you hear people talk about events
they are probably multiplexing I/O, not CPU activities.
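
As an illustration of "events" in the I/O multiplexing sense, the
sketch below registers two sockets with a selector and asks which of
them are ready to read. It uses the stdlib selectors module, which was
added to Python after this message; the socket pairs and the "first"
and "second" labels are made up for the example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# Two connected socket pairs stand in for two network connections.
a, b = socket.socketpair()
c, d = socket.socketpair()

# Ask to be told when either receiving end becomes readable.
sel.register(a, selectors.EVENT_READ, data="first")
sel.register(c, selectors.EVENT_READ, data="second")

d.sendall(b"ping")  # only the second connection has data pending

# One call waits on both sockets and reports only the ready ones.
events = sel.select(timeout=1.0)
ready = [(key.data, key.fileobj.recv(16)) for key, mask in events]
print(ready)

sel.close()
for s in (a, b, c, d):
    s.close()
```

A single thread waits on many I/O sources at once and reacts to
whichever becomes ready; no CPU work is being parallelized.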

-- 
--Guido van Rossum (python.org/~guido)