[Python-ideas] The async API of the future: Reactors

Mon Oct 15 01:08:42 CEST 2012

On Oct 15, 2012, at 12:05 AM, Guido van Rossum <guido at python.org> wrote:

> On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum <rene at stranden.com> wrote:
>> 
>> On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido at python.org> wrote:
>> 
>>> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene at stranden.com> wrote:
>>>> On the high level (Python) basically what you need is that the queue.get()
>>>> can handle:
>>>> 1) Python objects (as today)
>>>> 2) timeout (as today, maybe in milliseconds instead of seconds)
>>>> 3) Network (socket input/state change)
>>>> 4) File desc input/state change
>>>> 5) Other I/O changes like serial comm, etc.
>>>> 6) Maybe also yield based coroutine support ?
>>>> 
>>>> This requires support from the underlying
>>>> OS. A support which is probably not there today ?
>>>> 
>>>> As far as I can see, having this one extended queue.get() would nicely enable
>>>> all high level concurrency issues in Python.
>>> 
>>> [...]
>>> 
>>>> I believe a "super" queue.get() would solve all use cases.
>>>> 
>>>> I have no idea on how difficult it would be to implement in
>>>> a cross platform manner.
>>> 
>>> Hm. I know that a common (and often right!) recommendation for thread
>>> communication is to use the queue module. But that module is meant to
>>> work with threads. I think that the correct I/O primitives are more
>>> likely to come by looking at what Tornado and Twisted have done than
>>> by trying to "pimp up" the queue module -- it's good for what it does,
>>> but trying to add all that new functionality to it doesn't sound like
>>> a good fit.
>> 
>> You are probably right about the queue class. Maybe it should be a new class,
>> but I still believe it would be an excellent fit for doing concurrent stuff if Python
>> had a multiplexer message queue. Python is high-level enough to be able to
>> hide thread/select/read etc.
> 
> I believe that the Twisted and Tornado event loops have APIs to push
> work into a thread and/or process, and it will be a requirement for
> the new stdlib event loop. However the main focus of the current
> effort is not making the distinction between process, threads and
> tasks (or microthreads or coroutines) disappear -- it is simply to
> have the most useful API for tasks.
> 
>> A while ago I implemented pyworks (bitbucket.org/raindog/pyworks) which
>> is a kind of Erlang implementation for Python, making objects concurrent and return
>> values Futures, without adding much new code. Methods are sent asynchronously, simply
>> by doing standard obj.method(). obj is a proxy for the real object, sending method() as a
>> message to the real object running in a separate thread. The return value is a Future. So
>> you can do
>> 
>>        val = obj.method()
>>        … continue async with method()
>>        … and do some other stuff, until:
>>        print val
>> 
>> which will block waiting for the Future to complete, if it has not already.
> 
> That sounds like implicit futures (to use the Wikipedia article's
> terminology). I'm not a big fan of that. In fact, I'm proposing an API
> where all task switching is explicit, using the yield keyword (or
> yield from), and accessing the value of a future is also explicit in
> such a system.

You are right, it's implicit. And I think I understand your concern: how
much should be hidden/implicit and how much should be left to the
programmer. IMHO Python is such an excellent tool mainly
because it hides a lot of details. Things like memory management, GC,
threads and concurrency should be (and, I believe, can be) hidden from
the developer.
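
To make the implicit/explicit distinction concrete, here is a minimal
sketch in Python 3 of the proxy idea described above. It is not the pyworks
code; ActorProxy, ImplicitFuture and Scale are hypothetical names invented
for this illustration, and a single-worker ThreadPoolExecutor stands in
for "the real object running in a separate thread".

```python
from concurrent.futures import ThreadPoolExecutor


class ImplicitFuture:
    """Hypothetical wrapper: using the value blocks until it is ready."""

    def __init__(self, future):
        self._future = future

    def result(self):
        # Explicit access: the caller can see that this may block.
        return self._future.result()

    def __str__(self):
        # Implicit access: print(val) silently waits for the result.
        return str(self._future.result())


class ActorProxy:
    """Hypothetical proxy: each method call becomes a message to one thread."""

    def __init__(self, target):
        self._target = target
        # One worker thread, so calls on the target run one at a time.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __getattr__(self, name):
        method = getattr(self._target, name)

        def send(*args, **kwargs):
            # Returns immediately; the call runs in the worker thread.
            return ImplicitFuture(self._executor.submit(method, *args, **kwargs))

        return send

    def close(self):
        self._executor.shutdown(wait=True)


class Scale:
    """Stand-in for a real device object (an assumption for the example)."""

    def weigh(self, item):
        return len(item) * 10


scale = ActorProxy(Scale())
val = scale.weigh("parcel")   # looks like a normal call, returns at once
# ... do some other stuff here ...
print(val)                    # implicit: blocks here if not yet complete
explicit = scale.weigh("box").result()   # explicit: the wait is visible
scale.close()
```

The last two statements show the trade-off under discussion: with print(val)
the reader cannot tell that the line may block, while .result() makes the
wait visible at the call site.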

>> It has been used in a couple of projects, making it much easier to do concurrent systems.
>> But, it would be great if the object/task could wait for more events than queue.get()
> 
> I still think you're focused more on concurrent CPU activity than
> async I/O. These are quite different fields, even though they often
> use similar terminology (like future, task/thread/process,
> concurrent/parallel, spawn/join, queue). I think the keyword that most
> distinguishes them is "event". If you hear people talk about events
> they are probably multiplexing I/O, not CPU activities.

Yes and no. My field of concurrency and IO is process control, like
controlling high speed sorting machines with a lot of IO from 24V inputs,
scanners, scales, OCR, serial ports, etc. So for me it's a combination of concurrent IO,
state and parallelism (concurrent CPU). When you have an async (I/O) event,
you need some kind of concurrency to handle it at the next level.
It is difficult to do concurrent CPU activity without events, even if
they are only signal events on a semaphore.
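
As an illustration of a get() that waits for more than queued objects, here
is a rough Python 3 sketch of one way it could be built. It is an
assumption-laden toy, not a proposed API: MultiplexQueue and its methods are
hypothetical names, it assumes a single consumer thread, and it uses the
stdlib selectors module plus a socket pair to wake up select() whenever a
message is put.

```python
import queue
import selectors
import socket


class MultiplexQueue:
    """Hypothetical 'super queue': get() waits on messages, sockets, timeout."""

    def __init__(self):
        self._messages = queue.Queue()
        self._selector = selectors.DefaultSelector()
        # Wakeup channel: put() writes one byte so that select() returns.
        self._wake_r, self._wake_w = socket.socketpair()
        self._wake_r.setblocking(False)
        self._selector.register(self._wake_r, selectors.EVENT_READ, "wakeup")

    def watch(self, sock):
        """Also report when this socket has input available."""
        self._selector.register(sock, selectors.EVENT_READ, "socket")

    def put(self, obj):
        # Enqueue first, then signal, so the message is there on wakeup.
        self._messages.put(obj)
        self._wake_w.send(b"\0")

    def get(self, timeout=None):
        """Return ("message", obj), ("socket", sock) or ("timeout", None)."""
        for key, _events in self._selector.select(timeout):
            if key.data == "wakeup":
                self._wake_r.recv(1)   # consume one wakeup byte per message
                return "message", self._messages.get_nowait()
            return "socket", key.fileobj
        return "timeout", None


mq = MultiplexQueue()
a, b = socket.socketpair()    # stands in for a network or serial connection
mq.watch(a)

mq.put("start sorting")
kind, value = mq.get(timeout=1.0)    # a Python object, as with queue.get()

b.send(b"scan")
kind2, sock = mq.get(timeout=1.0)    # input arrived on the watched socket
data = sock.recv(4)

kind3, _ = mq.get(timeout=0.05)      # nothing pending, so the wait times out
```

A task built on this would loop on get() and dispatch on the first element
of the returned tuple, which is roughly what an event loop does.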

One difference from e.g. web servers is that we know, at design time,
exactly how many tasks we need and what the maximum load is going
to be. Typically between 50 and 100 tasks/threads sending messages to
each other.

br
/Rene

> 
> -- 
> --Guido van Rossum (python.org/~guido)