[Python-ideas] Async API

Guido van Rossum guido at python.org
Thu Oct 25 19:58:08 CEST 2012


On Thu, Oct 25, 2012 at 9:10 AM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> One question: what do we actually want to get?  What're the goals?

Good question. I'm still in the requirements gathering phase myself.

> - A specification (PEP?) of how to make stdlib more async-friendly?

That's one of the hopeful goals, but a lot of things need to be
decided before we can start adapting the stdlib. It is also likely
that this will be a process that will take several releases (and may
never finish completely).

> - To develop a separate library that may be included in the stdlib
> one day?

That's one avenue I am pursuing, and I hope others will too.

> - And what's your opinion on writing a PEP about making it possible
> to pass a custom socket-factory to stdlib objects?

That sounds like it might be jumping to a specific solution. I agree
that the stdlib often, unfortunately, couples classes too tightly,
where a class that needs an instance of another class just
instantiates that other class rather than having an instance passed in
(at least as an option). We're doing better with files these days --
most APIs (that I can think of) that work with streams let you pass
one in. So maybe you're on to something. Perhaps, as a step towards
the exploration of this PEP, you could come up with a concrete list of
modules and classes (or other API elements) that you think would
benefit from being able to pass in a socket? Please start another
thread -- python-ideas is fine. I will read it.

> I'm (and I think it's not just me) a bit lost here, after reading 100s
> of emails on python-ideas.  And I just want to know where to channel my
> energy and expertise ;)

Totally understood. I'm overwhelmed myself by the vast array of
options. Still, I have been writing some experimental code myself, and
I am beginning to understand in which direction I'd like to move.

I am thinking of having a strict separation between an event loop, a
task scheduler, specific transports, and protocol implementations.

- The event loop in turn separates into a component that knows how to
poll for I/O (or other) events using the best mechanism available on
the platform, and a part that manages callback functions -- these are
closely tied together, but the idea is that the callback management
part does not have to vary by platform, so only the I/O polling needs
to be platform-specific. Details subject to bikeshedding (I've only
got something working on Linux and OSX so far). One of the
requirements for this event loop is that it should be possible to run
frameworks like Twisted or Tornado using an adapter to it, and it
should also be possible for Twisted/Tornado/etc. to provide their own
event loop (again via some kind of adaptation) to replace the default
one.
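The split described above can be sketched roughly as follows. This is a minimal illustration, not any settled interface: the class and method names (SelectPollster, EventLoop, call_soon, run_once) are assumptions for the sake of the example. The pollster is the platform-specific half (a select()-based one here; epoll or kqueue variants would implement the same two methods), and the event loop is the platform-independent callback manager:

```python
import select
from collections import deque

class SelectPollster:
    """Platform-specific half: polls file descriptors for readability.
    select() works everywhere; epoll/kqueue pollsters would expose the
    same interface and only this class would change per platform."""
    def __init__(self):
        self.readers = {}  # fd -> callback to run when fd is readable

    def add_reader(self, fd, callback):
        self.readers[fd] = callback

    def poll(self, timeout):
        if not self.readers:
            return []
        ready, _, _ = select.select(list(self.readers), [], [], timeout)
        return [self.readers[fd] for fd in ready]

class EventLoop:
    """Platform-independent half: manages the queue of callbacks,
    feeding it from whatever pollster it was given."""
    def __init__(self, pollster=None):
        self.pollster = pollster or SelectPollster()
        self.ready = deque()  # callbacks runnable right now

    def call_soon(self, callback, *args):
        self.ready.append((callback, args))

    def run_once(self, timeout=0):
        for cb in self.pollster.poll(timeout):
            self.ready.append((cb, ()))
        while self.ready:
            callback, args = self.ready.popleft()
            callback(*args)
```

An adapter for Twisted or Tornado would then either wrap their reactor/ioloop behind the pollster interface, or replace EventLoop wholesale while preserving call_soon-style scheduling.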

- For the task scheduler I am piling all my hopes on PEP-380, i.e.
yield from. I have not found a single thing that is harder to do using
this style than using the PEP-342 yield <future> style, and I really
don't like mixing the two up (despite what Steve Dower says :-). But I
don't want the event loop interface to know about this at all --
however the scheduler has to know about the event loop (at least its
interface). I am currently refactoring my ideas in this area; I think
I'll end up with a Task object that smells a bit like a Future, but
represents a whole stack of generator invocations linked via
yield-from, and which allows suspension of the entire stack at once;
user code only needs to use Tasks when it wants to schedule multiple
activities concurrently, not when it just wants to be able to yield.
(This may be the core insight in favor of PEP 380.)
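The "whole stack suspends at once" point can be shown with a toy scheduler. Again a sketch under stated assumptions (the names Task, step, and run_until_complete are illustrative, not a proposed API): a bare yield suspends the entire chain of generators linked via yield from, and only the outermost generator needs to be wrapped in a Task:

```python
class Task:
    """Wraps a stack of generators linked via 'yield from'. A bare
    'yield' anywhere in the stack suspends the whole stack; the
    scheduler resumes it at the same point on the next pass."""
    def __init__(self, gen):
        self.gen = gen
        self.result = None
        self.done = False

    def step(self):
        try:
            next(self.gen)  # run until the next bare yield
        except StopIteration as exc:
            self.result = exc.value  # the generator's return value
            self.done = True

def run_until_complete(tasks):
    """Trivial round-robin scheduler: step each live task in turn."""
    tasks = list(tasks)
    while any(not t.done for t in tasks):
        for t in tasks:
            if not t.done:
                t.step()
    return [t.result for t in tasks]

def add_slowly(a, b):
    yield               # stand-in for waiting on I/O; suspends the stack
    return a + b

def caller(a, b):
    total = yield from add_slowly(a, b)  # plain delegation, no Task needed
    return total * 2
```

Note that caller never touches the scheduler: it just uses yield from, and add_slowly's suspension propagates up through it automatically. Only code that wants to run several activities concurrently creates Tasks.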

- Transports (e.g. TCP): I feel like a newbie here. I know sockets
pretty well, but the key is to introduce abstractions that let you
easily replace a transport with a different one -- e.g. TCP vs. pipes
vs. SSL. Twisted clearly has paved the way here -- even if we end up
slicing the abstractions somewhat differently, the road to the optimal
interface has to follow the one Twisted took -- implement a
simple transport using sockets, then add another transport, refactor
the abstractions to share the commonalities and separate the
differences, then try adding yet another transport, rinse and repeat.
We should provide a bunch of common transports but also let people
build new ones; however, there will probably be way fewer transport
implementations than protocol implementations.
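One way to cut that abstraction, purely as a sketch (the interface and its method names are assumptions here, not a proposal): an abstract base with the handful of operations every transport shares, plus an in-memory implementation, which is also handy for testing protocols without touching real sockets:

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Hypothetical minimal transport interface; a TCP, pipe, or SSL
    transport would each implement these same methods, and abstract
    subclasses could add capabilities (e.g. half-close for TCP)."""
    @abstractmethod
    def write(self, data):
        """Queue bytes for sending."""

    @abstractmethod
    def close(self):
        """Flush and close the connection."""

class MemoryTransport(Transport):
    """In-memory transport: records writes in a buffer instead of
    sending them anywhere. Useful for exercising protocol code."""
    def __init__(self):
        self.buffer = bytearray()
        self.closed = False

    def write(self, data):
        if self.closed:
            raise RuntimeError("write after close")
        self.buffer.extend(data)

    def close(self):
        self.closed = True
```

Swapping MemoryTransport for a socket-backed implementation should require no change to the protocol code sitting on top, which is the whole point of the exercise.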

- Protocols (e.g. HTTP): A protocol should ideally be able to work
with any transport (though obviously some protocols require certain
transport extensions -- hopefully we'll have a small hierarchy of
abstract classes defining different transport styles and
capabilities). We should provide a bunch of common protocols (e.g. a
good HTTP client and server) but this is where users will most often
be writing their own -- so the APIs used by protocol implementations
must be documented especially well, the standard protocol
implementations must be examples of excellent coding style, and the
transport implementations should not let protocol implementations get
away with undefined behavior. It would be useful to have explicit
testing support too -- just like there's a WSGI validator, we could
have a protocol validator that acts like a particularly picky
transport. (I found this idea in a library written by Jim Fulton for
Zope, I think it's zope.ngi. It's a valuable idea.)
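The validator idea can be made concrete with a small sketch. Everything here is hypothetical naming (EchoProtocol, ValidatingTransport, data_received are made up for the example): a toy protocol, and a picky transport that fails loudly on misuse -- writing non-bytes, writing after close -- that a lenient transport might silently tolerate:

```python
class EchoProtocol:
    """Toy protocol: echoes received data back over its transport.
    The method names are illustrative, not a settled interface."""
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)

class ValidatingTransport:
    """A 'particularly picky' transport, in the spirit of a WSGI
    validator: it enforces the transport contract so that protocol
    bugs surface in tests rather than in production."""
    def __init__(self):
        self.written = []
        self.closed = False

    def write(self, data):
        if self.closed:
            raise AssertionError("protocol wrote after closing the transport")
        if not isinstance(data, (bytes, bytearray)):
            raise AssertionError(
                "protocol must write bytes, got %r" % type(data))
        self.written.append(bytes(data))

    def close(self):
        if self.closed:
            raise AssertionError("transport closed twice")
        self.closed = True
```

Running a protocol's test suite against such a validator would catch undefined behavior before the protocol is ever paired with a real transport.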

I think it's inevitable that the choice of using PEP-380 will be
reflected in the abstract classes defining transports and protocols.
Hopefully we will be able to bridge between the PEP-380 world and
Twisted's world of Deferred somehow -- the event loop is one interface
layer, but I think we can build adapters for the other levels as well
(at least for transports).

One final thought: async WSGI anyone?

-- 
--Guido van Rossum (python.org/~guido)


