[Python-ideas] PEP 3156/Tulip: Extensible EventLoop interface
Guido van Rossum
guido at python.org
Mon Feb 4 20:02:57 CET 2013
I'm going to try and snip as much as I can to get to the heart of this...
On Sun, Feb 3, 2013 at 11:55 AM, Ben Darnell <ben at bendarnell.com> wrote:
> UDP is a real-life example from tornado - we don't have any built-in
> support for UDP, but people who need it have been able to build it without
> touching tornado itself. The same argument would apply to pipes or any
> number of other (admittedly much more esoteric) network protocols. I'll
> elaborate on the UDP example below.
Hm. UDP is relatively easy because it uses sockets. Pipes are harder
-- testing for the presence of add_reader (etc. -- I will leave this
off from now on) isn't enough, because select on Windows does not
support pipes.
Thinking about what you could mean by "more esoteric protocols",
there's really not much at the level of TCP and UDP that comes to
mind. UNIX-domain sockets, and perhaps the (root-only) protocol for
sniffing packets (raw sockets?).
A new feature just landed in Tulip (I still have to update PEP 3156)
where you can pass a pre-constructed socket object to
create_connection() and start_serving(), which will make it a little
easier to support esoteric ways of setting up the socket; however,
create_connection() is still limited to sockets that implement a byte
stream, because of the way the transport/protocol API works.
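A minimal sketch of the "pre-constructed socket" idea, written against today's asyncio (the standard-library descendant of Tulip) rather than the 2013 Tulip code, so details may differ from what had just landed. The names CollectingProtocol and read_from_prebuilt_socket are made up for illustration; socket.socketpair() stands in for whatever esoteric setup produced the socket.

```python
import asyncio
import socket


class CollectingProtocol(asyncio.Protocol):
    """Stream protocol that accumulates every chunk given to data_received()."""

    def __init__(self, done):
        self.received = b""
        self.done = done

    def data_received(self, data):
        self.received += data

    def connection_lost(self, exc):
        # Called after the peer closes; hand the collected bytes to the waiter.
        if not self.done.done():
            self.done.set_result(self.received)


async def read_from_prebuilt_socket(payload):
    """Hypothetical helper: wrap an existing socket in a stream transport."""
    loop = asyncio.get_running_loop()
    # The socket is created outside the event loop, by any means we like.
    ours, theirs = socket.socketpair()
    done = loop.create_future()
    transport, _protocol = await loop.create_connection(
        lambda: CollectingProtocol(done), sock=ours)
    theirs.sendall(payload)
    theirs.close()
    result = await done
    transport.close()
    return result
```

The socket still has to be a byte-stream socket; a datagram socket is rejected here, which is the limitation described above.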
> Right. Third-party extensions to the event loop interface are inherently
> problematic, so we'll have to provide them in some other way. I'm proposing
> a pattern for that "some other way" and then realizing that I like it even
> for first-party interfaces.
Glad that is out of the way. But I'm still skeptical -- first, as I
explained before, I am actually in favor of using different styles for
1st and 3rd party interfaces, so the status of the interface used is
obvious to the reader (and the coder, in case they are copy-pasting
recipes :-); second, I don't expect there will be too many
opportunities to put the pattern to work.
Note that I'm only talking about 3rd party *interfaces* -- if a 3rd
party module implements a 1st party interface (e.g. the stream
transport/protocol interface specified in PEP 3156) it can just use
the event loop create_connection method (assuming it is also
implementing a new event loop -- otherwise what would be the point of
the 3rd party code?).
And note that even UDP requires a different interface between
transport and protocol -- e.g. the protocol method called should be
packet_received() rather than data_received(), and the protocol should
probably not implement write/writelines but send and send_multiple.
And the signatures of these methods will be different because
(depending on how you use UDP) you have to have a parameter for the
peer address.
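To make the difference in interface shape concrete, here is a small sketch using the method names floated above (packet_received, send). None of these classes exist in Tulip or PEP 3156; the "Recording" transports are in-memory stand-ins invented for illustration.

```python
class EchoStreamProtocol:
    """Stream style: bytes arrive via data_received(), with no peer address."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)


class EchoPacketProtocol:
    """Packet style: each datagram arrives whole, together with its sender."""

    def connection_made(self, transport):
        self.transport = transport

    def packet_received(self, data, addr):
        self.transport.send(data, addr)


class RecordingStreamTransport:
    """Stand-in stream transport: consecutive writes land in one buffer."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data


class RecordingPacketTransport:
    """Stand-in packet transport: every send stays a separate (data, addr) pair."""

    def __init__(self):
        self.packets = []

    def send(self, data, addr):
        self.packets.append((bytes(data), addr))
```

The two protocol classes cannot be swapped for each other: the method names differ, and the packet variant needs the extra addr argument on both the receiving and the sending side.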
And yet, implementing UDP as pure 3rd party code using add_reader is
simple, as long as the event loop supports add_reader. You just can't
use create_connection or start_serving -- but those are really just
convenience methods that are easily reimplemented. (We could refactor
the standard implementations to have more reusable parts, but we'd run
into the same problem as with add_reader -- while most UNIXy event
loops will easily support such refactorings, that's not the case with
event loops based on IOCP, or other libraries that don't naturally
offer add_reader functionality. (Not sure if that's the case for
libuv.))
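A rough sketch of what "UDP as pure 3rd party code using add_reader" could look like, written against today's asyncio rather than 2013 Tulip. listen_udp and demo are hypothetical names, and the callback signature (data, addr) is an assumption. On event loops without add_reader (for example asyncio's IOCP-based proactor loop on Windows) the add_reader call raises NotImplementedError, which is exactly the portability limit under discussion.

```python
import asyncio
import socket


def listen_udp(loop, local_addr, packet_received):
    """Hypothetical helper: call packet_received(data, addr) per datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sock.bind(local_addr)

    def on_readable():
        # The receive call is made here, by the transport code,
        # not by the event loop; the loop only reports readiness.
        try:
            data, addr = sock.recvfrom(65536)
        except (BlockingIOError, InterruptedError):
            return
        packet_received(data, addr)

    loop.add_reader(sock.fileno(), on_readable)
    return sock


async def demo():
    """Send one datagram to ourselves over loopback and return its payload."""
    loop = asyncio.get_running_loop()
    got = loop.create_future()

    def on_packet(data, addr):
        if not got.done():
            got.set_result(data)

    server = listen_udp(loop, ("127.0.0.1", 0), on_packet)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        client.sendto(b"ping", server.getsockname())
        return await asyncio.wait_for(got, timeout=5)
    finally:
        loop.remove_reader(server.fileno())
        server.close()
        client.close()
```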
All this makes me skeptical that a single API should be used to
register "transports". At the very least you will need different
registries for each distinct transport/protocol interface; in
addition, custom transports (even if they implement the same
transport/protocol interface) may have different constructor arguments
(e.g. consider plain TCP vs. SSL in Tulip).
> Suppose twisted did not have UDP support built in. Most reactor
> implementations subclass PosixReactorBase (with IOCPReactor as the notable
> exception). Twisted can add UDP support and implement listenUDP in
> PosixReactorBase and IOCPReactor, and suddenly most reactors (even
> third-party ones like TornadoReactor) support UDP for free.
If you say so. I don't know enough about Twisted's internals to verify
this claim. Depending on how things were factored I could easily
imagine something in PosixReactorBase making the assumption of a
stream protocol somewhere. In a stream protocol like TCP, it is safe
to collapse two consecutive sends into one, and to split one send into
multiples. But not for datagram protocols like UDP. In an ideal world,
knowledge of all this is completely left out of the reactor. But, in a
hypothetical world where Twisted only supported streams, who knows
whether that is done?
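The stream-versus-datagram distinction can be observed with plain sockets, independent of any reactor or event loop. This is only an illustrative sketch: the function names are made up, and it relies on loopback UDP delivering two tiny datagrams in order, which is usual but not guaranteed by UDP itself.

```python
import socket


def send_twice_over_stream():
    """Send b"ab" then b"cd" over a byte stream; return the chunks received."""
    left, right = socket.socketpair()  # stream sockets by default
    try:
        left.sendall(b"ab")
        left.sendall(b"cd")
        left.close()  # signal end-of-stream so the read loop terminates
        chunks = []
        while True:
            chunk = right.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
        # The chunk boundaries are unspecified: the two sends may arrive
        # merged or split. Only the concatenation is meaningful.
        return chunks
    finally:
        left.close()
        right.close()


def send_twice_over_udp():
    """Send b"ab" then b"cd" as datagrams; return the payloads received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)
        sender.sendto(b"ab", receiver.getsockname())
        sender.sendto(b"cd", receiver.getsockname())
        # Each recvfrom() returns exactly one datagram; boundaries are kept.
        return [receiver.recvfrom(1024)[0] for _ in range(2)]
    finally:
        sender.close()
        receiver.close()
```

Code that buffers and coalesces writes is therefore harmless for the first function's traffic and wrong for the second's.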
> Those that
> don't (a hypothetical LibUVReactor?) can implement it themselves and
> interoperate with everything else.
In practice I suspect that the number of 3rd party event loop
implementations that support add_reader and let a different 3rd
party's UDP implementation succeed will be vanishingly small. Even
smaller if you don't count the ones that are essentially clones or
subclasses of Tulip's UNIX support.
> If a third party wanted to add UDP support separately from twisted's release
> schedule, they can't do so with an interface that is generically usable across
> all reactors. They could make a static function listenUDP() that works with
> any IReactorFDSet, and maybe special-case IOCPReactor, but then there'd be
> no way for a third-party LibUVReactor to participate.
This sounds unavoidable no matter how you refactor the interface.
There are potentially event loop implementations that don't use socket
objects at all. (And yes, those will have to reject the 'sock'
argument to create_connection and start_serving; and I have to change
start_serving's return type to be something other than a socket.) When
you implement a new 3rd party transport, you are pretty much
inevitably limiting yourself to a subset of event loops. That subset
won't be empty, and it will be sufficient for your purpose, but the
ideal of portability across all (or even most) event loops, including
ones that haven't been written yet, is unattainable. I certainly
haven't seen an indication that your proposed registry will address
this.
> add_reader is not very limiting except for its platform-specificity. It's
> possible to have a generic protocol across all posixy event loops and then
> special-case the small number of interesting non-posixy ones (or maybe there
> is some other class of methods that could be standardized for other
> platforms? Is there some set of methods analogous to add_reader that
> multiple IOCP-based loops could share?)
Not exactly analogous -- the whole point of IOCP is that it is not
"ready-based" but "completion-based". The sock_recv (etc.) methods on
the event loop are my attempt to suggest a way for other
completion-based event loops to open themselves up for new transport
implementations, but this is much more limiting than add_reader --
e.g. I suspect that IOCP will let you read from a named pipe, but you
must use a different library call than for receiving from a socket;
even receiving a packet from UDP will require a different method. This
issue doesn't exist in the same way for add_reader, because the system
call to do the read is not made by the event loop, it is made by the
transport.
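For contrast with the add_reader style, here is a small sketch of the sock_recv style using today's asyncio, where these methods ended up as loop.sock_recv() and loop.sock_sendall(). The function name recv_via_loop is invented for illustration. Note that the event loop itself performs the receive, so it has to know which kind of operation to issue; that is why this style does not extend to pipes or datagrams without further loop methods.

```python
import asyncio
import socket


async def recv_via_loop(payload):
    """Ask the event loop to do the I/O and await its completion."""
    loop = asyncio.get_running_loop()
    left, right = socket.socketpair()
    left.setblocking(False)
    right.setblocking(False)
    try:
        await loop.sock_sendall(left, payload)
        data = b""
        # A stream may deliver the payload in pieces, so keep asking
        # the loop for more until everything has arrived.
        while len(data) < len(payload):
            chunk = await loop.sock_recv(right, 1024)
            if not chunk:
                break
            data += chunk
        return data
    finally:
        left.close()
        right.close()
```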
> This version doesn't change much, it's mainly to set the stage for the
> following variations. However, it does have a few nice properties - it
> keeps the (public) event loop interface small and manageable, and callers
> don't need to touch actual event loop objects unless they want to have more
> than one.
Not quite -- the call_soon(), call_later() etc. functionality is also
exposed as event loop methods.
> From a stylistic perspective I like this style of interface more
> than using dozens of methods on the event loop object itself (even if those
> dozens of methods are still there but hidden as an implementation detail).
You can't argue about style. :-)
> When third-party modules get absorbed into the standard library, it's often
> possible to support both just by trying different imports until one works
> (unittest.mock vs mock, json vs simplejson, etc). Sometimes a module's
> interface gets cleaned up and rearranged in the process, but that seems to
> be less common. It would be nice if a third-party transport could get
> standardized and the only thing callers would need to change is their
> imports. However, this is a minor concern; as I wrote up this design I
> realized I liked it for first-party work even if there were no third-party
> modules to be consistent with.
You can't argue about style. :-)
> But libuv doesn't necessarily contain the transport creation function.
But a libuv-based PEP 3156-compliant event loop implementation must.
> The
> idea is that someone can propose a transport interface in a third-party
> module (mymodule.listen_udp in this example), implement it themselves for
> some event loop implementations, and other event loops can declare
> themselves compatible with it.
Aha! This is the executive summary of your proposal, or at least your
goal for it.
This is hypothesizing rather a lot of goodwill and coordination
between different 3rd party developers. And the registry offered by
the event loop comes down to not much more than a dictionary with keys
that follow a certain convention (e.g. fully-qualified package+module
name plus some identifier for the feature) and nothing can be said
about what the items stored in the registry are (since a packet
transport and a stream transport are not interchangeable, and even two
stream transports may not be).
Given that for each 3rd party transport the details of how to
implement a compatible version of it will vary hugely, both depending
on what the transport is trying to do and how the event loop works, I
expect that the market for this registry will be rather small. And
when a particular 3rd party transport wants to enable other 3rd party
event loops to support it, it can implement its own registry, which
the other 3rd party could then plug into. (But see below.)
> (And in an admittedly far-fetched scenario, if there were two third-party
> UDP interfaces and LibUVEventLoop implemented one of them, yet another party
> could build a bridge between the two, and then they'd plug it in with
> register_implementation)
I do see one argument in favor of having a standard registry on the
event loop, even if it's just a dict with register/lookup APIs and a
naming convention, and no semantics assigned to the items registered.
That argument is to make 3rd party transport implementers aware of the
possibility that some other 3rd party might want to offer a compatible
implementation aimed at an event loop that's not supported natively by
the (former) 3rd party transport. And I could even be convinced that
the standard protocols should use this registry so that the source
code serves as an example of best practices.
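As a sketch of how little such a registry needs to be, assuming nothing beyond "a dict with register/lookup APIs and a naming convention": the class name TransportRegistry and the method lookup_implementation are hypothetical, register_implementation borrows the name from the quoted proposal, and nothing like this is part of PEP 3156.

```python
class TransportRegistry:
    """Hypothetical registry: fully-qualified names mapped to opaque objects."""

    def __init__(self):
        self._implementations = {}

    def register_implementation(self, name, implementation):
        # Convention only: name is "package.module.feature". The registry
        # assigns no meaning to the implementation object itself.
        if name in self._implementations:
            raise ValueError("already registered: %r" % (name,))
        self._implementations[name] = implementation

    def lookup_implementation(self, name):
        try:
            return self._implementations[name]
        except KeyError:
            raise LookupError("no implementation for %r" % (name,)) from None
```

A third party would register under a key such as "mymodule.listen_udp", and callers would look it up by the same key; whether the object found there is a packet transport factory, a stream transport factory, or something else is left entirely to the two parties.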
Still, it's a pretty weak argument IMO -- I don't expect there to be a
significant cottage industry cranking out 3rd party protocol
implementations, assuming we add UDP to PEP 3156, which is my plan.
And I don't think that create_connection() should be used to create
UDP connections -- the signature of the protocol factory passed in
would be quite different, for starters, and the set of options needed
to configure the transport is also different (assuming we want to
support both connected and connection-less UDP).
> I don't think the status quo prevents the development of third-party
> transports, but it does implicitly encourage two bad habits: A) adding
> methods to the event loop, inviting name collisions, or B) just building on
> add_reader and friends without thinking about non-posix platforms. Of
> course, no one expects a groundswell of third-party development at the event
> loop and transport level, so this could just be so much overengineering, but
> I like it from a stylistic perspective even without the third-party
> benefits.
Yeah, so that's the rub: I'm not so keen on adding extra machinery to
the PEP that I don't expect to be used much. I have more important
fish to fry (such as adding UDP :-). And adding a registry one whole
Python release cycle later (e.g. in 3.5, assuming PEP 3156 is
standardized and included in 3.4) doesn't strike me as such a bad
thing -- I don't think we're painting ourselves into much of a corner
by not having a registry right from the start.
--
--Guido van Rossum (python.org/~guido)