[Python-ideas] Tulip / PEP 3156 - subprocess events

Thu Jan 17 20:10:57 CET 2013

(I'm responding to two separate messages in one response.)

On Thu, Jan 17, 2013 at 4:23 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> OK, I'm reading the PEP through now. I'm happy with the basics of the
> event loop, and it seems fine to me. When I reached create_transport,
> I had to skip ahead to the definitions of transport and protocol, as
> create_transport makes no sense if you don't know about those.

Whoops, I should fix the order in the PEP, or at least insert forward
references.

> Once
> I've read that, though, the whole transport/protocol mechanism seems
> to make reasonable sense to me. Although the host and port arguments
> to create_transport are clearly irrelevant to the case of a transport
> managing a process as a data source. So (a) I see why you say I'd need
> a new transport creation method, but (b) it strikes me that something
> more general that covered both cases (and any others that may come up
> later) would be better.

This is why there is a TBD item suggesting to rename
create_transport() to create_connection() -- this method is for
creating the most common type of transport only, i.e. one that
connects a client to a server given by host and port.

> On the other hand, given the existence of create_transport, I'm now
> struggling to understand why a user would ever use
> add_reader/add_writer rather than using a transport/protocol. And if
> they do have a reason to do so, why does a similar reason not apply to
> having an add_pipe type of method for waiting on (subprocess) pipes?

add_reader and friends exist for the benefit of Transport
implementations. The PEP even says that not all event loops need to
implement these (though on UNIXy systems it is better if they do, and
I am considering removing or weakening this language).

Because on UNIX pipes are just file descriptors, and work fine with
select()/poll()/etc., there is no need for add_pipe() (assuming that
API would take an existing pipe file descriptor and a callback), since
add_reader() will do the right thing. (Or add_writer() for the other
end.)
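
To illustrate, here is a minimal sketch of add_reader() on a plain pipe.
It assumes a UNIX system and is written against the asyncio module (the
later stdlib form of this API, so names may differ from the PEP draft).
The helper name read_from_pipe is made up for the example.

```python
import asyncio
import os


def read_from_pipe(message: bytes) -> bytes:
    """Write `message` into a pipe and read it back via add_reader().

    Assumption: a UNIX selector-based event loop, where pipe file
    descriptors work with select()/poll()/etc.
    """
    loop = asyncio.new_event_loop()
    read_fd, write_fd = os.pipe()
    received = []

    def on_readable():
        # Called by the loop once read_fd can be read without blocking.
        received.append(os.read(read_fd, 1024))
        loop.remove_reader(read_fd)
        loop.stop()

    try:
        loop.add_reader(read_fd, on_readable)
        os.write(write_fd, message)
        loop.run_forever()
    finally:
        os.close(read_fd)
        os.close(write_fd)
        loop.close()
    return received[0]
```

No pipe-specific registration call is involved: the read end is just
another file descriptor handed to add_reader().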

> In general, it still feels to me like the socket use case is being
> treated as "special", and other data sources and sinks (subprocesses
> being my use case, but I'm sure others exist) are either second-class
> or require a whole set of their own specialised methods, which isn't
> practical.

Well, sockets are treated specially because on Windows they *are*
special. At least the select() system call only works for sockets.
IOCP supports other types of unusual handles, but the ways to create
handles you can use with it are mostly custom.

Basically, if you want to write code that works both on Windows and on
UNIX, you have to limit yourself to sockets. (And you shouldn't use
add_reader and friends either, because that limits you to the
SelectSelector, whereas if you use the transport/protocol API you will
be compatible with either that or IOCPSelector.)

> As a strawman type of argument in favour of extensibility, consider a
> very specialist user with a hardware device that sends input via (say)
> a serial port. I can easily imagine that user wanting to plug his
> device data into the Python event loop. As this is a very specialised
> area, I wouldn't expect the core code to be able to help, but I would
> expect him to be able to write code that plugs into the standard event
> loop seamlessly. Ideally, I'd like to use the subprocess case as a
> proof that this is practical.
>
> Does that make sense?

Yes, it does make sense, but you have to choose whether to do it on
Windows or on UNIX. If you use UNIX, presumably your serial port is
accessible via a file descriptor that works with select/poll/etc. If
it isn't, you are going to have a really hard time integrating it
with the event loop; you may have to use a separate thread that talks
to the device and sends the data to the event loop over a pipe or
something. On Windows, I have no idea how it would work, but I presume
that serial port drivers are somehow hooked up to "handles" and
"waitable events" (or whatever the Microsoft terminology is -- I am
about to get educated about this) and then presumably it will
integrate nicely with IOCP (but not with Select).

I think that for UNIX, hooking a subprocess up to a transport should
be easy enough (except perhaps for the stdout/stderr distinction), and
your transport should use add_reader/writer. For Windows I am not sure,
but you can probably crib the details from the Windows-specific code
in subprocess.py in the stdlib.
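
A rough sketch of the UNIX side, assuming the asyncio module (the later
stdlib form of this API) and handling stdout only. Collector and
run_and_collect are hypothetical names; a real transport would also
need writing to stdin, stderr handling and error reporting.

```python
import asyncio
import os
import subprocess
import sys


class Collector:
    """Hypothetical protocol that just accumulates what it receives."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def data_received(self, data):
        self.chunks.append(data)

    def connection_lost(self, exc):
        self.closed = True


def run_and_collect(args):
    """Run `args` and feed its stdout to a protocol via add_reader() (UNIX only)."""
    loop = asyncio.new_event_loop()
    protocol = Collector()
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()

    def on_readable():
        data = os.read(fd, 4096)
        if data:
            protocol.data_received(data)
        else:
            # An empty read means the child closed its end of the pipe.
            loop.remove_reader(fd)
            protocol.connection_lost(None)
            loop.stop()

    try:
        loop.add_reader(fd, on_readable)
        loop.run_forever()
    finally:
        proc.stdout.close()
        proc.wait()
        loop.close()
    return b"".join(protocol.chunks)
```

For example, run_and_collect([sys.executable, "-c", "print('hi')"])
collects the child's output as bytes.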

On Thu, Jan 17, 2013 at 6:35 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 17 January 2013 12:23, Paul Moore <p.f.moore at gmail.com> wrote:
>> In general, it still feels to me like the socket use case is being
>> treated as "special", and other data sources and sinks (subprocesses
>> being my use case, but I'm sure others exist) are either second-class
>> or require a whole set of their own specialised methods, which isn't
>> practical.
>
> Thinking about this some more. The key point is that for any event
> loop there can only be one "source of events" in terms of the thing
> that the event loop checks when there are no pending tasks. So the
> event loop is roughly:
>
> while True:
>     process_ready_queue()
>     new_events = block_on_event_source(src, timeout=N)
>     add_to_ready_queue(new_events)
>     add_timed_events_to_ready_queue()
>
> The source has to be a unique object, as there's an OS-level wait in
> there, and you can't do two of them at once.

Right, that's the idea.

> As things stand, methods like add_reader on the event loop object
> should really be methods on the event source object (and indeed,
> that's more or less what Tulip does internally). Would it not make
> more sense to explicitly expose the event source? This is (I guess)
> what the section "Choosing an Event Loop Implementation" in the PEP is
> about. But if the event source is a user-visible object, methods like
> add_reader would no longer be optional event loop methods, but rather
> they would be methods of the event source (but only for those event
> sources for which they make sense).

The problem with this idea is (you may have guessed it by now :-) ...
Windows. On Windows, at least when using a (at this moment purely
hypothetical) IOCP-based implementation of the event loop, there will
*not* be an underlying Selector object. Please track down discussion
of IOCP in older posts on this list. IOCP requires you to use a
different paradigm, which is supported by the separate methods
sock_recv(), sock_sendall() and so on. For I/O objects that are not
sockets, different methods are needed, but the idea is the same: you
specify the I/O, and you get a callback when it is done. This is in
contrast with the UNIX selector, where you specify the file descriptor
and I/O direction, and you get a callback when you can read/write
without blocking.
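
The two paradigms can be put side by side in a small sketch. It assumes
a UNIX selector-based loop (so that add_reader() is available) and uses
the asyncio module, the later stdlib form of this API. The function
names readiness_style, completion_style and demo are invented for
illustration.

```python
import asyncio
import socket


async def readiness_style(sock: socket.socket) -> bytes:
    """Selector paradigm: get told when reading will not block, then read."""
    loop = asyncio.get_running_loop()
    result = loop.create_future()

    def on_readable():
        # Unregister first so the callback fires only once.
        loop.remove_reader(sock.fileno())
        result.set_result(sock.recv(1024))

    loop.add_reader(sock.fileno(), on_readable)
    return await result


async def completion_style(sock: socket.socket) -> bytes:
    """IOCP-like paradigm: specify the I/O, get resumed when it is done."""
    loop = asyncio.get_running_loop()
    return await loop.sock_recv(sock, 1024)


async def demo(message: bytes):
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    try:
        await loop.sock_sendall(a, message)
        first = await readiness_style(b)
        await loop.sock_sendall(a, message)
        second = await completion_style(b)
        return first, second
    finally:
        a.close()
        b.close()
```

Only completion_style() is expressed purely in terms of "do this I/O
and tell me when it is finished", which is the shape an IOCP-based loop
can support.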

This is why the event loop has the higher-level
transport/protocol-based APIs: an IOCP implementation of these creates
instances of a completely different transport implementation, which
however have the same interface and *meaning* as the UNIX transports
(e.g. the transport created by create_connection() connects to a host
and port over TCP/IP and calls the protocol's connection_made(),
data_received(), connection_lost() methods).
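
A small sketch of that callback sequence, written against the asyncio
module (the later stdlib form of this API). To stay self-contained it
assumes a local socket pair instead of a real host and port; Recorder
and record_events are hypothetical names.

```python
import asyncio
import socket


class Recorder(asyncio.Protocol):
    """Hypothetical protocol that records the callbacks it receives."""

    def __init__(self, done):
        self.events = []
        self.done = done

    def connection_made(self, transport):
        self.events.append("connection_made")

    def data_received(self, data):
        self.events.append(("data_received", data))

    def connection_lost(self, exc):
        self.events.append("connection_lost")
        if not self.done.done():
            self.done.set_result(None)


async def record_events(payload: bytes):
    loop = asyncio.get_running_loop()
    ours, peer = socket.socketpair()
    done = loop.create_future()
    # The loop picks the transport implementation; we only supply the protocol.
    transport, protocol = await loop.create_connection(
        lambda: Recorder(done), sock=ours
    )
    peer.sendall(payload)
    peer.close()  # EOF on our side leads to connection_lost()
    await done
    return protocol.events
```

The protocol never learns which transport implementation is feeding it,
which is the point of keeping the interface and its meaning identical
across platforms.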

So if you want a transport that encapsulates a subprocess (instead of
a TCP/IP connection), and you want to support both UNIX and Windows,
you have to provide (at least) two separate implementations: one on
UNIX that uses add_reader() and friends, and one on Windows that uses
(I don't know what, but something). Each of these implementations by
itself is dependent on the platform (and the specific event loop
implementation); but together they cover all supported platforms.

If you develop this as 3rd party code, and you want your users not to
have to write platform-specific code, you have to write a "start
subprocess" function that inspects the platform (and the event loop
implementation) and then imports and instantiates the right transport
implementation for the platform. If we want to add this to the PEP,
the right thing is to add a "start subprocess" method to the event
loop API (which can be identical to the start subprocess function in
your 3rd party package :-).
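
In outline, such a dispatching function could look like the sketch
below. Both transport classes are empty hypothetical placeholders, and
the selection rule (keyed on sys.platform only, ignoring the event loop
implementation) is a simplifying assumption.

```python
import sys


class UnixSubprocessTransport:
    """Hypothetical: would be built on add_reader() and friends."""


class WindowsSubprocessTransport:
    """Hypothetical: would be built on whatever the Windows loop offers."""


def pick_subprocess_transport(platform: str = sys.platform):
    """Return the transport class suited to `platform`."""
    if platform == "win32":
        return WindowsSubprocessTransport
    return UnixSubprocessTransport
```

Callers would only ever see pick_subprocess_transport() (or a "start
subprocess" wrapper around it), never the platform-specific classes.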

> The point here is that there's a lot of event loop machinery (ready
> queue, timed events, run methods) that are independent of the precise
> means by which you poll the OS to ask "has anything interesting
> happened?" Abstracting out that machinery would seem to me to make the
> design cleaner and more understandable.

It is abstracted out in the implementation, but I hope I have
explained with sufficient clarity why it should not be abstracted out
in the PEP: the Selector abstraction only works on UNIX (or with
sockets on Windows).

Also note a subtlety in the PEP: while it describes a
platform-independent API, it doesn't preclude that some parts of that
API may have platform-specific behaviors -- for example, add_reader()
may only take sockets on Windows (and in Jython, I suspect, where
select() only works with sockets), but takes other file descriptors on
UNIX, so you can implement your own subprocess transport for UNIX.
Similarly, the PEP describes the interface between transports and
protocols, but does not give you a way to construct a transport except
for TCP/IP connections. But the abstraction is usable for other
purposes too, and this is intentional! (E.g. you may be able to create
a transport that uses a subprocess running ssh to talk to a remote
server, which might be used to "tunnel" HTTP, so it would make sense
to connect this custom transport with a standard HTTP protocol
implementation.)

> Other benefits - our hypothetical person with a serial port device can
> build his own event source and plug it into the event loop directly.

I think I've answered that above.

> Or someone could offer a multiplexer that combines two separate
> sources by running them in different threads and merging the output on
> a queue (that may be YAGNI, though).

I think there are Twisted reactor implementations that do things like
this. My hope is that a proxy between the Twisted reactor and the PEP
3156 interface will enable this too -- and the event loop APIs for
working with transports and protocols are essential for this purpose.
(Twisted has a working IOCP reactor, FWIW.)

> This is really just something to think about while I'm trying to build
> a Linux development environment so that I can do a Unix proof of
> concept. Once I get started on that, I'll think about the
> protocol/transport stuff.

I think it would be tremendously helpful if you tried to implement the
UNIX version of the subprocess transport. (Note that AFAIK Twisted has
one of these too; maybe you can get some implementation ideas from
them.)

-- 
--Guido van Rossum (python.org/~guido)