[Python-ideas] Tulip / PEP 3156 - subprocess events

Glyph glyph at twistedmatrix.com
Sat Jan 19 02:23:50 CET 2013


On Jan 18, 2013, at 4:12 PM, Guido van Rossum <guido at python.org> wrote:


> Glyph should really answer this one.

Thanks for pointing it out to me; keeping up with python-ideas is always a challenge :).

> Personally I don't feel strongly
> either way for this case. There may be an advantage to not calling the
> protocol factory if the connection can't be made (in which case the
> Future returned by create_connection() has the exception).



> On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

>> Guido van Rossum wrote:
>> 
>>> Well, except that you can't just pass CallbackProtocol where a
>>> protocol factory is required by the PEP -- you'll have to pass a
>>> lambda or partial function without arguments that calls
>>> CallbackProtocol with some arguments taken from elsewhere.
>> 
>> Something smells wrong to me about APIs that require protocol
>> factories.

For starters, nothing "smells wrong" to me about protocol factories.  Responding to this kind of criticism is difficult, because it's not substantive - what's the actual problem?  I think that some Python programmers have an aversion to factories because a common path to Python is flight from Java environments that over- or mis-use the factory pattern.

>> I don't see what advantage there is in writing
>> 
>>   create_connection(HTTPProtocol, "some.where.net", 80)
>> 
>> as opposed to just writing something like
>> 
>>   HTTPProtocol(TCPTransport("some.where.net", 80))

Guido mentioned one advantage already; you don't have to create the protocol object if the connection fails, so your protocol objects are real honest-to-goodness connections, not "well, maybe there's a connection or maybe there'll be a connection later".
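
A minimal sketch of that first advantage, written against today's asyncio (the descendant of Tulip) rather than the API as it stood in this thread. `ClientGreeting`, its `created` counter, and `unused_local_port` are hypothetical names for illustration; the sketch assumes a loopback interface is available and that nothing is listening on the port it picks. `functools.partial` supplies the constructor argument, so the factory itself takes none.

```python
import asyncio
import socket
from functools import partial


class ClientGreeting(asyncio.Protocol):
    """Hypothetical protocol that counts how many instances get built."""

    created = 0

    def __init__(self, greeting):
        type(self).created += 1
        self.greeting = greeting

    def connection_made(self, transport):
        # Only reached once a real, connected transport exists.
        transport.write(self.greeting)


def unused_local_port():
    # Assumption: binding to port 0 and closing again leaves a port
    # that nothing is listening on.
    with socket.socket() as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]


async def try_connect(port):
    loop = asyncio.get_running_loop()
    factory = partial(ClientGreeting, b"HELO\r\n")  # zero-argument factory
    try:
        await loop.create_connection(factory, "127.0.0.1", port)
    except OSError as exc:
        return exc  # the failure is reported here, not to a protocol
    return None


error = asyncio.run(try_connect(unused_local_port()))
```

After the failed attempt `ClientGreeting.created` is still 0: the factory was never called, so no half-connected protocol object exists.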

To be fair, this is rarely of practical utility, but in edge cases where you are doing something like, "simultaneously try to connect to these 1000 hosts, and give up on all outstanding connections when the first 3 connections succeed", being able to avoid all the construction overhead for your protocols if they're not going to be used is nice.

There's a more pressing issue of correctness though: even if you create the protocol in advance, you really don't want to tell it about the transport until the transport truly exists.  The connection to some.where.net (by which I mean, ahem, "somewhere.example.com"; "where.net" will not thank you if you ignore BCP 32 in the documentation or examples) might fail, and if the client wants to issue a client greeting, it should not have access to its half-formed transport before that failure.  Of course, it's possible to present an API that works around this by buffering writes issued before the connection is established, and by the protocol waiting for the connection_made callback before actually doing its work.
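
To make the buffering workaround concrete, here is a toy model of it. Every class and method name below (`BufferingTransport`, `RecordingTransport`, `connection_established`, `connection_failed`) is hypothetical; none of this is Tulip or Twisted API. It only shows what a transport handed out before the connection exists would have to do.

```python
class RecordingTransport:
    """Hypothetical stand-in for a real, connected transport."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


class BufferingTransport:
    """Hypothetical transport that accepts writes before a connection exists."""

    def __init__(self):
        self._pending = []
        self._real = None

    def write(self, data):
        if self._real is None:
            self._pending.append(data)  # nothing to send on yet
        else:
            self._real.write(data)

    def connection_established(self, real):
        # Flush everything that was written too early, in order.
        self._real = real
        for chunk in self._pending:
            real.write(chunk)
        self._pending.clear()

    def connection_failed(self, exc):
        # Early writes are silently lost; return them so the loss is visible.
        dropped, self._pending = self._pending, []
        return dropped


early = BufferingTransport()
early.write(b"HELO\r\n")              # protocol greets before connecting
wire = RecordingTransport()
early.connection_established(wire)
early.write(b"NEXT\r\n")

doomed = BufferingTransport()
doomed.write(b"HELO\r\n")
dropped = doomed.connection_failed(ConnectionRefusedError())
```

In the failure case the protocol has already "sent" a greeting that never reached anyone, which is exactly the half-formed state that waiting for connection_made avoids.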

Finally, using a factory also makes client-creating and server-creating code more symmetrical, since you clearly need a protocol factory in the listening-socket case.  If your main example protocol is HTTP, this doesn't make sense*, but once you start trying to do things like SIP or XMPP, where the participants in a connection are really peers, having the structure be similar is handy.  In the implementation, it's nice to have things set up this way so that the order of the protocol<->transport symmetric setup is less important and by the time the appropriate methods are being invoked, everybody knows about everybody else.  The transport can't really have a reference to the protocol in the protocol's constructor.

*: Unless you're doing this, of course <http://wiki.secondlife.com/wiki/Reverse_HTTP>.
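
The client/server symmetry can be sketched with today's asyncio. This is an illustration under assumptions, not the PEP's API at the time of this thread: `loop.connect_accepted_socket` is a later asyncio addition, `Peer` is a hypothetical protocol, and `socket.socketpair()` stands in for a connected pair so that no listening port is needed. The same factory shape serves both ends.

```python
import asyncio
import socket
from functools import partial


class Peer(asyncio.Protocol):
    """Hypothetical peer protocol: greets on connect, records one line."""

    def __init__(self, name, done):
        self.name = name
        self.done = done          # future resolved with the peer's greeting
        self.received = b""

    def connection_made(self, transport):
        self.transport = transport
        transport.write(b"hello from " + self.name + b"\n")

    def data_received(self, data):
        self.received += data
        if self.received.endswith(b"\n") and not self.done.done():
            self.done.set_result(self.received)
            self.transport.close()


async def main():
    loop = asyncio.get_running_loop()
    accepted, dialing = socket.socketpair()  # stands in for accept()/connect()
    heard_by_listener = loop.create_future()
    heard_by_dialer = loop.create_future()

    # "Server" side: wrap a socket that is already connected.
    await loop.connect_accepted_socket(
        partial(Peer, b"listener", heard_by_listener), accepted)
    # "Client" side: the same kind of factory, handed to create_connection.
    await loop.create_connection(
        partial(Peer, b"dialer", heard_by_dialer), sock=dialing)

    return await asyncio.gather(heard_by_listener, heard_by_dialer)


results = asyncio.run(main())
```

Neither side constructs a transport; each protocol learns about its transport in connection_made, after both objects exist.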

However, aside from the factory-or-not issue, the fact that TCPTransport's name implies that it is both (1) a class and (2) the actual transport implementation is more problematic.

TCPTransport will need multiple backends for different multiplexing and I/O mechanisms.  This is why I keep bringing up IOCP; this is a major API where the transport implementation is actually quite different.  In Twisted, they're entirely different classes.  They could probably share a bit more implementation than they do and reduce a little duplication, but it's nice that they don't have to.  You don't want to burden application code with picking the right one, and it's ugly to smash the socket-implementation-selection into a class.  (create_connection really ought to be a method on an event-loop object of some kind, which produces the appropriate implementation.  I think right now it implicitly looks it up in thread-local storage for the "current" main loop, and I'd rather it were more explicit, but the idea is the same.)
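
A toy model of "the event loop picks the transport". All names here are hypothetical (`ToyLoop`, `ReadinessLoop`, `CompletionLoop` and the two transport classes); they only mirror the split between readiness-based multiplexing and IOCP-style completion. The point is that application code calls a method on an explicit loop object and never names a transport class.

```python
class ReadinessTransport:
    """Stand-in for a transport driven by select/epoll/kqueue readiness."""
    mechanism = "readiness"


class CompletionTransport:
    """Stand-in for a transport driven by IOCP completion notifications."""
    mechanism = "completion"


class ToyLoop:
    transport_class = None  # each concrete loop chooses its own

    def create_connection(self, protocol_factory):
        transport = self.transport_class()
        protocol = protocol_factory()
        protocol.connection_made(transport)
        return transport, protocol


class ReadinessLoop(ToyLoop):
    transport_class = ReadinessTransport


class CompletionLoop(ToyLoop):
    transport_class = CompletionTransport


class Recorder:
    """Application protocol: identical code regardless of backend."""

    def connection_made(self, transport):
        self.mechanism = transport.mechanism


def connect(loop):
    # The loop is passed in explicitly; no thread-local lookup.
    _, protocol = loop.create_connection(Recorder)
    return protocol.mechanism
```

`connect(ReadinessLoop())` and `connect(CompletionLoop())` run the same application code against entirely different transport classes.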

Your example is misleadingly named; surely you mean TCPClient, because a TCPTransport would implicitly support both clients and servers - and a server would start with a socket returned from accept(), not a host and port.  (Certainly not a DNS host name.)

create_connection will actually need to create multiple sockets internally.  <http://tools.ietf.org/html/rfc3493> covers this, in part (for a more condensed discussion, see <https://twistedmatrix.com/trac/ticket/4859>).
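
One reason a single call can need several sockets: a host name may resolve to both IPv6 and IPv4 addresses, and each candidate address gets its own connection attempt. The sketch below simulates that race with `asyncio.sleep` instead of real sockets. `attempt` and `first_to_connect` are hypothetical helpers, the delays and outcomes are made up, the addresses come from the documentation ranges, and all attempts start at once, which is a simplification and not necessarily what Tulip or Twisted do.

```python
import asyncio


async def attempt(address, delay, succeeds):
    """Simulated connect; a real one would open a socket to `address`."""
    await asyncio.sleep(delay)
    if not succeeds:
        raise ConnectionRefusedError(address)
    return address


async def first_to_connect(candidates):
    """Race all candidates; return the first success, cancel the rest."""
    tasks = [asyncio.ensure_future(attempt(*c)) for c in candidates]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except OSError:
                continue  # this candidate failed; wait for the next one
        raise OSError("all connection attempts failed")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


candidates = [
    ("2001:db8::1", 0.01, False),  # fails quickly
    ("192.0.2.10", 0.03, True),    # first to succeed
    ("192.0.2.11", 0.50, True),    # would succeed, but is cancelled
]
winner = asyncio.run(first_to_connect(candidates))
```

Only the winning attempt would ever need a protocol object, which is one more reason to hand the function a factory and not an instance.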

>> You're going to have to use the latter style anyway to set up
>> anything other than the very simplest configurations, e.g.
>> your earlier 4-layer protocol stack example.

I don't see how this is true.  I've written layered protocols over and over again in Twisted and never wanted to manually construct the bottom transport for that reason.*  In fact, the more elaborate multi-layered structures you have to construct when a protocol finishes connecting, the more you want to avoid being required to do it in advance of actually needing the protocols to exist.

*: I _have_ had to manually construct transports to deal with some fiddly performance-tuning issues, but those are just deficiencies in the existing transport implementation that ought to be remedied.
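
As a rough illustration of building a layered stack without touching the bottom transport, here is a toy two-layer example. `LineFraming`, `Collector`, and `build_stack` are hypothetical names, and the transport is faked with a plain `object()`. The zero-argument `build_stack` is what would be handed over as the protocol factory, so the whole stack is constructed only when a connection actually exists.

```python
class Collector:
    """Top layer: application logic that consumes whole lines."""

    def __init__(self):
        self.lines = []
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def line_received(self, line):
        self.lines.append(line)


class LineFraming:
    """Lower layer: turns a byte stream into lines for the layer above."""

    def __init__(self, upper):
        self.upper = upper
        self._buffer = b""

    def connection_made(self, transport):
        self.upper.connection_made(transport)

    def data_received(self, data):
        self._buffer += data
        # Everything before the last newline is complete; keep the tail.
        *lines, self._buffer = self._buffer.split(b"\n")
        for line in lines:
            self.upper.line_received(line)


def build_stack():
    # Zero-argument factory: assembles every layer on demand.
    return LineFraming(Collector())


stack = build_stack()
stack.connection_made(object())    # fake transport, for illustration only
stack.data_received(b"one\ntw")    # data can arrive split anywhere
stack.data_received(b"o\n")
```

After the two chunks, `stack.upper.lines` holds `b"one"` and `b"two"`; the layers never needed to know how the transport underneath was created.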

>> So create_connection() can't be anything more than a convenience
>> function, and unless I'm missing something, it hardly seems to
>> add enough convenience to be worth the bother.

*Just* implementing the multiple-parallel-connection-attempts algorithm required to deal with the IPv6 transition period would be enough convenience to be worth having a function, even if none of the other stuff I just wrote applied :).

-glyph
