[Python-ideas] PEP 3156 feedback

Guido van Rossum guido at python.org
Tue Dec 18 21:41:04 CET 2012


On Tue, Dec 18, 2012 at 11:21 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Tue, 18 Dec 2012 10:02:05 -0800 Guido van Rossum <guido at python.org> wrote:
>> > Protocols and transports
>> > ------------------------
>> >
>> > We probably want to provide a Protocol base class and encourage people
>> > to inherit it.
>>
>> Glyph suggested that too, and hinted that it does some useful stuff
>> that users otherwise forget. I'm a bit worried though that the
>> functionality of the base implementation becomes the de-facto standard
>> rather than the PEP. (Glyph mentions that the base class has a method
>> that sets self.transport and without it lots of other stuff breaks.)
>
> Well, in the I/O stack we do have base classes with useful method
> implementations too (IOBase and friends).

True. If we go that way they should be in the PEP as well.
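To make the discussion concrete, here is a minimal sketch (my own illustration, not text from the PEP or from Twisted) of the kind of Protocol base class being discussed: its connection_made() stores the transport on the instance, which is the "useful stuff that users otherwise forget" Glyph mentioned.

```python
class Protocol:
    """Hypothetical base class; method names follow the PEP's protocol API."""

    transport = None

    def connection_made(self, transport):
        # The step subclasses tend to forget: remember the transport so
        # that other methods (and default implementations) can use it.
        self.transport = transport

    def data_received(self, data):
        pass

    def eof_received(self):
        pass

    def connection_lost(self, exc):
        self.transport = None


class Echo(Protocol):
    def data_received(self, data):
        # A subclass can rely on self.transport having been set by the
        # base class in connection_made().
        self.transport.write(data)
```

A subclass that forgets to call the base connection_made() (or overrides it without chaining up) loses self.transport, which is exactly the de-facto-standard worry raised above.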

>> > So, when creating a client, I would pass it a protocol instance.
>>
>> Heh. That's how I started, and Glyph told me to pass a protocol
>> factory. It can just be a Protocol subclass though, as long as the
>> constructor has the right signature. So maybe we can avoid calling it
>> protocol_factory and name it protocol_class instead.
>>
>> I struggled with what to do if the socket cannot be connected and
>> hence the transport not created. If you've already created the
>> protocol you're in a bit of trouble at that point. I proposed to call
>> connection_lost() in that case (without ever having called
>> connection_made()) but Glyph suggested that would be asking for rare
>> bugs (the connection_lost() code might not expect a half-initialized
>> protocol instance).
>
> I'm proposing something different: the transport should be created
> before the socket is connected, and it should handle the connection
> itself (by calling sock_connect() on the loop, perhaps).

That's a possible implementation technique. But it will still be
created implicitly by create_transport() or start_serving().

> Then:
> - if connect() succeeds, protocol.connection_made() is called
> - if connect() fails, protocol.connection_failed(exc) is called
> (not connection_lost())

That's what I had, but it just adds extra APIs to the abstract class.
Returning a Future that can succeed (probably returning the protocol)
or fail (with some exception) doesn't require adding new methods.
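In today's asyncio terms (where create_transport() became create_connection()), the Future-based convention looks roughly like this; connect_or_report is a hypothetical helper name, and the point is that a connection failure surfaces as the coroutine's exception rather than through an extra connection_failed() protocol method:

```python
import asyncio

async def connect_or_report(protocol_factory, host, port):
    """Hypothetical helper: connect, or report failure via the Future."""
    loop = asyncio.get_running_loop()
    try:
        # On failure this raises (i.e. the returned Future fails), so the
        # protocol is never left half-initialized: connection_lost() is
        # never called on an instance that never saw connection_made().
        transport, protocol = await loop.create_connection(
            protocol_factory, host, port)
    except OSError:
        return None
    transport.close()
    return protocol
```

The caller handles the error where the connection was requested, instead of inside the protocol.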

> I think it makes more sense for the transport to do the connecting: why
> should the I/O loop know about specific transports? Ideally, it should
> only know about socket objects or fds.

Actually, there's one reason why the loop should know (something)
about transports: different loop implementations will want to use
different transport implementations to meet the same requirements.
E.g. an IOCP-based loop will use different transports than a UNIXy
*poll-based loop.

> I don't know if Twisted had a specific reason for having connectTCP()
> and friends on the reactor (other than they want the reactor to be the
> API entry point, perhaps). I'd be curious to hear about it.

That's the reason.

>> Glyph proposed instead that create_transport()
>> should return a Future and the error should be that Future's
>> exception, and I like that much better.
>
> But then you have several API layers with different conventions:
> connection_made() / connection_lost() use well-defined protocol
> methods, while create_transport() returns you a Future on which you
> must register success / failure callbacks.

Different layers have different needs. Note that if you're using
coroutines the Futures are very easy to use. And Twisted will just
wrap the Future in a Deferred.

>> > I think the transport / protocol registration must be done early, not in
>> > connection_made(). Sometimes you will want to do things on a protocol
>> > before you know a connection is established, for example queue things
>> > to write on the transport. A use case is a reconnecting TCP client:

>> > the protocol will continue existing at times when the connection is
>> > down.
>>
>> Hm. That seems a pretty advanced use case. I think it is better
>> handled by passing a "factory function" that returns a pre-created
>> protocol:
>>
>> pr = MyProtocol(...)
>> ev.create_transport(lambda: pr, host, port)
>>
>> However you do this, such a protocol object must expect multiple
>> connection_made - connection_lost cycles, which sounds to me like
>> asking for trouble.
>
> It's quite straightforward actually (*). Of course, only a protocol
> explicitly designed for use with a reconnecting client has to be
> well-behaved in that regard.

Yeah, but it still is an odd corner case. Anyway, I think I've shown
you how to do it in several different ways while still having a
protocol_factory argument.
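The two-line snippet quoted above can be fleshed out into a sketch of such a reconnecting setup (names are mine, for illustration): the factory always returns the same pre-created instance, and the protocol is written to tolerate multiple connection_made/connection_lost cycles, queueing writes while the connection is down.

```python
class ReconnectingEcho:
    """A protocol deliberately designed to survive reconnects."""

    def __init__(self):
        self.transport = None
        self.pending = []        # writes queued while disconnected
        self.connections = 0     # how many times we (re)connected

    def connection_made(self, transport):
        self.transport = transport
        self.connections += 1
        for data in self.pending:   # flush anything queued while down
            transport.write(data)
        self.pending.clear()

    def write(self, data):
        if self.transport is not None:
            self.transport.write(data)
        else:
            self.pending.append(data)

    def connection_lost(self, exc):
        self.transport = None


pr = ReconnectingEcho()
factory = lambda: pr   # as in the snippet above: always the same instance
```

Each call to create_transport() with this factory hands the event loop the same protocol object, so the API keeps its protocol_factory argument while the application keeps its state across reconnects.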

> (*) I'm using such a pattern at work, where I've stacked a protocol
> abstraction on top of Tornado.
>
>> > * connection_lost(): you definitely want to know whether it's you or the
>> >   other end who closed the connection. Typically, if the other end
>> >   closed the connection, you will have to run some cleanup steps, and
>> >   perhaps even log an error somewhere (if the connection was closed
>> >   unexpectedly).
>>
>> Glyph's idea was to always pass an exception and use special exception
>> subclasses to distinguish the three cases (clean EOF from the other
>> end, self.close(), self.abort()). I resisted this, but maybe it's the
>> only way?
>
> Perhaps both self.close() and self.abort() should pass None.

They do.

> So "if error is None: return" is all you have to do to filter out the
> boring case.

But a clean close from the other end (as opposed to an unexpected
disconnect e.g. due to a sudden network partition) also passes None. I
guess this is okay because in that case eof_received() is first
called. So I guess the PEP is already okay here. :-)
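Putting the convention just agreed on into a small sketch (my illustration, not the PEP's text): eof_received() marks a clean close from the other end, so by the time connection_lost() runs, "if exc is None: return" really does filter out all the boring cases.

```python
class LoggingProtocol:
    """Hypothetical protocol showing the connection_lost(exc) convention."""

    def __init__(self):
        self.saw_eof = False
        self.errors = []

    def eof_received(self):
        # Called first on a clean close from the other end, so a later
        # connection_lost(None) is unambiguous.
        self.saw_eof = True

    def connection_lost(self, exc):
        # None covers self.close(), self.abort(), and a clean remote EOF.
        if exc is None:
            return
        # Anything else is an unexpected disconnect (e.g. a sudden
        # network partition): run cleanup and log it.
        self.errors.append(exc)
```

The cleanup-and-log path only ever sees a real exception object.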

-- 
--Guido van Rossum (python.org/~guido)


