[Python-ideas] PEP 3156 feedback
Antoine Pitrou
solipsis at pitrou.net
Tue Dec 18 20:21:06 CET 2012
On Tue, 18 Dec 2012 10:02:05 -0800
Guido van Rossum <guido at python.org> wrote:
> > Event loop API
> > --------------
> >
> > I would like to say that I prefer Tornado's model: for each primitive
> > provided by Tornado, you can pass an explicit Loop instance which you
> > instantiated manually.
> > There is no module function or policy object hiding this mechanism:
> > it's simple, explicit and flexible (in other words: if you want a
> > per-thread event loop, just do it yourself using TLS :-)).
>
> It sounds though as if the explicit loop is optional, and still
> defaults to some global default loop?
Yes.
> Having one global loop shared by multiple threads is iffy though. Only
> one thread should be *running* the loop, otherwise the loop can't be
> used as a mutual exclusion device. Worse, all primitives for adding
> and removing callbacks/handlers must be made threadsafe, and then
> basically the entire event loop becomes full of locks, which seems
> wrong to me.
Hmm, I don't think that's implied. Only call_soon_threadsafe() needs to
be thread-safe. Calling other methods from another thread is simply a
programming error. Since Tornado's and Twisted's global event loops
already work like that, I don't think the surprise will be huge for
users.
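To illustrate: the only loop method a worker thread should ever touch
is call_soon_threadsafe(). A sketch (compute() and handle_result() are
placeholders):

    import threading

    def worker(loop):
        result = compute()   # blocking work, off the event loop
        # the single thread-safe entry point back into the loop:
        loop.call_soon_threadsafe(handle_result, result)

    threading.Thread(target=worker, args=(loop,)).start()

Everything else stays a single-threaded affair, so no locks are needed
inside the loop.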
> > There are some requirements I've found useful:
> >
> > - being able to instantiate multiple loops, either at the same time or
> > serially (this is especially nice for unit tests; Twisted has to use
> > a dedicated test runner just because their reactor doesn't support
> > multiple instances or restarts)
>
> Serially, for unit tests: definitely. The loop policy has
> init_event_loop() for this, which forcibly creates a new loop.
Ah, nice.
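That should make per-test isolation trivial, something like (a sketch;
I'm guessing at how the policy object is reached):

    import unittest

    class LoopTest(unittest.TestCase):
        def setUp(self):
            # forcibly creates a new loop, per the PEP's policy API,
            # so no state leaks between tests
            policy.init_event_loop()
            self.loop = policy.get_event_loop()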
> > Protocols and transports
> > ------------------------
> >
> > We probably want to provide a Protocol base class and encourage people
> > to inherit it.
>
> Glyph suggested that too, and hinted that it does some useful stuff
> that users otherwise forget. I'm a bit worried though that the
> functionality of the base implementation becomes the de-facto standard
> rather than the PEP. (Glyph mentions that the base class has a method
> that sets self.transport and without it lots of other stuff breaks.)
Well, in the I/O stack we do have base classes with useful method
implementations too (IOBase and friends).
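Something as small as this would already pull its weight (a sketch,
using the PEP's method names):

    class Protocol:
        """Base class that protocol implementers are encouraged
        to inherit from."""

        def connection_made(self, transport):
            # the bit Glyph mentions: without this, lots of other
            # stuff breaks
            self.transport = transport

        def connection_lost(self, exc):
            self.transport = None

        def data_received(self, data):
            pass

        def eof_received(self):
            pass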
> > So, when creating a client, I would pass it a protocol instance.
>
> Heh. That's how I started, and Glyph told me to pass a protocol
> factory. It can just be a Protocol subclass though, as long as the
> constructor has the right signature. So maybe we can avoid calling it
> protocol_factory and name it protocol_class instead.
>
> I struggled with what to do if the socket cannot be connected and
> hence the transport not created. If you've already created the
> protocol you're in a bit of trouble at that point. I proposed to call
> connection_lost() in that case (without ever having called
> connection_made()) but Glyph suggested that would be asking for rare
> bugs (the connection_lost() code might not expect a half-initialized
> protocol instance).
I'm proposing something different: the transport should be created
before the socket is connected, and it should handle the connection
itself (by calling sock_connect() on the loop, perhaps).
Then:
- if connect() succeeds, protocol.connection_made() is called
- if connect() fails, protocol.connection_failed(exc) is called
(not connection_lost())
I think it makes more sense for the transport to do the connecting: why
should the I/O loop know about specific transports? Ideally, it should
only know about socket objects or fds.
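A rough sketch of what I mean (assuming sock_connect() returns a
Future with add_done_callback(), as elsewhere in the PEP):

    import socket

    class TCPTransport:
        def __init__(self, loop, protocol, host, port):
            self._loop = loop
            self._protocol = protocol
            self._sock = socket.socket()
            self._sock.setblocking(False)
            fut = loop.sock_connect(self._sock, (host, port))
            fut.add_done_callback(self._connect_done)

        def _connect_done(self, fut):
            try:
                fut.result()   # re-raises the connect error, if any
            except Exception as exc:
                self._protocol.connection_failed(exc)
            else:
                self._protocol.connection_made(self)

The loop itself never needs to know this is a TCP transport.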
I don't know if Twisted had a specific reason for having connectTCP()
and friends on the reactor (other than they want the reactor to be the
API entry point, perhaps). I'd be curious to hear about it.
> Glyph proposed instead that create_transport()
> should return a Future and the error should be that Future's
> exception, and I like that much better.
But then you have several API layers with different conventions:
connection_made() / connection_lost() use well-defined protocol
methods, while create_transport() returns you a Future on which you
must register success / failure callbacks.
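Concretely, the mixture looks like this (a sketch; `ev` is the loop as
in your example below, and on_transport_created() is a placeholder):

    # one convention: well-defined methods on the protocol
    class MyProtocol(Protocol):
        def connection_made(self, transport):
            ...
        def connection_lost(self, exc):
            ...

    # another convention: a Future you must hook callbacks onto
    fut = ev.create_transport(MyProtocol, host, port)
    fut.add_done_callback(on_transport_created)  # errors surface
                                                 # via fut.exception()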
> > I think the transport / protocol registration must be done early, not in
> > connection_made(). Sometimes you will want to do things on a protocol
> > before you know a connection is established, for example queue things
> > to write on the transport. A use case is a reconnecting TCP client:
> > the protocol will continue existing at times when the connection is
> > down.
>
> Hm. That seems a pretty advanced use case. I think it is better
> handled by passing a "factory function" that returns a pre-created
> protocol:
>
> pr = MyProtocol(...)
> ev.create_transport(lambda: pr, host, port)
>
> However you do this, such a protocol object must expect multiple
> connection_made - connection_lost cycles, which sounds to me like
> asking for trouble.
It's quite straightforward actually (*). Of course, only a protocol
explicitly designed for use with a reconnecting client has to be
well-behaved in that regard.
(*) I'm using such a pattern at work, where I've stacked a protocol
abstraction on top of Tornado.
> > * connection_lost(): you definitely want to know whether it's you or the
> > other end who closed the connection. Typically, if the other end
> > closed the connection, you will have to run some cleanup steps, and
> > perhaps even log an error somewhere (if the connection was closed
> > unexpectedly).
>
> Glyph's idea was to always pass an exception and use special exception
> subclasses to distinguish the three cases (clean eof from other end,
> self.close(), self.abort()). I resisted this, but maybe it's the only
> way?
Perhaps both self.close() and self.abort() should pass None. So
"if error is None: return" is all you have to do to filter out the
boring case.
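i.e. (a sketch; the logging and cleanup details are placeholders):

    def connection_lost(self, exc):
        if exc is None:
            return   # we closed it ourselves: the boring case
        logging.error("connection lost unexpectedly: %s", exc)
        self._cleanup()   # hypothetical cleanup hook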
Regards
Antoine.