
Hello,

Here is my own feedback on the in-progress PEP 3156. Please discard it
if it's too early to give feedback :-))

Event loop API
--------------

I would like to say that I prefer Tornado's model: for each primitive
provided by Tornado, you can pass an explicit Loop instance which you
instantiated manually. There is no module function or policy object
hiding this mechanism: it's simple, explicit and flexible (in other
words: if you want a per-thread event loop, just do it yourself using
TLS :-)).

There are some requirements I've found useful:

- being able to instantiate multiple loops, either at the same time or
  serially (this is especially nice for unit tests; Twisted has to use
  a dedicated test runner just because their reactor doesn't support
  multiple instances or restarts)

- being able to stop a loop explicitly: having to unregister all
  handlers or delayed calls is a PITA in non-trivial situations (for
  example you might have multiple protocol instances, each with a bunch
  of timers, some perhaps even in third-party libraries; keeping track
  of all this is the event loop's job)

* The optional sock_*() methods: how about having different ABCs, e.g.
  the EventLoop ABC for basic behaviour, and the NetworkedEventLoop ABC
  adding the socket helpers?

Protocols and transports
------------------------

We probably want to provide a Protocol base class and encourage people
to inherit it. It can provide useful functionality (perhaps write() and
writelines() shims? it can make mocking easier).

My own opinion about Twisted's API is that the Factory class is often
useless and adds a cognitive burden. If you need a place to track all
protocols of a given kind (e.g. all connections), you can do it
yourself. Also, the Factory implies that you don't control how exactly
your protocol gets instantiated (unless you override some method on the
Factory I'm missing the name of: it is cumbersome). So, when creating a
client, I would pass it a protocol instance.
When creating a server, I would pass it a protocol class. Here the base
Protocol class comes into play: its __init__() could take the transport
as argument and set the "transport" attribute with it. Further args
could be optionally passed to the constructor:

    class MyProtocol(Protocol):
        def __init__(self, transport, my_personal_attribute):
            Protocol.__init__(self, transport)
            self.my_personal_attribute = my_personal_attribute
        ...

    def listen(ioloop):
        # Each new connection will instantiate a MyProtocol with "foobar"
        # for my_personal_attribute.
        ioloop.listen_tcp(("0.0.0.0", 8080), MyProtocol, "foobar")

(The hypothetical listen_tcp() is just a name: perhaps it's actually
start_serving(). It should accept any callable, not just a class:
therefore, you can define complex behaviour if you like.)

I think the transport / protocol registration must be done early, not
in connection_made(). Sometimes you will want to do things on a
protocol before you know a connection is established, for example queue
things to write on the transport. A use case is a reconnecting TCP
client: the protocol will continue existing at times when the
connection is down.

Unconnected protocols need their own base class and API:
data_received()'s signature should be (data, remote_addr) or
(remote_addr, data). Same for write().

* writelines() sounds ambiguous for datagram protocols: does it send
  those "lines" as a single datagram, or one separate datagram per
  "line"? The equivalent code suggests the latter, but which one makes
  more sense?

* connection_lost(): you definitely want to know whether it's you or
  the other end who closed the connection. Typically, if the other end
  closed the connection, you will have to run some cleanup steps, and
  perhaps even log an error somewhere (if the connection was closed
  unexpectedly). Actually, I'm not sure it's useful to call
  connection_lost() when you closed the connection yourself: are there
  any use cases?

Regards

Antoine.
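The "queue things to write before the connection is established" idea above can be sketched as follows. This is purely illustrative: QueueingProtocol, FakeTransport, and the method names are assumptions modeled on the discussion, not part of any final PEP 3156 API.

```python
# Hypothetical sketch of the "queue before connected" use case from the
# mail above; all names here are illustrative, not a real PEP 3156 API.
class QueueingProtocol:
    """Buffers outgoing writes while no transport is attached."""

    def __init__(self):
        self.transport = None
        self._pending = []

    def connection_made(self, transport):
        self.transport = transport
        for chunk in self._pending:      # flush data queued while down
            transport.write(chunk)
        self._pending.clear()

    def connection_lost(self, exc):
        self.transport = None            # back to queueing mode

    def write(self, data):
        if self.transport is None:
            self._pending.append(data)   # connection is down: queue
        else:
            self.transport.write(data)


class FakeTransport:
    """Stand-in transport that just records writes."""
    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


proto = QueueingProtocol()
proto.write(b"queued while disconnected")
transport = FakeTransport()
proto.connection_made(transport)         # queued data is flushed here
proto.write(b"sent directly")
```

Such a protocol survives across connection_made/connection_lost cycles, which is exactly the reconnecting-client scenario described above.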

2012/12/18 Antoine Pitrou <solipsis@pitrou.net>
Factories are useful to implement clients that reconnect automatically: the framework needs to spawn a new protocol object. The connect method could take a protocol class, but how would you implement the reconnect strategy?

When creating a server, I would pass it a protocol class. Here the base
...
This is indeed very similar to a factory function (a callback that creates the protocol). Anything with a __call__ would be acceptable IMO.

(The hypothetical listen_tcp() is just a name: perhaps it's actually
We should be clear on what a protocol is. In my mind, a protocol manages the events on a given transport; it will also probably buffer data. For example, data for the HTTP protocol always starts with "GET ... HTTP/1.0\r\n". If a protocol can change transports in the middle, it can be difficult to track which socket you write to or receive from, and manage your buffers correctly. An alternative could be a "reset()" method, but then we are not far from a factory class.
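The buffering Amaury describes can be sketched as a protocol that owns a byte buffer for its transport and only acts on complete CRLF-terminated lines. The class and method names here are illustrative, chosen to match the vocabulary of the thread.

```python
# Illustrative sketch of a protocol that buffers raw bytes from its
# transport and only processes complete CRLF-terminated lines.
class LineBufferingProtocol:
    def __init__(self):
        self._buffer = b""
        self.lines = []

    def data_received(self, data):
        self._buffer += data
        while b"\r\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\r\n", 1)
            self.lines.append(line)


proto = LineBufferingProtocol()
proto.data_received(b"GET / HT")              # partial request line
proto.data_received(b"TP/1.0\r\nHost: x\r\n") # rest arrives later
```

If the transport changed mid-stream, `_buffer` is exactly the state that would have to be migrated or reset, which is Amaury's point.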
The "yourself" can be in another part of the code; some protocols will certainly close the connection when they receive unexpected data.

Also, this example from Twisted documentation:

    attempt = myEndpoint.connect(myFactory)
    reactor.callLater(30, attempt.cancel)

Even if these lines appear in my code, it's easier to have all errors caught in one place. The alternative would be:

    attempt = myEndpoint.connect(myFactory)
    def cancel_attempt_and_notify_error():
        attempt.cancel()
        notify_error("cancelled after timeout")
    reactor.callLater(30, cancel_attempt_and_notify_error)

-- 
Amaury Forgeot d'Arc

Le Tue, 18 Dec 2012 11:54:40 +0100, "Amaury Forgeot d'Arc" <amauryfa@gmail.com> a écrit :
I view it differently: the *same* protocol *instance* should be re-used for the new connection. That's because the protocol can keep data that lasts longer than a single connection (many protocols have session ids or other state that can persist across connections: this is typical of RPC APIs affecting the state of an always-running piece of equipment).
Well, the problem when switching transports is that you want to: - wait for all outgoing data to be flushed - migrate all pending incoming data to the new transport IMO, this begs for a solution on the transport side, not on the client side (some kind of migrate() API on the transport?). In other words, you switch transports, but you keep the same protocol instance: when your FTP protocol switches from plain TCP to TLS, it remembers the current directory, etc.
Ah, I think there's a misunderstanding. Protocol.connection_lost() should be called when an *established* connection is lost. Indeed, there should be a separate Protocol.connection_failed() method for when the connect() call never succeeds (either times out or returns with an error). And this is a reason why it is better for the transport to be registered early on the protocol (or vice-versa) :-) Regards Antoine.

On Tue, Dec 18, 2012 at 2:01 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Here is my own feedback on the in-progress PEP 3156. Please discard it if it's too early to give feedback :-))
Thank you, it's very to the point.
It sounds though as if the explicit loop is optional, and still defaults to some global default loop? Having one global loop shared by multiple threads is iffy though. Only one thread should be *running* the loop, otherwise the loop can't be used as a mutual exclusion device. Worse, all primitives for adding and removing callbacks/handlers must be made threadsafe, and then basically the entire event loop becomes full of locks, which seems wrong to me. PEP 3156 lets the loop implementation choose the policy, which seems safer than letting the user choose a policy that may or may not be compatible with the loop's implementation. Steve Dower keeps telling me that on Windows 8 the loop is built into the OS. The Windows 8 loop also seems to be eager to use threads, so I don't know if it can be relied on to serialize callbacks, but there is probably a way to do that, or else the Python wrapper could add a lock around callbacks.
Serially, for unit tests: definitely. The loop policy has init_event_loop() for this, which forcibly creates a new loop. At the same time: that seems to be an esoteric use case and not favorable to interop with Twisted. I want the loop to be mostly out of the way of the user, at least for users using the high-level APIs (tasks, futures, transports, protocols). In fact, just for this reason it may be better if the protocol-creating methods had wrapper functions that just called get_event_loop() and then called the corresponding method on the loop, so the user code doesn't have to call get_event_loop() at all, ever (or at least, if you call it, you should feel a slight tinge of guilt about using a low-level API :-).
I've been convinced of that too. I'm just procrastinating on the implementation at this point. TBH the details of what you should put in your main program will probably change a few times before we're done...
Hm. That smells of Twisted's tree of interfaces, which I'm honestly trying to get away from (and Glyph didn't push back on that :-). I'm actually leaning towards requiring these for all loop implementations -- surely they can all be emulated using each other. But I'm not totally wedded to that either. I need more experience using the stuff first. And Steve Dower says he's not interested in any of the async I/O stuff (I suppose he means sockets), just in futures and coroutines. So maybe the socket operations do have to be optional. In that case, I propose to add inquiry functions that can tell you whether certain groups of APIs are supported. Though you can probably get away with hasattr(loop, 'sock_recv') and so on.
Glyph suggested that too, and hinted that it does some useful stuff that users otherwise forget. I'm a bit worried though that the functionality of the base implementation becomes the de-facto standard rather than the PEP. (Glyph mentions that the base class has a method that sets self.transport and without it lots of other stuff breaks.)
It can provide useful functionality (perhaps write() and writelines() shims? it can make mocking easier).
Those are transport methods though.
Yeah, Glyph complains that people laugh at Twisted for using factories. :-)
So, when creating a client, I would pass it a protocol instance.
Heh. That's how I started, and Glyph told me to pass a protocol factory. It can just be a Protocol subclass though, as long as the constructor has the right signature. So maybe we can avoid calling it protocol_factory and name it protocol_class instead. I struggled with what to do if the socket cannot be connected and hence the transport not created. If you've already created the protocol you're in a bit of trouble at that point. I proposed to call connection_lost() in that case (without ever having called connection_made()) but Glyph suggested that would be asking for rare bugs (the connection_lost() code might not expect a half-initialized protocol instance). Glyph proposed instead that create_transport() should return a Future and the error should be that Future's exception, and I like that much better.
I agree that it should be a callable, not necessarily a class. I don't think it should take the transport -- that's what connection_made() is for. I don't think we should make the API have additional arguments either; you can use a lambda or functools.partial to pass those in. (There are too many other arguments to start_serving() to make it convenient or clear to have a *args, I think, though maybe we could rearrange the argument order.)
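The lambda/functools.partial point above can be sketched in a few lines. MyProtocol and the factory name are stand-ins; the point is that extra constructor arguments get baked into the factory, so the loop API itself never needs a *args.

```python
import functools

# MyProtocol is a stand-in class; the real one would implement the
# protocol callbacks. Extra constructor arguments are bound via partial.
class MyProtocol:
    def __init__(self, my_personal_attribute):
        self.my_personal_attribute = my_personal_attribute


protocol_factory = functools.partial(MyProtocol, "foobar")

# The event loop would call the factory once per connection:
proto = protocol_factory()
```

An equivalent form is `protocol_factory = lambda: MyProtocol("foobar")`; either way the loop only ever sees a zero-argument callable.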
Hm. That seems a pretty advanced use case. I think it is better handled by passing a "factory function" that returns a pre-created protocol:

    pr = MyProtocol(...)
    ev.create_transport(lambda: pr, host, port)

However you do this, such a protocol object must expect multiple connection_made - connection_lost cycles, which sounds to me like asking for trouble. So maybe it's better to have a thin protocol class that is newly instantiated for each reconnection but given a pointer to a more permanent data structure that carries state between reconnections.
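The "thin protocol plus permanent data structure" suggestion above might look like this. Session, ThinProtocol, and the factory are all hypothetical names invented for the sketch.

```python
# Sketch: a long-lived Session carries state across reconnections, while
# a fresh, thin protocol is instantiated per connection attempt.
class Session:
    def __init__(self):
        self.session_id = None       # state that must survive reconnects


class ThinProtocol:
    def __init__(self, session):
        self.session = session       # pointer to the permanent structure

    def connection_made(self, transport):
        self.transport = transport


session = Session()
# This factory is what something like create_transport() would be given:
protocol_factory = lambda: ThinProtocol(session)

first = protocol_factory()           # first connection
second = protocol_factory()          # after a reconnect
```

Each connection gets a fresh protocol instance (so no awkward repeated connection_made/connection_lost cycles on one object), yet both share the same session state.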
You mean UDP? Let's put that off until later. But yes, it probably needs more thought.
It is the transport's choice. Twisted has writeSequence(), which is just as ambiguous.
Glyph's idea was to always pass an exception and use special exception subclasses to distinguish the three cases (clean eof from other end, self.close(), self.abort()). I resisted this but maybe it's the only way?
Actually, I'm not sure it's useful to call connection_lost() when you closed the connection yourself: are there any use cases?
Well, close() first has to finish writing buffered data, so any cleanup needs to be done asynchronously after that is taken care of. AFAIK Twisted always calls it, and I think that's the best approach to ensure cleanup is always taken care of. -- --Guido van Rossum (python.org/~guido)

On Tue, 18 Dec 2012 10:02:05 -0800 Guido van Rossum <guido@python.org> wrote:
Yes.
Hmm, I don't think that's implied. Only call_soon_threadsafe() needs to be thread-safe. Calling other methods from another thread is simply a programming error. Since Tornado's and Twisted's global event loops already work like that, I don't think the surprise will be huge for users.
Ah, nice.
Well, in the I/O stack we do have base classes with useful method implementations too (IOBase and friends).
I'm proposing something different: the transport should be created before the socket is connected, and it should handle the connection itself (by calling sock_connect() on the loop, perhaps). Then:

- if connect() succeeds, protocol.connection_made() is called
- if connect() fails, protocol.connection_failed(exc) is called (not connection_lost())

I think it makes more sense for the transport to do the connecting: why should the I/O loop know about specific transports? Ideally, it should only know about socket objects or fds. I don't know if Twisted had a specific reason for having connectTCP() and friends on the reactor (other than they want the reactor to be the API entry point, perhaps). I'd be curious to hear about it.
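As a toy model of the dispatch above: the transport owns the connect step and routes the outcome to connection_made() or connection_failed(). A real version would use the loop's asynchronous sock_connect(); this sketch is synchronous and all names are illustrative.

```python
# Toy model: the transport performs the connect and dispatches to the
# protocol. Synchronous here purely for illustration.
class ConnectingTransport:
    def __init__(self, protocol, connect):
        self.protocol = protocol
        self._connect = connect      # stand-in for sock_connect()

    def start(self):
        try:
            self._connect()
        except OSError as exc:
            self.protocol.connection_failed(exc)   # never established
        else:
            self.protocol.connection_made(self)    # established


class Recorder:
    """Minimal protocol that records which callback fired."""
    def __init__(self):
        self.events = []

    def connection_made(self, transport):
        self.events.append("made")

    def connection_failed(self, exc):
        self.events.append("failed")


ok = Recorder()
ConnectingTransport(ok, lambda: None).start()

def refused():
    raise ConnectionRefusedError("nobody listening")

bad = Recorder()
ConnectingTransport(bad, refused).start()
```

Note how connection_lost() never enters the picture for a failed connect, matching the distinction drawn above.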
But then you have several API layers with different conventions: connection_made() / connection_lost() use well-defined protocol methods, while create_transport() returns you a Future on which you must register success / failure callbacks.
It's quite straightforward actually (*). Of course, only a protocol explicitly designed for use with a reconnecting client has to be well-behaved in that regard. (*) I'm using such a pattern at work, where I've stacked a protocol abstraction on top of Tornado.
Perhaps both self.close() and self.abort() should pass None. So "if error is None: return" is all you have to do to filter out the boring case. Regards Antoine.
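The None convention proposed here reduces to a one-line guard in the protocol. CleanupProtocol is a made-up name for the sketch.

```python
# Sketch of the proposed convention: connection_lost(None) means we
# closed the connection ourselves; an exception instance means the loss
# was unexpected and needs cleanup.
class CleanupProtocol:
    def __init__(self):
        self.errors = []

    def connection_lost(self, exc):
        if exc is None:
            return                   # boring case: we closed it ourselves
        self.errors.append(exc)      # unexpected loss: log / clean up


proto = CleanupProtocol()
proto.connection_lost(None)          # e.g. after self.close() or self.abort()
reset = ConnectionResetError("peer vanished")
proto.connection_lost(reset)         # unexpected disconnect
```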

On Tue, Dec 18, 2012 at 11:21 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
True. If we go that way they should be in the PEP as well.
That's a possible implementation technique. But it will still be created implicitly by create_transport() or start_serving().
That's what I had, but it just adds extra APIs to the abstract class. Returning a Future that can succeed (probably returning the protocol) or fail (with some exception) doesn't require adding new methods.
Actually, there's one reason why the loop should know (something) about transports: different loop implementations will want to use different transport implementations to meet the same requirements. E.g. an IOCP-based loop will use different transports than a UNIXy *poll-based loop.
That's the reason.
Different layers have different needs. Note that if you're using coroutines the Futures are very easy to use. And Twisted will just wrap the Future in a Deferred.
Yeah, but it still is an odd corner case. Anyway, I think I've shown you how to do it in several different ways while still having a protocol_factory argument.
They do.
So "if error is None: return" is all you have to do to filter out the boring case.
But a clean close from the other end (as opposed to an unexpected disconnect e.g. due to a sudden network partition) also passes None. I guess this is okay because in that case eof_received() is first called. So I guess the PEP is already okay here. :-) -- --Guido van Rossum (python.org/~guido)

On Tue, Dec 18, 2012 at 12:44 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
EOF is part of TCP (although I'm sure it has a different name at the protocol level). The sender can force it by using shutdown(SHUT_WR) (== write_eof() in Tulip/PEP 3156) or just by closing the socket (if they don't expect a response). The low-level reader detects this by recv() returning an empty string. Of course, if the other end closed both halves and you try to write before reading, send() may raise an exception and then you'll not get the EOF. And then again, send() may not raise an exception, it all depends on where stuff gets buffered. But arguably you get what you ask for in that case. I plan to call eof_received(), once, if and only if recv() returns an empty byte string. (The PEP says that eof_received() should call close() by default, but I don't actually think that's correct -- it also is hard to put in the abstract Protocol class unless a specific instance variable holding the transport is made part of the spec, which I am hesitant to do. I don't think that ignoring it by default is actually a problem.) -- --Guido van Rossum (python.org/~guido)
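The EOF behaviour described above can be demonstrated with plain sockets: shutdown(SHUT_WR) is the socket-level equivalent of write_eof(), and the peer observes it as recv() returning an empty byte string once the buffered payload has drained.

```python
import socket

# Half-close demo: the sender signals EOF with shutdown(SHUT_WR); the
# receiver drains buffered data until recv() returns b"" exactly once.
a, b = socket.socketpair()
a.sendall(b"last words")
a.shutdown(socket.SHUT_WR)           # half-close: we will send no more

chunks = []
while True:
    chunk = b.recv(1024)
    if chunk == b"":                 # the condition that triggers eof_received()
        break
    chunks.append(chunk)
data = b"".join(chunks)

a.close()
b.close()
```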

About protocols: I think the event loop should support UDP datagrams as well as operations with file descriptors which are not sockets at all. I mean timerfd_create and inotify as examples. On Wed, Dec 19, 2012 at 12:39 AM, Guido van Rossum <guido@python.org> wrote:
-- Thanks, Andrew Svetlov

On Tue, Dec 18, 2012 at 2:49 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
About protocols: I think eventloop should support UDP datagrams
Supporting UDP should be relatively straightforward, I just haven't used it in ages so I could use some help in describing the needed APIs. There are a lot of recv() variants: recv(), recvfrom(), recvmsg(), and then an _into() variant for each. And for sending there's send()/sendall(), sendmsg(), and sendto(). I'd be ecstatic if someone contributed code to tulip.
as well as operations with file descriptors which are not sockets at all.
That won't work on Windows though. On UNIX you can always use the add/remove reader/writer APIs and make the calls yourself -- the patterns in sock_recv() and sock_sendall() are simple enough. (These are standardized in the PEP mainly because on Windows, with IOCP, the expectation is that they won't use "ready callbacks" (polling using select/*poll/kqueue) but instead Windows-specific APIs for starting I/O operations with a "completion callback".)
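The "make the calls yourself" pattern above, applied to a non-socket FD, looks roughly like this. The loop step is hand-rolled for illustration (a real loop would run it repeatedly via add_reader()); this is UNIX-only, since Windows select() only accepts sockets.

```python
import os
import select

# Minimal "ready callback" pattern on a non-socket FD (a pipe) -- the
# kind of thing the add_reader() API permits on UNIX.
rfd, wfd = os.pipe()
results = []

def on_readable(fd):
    results.append(os.read(fd, 1024))

os.write(wfd, b"ping")
# One iteration of what the pollster does:
readable, _, _ = select.select([rfd], [], [], 1.0)
for fd in readable:
    on_readable(fd)          # what a loop would do for each add_reader() FD

os.close(rfd)
os.close(wfd)
```

The same shape works for timerfd or inotify descriptors, since nothing in select() restricts it to sockets on UNIX.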
I mean timerfd_create and inotify as examples.
I think those will work -- they look very platform specific but in the end there's nothing in the add/remove reader/writer API that prevents you from using non-socket FDs on UNIX. (It's different on Windows, where select() is the only pollster supported, and Windows select only works with socket FDs.) -- --Guido van Rossum (python.org/~guido)

2012/12/18 Guido van Rossum <guido@python.org>
The basic idea is to have multiple threads/processes, each running its own IO loop. No locks are required because each IO poller instance will deal with its own socket-map / callbacks-queue and no resources are shared. In asyncore this was achieved by introducing the "map" parameter. Similarly to Tornado, pyftpdlib uses an "ioloop" parameter which can be passed to all the classes which will handle the connection (the handlers). If "ioloop" is provided, all the handlers will use that (...and register() against it, add_reader() etc.); otherwise the "global" ioloop instance will be used (default).

A dynamic IO poller like this is important because, in case the connection handlers are forced to block for some reason, you can switch from one concurrency model (async / non-blocking) to another (multi threads/process) very easily. See:

http://code.google.com/p/pyftpdlib/issues/detail?id=212#c9
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/servers.py?spec=svn1137&r=1137

Hope this helps,

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/

On Wed, Dec 19, 2012 at 6:51 AM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
I understand that, and the Tulip implementation supports this. However different frameworks may have different policies (e.g. AFAIK Twisted only supports one reactor, period, and it is not threadsafe). I don't want to put requirements in the PEP that *require* compliant implementations to support the loop-per-thread model. OTOH I do want compliant implementations to decide on their own policy. I guess the minimal requirement for a compliant implementation is that callbacks associated with the same loop are serialized and never executed concurrently on different threads.
Read the description in the PEP of the event loop policy, or the default implementation in Tulip. It discourages user code from creating new event loops (since the framework may not support this) but does not prevent e.g. unit tests from creating a new loop for each test (even Twisted supports that).
Did you see run_in_executor() and wrap_future() in the PEP or in the Tulip implementation? They make it perfectly simple to run something in another thread (and the default implementation will use this to call getaddrinfo(), since the stdlib wrappers for it have no async version). The two APIs are even capable of using a ProcessPoolExecutor.
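The underlying mechanism run_in_executor() builds on can be sketched with the stdlib directly: push the blocking getaddrinfo() call onto a worker thread and get a Future back, which wrap_future() would then tie into the event loop. This sketch uses concurrent.futures only, not the PEP 3156 API itself.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Run the blocking resolver call off-thread and collect its Future.
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(socket.getaddrinfo, "localhost", 80)
infos = future.result()      # real loop code would await the wrapped future
executor.shutdown()
```

Swapping in a ProcessPoolExecutor is a one-line change, which is the flexibility the reply above refers to.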
Of course, if all you want is a server that creates a new thread or process for each connection, PEP 3156 and Tulip are overkill -- in that case there's no reason not to use the stdlib's SocketServer class, which has supported this for over a decade. :-) -- --Guido van Rossum (python.org/~guido)

On Wed, 19 Dec 2012 08:55:02 -0800 Guido van Rossum <guido@python.org> wrote:
Why not let implementations raise NotImplementedError when they don't want to support certain use cases?
Is it the plan that code written for an event loop will always work with another one? Will tulip offer more than the GCD of the other event loops? Regards Antoine.

On Wed, Dec 19, 2012 at 10:55 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Why not let implementations raise NotImplementedError when they don't want to support certain use cases?
That's always a last resort, but the problem is that an app or library can't be sure that everything will work, and the failure might be subtle and late. That said, my remark about the loop needing to be wholly threadsafe was misguided. I think there are two reasonable policies with regard to threads that any reasonable implementation could follow: 1. There's only one loop, it runs in a dedicated thread, and other threads can only use call_soon_threadsafe(). 2. There's (potentially) a loop per thread, and these are effectively independent. (TBD: How would these pass work or results between one another? Probably by calling call_soon_threadsafe() back and forth.) The default implementation actually takes a halfway position: it supports (2), but you must manually call init_event_loop() in each thread except for the main thread, and you must call run() in each thread, including the main thread. The requirement to call init_event_loop() is to prevent code running in some random thread from trying to schedule a callback, which would never run because the thread isn't calling run(). When we get further along we may have a compliance test suite, separate from the unittests (I am working on unittests but I'm aware they aren't at all thorough yet).
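Policy 1 above can be modeled in miniature: one loop thread drains a thread-safe queue, and call_soon_threadsafe() is the only entry point offered to other threads. MiniLoop is purely illustrative, not the Tulip implementation.

```python
import queue
import threading

# Toy model of policy 1: callbacks cross threads only via a thread-safe
# queue; they execute only when the loop thread drains it.
class MiniLoop:
    def __init__(self):
        self._q = queue.Queue()

    def call_soon_threadsafe(self, callback, *args):
        self._q.put((callback, args))    # safe from any thread

    def run_once(self, timeout=1.0):
        callback, args = self._q.get(timeout=timeout)
        callback(*args)                  # runs only in the loop's thread


loop = MiniLoop()
hits = []
worker = threading.Thread(
    target=loop.call_soon_threadsafe, args=(hits.append, "hi"))
worker.start()
worker.join()
loop.run_once()                          # the "loop thread" executes it
```

This also shows why callbacks associated with one loop are trivially serialized under this policy: only the thread calling run_once() ever executes them.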
Is it the plan that code written for an event loop will always work with another one?
The plan is to make it easy to write code that will work with all (or most) event loops, without making it impossible to write code that depends on a specific event loop implementation. This is Python's general attitude about platform-specific APIs.
Will tulip offer more than the GCD of the other event loops?
People writing PEP 3156 compliant implementations on top of some other event loop, whether it's Twisted or libuv, may have to emulate some functionality, and there will also be some functionality that their underlying loop supports that PEP 3156 doesn't. The goal is to offer a wide enough range of features that it's possible to write many useful types of apps without resorting to platform-specific APIs, and to make these fast enough. But if an app knows it will only be used with a certain loop implementation it is free to use extra APIs that only that loop offers. There's still a benefit in that situation: the app may be tied to a platform, but it may still want to use some 3rd party libraries that also require event loop integration, and by conforming to PEP 3156 the platform's loop implementation can ensure that such libraries actually work and interact with the rest of the app in a reasonable manner. (In particular, they should all use the same Future and Task classes.) -- --Guido van Rossum (python.org/~guido)

2012/12/18 Antoine Pitrou <solipsis@pitrou.net>
Factories are useful to implement clients that reconnect automatically: the framework needs to spawn a new protocol object. The connect method could take a protocol class, but how would you implement the reconnect strategy? When creating a server, I would pass it a protocol class. Here the base
...
This is indeed very similar to a factory function (a callback that creates the protocol) Anything with a __call__ would be acceptable IMO. (The hypothetical listen_tcp() is just a name: perhaps it's actually
We should be clear on what a protocol is. In my mind, a protocol manages the events on a given transport; it will also probably buffer data. For example, data for the HTTP protocol always starts with "GET ... HTTP/1.0\r\n". If a protocol can change transports in the middle, it can be difficult to track which socket you write to or receive from, and manage your buffers correctly. An alternative could be a "reset()" method, but then we are not far from a factory class.
The "yourself" can in another part of the code; some protocols will certainly close the connection when they receive unexpected data. Also, this example from Twisted documentation: attempt = myEndpoint.connect(myFactory) reactor.callback(30, attempt.cancel) Even if these lines appear in my code, it's easier to have all errors caught in one place. The alternative would be: attempt = myEndpoint.connect(myFactory) def cancel_attempt_and_notify_error(): attempt.cancel() notify_error("cancelled after timeout") reactor.callback(30, cancel_attempt_and_notify_error) -- Amaury Forgeot d'Arc

Le Tue, 18 Dec 2012 11:54:40 +0100, "Amaury Forgeot d'Arc" <amauryfa@gmail.com> a écrit :
I view it differently: the *same* protocol *instance* should be re-used for the new connection. That's because the protocol can keep data that lasts longer than a single connection (many protocols have session ids or other state that can persist accross connections: this is typical of RPC APIs affecting the state of an always-running equipment).
Well, the problem when switching transports is that you want to: - wait for all outgoing data to be flushed - migrate all pending incoming data to the new transport IMO, this begs for a solution on the transport side, not on the client side (some kind of migrate() API on the transport?). In other words, you switch transports, but you keep the same protocol instance: when your FTP protocol switches from plain TCP to TLS, it remembers the current directory, etc.
Ah, I think there's a misunderstanding. Protocol.connection_lost() should be called when an *established* connection is lost. Indeed, there should be a separate Protocol.connection_failed() method for when the connect() calls never succeeds (either times out or returns with an error). And this is a reason why it is better for the transport to be registered early on the protocol (or vice-versa) :-) Regards Antoine.

On Tue, Dec 18, 2012 at 2:01 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Here is my own feedback on the in-progress PEP 3156. Please discard it if it's too early to give feedback :-))
Thank you, it's very to the point.
It sounds though as if the explicit loop is optional, and still defaults to some global default loop? Having one global loop shared by multiple threads is iffy though. Only one thread should be *running* the loop, otherwise the loop can' be used as a mutual exclusion device. Worse, all primitives for adding and removing callbacks/handlers must be made threadsafe, and then basically the entire event loop becomes full of locks, which seems wrong to me. PEP 3156 lets the loop implementation choose the policy, which seems safer than letting the user choose a policy that may or may not be compatible with the loop's implementation. Steve Dower keeps telling me that on Windows 8 the loop is built into the OS. The Windows 8 loop also seems to be eager to use threads, so I don't know if it can be relied on to serialize callbacks, but there is probably a way to do that, or else the Python wrapper could add a lock around callbacks.
Serially, for unit tests: definitely. The loop policy has init_event_loop() for this, which forcibly creates a new loop. At the same time: that seems to be an esoteric use case and not favorable to interop with Twisted. I want the loop to be mostly out of the way of the user, at least for users using the high-level APIs (tasks, futures, transports, protocols). In fact, just for this reason it may be better if the protocol-creating methods had wrapper functions that just called get_event_loop() and then called the corresponding method on the loop, so the user code doesn't have to call get_event_loop() at all, ever (or at least, if you call it, you should feel a slight tinge of guilt about using a low-level API :-).
I've been convinced of that too. I'm just procrastinating on the implementation at this point. TBH the details of what you should put in your main program will probably change a few times before we're done...
Hm. That smells of Twisted's tree of interfaces, which I'm honestly trying to get away from (and Glyph didn't push back on that :-). I'm actually leaning towards requiring these for all loop implementations -- surely they can all be emulated using each other. But I'm not totally wedded to that either. I need more experience using the stuff first. And Steve Dower says he's not interested in any of the async I/O stuff (I suppose he means sockets), just in futures and coroutines. So maybe the socket operations do have to be optional. In that case, I propose to add inquiry functions that can tell you whether certain groups of APIs are supported. Though you can probably get away with hasattr(loop, 'sock_recv') and so on.
Glyph suggested that too, and hinted that it does some useful stuff that users otherwise forget. I'm a bit worried though that the functionality of the base implementation becomes the de-facto standard rather than the PEP. (Glyph mentions that the base class has a method that sets self.transport and without it lots of other stuff breaks.)
It can provide useful functionality (perhaps write() and writelines() shims? it can make mocking easier).
Those are transport methods though.
Yeah, Glyph complains that people laugh at Twisted for using factories. :-)
So, when creating a client, I would pass it a protocol instance.
Heh. That's how I started, and Glyph told me to pass a protocol factory. It can just be a Protocol subclass though, as long as the constructor has the right signature. So maybe we can avoid calling it protocol_factory and name it protocol_class instead. I struggled with what to do if the socket cannot be connected and hence the transport not created. If you've already created the protocol you're in a bit of trouble at that point. I proposed to call connection_lost() in that case (without ever having called connection_made()) but Glyph suggested that would be asking for rare bugs (the connection_lost() code might not expect a half-initialized protocol instance). Glyph proposed instead that create_transport() should return a Future and the error should be that Future's exception, and I like that much better.
I agree that it should be a callable, not necessarily a class. I don't think it should take the transport -- that's what connection_made() is for. I don't think we should make the API have additional arguments either; you can use a lambda or functools.partial to pass those in. (There are too many other arguments to start_serving() to make it convenient or clear to have a *args, I think, though maybe we could rearrange the argument order.)
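Passing extra constructor arguments via functools.partial, as suggested here, looks like this; start_serving() below is a one-line stand-in that just invokes the factory once per "connection":

```python
import functools

class MyProtocol:
    def __init__(self, my_personal_attribute):
        self.my_personal_attribute = my_personal_attribute
    def connection_made(self, transport):
        self.transport = transport

def start_serving(protocol_factory):
    # The loop calls the factory with no arguments for each connection.
    proto = protocol_factory()
    proto.connection_made("fake-transport")
    return proto

p = start_serving(functools.partial(MyProtocol, "foobar"))
print(p.my_personal_attribute)  # foobar
```

A lambda (`lambda: MyProtocol("foobar")`) works identically; partial just pickles and introspects better.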
Hm. That seems a pretty advanced use case. I think it is better handled by passing a "factory function" that returns a pre-created protocol: pr = MyProtocol(...) ev.create_transport(lambda: pr, host, port) However you do this, such a protocol object must expect multiple connection_made - connection_lost cycles, which sounds to me like asking for trouble. So maybe it's better to have a thin protocol class that is newly instantiated for each reconnection but given a pointer to a more permanent data structure that carries state between reconnections.
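The "thin protocol plus permanent state" idea can be sketched as follows; all names are illustrative, and a plain list stands in for the transport:

```python
class SessionState:
    """Long-lived object carrying data across reconnections."""
    def __init__(self):
        self.connect_count = 0
        self.outgoing = []          # writes queued while disconnected

class ThinProtocol:
    """Newly instantiated for each (re)connection."""
    def __init__(self, state):
        self.state = state
    def connection_made(self, transport):
        self.transport = transport
        self.state.connect_count += 1
        for data in self.state.outgoing:   # flush the queued writes
            transport.append(data)
        self.state.outgoing.clear()

state = SessionState()
state.outgoing.append(b"hello")
transport = []                       # stand-in for a real transport
ThinProtocol(state).connection_made(transport)
ThinProtocol(state).connection_made(transport)  # a reconnection
print(state.connect_count)  # 2
```

Each ThinProtocol sees exactly one connection_made/connection_lost cycle, so no protocol instance ever has to survive a reconnect.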
You mean UDP? Let's put that off until later. But yes, it probably needs more thought.
It is the transport's choice. Twisted has writeSequence(), which is just as ambiguous.
Glyph's idea was to always pass an exception and use special exception subclasses to distinguish the three cases (clean eof from other end, self.close(), self.abort(). I resisted this but maybe it's the only way?
Actually, I'm not sure it's useful to call connection_lost() when you closed the connection yourself: are there any use cases?
Well, close() first has to finish writing buffered data, so any cleanup needs to be done asynchronously after that is taken care of. AFAIK Twisted always calls it, and I think that's the best approach to ensure cleanup is always taken care of. -- --Guido van Rossum (python.org/~guido)

On Tue, 18 Dec 2012 10:02:05 -0800 Guido van Rossum <guido@python.org> wrote:
Yes.
Hmm, I don't think that's implied. Only call_soon_threadsafe() needs to be thread-safe. Calling other methods from another thread is simply a programming error. Since Tornado's and Twisted's global event loops already work like that, I don't think the surprise will be huge for users.
Ah, nice.
Well, in the I/O stack we do have base classes with useful method implementations too (IOBase and friends).
I'm proposing something different: the transport should be created before the socket is connected, and it should handle the connection itself (by calling sock_connect() on the loop, perhaps). Then:

- if connect() succeeds, protocol.connection_made() is called
- if connect() fails, protocol.connection_failed(exc) is called (not connection_lost())

I think it makes more sense for the transport to do the connecting: why should the I/O loop know about specific transports? Ideally, it should only know about socket objects or fds. I don't know if Twisted had a specific reason for having connectTCP() and friends on the reactor (other than they want the reactor to be the API entry point, perhaps). I'd be curious to hear about it.
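This proposal can be sketched with a fake connect step; the class names are illustrative and the socket layer is replaced by a plain callable:

```python
class EchoProtocol:
    def __init__(self):
        self.events = []
    def connection_made(self, transport):
        self.events.append(("made", transport))
    def connection_failed(self, exc):
        # Called instead of connection_lost() when connect fails.
        self.events.append(("failed", exc))

class Transport:
    """The transport owns the connect step and dispatches the outcome."""
    def __init__(self, protocol, connect):
        self.protocol = protocol
        try:
            connect()                  # stand-in for loop.sock_connect()
        except OSError as exc:
            protocol.connection_failed(exc)
        else:
            protocol.connection_made(self)

p1 = EchoProtocol()
Transport(p1, lambda: None)            # connect succeeds

p2 = EchoProtocol()
def bad_connect():
    raise OSError("connection refused")
Transport(p2, bad_connect)             # connect fails

print(p1.events[0][0], p2.events[0][0])  # made failed
```

The loop never sees the transport type; it only needs to offer sock_connect() on a bare socket.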
But then you have several API layers with different conventions: connection_made() / connection_lost() use well-defined protocol methods, while create_transport() returns you a Future on which you must register success / failure callbacks.
It's quite straightforward actually (*). Of course, only a protocol explicitly designed for use with a reconnecting client has to be well-behaved in that regard. (*) I'm using such a pattern at work, where I've stacked a protocol abstraction on top of Tornado.
Perhaps both self.close() and self.abort() should pass None. So "if error is None: return" is all you have to do to filter out the boring case. Regards Antoine.

On Tue, Dec 18, 2012 at 11:21 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
True. If we go that way they should be in the PEP as well.
That's a possible implementation technique. But it will still be created implicitly by create_transport() or start_serving().
That's what I had, but it just adds extra APIs to the abstract class. Returning a Future that can succeed (probably returning the protocol) or fail (with some exception) doesn't require adding new methods.
Actually, there's one reason why the loop should know (something) about transports: different loop implementations will want to use different transport implementations to meet the same requirements. E.g. an IOCP-based loop will use different transports than a UNIXy *poll-based loop.
That's the reason.
Different layers have different needs. Note that if you're using coroutines the Futures are very easy to use. And Twisted will just wrap the Future in a Deferred.
Yeah, but it still is an odd corner case. Anyway, I think I've shown you how to do it in several different ways while still having a protocol_factory argument.
They do.
So "if error is None: return" is all you have to do to filter out the boring case.
But a clean close from the other end (as opposed to an unexpected disconnect e.g. due to a sudden network partition) also passes None. I guess this is okay because in that case eof_received() is first called. So I guess the PEP is already okay here. :-) -- --Guido van Rossum (python.org/~guido)

On Tue, Dec 18, 2012 at 12:44 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
EOF is part of TCP (although I'm sure it has a different name at the protocol level). The sender can force it by using shutdown(SHUT_WR) (== write_eof() in Tulip/PEP 3156) or just by closing the socket (if they don't expect a response). The low-level reader detects this by recv() returning an empty string. Of course, if the other end closed both halves and you try to write before reading, send() may raise an exception and then you'll not get the EOF. And then again, send() may not raise an exception, it all depends on where stuff gets buffered. But arguably you get what you ask for in that case. I plan to call eof_received(), once, if and only if recv() returns an empty byte string. (The PEP says that eof_received() should call close() by default, but I don't actually think that's correct -- it also is hard to put in the abstract Protocol class unless a specific instance variable holding the transport is made part of the spec, which I am hesitant to do. I don't think that ignoring it by default is actually a problem.) -- --Guido van Rossum (python.org/~guido)
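The EOF rule described here (recv() returning an empty byte string, exactly once) can be demonstrated with a plain socket pair, stdlib only, no event loop involved:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"bye")
a.shutdown(socket.SHUT_WR)      # == write_eof() in PEP 3156 terms

chunks = []
while True:
    data = b.recv(1024)
    if not data:                # empty bytes: EOF; a transport would
        break                   # call eof_received() here, once
    chunks.append(data)

print(b"".join(chunks))  # b'bye'
a.close()
b.close()
```

Note that `a` can still read a response after SHUT_WR; only its write half is closed, which is what makes the half-close useful for request/response protocols.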

About protocols: I think the event loop should support UDP datagrams as well as operations on file descriptors which are not sockets at all. I mean timerfd_create and inotify as examples. On Wed, Dec 19, 2012 at 12:39 AM, Guido van Rossum <guido@python.org> wrote:
-- Thanks, Andrew Svetlov

On Tue, Dec 18, 2012 at 2:49 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
About protocols: I think eventloop should support UDP datagrams
Supporting UDP should be relatively straightforward, I just haven't used it in ages so I could use some help in describing the needed APIs. There are a lot of recv() variants: recv(), recvfrom(), recvmsg(), and then an _into() variant for each. And for sending there's send()/sendall(), sendmsg(), and sendto(). I'd be ecstatic if someone contributed code to tulip.
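A datagram-flavoured protocol API of the kind discussed earlier in the thread (data_received taking the remote address) might look like this; the event loop is elided and one recvfrom() call stands in for a single "ready" event. The DatagramProtocol class is illustrative, not the PEP's API:

```python
import socket

recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
recv_sock.settimeout(5)              # safety net; loopback rarely drops
addr = recv_sock.getsockname()

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"ping", addr)

class DatagramProtocol:
    def __init__(self):
        self.packets = []
    def data_received(self, data, remote_addr):
        # Unconnected protocols need the peer address per datagram.
        self.packets.append((data, remote_addr))

proto = DatagramProtocol()
data, remote = recv_sock.recvfrom(4096)  # the loop would do this when ready
proto.data_received(data, remote)

print(proto.packets[0][0])  # b'ping'
send_sock.close()
recv_sock.close()
```

Each sendto() is one datagram on the wire, which is exactly why writelines() is ambiguous for this transport type.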
as well as operations with file descriptors which are not sockets at all.
That won't work on Windows though. On UNIX you can always use the add/remove reader/writer APIs and make the calls yourself -- the patterns in sock_recv() and sock_sendall() are simple enough. (These are standardized in the PEP mainly because on Windows, with IOCP, the expectation is that they won't use "ready callbacks" (polling using select/*poll/kqueue) but instead Windows-specific APIs for starting I/O operations with a "completion callback".)
I mean timerfd_create and inotify as examples.
I think those will work -- they look very platform specific but in the end there's nothing in the add/remove reader/writer API that prevents you from using non-socket FDs on UNIX. (It's different on Windows, where select() is the only pollster supported, and Windows select only works with socket FDs.) -- --Guido van Rossum (python.org/~guido)
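On UNIX this works because poll-style readiness APIs accept any fd. As a minimal sketch, a pipe stands in for timerfd/inotify and a bare select() call stands in for the loop's add_reader() machinery (this is UNIX-only: Windows select() rejects non-socket fds, which is the limitation being discussed):

```python
import os
import select

r, w = os.pipe()                 # any non-socket fd works the same way
os.write(w, b"tick")

readers = {r: []}                # fd -> chunks received so far

# One iteration of a ready-callback loop, inlined.
ready, _, _ = select.select(list(readers), [], [], 1.0)
for fd in ready:
    readers[fd].append(os.read(fd, 64))

print(readers[r])  # [b'tick']
os.close(r)
os.close(w)
```

A real inotify or timerfd fd would slot into `readers` identically; only the interpretation of the bytes read differs.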

2012/12/18 Guido van Rossum <guido@python.org>
The basic idea is to have multiple threads/processes, each running its own IO loop. No locks are required because each IO poller instance will deal with its own socket-map / callbacks-queue and no resources are shared.

In asyncore this was achieved by introducing the "map" parameter. Similarly to Tornado, pyftpdlib uses an "ioloop" parameter which can be passed to all the classes which will handle the connection (the handlers). If "ioloop" is provided, all the handlers will use that (...and register() against it, add_reader() etc.), otherwise the "global" ioloop instance will be used (default).

A dynamic IO poller like this is important because, in case the connection handlers are forced to block for some reason, you can switch from one concurrency model (async / non-blocking) to another (multiple threads/processes) very easily. See:

http://code.google.com/p/pyftpdlib/issues/detail?id=212#c9
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/servers.py?spec=svn1137&r=1137

Hope this helps,

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/

On Wed, Dec 19, 2012 at 6:51 AM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
I understand that, and the Tulip implementation supports this. However different frameworks may have different policies (e.g. AFAIK Twisted only supports one reactor, period, and it is not threadsafe). I don't want to put requirements in the PEP that *require* compliant implementations to support the loop-per-thread model. OTOH I do want compliant implementations to decide on their own policy. I guess the minimal requirement for a compliant implementation is that callbacks associated with the same loop are serialized and never executed concurrently on different threads.
Read the description in the PEP of the event loop policy, or the default implementation in Tulip. It discourages user code from creating new event loops (since the framework may not support this) but does not prevent e.g. unit tests from creating a new loop for each test (even Twisted supports that).
Did you see run_in_executor() and wrap_future() in the PEP or in the Tulip implementation? They make it perfectly simple to run something in another thread (and the default implementation will use this to call getaddrinfo(), since the stdlib wrappers for it have no async version). The two APIs are even capable of using a ProcessPoolExecutor.
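run_in_executor() survived into today's asyncio under the same name, so its use can be shown directly; the blocking_lookup() helper below is a stand-in for a blocking stdlib call such as socket.getaddrinfo():

```python
import asyncio

def blocking_lookup(host):
    # Stand-in for a blocking call with no async equivalent; real code
    # would call socket.getaddrinfo(host, ...) here.
    return host.upper()

async def main():
    loop = asyncio.get_running_loop()
    # None selects the default thread-pool executor; passing a
    # concurrent.futures.ProcessPoolExecutor works the same way.
    return await loop.run_in_executor(None, blocking_lookup, "localhost")

result = asyncio.run(main())
print(result)  # LOCALHOST
```

The executor's concurrent.futures.Future is wrapped into an asyncio Future, which is exactly the bridge wrap_future() provides when you hold such a Future yourself.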
Of course, if all you want is a server that creates a new thread or process for each connection, PEP 3156 and Tulip are overkill -- in that case there's no reason not to use the stdlib's SocketServer class, which has supported this for over a decade. :-) -- --Guido van Rossum (python.org/~guido)

On Wed, 19 Dec 2012 08:55:02 -0800 Guido van Rossum <guido@python.org> wrote:
Why not let implementations raise NotImplementedError when they don't want to support certain use cases?
Is it the plan that code written for an event loop will always work with another one? Will tulip offer more than the GCD of the other event loops? Regards Antoine.

On Wed, Dec 19, 2012 at 10:55 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Why not let implementations raise NotImplementedError when they don't want to support certain use cases?
That's always a last resort, but the problem is that an app or library can't be sure that everything will work, and the failure might be subtle and late. That said, my remark about the loop needing to be wholly threadsafe was misguided. I think there are two reasonable policies with regard to threads that any reasonable implementation could follow:

1. There's only one loop, it runs in a dedicated thread, and other threads can only use call_soon_threadsafe().

2. There's (potentially) a loop per thread, and these are effectively independent. (TBD: How would these pass work or results between one another? Probably by calling call_soon_threadsafe() back and forth.)

The default implementation actually takes a halfway position: it supports (2), but you must manually call init_event_loop() in each thread except for the main thread, and you must call run() in each thread, including the main thread. The requirement to call init_event_loop() is there to prevent code running in some random thread from trying to schedule a callback which would never run because the thread isn't calling run(). When we get further along we may have a compliance test suite, separate from the unittests (I am working on unittests, but I'm aware they aren't at all thorough yet).
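Policy (1), a single loop with foreign threads restricted to call_soon_threadsafe(), can be sketched with today's asyncio standing in for the PEP's API:

```python
import asyncio
import threading

results = []

async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()

    def from_other_thread():
        # The only safe entry point from a foreign thread: hand work
        # to the loop's thread instead of touching loop state directly.
        loop.call_soon_threadsafe(results.append, "hello from thread")
        loop.call_soon_threadsafe(done.set)

    threading.Thread(target=from_other_thread).start()
    await done.wait()             # wakes when the foreign thread is done

asyncio.run(main())
print(results)  # ['hello from thread']
```

Even done.set() goes through call_soon_threadsafe() here, since asyncio.Event itself is not thread-safe; that is the serialization guarantee described above in action.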
Is it the plan that code written for an event loop will always work with another one?
The plan is to make it easy to write code that will work with all (or most) event loops, without making it impossible to write code that depends on a specific event loop implementation. This is Python's general attitude about platform-specific APIs.
Will tulip offer more than the GCD of the other event loops?
People writing PEP 3156 compliant implementations on top of some other event loop, whether it's Twisted or libuv, may have to emulate some functionality, and there will also be some functionality that their underlying loop supports that PEP 3156 doesn't. The goal is to offer a wide enough range of features that it's possible to write many useful types of apps without resorting to platform-specific APIs, and to make these fast enough. But if an app knows it will only be used with a certain loop implementation it is free to use extra APIs that only that loop offers. There's still a benefit in that situation: the app may be tied to a platform, but it may still want to use some 3rd party libraries that also require event loop integration, and by conforming to PEP 3156 the platform's loop implementation can ensure that such libraries actually work and interact with the rest of the app in a reasonable manner. (In particular, they should all use the same Future and Task classes.) -- --Guido van Rossum (python.org/~guido)