I am finally ready to show the code I worked on for the past two weeks. This is definitely not ready for anything except as a quick demo, but I learned enough while writing it to feel comfortable with the PEP 380 paradigm. I've set up a Hg repo on code.google.com, and I picked a codename: tulip. View the code here: http://code.google.com/p/tulip/source/browse/

It runs on Linux and OSX; I have no easy access to Windows but I'd be happy to take contributions.

Key files in the directory:

- main.py: the main program for testing, and a rough HTTP client
- sockets.py: transports for sockets and SSL, and a buffering layer
- scheduling.py: a Task class and related stuff; this is where the PEP 380 scheduler is implemented
- polling.py: an event loop and basic polling implementations for select(), poll(), epoll(), and kqueue()

Other junk: .hgignore, Makefile, README, p3time.py (benchmarks yield-from vs. plain function calls), longlines.py (stupid style checker)

More detailed discussion per file follows; please read the code along with my description (separately they may not make much sense):

polling.py: http://code.google.com/p/tulip/source/browse/polling.py

I found it remarkably easy to come up with polling implementations using all those different system calls. I ended up mixing the pollster class into the event loop class, although I'm not sure that's the best design -- perhaps it's better if the event loop just references the pollster as a separate object. The pollster has a very simple API: add_reader(fd, callback, *args), add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and poll(timeout) -> list of events. (fd means file descriptor.) There's also pollable(), which just checks whether any fds are registered. My implementation requires fd to be an int, but that could easily be extended to support other types of event sources.
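To make the pollster API concrete, here is a minimal select()-based sketch following the method names described above (add_reader, poll, pollable, etc.). The class body and the demo are my own illustration, not the actual polling.py code:

```python
import os
import select

class SelectPollster:
    """Sketch of a select()-based pollster with the API described above.
    The real polling.py differs in detail; this only shows the shape."""

    def __init__(self):
        self.readers = {}   # fd -> (callback, args)
        self.writers = {}

    def add_reader(self, fd, callback, *args):
        self.readers[fd] = (callback, args)

    def add_writer(self, fd, callback, *args):
        self.writers[fd] = (callback, args)

    def remove_reader(self, fd):
        del self.readers[fd]

    def remove_writer(self, fd):
        del self.writers[fd]

    def pollable(self):
        return bool(self.readers or self.writers)

    def poll(self, timeout=None):
        r, w, _ = select.select(self.readers, self.writers, [], timeout)
        # Return the ready events; note that invoking the callbacks is
        # the event loop's job, not poll()'s.
        return [self.readers[fd] for fd in r] + [self.writers[fd] for fd in w]

# Tiny demo: a pipe's read end becomes ready once something is written.
pollster = SelectPollster()
rfd, wfd = os.pipe()
hits = []
pollster.add_reader(rfd, hits.append, 'readable')
os.write(wfd, b'x')
for callback, args in pollster.poll(0):
    callback(*args)     # what the event loop would do with each event
os.close(rfd)
os.close(wfd)
```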
I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong). The event list started out as a tuple of (fd, flag, callback, args), where flag is 'r' or 'w' (easily extensible); in practice neither the fd nor the flag is used, and one of the last things I did was to wrap callback and args into a simple object that allows cancelling the callback; the add_*() methods return this object. (This could probably use a little more abstraction.) Note that poll() doesn't call the callbacks -- that's up to the event loop.

The event loop has two basic ways to register callbacks: call_soon(callback, *args) causes callback(*args) to be called the next time the event loop runs; call_later(delay, callback, *args) schedules a callback at some time (relative or absolute) in the future. It also inherits add_reader() and add_writer() from the pollster. Then there is run(), which runs the event loop until there's nothing left to do (no readers, no writers, no soon or later callbacks), and run_once(), which goes through the entire list of event sources once. (I think the order in which I do this isn't quite right but it works for now.)

Finally, there's a helper class (ThreadRunner) here which lets you run something in a separate thread using the features of concurrent.futures. It uses the "self-pipe trick" (Google it :-) to ensure that the poll() call wakes up -- this is needed by call_in_thread() at the next layer (scheduling.py). (There may be a race condition here, but I think it can be fixed.)

Note that there are no yields (or yield-froms) here; that's for the next layer:

scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py

This is the scheduler for PEP 380 style coroutines.
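For readers who don't want to Google it, the self-pipe trick reduces to a few lines. This sketch (my own, not ThreadRunner's code) shows the core idea: the event loop blocks in select() on the read end of a pipe, and a worker thread wakes it by writing a byte to the write end:

```python
import os
import select
import threading

# Self-pipe trick in miniature: poll()/select() blocks on the read end
# of a pipe; another thread wakes it by writing to the write end.
rfd, wfd = os.pipe()
results = []

def worker():
    results.append(21 * 2)      # stand-in for real concurrent.futures work
    os.write(wfd, b'\0')        # wake the event loop's blocking poll()

threading.Thread(target=worker).start()
select.select([rfd], [], [])    # the "event loop" blocks here until woken
os.read(rfd, 1)                 # drain the wakeup byte
os.close(rfd)
os.close(wfd)
```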
I started with a Scheduler class and operations along the lines of Greg Ewing's design, with a Scheduler instance as a global variable, but ended up ripping it out in favor of a Task object that represents a single stack of generators chained via yield-from. There is a Context object holding the event loop and the current task in thread-local storage, so that multiple threads can (and must) have independent event loops.

Most user (and much library) code in this system should be written as generators invoking other generators directly using yield from. However, to run something as an independent task, you wrap the generator call in a Task() constructor, possibly giving it a timeout, and then call its start() method. A Task also acts a little like a future -- you can wait() for it, add done-callbacks, and it preserves the return value of the generator call. This can be used to introduce concurrency or to give something a separate timeout. (There are also primitives to wait for the first N completed of a bunch of Tasks.)

To invoke a primitive I/O operation, you call the current task's block() method and then immediately yield (similar to Greg Ewing's approach). There are helpers block_r() and block_w() that arrange for a task to block until a file descriptor is ready for reading/writing. Examples of their use are in sockets.py. There is also call_in_thread(), which integrates with polling.ThreadRunner to run a function in a separate thread and wait for it; it is also used in sockets.py. In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.

sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py

This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports, but they are different from Twisted's transports; they are closer to idealized sockets.
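Before moving on to the individual transports, the "Task drives a stack of generators chained via yield-from" idea can be reduced to a tiny sketch. Only Task and start() mirror the names above; fetch() and handler() are invented for illustration, and a real Task would of course be resumed by the event loop instead of finishing on the first step:

```python
# Minimal sketch of the Task idea: one object driving a stack of
# generators chained via yield-from.  Illustrative, not scheduling.py.

class Task:
    def __init__(self, gen):
        self.gen = gen
        self.result = None
        self.done = False

    def start(self):
        # Run until the generator suspends (yields) or finishes.  A real
        # scheduler would resume it later, when the fd it blocked on is ready.
        try:
            next(self.gen)
        except StopIteration as exc:
            self.result = exc.value   # the generator's return value
            self.done = True

def fetch():
    # Innermost "I/O" operation; completes immediately in this sketch.
    return 'payload'
    yield  # unreachable; makes this a generator

def handler():
    data = yield from fetch()   # direct chaining -- no scheduler involved
    return data.upper()

t = Task(handler())
t.start()
```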
SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from. SslTransport wraps an ssl socket (luckily, in Python 2.6 and up, stdlib ssl sockets have good async support!). Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports. Finally there are some functions for connecting sockets, the highest-level one being create_transport(). These use call_in_thread() to run socket.getaddrinfo() in a thread (this provides IPv6 support).

I don't particularly care about the exact abstractions in this module; they are convenient, and I was surprised how easy it was to add SSL, but these mostly serve as somewhat realistic examples of how to use scheduling.py. (Afterthought: I think SocketTransport's recv() and send() methods could be made more similar to SslTransport's.)

More examples in the final file:

main.py: http://code.google.com/p/tulip/source/browse/main.py

There is a simplistic HTTP client here built on top of the sockets.*Transport abstractions. The main code exercises it by spawning four tasks fetching a variety of URLs (more when you uncomment a block of code) and waiting for their results. The code is a bit of a mess because I used it as a place to try out various APIs.

I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.

Sorry for the brain-dump style; I would like to write it all up better, but waiting longer doesn't necessarily make it better, so here it is, for all to see. (I also have a list of problems I had to debug during development and what I learned from them; but that's too raw to post right now.)

-- 
--Guido van Rossum (python.org/~guido)
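The shape of a BufferedReader-style readline() coroutine on top of a transport's recv() can be sketched as follows. The FakeTransport is invented here so the example is self-contained; the real sockets.py recv() would do non-blocking socket I/O plus block_r() instead:

```python
# Sketch of a readline() coroutine buffering on top of recv().
# FakeTransport is a stand-in; only the layering idea matches sockets.py.

class FakeTransport:
    def __init__(self, chunks):
        self.chunks = list(chunks)

    def recv(self, n):
        # COROUTINE: returns the next chunk (b'' at EOF).
        return self.chunks.pop(0) if self.chunks else b''
        yield  # unreachable; keeps the coroutine shape

class BufferedReader:
    def __init__(self, trans):
        self.trans = trans
        self.buffer = b''

    def readline(self):
        # COROUTINE: read until a newline (or EOF) and return one line.
        while b'\n' not in self.buffer:
            data = yield from self.trans.recv(8192)
            if not data:
                break
            self.buffer += data
        nl = self.buffer.find(b'\n') + 1
        if nl == 0:                      # EOF without a trailing newline
            nl = len(self.buffer)
        line, self.buffer = self.buffer[:nl], self.buffer[nl:]
        return line

def demo():
    reader = BufferedReader(
        FakeTransport([b'HTTP/1.0 20', b'0 OK\r\n', b'Host: x\r\n']))
    status = yield from reader.readline()
    header = yield from reader.readline()
    return status, header

# Drive the coroutine by hand; nothing ever blocks here, so the very
# first next() runs it to completion.
status = header = None
try:
    next(demo())
except StopIteration as exc:
    status, header = exc.value
```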
On 28/10/2012 11:52pm, Guido van Rossum wrote:
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
What happens if two tasks try to do a read op (or two tasks try to do a write op) on the same file descriptor? It looks like the second one to do scheduling.block_r(fd) will cause the first task to be forgotten, causing the first task to block forever. Shouldn't there be a list of pending readers and a list of pending writers for each fd? -- Richard
Richard Oudkerk wrote:
On 28/10/2012 11:52pm, Guido van Rossum wrote:
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
What happens if two tasks try to do a read op (or two tasks try to do a write op) on the same file descriptor? It looks like the second one to do scheduling.block_r(fd) will cause the first task to be forgotten, causing the first task to block forever.
I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time. We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready.

IMO, the important questions are:

- how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer?
- how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
- how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer?
- how easy/difficult/flexible/restrictive is it to write async operations as an end user?
- how straightforward is it to consume async operations?
- how easy is it to write async code that is correct?

Admittedly, I am writing this preemptively, knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) -- don't worry, it's only for trying out the API). Once we know what interface we'll be coding against, we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion.

Cheers, Steve
On Mon, Oct 29, 2012 at 7:00 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Richard Oudkerk wrote:
On 28/10/2012 11:52pm, Guido van Rossum wrote:
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
What happens if two tasks try to do a read op (or two tasks try to do a write op) on the same file descriptor? It looks like the second one to do scheduling.block_r(fd) will cause the first task to be forgotten, causing the first task to block forever.
I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time.
Kind of. I think if it was an important use case it might affect the shape of the API. However I can't think of a use case where it might make sense for two tasks to read or write the same file descriptor without some higher-level mediation. (Even at a higher level I find it hard to imagine, except for writing to a common log file -- but even there you want to be sure that individual lines aren't spliced into each other, and the semantics of send() don't prevent that.)
We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready.
IMO, the important questions are:
- how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer?
- how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
- how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer?
- how easy/difficult/flexible/restrictive is it to write async operations as an end user?
- how straightforward is it to consume async operations?
- how easy is it to write async code that is correct?
Yes, these are all important questions. I'm not sure that end users would be writing new schedulers -- but 3rd party library developers will be, and I suppose that's what you are referring to. My own approach to answering these is to first try to figure out what a typical application would be trying to accomplish. That's why I made a point of implementing a 100% async HTTP client -- it's just quirky enough that it exercises various issues (e.g. switching between line-mode and blob mode, and the need to invoke getaddrinfo()).
Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion.
I'm looking forward to it! I suspect we'll be merging our designs shortly... -- --Guido van Rossum (python.org/~guido)
On 29/10/2012 2:47pm, Guido van Rossum wrote:
Kind of. I think if it was an important use case it might affect the shape of the API. However I can't think of a use case where it might make sense for two tasks to read or write the same file descriptor without some higher-level mediation. (Even at a higher level I find it hard to imagine, except for writing to a common log file -- but even there you want to be sure that individual lines aren't spliced into each other, and the semantics of send() don't prevent that.)
It is a common pattern to have multiple threads/processes trying to accept connections on a single listening socket, so it would be unfortunate to disallow that. Writing (short messages) to a pipe also has atomic guarantees that can make having multiple writers perfectly reasonable. -- Richard
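The multiple-acceptors pattern Richard describes is easy to reproduce with plain OS threads; the kernel hands each incoming connection to exactly one of the blocked accept() calls. A small self-contained sketch (port and worker names are invented for illustration):

```python
import socket
import threading

# Two workers blocked in accept() on one listening socket; each incoming
# connection wakes exactly one of them.
listener = socket.socket()
listener.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
listener.listen(5)
port = listener.getsockname()[1]

served = []

def worker(n):
    conn, _ = listener.accept()      # both workers block here
    served.append(n)
    conn.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()

# Make two client connections; the kernel dispatches one per worker.
for _ in range(2):
    socket.create_connection(('127.0.0.1', port)).close()

for t in threads:
    t.join()
listener.close()
```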
On Monday 29 Oct 2012, Richard Oudkerk wrote:
Writing (short messages) to a pipe also has atomic guarantees that can make having multiple writers perfectly reasonable.
-- Richard
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux (because of the string operations available in the x86 instruction set), but I don't think anything other than an IPC message has a "you can write a string atomically" guarantee. And I may be misremembering that. And even if it's part of the SUS, how do we know this is true for non-UNIX compatible systems?
On 29/10/2012 4:09pm, Mark Hackett wrote:
Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux (because of the string operations available in the x86 instruction set), but I don't think anything other than an IPC message has a "you can write a string atomically" guarantee. And I may be misremembering that.
The guarantee I was talking about is for pipes on Unix: <quote> POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.) ... </quote> On Windows writes to pipes in message oriented mode are also atomic.
And even if it's part of the SUS, how do we know this is true for non-UNIX compatible systems?
We don't, but that isn't necessarily a reason to ban it as evil. -- Richard
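The PIPE_BUF limit quoted above is visible from Python on POSIX systems, both as a module-level constant and per pipe, for anyone who wants to check their own platform:

```python
import os
import select

# POSIX guarantees that write()s of up to PIPE_BUF bytes to a pipe are
# atomic.  Python exposes the constant (POSIX requires >= 512 bytes;
# on Linux it is 4096):
print(select.PIPE_BUF)

# The value can also be queried at runtime for a specific pipe:
rfd, wfd = os.pipe()
print(os.fpathconf(wfd, 'PC_PIPE_BUF'))
os.close(rfd)
os.close(wfd)
```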
On Monday 29 Oct 2012, Richard Oudkerk wrote:
On Windows writes to pipes in message oriented mode are also atomic.
And even if it's part of the SUS, how do we know this is true for non-UNIX compatible systems?
We don't, but that isn't necessarily a reason to ban it as evil.
Hey, good thing I didn't say to ban it then, eh? But if the OS cannot guarantee atomic writes (and enforce the size limit under which writes are atomic on the system it runs under), then you cannot just say "atomic writes mean we can safely have multiple threads accessing the pipe". Multiple access requires atomic access; if that cannot be guaranteed, then you cannot allow multiple access.
On Monday 29 Oct 2012, Richard Oudkerk wrote:
On Windows writes to pipes in message oriented mode are also atomic.
PS: this means, as I perhaps said, that you have to be using message-oriented IPC to get guaranteed atomic writes. If someone has a Python program with multiple threads accessing a pipe, but that pipe is NOT running in message-oriented mode, they will get corruption.
2012/10/29 Mark Hackett <mark.hackett@metoffice.gov.uk>
On Monday 29 Oct 2012, Richard Oudkerk wrote:
Writing (short messages) to a pipe also has atomic guarantees that can make having multiple writers perfectly reasonable.
-- Richard
Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux (because of the string operations available in the x86 instruction set), but I don't think anything other than an IPC message has a "you can write a string atomically" guarantee. And I may be misremembering that.
x86 and x64 string operations aren't atomic. Only a few selected instructions can be LOCK-prefixed (XCHG is the only one that doesn't require the prefix, since it's always locked) to ensure an atomic RMW (read-modify-write) memory operation. Regards, Cesare
Mark Hackett wrote:
Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux (because of the string operations available in the x86 instruction set), but I don't think anything other than an IPC message has a "you can write a string atomically" guarantee. And I may be misremembering that.
It seems to be a POSIX requirement: PIPE_BUF POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. (From http://dell9.ma.utexas.edu/cgi-bin/man-cgi?pipe+7) There's no corresponding guarantee for reading, though. The process on the other end can't be sure of getting the data from one write() call in a single read() call. In other words, the write does *not* establish a record boundary. -- Greg
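The "no record boundary" point is easy to demonstrate from Python: two separate write() calls to a pipe can come back joined in a single read(). A small sketch (behavior shown is the typical Unix pipe behavior):

```python
import os

# Two writes, one read: the writes are each atomic, but they leave no
# record boundary behind, so a single read() returns both.
rfd, wfd = os.pipe()
os.write(wfd, b'first ')
os.write(wfd, b'second')
data = os.read(rfd, 1024)
os.close(rfd)
os.close(wfd)
```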
On Mon, Oct 29, 2012 at 9:03 AM, Richard Oudkerk <shibturn@gmail.com> wrote:
On 29/10/2012 2:47pm, Guido van Rossum wrote:
Kind of. I think if it was an important use case it might affect the shape of the API. However I can't think of a use case where it might make sense for two tasks to read or write the same file descriptor without some higher-level mediation. (Even at a higher level I find it hard to imagine, except for writing to a common log file -- but even there you want to be sure that individual lines aren't spliced into each other, and the semantics of send() don't prevent that.)
It is a common pattern to have multiple threads/processes trying to accept connections on a single listening socket, so it would be unfortunate to disallow that.
Ah, but that will work -- each thread has its own pollster, event loop and scheduler and collection of tasks. And listening on a socket is a pretty special case anyway -- I imagine we'd build a special API just for that purpose.
Writing (short messages) to a pipe also has atomic guarantees that can make having multiple writers perfectly reasonable.
That's a good one. I'll keep that on the list of requirements. -- --Guido van Rossum (python.org/~guido)
-----Original Message-----
From: Guido van Rossum
Sent: 29 October 2012 16:35
To: Richard Oudkerk
Cc: python-ideas@python.org
Subject: Re: [Python-ideas] Async API: some code to review
It is a common pattern to have multiple threads/processes trying to accept connections on a single listening socket, so it would be unfortunate to disallow that.
Ah, but that will work -- each thread has its own pollster, event loop and scheduler and collection of tasks. And listening on a socket is a pretty special case anyway -- I imagine we'd build a special API just for that purpose.
I don't think he meant actual "threads" but rather threads in the context of coroutines. In StacklessIO (our custom sockets lib for Stackless), multiple tasklets can have an accept() pending on a socket, so that when multiple connections arrive, wakeup time is minimal. We have also been careful to allow multiple operations on sockets from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense. The important bit is that when such things happen, you get some defined result, rather than, for example, a tasklet being infinitely blocked. Such errors are surprising and hard to debug.

K
[Richard Oudkerk (?)]
It is a common pattern to have multiple threads/processes trying to accept connections on a single listening socket, so it would be unfortunate to disallow that.
[Guido]
Ah, but that will work -- each thread has its own pollster, event loop and scheduler and collection of tasks. And listening on a socket is a pretty special case anyway -- I imagine we'd build a special API just for that purpose.
On Tue, Oct 30, 2012 at 9:05 AM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
I don't think he meant actual "threads" but rather thread in the context of coroutines.
(Yes, we figured that out already. :-)
in StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal.
What kind of time savings are we talking about? I imagine that the accept() loop I put in tulip/echosvr.py is fast enough in terms of response time (latency) -- throughput would seem the more important measure (and I have no idea of this yet). http://code.google.com/p/tulip/source/browse/echosvr.py#37
We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense. The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked. Such errors are suprising and hard to debug.
That's a good point. It should either cause an immediate, clear exception, or interleave the data without compromising integrity of the scheduler or the app. -- --Guido van Rossum (python.org/~guido)
On 30/10/2012 4:40pm, Guido van Rossum wrote:
What kind of time savings are we talking about? I imagine that the accept() loop I put in tulip/echosvr.py is fast enough in terms of response time (latency) -- throughput would seem the more important measure (and I have no idea of this yet). http://code.google.com/p/tulip/source/browse/echosvr.py#37
With Windows overlapped I/O I think you can get substantially better throughput by starting many AcceptEx() calls in parallel. (For bonus points you can also recycle the accepted connections using DisconnectEx().) Even so, Windows socket code always seems to be much slower than the equivalent on Linux. -- Richard
On Tue, Oct 30, 2012 at 10:50 AM, Richard Oudkerk <shibturn@gmail.com> wrote:
On 30/10/2012 4:40pm, Guido van Rossum wrote:
What kind of time savings are we talking about? I imagine that the accept() loop I put in tulip/echosvr.py is fast enough in terms of response time (latency) -- throughput would seem the more important measure (and I have no idea of this yet). http://code.google.com/p/tulip/source/browse/echosvr.py#37
With Windows overlapped I/O I think you can get substantially better throughput by starting many AcceptEx() calls in parallel. (For bonus points you can also recycle the accepted connections using DisconnectEx().)
Hm... I already have on my list that the transports should probably be platform dependent. So this would suggest that the standard accept loop should be abstracted as a method on the transport object, right?
Even so, Windows socket code always seems to be much slower than the equivalent on Linux.
Is this Python sockets code or are you also talking about other languages, like C++? -- --Guido van Rossum (python.org/~guido)
-----Original Message-----
From: Guido van Rossum
Sent: 30 October 2012 16:40
To: Kristján Valur Jónsson
Cc: Richard Oudkerk; python-ideas@python.org
Subject: Re: [Python-ideas] Async API: some code to review
What kind of time savings are we talking about? I imagine that the accept() loop I put in tulip/echosvr.py is fast enough in terms of response time (latency) -- throughput would seem the more important measure (and I have no idea of this yet). http://code.google.com/p/tulip/source/browse/echosvr.py#37
To be honest, it isn't serious for applications that serve few connections, but for things like web servers it becomes important. Looking at your code:

a) accept() will always "block", causing the main thread (using the term loosely here) to go once through the event loop, possibly doing other housekeeping, even if a connection was already available. I don't think there is a way to selectively do completion-based IO, i.e. do immediate mode if possible; you either go for one or the other, on Windows at least. In select-based mechanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing. It might be best to stick to one system.

b) it will either switch to the new task immediately (possible in Stackless) or cause the start of t to wait until the next round of the event loop.

In this case, t will not start executing until after going around the loop twice, and a new connection can only be accepted once per loop. Imagine two HTTP requests coming in simultaneously, at t=0. The sequence of operations will then be this (assuming FIFO scheduling):

- main loop runs; accept 1 returns; task 1 created; accept 2 scheduled
- main loop runs, making task 1 and accept 2 runnable
- task 1 runs, does processing, performs send, and blocks
- accept 2 returns; task 2 created
- main loop runs, making task 2 runnable
- task 2 runs, does processing, performs send

Contributing to latency in this scenario are all the "main loop" runs. Note that I may misunderstand the way your architecture works; perhaps there is no main loop, perhaps everything is interleaved.
An alternative is something like this:

    def loop():
        while True:
            conn, addr = yield from listener.accept()
            handler(conn, addr)

    for i in range(n_handlers):
        t = scheduling.Task(loop())
        t.start()

Here, events will be different:

- main loop runs; accept 1 and accept 2 runnable
- accept 1 returns, starting handler, processing, and blocking on send
- accept 2 returns, starting handler, processing, and blocking on send

As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client. In my experience with RPC systems based on this kind of asynchronous Python IO, lowering the time between when user space is made aware of the request and when Python actually starts _processing_ it is critical to responsiveness.

Cheers
Ok, this is a good point: the more you can do without having to go through the main loop again, the better. I already took this to heart in my recent rewrites of recv() and send() -- they try to read/write the underlying socket first, and if it works, the task isn't suspended; only if they receive EAGAIN or something similar do they block the task and go back to the top. In fact, Listener.accept() does the same thing -- meaning the loop can go around many times without blocking a single time. (The listening socket is in non-blocking mode, so accept() will raise EAGAIN when there *isn't* another client connection ready immediately.)

This is also one of the advantages of yield-from; you *never* go back to the end of the ready queue just to invoke another layer of abstraction. (Steve tries to approximate this by running the generator immediately until the first yield, but the caller still ends up suspending to the scheduler, because they are using yield, which doesn't avoid the suspension, unlike yield-from.)

--Guido

On Wed, Oct 31, 2012 at 3:07 AM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
-- --Guido van Rossum (python.org/~guido)
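The "try first, only block on EAGAIN" pattern Guido describes can be sketched outside any scheduler: attempt the non-blocking recv() right away, and only fall back to blocking machinery when the socket isn't ready. The try_recv helper below is invented for illustration:

```python
import errno
import select
import socket

# Attempt the operation immediately; a real task would call
# block_r(fd) and yield only when it gets EAGAIN/EWOULDBLOCK.
a, b = socket.socketpair()
a.setblocking(False)

def try_recv(sock, n):
    try:
        return sock.recv(n)                 # fast path: no suspension
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None                     # real code: block the task here
        raise

first = try_recv(a, 100)                    # nothing sent yet -> would block
b.sendall(b'ping')
select.select([a], [], [], 5.0)             # wait until data is surely there
second = try_recv(a, 100)                   # ready now -> no suspension needed
a.close()
b.close()
```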
Guido van Rossum wrote:
This is also one of the advantages of yield-from; you *never* go back to the end of the ready queue just to invoke another layer of abstraction. (Steve tries to approximate this by running the generator immediately until the first yield, but the caller still ends up suspending to the scheduler, because they are using yield which doesn't avoid the suspension, unlike yield-from.)
This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)

The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour.)

Cheers, Steve
On Wed, Oct 31, 2012 at 8:51 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Guido van Rossum wrote:
This is also one of the advantages of yield-from; you *never* go back to the end of the ready queue just to invoke another layer of abstraction. (Steve tries to approximate this by running the generator immediately until the first yield, but the caller still ends up suspending to the scheduler, because they are using yield which doesn't avoid the suspension, unlike yield-from.)
This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)
I think you are missing the point. Even if you don't make a roundtrip through the queue, *each* yield statement, if it is executed at all, must transfer control to the scheduler. What you're proposing is just making the scheduler immediately resume the generator. So, if you have a trivial task, like this:

@async
def trivial(x):
    return x
    yield  # Unreachable, but makes it a generator

and a caller:

@async
def caller():
    foo = yield trivial(42)
    print(foo)

then the call to trivial(42) returns a Future that already has the result 42 set in it. But caller() still suspends to the scheduler, yielding that Future. The scheduler can resume caller() immediately but the damage (overhead) is done. In contrast, in the yield-from world, we'd write this:

def trivial(x):
    return x
    yield from ()  # Unreachable

def caller():
    foo = yield from trivial(42)
    print(foo)

where the latter expands roughly to the following, without reference to the scheduler at all:

def caller():
    _gen = trivial(42)
    try:
        while True:
            _val = next(_gen)
            yield _val
    except StopIteration as _exc:
        foo = _exc.value
    print(foo)

The first next(_gen) call raises StopIteration so the yield is never reached -- the scheduler doesn't know that any of this is going on. And there's no need to do anything special to advance the generator to the first yield manually either. (It's different of course when a generator is wrapped in a Task() constructor. But that should be relatively rare.)
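Guido's expansion can be checked directly: driving caller() by hand shows that trivial()'s StopIteration is absorbed inside caller() and nothing is ever yielded outward for a scheduler to see (a minimal self-contained sketch; the @async decorator is omitted since the yield-from version doesn't need it):

```python
def trivial(x):
    return x
    yield from ()  # unreachable; makes this a generator function

results = []

def caller():
    # 'yield from' absorbs trivial()'s StopIteration internally, so
    # nothing is yielded outward and no scheduler is ever involved.
    foo = yield from trivial(42)
    results.append(foo)

gen = caller()
try:
    next(gen)  # caller() runs to completion in this single step
except StopIteration:
    pass

assert results == [42]
```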
The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour).
Just get with the program and use yield-from exclusively. -- --Guido van Rossum (python.org/~guido)
On 2012-10-31, at 5:18 PM, Guido van Rossum <guido@python.org> wrote:
@async
def trivial(x):
    return x
    yield # Unreachable, but makes it a generator
FWIW, just a crazy comment: if we make @async decorator to clone the code object of a passed function and set its (co_flags | 0x0020), then any passed function becomes a generator, even if it doesn't have yields/yield-froms ;) - Yury
Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library. On Wed, Oct 31, 2012 at 11:31 PM, Yury Selivanov <yselivanov.ml@gmail.com>wrote:
On 2012-10-31, at 5:18 PM, Guido van Rossum <guido@python.org> wrote:
@async
def trivial(x):
    return x
    yield # Unreachable, but makes it a generator
FWIW, just a crazy comment: if we make @async decorator to clone the code object of a passed function and set its (co_flags | 0x0020), then any passed function becomes a generator, even if it doesn't have yields/yield-froms ;)
- Yury

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
-- Thanks, Andrew Svetlov
On 2012-10-31, at 5:34 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library.
I know that I sort of created an image for myself of "a guy who solves any problem by patching opcodes on live code", but don't worry, I'll never ever recommend such solutions for stdlib/python :) This is, however, a nice technique to rapidly prototype and test interesting ideas. - Yury
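For reference, 0x0020 is the CO_GENERATOR flag. Whether cloning a code object and flipping that flag still produces a working generator depends on the CPython version (newer interpreters also change the bytecode emitted for generator functions), but the flag itself is stable and is exactly what the stdlib inspects to distinguish generator functions, which is relevant to the type-checking questions elsewhere in this thread:

```python
import inspect
from inspect import CO_GENERATOR  # == 0x0020, the flag Yury mentions

def plain(x):
    return x

def gen(x):
    return x
    yield  # unreachable; makes this a generator function

# The compiler sets CO_GENERATOR on gen's code object but not plain's.
assert not plain.__code__.co_flags & CO_GENERATOR
assert gen.__code__.co_flags & CO_GENERATOR

# inspect.isgeneratorfunction() checks the same flag under the hood.
assert inspect.isgeneratorfunction(gen)
assert not inspect.isgeneratorfunction(plain)
```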
Guido van Rossum wrote:
Just get with the program and use yield-from exclusively.
I didn't realise there was a "program" here, just a discussion about an API design. I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute. When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance. Cheers, Steve
On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Guido van Rossum wrote:
Just get with the program and use yield-from exclusively.
I didn't realise there was a "program" here, just a discussion about an API design.
Sorry, I left off a smiley. :-)
I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute.
What about the usability argument? Don't you think users will be confused by the need to use yield from some times and just yield other times? Yes, they may be able to tell by looking up the definition and checking how it is decorated, but that doesn't really help.
When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance.
Understood. What exactly is it that makes Futures so ideal for your current needs? Is it integration with threads? Another tack: could you make use of tulip/polling.py? That doesn't use generators of any form; it is meant as an integration point with other styles of async programming (although I am not claiming that it is any good in its current form -- this too is just a strawman to shoot down). -- --Guido van Rossum (python.org/~guido)
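For readers without the repo handy, the pollster API Guido describes for polling.py (add_reader/add_writer/remove_*/poll, with poll() merely reporting events and the event loop invoking the callbacks) can be sketched over select(); this is an illustrative stand-in, not the actual tulip code:

```python
import select
import socket

class SelectPollster:
    """Sketch of the pollster API described for tulip's polling.py;
    fds must be ints, as in the post."""

    def __init__(self):
        self.readers = {}  # fd -> (callback, args)
        self.writers = {}

    def add_reader(self, fd, callback, *args):
        self.readers[fd] = (callback, args)

    def add_writer(self, fd, callback, *args):
        self.writers[fd] = (callback, args)

    def remove_reader(self, fd):
        self.readers.pop(fd, None)

    def remove_writer(self, fd):
        self.writers.pop(fd, None)

    def pollable(self):
        # True if any fds are registered at all.
        return bool(self.readers or self.writers)

    def poll(self, timeout=None):
        # poll() only returns the ready (callback, args) pairs;
        # calling them is the event loop's job.
        r, w, _ = select.select(self.readers, self.writers, [], timeout)
        return [self.readers[fd] for fd in r] + [self.writers[fd] for fd in w]

# Tiny demonstration with a socket pair.
a, b = socket.socketpair()
pollster = SelectPollster()
hits = []
pollster.add_reader(a.fileno(), hits.append, 'a is readable')
b.send(b'x')
for callback, args in pollster.poll(1.0):
    callback(*args)
assert hits == ['a is readable']
a.close(); b.close()
```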
Guido van Rossum wrote:
On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Guido van Rossum wrote:
Just get with the program and use yield-from exclusively.
I didn't realise there was a "program" here, just a discussion about an API design.
Sorry, I left off a smiley. :-)
Always a risk in email communication - no offence taken.
I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute.
What about the usability argument? Don't you think users will be confused by the need to use yield from some times and just yield other times? Yes, they may be able to tell by looking up the definition and checking how it is decorated, but that doesn't really help.
Users only ever _need_ to write yield. The only reason that wattle does not work with Python 3.2 is because of non-blank returns inside generators. There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.
When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance.
Understood. What exactly is it that makes Futures so ideal for your current needs? Is it integration with threads?
Another tack: could you make use of tulip/polling.py? That doesn't use generators of any form; it is meant as an integration point with other styles of async programming (although I am not claiming that it is any good in its current form -- this too is just a strawman to shoot down).
I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.

We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.

(* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)

The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support. For Python, we are aiming for closer to the async/await model (which is also how we chose the names).

Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.

There are three aspects of this that work better and result in cleaner code with wattle than with tulip:

- event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread. In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)

- the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).

- Future objects can be marshalled directly from Python into Windows, completing the interop story. Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type). Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs. At least with wattle, the user does not have to do anything different from any of their other @async functions.

Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.

Cheers,
Steve
On Wed, Oct 31, 2012 at 5:36 PM, Steve Dower <Steve.Dower@microsoft.com>wrote:
Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.
Oh, what sad times are these when passing ruffians can say 'The Microsoft Way' at will to old developers. There is a pestilence upon this land! Nothing is sacred. Even those who arrange and design async APIs are under considerable hegemonic stress at this point in time. /me crawls back under his rock.
On Wed, Oct 31, 2012 at 3:36 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Guido van Rossum wrote: There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.
Actually, it is not just optimization. The logic of the scheduler also becomes much simpler.
I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.
Actually I wish you'd written this sooner. I don't know about you, but my brain has a hard time understanding abstractions that are presented without concrete use cases and implementations alongside; OTOH I delight in taking a concrete mess and extracting abstractions from it. (The Twisted guys are also masters at this.) So far I didn't really "get" the reasons you brought up for some of the complications you introduced (like multiple Future implementations). Now I think I'm glimpsing your reasons.
We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.
Interesting. The lack of synchronous wrappers does seem a step back, but is probably useful as a forcing function given the desire to keep the UI responsive at all times.
(* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)
The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support.
Erik Meijer introduced me to async/await on Elba two months ago. I was very excited to recognize exactly what I'd done for NDB with @tasklet and yield, supported by the type checking.
For Python, we are aiming for closer to the async/await model (which is also how we chose the names).
If we weren't so reluctant to introduce new keywords in Python we might introduce await as an alias for yield from in the future.
Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.
Very interesting. I'd love to see a much longer narrative on this. (You can send it to me directly if you feel it would distract the list or if you feel it's inappropriate to share widely. I'll keep it under my hat as long as you say so.)
There are three aspects of this that work better and result in cleaner code with wattle than with tulip:
- event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread.
I think this is "fire-and-forget"? I.e. you initiate an action and then just let it run until completion without ever checking the result? In tulip you currently do that by wrapping it in a Task and calling its start() method. (BTW I think I'm going to get rid of start() -- creating a Task should just start it.)
In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)
Are you saying that this property (you don't wait for the result) is required by the operation rather than an option for the user? I'm only familiar with the latter -- e.g. I can imagine firing off an operation that writes a log entry somewhere but not caring about whether it succeeded -- but I would still make it *possible* to check on the operation if the caller cares (what if it's a very important log message?). If there's no option for the caller, the API should present itself as a regular function/method and the task-spawning part should be hidden inside it -- I see no need for the caller to know about this. What exactly do you mean by "reliably intercept this case" ? A concrete example would help.
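The distinction Guido draws can be sketched with a stdlib executor standing in for tulip's Task machinery (write_log is hypothetical): the task-spawning is hidden inside a regular function, yet the future is still handed back for the rare caller that cares:

```python
import concurrent.futures

# ThreadPoolExecutor stands in for the framework's task-spawning here.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def write_log(message):
    """Looks like a regular function to callers; the async part is
    hidden, but the future is returned for anyone who wants to check."""
    return _pool.submit(len, message)  # len() stands in for real I/O

f = write_log("fire and forget")       # most callers simply ignore f
# ...but an interested caller (a very important log message) can still
# wait for, or inspect, the result:
assert f.result(timeout=5) == len("fire and forget")
```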
- the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).
Ok, so what is the API offered by the OS event loop? I really want to make sure that tulip can interface with strange event loops, and this may be the most concrete example so far -- and it may be an important one.
- Future objects can be marshalled directly from Python into Windows, completing the interop story.
What do you mean by marshalled here? Surely not the stdlib marshal module. Do you just mean that Future objects can be recognized by the foreign-function interface and wrapped by / copied into native Windows 8 datatypes? I understand your event loop understands Futures? All of them? Or only the ones of the specific type that it also returns?
Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type).
I can't quite follow you here, probably due to lack of imagination on my part. Can you help me with a (somewhat) concrete example?
Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs.
Concrete example?
At least with wattle, the user does not have to do anything different from any of their other @async functions.
This is because you can put type checks inside @async, which sees the function object before it's called, rather than the scheduler, which only sees what it returned, right? That's a trick I use in NDB as well and I think tulip will end up requiring a decorator too -- but it will just "mark" the function rather than wrap it in another one, unless the function is not a generator (in which case it will probably have to wrap it in something that is a generator). I could imagine a debug version of the decorator that added wrappers in all cases though.
Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.
No worries about that. I agree that we need concrete examples that take us beyond the world of sockets; it's just that sockets are where most of the interest lies (Tornado is a webserver, Twisted is often admired because of its implementations of many internet protocols, people benchmark async frameworks on how many HTTP requests per second they can serve) and I haven't worked with any type of GUI framework in a very long time. (Kudos for trying your way with Tk!) -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
On Wed, Oct 31, 2012 at 3:36 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Guido van Rossum wrote: There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.
Actually, it is not just optimization. The logic of the scheduler also becomes much simpler.
I'd argue that it doesn't; it's just that the implementation of 'yield from' in the interpreter happens to match the most common case. In any case, the affected area of code (which I haven't been calling 'scheduler', which seems to have caused some confusion elsewhere) only has to be written once and never touched again. It could even be migrated into C, which should significantly improve the performance. (In wattle, this is the _Awaiter class.)
I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.
Actually I wish you'd written this sooner. I don't know about you, but my brain has a hard time understanding abstractions that are presented without concrete use cases and implementations alongside; OTOH I delight in taking a concrete mess and extract abstractions from it. (The Twisted guys are also masters at this.)
So far I didn't really "get" the reasons you brought up for some of complications you introduced (like multiple Future implementations). Now I think I'm glimpsing your reasons.
Part of the art of conversation is figuring out how the other participants need to hear something. My apologies for not figuring this out sooner :)
We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.
Interesting. The lack of synchronous wrappers does seem a step back, but is probably useful as a forcing function given the desire to keep the UI responsive at all times.
Indeed. Based on the Win 8 apps I regularly use, it's worked well. On the other hand, updating CPython to avoid the synchronous ones (which I've done, and will be submitting for consideration soon, once I've been able to test on an ARM device) is less fun.
(* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)
The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support.
Erik Meijer introduced me to async/await on Elba two months ago. I was very excited to recognize exactly what I'd done for NDB with @tasklet and yield, supported by the type checking.
For Python, we are aiming for closer to the async/await model (which is also how we chose the names).
If we weren't so reluctant to introduce new keywords in Python we might introduce await as an alias for yield from in the future.
We discussed that internally and decided that it was unnecessary, or at least that it should be a proper keyword rather than an alias (as in, you can't use 'await' to delegate to a subgenerator). I'd rather see codef added first, since that (could) remove the need for the decorators.
Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.
Very interesting. I'd love to see a much longer narrative on this. (You can send it to me directly if you feel it would distract the list or if you feel it's inappropriate to share widely. I'll keep it under my hat as long as you say so.)
If I get a chance to write something up then I will do that. I'll quite happily post it publicly, though it may go on my blog rather than here - this email is going to be long enough already. There is very little already written up since we discussed most of it at a whiteboard, though I do still have some early code iterations.
There are three aspects of this that work better and result in cleaner code with wattle than with tulip:
- event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread.
I think this is "fire-and-forget"? I.e. you initiate an action and then just let it run until completion without ever checking the result? In tulip you currently do that by wrapping it in a Task and calling its start() method. (BTW I think I'm going to get rid of start() -- creating a Task should just start it.)
Yes, exactly. The only thing I dislike about tulip's current approach is that it requires two functions. If/when we support it, we'd provide a decorator that does the wrapping.
In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)
Are you saying that this property (you don't wait for the result) is required by the operation rather than an option for the user? I'm only familiar with the latter -- e.g. I can imagine firing off an operation that writes a log entry somewhere but not caring about whether it succeeded -- but I would still make it *possible* to check on the operation if the caller cares (what if it's a very important log message?).
If there's no option for the caller, the API should present itself as a regular function/method and the task-spawning part should be hidden inside it -- I see no need for the caller to know about this.
What exactly do you mean by "reliably intercept this case" ? A concrete example would help.
You're exactly right, there is no need for the original caller (for example, Windows itself) to know about the task. However, every incoming call initially comes through a COM interface that we provide (written in C) that will then invoke the Python function. This is our opportunity to intercept by looking at the returned value from the Python function before returning to the original caller. Under wattle, we can type check here for a Future (or compatible interface), which is only ever used for async functions. On the other hand, we cannot reliably type-check for a generator to determine whether it is supposed to be async or supposed to be an iterator. If the interface we implement expects an iterator then we can assume that we should treat the generator like that. However, if the user intended their code to be async and used 'yield from' with no decorator, we cannot provide any useful feedback: they will simply return a sequence of null pointers that is executed as quickly as the caller wants to - there is no scheduler involved in this case.
- the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).
Ok, so what is the API offered by the OS event loop? I really want to make sure that tulip can interface with strange event loops, and this may be the most concrete example so far -- and it may be an important one.
There are three main APIs involved:

* Windows.UI.Core.CoreDispatcher.run_async() (and run_idle_async(), which uses a low priority)
* Windows.System.Threading.ThreadPool.run_async()
* any API that returns a future (==an object implementing IAsyncInfo)

Strictly, the third category covers the first two, since they both return a future, but they are also the APIs that allow the user/developer to schedule work on or off the UI thread (respectively). For wattle, they equate directly to Scheduler.submit, Scheduler.thread_pool.submit (which wasn't in the code, but was suggested in the write-up) and Future.
- Future objects can be marshalled directly from Python into Windows, completing the interop story.
What do you mean by marshalled here? Surely not the stdlib marshal module.
No.
Do you just mean that Future objects can be recognized by the foreign-function interface and wrapped by / copied into native Windows 8 datatypes?
Yes, this is exactly what we would do. The FFI creates a WinRT object that forwards calls between Python and Windows as necessary. (This is a general mechanism that we use for many types, so it doesn't matter how the Future is created. On a related note, returning a Future from Python code into Windows will not be a common occurrence - it is far more common for Python to consume Futures that are passed in.)
I understand your event loop understands Futures? All of them? Or only the ones of the specific type that it also returns?
It's based on an interface, so as long as we can provide (equivalents of) add_done_callback() and result() then the FFI will do the rest.
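That duck-typed interface is tiny, and concurrent.futures.Future already satisfies it; a quick check that the two required methods behave the way the interop needs (callback fires with the result once it is set):

```python
import concurrent.futures

# The interop Steve describes needs only add_done_callback() and
# result(); the stdlib Future provides both.
f = concurrent.futures.Future()
seen = []
f.add_done_callback(lambda fut: seen.append(fut.result()))

f.set_result("done")        # completing the future fires the callback
assert seen == ["done"]
assert f.result() == "done"
```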
Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type).
I can't quite follow you here, probably due to lack of imagination on my part. Can you help me with a (somewhat) concrete example?
Given a (Windows) prototype:

IIterable<String> GetItems();

We want to allow the Python function to be implemented as:

def get_items():
    for data in ['a', 'b', 'c']:
        yield data

This is a pretty straightforward mapping: Python returns a generator, which supports the same interface as IIterable, so we can marshal the object out and convert each element to a string. The problem is when a (possibly too keen) user writes the following code:

def get_items():
    data = yield from get_data_async()
    return data

Now the returned generator is full of None, which we will happily convert to a sequence of empty strings (==null pointers in Win8). With wattle, the yielded objects would be Futures, which would still be converted to strings, but at least are obviously incorrect. Also, since the user should be in the habit of adding @async already, we can raise an error even earlier when the return value is a future and not a generator. Unfortunately, nothing can fix this code (except maybe a new keyword):

def get_items():
    data = yield from get_data_async()
    for item in data:
        yield item
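The "generator full of None" failure is easy to reproduce with toy stand-ins: without a scheduler, the yield-from version hands its internal suspension points straight to whoever iterates it (get_data_async here is hypothetical, with bare yields standing in for 'blocked on I/O'):

```python
def get_data_async():
    # Toy async function: each bare yield represents a suspension
    # point that a scheduler would normally consume.
    yield
    yield
    return ['a', 'b', 'c']

def get_items():
    # The misuse Steve describes: yield from with no decorator and
    # no scheduler driving it.
    data = yield from get_data_async()
    return data

# Windows-side code expecting an iterable of strings instead sees the
# raw suspension markers; the real result is lost in StopIteration.
assert list(get_items()) == [None, None]
```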
Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs.
Concrete example?
Given above. By banning both cases we would always raise TypeError when a generator is returned, even if an iterable or an async operation is expected, because we can't be sure which one we have.
At least with wattle, the user does not have to do anything different from any of their other @async functions.
This is because you can put type checks inside @async, which sees the function object before it's called, rather than the scheduler, which only sees what it returned, right? That's a trick I use in NDB as well and I think tulip will end up requiring a decorator too -- but it will just "mark" the function rather than wrap it in another one, unless the function is not a generator (in which case it will probably have to wrap it in something that is a generator). I could imagine a debug version of the decorator that added wrappers in all cases though.
It's not so much the type checks inside @async - those are basically to support non-generator functions being wrapped (though there is little benefit to this apart from maintaining a consistent interface). The benefit is that the _returned object_ is always going to be some sort of Future. Because of the way that our FFI will work, a simple marker on the function would be sufficient for our interop purposes. However, I don't think it is a general enough solution (for example, if the caller is already in Python then they may not get to see the function before it is called - Twisted might be affected by this, though I'm not sure). What might work best is allowing the replacement scheduler/pollster to provide or override the decorator somehow, though I don't see any convenient way to do this.
Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.
No worries about that. I agree that we need concrete examples that take us beyond the world of sockets; it's just that sockets are where most of the interest lies (Tornado is a webserver, Twisted is often admired because of its implementations of many internet protocols, and people benchmark async frameworks on how many HTTP requests per second they can serve) and I haven't worked with any type of GUI framework in a very long time. (Kudos for trying your way with Tk!)
I don't blame you for avoiding GUI frameworks... there are very few that work well. Hopefully when we fully support XAML-based GUIs that will change somewhat, at least for Windows developers. Also, I didn't include the Tk scheduler in BitBucket, but just to illustrate the simplicity of wrapping an existing loop I've posted the full code below (it still has some old names in it):

    import contexts

    class TkContext(contexts.CallableContext):
        def __init__(self, app):
            self.app = app

        @staticmethod
        def invoke(callable, args, kwargs):
            callable(*args, **kwargs)

        def submit(self, callable, *args, **kwargs):
            '''Adds a callable to invoke within this context.'''
            self.app.after(0, TkContext.invoke, callable, args, kwargs)

Cheers, Steve
On 11/1/2012 12:44 PM, Steve Dower wrote:
C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778
Thanks for the link. It makes much of this discussion more concrete for me. As a potential user, the easy async = @async, await = yield from transformation (additions) is what I would like for Python. I do realize that the particular task was picked to be easy and that other things might be harder (on Windows), and that Python has the additional problem of working on multiple platforms. But I think 'make easy things easy and difficult things possible' applies here. I have no problem with 'yield from' instead of 'await' = 'wait for'. Actually, the caller of the movie list fetcher did *not* wait for the entire list to be fetched, even asynchronously. Rather, it displayed items as they were available (yielded). So the app does less waiting, and 'yield as available' is what 'await' does in that example. Actually, I do not see how just adding 4 keywords would necessarily have the effect it did. I imagine there is a bit more to the story than was shown, like the 'original' code being carefully written so that the change would have the effect it did. The video is, after all, an advertorial. Nonetheless, it was impressive. -- Terry Jan Reedy
Terry Reedy wrote:
On 11/1/2012 12:44 PM, Steve Dower wrote:
C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778
[SNIP]
Actually, I do not see how just adding 4 keywords would necessarily have the effect it did. I imagine there is a bit more to the story than was shown, like the 'original' code being carefully written so that the change would have the effect it did. The video is, after all, an advertorial. Nonetheless, it was impressive.
It is certainly a dramatic demo, and you are right to be skeptical. The "carefully written" part is that the code already used paging as part of its query - the "give me movies from 1950" request is actually a series of "give me 10 movies from 1950 starting from {0, 10, 20, 30, ...}" requests (this is why you see the progress counter go up by 10 each time) - and it's already updating the UI between each page. The "4 keywords" also activate a significant amount of compiler machinery that actually rewrites the original code, much like the conversion to a generator, so there is quite a bit of magic. There are plenty of videos at http://channel9.msdn.com/search?term=async+await that go much deeper into how it all works, including the 3rd-party extensibility mechanisms. (And apologies about the video only being available with Silverlight - I didn't realise this when I originally posted it. The videos at the later link are much more readily available, but also very deeply technical and quite long.) Cheers, Steve
Guido van Rossum wrote:
If we weren't so reluctant to introduce new keywords in Python we might introduce await as an alias for yield from in the future.
Or 'cocall'. :-)
I think tulip will end up requiring a decorator too -- but it will just "mark" the function rather than wrap it in another one, unless the function is not a generator (in which case it will probably have to wrap it in something that is a generator).
I don't see how that helps much, because the scheduler doesn't see generators used in yield-from calls. There is *no* way to catch the mistake of writing foo() when you should have written yield from foo() instead. This is one way that codef/cocall (or some variant on it) would help, by clearly diagnosing that mistake. -- Greg
-----Original Message----- From: gvanrossum@gmail.com [mailto:gvanrossum@gmail.com] On Behalf Of Guido van Rossum Sent: 31. október 2012 15:37 To: Kristján Valur Jónsson Cc: Richard Oudkerk; python-ideas@python.org Subject: Re: [Python-ideas] Async API: some code to review
Ok, this is a good point: the more you can do without having to go through the main loop again the better.
I already took this to heart in my recent rewrites of recv() and send() -- they try to read/write the underlying socket first, and if it works, the task isn't suspended; only if they receive EAGAIN or something similar do they block the task and go back to the top.
Yes, this is possible for non-blocking style IO. However, for IO architectures that are based on completions, you can't always mix and match. On Windows, for example, it is complicated to do because of how AcceptEx works; I recall socket properties, the overlapped property and other things interfering. I also recall testing the approach of first trying non-blocking IO (for accept and send/recv) and then resorting to an IOCP call. If I recall correctly, the added overhead of trying a non-blocking call in the usual case of it failing was detrimental to the whole exercise: the non-blocking IO calls took non-trivial time to complete. The approach of having multiple "threads" doing accept also avoids the delay required to dispatch the request from the accepting thread to the worker thread.
In fact, Listener.accept() does the same thing -- meaning the loop can go right back to accepting without a trip through the scheduler. This is also one of the advantages of yield-from; you *never* go back to the end of the ready queue just to invoke another layer of abstraction.
My experience with this stuff is of course based on stackless/gevent style programming, so some of it may not apply :) Personally, I feel that things should just magically work, from the programmer's point of view, rather than have to manually leave a trace of breadcrumbs through the stack using "yield" constructs. But that's just me. K
Kristján Valur Jónsson wrote:
in StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal.
With sufficiently cheap tasks, there's another way to approach this: one task is dedicated to accepting connections from the socket, and it spawns a new task to handle each connection. -- Greg
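A minimal sketch of Greg's pattern; listener.accept() is assumed to be a coroutine returning (conn, addr), and spawn() stands in for whatever "start a new task" primitive the scheduler exposes:

```python
def acceptor(listener, handler, spawn):
    # COROUTINE sketch: one task sits in accept(); every connection gets
    # its own handler task, so a slow client never stalls the accepter.
    while True:
        conn, addr = yield from listener.accept()
        spawn(handler(conn, addr))
```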
Steve Dower wrote:
- how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
I don't think that writing new schedulers is something an end user will do very often. Or more precisely, it's not something they should *have* to do except in extremely unusual circumstances. I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory for the vast majority of applications. -- Greg
Greg Ewing wrote:
Steve Dower wrote:
- how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
I don't think that writing new schedulers is something an end user will do very often. Or more precisely, it's not something they should *have* to do except in extremely unusual circumstances.
I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory for the vast majority of applications.
I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation. Cheers, Steve
On Mon, Oct 29, 2012 at 4:26 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Greg Ewing wrote:
Steve Dower wrote:
- how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
I don't think that writing new schedulers is something an end user will do very often. Or more precisely, it's not something they should *have* to do except in extremely unusual circumstances.
I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory for the vast majority of applications.
I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation.
BTW, would it be useful to separate this into pollster, eventloop, and scheduler? At least in my world these are different; of these three, only the pollster contains platform-specific code (and then again the transports do too -- this is a nice match IMO). -- --Guido van Rossum (python.org/~guido)
Steve Dower wrote:
I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory for the vast majority of applications.
I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user".
I don't think that even library developers should need to write their own scheduler very often.
And since I expect every GUI framework is going to need (or at least want) their own scheduler,
I don't agree with that. They might need their own event loop, but I haven't seen any reason so far to think they would need their own coroutine scheduler. Remember that Guido wants to keep the event loop stuff and the scheduler stuff very clearly separated. The scheduler will all be pure Python and should be usable with just about any event loop. -- Greg
Hi Richard, On Mon, Oct 29, 2012 at 3:13 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
On 28/10/2012 11:52pm, Guido van Rossum wrote:
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
What happens if two tasks try to do a read op (or two tasks try to do a write op) on the same file descriptor? It looks like the second one to do scheduling.block_r(fd) will cause the first task to be forgotten, causing the first task to block forever.
Shouldn't there be a list of pending readers and a list of pending writers for each fd?
There is another approach to handle this. You create a dedicated coroutine which does the writing (or reading). And if another coroutine needs to write, it puts data into a queue (or channel) and waits until the writer coroutine picks it up. This way you don't care about atomicity of writes, and a lot of other things. This approach is similar to what Greg Ewing proposed for handling accept() recently. -- Paul
There is another approach to handle this. You create a dedicated coroutine which does writing (or reading). And if other coroutine needs to write, it puts data into a queue (or channel), and wait until writer coroutine picks it up. This way you don't care about atomicity of writes, and a lot of other things.
I support this idea; IMHO it's by far the easiest (or least problematic) way to handle the complexity of concurrency. What's the general position on monkey patching existing libs? This might not be possible with the above? /rene
This approach is similar to what Greg Ewing proposed for handling accept() recently.
-- Paul _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Hello Guido, Le Sun, 28 Oct 2012 16:52:02 -0700, Guido van Rossum <guido@python.org> a écrit :
The event list started out as a tuple of (fd, flag, callback, args), where flag is 'r' or 'w' (easily extensible); in practice neither the fd nor the flag are used, and one of the last things I did was to wrap callback and args into a simple object that allows cancelling the callback; the add_*() methods return this object. (This could probably use a little more abstraction.) Note that poll() doesn't call the callbacks -- that's up to the event loop.
I don't understand why the pollster takes callback objects if it never calls them. Also the fact that it wraps them into DelayedCalls is more mysterious to me. DelayedCalls represent one-time cancellable callbacks with a given deadline, not callbacks which are called any number of times on I/O events and that you can't cancel.
scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py
This is the scheduler for PEP-380 style coroutines. I started with a Scheduler class and operations along the lines of Greg Ewing's design, with a Scheduler instance as a global variable, but ended up ripping it out in favor of a Task object that represents a single stack of generators chained via yield-from. There is a Context object holding the event loop and the current task in thread-local storage, so that multiple threads can (and must) have independent event loops.
YMMV, but I tend to be wary of implicit thread-local storage. What if someone runs a function or method depending on that thread-local storage from inside a thread pool? Weird bugs ensue. I think explicit context is much less error-prone. Even a single global instance (like Twisted's reactor) would be better :-) As for the rest of the scheduling module, I can't say much since I have a hard time reading and understanding it.
To invoke a primitive I/O operation, you call the current task's block() method and then immediately yield (similar to Greg Ewing's approach). There are helpers block_r() and block_w() that arrange for a task to block until a file descriptor is ready for reading/writing. Examples of their use are in sockets.py.
That's weird and kind of ugly IMHO. Why would you write:

    scheduling.block_w(self.sock.fileno())
    yield

instead of say:

    yield scheduling.block_w(self.sock.fileno())

? Also, the fact that each call to SocketTransport.{recv,send} explicitly registers then removes the fd on the event loop looks wasteful. By the way, even when a fd is signalled ready, you must still be prepared for recv() to return EAGAIN (see http://bugs.python.org/issue9090).
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.
Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me. I'd much rather either have all functions use "yield", or have all functions use "yield from". (also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-))
sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports, but they are different from Twisted's transports; they are closer to idealized sockets. SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from. SslTransport wraps an ssl socket (luckily, in Python 2.6 and up, stdlib ssl sockets have good async support!).
SslTransport.{recv,send} need the same kind of logic as do_handshake(): catch both SSLWantReadError and SSLWantWriteError, and call block_r / block_w accordingly.
Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports.
Well... It would be nice if BufferedReader could re-use the actual io.BufferedReader and its fast readline(), read(), readinto() implementations. Regards Antoine.
On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
To invoke a primitive I/O operation, you call the current task's block() method and then immediately yield (similar to Greg Ewing's approach). There are helpers block_r() and block_w() that arrange for a task to block until a file descriptor is ready for reading/writing. Examples of their use are in sockets.py.
That's weird and kind of ugly IMHO. Why would you write:

    scheduling.block_w(self.sock.fileno())
    yield
instead of say:
yield scheduling.block_w(self.sock.fileno())
?
I, personally, like and use the second approach. But I believe the main incentive for Guido & Greg to use 'yields' like that is to make one thing *very* clear: always use 'yield from' to call something. 'yield' statement is just an explicit context switch point, and it should be used only for that purpose and only when you write a low-level APIs. - Yury
On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.
Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me.
I'd much rather either have all functions use "yield", or have all functions use "yield from".
(also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-))
That's what bothers me as well. 'yield from' looks too long for the simple thing it does (1); users will be confused whether they should use 'yield' or 'yield from' (2); there is no visible difference between a plain generator and a coroutine (3). Personally, I like Greg's PEP 3152 (aside from the 'cocall' keyword). With that approach it's easy to distinguish coroutines, generators and plain functions. And it'd be easier to add some special methods/properties to codefs, like an 'in_finally()' method etc. - Yury
On Oct 29, 2012, at 5:59 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.
Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me.
I'd much rather either have all functions use "yield", or have all functions use "yield from".
(also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-))
That's what bothers me as well. 'yield from' looks too long for the simple thing it does (1); users will be confused whether they should use 'yield' or 'yield from' (2); there is no visible difference between a plain generator and a coroutine (3).
I agree, was this ever commented on? I know it may be late in the discussion, but just because you can use yield/yield from for concurrent stuff, should you? It looks very implicit to me (breaking the second rule). Has the delegate/event model of C# been discussed? As always I recommend moving the concurrent stuff to the object level; it would be so much easier to state that a message for an object is just that: an async message sent from one object to another... :-) A simple decorator like @task would be enough:

    @task  # explicitly run instance in own thread/coroutine
    class SomeTask(object):
        def async_add(self, x, y):
            return x + y  # returns a Future() with result

    task = SomeTask()
    n = task.async_add(2, 2)
    # Do other stuff while waiting for answer
    print("result is %d" % n)  # Future will wait/hang until result is ready

br /rene
Personally, I like Greg's PEP 3152 (aside from 'cocall' keyword). With that approach it's easy to distinguish coroutines, generators and plain functions. And it'd be easier to add some special methods/properties to codefs, like 'in_finally()' method etc.
- Yury
Rene Nejsum wrote:
[SNIP]
That's what bothers me as well. 'yield from' looks too long for the simple thing it does (1); users will be confused whether they should use 'yield' or 'yield from' (2); there is no visible difference between a plain generator and a coroutine (3).
I agree, was this ever commented on? I know it may be late in the discussion, but just because you can use yield/yield from for concurrent stuff, should you?
It looks very implicit to me (breaking the second rule).
Has the delegate/event model of C# been discussed?
As always I recommend moving the concurrent stuff to the object level; it would be so much easier to state that a message for an object is just that: an async message sent from one object to another... :-) A simple decorator like @task would be enough:
    @task  # explicitly run instance in own thread/coroutine
    class SomeTask(object):
        def async_add(self, x, y):
            return x + y  # returns a Future() with result
    task = SomeTask()
    n = task.async_add(2, 2)
    # Do other stuff while waiting for answer
    print("result is %d" % n)  # Future will wait/hang until result is ready
I think you'll like what I'll be sending out later tonight (US Pacific time), so hold on :) (In the meantime, feel free to read up on C#'s async/await model, which is very similar to what both Guido and I are proposing and has already been pretty well received.) Cheers, Steve
On Mon, Oct 29, 2012 at 3:23 PM, Rene Nejsum <rene@stranden.com> wrote:
On Oct 29, 2012, at 5:59 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.
Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me.
I'd much rather either have all functions use "yield", or have all functions use "yield from".
(also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-))
That's what bothers me as well. 'yield from' looks too long for the simple thing it does (1); users will be confused whether they should use 'yield' or 'yield from' (2); there is no visible difference between a plain generator and a coroutine (3).
I agree, was this ever commented on? I know it may be late in the discussion, but just because you can use yield/yield from for concurrent stuff, should you?
I explained my position on yield vs. yield from twice already in this thread. -- --Guido van Rossum (python.org/~guido)
On Mon, Oct 29, 2012 at 9:07 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Le Sun, 28 Oct 2012 16:52:02 -0700, Guido van Rossum <guido@python.org> a écrit :
The event list started out as a tuple of (fd, flag, callback, args), where flag is 'r' or 'w' (easily extensible); in practice neither the fd nor the flag are used, and one of the last things I did was to wrap callback and args into a simple object that allows cancelling the callback; the add_*() methods return this object. (This could probably use a little more abstraction.) Note that poll() doesn't call the callbacks -- that's up to the event loop.
I don't understand why the pollster takes callback objects if it never calls them. Also the fact that it wraps them into DelayedCalls is more mysterious to me. DelayedCalls represent one-time cancellable callbacks with a given deadline, not callbacks which are called any number of times on I/O events and that you can't cancel.
Yeah, this part definitely needs reworking. In the current design the pollster is a base class of the eventloop, and the latter *does* call them; but I want to refactor that anyway. I'll probably end up with a pollster that registers (what are to it) opaque tokens and returns just a list of tokens from poll(). (Unrelated: would it be useful if poll() was an iterator?)
scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py
This is the scheduler for PEP-380 style coroutines. I started with a Scheduler class and operations along the lines of Greg Ewing's design, with a Scheduler instance as a global variable, but ended up ripping it out in favor of a Task object that represents a single stack of generators chained via yield-from. There is a Context object holding the event loop and the current task in thread-local storage, so that multiple threads can (and must) have independent event loops.
YMMV, but I tend to be wary of implicit thread-local storage. What if someone runs a function or method depending on that thread-local storage from inside a thread pool? Weird bugs ensue.
Agreed, I had to figure out one of these in the implementation of call_in_thread() and it wasn't fun. I don't know what else to do -- I think it's probably best if I base my implementation on this for now so that I know it works correctly in such an environment. In the end there will probably be an API to get the current context and another to influence how that API gets it, so people can plug in their own schemes, from TLS to a simple global to something determined by an external library.
I think explicit context is much less error-prone. Even a single global instance (like Twisted's reactor) would be better :-)
I find that passing the context around everywhere makes for awkward APIs though.
As for the rest of the scheduling module, I can't say much since I have a hard time reading and understanding it.
That's a problem, I need to write this up properly so that everyone can understand it.
To invoke a primitive I/O operation, you call the current task's block() method and then immediately yield (similar to Greg Ewing's approach). There are helpers block_r() and block_w() that arrange for a task to block until a file descriptor is ready for reading/writing. Examples of their use are in sockets.py.
That's weird and kind of ugly IMHO. Why would you write:

    scheduling.block_w(self.sock.fileno())
    yield
instead of say:
yield scheduling.block_w(self.sock.fileno())
?
This has been debated ad nauseam already (be glad you missed it); basically, there's not a whole lot of difference, but if there are some APIs that require "yield X(args)" and others that require "yield from Y(args)" that's really confusing. The "bare yield only" rule makes it possible (though I didn't implement it here) to put some strict checks in the scheduler -- next() should never return anything except None. But there are other ways to do that too. Anyway, I probably will change the API so that e.g. sockets.py doesn't have to use this paradigm; I'll just wrap these low-level APIs in a proper "coroutine" and then sockets.py can just use "yield from block_r(fd)". (This is one reason why I like the "bare generators with yield from" approach that Greg Ewing and PEP 380 recommend: it's really cheap to wrap an API in an extra layer of yield-from.) (See the yyftime.py benchmark I added to the tulip directory.)
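That wrapping is indeed cheap; a sketch, with a tiny stand-in for the scheduling module (the real tulip internals differ -- this only shows the shape):

```python
class FakeScheduler:
    # Stand-in for tulip's scheduling internals: just records which fd
    # the current task is waiting on.
    def __init__(self):
        self.waiting_r = []

    def block_r(self, fd):
        self.waiting_r.append(fd)

sched = FakeScheduler()

def block_r(fd):
    # COROUTINE: wraps the internal block()+bare-yield pair so callers
    # can simply write 'yield from block_r(fd)'.
    sched.block_r(fd)   # register interest in readability of fd
    yield               # suspend; the event loop resumes us when ready
```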
Also, the fact that each call to SocketTransport.{recv,send} explicitly registers then removes the fd on the event loop looks wasteful.
I am hoping to add some optimization for this -- I am actually planning a hackathon (or re-education session :-) with some Twisted folks where I hope they'll explain to me how they do this.
By the way, even when a fd is signalled ready, you must still be prepared for recv() to return EAGAIN (see http://bugs.python.org/issue9090).
Yeah, I should know, I ran into this for a Google project too (there was a kernel driver that was lying...). I had a cryptic remark in my post above referring to this.
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.
Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me.
Yeah, see above.
I'd much rather either have all functions use "yield", or have all functions use "yield from".
Agreed, and I'm strongly in favor of "yield from". The block_r() + yield is considered an *internal* API.
(also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-))
Agreed it would be useful as documentation, and maybe an API can use this to enforce proper coding style. It would have to be purely decoration though -- I don't want an extra layer of wrapping to occur each time you call a coroutine. (I.e. the decorator should just return "func".)
sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports, but they are different from Twisted's transports; they are closer to idealized sockets. SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from. SslTransport wraps an ssl socket (luckily, in Python 2.6 and up, stdlib ssl sockets have good async support!).
SslTransport.{recv,send} need the same kind of logic as do_handshake(): catch both SSLWantReadError and SSLWantWriteError, and call block_r / block_w accordingly.
Oh... Thanks for the tip. I didn't find this in the ssl module docs.
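A sketch of what Antoine suggests, with block_r/block_w passed in as stand-ins for the scheduling helpers (ssl.SSLWantReadError/SSLWantWriteError are the 3.3+ exception classes; on older versions one checks SSLError.args[0] against SSL_ERROR_WANT_READ/WRITE instead):

```python
import ssl

def ssl_recv(sslsock, nbytes, block_r, block_w):
    # COROUTINE sketch: a read on an SSL socket can require *writing*
    # (e.g. during renegotiation), so both Want errors must be handled,
    # exactly as in do_handshake().
    while True:
        try:
            return sslsock.recv(nbytes)
        except ssl.SSLWantReadError:
            yield from block_r(sslsock.fileno())
        except ssl.SSLWantWriteError:
            yield from block_w(sslsock.fileno())
```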
Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports.
Well... It would be nice if BufferedReader could re-use the actual io.BufferedReader and its fast readline(), read(), readinto() implementations.
Agreed, I would love that too, but the problem is, *this* BufferedReader defines methods you have to invoke with yield from. Maybe we can come up with a solution for sharing code by modifying the _io module though; that would be great! (I've also been thinking of layering TextIOWrapper on top of these.) Thanks for the thorough review! -- --Guido van Rossum (python.org/~guido)
On 2012-10-29, at 1:03 PM, Guido van Rossum <guido@python.org> wrote:
Agreed it would be useful as documentation, and maybe an API can use this to enforce proper coding style. It would have to be purely decoration though -- I don't want an extra layer of wrapping to occur each time you call a coroutine. (I.e. the decorator should just return "func".)
I'd also set something like 'func.__coroutine__' to True. That will allow to analyze, introspect, validate and do other useful things. - Yury
On Mon, Oct 29, 2012 at 10:08 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-29, at 1:03 PM, Guido van Rossum <guido@python.org> wrote:
Agreed it would be useful as documentation, and maybe an API can use this to enforce proper coding style. It would have to be purely decoration though -- I don't want an extra layer of wrapping to occur each time you call a coroutine. (I.e. the decorator should just return "func".)
I'd also set something like 'func.__coroutine__' to True. That will make it possible to analyze, introspect, validate and do other useful things.
Yes, that sounds about right. -- --Guido van Rossum (python.org/~guido)
Pollster has to support any object as a file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and the socket is just some opaque structure wrapped by a Python object. ZeroMQ has its own poll function to process zmq sockets as well as regular sockets/pipes/files.

I would like to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args, at least to respect functions with keyword-only arguments.

+1 for explicit passing loop instance and clearing role of DelayedCall.

Decorating coroutines by setting some flag looks good to me, but I expect some problems with setting an extra attribute on objects like staticmethod/classmethod.

Thanks, Andrew.
On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Pollster has to support any object as a file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and the socket is just some opaque structure wrapped by a Python object. ZeroMQ has its own poll function to process zmq sockets as well as regular sockets/pipes/files.
Good call! This seems to be an excellent use case to validate the pollster design. Are you saying that the approach I used for SslTransport doesn't work here? (I can believe it, I've never looked at 0MQ, but I can't tell from your message.) The insistence on isinstance(fd, int) is mostly there so that I don't accidentally register a socket object *and* its file descriptor at the same time -- but there are other ways to ensure that. I've added a TODO item for now.
I would like to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args, at least to respect functions with keyword-only arguments.
Hmm... I intentionally ruled those out because I wanted to leave the door open for keyword args that modify the registration function (add_reader etc.); it is awkward to require conventions like "your function cannot have a keyword arg named X because we use that for our own API" and it is even more awkward to have to retrofit new values of X into that rule. Maybe we can come up with a simple wrapper.
+1 for explicit passing loop instance and clearing role of DelayedCall.
Will do. (I think you meant clarifying?)
Decorating coroutines by setting some flag looks good to me, but I expect some problems with setting an extra attribute on objects like staticmethod/classmethod.
Noted. -- --Guido van Rossum (python.org/~guido)
On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Pollster has to support any object as a file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and the socket is just some opaque structure wrapped by a Python object. ZeroMQ has its own poll function to process zmq sockets as well as regular sockets/pipes/files.
Good call! This seems to be an excellent use case to validate the pollster design. Are you saying that the approach I used for SslTransport doesn't work here? (I can believe it, I've never looked at 0MQ, but I can't tell from your message.) The insistence on isinstance(fd, int) is mostly there so that I don't accidentally register a socket object *and* its file descriptor at the same time -- but there are other ways to ensure that. I've added a TODO item for now.
A 0MQ socket has no file descriptor at all; it's just a pointer to some unspecified structure. So 0MQ has its own *poll* function which can process those sockets as well as file descriptors. The interface mimics the poll object from the Python stdlib. You can see https://github.com/zeromq/pyzmq/blob/master/zmq/eventloop/ioloop.py as an example. For 0MQ support, tulip would have to have yet another reactor implementation in the line of select, epoll, kqueue etc. Not a big deal, but it would be nice if PollsterBase did not assume the registered object is always an int file descriptor.
I would like to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args, at least to respect functions with keyword-only arguments.
Hmm... I intentionally ruled those out because I wanted to leave the door open for keyword args that modify the registration function (add_reader etc.); it is awkward to require conventions like "your function cannot have a keyword arg named X because we use that for our own API" and it is even more awkward to have to retrofit new values of X into that rule. Maybe we can come up with a simple wrapper.
It can be solved easily by using names like __when, __callback etc. Those names will never clash with user-provided kwargs, I believe.
+1 for explicit passing loop instance and clearing role of DelayedCall.
Will do. (I think you meant clarifying?)
Exactly. Thanks.
Decorating coroutines by setting some flag looks good to me, but I expect some problems with setting an extra attribute on objects like staticmethod/classmethod.
Noted.
-- --Guido van Rossum (python.org/~guido)
Thank you, Andrew Svetlov
On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <guido@python.org> wrote: [Andrew]
I would like to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args, at least to respect functions with keyword-only arguments.
Hmm... I intentionally ruled those out because I wanted to leave the door open for keyword args that modify the registration function (add_reader etc.); it is awkward to require conventions like "your function cannot have a keyword arg named X because we use that for our own API" and it is even more awkward to have to retrofit new values of X into that rule. Maybe we can come up with a simple wrapper.
It can be solved easily by using names like __when, __callback etc. Those names will never clash with user-provided kwargs, I believe.
No, those names have a different meaning inside a class (they would be transformed into _<class>__when, where <class> is the name of the *current* class textually enclosing the use). I am not closing the door on this one, but I'd have to see a lot more evidence that this issue is widespread. -- --Guido van Rossum (python.org/~guido)
I mean just something like:

    def call_soon(__self, __callback, *__args, **__kwargs):
        dcall = DelayedCall(None, __callback, __args, __kwargs)
        __self.ready.append(dcall)
        return dcall

Not a big deal, though. We can delay this discussion for later. On Mon, Oct 29, 2012 at 9:54 PM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <guido@python.org> wrote: [Andrew]
I would like to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args, at least to respect functions with keyword-only arguments.
Hmm... I intentionally ruled those out because I wanted to leave the door open for keyword args that modify the registration function (add_reader etc.); it is awkward to require conventions like "your function cannot have a keyword arg named X because we use that for our own API" and it is even more awkward to have to retrofit new values of X into that rule. Maybe we can come up with a simple wrapper.
It can be solved easily by using names like __when, __callback etc. Those names will never clash with user-provided kwargs, I believe.
No, those names have a different meaning inside a class (they would be transformed into _<class>__when, where <class> is the name of the *current* class textually enclosing the use). I am not closing the door on this one, but I'd have to see a lot more evidence that this issue is widespread.
-- --Guido van Rossum (python.org/~guido)
-- Thanks, Andrew Svetlov
Andrew Svetlov wrote:
A 0MQ socket has no file descriptor at all; it's just a pointer to some unspecified structure. So 0MQ has its own *poll* function which can process those sockets as well as file descriptors.
Aaargh... yet another event loop that wants to rule the world. This is not good. -- Greg
On Oct 29, 2012, at 5:25 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
Andrew Svetlov wrote:
A 0MQ socket has no file descriptor at all; it's just a pointer to some unspecified structure. So 0MQ has its own *poll* function which can process those sockets as well as file descriptors.
Aaargh... yet another event loop that wants to rule the world. This is not good.
As a wise man once said, "everybody wants to rule the world". All event loops have their own run() API, and expect to be on top of everything, driving the loop. This is one of the central principles of Twisted's design; by not attempting to directly take control of any loop, and providing a high-level wrapper around run, and an API that would accommodate every wacky wrapper around poll and select and kqueue and GetQueuedCompletionStatus, we could be a single loop that everything can use as an API and get the advantages of whatever event driven thing is popular this week. You can't accomplish this by trying to force other loops to play by your rules; rather, accommodate and pave over their peculiarities and it'll be your API that their users actually write to. (In the land of Mordor, where the shadows lie.) -glyph
Guido van Rossum wrote:
I would like to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args, at least to respect functions with keyword-only arguments.
Hmm... I intentionally ruled those out because I wanted to leave the door open for keyword args that modify the registration function
One way to accommodate that would be to make the registration API look like this: call_later(my_func)(arg1, ..., kwd = value, ...) -- Greg
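A minimal sketch of what that two-step registration could look like (DelayedCall here is only a stand-in for tulip's class of the same name, and the 'delay' option is hypothetical):

```python
# Greg's suggestion: the outer call takes the callback plus any
# scheduler options; the returned callable takes the user's own
# *args/**kwargs, so the two namespaces can never collide.
class DelayedCall:
    def __init__(self, callback, args, kwargs):
        self.callback = callback
        self.args = args
        self.kwargs = kwargs

def call_later(callback, *, delay=0.0):      # scheduler options live here
    def register(*args, **kwargs):           # user arguments live here
        return DelayedCall(callback, args, kwargs)
    return register

# Registration then reads as two calls:
dcall = call_later(print, delay=5.0)("hello", sep=", ")
```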
On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Pollster has to support any object as a file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and the socket is just some opaque structure wrapped by a Python object. ZeroMQ has its own poll function to process zmq sockets as well as regular sockets/pipes/files.
Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets. Just get the underlying file descriptor with 'getsockopt', as described here: http://api.zeromq.org/master:zmq-getsockopt#toc20

For instance, here are the stripped-out zmq support classes I have in my framework:

    class Socket(_zmq_Socket):

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.fileno = self.getsockopt(FD)

        ...

        #coroutine
        def send(self, data, *, flags=0, copy=True, track=False):
            flags |= NOBLOCK
            try:
                result = _zmq_Socket.send(self, data, flags, copy, track)
            except ZMQError as e:
                if e.errno != EAGAIN:
                    raise
                self._sending = (Promise(), data, flags, copy, track)
                self._scheduler.proactor._schedule_write(self)
                return self._sending[0]
            else:
                p = Promise()
                p.send(result)
                return p

        ...

    class Context(_zmq_Context):
        _socket_class = Socket

And '_schedule_write' accepts any object with a 'fileno' property, and uses an appropriate polling mechanism to poll. So to use non-blocking ZMQ sockets, you simply do:

    context = Context()
    socket = context.socket(zmq.REP)
    ...
    yield socket.send(message)
On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Pollster has to support any object as a file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and the socket is just some opaque structure wrapped by a Python object. ZeroMQ has its own poll function to process zmq sockets as well as regular sockets/pipes/files.
Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets. Just get the underlying file descriptor with 'getsockopt', as described here: http://api.zeromq.org/master:zmq-getsockopt#toc20
Well, I will take a look. I have used zmq poll only. It works for reading only, not for writing, right? As I understand it, you use the proactor pattern. Could a reactor have problems with this approach? Might the embedded 0MQ poll be more efficient thanks to some internal optimizations?
For instance, here are the stripped-out zmq support classes I have in my framework:
    class Socket(_zmq_Socket):

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.fileno = self.getsockopt(FD)

        ...

        #coroutine
        def send(self, data, *, flags=0, copy=True, track=False):
            flags |= NOBLOCK
            try:
                result = _zmq_Socket.send(self, data, flags, copy, track)
            except ZMQError as e:
                if e.errno != EAGAIN:
                    raise
                self._sending = (Promise(), data, flags, copy, track)
                self._scheduler.proactor._schedule_write(self)
                return self._sending[0]
            else:
                p = Promise()
                p.send(result)
                return p

        ...

    class Context(_zmq_Context):
        _socket_class = Socket
And '_schedule_write' accepts any object with a 'fileno' property, and uses an appropriate polling mechanism to poll.
So to use non-blocking ZMQ sockets, you simply do:
    context = Context()
    socket = context.socket(zmq.REP)
    ...
    yield socket.send(message)
-- Thanks, Andrew Svetlov
On 2012-10-29, at 3:32 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Pollster has to support any object as a file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and the socket is just some opaque structure wrapped by a Python object. ZeroMQ has its own poll function to process zmq sockets as well as regular sockets/pipes/files.
Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets. Just get the underlying file descriptor with 'getsockopt', as described here: http://api.zeromq.org/master:zmq-getsockopt#toc20
Well, I will take a look. I have used zmq poll only. It works for reading only, not for writing, right? As I understand it, you use the proactor pattern. Could a reactor have problems with this approach? Might the embedded 0MQ poll be more efficient thanks to some internal optimizations?
It's an officially documented and supported approach. We haven't seen any problems with it so far. It works both for reading and writing; however, 99.9% of EAGAIN errors occur on reading. When you 'send', it just stores your data in an internal buffer and sends it itself. When you 'read', well, if there is no data in the buffers you get EAGAIN. As for the performance -- I haven't tested 'zmq.poll' vs (let's say) epoll, but I doubt there is any significant difference. And if I were to write a benchmark, I'd first compare pure blocking ZMQ sockets vs non-blocking ZMQ sockets with ZMQ.poll, as ZMQ uses threads heavily, and probably blocking threads-driven IO is faster than non-blocking with polling (when the FD count is relatively small), no matter whether you use zmq.poll or epoll/etc. - Yury
On Mon, 29 Oct 2012 10:03:00 -0700 Guido van Rossum <guido@python.org> wrote:
Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports.
Well... It would be nice if BufferedReader could re-use the actual io.BufferedReader and its fast readline(), read(), readinto() implementations.
Agreed, I would love that too, but the problem is, *this* BufferedReader defines methods you have to invoke with yield from. Maybe we can come up with a solution for sharing code by modifying the _io module though; that would be great! (I've also been thinking of layering TextIOWrapper on top of these.)
There is a rather infamous issue about _io.BufferedReader and non-blocking I/O at http://bugs.python.org/issue13322 It is a bit problematic because currently non-blocking readline() returns '' instead of None when no data is available, meaning EOF can't be easily detected :(

Once this issue is solved, you could use _io.BufferedReader, and work around the "partial read/readline result" issue by iterating (hopefully in most cases there is enough data in the buffer to return a complete read or readline, so the C optimizations are useful). Here is how it may work:

    def __init__(self, fd):
        self.fd = fd
        self.bufio = _io.BufferedReader(...)

    def readline(self):
        chunks = []
        while True:
            line = self.bufio.readline()
            if line is not None:
                chunks.append(line)
                if line == b'' or line.endswith(b'\n'):
                    # EOF or EOL
                    return b''.join(chunks)
            yield from scheduler.block_r(self.fd)

    def read(self, n):
        chunks = []
        bytes_read = 0
        while True:
            data = self.bufio.read(n - bytes_read)
            if data is not None:
                chunks.append(data)
                bytes_read += len(data)
                if data == b'' or bytes_read == n:
                    # EOF or read satisfied
                    break
            yield from scheduler.block_r(self.fd)
        return b''.join(chunks)

As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all (but my memories are vague).

By the way I don't know how this whole approach (of mocking socket-like or file-like objects with coroutine-y read() / readline() methods) lends itself to plugging into Windows' IOCP. You may rely on some raw I/O object that registers a callback when a read() is requested and then yields a Future object that gets completed by the callback. I'm sure Richard has some ideas about that :-)

Regards Antoine.
On Mon, Oct 29, 2012 at 2:25 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 29 Oct 2012 10:03:00 -0700 Guido van Rossum <guido@python.org> wrote:
Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports.
Well... It would be nice if BufferedReader could re-use the actual io.BufferedReader and its fast readline(), read(), readinto() implementations.
Agreed, I would love that too, but the problem is, *this* BufferedReader defines methods you have to invoke with yield from. Maybe we can come up with a solution for sharing code by modifying the _io module though; that would be great! (I've also been thinking of layering TextIOWrapper on top of these.)
There is a rather infamous issue about _io.BufferedReader and non-blocking I/O at http://bugs.python.org/issue13322 It is a bit problematic because currently non-blocking readline() returns '' instead of None when no data is available, meaning EOF can't be easily detected :(
Eeew!
Once this issue is solved, you could use _io.BufferedReader, and workaround the "partial read/readline result" issue by iterating (hopefully in most cases there is enough data in the buffer to return a complete read or readline, so the C optimizations are useful).
Yes, that's what I'm hoping for.
Here is how it may work:
    def __init__(self, fd):
        self.fd = fd
        self.bufio = _io.BufferedReader(...)

    def readline(self):
        chunks = []
        while True:
            line = self.bufio.readline()
            if line is not None:
                chunks.append(line)
                if line == b'' or line.endswith(b'\n'):
                    # EOF or EOL
                    return b''.join(chunks)
            yield from scheduler.block_r(self.fd)

    def read(self, n):
        chunks = []
        bytes_read = 0
        while True:
            data = self.bufio.read(n - bytes_read)
            if data is not None:
                chunks.append(data)
                bytes_read += len(data)
                if data == b'' or bytes_read == n:
                    # EOF or read satisfied
                    break
            yield from scheduler.block_r(self.fd)
        return b''.join(chunks)
Hm... I wonder if it would make more sense if these standard APIs were to raise specific exceptions, like the ssl module does in non-blocking mode? Look here (I updated since posting last night): http://code.google.com/p/tulip/source/browse/sockets.py#142
As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all (but my memories are vague).
Same suggestion... (I only found out about ssl's approach to async I/O a few days ago. It felt brilliant and right to me. But maybe I'm missing something?)
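A tiny sketch of the convention being suggested here (all names hypothetical): raising a dedicated would-block exception leaves b'' free to mean EOF only, avoiding the readline() ambiguity from the bug report quoted above.

```python
# WouldBlockError and NonblockBuffer are made-up names for illustration.
class WouldBlockError(Exception):
    pass

class NonblockBuffer:
    def __init__(self):
        self._buf = b''
        self._eof = False

    def feed(self, data):              # called when the fd becomes readable
        if data:
            self._buf += data
        else:
            self._eof = True           # empty read from the fd means EOF

    def readline(self):
        i = self._buf.find(b'\n')
        if i >= 0:
            line, self._buf = self._buf[:i + 1], self._buf[i + 1:]
            return line
        if self._eof:                  # flush the remainder; b'' is EOF only
            line, self._buf = self._buf, b''
            return line
        raise WouldBlockError          # caller should block_r() and retry
```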
By the way I don't know how this whole approach (of mocking socket-like or file-like objects with coroutine-y read() / readline() methods) lends itself to plugging into Windows' IOCP.
Me neither. I hope Steve Dower can tell us.
You may rely on some raw I/O object that registers a callback when a read() is requested and then yields a Future object that gets completed by the callback. I'm sure Richard has some ideas about that :-)
Which Richard? -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
By the way I don't know how this whole approach (of mocking socket-like or file-like objects with coroutine-y read() / readline() methods) lends itself to plugging into Windows' IOCP.
Me neither. I hope Steve Dower can tell us.
I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do.
From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient at scale - IOCP has already had the expert hands applied (I assume... maybe it was written by an intern? I really don't know).
The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.

What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.

Cheers, Steve
On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Guido van Rossum wrote:
By the way I don't know how this whole approach (of mocking socket-like or file-like objects with coroutine-y read() / readline() methods) lends itself to plugging into Windows' IOCP.
Me neither. I hope Steve Dower can tell us.
I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do.
Aha, somehow I thought Richard was a Mac expert. :-(
From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient at scale - IOCP has already had the expert hands applied (I assume... maybe it was written by an intern? I really don't know).
Experts all point in its direction, so I believe IOCP is solid.
The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.
Right. Did you see my call_in_thread() yet? http://code.google.com/p/tulip/source/browse/scheduling.py#210 http://code.google.com/p/tulip/source/browse/polling.py#481
What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.
I wonder if this could be done by varying the transports by platform? Not too many people are going to write new transports -- there just aren't that many options. And those that do might be doing something platform-specific anyway. It shouldn't be that hard to come up with a transport abstraction that lets protocol implementations work regardless of whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support could use those too. -- --Guido van Rossum (python.org/~guido)
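The "vary the transports by platform" idea could look roughly like this (a sketch with hypothetical names, not a proposal for the actual classes):

```python
import abc
import sys

# Protocols only ever see the abstract interface; a factory picks the
# concrete transport, so readiness-style (select/poll/epoll/kqueue) and
# completion-style (IOCP) implementations can differ freely underneath.
class Transport(abc.ABC):
    @abc.abstractmethod
    def recv(self, n):
        """Coroutine: return up to n bytes, b'' on EOF."""

    @abc.abstractmethod
    def send(self, data):
        """Coroutine: send data, blocking the task as needed."""

class ReadinessTransport(Transport):      # UNIX style: wait, then act
    def recv(self, n): ...
    def send(self, data): ...

class CompletionTransport(Transport):     # Windows style: act, then wait
    def recv(self, n): ...
    def send(self, data): ...

def make_transport():
    cls = CompletionTransport if sys.platform == 'win32' else ReadinessTransport
    return cls()
```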
Guido van Rossum wrote: [SNIP]
On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.
Right. Did you see my call_in_thread() yet? http://code.google.com/p/tulip/source/browse/scheduling.py#210 http://code.google.com/p/tulip/source/browse/polling.py#481
Yes, and it really stood out as one of the similarities between our work. I don't have an equivalent function, since writing "yield thread_pool.submit(...)" is sufficient (because it already returns a Future), but I haven't actually made the thread pool a property of the current scheduler. I think there's value in it.
What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.
I wonder if this could be done by varying the transports by platform? Not too many people are going to write new transports -- there just aren't that many options. And those that do might be doing something platform-specific anyway. It shouldn't be that hard to come up with a transport abstraction that lets protocol implementations work regardless of whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support could use those too.
I feel like a bit of a tease now, since I still haven't posted my code (it's coming, but I also have day work to do [also Python related]), but I've really left this side of things out of my definition completely in favour of allowing schedulers to "unblock" known functions.

For example, (library) code that needs a socket to be ready can ask the current scheduler if it can do "select([sock], [], [])", and if the scheduler can then it will give the library code a Future. How the scheduler ends up implementing the asynchronous-select is entirely up to the scheduler, and if it can't do it, the caller can do it their own way (which probably means using a thread pool as a last resort).

What I would expect this to result in is a set of platform-specific default schedulers that do common operations well and other (3rd-party) schedulers that do particular things really well. So if you want high performance single-threaded sockets, you replace the default scheduler with another one - but if Windows doesn't support the optimized scheduler, you can use the default scheduler without your code breaking.

Writing this now it seems to be even clearer that we've approached the problem differently, which should mean there'll be room to share parts of the designs and come up with a really solid result. I'm looking forward to it.

Cheers, Steve
Steve Dower wrote:
For example, (library) code that needs a socket to be ready can ask the current scheduler if it can do "select([sock], [], [])",
I think you're mixing up the scheduler and event loop layers here. If the scheduler is involved in this at all, it would only be to pass the request on to the event loop. -- Greg
Greg Ewing wrote:
Steve Dower wrote:
For example, (library) code that needs a socket to be ready can ask the current scheduler if it can do "select([sock], [], [])",
I think you're mixing up the scheduler and event loop layers here. If the scheduler is involved in this at all, it would only be to pass the request on to the event loop.
Could you clarify for me what goes into each layer? I've been treating "scheduler" and "event loop" as more-or-less synonyms (I see an event loop as one possible implementation of a scheduler). If you consider the scheduler to be the part that calls __next__() on the generator and sets up callbacks, that is implemented in my _Awaiter class, and should never need to be touched.

Possibly the difference in terminology comes about because I'm not treating I/O specially? As far as wattle is concerned, I/O is just another operation that will eventually call Future.set_result(). I've tried to capture this in my write-up: https://bitbucket.org/stevedower/wattle/wiki/Proposal

Cheers, Steve
On 29/10/2012 11:29pm, Guido van Rossum wrote:
I wonder if this could be done by varying the transports by platform? Not too many people are going to write new transports -- there just aren't that many options. And those that do might be doing something platform-specific anyway. It shouldn't be that hard to come up with a transport abstraction that lets protocol implementations work regardless of whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support could use those too.
Yes, having separate implementations of the transport layer should work. But I think it would be cleaner to put all the platform specific stuff in the pollster, and make the pollster poll-for-completion rather than poll-for-readiness. (Is this the "proactor pattern"?) That seems to be the direction libevent has moved in. -- Richard
On Mon, Oct 29, 2012 at 5:01 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
On 29/10/2012 11:29pm, Guido van Rossum wrote:
I wonder if this could be done by varying the transports by platform? Not too many people are going to write new transports -- there just aren't that many options. And those that do might be doing something platform-specific anyway. It shouldn't be that hard to come up with a transport abstraction that lets protocol implementations work regardless of whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support could use those too.
Yes, having separate implementations of the transport layer should work.
But I think it would be cleaner to put all the platform specific stuff in the pollster, and make the pollster poll-for-completion rather than poll-for-readiness. (Is this the "proactor pattern"?) That seems to be the direction libevent has moved in.
Interesting. I'd like to hear what Twisted thinks of this. (I will find out next week. :-) -- --Guido van Rossum (python.org/~guido)
On Tue, Oct 30, 2012 at 9:29 AM, Guido van Rossum <guido@python.org> wrote:
Aha, somehow I thought Richard was a Mac expert. :-(
Just in case anyone else confused the two names (I know I have in the past): Ronald Oussoren = Mac expert Richard Oudkerk = multiprocessing expert (including tools for inter-process communication) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Steve Dower wrote:
From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks.
Is it really necessary to have a separate thread just to handle unblocking tasks? That thread will have very little to do, so it could just as well run the tasks too, couldn't it? -- Greg
Le Tue, 30 Oct 2012 18:10:28 +1300, Greg Ewing <greg.ewing@canterbury.ac.nz> a écrit :
Steve Dower wrote:
From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks.
Is it really necessary to have a separate thread just to handle unblocking tasks? That thread will have very little to do, so it could just as well run the tasks too, couldn't it?
The IOCP thread pool is managed by Windows, not you. Regards Antoine.
-----Original Message----- From: Python-ideas [mailto:python-ideas-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Greg Ewing Sent: 30. október 2012 05:10 To: python-ideas@python.org Subject: Re: [Python-ideas] non-blocking buffered I/O
From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks.
Is it really necessary to have a separate thread just to handle unblocking tasks? That thread will have very little to do, so it could just as well run the tasks too, couldn't it?
StacklessIO (which is an IOCP implementation for Stackless) uses callbacks on an arbitrary thread (in practice a worker thread from Windows' own thread pool that it keeps for such things) to unblock tasklets. You don't want to do any significant work on such a thread because it is used for other stuff by the system. By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient. K
On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient.
In which Python version? The GIL has been redesigned at least once. Also the latency (not necessarily cost) to acquire the GIL varies by the sys.setswitchinterval setting. (Actually the more responsive you make it, the more it will cost you in overall performance.) I do think that using the pending call mechanism is the right solution here. -- --Guido van Rossum (python.org/~guido)
-----Original Message----- From: gvanrossum@gmail.com [mailto:gvanrossum@gmail.com] On Behalf Of Guido van Rossum Sent: 30. október 2012 17:47 To: Kristján Valur Jónsson Cc: python-ideas@python.org Subject: Re: [Python-ideas] non-blocking buffered I/O
On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient.
In which Python version? The GIL has been redesigned at least once. Also the latency (not necessarily cost) to acquire the GIL varies by the sys.setswitchinterval setting. (Actually the more responsive you make it, the more it will cost you in overall performance.)
I do think that using the pending call mechanism is the right solution here.
I am talking about 2.7, of course, the python of hard working lumberjacks everywhere :)

Anyway, I don't think the issue is much affected by the particular GIL implementation.

Alternative a)
Callback comes on arbitrary thread
arbitrary thread calls PyGILState_Ensure (this causes a _dynamic thread state_ to be generated for the arbitrary thread, and the GIL to be subsequently acquired)
arbitrary thread does whatever python gymnastics are required to complete the IO (wake up tasklet)
arbitrary thread calls PyGILState_Release

For whatever reason, this approach _increased CPU usage_ on a loaded server. Latency was fine, throughput the same, and the delay in actual GIL acquisition was ok. I suspect that the problem lies with the dynamic acquisition of a thread state, and other initialization that may occur. I did experiment with having a cache of unused thread states at the ready for external threads, but it didn't get me anywhere. This could also be the result of cache thrashing or something that doesn't show up immediately on a multicore cpu.

Alternative b)
Callback comes on arbitrary thread
external thread calls PyEval_SchedulePendingCall(); this grabs a static lock, puts in a record, and signals to python that something needs to be done immediately
external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout
Main thread wakes up from its sleep (if it was sleeping).
Main thread runs python code, causing it to immediately service the scheduled pending call, causing it to perform the wait.

In reality, StacklessIO uses a slight variation of the above:

StacklessIO dispatch system
Callback comes on arbitrary thread
external thread schedules a completion event in its own "dispatch" buffer to be serviced by the main thread; this is protected by its own lock, and doesn't need the GIL
external thread calls PyEval_SchedulePendingCall() to "tick" the dispatch buffer
external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout
If the main thread is sleeping:
  Main thread wakes up from its sleep
  Immediately after sleeping, the main thread will 'tick' the dispatch queue
  After ticking, tasklets may have been made runnable, so the main thread may continue out into the main loop of the application to do work. If not, it may continue sleeping.
Main thread runs python code, causing it to immediately service the scheduled pending call, which will tick the dispatch queue. This may be a no-op if the main thread was sleeping and was already ticked.

The issue we were facing was not with latency (although grabbing the GIL when the main thread is busy is slower than notifying it of a pending call), but with unexplained increased cpu showing up. A proxy node servicing 2000 clients or upwards would suddenly double or triple its cpu.

The reason I'm mentioning this here is that this is important. We have spent quite some time and energy trying to figure out the most efficient way to complete IOCP from an arbitrary thread, and this is the end result. Perhaps things can be done to improve this. Also, it is really important to study these things under real load; experience has shown me that the most innocuous changes that work well in the lab suddenly start behaving strangely in the field.
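The dispatch-buffer mechanism described above can be sketched in pure Python (all names here are invented for illustration; the real StacklessIO machinery lives in C and uses a pending-call hook plus an interruptible wait): an arbitrary thread appends a completion record under the buffer's own lock and writes a byte to a self-pipe so the sleeping main thread wakes up and "ticks" the queue on its own.

```python
import os
import select
import threading

class Dispatcher:
    """Hypothetical sketch: completion events are queued by arbitrary
    threads but serviced ("ticked") only by the main thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []                # the "dispatch" buffer
        self._rfd, self._wfd = os.pipe()  # self-pipe to interrupt the sleep

    def schedule(self, callback, *args):
        # Called from any thread; needs only the buffer's own lock.
        with self._lock:
            self._pending.append((callback, args))
        os.write(self._wfd, b"x")         # wake the main thread

    def tick(self):
        # Main thread only: drain the buffer and run the callbacks.
        with self._lock:
            pending, self._pending = self._pending, []
        for callback, args in pending:
            callback(*args)

    def run_once(self, timeout=None):
        # Stand-in for the poll()/WaitForMultipleObjects() sleep.
        ready, _, _ = select.select([self._rfd], [], [], timeout)
        if ready:
            os.read(self._rfd, 4096)      # drain the wakeup bytes
        self.tick()
```

The point of the pipe is only to cut the sleep short; the actual work happens in tick(), on the main thread, which mirrors the "unblock, don't execute" division of labor.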
Modern CPUs are black boxes full of magic. I'm not too surprised that running Python code on multiple threads incurs some kind of overhead that keeping the Python interpreter in one thread avoids. On Wed, Oct 31, 2012 at 2:29 AM, Kristján Valur Jónsson <kristjan@ccpgames.com> wrote:
-- --Guido van Rossum (python.org/~guido)
-----Original Message----- From: gvanrossum@gmail.com [mailto:gvanrossum@gmail.com] On Behalf Of Guido van Rossum Sent: 31. október 2012 15:01 To: Kristján Valur Jónsson Cc: python-ideas@python.org Subject: Re: [Python-ideas] non-blocking buffered I/O
Modern CPUs are black boxes full of magic. I'm not too surprised that running Python code on multiple threads incurs some kind of overhead that keeping the Python interpreter in one thread avoids.
Ah, but I forgot to mention one weird thing: If we used a pool of threads for the callbacks, and pre-initialized those threads with python thread states, and then acquired the GIL using PyEval_RestoreThread(), then this overhead went away. It was only the dynamic thread state acquired using PyGILState_Ensure() that caused cpu overhead. Using a fixed pool was not acceptable in the long run; in particular we didn't want to complicate things to another level by adding a thread pool manager to the whole thing when the OS is fully capable of providing an external callback thread. I regret not spending more time on this to be able to provide an actual performance analysis and fix. Instead I have to be that weird old man in the tavern uttering inscrutable warnings that no young adventurer pays any attention to :) K
Greg Ewing wrote:
Steve Dower wrote:
From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks.
Is it really necessary to have a separate thread just to handle unblocking tasks? That thread will have very little to do, so it could just as well run the tasks too, couldn't it?
In the C10k problem (which seems to keep coming up as our "goal") that thread will have a lot to do. I would expect that most actual users of this API could keep running on that thread without issue, but since it is OS managed and belongs to a pool, the chances of deadlocking are much higher than on a 'real' CPU thread. Limiting its work to unblocking at least prevents the end developer from having to worry about this. Cheers, Steve
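Steve's constraint -- the IOCP pool callback must only unblock the task, never run it -- might look like this in Python terms (a sketch with invented names; a plain thread stands in for the OS completion port, and concurrent.futures.Future stands in for whatever unblocking primitive the scheduler would use):

```python
import threading
from concurrent.futures import Future

def completion_callback(result, future):
    # Stand-in for an IOCP completion callback: it runs on an
    # OS-managed pool thread, so it must do as little as possible --
    # here it only unblocks the waiting task by resolving a future.
    future.set_result(result)

def read_async():
    fut = Future()
    # Simulate the kernel completing the overlapped I/O on a pool thread.
    threading.Thread(target=completion_callback, args=(b"data", fut)).start()
    return fut

fut = read_async()
# The scheduler's own thread resumes the task here; the pool thread
# was released the moment set_result() returned.
data = fut.result(timeout=1)
```

Keeping the callback this thin is what avoids the deadlock risk Steve mentions: nothing user-defined ever runs on the borrowed pool thread.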
2012/10/29 Guido van Rossum <guido@python.org>
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
Follows my comments.

=== About polling.py ===

1 - I think DelayedCall should have a reset() method, other than just cancel().
2 - EventLoopMixin should have a call_every() method other than just call_later().
3 - call_later() and call_every() should also take **kwargs other than just *args.
4 - I think PollsterBase should provide a method to modify() the events registered for a certain fd (both poll() and epoll() have such a method and it's faster compared to un/registering a fd).

Feel free to take a look at my scheduler implementation which looks quite similar to what you've done in polling.py: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85

=== About sockets.py ===

1 - In SocketTransport it seems there's no error handling provisioned for send() and recv(). You should expect these errors http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60 signaling disconnection, plus EWOULDBLOCK and EAGAIN for "retry".
2 - SslTransport's send() and recv() methods should suffer the same problem.
3 - I don't fully understand how data transfer works exactly but keep in mind that the transport should interact with the pollster. What I mean is that generally speaking a connected socket should *always* be readable ("r"), even when it's idle, then switch to "rw" events when sending data, then get back to "r" when all the data has been sent. This is *crucial* if you want to achieve high performances/scalability and that is why PollsterBase should probably provide a modify() method. Please take a look at what I've done here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809

=== Other considerations ===

This 'yield' / 'yield from' approach is new to me (I'm more of a "callback guy") so I can't say I fully understand what's going on just by reading the code.

What I would like to see instead of main.py is a bunch of code samples / demos showing how this library is supposed to be used in different circumstances. In details I'd like to see at least:

1 - a client example (connect(), send() a string, recv() a response, close())
2 - an echo server example (accept(), recv() a string, send() it back, close())
3 - how to use a different transport (e.g. UDP)?
4 - how to run long running tasks in a thread?

Also:

5 - is it possible to use multiple "reactors" in different threads? How? (asyncore for example achieves this by providing a separate 'map' argument for both the 'reactor' and the dispatchers)

I understand you just started with this so I'm probably asking too much at this point in time. Feel free to consider this a kind of a "long term review".

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/
On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
2012/10/29 Guido van Rossum <guido@python.org>
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
Follows my comments.
=== About polling.py ===
1 - I think DelayedCall should have a reset() method, other than just cancel().
So, essentially an uncancel()? Why not just re-register in that case? Or what's your use case? (Right now there's no problem in calling one of these many times -- it's just that cancellation is permanent.)
2 - EventLoopMixin should have a call_every() method other than just call_later()
Arguably you can emulate that with a simple loop:

def call_every(secs, func, *args):
    while True:
        yield from scheduler.sleep(secs)
        func(*args)

(Flavor to taste to log exceptions, handle cancellation, automatically spawn a separate task, etc.) I can build lots of other useful things out of call_soon() and call_later() -- but I do need at least those two as "axioms".
3 - call_later() and call_every() should also take **kwargs other than just *args
I just replied to that in a previous message; there's also a comment in the code. How important is this really? Are there lots of use cases that require you to pass keyword args? If it's only on occasion you can use a lambda. (The *args is a compromise so we don't need a lambda to wrap every callback. But I want to reserve keyword args for future extensions to the registration functions.)
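The lambda workaround mentioned above, spelled out (call_later here is a trivial stand-in that invokes the callback immediately instead of scheduling it, just to show the call shape):

```python
def call_later(delay, callback, *args):
    # Toy stand-in: a real event loop would schedule callback(*args)
    # to run after `delay`; here we just invoke it.
    return callback(*args)

def connect(host, port, ssl=False):
    # Hypothetical callback that wants a keyword argument.
    return (host, port, ssl)

# *args covers positional arguments directly; keyword arguments
# get wrapped in a lambda on the rare occasion they are needed.
result = call_later(5, lambda: connect("example.com", 443, ssl=True))
```

This is the compromise described in the text: positional args avoid a lambda for the common case, while keyword args on call_later itself stay reserved for future extensions of the registration API.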
4 - I think PollsterBase should provide a method to modify() the events registered for a certain fd (both poll() and epoll() have such a method and it's faster compared to un/registering a fd).
Did you see the concrete implementations? Those where this matters implicitly use modify() if the required flags change. I can imagine more optimizations of the implementations (e.g. delaying register()/modify() calls until poll() is actually called, to avoid unnecessary churn) without making the API more complex.
Feel free to take a look at my scheduler implementation which looks quite similar to what you've done in polling.py: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85
Thanks, I had seen it previously, I think this also proves that there's nothing particularly earth-shattering about this design. :-) I'd love to copy some more of your tricks, e.g. the occasional re-heapifying. (What usage pattern is this dealing with exactly?) I should also check that I've taken care of all the various flags and other details (I recall being quite surprised that with poll(), on some platforms I need to check for POLLHUP but not on others).
=== About sockets.py ===
1 - In SocketTransport it seems there's no error handling provisioned for send() and recv(). You should expect these errors http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60 signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"
Right, I know I have been naive about these and have already got a TODO note.
2 - SslTransport's send() and recv() methods should suffer the same problem.
Ditto, Antoine told me.
3 - I don't fully understand how data transfer works exactly but keep in mind that the transport should interact with the pollster. What I mean is that generally speaking a connected socket should *always* be readable ("r"), even when it's idle, then switch to "rw" events when sending data, then get back to "r" when all the data has been sent. This is *crucial* if you want to achieve high performances/scalability and that is why PollsterBase should probably provide a modify() method. Please take a look at what I've done here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
Hm. I am not convinced that managing this explicitly from the transport is the right solution (note that my transports are quite different from those in Twisted). But I'll keep this in mind -- I would like to set up a benchmark suite at some point. I will probably have to implement the server side of HTTP for that purpose, so I can point e.g. ab at my app.
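The "r"/"rw" switching Giampaolo describes could be sketched like this (hypothetical class and method names, not tulip's actual transports; it assumes the pollster's add_reader/add_writer/remove_writer API from polling.py):

```python
class FakePollster:
    """Minimal stand-in recording reader/writer registrations."""
    def __init__(self):
        self.readers, self.writers = {}, {}
    def add_reader(self, fd, callback):
        self.readers[fd] = callback
    def add_writer(self, fd, callback):
        self.writers[fd] = callback
    def remove_writer(self, fd):
        del self.writers[fd]

class WriteBufferingTransport:
    """Sketch: always registered for "r"; registered for "w" only
    while the outgoing buffer is non-empty, so an idle connection
    never busy-loops on a perpetually-writable socket."""
    def __init__(self, sock, pollster):
        self.sock, self.pollster, self.buffer = sock, pollster, b""
        pollster.add_reader(sock.fileno(), self._on_readable)
    def write(self, data):
        if not self.buffer and data:
            # First pending bytes: start watching for writability.
            self.pollster.add_writer(self.sock.fileno(), self._on_writable)
        self.buffer += data
    def _on_writable(self):
        n = self.sock.send(self.buffer)
        self.buffer = self.buffer[n:]
        if not self.buffer:
            # Fully flushed: drop the "w" registration again.
            self.pollster.remove_writer(self.sock.fileno())
    def _on_readable(self):
        pass  # hand incoming bytes to the protocol
```

Whether this bookkeeping belongs in the transport or behind a pollster modify() method is exactly the design question under discussion.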
=== Other considerations ===
This 'yield' / 'yield from' approach is new to me (I'm more of a "callback guy") so I can't say I fully understand what's going on just by reading the code.
Fair enough. You should probably start by reading Greg Ewing's tutorial -- it's short and sweet: http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.htm...
What I would like to see instead of main.py is a bunch of code samples / demos showing how this library is supposed to be used in different circumstances.
Agreed, more examples are needed.
In details I'd like to see at least:
1 - a client example (connect(), send() a string, recv() a response, close())
Hm, that's all in urlfetch().
2 - an echo server example (accept(), recv() string, send() it back(), close()
Yes, that's missing.
3 - how to use a different transport (e.g. UDP)?
I haven't looked into this yet. I expect I'll have to write a different SocketTransport for this (the existing transports are implicitly stream-oriented) but I know that the scheduler and eventloop implementation can handle this fine.
4 - how to run long running tasks in a thread?
That's implemented. Check out call_in_thread(). Note that you can pass it an alternate threadpool (executor).
Also:
5 - is it possible to use multiple "reactors" in different threads?
Should be possible.
How? (asyncore for example achieves this by providing a separate 'map' argument for both the 'reactor' and the dispatchers)
It works by making the Context class use thread-local storage (TLS).
I understand you just started with this so I'm probably asking too much at this point in time. Feel free to consider this a kind of a "long term review".
You have asked many useful questions already. Since you have implemented a real-world I/O loop yourself, your input is extremely valuable. Thanks, and keep at it! -- --Guido van Rossum (python.org/~guido)
On Mon, Oct 29, 2012 at 8:43 PM, Guido van Rossum <guido@python.org> wrote:
3 - call_later() and call_every() should also take **kwargs other than just *args
I just replied to that in a previous message; there's also a comment in the code. How important is this really? Are there lots of use cases that require you to pass keyword args? If it's only on occasion you can use a lambda. (The *args is a compromise so we don't need a lambda to wrap every callback. But I want to reserve keyword args for future extensions to the registration functions.)
Well, using keyword-only arguments for passing flags can be good point. I can live with *args only. Maybe using **kwargs for call_later family only is good compromise? Really I don't care on add_reader/add_writer, that functions intended to library writers. call_later and call_soon can be used in user code often enough and passing keyword arguments can be convenient.
-- Thanks, Andrew Svetlov
2012/10/29 Guido van Rossum <guido@python.org>:
On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
2012/10/29 Guido van Rossum <guido@python.org> === About polling.py ===
1 - I think DelayedCall should have a reset() method, other than just cancel().
So, essentially an uncancel()? Why not just re-register in that case? Or what's your use case? (Right now there's no problem in calling one of these many times -- it's just that cancellation is permanent.)
The most common use case is when you want to disconnect the other peer after a certain time of inactivity. Ideally what you would do is schedule() an idle/timeout function and reset() it every time the other peer sends you some data.
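The idle-timeout pattern can also be done without a reset() method, by cancelling the pending DelayedCall and registering a fresh one on every received chunk (call_later and DelayedCall here are toy stand-ins that record scheduling instead of performing it):

```python
scheduled = []  # records every DelayedCall ever created, for inspection

class DelayedCall:
    """Toy stand-in for the real DelayedCall: only supports cancel()."""
    def __init__(self, callback):
        self.callback = callback
        self.cancelled = False
    def cancel(self):
        self.cancelled = True

def call_later(delay, callback, *args):
    # Toy stand-in: a real loop would arm a timer for `delay` seconds.
    dc = DelayedCall(lambda: callback(*args))
    scheduled.append(dc)
    return dc

class IdleWatchdog:
    """Disconnect-on-inactivity: cancel + re-register in lieu of reset()."""
    def __init__(self, timeout, on_timeout):
        self.timeout, self.on_timeout = timeout, on_timeout
        self._dc = call_later(timeout, on_timeout)
    def data_received(self):
        # The "reset": kill the old timer, start a new one.
        self._dc.cancel()
        self._dc = call_later(self.timeout, self.on_timeout)

w = IdleWatchdog(30, lambda: None)
w.data_received()
w.data_received()
```

The downside, which motivates both reset() and the re-heapify trick discussed later in the thread, is that every "reset" leaves a cancelled entry behind in the timer queue.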
2 - EventLoopMixin should have a call_every() method other than just call_later()
Arguably you can emulate that with a simple loop:
def call_every(secs, func, *args): while True: yield from scheduler.sleep(secs) func(*args)
(Flavor to taste to log exceptions, handle cancellation, automatically spawn a separate task, etc.)
I can build lots of other useful things out of call_soon() and call_later() -- but I do need at least those two as "axioms".
Agreed.
3 - call_later() and call_every() should also take **kwargs other than just *args
I just replied to that in a previous message; there's also a comment in the code. How important is this really? Are there lots of use cases that require you to pass keyword args? If it's only on occasion you can use a lambda. (The *args is a compromise so we don't need a lambda to wrap every callback. But I want to reserve keyword args for future extensions to the registration functions.)
It's not crucial to have kwargs, just nice, but I understand your motives to rule them out, in fact I reserved two kwarg names ('_errback' and '_scheduler') for the same reason. In my experience I learned that passing an extra error handler function (what Twisted calls 'errrback') can be desirable, so that's another thing you might want to consider. In my scheduler implementation I achieved that by passing an _errback keyword parameter, like this:
ioloop.call_later(30, callback, _errback=err_callback)
Not very nice to use a reserved keyword, I agree. Perhaps you can keep ruling out kwargs intended for the callback function and change the current call_later signature as such:

- def call_later(self, when, callback, *args):
+ def call_later(self, when, callback, *args, errback=None):

...or maybe provide a DelayedCall.add_errback() method a-la Twisted.
Thanks, I had seen it previously, I think this also proves that there's nothing particularly earth-shattering about this design. :-) I'd love to copy some more of your tricks,
Sure, go on. It's MIT licensed code.
e.g. the occasional re-heapifying. (What usage pattern is this dealing with exactly?)
It's intended to avoid making the list grow with too many cancelled functions. Imagine this use case:

WEEK = 60 * 60 * 24 * 7
for x in xrange(1000000):
    f = call_later(WEEK, fun)
    f.cancel()

You'll end up having a heap with millions of cancelled items which will only be freed after a week. Instead you can keep track of the number of cancelled functions every time cancel() is called and re-heapify the list when that number gets too high: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#122
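In sketch form (illustrative names, not pyftpdlib's actual code): cancellation only marks the heap entry dead, a counter tracks how many dead entries exist, and the heap is rebuilt once they dominate:

```python
import heapq
import itertools

class _Timer:
    """Handle returned by call_later(); cancel() marks the entry dead."""
    def __init__(self, owner, when, callback):
        self.when = when
        self.callback = callback
        self.cancelled = False
        self._owner = owner

    def cancel(self):
        if not self.cancelled:
            self.cancelled = True
            self._owner._note_cancel()

class TimerHeap:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: timers never get compared
        self._cancelled = 0

    def call_later(self, when, callback):
        timer = _Timer(self, when, callback)
        heapq.heappush(self._heap, (when, next(self._seq), timer))
        return timer

    def _note_cancel(self):
        self._cancelled += 1
        # Rebuild the heap once more than half the entries are dead weight;
        # this bounds memory without paying O(n) on every cancel.
        if self._cancelled > len(self._heap) // 2:
            self._heap = [e for e in self._heap if not e[2].cancelled]
            heapq.heapify(self._heap)
            self._cancelled = 0

    def __len__(self):
        return len(self._heap)
```

The threshold (half the heap) is a tunable; pyftpdlib uses its own heuristic.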
I should also check that I've taken care of all the various flags and other details (I recall being quite surprised that with poll(), on some platforms I need to check for POLLHUP but not on others).
Yeah, that's a painful part. Try to look here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464 Instead of handle_close()ing you should add the fd to the list of readable ones ("r"). The call to recv() which will be coming next will then cause the socket to close (you have to add the error handling to recv() first though).
3 - I don't fully understand how data transfer works exactly but keep in mind that the transport should interact with the pollster. What I mean is that generally speaking a connected socket should *always* be readable ("r"), even when it's idle, then switch to "rw" events when sending data, then get back to "r" when all the data has been sent. This is *crucial* if you want to achieve high performance/scalability and that is why PollsterBase should probably provide a modify() method. Please take a look at what I've done here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
Hm. I am not convinced that managing this explicitly from the transport is the right solution (note that my transports are quite different from those in Twisted). But I'll keep this in mind -- I would like to set up a benchmark suite at some point. I will probably have to implement the server side of HTTP for that purpose, so I can point e.g. ab at my app.
I think you might want to apply that to something slightly higher level than the mere transport. Something like the equivalent of asynchat.push / asynchat.push_with_producer, if you'll ever want to go that far in terms of abstraction, or maybe avoid that at all but make it clear in the doc that the user should take care of that.

My point is that having a socket registered for both "r" AND "w" events when in fact you want only "r" OR "w" is an exponential waste of CPU cycles and it should be avoided either by the lib or by the user. The "old select() implementation" vs "new select() implementation" benchmark shown here reflects exactly this problem, which still affects the base asyncore module: https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6

I'll keep following the progress on this and hopefully come up with another set of questions and/or random thoughts.

--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/
On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
2012/10/29 Guido van Rossum <guido@python.org>:
On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
2012/10/29 Guido van Rossum <guido@python.org> === About polling.py ===
1 - I think DelayedCall should have a reset() method, other than just cancel().
So, essentially an uncancel()? Why not just re-register in that case? Or what's your use case? (Right now there's no problem in calling one of these many times -- it's just that cancellation is permanent.)
The most common use case is when you want to disconnect the other peer after a certain period of inactivity. Ideally you would schedule() an idle/timeout function and reset() it every time the other peer sends you some data.
Um, ok, I think you are saying that you want to be able to set timeouts and then "reset" that timeout. This is a much higher-level thing than canceling the DelayedCall object. (I have no desire to make DelayedCall have functionality like Twisted's Deferred. It is something *much* simpler; it's just the API for cancelling a callback passed to call_later(), and its other uses are similar to this.) [...]
Not very nice to use a reserved keyword, I agree. Perhaps you can keep ruling out kwargs referred to the callback function and change the current call_later signature as such:
- def call_later(self, when, callback, *args):
+ def call_later(self, when, callback, *args, errback=None):
...or maybe provide a DelayedCall.add_errback() method a-la Twisted.
I really don't want that though! But I'm glad you're not too hell-bent on supporting callbacks with keyword-only args. [...]
I should also check that I've taken care of all the various flags and other details (I recall being quite surprised that with poll(), on some platforms I need to check for POLLHUP but not on others).
Yeah, that's a painful part. Try to look here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464 Instead of handle_close()ing you should add the fd to the list of readable ones ("r"). The call to recv() which will be coming next will then cause the socket to close (you have to add the error handling to recv() first though).
Aha, are you suggesting that I close the socket when I detect that the socket is closed? But what if the other side uses shutdown() to close only one end? Depending on the protocol it might be useful to either stop reading but keep sending, or vice versa. Maybe I could detect that both ends are closed and then close the socket. Or are you suggesting something else?
3 - I don't fully understand how data transfer works exactly but keep in mind that the transport should interact with the pollster. What I mean is that generally speaking a connected socket should *always* be readable ("r"), even when it's idle, then switch to "rw" events when sending data, then get back to "r" when all the data has been sent. This is *crucial* if you want to achieve high performance/scalability and that is why PollsterBase should probably provide a modify() method. Please take a look at what I've done here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
Hm. I am not convinced that managing this explicitly from the transport is the right solution (note that my transports are quite different from those in Twisted). But I'll keep this in mind -- I would like to set up a benchmark suite at some point. I will probably have to implement the server side of HTTP for that purpose, so I can point e.g. ab at my app.
I think you might want to apply that to something slightly higher level than the mere transport.
(Apply *what*?)
Something like the equivalent of asynchat.push / asynchat.push_with_producer, if you'll ever want to go that far in terms of abstraction, or maybe avoid that at all but make it clear in the doc that the user should take care of that.
I'm actually not sufficiently familiar with asynchat to comment. I think it's got quite a different model than what I am trying to set up here.
My point is that having a socket registered for both "r" AND "w" events when in fact you want only "r" OR "w" is an exponential waste of CPU cycles and it should be avoided either by the lib or by the user.
One task can only be blocked for reading OR writing. The only way to have a socket registered for both is if there are separate tasks for reading and writing, and then presumably that is what you want. (I have a feeling you haven't fully grokked my HTTP client code yet?)
"old select() implementation" vs "new select() implementation" benchmark shown here reflects exactly this problem which still affects base asyncore module: https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6
Hm, I am already using epoll or kqueue if available, otherwise poll, falling back to select only if there's nothing else available (in practice that's only Windows). But I will diligently work towards a benchmarkable demo.
I'll keep following the progress on this and hopefully come up with another set of questions and/or random thoughts.
Thanks! -- --Guido van Rossum (python.org/~guido)
On Tue, Oct 30, 2012 at 12:03 AM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
2012/10/29 Guido van Rossum <guido@python.org>:
On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
2012/10/29 Guido van Rossum <guido@python.org> === About polling.py ===
1 - I think DelayedCall should have a reset() method, other than just cancel().
So, essentially an uncancel()? Why not just re-register in that case? Or what's your use case? (Right now there's no problem in calling one of these many times -- it's just that cancellation is permanent.)
The most common use case is when you want to disconnect the other peer after a certain period of inactivity. Ideally you would schedule() an idle/timeout function and reset() it every time the other peer sends you some data.
Um, ok, I think you are saying that you want to be able to set timeouts and then "reset" that timeout. This is a much higher-level thing than canceling the DelayedCall object. (I have no desire to make DelayedCall have functionality like Twisted's Deferred. It is something *much* simpler; it's just the API for cancelling a callback passed to call_later(), and its other uses are similar to this.)
Twisted's DelayedCall is different from Deferred; it's used for reactor.callLater and returned from that function (the same as call_later from tulip). Interface: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/interfaces.py#L... Implementation: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L35 DelayedCall from Twisted has nothing in common with Deferred; it's just an interface for scheduled activity, which can be called once, cancelled or rescheduled to another time. I've found that concept very useful when I used Twisted.
-- Thanks, Andrew Svetlov
On Mon, Oct 29, 2012 at 3:19 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
Twisted's DelayedCall is different from Deferred; it's used for reactor.callLater and returned from that function (the same as call_later from tulip). Interface: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/interfaces.py#L... Implementation: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L35 DelayedCall from Twisted has nothing in common with Deferred; it's just an interface for scheduled activity, which can be called once, cancelled or rescheduled to another time.
I've found that concept very useful when I used twisted.
Oh dear. I had no idea there was something named DelayedCall in Twisted. There is no intention of similarity. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
I can build lots of other useful things out of call_soon() and call_later() -- but I do need at least those two as "axioms".
Isn't call_soon() equivalent to call_later() with a time delay of 0? If so, then call_later() is really the only axiomatic one. -- Greg
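Greg's point in miniature: in a toy loop (illustrative only, not tulip's polling.py) with a heap-ordered queue and a simulated clock, call_soon() really is just call_later() with zero delay:

```python
import heapq
import itertools

class ToyLoop:
    def __init__(self):
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order at equal times
        self._now = 0.0               # simulated clock, for determinism

    def call_later(self, delay, callback, *args):
        heapq.heappush(self._queue, (self._now + delay, next(self._seq), callback, args))

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)  # the whole implementation

    def run(self):
        while self._queue:
            when, _, callback, args = heapq.heappop(self._queue)
            self._now = max(self._now, when)
            callback(*args)
```

A real loop would still special-case the zero-delay queue for efficiency (no clock read, no heap), but axiomatically call_later() alone suffices.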
Guido, Finally got some time to do a review & read what others posted. Some comments are more general, some are more implementation-specific (hopefully you want to hear latter ones as well) And I'm still in the process of digesting your approach & code (as I've spent too much time with my implementation)... On 2012-10-28, at 7:52 PM, Guido van Rossum <guido@python.org> wrote: [...]
polling.py: http://code.google.com/p/tulip/source/browse/polling.py [...]
1. I'd make EventLoopMixin a separate entity from pollsters, so that you'd be able to add many different pollsters to one EventLoop. This way you can have a specialized pollster for different types of IO, including UI etc.

2. Sometimes there is a need to run a coroutine in a threadpool. I know it sounds weird, but it's probably worth exploring.

3. In my framework each threadpool worker has its own local context, with various information like which Task ran the operation etc.

And a few small things:

4. epoll.poll and other syscalls need to be wrapped in try..except to catch and ignore (and log?) EINTR-type exceptions.

5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors too.
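Point 4 in sketch form; a wrapper assuming only a pollster with a poll(timeout) method (on Python 3.3 an interrupted syscall surfaces as InterruptedError):

```python
import errno

def poll_with_retry(pollster, timeout):
    """Retry poll() when a signal interrupts it instead of letting EINTR
    bubble up and kill the event loop."""
    while True:
        try:
            return pollster.poll(timeout)
        except InterruptedError:
            continue  # EINTR, as raised since Python 3.3
        except OSError as exc:
            # Belt and braces for APIs that raise a plain OSError with EINTR.
            if exc.errno == errno.EINTR:
                continue
            raise
```

Whether to log the interruption (Yury's "and log?") is a policy question for the loop.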
scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py [...]
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from. [...]
As others have suggested, I would definitely suggest adding a decorator to make coroutines more distinguishable. It would be even better if we could return a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)', instead of:

task = scheduling.Task(doit(), timeout=2.1)
task.start()
scheduling.run()

And avoid manual Task instantiation at all.

I also liked the simplicity of the Task class. I think it'd be easy to mix greenlets into it by switching to a new greenlet on each 'step'. That would give you a 'yield_()' function, which you can use in the same way you use the 'yield' statement now. (I'm not proposing to incorporate greenlets in the lib itself, but rather to provide an option to do so.) Hence there should be a way to plug your own Task (sub-)class in.

Thank you, Yury
On Mon, Oct 29, 2012 at 5:43 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Finally got some time to do a review & read what others posted.
Great!
Some comments are more general, some are more implementation-specific (hopefully you want to hear latter ones as well)
Yes!
And I'm still in the process of digesting your approach & code (as I've spent too much time with my implementation)...
Heh. :-)
On 2012-10-28, at 7:52 PM, Guido van Rossum <guido@python.org> wrote: [...]
polling.py: http://code.google.com/p/tulip/source/browse/polling.py [...]
1. I'd make EventLoopMixin a separate entity from pollsters. So that you'd be able to add many different pollsters to one EventLoop. This way you can have specialized pollster for different types of IO, including UI etc.
I came to the same conclusion, so I fixed this. See the latest version. (BTW, I also renamed add_reader() etc. on the Pollster class to register_reader() etc. -- I dislike similar APIs on different classes to have the same name if there's not a strict super class override involved.)
2. Sometimes, there is a need to run a coroutine in a threadpool. I know it sounds weird, but it's probably worth exploring.
I think that can be done quite simply. Since each thread has its own eventloop (via the magic of TLS), it's as simple as writing a function that creates a task, starts it, and then runs the eventloop. There's nothing else running in that particular thread, and its eventloop will terminate when there's nothing left to do there -- i.e. when the task is done. Sketch:

def some_generator(arg):
    ...stuff using yield from...
    return 42

def run_it_in_the_threadpool(arg):
    t = Task(some_generator(arg))
    t.start()
    scheduling.run()
    return t.result

# And in your code:
result = yield from scheduling.call_in_thread(run_it_in_the_threadpool, arg)
# Now result == 42.
3. In my framework each threadpool worker has its own local context, with various information like what Task run the operation etc.
I think I have this too -- Thread-Local Storage!
And few small things:
4. epoll.poll and other syscalls need to be wrapped in try..except to catch and ignore (and log?) EINTR type of exceptions.
Good point.
5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors too.
Do you have a code sample? I haven't found a need yet.
scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py [...]
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from. [...]
As others, I would definitely suggest adding a decorator to make coroutines more distinguishable.
That's definitely on my TODO list.
It would be even better if we could return a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)', instead of:

task = scheduling.Task(doit(), timeout=2.1)
task.start()
scheduling.run()
The run() call shouldn't be necessary unless you are at the toplevel.
And avoid manual Task instantiation at all.
Hm. I want the generator function to return just a generator object, and I can't add methods to that. But we can come up with a decent API.
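A toy sketch of one possible surface for such an API. All names here are hypothetical (this is not tulip's Task), and the "scheduler" is just a trivial run-to-completion driver that does not enforce the timeout; it only shows the shape of the chaining Yury asked for:

```python
class TaskSpec:
    """Wrapper returned by the decorator; supports chained configuration."""
    def __init__(self, gen):
        self.gen = gen
        self.timeout = None
        self.result = None

    def with_timeout(self, timeout):
        self.timeout = timeout  # a real scheduler would enforce this
        return self

    def run(self):
        # Trivial driver: step the bare generator to completion.
        try:
            while True:
                next(self.gen)
        except StopIteration as exc:
            self.result = exc.value
        return self.result

def task(genfunc):
    # Calling the decorated function builds a TaskSpec instead of a bare
    # generator, so configuration can be chained before anything runs.
    def wrapper(*args, **kwds):
        return TaskSpec(genfunc(*args, **kwds))
    return wrapper
```

So `doit().with_timeout(2.1).run()` reads the way Yury suggests, while the undecorated generator function itself stays an ordinary generator.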
I also liked the simplicity of the Task class. I think it'd be easy to mix greenlets into it by switching to a new greenlet on each 'step'. That would give you a 'yield_()' function, which you can use in the same way you use the 'yield' statement now. (I'm not proposing to incorporate greenlets in the lib itself, but rather to provide an option to do so.) Hence there should be a way to plug your own Task (sub-)class in.
Hm. Someone else will have to give that a try. Thanks for your feedback!! -- --Guido van Rossum (python.org/~guido)
On 2012-10-29, at 10:07 PM, Guido van Rossum <guido@python.org> wrote: [...]
5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors too.
Do you have a code sample? I haven't found a need yet.
Just a code dump from my epoll proactor:

if ev & EPOLLHUP:
    sock.close(_error_cls=ConnectionResetError)
    self._unschedule(fd)
    continue
if ev & EPOLLERR:
    sock.close(_error_cls=ConnectionError,
               _error_msg='socket error in epoll proactor')
    self._unschedule(fd)
    continue

[...]
It would be even better if we could return a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)', instead of:

task = scheduling.Task(doit(), timeout=2.1)
task.start()
scheduling.run()
The run() call shouldn't be necessary unless you are at the toplevel.
Yes, that's just sugar to make top-level runs more appealing. You'd also get a nice way of setting timeouts:

yield from coro().with_timeout(1.0)

[...]
I also liked the simplicity of the Task class. I think it'd be easy to mix greenlets into it by switching to a new greenlet on each 'step'. That would give you a 'yield_()' function, which you can use in the same way you use the 'yield' statement now. (I'm not proposing to incorporate greenlets in the lib itself, but rather to provide an option to do so.) Hence there should be a way to plug your own Task (sub-)class in.
Hm. Someone else will have to give that a try.
I'll be that someone once we choose the direction ;) IMO the greenlets integration is a very important topic. - Yury
Yury Selivanov wrote:
It would be even better if we could return a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)', instead of:

task = scheduling.Task(doit(), timeout=2.1)
task.start()
scheduling.run()
I would prefer spelling this something like:

scheduling.spawn(doit(), timeout=2.1)

A newly spawned task should be scheduled automatically; if you're not ready for it to run yet, then don't spawn it until you are. Also, it should almost *never* be necessary to call scheduling.run(). That should happen only in a very few places, mostly buried deep inside the scheduling/event loop system.

-- Greg
Hi,

I've been following the PEP 380-related threads and I've reviewed this stuff, while trying to do the protocols/transports PEP, and trying to glue the two together.

The biggest difference I can see is that protocols as they've been discussed are "pull": they get called when some data arrives. They don't know how much data there is; they just get told "here's some data". The obvious difference with the API in, e.g.:

https://code.google.com/p/tulip/source/browse/sockets.py#56

... is that now I have to tell a socket to read n bytes, which "blocks" the coroutine, then I get some data.

Now, there doesn't have to be an issue; you could simply say:

data = yield from s.recv(4096)  # that's the magic number usually right
proto.data_received(data)

It seems a bit boilerplatey, but I suppose that eventually could be hidden away.

But this style is pervasive; for example, that's how reading by lines works:

https://code.google.com/p/tulip/source/browse/echosvr.py#20

While I'm not a big fan (I may be convinced if I see a protocol test that looks nice), I'm just wondering if there's any point in trying to write the pull-style protocols when this works quite differently.

Additionally, I'm not sure if readline belongs on the socket. I understand the simile with files, though. With the coroutine style I could see how the most obvious fit would be something like tornado's read_until, or an as_lines that essentially calls read_until repeatedly. Can the delimiter for this be modified?

My main syntactic gripe is that when I write @inlineCallbacks code or monocle code or whatever, when I say "yield" I'm yielding to the reactor. That makes sense to me (I realize natural language arguments don't always make sense in a programming language context). "yield from" less so (but okay, that's what it has to look like).
But this just seems weird to me:

yield from trans.send(line.upper())

Not only do I not understand why I'm yielding there in the first place (I don't have to wait for anything, I just want to push some data out!), it feels like all of my yields have been replaced with yield froms for no obvious reason (well, there are reasons, I'm just trying to look at this naively).

I guess Twisted gets away with this because of deferred chaining: that one deferred might have tons of callbacks in the background, many of which are also doing IO operations, resulting in a sequence of asynchronous operations that only at the end cause the generator to be run some more.

I guess that belongs in a different thread, though. Even then, I'm not sure if I'm uncomfortable because I'm seeing something different from what I'm used to, or if my argument from English actually makes any sense whatsoever.

Speaking of protocol tests, what would those look like? How do I yell, say, "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock transport, and call the handler with that? (I realize it's early days to be thinking that far ahead; I'm just trying to figure out how I can contribute a good protocol definition to all of this.)

cheers
lvh
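For comparison, the two styles under discussion can be bridged by a small "pump" that pulls from a blocking-style read and pushes into a data_received()-style protocol. Everything here is illustrative (the protocol interface mirrors the discussion, not a fixed API), and the loop is written synchronously where tulip would use yield from:

```python
class LineProtocol:
    """Push-style protocol: it is told 'here's some data' and buffers lines."""
    def __init__(self):
        self.lines = []
        self.buf = b""

    def data_received(self, data):
        self.buf += data
        while b"\n" in self.buf:
            line, self.buf = self.buf.split(b"\n", 1)
            self.lines.append(line)

    def eof_received(self):
        pass

def pump(recv, proto, bufsize=4096):
    """Pull from recv() and push into the protocol until EOF."""
    while True:
        data = recv(bufsize)
        if not data:
            proto.eof_received()
            break
        proto.data_received(data)
```

Such a pump would itself run as a task in the coroutine world, which is roughly the "boilerplate that could be hidden away" lvh mentions.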
Sorry to chime in, but would this be a case where there could be the syntax `yield to` ? On Tue, Oct 30, 2012 at 10:12 AM, Laurens Van Houtven <_@lvh.cc> wrote:
On Tue, Oct 30, 2012 at 3:12 AM, Laurens Van Houtven <_@lvh.cc> wrote:
I've been following the PEP380-related threads and I've reviewed this stuff, while trying to do the protocols/transports PEP, and trying to glue the two together.
Thanks! I know it can't be easy to keep up with all the threads (and now code repos).
The biggest difference I can see is that protocols as they've been discussed are "pull": they get called when some data arrives. They don't know how much data there is; they just get told "here's some data". The obvious difference with the API in, eg:
https://code.google.com/p/tulip/source/browse/sockets.py#56
... is that now I have to tell a socket to read n bytes, which "blocks" the coroutine, then I get some data.
Yes. But do note that sockets.py is mostly a throw-away example written to support the only style I am familiar with -- synchronous reads and writes. My point in writing this particular set of transports is that I want to take existing synchronous code (e.g. a threaded server built using the stdlib's socketserver.ThreadingTCPServer class) and make minimal changes to the protocol logic to support async operation -- those minimal changes should boil down to using a different way to set up a connection or a listening socket or constructing a stream from a socket, and putting "yield from" in front of the blocking operations (recv(), send(), and the read/readline/write operations on the streams).

I'm still looking for guidance from Twisted and Tornado (and you!) to come up with better abstractions for transports and protocols. The underlying event loop *does* support a style where an object registers a callback function once which is called repeatedly, as long as the socket is readable (or writable, depending on the registration call).
Now, there doesn't have to be an issue; you could simply say:
data = yield from s.recv(4096)  # that's the magic number usually right
proto.data_received(data)
(Off-topic: ages ago I determined that the optimal block size is actually 8192. But for all I know it is 256K these days. :-)
It seems a bit boilerplatey, but I suppose that eventually could be hidden away.
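One hedged way that boilerplate might be hidden away: a helper coroutine that owns the recv() loop and feeds a push-style protocol. `pump` is my name for it, and the transport/protocol shapes (`recv`, `data_received`, `connection_lost`) are assumptions based on this discussion, not actual tulip or Twisted API.

```python
def pump(transport, proto, bufsize=4096):
    # pull data from a coroutine-style transport and push it into a
    # Twisted-style protocol until EOF
    while True:
        data = yield from transport.recv(bufsize)
        if not data:
            proto.connection_lost()
            return
        proto.data_received(data)
```

A pull-style user would then spawn one `pump` task per connection and never write the recv loop by hand.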
But this style is pervasive, for example that's how reading by lines works:
Right -- again, this is all geared towards making it palatable for people used to write synchronous code (either single-threaded or multi-threaded), not for people used to Twisted.
While I'm not a big fan (I may be convinced if I see a protocol test that looks nice);
Check out urlfetch() in main.py: http://code.google.com/p/tulip/source/browse/main.py#39 For sure, this isn't "pretty" and it should be rewritten using more abstraction -- I only wrote the entire thing as a single function because I was focused on the scheduler and event loop. And it is clearly missing a buffering layer for writing (it currently uses a separate send() call for each line of the HTTP headers, blech). But it implements a fairly complex (?) protocol and it performs well enough.
I'm just wondering if there's any point in trying to write the pull-style protocols when this works quite differently.
Perhaps you could try to write some pull-style transports and protocols for tulip to see if anything's missing from the scheduler and eventloop APIs or implementations? I'd be happy to rename sockets.py to push_sockets.py so there's room for a competing pull_sockets.py, and then we can compare apples to apples. (Unlike the yield vs. yield-from issue, where I am very biased, I am not biased about push vs. pull style. I just coded up what I was most familiar with first.)
Additionally, I'm not sure if readline belongs on the socket.
It isn't -- it is on the BufferedReader, which wraps around the socket (or other socket-like transport, like SSL). This is similar to the way the stdlib socket.socket class has a makefile() method that returns a stream wrapping the socket.
I understand the simile with files, though.
Right, that's where I've gotten most of my inspiration. I figure they are a good model to lure unsuspecting regular Python users in. :-)
With the coroutine style I could see how the most obvious fit would be something like tornado's read_until, or an as_lines that essentially calls read_until repeatedly. Can the delimiter for this be modified?
You can write your own BufferedReader, and if this is a common pattern we can make it a standard API. Unlike the SocketTransport and SslTransport classes, which contain various I/O hacks and integrate tightly with the polling capability of the eventloop, I consider BufferedReader plain user code. Antoine also hinted that with not too many changes we could reuse the existing buffering classes in the stdlib io module, which are implemented in C.
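As a hedged sketch of what such user code could look like, here is a buffered reader with a configurable delimiter in the spirit of Tornado's read_until. The class name and the `recv` coroutine shape of the underlying transport are assumptions; nothing here is tulip's actual BufferedReader.

```python
class DelimitedReader:
    def __init__(self, transport):
        self.transport = transport
        self.buffer = b''

    def read_until(self, delimiter=b'\r\n', bufsize=4096):
        # coroutine: accumulate data until the delimiter appears,
        # then return everything up to and including it
        while True:
            i = self.buffer.find(delimiter)
            if i >= 0:
                end = i + len(delimiter)
                line, self.buffer = self.buffer[:end], self.buffer[end:]
                return line
            data = yield from self.transport.recv(bufsize)
            if not data:
                # EOF: hand back whatever is still buffered
                line, self.buffer = self.buffer, b''
                return line
            self.buffer += data
```

Since the delimiter is just a parameter, an `as_lines` generator or a length-prefixed variant would be a few more lines of the same pattern.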
My main syntactic gripe is that when I write @inlineCallbacks code or monocle code or whatever, when I say "yield" I'm yielding to the reactor. That makes sense to me (I realize natural language arguments don't always make sense in a programming language context). "yield from" less so (but okay, that's what it has to look like). But this just seems weird to me:
yield from trans.send(line.upper())
Not only do I not understand why I'm yielding there in the first place (I don't have to wait for anything, I just want to push some data out!), it feels like all of my yields have been replaced with yield froms for no obvious reason (well, there are reasons, I'm just trying to look at this naively).
Are you talking about yield vs. yield-from here, or about the need to suspend every write? Regarding yield vs. yield-from, please squint and get used to seeing yield-from everywhere -- the scheduler implementation becomes *much* simpler and *much* faster using yield-from, so much so that there really is no competition. As to why you would have to suspend each time you call send(), that's mostly just an artefact of the incomplete example -- I didn't implement a BufferedWriter yet. I also have some worries about a task producing data at a rate faster than the socket can drain it from the buffer, but in practice I would probably relent and implement a write() call that returns immediately and should *not* be used with yield-from. (Unfortunately you can't have a call that works with or without yield-from.) I think there's a throttling mechanism in Twisted that can probably be copied here.
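A hedged sketch of the BufferedWriter idea mentioned above: write() queues data and returns immediately, so it is *not* used with yield-from; flush() is the coroutine that actually pushes the buffer through the transport's send(). The class and the flush() split are my guesses at what the missing piece could look like, not tulip code.

```python
class BufferedWriter:
    def __init__(self, transport):
        self.transport = transport
        self.buffer = []

    def write(self, data):
        # plain call, never wrapped in yield from: just queue the data
        self.buffer.append(data)

    def flush(self):
        # coroutine: drain everything queued so far in one send()
        data, self.buffer = b''.join(self.buffer), []
        if data:
            yield from self.transport.send(data)
```

This also addresses the interleaved-header problem in urlfetch(): all the header lines become cheap write() calls followed by a single suspended flush(). Throttling a producer that outruns the socket would have to live in flush() (or in a size check inside write()), which is where something like Twisted's producer/consumer mechanism would slot in.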
I guess Twisted gets away with this because of deferred chaining: that one deferred might have tons of callbacks in the background, many of which also doing IO operations, resulting in a sequence of asynchronous operations that only at the end cause the generator to be run some more.
I guess that belongs in a different thread, though. Even, then, I'm not sure if I'm uncomfortable because I'm seeing something different from what I'm used to, or if my argument from English actually makes any sense whatsoever.
Speaking of protocol tests, what would those look like? How do I yell, say, "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock transport, and call the handler with that? (I realize it's early days to be thinking that far ahead; I'm just trying to figure out how I can contribute a good protocol definition to all of this).
Actually I think the ease of writing tests should definitely be taken into account when designing the APIs here. In the Zope world, Jim Fulton wrote a simple abstraction for networking code that explicitly provides for testing: http://packages.python.org/zc.ngi/ (it also supports yield-style callbacks, similar to Twisted's inlineCallbacks).

I currently don't have any tests, apart from manually running main.py and checking its output. I am a bit hesitant to add unit tests in this early stage, because keeping the tests passing inevitably slows down the process of ripping apart the API and rebuilding it in a different way -- something I do at least once a day, whenever I get feedback or a clever thought strikes me or something annoying reaches my trigger level. But I should probably write at least *some* tests, I'm sure it will be enlightening and I will end up changing the APIs to make testing easier. It's in the TODO.

--Guido van Rossum (python.org/~guido)
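The mock-transport style lvh asks about could be as simple as the following hedged sketch: a transport whose recv() replays scripted chunks and whose send() records writes, plus a driver that runs a coroutine to completion without any event loop. All names here are illustrative; nothing is tulip API.

```python
class MockTransport:
    def __init__(self, chunks=()):
        self.chunks = list(chunks)
        self.sent = []

    def recv(self, n):
        if False:
            yield  # make this a generator so it works with yield from
        return self.chunks.pop(0) if self.chunks else b''

    def send(self, data):
        if False:
            yield
        self.sent.append(data)

def run_task(coro):
    # drive a coroutine that never actually blocks on real I/O
    try:
        while True:
            next(coro)
    except StopIteration as e:
        return getattr(e, 'value', None)

# a toy handler under test: read the request line, send a canned reply
def handler(conn):
    req = yield from conn.recv(4096)
    yield from conn.send(b'HTTP/1.1 200 OK\r\n\r\n')
    return req
```

A protocol test then yells "POST /blah HTTP/1.1\r\n" at the handler by scripting it into the transport and asserting on `sent` afterwards, with no sockets involved.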
On Tue, 30 Oct 2012 10:34:12 -0700 Guido van Rossum <guido@python.org> wrote:
Speaking of protocol tests, what would those look like? How do I yell, say, "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock transport, and call the handler with that? (I realize it's early days to be thinking that far ahead; I'm just trying to figure out how I can contribute a good protocol definition to all of this).
Actually I think the ease of writing tests should definitely be taken into account when designing the APIs here.
+11 ! Regards Antoine.
Finally getting around to this one... I am sorry if I'm repeating any criticism that has already been rehashed in this thread. There is really a deluge of mail here and I can't keep up with it. I've skimmed some of it and avoided or noted things that I did see mentioned, but I figured I should write up something before next week.

To make a long story short, my main points here are:

- I think tulip unfortunately has a lot of the problems I tried to describe in earlier messages,
- it would be really great if we could have a core I/O interface that we could use for interoperability with Twisted before bolting a requirement for coroutine trampolines on to everything,
- twisted-style protocol/transport separation is really important and this should not neglect it. As I've tried to illustrate in previous messages, an API where applications have to call send() or recv() is just not going to behave intuitively in edge cases or perform well,
- I know it's a prototype, but this isn't such an unexplored area that it should be developed without TDD: all this code should both have tests and provide testing support to show how applications that use it can be tested,
- the scheduler module needs some example implementation of something like Twisted's gatherResults for me to critique its expressiveness; it looks like it might be missing something in the area of one task coordinating multiple others but I can't tell

On Oct 28, 2012, at 4:52 PM, Guido van Rossum <guido at python.org> wrote:
The pollster has a very simple API: add_reader(fd, callback, *args), add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and poll(timeout) -> list of events. (fd means file descriptor.) There's also pollable() which just checks if there are any fds registered. My implementation requires fd to be an int, but that could easily be extended to support other types of event sources.
I don't see how that is. All of the mechanisms I would leverage within Twisted to support other event sources are missing (e.g.: abstract interfaces for those event sources). Are you saying that a totally different pollster could just accept a different type to add_reader, and not an integer? If so, how would application code know how to construct something else?
I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong).
add_reader and add_writer are an important internal layer of the API for UNIX-like operating systems, but the design here is fundamentally flawed in that application code (e.g. echosvr.py) needs to import concrete socket-handling classes like SocketTransport and BufferedReader in order to synthesize a transport. These classes might need to vary their behavior significantly between platforms, and application code should not be manipulating them unless there is a serious low-level need to.

It looks like you've already addressed the fact that some transports need to be platform-specific. That's not quite accurate, unless you take a very broad definition of "platform". In Twisted, the basic socket-based TCP transport is actually supported across all platforms; but some other *APIs* are platform-specific (well, let's be honest, right now, just IOCP, but there have been others, such as Java's native I/O APIs under Jython, in the past).

You have to ask the "pollster" (by which I mean: reactor) for transport objects, because different multiplexing mechanisms can require different I/O APIs, even for basic socket I/O. This is why I keep talking about IOCP. It's not that Windows is particularly great, but that the IOCP API, if used correctly, is fairly alien, and is a good proxy for other use-cases which are less direct to explain, like interacting with GUI libraries where you need to interact with the GUI's notion of a socket to get notifications, rather than a raw FD. (GUI libraries often do this because they have to support Windows and therefore IOCP.) Others in this thread have already mentioned the fact that ZeroMQ requires the same sort of affordance. This is really a design error on 0MQ's part, but you have to deal with it anyway ;-).

More importantly, concretely tying everything to sockets is just bad design. You want to be able to operate on pipes and PTYs (which need to call read(), or a bunch of gross ioctl()s and then read(), not recv()).
You want to be able to operate on these things in unit tests without involving any actual file descriptors or syscalls. The higher level of abstraction makes regular application code a lot shorter, too: I was able to compress echosvr.py down to 22 lines by removing all the comments and logging and such, but that is still more than twice as long as the (9 line) echo server example on the front page of <http://twistedmatrix.com/trac/>. It's closer in length to the (19 line) full line-based publish/subscribe protocol over on the third tab.

Also, what about testing? You want to be able to simulate the order of responses of multiple syscalls to coerce your event-driven program to receive its events in different orders. One of the big advantages of event-driven programming is that everything's just a method call, so your unit tests can just call the methods to deliver data to your program and see what it does, without needing a large, elaborate simulation edifice to pretend to be a socket. But, once you mix in the magic of the generator trampoline, it's somewhat hard to assemble your own working environment without some kind of test event source; at least, it's not clear to me how to assemble a Task without having a pollster anywhere, or how to make my own basic pollster for testing.
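A test pollster along the lines Glyph asks for might look like this hedged sketch: it implements the add_reader/add_writer/remove_reader/remove_writer/pollable/poll API quoted earlier, but poll() returns whatever callbacks the test has scripted as "ready" instead of calling select(). Returning (callback, args) pairs is a guess at the event shape described for polling.py; details may differ from tulip.

```python
class FakePollster:
    def __init__(self):
        self.readers = {}
        self.writers = {}
        self.ready = []  # fds the test has marked readable/writable

    def add_reader(self, fd, callback, *args):
        self.readers[fd] = (callback, args)

    def add_writer(self, fd, callback, *args):
        self.writers[fd] = (callback, args)

    def remove_reader(self, fd):
        self.readers.pop(fd, None)

    def remove_writer(self, fd):
        self.writers.pop(fd, None)

    def pollable(self):
        return bool(self.readers or self.writers)

    def poll(self, timeout=None):
        # deliver events for scripted fds that are still registered
        events = [self.readers[fd] for fd in self.ready
                  if fd in self.readers]
        self.ready = []
        return events
```

If the event loop took the pollster as a constructor argument rather than mixing it in, a test could hand it one of these and drive Tasks deterministically, which is roughly the decoupling being argued for.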
The event loop has two basic ways to register callbacks: call_soon(callback, *args) causes callback(*args) to be called the next time the event loop runs; call_later(delay, callback, *args) schedules a callback at some time (relative or absolute) in the future.
"relative or absolute" is hiding the whole monotonic-clocks discussion behind a simple phrase, but that probably does not need to be resolved here... I'll let you know if we ever figure it out :).
sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports, but they are different from transports in Twisted; they are closer to idealized sockets. SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from.
I feel I should note that these methods behave inconsistently; send() behaves as sendall(), re-trying its writes until it receives a full buffer, but recv() may yield a short read. (But most importantly, block_r and block_w are insufficient as primitives; you need a separate pollster that uses write_then_block(data) and read_then_block() too, which may need to dispatch to WSASend/WSARecv or WriteFile/ReadFile.)
SslTransport wraps an ssl socket (luckily in Python 2.6 and up, stdlib ssl sockets have good async support!).
stdlib ssl sockets have async support that makes a number of UNIX-y assumptions. The wrap_socket trick doesn't work with IOCP, because the I/O operations are initiated within the SSL layer, and therefore can't be associated with a completion port, so they won't cause a queued completion status trigger and therefore won't wake up the loop. This plagued us for many years within Twisted and has only relatively recently been fixed: <http://tm.tl/593>.

Since probably 99% of the people on this list don't actually give a crap about Windows, let me give a more practical example: you can't do SSL over a UNIX pipe. Off the top of my head, this means you can't write a command-line tool to encrypt a connection via a shell pipeline, but there are many other cases where you'd expect to be able to get arbitrary I/O over stdout.

It's reasonable, of course, for lots of Python applications to not care about high-performance, high-concurrency SSL on Windows; select() works okay for many applications on Windows. And most SSL happens on sockets, not pipes, hence the existence of the OpenSSL API that the stdlib ssl module exposes for wrapping sockets. But, as I'll explain in a moment, this is one reason that it's important to be able to give your code a turbo boost with Twisted (or other third-party extensions) once you start encountering problems like this.
I don't particularly care about the exact abstractions in this module; they are convenient and I was surprised how easy it was to add SSL, but still these mostly serve as somewhat realistic examples of how to use scheduling.py.
This is where I think we really differ.

I think that the whole attempt to build a coroutine scheduler at the low level is somewhat misguided and will encourage people to write misleading, sloppy, incorrect programs that will be tricky to debug (although, to be fair, not quite as tricky as even more misleading/sloppy/incorrect multi-threaded ones). However, I'm more than happy to agree to disagree on this point: clearly you think that forests of yielding coroutines are a big part of the future of Python. Maybe you're even right to do so, since I have no interest in adding language features, whereas if you hit a rough edge in 'yield' syntax you can sand it off rather than living with it. I will readily concede that 'yield from' and 'return' are nicer than the somewhat ad-hoc idioms we ended up having to contend with in the current iteration of @inlineCallbacks. (Except for the exit-at-a-distance problem, which it doesn't seem that return->StopIteration addresses - does this happen with PEP-380 generators? <http://twistedmatrix.com/trac/ticket/4157>)

What I'm not happy to disagree about is the importance of a good I/O abstraction and interoperation layer.

Twisted is not going away; there are oodles of good reasons that it's built the way it is, as I've tried to describe in this and other messages, and none of our plans for its future involve putting coroutine trampolines at the core of the event loop; those are just fine over on the side with inlineCallbacks. However, lots of Python programmers are going to use what you come up with. They'd use it even if it didn't really work, just because it's bundled in and it's convenient. But I think it'll probably work fine for many tasks, and it will appeal to lots of people new to event-driven I/O because of the seductive deception of synchronous control flow and the superiority to scheduling I/O operations with threads.
What I think is really very important in the design of this new system is to present an API whereby:

- if someone wants to write a basic protocol or data-format parser for the stdlib, it should be easy to write it as a feed parser without needing generator coroutines (for example, if they're pushing data into a C library, they shouldn't have to write a while loop that calls recv; they should be able to just transform some data callback in Python into some data callback in C); it should be able to leverage tulip without much more work,
- if users of tulip (read: the stdlib) need access to some functionality implemented within Twisted, like an event-driven DNS client that is more scalable than getaddrinfo, they can call into Twisted without re-writing their entire program,
- if users of Twisted need to invoke some functionality implemented on top of tulip, they can construct a task and weave in a scheduler, similarly without re-writing much,
- if users of tulip want to just use Twisted to get better performance or reliability than the built-in stdlib multiplexor, they ideally shouldn't have to change anything, just run it with a different import line or something, and
- if (when) users of tulip realize that their generators have devolved into a mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven callbacks and maybe some formal state machines or generated parsers to deal with their inputs, that process can be done incrementally and not in one giant shoot-the-moon effort which will make them hate Twisted.

As an added bonus, such an API would provide a great basis for Tornado and Twisted to interoperate.
It would also be nice to have a more discrete I/O layer to insulate application code from common foibles like the fact that, for example, if you call send() in tulip multiple times but forget to 'yield from ...send()', you may end up writing interleaved garbage on the connection, then raising an assertion error, but only if there's a sufficient quantity of data and it needs to block; it will otherwise appear to work, leading to bugs that only start happening when you are pushing large volumes of data through a system at rates exceeding wire speed. In other words, "only in production, only during the holiday season, only during traffic spikes, only when it's really really important for the system to keep working".

This is why I think that step 1 here needs to be a common low-level API for event-triggered operations that does not have anything to do with generators. I don't want to stop you from doing interesting things with generators, but I do really want to decouple the tasks so that their responsibilities are not unnecessarily conflated. task.unblock() is a method; protocol.data_received is a method. Both can be invoked at the same level by an event loop. Once that low-level event loop is delivering data to that callback's satisfaction, the callbacks can happily drive a coroutine scheduler, and the coroutine scheduler can have much less of a deep integration with the I/O itself; it just needs some kind of sentinel object (a Future, a Deferred) to keep track of what exactly it's waiting for.
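The layering described here can be sketched in a few lines. This is a hedged toy, not tulip or Twisted code: the event loop side only ever makes a plain method call (data_received), and the coroutine side waits on a sentinel that the callback completes. `Future` here is a stand-in I wrote for illustration, not concurrent.futures.Future.

```python
class Future:
    def __init__(self):
        self.result = None
        self.done = False

    def set_result(self, value):
        self.result, self.done = value, True

    def __iter__(self):
        # what "yield from future" executes: suspend until completed
        while not self.done:
            yield self
        return self.result

class BridgeProtocol:
    def __init__(self):
        self.waiter = Future()

    def data_received(self, data):
        # invoked by the event loop as a plain method call,
        # with no knowledge of generators
        self.waiter.set_result(data)

def waiter_coro(proto):
    # coroutine side: blocks on the sentinel, not on a socket
    data = yield from proto.waiter
    return data.upper()
```

The scheduler never touches I/O; it only knows about the sentinel, which is exactly the decoupling being argued for.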
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
It looks to me like there's a design error in scheduling.py with respect to coordinating concurrent operations. If you try to block on two operations at once, you'll get an assertion error ('assert not self.blocked', in block), so you can't coordinate two interesting I/O requests without spawning a bunch of new Tasks and then having them unblock their parent Task when they're done. I may just be failing to imagine how one would implement something like Twisted's gatherResults, but this looks like it would be frustrating, tedious, and involve creating lots of extra objects and making the scheduler do a bunch more work. Also, shouldn't there be a lot more real exceptions and a lot fewer assertions in this code?

Relatedly, add_reader/writer will silently stomp on a previous FD registration, so if two tasks end up calling recv() on the same socket, it doesn't look like there's any way to find out that they both did that. It looks like the first task to call it will just hang forever, and the second one will "win"? What are the intended semantics?

Speaking from the perspective of I/O scheduling, it will also be thrashing any stateful multiplexor with a ton of unnecessary syscalls. A Twisted protocol in normal operation just receiving data from a single connection, using, let's say, a kqueue-based multiplexor will call kevent() once to register interest, then kqueue() to block, and then just keep getting data-available notifications and processing them unless some downstream buffer fills up and the transport is told to pause producing data, at which point another kevent() gets issued. tulip, by contrast, will call kevent() over and over again, removing and then re-adding its reader repeatedly for every packet, since it can never know if someone is about to call recv() again any time soon.
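For concreteness, here is a hedged sketch of the kind of gather() being asked for, built on nothing but plain generators: a tiny round-robin trampoline that steps several coroutines and collects their results, roughly what Twisted's gatherResults provides. It deliberately uses none of tulip's Task machinery, since the question is precisely how that machinery would express this.

```python
def gather(*coros):
    # run the given coroutines to completion, round-robin, and return
    # their results in order; a real scheduler would suspend on I/O
    # instead of busy-stepping
    results = [None] * len(coros)
    pending = dict(enumerate(coros))
    while pending:
        for i in list(pending):
            try:
                next(pending[i])  # advance one step
            except StopIteration as e:
                results[i] = e.value
                del pending[i]
    return results
```

Whether scheduling.py can express this without spawning one Task per child and hand-wiring the unblock calls back to the parent is exactly the expressiveness question raised above.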
Once again, request/response is not the best model for retrieving data from a transport; active connections need to be prepared to receive more data at any time and not in response to any particular request. Finally, apologies for spelling / grammar errors; I didn't have a lot of time to copy-edit. -glyph
It's been a week, and nobody has responded to Glyph's email. I don't think I know enough to agree or disagree with what he said, but it was well-written and it looked important. Also, Glyph has a lot of experience with this sort of thing, and it would be a shame if he was discouraged by the lack of response. We can't really expect people to contribute if their opinions are ignored. Can relevant people please take another look at his post? -- Devin On Wed, Oct 31, 2012 at 6:10 AM, Glyph <glyph@twistedmatrix.com> wrote:
Finally getting around to this one...
I am sorry if I'm repeating any criticism that has already been rehashed in this thread. There is really a deluge of mail here and I can't keep up with it. I've skimmed some of it and avoided or noted things that I did see mentioned, but I figured I should write up something before next week.
To make a long story short, my main points here are:
- I think tulip unfortunately has a lot of the problems I tried to describe in earlier messages,
- it would be really great if we could have a core I/O interface that we could use for interoperability with Twisted before bolting a requirement for coroutine trampolines on to everything,
- twisted-style protocol/transport separation is really important and this should not neglect it. As I've tried to illustrate in previous messages, an API where applications have to call send() or recv() is just not going to behave intuitively in edge cases or perform well,
- I know it's a prototype, but this isn't such an unexplored area that it should be developed without TDD: all this code should both have tests and provide testing support to show how applications that use it can be tested,
- the scheduler module needs some example implementation of something like Twisted's gatherResults for me to critique its expressiveness; it looks like it might be missing something in the area of one task coordinating multiple others but I can't tell
On Oct 28, 2012, at 4:52 PM, Guido van Rossum <guido at python.org> wrote:
The pollster has a very simple API: add_reader(fd, callback, *args),
add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and poll(timeout) -> list of events. (fd means file descriptor.) There's also pollable() which just checks if there are any fds registered. My implementation requires fd to be an int, but that could easily be extended to support other types of event sources.
I don't see how that is. All of the mechanisms I would leverage within Twisted to support other event sources are missing (e.g.: abstract interfaces for those event sources). Are you saying that a totally different pollster could just accept a different type to add_reader, and not an integer? If so, how would application code know how to construct something else?
I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong).
add_reader and add_writer are an important internal layer of the API for UNIX-like operating systems, but the design here is fundamentally flawed in that application code (e.g. echosvr.py) needs to import concrete socket-handling classes like SocketTransport and BufferedReader in order to synthesize a transport. These classes might need to vary their behavior significantly between platforms, and application code should not be manipulating them unless there is a serious low-level need to.
It looks like you've already addressed the fact that some transports need to be platform-specific. That's not quite accurate, unless you take a very broad definition of "platform". In Twisted, the basic socket-based TCP transport is actually supported across all platforms; but some other *APIs* are platform-specific (well, let's be honest, right now, just IOCP, but there have been others, such as Java's native I/O APIs under Jython, in the past).
You have to ask the "pollster" (by which I mean: reactor) for transport objects, because different multiplexing mechanisms can require different I/O APIs, even for basic socket I/O. This is why I keep talking about IOCP. It's not that Windows is particularly great, but that the IOCP API, if used correctly, is fairly alien, and is a good proxy for other use-cases which are less direct to explain, like interacting with GUI libraries where you need to interact with the GUI's notion of a socket to get notifications, rather than a raw FD. (GUI libraries often do this because they have to support Windows and therefore IOCP.) Others in this thread have already mentioned the fact that ZeroMQ requires the same sort of affordance. This is really a design error on 0MQ's part, but, you have to deal with it anyway ;-).
More importantly, concretely tying everything to sockets is just bad design. You want to be able to operate on pipes and PTYs (which need to call read(), or a bunch of gross ioctl()s and then read(), not recv()). You want to be able to operate on these things in unit tests without involving any actual file descriptors or syscalls. The higher level of abstraction makes regular application code a lot shorter, too: I was able to compress echosvr.py down to 22 lines by removing all the comments and logging and such, but that is still more than twice as long as the (9 line) echo server example on the front page of <http://twistedmatrix.com/trac/>. It's closer in length to the (19 line) full line-based publish/subscribe protocol over on the third tab.
Also, what about testing? You want to be able to simulate the order of responses of multiple syscalls to coerce your event-driven program to receive its events in different orders. One of the big advantages of event driven programming is that everything's just a method call, so your unit tests can just call the methods to deliver data to your program and see what it does, without needing to have a large, elaborate simulation edifice to pretend to be a socket. But, once you mix in the magic of the generator trampoline, it's somewhat hard to assemble your own working environment without some kind of test event source; at least, it's not clear to me how to assemble a Task without having a pollster anywhere, or how to make my own basic pollster for testing.
The event loop has two basic ways to register callbacks: call_soon(callback, *args) causes callback(*args) to be called the next time the event loop runs; call_later(delay, callback, *args) schedules a callback at some time (relative or absolute) in the future.
"relative or absolute" is hiding the whole monotonic-clocks discussion behind a simple phrase, but that probably does not need to be resolved here... I'll let you know if we ever figure it out :).
sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports, but they are different from transports in Twisted; they are closer to idealized sockets. SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from.
I feel I should note that these methods behave inconsistently; send() behaves as sendall(), re-trying its writes until it receives a full buffer, but recv() may yield a short read.
(But most importantly, block_r and block_w are insufficient as primitives; you need a separate pollster that uses write_then_block(data) and read_then_block() too, which may need to dispatch to WSASend/WSARecv or WriteFile/ReadFile.)
SslTransport wraps an ssl socket (luckily in Python 2.6 and up, stdlib ssl sockets have good async support!).
stdlib ssl sockets have async support that makes a number of UNIX-y assumptions. The wrap_socket trick doesn't work with IOCP, because the I/O operations are initiated within the SSL layer, and therefore can't be associated with a completion port, so they won't cause a queued completion status trigger and therefore won't wake up the loop. This plagued us for many years within Twisted and has only relatively recently been fixed: <http://tm.tl/593>.
Since probably 99% of the people on this list don't actually give a crap about Windows, let me give a more practical example: you can't do SSL over a UNIX pipe. Off the top of my head, this means you can't write a command-line tool to encrypt a connection via a shell pipeline, but there are many other cases where you'd expect to be able to get arbitrary I/O over stdout.
It's reasonable, of course, for lots of Python applications to not care about high-performance, high-concurrency SSL on Windows; select() works okay for many applications on Windows. And most SSL happens on sockets, not pipes, hence the existence of the OpenSSL API that the stdlib ssl module exposes for wrapping sockets. But, as I'll explain in a moment, this is one reason that it's important to be able to give your code a turbo boost with Twisted (or other third-party extensions) once you start encountering problems like this.
I don't particularly care about the exact abstractions in this module; they are convenient and I was surprised how easy it was to add SSL, but still these mostly serve as somewhat realistic examples of how to use scheduling.py.
This is where I think we really differ.
I think that the whole attempt to build a coroutine scheduler at the low level is somewhat misguided and will encourage people to write misleading, sloppy, incorrect programs that will be tricky to debug (although, to be fair, not quite as tricky as even more misleading/sloppy/incorrect multi-threaded ones). However, I'm more than happy to agree to disagree on this point: clearly you think that forests of yielding coroutines are a big part of the future of Python. Maybe you're even right to do so, since I have no interest in adding language features, whereas if you hit a rough edge in 'yield' syntax you can sand it off rather than living with it. I will readily concede that 'yield from' and 'return' are nicer than the somewhat ad-hoc idioms we ended up having to contend with in the current iteration of @inlineCallbacks. (Except for the exit-at-a-distance problem, which it doesn't seem that return->StopIteration addresses; does this happen with PEP-380 generators? <http://twistedmatrix.com/trac/ticket/4157>)
What I'm not happy to disagree about is the importance of a good I/O abstraction and interoperation layer.
Twisted is not going away; there are oodles of good reasons that it's built the way it is, as I've tried to describe in this and other messages, and none of our plans for its future involve putting coroutine trampolines at the core of the event loop; those are just fine over on the side with inlineCallbacks. However, lots of Python programmers are going to use what you come up with. They'd use it even if it didn't really work, just because it's bundled in and it's convenient. But I think it'll probably work fine for many tasks, and it will appeal to lots of people new to event-driven I/O because of the seductive deception of synchronous control flow and the superiority to scheduling I/O operations with threads.
What I think is really very important in the design of this new system is to present an API whereby:
- if someone wants to write a basic protocol or data-format parser for the stdlib, it should be easy to write it as a feed parser without needing generator coroutines (for example, if they're pushing data into a C library, they shouldn't have to write a while loop that calls recv; they should be able to just transform some data callback in Python into some data callback in C), and it should be able to leverage tulip without much more work;
- if users of tulip (read: the stdlib) need access to some functionality implemented within Twisted, like an event-driven DNS client that is more scalable than getaddrinfo, they can call into Twisted without re-writing their entire program;
- if users of Twisted need to invoke some functionality implemented on top of tulip, they can construct a task and weave in a scheduler, similarly without re-writing much;
- if users of tulip want to just use Twisted to get better performance or reliability than the built-in stdlib multiplexor, they ideally shouldn't have to change anything, just run it with a different import line or something; and
- if (when) users of tulip realize that their generators have devolved into a mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven callbacks and maybe some formal state machines or generated parsers to deal with their inputs, that process can be done incrementally and not in one giant shoot-the-moon effort which will make them hate Twisted.
As an added bonus, such an API would provide a great basis for Tornado and Twisted to interoperate.
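The first point — a feed parser that needs no generator coroutines — can be sketched in a few lines. The class name and the `data_received` hook here are hypothetical names in the Twisted style, not tulip's actual API:

```python
# Hypothetical feed-parser: the event loop pushes bytes in via a callback;
# there is no recv() loop and no coroutine anywhere in the parsing code.
class LineParser:
    """Accumulate bytes and hand complete lines to a callback."""

    def __init__(self, line_callback):
        self.buffer = b""
        self.line_callback = line_callback  # could just as easily be a C function

    def data_received(self, data):
        self.buffer += data
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.line_callback(line)
```

Because parsing is driven by plain method calls, a unit test can deliver bytes in any chunking it likes, with no sockets or scheduler involved.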
It would also be nice to have a more discrete I/O layer to insulate application code from common foibles like the fact that, for example, if you call send() in tulip multiple times but forget to 'yield from ...send()', you may end up writing interleaved garbage on the connection, then raising an assertion error, but only if there's a sufficient quantity of data and it needs to block; it will otherwise appear to work, leading to bugs that only start happening when you are pushing large volumes of data through a system at rates exceeding wire speed. In other words, "only in production, only during the holiday season, only during traffic spikes, only when it's really really important for the system to keep working".
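Part of that footgun is that calling a generator-based send() without 'yield from' silently does nothing until something drives the generator. A toy model (this is not tulip's code — the real failure mode there involves optimistic writes and interleaving, as described above — but it shows the silent no-op half of the hazard):

```python
# Toy model: a coroutine-style send() whose body only runs when iterated.
def send(out_buffer, data):
    out_buffer.append(data)  # pretend this is the actual write
    yield                    # stand-in for 'yield from block_w(fd)'

written = []
send(written, b"hello")            # forgot 'yield from': creates a generator,
assert written == []               # ...and no bytes are ever written

for _ in send(written, b"hello"):  # driving it (as the scheduler would) works
    break
assert written == [b"hello"]
```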
This is why I think that step 1 here needs to be a common low-level API for event-triggered operations that does not have anything to do with generators. I don't want to stop you from doing interesting things with generators, but I do really want to decouple the tasks so that their responsibilities are not unnecessarily conflated.
task.unblock() is a method; protocol.data_received is a method. Both can be invoked at the same level by an event loop. Once that low-level event loop is delivering data to that callback's satisfaction, the callbacks can happily drive a coroutine scheduler, and the coroutine scheduler can have much less of a deep integration with the I/O itself; it just needs some kind of sentinel object (a Future, a Deferred) to keep track of what exactly it's waiting for.
I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice.
It looks to me like there's a design error in scheduling.py with respect to coordinating concurrent operations. If you try to block on two operations at once, you'll get an assertion error ('assert not self.blocked', in block), so you can't coordinate two interesting I/O requests without spawning a bunch of new Tasks and then having them unblock their parent Task when they're done. I may just be failing to imagine how one would implement something like Twisted's gatherResults, but this looks like it would be frustrating, tedious, and involve creating lots of extra objects and making the scheduler do a bunch more work.
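For comparison, here is roughly what gatherResults needs from its event system: just a completion callback per operation, with no extra tasks and no scheduler involvement. The Deferred here is a stripped-down stand-in, not Twisted's actual class:

```python
# Stripped-down stand-in for a Deferred: a result plus completion callbacks.
class Deferred:
    def __init__(self):
        self.done, self.result, self._callbacks = False, None, []

    def add_callback(self, fn):
        if self.done:
            fn(self.result)
        else:
            self._callbacks.append(fn)

    def fire(self, result):
        self.done, self.result = True, result
        for fn in self._callbacks:
            fn(result)

def gather_results(deferreds):
    """Fire one aggregate Deferred once every input has fired."""
    out = Deferred()
    results = [None] * len(deferreds)
    remaining = [len(deferreds)]
    if not deferreds:
        out.fire(results)
        return out
    def make_cb(i):
        def cb(value):
            results[i] = value
            remaining[0] -= 1
            if remaining[0] == 0:
                out.fire(results)
        return cb
    for i, d in enumerate(deferreds):
        d.add_callback(make_cb(i))
    return out
```

A scheduler that only lets a task block on one thing at a time has to simulate this with one child Task per pending operation, plus bookkeeping to reunite them with the parent.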
Also, shouldn't there be a lot more real exceptions and a lot fewer assertions in this code?
Relatedly, add_reader/writer will silently stomp on a previous FD registration, so if two tasks end up calling recv() on the same socket, it doesn't look like there's any way to find out that they both did that. It looks like the first task to call it will just hang forever, and the second one will "win"? What are the intended semantics?
Speaking from the perspective of I/O scheduling, it will also be thrashing any stateful multiplexor with a ton of unnecessary syscalls. A Twisted protocol in normal operation just receiving data from a single connection, using, let's say, a kqueue-based multiplexor will call kevent() once to register interest, then kqueue() to block, and then just keep getting data-available notifications and processing them unless some downstream buffer fills up and the transport is told to pause producing data, at which point another kevent() gets issued. tulip, by contrast, will call kevent() over and over again, removing and then re-adding its reader repeatedly for every packet, since it can never know if someone is about to call recv() again any time soon. Once again, request/response is not the best model for retrieving data from a transport; active connections need to be prepared to receive more data at any time and not in response to any particular request.
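The contrast is easy to quantify with a fake pollster that merely counts registration changes (no real kevent() here; the method names mirror polling.py's API but the classes are invented):

```python
# Fake pollster: each add/remove stands in for one kevent() change submission.
class CountingPollster:
    def __init__(self):
        self.changes = 0

    def add_reader(self, fd, callback):
        self.changes += 1

    def remove_reader(self, fd):
        self.changes += 1

def recv_style(pollster, packets):
    """recv()-per-packet: register interest, wake, unregister -- every time."""
    for _ in range(packets):
        pollster.add_reader(0, lambda: None)
        pollster.remove_reader(0)

def protocol_style(pollster, packets):
    """data_received-style: register once; packets are simply delivered."""
    pollster.add_reader(0, lambda: None)
    # ...the loop delivers each packet to the protocol; no further changes...

p1, p2 = CountingPollster(), CountingPollster()
recv_style(p1, 1000)      # two registration changes per packet
protocol_style(p2, 1000)  # one registration change total
```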
Finally, apologies for spelling / grammar errors; I didn't have a lot of time to copy-edit.
-glyph
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Glyph and three other Twisted developers visited me yesterday. All is well. We're behind in reporting -- I have a variety of trips and other activities coming up, but I am still very much planning to act on what we discussed. (And no, they didn't convince me to add Twisted to the stdlib. :-) --Guido

On Wed, Nov 7, 2012 at 1:11 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
It's been a week, and nobody has responded to Glyph's email. I don't think I know enough to agree or disagree with what he said, but it was well-written and it looked important. Also, Glyph has a lot of experience with this sort of thing, and it would be a shame if he was discouraged by the lack of response. We can't really expect people to contribute if their opinions are ignored.
Can relevant people please take another look at his post?
-- Devin
-- --Guido van Rossum (python.org/~guido)
participants (20)

- Andrew Svetlov
- Antoine Pitrou
- Cesare Di Mauro
- Devin Jeanpierre
- Don Spaulding
- Giampaolo Rodolà
- Glyph
- Greg Ewing
- Guido van Rossum
- Jakob Bowyer
- Kristján Valur Jónsson
- Laurens Van Houtven
- Mark Hackett
- Nick Coghlan
- Paul Colomiets
- Rene Nejsum
- Richard Oudkerk
- Steve Dower
- Terry Reedy
- Yury Selivanov