On Tue, Oct 30, 2012 at 3:12 AM, Laurens Van Houtven <_@lvh.cc> wrote:
I've been following the PEP380-related threads and I've reviewed this stuff, while trying to do the protocols/transports PEP, and trying to glue the two together.
Thanks! I know it can't be easy to keep up with all the threads (and now code repos).
The biggest difference I can see is that protocols as they've been discussed are "pull": they get called when some data arrives. They don't know how much data there is; they just get told "here's some data". The obvious difference with the API in, eg:
https://code.google.com/p/tulip/source/browse/sockets.py#56
... is that now I have to tell a socket to read n bytes, which "blocks" the coroutine, then I get some data.
Yes. But do note that sockets.py is mostly a throw-away example written to support the only style I am familiar with -- synchronous reads and writes. My point in writing this particular set of transports is that I want to take existing synchronous code (e.g. a threaded server built using the stdlib's socketserver.ThreadingTCPServer class) and make minimal changes to the protocol logic to support async operation -- those minimal changes should boil down to using a different way to set up a connection or a listening socket or constructing a stream from a socket, and putting "yield from" in front of the blocking operations (recv(), send(), and the read/readline/write operations on the streams). I'm still looking for guidance from Twisted and Tornado (and you!) to come up with better abstractions for transports and protocols. The underlying event loop *does* support a style where an object registers a callback function once which is called repeatedly, as long as the socket is readable (or writable, depending on the registration call).
Now, there doesn't have to be an issue; you could simply say:
data = yield from s.recv(4096)  # that's the magic number usually right
proto.data_received(data)
(Off-topic: ages ago I determined that the optimal block size is actually 8192. But for all I know it is 256K these days. :-)
It seems a bit boilerplatey, but I suppose that eventually could be hidden away.
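To make the recv()/data_received() glue above concrete, here is a minimal sketch of a coroutine that reads from a socket and hands each block to a callback-style protocol. FakeSocket, CollectingProtocol, pump() and run() are all hypothetical names invented for this illustration; none of them come from tulip, and the toy run() loop only stands in for a real scheduler.

```python
class FakeSocket:
    """Hypothetical stand-in for a socket transport (not tulip's class)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, n):
        # A real transport would suspend here until the socket is readable;
        # the bare yield marks that suspension point.
        yield
        if not self._chunks:
            return b""  # empty bytes means EOF, as with socket.recv()
        chunk = self._chunks.pop(0)
        data, rest = chunk[:n], chunk[n:]
        if rest:
            self._chunks.insert(0, rest)
        return data


class CollectingProtocol:
    """Hypothetical callback-style protocol: it is told when data arrives."""

    def __init__(self):
        self.received = []
        self.closed = False

    def data_received(self, data):
        self.received.append(data)

    def connection_lost(self):
        self.closed = True


def pump(sock, proto, blocksize=4096):
    """Coroutine that feeds a callback-style protocol from recv() calls."""
    while True:
        data = yield from sock.recv(blocksize)
        if not data:
            proto.connection_lost()
            return
        proto.data_received(data)


def run(coro):
    """Toy scheduler: resume the coroutine until it finishes."""
    try:
        while True:
            next(coro)
    except StopIteration as exc:
        return exc.value
```

With something like pump() written once, a protocol author would only ever see data_received(), which is roughly the "hidden away" boilerplate mentioned above.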
But this style is pervasive; for example, that's how reading by lines works:
Right -- again, this is all geared towards making it palatable for people used to writing synchronous code (either single-threaded or multi-threaded), not for people used to Twisted.
While I'm not a big fan (I may be convinced if I see a protocol test that looks nice);
Check out urlfetch() in main.py: http://code.google.com/p/tulip/source/browse/main.py#39 For sure, this isn't "pretty" and it should be rewritten using more abstraction -- I only wrote the entire thing as a single function because I was focused on the scheduler and event loop. And it is clearly missing a buffering layer for writing (it currently uses a separate send() call for each line of the HTTP headers, blech). But it implements a fairly complex (?) protocol and it performs well enough.
I'm just wondering if there's any point in trying to write the pull-style protocols when this works quite differently.
Perhaps you could try to write some pull-style transports and protocols for tulip to see if anything's missing from the scheduler and eventloop APIs or implementations? I'd be happy to rename sockets.py to push_sockets.py so there's room for a competing pull_sockets.py, and then we can compare apples to apples. (Unlike the yield vs. yield-from issue, where I am very biased, I am not biased about push vs. pull style. I just coded up what I was most familiar with first.)
Additionally, I'm not sure if readline belongs on the socket.
It isn't -- it is on the BufferedReader, which wraps around the socket (or other socket-like transport, like SSL). This is similar to the way the stdlib socket.socket class has a makefile() method that returns a stream wrapping the socket.
I understand the simile with files, though.
Right, that's where I've gotten most of my inspiration. I figure they are a good model to lure unsuspecting regular Python users in. :-)
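For readers who haven't used it, this is the synchronous stdlib pattern that the file analogy refers to: socket.makefile() returns a buffered stream, and readline() lives on that stream rather than on the socket. The sketch uses only the stdlib; the helper name read_lines_via_makefile is made up for the example.

```python
import socket


def read_lines_via_makefile():
    # socketpair() gives two connected sockets, so no network is needed.
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"first line\nsecond line\n")
        writer.close()  # signals EOF to the reading side
        # makefile("rb") wraps the socket in a buffered binary stream.
        with reader.makefile("rb") as stream:
            return [stream.readline(), stream.readline(), stream.readline()]
    finally:
        writer.close()
        reader.close()
```

The third readline() returns an empty bytes object because the writing side has been closed. The coroutine version in the discussion keeps the same shape but puts "yield from" in front of each read.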
With the coroutine style I could see how the most obvious fit would be something like tornado's read_until, or an as_lines that essentially calls read_until repeatedly. Can the delimiter for this be modified?
You can write your own BufferedReader, and if this is a common pattern we can make it a standard API. Unlike the SocketTransport and SslTransport classes, which contain various I/O hacks and integrate tightly with the polling capability of the eventloop, I consider BufferedReader plain user code. Antoine also hinted that with not too many changes we could reuse the existing buffering classes in the stdlib io module, which are implemented in C.
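As a rough illustration of "write your own BufferedReader", here is a sketch of a reader with a read_until() method that takes an arbitrary delimiter. Everything in it is an assumption made for the example: FakeTransport, DelimitedReader, read_until() and run() are hypothetical names, not tulip's or Tornado's actual API, and the fake transport ignores the requested block size.

```python
class FakeTransport:
    """Hypothetical transport that replays canned chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, n):
        yield  # suspension point; a real transport waits for readability
        # For brevity this fake ignores n and returns one whole chunk.
        return self._chunks.pop(0) if self._chunks else b""


class DelimitedReader:
    """Hypothetical user-level buffering layer on top of a transport."""

    def __init__(self, trans, blocksize=8192):
        self._trans = trans
        self._blocksize = blocksize
        self._buffer = b""

    def read_until(self, delimiter=b"\n"):
        """Coroutine: return data up to and including the delimiter."""
        while True:
            end = self._buffer.find(delimiter)
            if end >= 0:
                end += len(delimiter)
                result, self._buffer = self._buffer[:end], self._buffer[end:]
                return result
            data = yield from self._trans.recv(self._blocksize)
            if not data:
                # EOF: hand back whatever is left, possibly b"".
                result, self._buffer = self._buffer, b""
                return result
            self._buffer += data


def run(coro):
    """Toy scheduler: resume the coroutine until it finishes."""
    try:
        while True:
            next(coro)
    except StopIteration as exc:
        return exc.value
```

A readline() would then just be read_until(b"\n"), and an as_lines helper could call it in a loop until it returns an empty result.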
My main syntactic gripe is that when I write @inlineCallbacks code or monocle code or whatever, when I say "yield" I'm yielding to the reactor. That makes sense to me (I realize natural language arguments don't always make sense in a programming language context). "yield from" less so (but okay, that's what it has to look like). But this just seems weird to me:
yield from trans.send(line.upper())
Not only do I not understand why I'm yielding there in the first place (I don't have to wait for anything, I just want to push some data out!), it feels like all of my yields have been replaced with yield froms for no obvious reason (well, there are reasons, I'm just trying to look at this naively).
Are you talking about yield vs. yield-from here, or about the need to suspend every write? Regarding yield vs. yield-from, please squint and get used to seeing yield-from everywhere -- the scheduler implementation becomes *much* simpler and *much* faster using yield-from, so much so that there really is no competition. As to why you would have to suspend each time you call send(), that's mostly just an artefact of the incomplete example -- I didn't implement a BufferedWriter yet. I also have some worries about a task producing data at a rate faster than the socket can drain it from the buffer, but in practice I would probably relent and implement a write() call that returns immediately and should *not* be used with yield-from. (Unfortunately you can't have a call that works with or without yield-from.) I think there's a throttling mechanism in Twisted that can probably be copied here.
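A sketch of what such a buffering layer might look like: write() is a plain call that only appends to a buffer, while flush() is a coroutine that has to be driven with yield-from. The names BufferedWriter, needs_drain(), high_water, FakeTransport and run() are hypothetical, and the fake send() always accepts the whole buffer, which a real socket would not guarantee.

```python
class FakeTransport:
    """Hypothetical transport that records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        yield  # suspension point; a real transport waits for writability
        self.sent.append(data)
        return len(data)


class BufferedWriter:
    """Hypothetical write buffer with a simple high-water mark."""

    def __init__(self, trans, high_water=64 * 1024):
        self._trans = trans
        self._pending = []
        self._size = 0
        self._high_water = high_water

    def write(self, data):
        """Plain call: returns immediately, do NOT use with yield-from."""
        self._pending.append(data)
        self._size += len(data)

    def needs_drain(self):
        """True once enough data is queued that the producer should flush."""
        return self._size >= self._high_water

    def flush(self):
        """Coroutine: use with yield-from to push the buffer out."""
        data = b"".join(self._pending)
        self._pending = []
        self._size = 0
        if data:
            yield from self._trans.send(data)


def run(coro):
    """Toy scheduler: resume the coroutine until it finishes."""
    try:
        while True:
            next(coro)
    except StopIteration as exc:
        return exc.value
```

A task producing a lot of data could check needs_drain() after each write() and do a "yield from writer.flush()" when it returns true, which is one crude way to keep the buffer from growing without bound.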
I guess Twisted gets away with this because of deferred chaining: that one deferred might have tons of callbacks in the background, many of which are also doing I/O operations, resulting in a sequence of asynchronous operations that only at the end cause the generator to be run some more.
I guess that belongs in a different thread, though. Even then, I'm not sure if I'm uncomfortable because I'm seeing something different from what I'm used to, or if my argument from English actually makes any sense whatsoever.
Speaking of protocol tests, what would those look like? How do I yell, say, "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock transport, and call the handler with that? (I realize it's early days to be thinking that far ahead; I'm just trying to figure out how I can contribute a good protocol definition to all of this).
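One possible shape for such a test, purely as a sketch: a mock transport is preloaded with the request bytes, the handler coroutine is run to completion against it, and the test inspects what was sent back. MockTransport, handler() and run() are hypothetical names, and the toy handler only looks at the request method; it is not meant as a real HTTP implementation.

```python
class MockTransport:
    """Hypothetical test double: scripted input, recorded output."""

    def __init__(self, incoming):
        self._incoming = incoming
        self.sent = b""

    def recv(self, n):
        yield  # suspension point, as in a real transport
        data, self._incoming = self._incoming[:n], self._incoming[n:]
        return data

    def send(self, data):
        yield  # suspension point, as in a real transport
        self.sent += data
        return len(data)


def handler(trans):
    """Toy protocol: read a request line, answer based on the method."""
    request = b""
    while b"\r\n" not in request:
        data = yield from trans.recv(4096)
        if not data:
            break  # peer closed before sending a full line
        request += data
    method = request.split(b" ", 1)[0]
    status = b"200 OK" if method == b"POST" else b"405 Method Not Allowed"
    yield from trans.send(b"HTTP/1.1 " + status + b"\r\n\r\n")
    return request


def run(coro):
    """Toy scheduler: resume the coroutine until it finishes."""
    try:
        while True:
            next(coro)
    except StopIteration as exc:
        return exc.value
```

A test would then construct MockTransport(b"POST /blah HTTP/1.1\r\n"), call run(handler(trans)), and assert on trans.sent, with no real sockets or event loop involved.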
Actually I think the ease of writing tests should definitely be taken into account when designing the APIs here. In the Zope world, Jim Fulton wrote a simple abstraction for networking code that explicitly provides for testing: http://packages.python.org/zc.ngi/ (it also supports yield-style callbacks, similar to Twisted's inlineCallbacks). I currently don't have any tests, apart from manually running main.py and checking its output. I am a bit hesitant to add unit tests in this early stage, because keeping the tests passing inevitably slows down the process of ripping apart the API and rebuilding it in a different way -- something I do at least once a day, whenever I get feedback or a clever thought strikes me or something annoying reaches my trigger level. But I should probably write at least *some* tests, I'm sure it will be enlightening and I will end up changing the APIs to make testing easier. It's in the TODO.

--
--Guido van Rossum (python.org/~guido)