Re: [Twisted-web] Re: [Web-SIG] A more Twisted approach to async apps in WSGI
At 12:59 AM 10/7/04 -0400, James Y Knight wrote:
On Oct 5, 2004, at 2:37 AM, Phillip J. Eby wrote:
Although you probably want something more like a pipe error if the input times out or the connection is broken.
You normally only get pipe errors on writes; a read just sees EOF.
But that does bring up a good point: How does the server notify the application that the client has gone away, and any further work is useless?
- For non-async apps that use the iterator model: I think the server is allowed to just call iterable.close() and never iterate again.
Yes.
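A minimal sketch of that server-side behavior, for illustration only. The names serve_body, send and ClientGone are hypothetical and not part of WSGI; the only thing taken from the discussion is that the server stops iterating and calls close() on the iterable if it has one.

```python
class ClientGone(Exception):
    """Hypothetical signal that the client connection was lost."""


def serve_body(app_iter, send):
    """Send each chunk; stop and close the iterable if the client vanishes."""
    try:
        for chunk in app_iter:
            send(chunk)  # assumed to raise ClientGone on a dead connection
    except ClientGone:
        pass  # further work is useless, so never iterate again
    finally:
        # close() is optional on the iterable, so look it up defensively.
        close = getattr(app_iter, "close", None)
        if close is not None:
            close()
```

The application learns about the disconnect only through close(), which is why the async case below needs some way to attach a close hook.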
- For async applications, with the proposed API, that may not be an option, because the iterable returned is the special wrapper, not a user-created class. Although, actually, I guess the app can return its own iterable whose __iter__ calls through and returns the wrapper's __iter__.
Not if the server wants to be able to handle that iterable specially. But anyway, it seems that the wrapper's constructor should take a close method, or have a way to set one.
- What about for non-async applications that use the write callable? Should write be allowed to raise an exception? Or should it just become a no-op when the client is disconnected?
It's allowed to raise an exception, though this was never explicitly put in the spec; I'll have to fix that. The actual process for that scenario looks something like this:
* app calls write()
* write() raises error
* app catches error (maybe) and calls start_response() with exc_info
* start_response() reraises the error, because it has already sent headers to the client and can't restart the response
* application error handler bombs out and returns to server/gateway
* server/gateway logs the exception (maybe) and gets on with life in the big 'net
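The steps above can be sketched roughly as follows. This is a toy gateway, not a real server: the state dict, make_start_response and run are hypothetical names, and the code is written for current Python (bytes bodies) rather than the Python of this thread.

```python
import sys


def make_start_response(state):
    """Build a start_response for one request; `state` tracks what was sent."""
    def write(data):
        # Headers go out with the first write attempt.
        state["headers_sent"] = True
        if state["client_gone"]:
            raise IOError("client disconnected")
        state["out"].append(data)

    def start_response(status, headers, exc_info=None):
        if exc_info is not None:
            try:
                if state["headers_sent"]:
                    # Too late to restart the response: re-raise the error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid a traceback reference cycle
        state["status"] = status
        return write

    return start_response


def app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    try:
        write(b"hello")
    except IOError:
        # The app tries to switch to an error response; the server refuses.
        start_response("500 Internal Server Error", [], sys.exc_info())
    return []


def run(app, state):
    """Server/gateway side: log the exception (maybe) and move on."""
    try:
        for chunk in app({}, make_start_response(state)):
            state["out"].append(chunk)
    except Exception as exc:
        state["logged"] = exc
```

With client_gone set, the IOError raised by write() ends up in state["logged"] and the status stays "200 OK", since the 500 response was never accepted.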
Hmm, yes. I totally missed the option of just yielding ''. Of course it's a very bad idea to repeatedly yield '' to a server if you don't know the server can properly handle it (by e.g. delaying longer and longer), but, in this case, since the server itself is providing the special iterable, that should be fine.
Yes. Also, when we finally settle on an async API, I do want to cover the issue of backing off iteration when empty strings are yielded. I'm actually inclined to suggest that an async application should take responsibility for doing the delaying if it's called repeatedly, and the async API isn't available.
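For comparison, here is a rough sketch of the server-side variant mentioned above ("delaying longer and longer"). Nothing here is from the spec: iterate_with_backoff, first_delay and max_delay are made-up names and values, and an application doing its own delaying would apply the same idea inside its iterator.

```python
import time


def iterate_with_backoff(app_iter, send, sleep=time.sleep,
                         first_delay=0.01, max_delay=1.0):
    """Send non-empty chunks; wait longer after each consecutive empty one."""
    delay = first_delay
    for chunk in app_iter:
        if chunk:
            send(chunk)
            delay = first_delay  # real data arrived, so reset the back-off
        else:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double, up to a ceiling
```

The sleep parameter is injectable only so the loop can be exercised without real waiting.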
It seems like it should be possible to make a generic class that implements this async API for use with sync servers that do not support it natively. That would allow async apps to run on a sync server without modification, which is potentially useful. To do that, though, I think it'd have to spawn an extra thread per request that is waiting to read data, for the read() call to block on. Unless, of course, the app never needs to yield outgoing data while waiting for incoming data.
Well, with Twisted you could deferToThread the read() operations, though it's hard for me to think straight about that scenario because I keep finding it hard to imagine an async web app that isn't just written to the Twisted API to start with... ;)
The one remaining issue I have is the required thread-safeness of various APIs.
The spec doesn't mention much of anything about threadsafeness: is it ok to call wsgi methods from a different thread than the one the server originally called the request on? Especially interesting for implementing the above sync->async adapter: environ['wsgi.input'].read(x) would be called from a second thread.
Excellent question; I should add the answer to the spec, as soon as I decide precisely what it is. :) One point: the spec should absolutely forbid servers from using thread identity to identify the application/caller. The "what can you call while what else is executing" part of the question is a bit trickier.
What thread (if there's a choice) does the on_get callback get called on. Etc.
My inclination is to make threading issues symmetrical. That is, the application doesn't get any thread-identity guarantees either.
I haven't really thought about these thready questions much either, so maybe the answers are obvious, but in my experience, that's usually not the case when it comes to threads.
Yep. :) However, the more I think about it, the more it seems to me that WSGI should emulate single-threadedness with respect to any function/method/iterator invocations associated with a given application invocation. However, it is *not* guaranteed that all such invocations will occur from the same thread. Basically, it means "no multitasking with the other guy's objects", and puts the locking burdens on whoever's trying to mix multitasking into the works.
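One way to read "puts the locking burdens on whoever's trying to mix multitasking into the works" is that the party introducing threads wraps the other side's objects in a lock. A minimal sketch, assuming a hypothetical Serialized proxy (not anything defined by WSGI):

```python
import threading


class Serialized:
    """Proxy that lets only one thread at a time call into `target`."""

    def __init__(self, target, lock=None):
        self._target = target
        self._lock = lock if lock is not None else threading.RLock()

    def __getattr__(self, name):
        # Only reached for names not set on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            # Calls may come from any thread, but never overlap.
            with self._lock:
                return attr(*args, **kwargs)

        return locked
```

For example, an adapter that reads environ['wsgi.input'] from a helper thread could wrap it as Serialized(environ['wsgi.input']) and share one lock across every server-provided object for that request.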
That's why async apps are nice. ;)
Not to mention fork(). :) By the way, after all this discussion... do you think it would be better to:
1) Push towards a full async API, nailing down all these loose ends
2) Use the simple-but-kludgy "pause iteration" API idea
3) Don't make an "official" async API, and just leave it open to server authors to create their own extensions, and maybe cherry pick the best ideas for WSGI 2.0, or
4) Do something else altogether?
On Oct 7, 2004, at 1:28 AM, Phillip J. Eby wrote:
- For async applications, with the proposed API, that may not be an option, because the iterable returned is the special wrapper, not a user-created class. Although, actually, I guess the app can return its own iterable whose __iter__ calls through and returns the wrapper's __iter__.
Not if the server wants to be able to handle that iterable specially. But anyway, it seems that the wrapper's constructor should take a close method, or have a way to set one.
As already discussed, the server cannot really expect to actually get the iterable back anyhow. But yes, I'd say either the init should take a close argument, or else the use of something like "wrapper.close = myCloseFunction" should be part of the API.
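Both options can be sketched with one toy class. AsyncWrapper here is a hypothetical stand-in for the server-provided iterable, not the proposed API itself; it only shows a close hook passed to the constructor versus one assigned as an attribute.

```python
class AsyncWrapper:
    """Hypothetical stand-in for the special iterable a server would provide."""

    def __init__(self, chunks, close=None):
        self._chunks = iter(chunks)
        self._on_close = close

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._chunks)

    def close(self):
        # Run the app's hook at most once, however often the server calls us.
        if self._on_close is not None:
            hook, self._on_close = self._on_close, None
            hook()


def my_close_function():
    print("client went away, cancelling pending work")


# Option 1: pass the hook to the constructor.
w1 = AsyncWrapper([b"data"], close=my_close_function)

# Option 2: assign it afterwards; the instance attribute shadows the method.
w2 = AsyncWrapper([b"data"])
w2.close = my_close_function
```

Option 2 is simpler for the app but gives the wrapper no chance to guard against repeated calls, which may matter if the server closes defensively.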
Hmm, yes. I totally missed the option of just yielding ''. Of course it's a very bad idea to repeatedly yield '' to a server if you don't know the server can properly handle it (by e.g. delaying longer and longer), but, in this case, since the server itself is providing the special iterable, that should be fine.
Yes. Also, when we finally settle on an async API, I do want to cover the issue of backing off iteration when empty strings are yielded. I'm actually inclined to suggest that an async application should take responsibility for doing the delaying if it's called repeatedly, and the async API isn't available.
If the async API isn't available, and I'm an async application, I would assume I'm running on a sync server, and thus am allowed to block the request thread indefinitely, and do so, waiting for a wakeup notification from the reactor loop. It doesn't seem to me that any iterator back-off behavior is needed, or desirable. I can fabricate an async wrapper that uses threads
It seems like it should be possible to make a generic class that implements this async API for use with sync servers that do not support it natively. That would allow async apps to run on a sync server without modification, which is potentially useful. To do that, though, I think it'd have to spawn an extra thread per request that is waiting to read data, for the read() call to block on. Unless, of course, the app never needs to yield outgoing data while waiting for incoming data.
Well, with Twisted you could deferToThread the read() operations, though it's hard for me to think straight about that scenario because I keep finding it hard to imagine an async web app that isn't just written to the Twisted API to start with... ;)
Right -- but deferToThread'ing a read() operation is essentially the same as spawning an extra thread per request to read the data, just with nicer thread management.
[thread stuff]
I haven't really thought about these thready questions much either, so maybe the answers are obvious, but in my experience, that's usually not the case when it comes to threads.
Yep. :) However, the more I think about it, the more it seems to me that WSGI should emulate single-threadedness with respect to any function/method/iterator invocations associated with a given application invocation. However, it is *not* guaranteed that all such invocations will occur from the same thread.
Basically, it means "no multitasking with the other guy's objects", and puts the locking burdens on whoever's trying to mix multitasking into the works.
That does sound good. No multitasking means it's impossible to write a response while already waiting for incoming data. But actually I think it's probably fine for an async app running on a sync server to not be able to simultaneously read data and write data, so I take back anything about needing to call wsgi server methods from more than one thread. In the compat wrapper, calling on_get can just block writing until the read has occurred; in that case, all wsgi methods can be called from the server's request thread.
By the way, after all this discussion... do you think it would be better to:
1) Push towards a full async API, nailing down all these loose ends
2) Use the simple-but-kludgy "pause iteration" API idea
3) Don't make an "official" async API, and just leave it open to server authors to create their own extensions, and maybe cherry pick the best ideas for WSGI 2.0, or
4) Do something else altogether?
I think the API you've outlined sounds good. I can imagine ways to implement it both for an async server like twisted, and as a compatibility layer for an async-requiring application on a sync server. I think it's easier to make the compatibility layer with this API than with the pause/resume API. However, I would be quite wary of including it in the final spec without it being implemented first.
Another question is: what is the current use for it? Does anyone want to write untwisted async web applications? My current interest in WSGI is basically on the "plug twisted web into another webserver as an application" side of things. I wouldn't want to write an application to WSGI (without a framework on top)... If everyone else feels that way, an async API may not be actually useful until there is some other Async-WSGI web server that you could plug twisted framework stuff on top of, or some other async framework you can plug on top of the twisted server.
As for postponing until WSGI 2.0, I would hope there doesn't need to be a WSGI 2.0, though, since the interface is so darn simple. ;) But it could be in a separate WSGI async addons.
James
At 11:20 AM 10/15/04 -0400, James Y Knight wrote:
On Oct 7, 2004, at 1:28 AM, Phillip J. Eby wrote:
By the way, after all this discussion... do you think it would be better to:
1) Push towards a full async API, nailing down all these loose ends
2) Use the simple-but-kludgy "pause iteration" API idea
3) Don't make an "official" async API, and just leave it open to server authors to create their own extensions, and maybe cherry pick the best ideas for WSGI 2.0, or
4) Do something else altogether?
I think the API you've outlined sounds good. I can imagine ways to implement it both for an async server like twisted, and as a compatibility layer for an async-requiring application on a sync server. I think it's easier to make the compatibility layer with this API than with the pause/resume API. However, I would be quite wary of including it in the final spec without it being implemented first.
Right, this is one reason I'm thinking that #3 might be a good idea, although it'd probably be more like 1.1 than 2.0. Or really, it would just be an optional extension available under 1.0. Even if we finalize the 1.0 spec, nothing stops us from adding optional extensions that don't alter the existing required semantics.
Another question is: what is the current use for it? Does anyone want to write untwisted async web applications?
Right. That's the really big issue, and another reason why saying, "let's wait for implementations" might be a good idea. That is, if people implement something, there's clearly a market for it. If they don't, maybe we don't need it.
My current interest in WSGI is basically on the "plug twisted web into another webserver as an application" side of things. I wouldn't want to write an application to WSGI (without a framework on top)... If everyone else feels that way, an async API may not be actually useful until there is some other Async-WSGI web server that you could plug twisted framework stuff on top of, or some other async framework you can plug on top of the twisted server.
Yep, that's the issue alright. It seems that the common use case for an async web app is going to boil down to: "do you want to proxy your Twisted app from some other web server?" Because let's face it, Twisted's process model isn't really a match for, say, the Apache prefork model, or CGI.
ISTM, then, that the useful thing to write would be a synchronous WSGI->HTTP "application" object. That would allow Twisted or any other async server (or really any HTTP server at all) to be treated as a WSGI application, thus letting async apps join the WSGI party without forcing them to give up any asyncness or to have to do other really horrid things to fit.
With a little more sophistication, such an application component could perhaps actually spawn the async server if it's not running, by checking a pid file or some such. Or that could be middleware; you have a "server starter" middleware that just ensures the server is running before it passes the request down to the proxy middleware.
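A rough sketch of what such a synchronous WSGI->HTTP "application" object could look like, using the standard library's http.client. The names make_proxy_app and connection_factory are hypothetical, the header handling is deliberately simplistic, and the whole backend response is buffered rather than streamed.

```python
import http.client

# Headers that describe one connection and should not be forwarded.
SKIPPED = {"connection", "keep-alive", "proxy-authenticate",
           "proxy-authorization", "te", "trailers",
           "transfer-encoding", "upgrade", "host"}


def make_proxy_app(host, port, timeout=30,
                   connection_factory=http.client.HTTPConnection):
    """Return a WSGI app that relays each request to host:port over HTTP."""
    def proxy_app(environ, start_response):
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
        if environ.get("QUERY_STRING"):
            path += "?" + environ["QUERY_STRING"]

        # Rebuild request headers from the CGI-style HTTP_* keys.
        headers = {}
        for key, value in environ.items():
            if key.startswith("HTTP_"):
                name = key[5:].replace("_", "-").title()
                if name.lower() not in SKIPPED:
                    headers[name] = value
        if environ.get("CONTENT_TYPE"):
            headers["Content-Type"] = environ["CONTENT_TYPE"]

        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length) if length else None

        conn = connection_factory(host, port, timeout=timeout)
        try:
            conn.request(environ["REQUEST_METHOD"], path or "/",
                         body=body, headers=headers)
            resp = conn.getresponse()
            payload = resp.read()  # blocks the request thread until done
            status = "%d %s" % (resp.status, resp.reason)
            out_headers = [(k, v) for k, v in resp.getheaders()
                           if k.lower() not in SKIPPED]
        finally:
            conn.close()

        start_response(status, out_headers)
        return [payload]

    return proxy_app
```

A "server starter" middleware would then wrap this app and make sure the backend is up before delegating; that part is left out here because the start-up mechanism (pid file or otherwise) is not specified in the thread.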
As for postponing until WSGI 2.0, I would hope there doesn't need to be a WSGI 2.0, though, since the interface is so darn simple. ;) But it could be in a separate WSGI async addons.
Technically, I don't think finalizing the base specification would prevent us from amending the PEP to add optional features even to 1.0.
participants (2)
- James Y Knight
- Phillip J. Eby