[Web-SIG] Emulating req.write() in WSGI

Graham Dumpleton graham.dumpleton at gmail.com
Tue Jul 6 12:50:00 CEST 2010


On 5 July 2010 22:43, Aaron Fransen <aaron.fransen at gmail.com> wrote:
> Apologies Graham, I'm not actually trying to appear dense but clearly I'm
> not one of the world's bright lights when it comes to web interfaces.
>
> My installation is literally a base installation of the latest Ubuntu server
> platform. The only configuration at play is this:
>
>     WSGIDaemonProcess node9 user=www-data group=www-data processes=2
> threads=25
>     WSGIProcessGroup node9
>     WSGIScriptAlias /run /var/www/run/run.py
>
> The error that occurs when using telnet and yield is:
>
> [Mon Jul 05 06:30:24 2010] [error] [client 127.0.0.1] mod_wsgi (pid=2716):
> Target WSGI script '/var/www/run/run.py' cannot be loaded as Python module.
> [Mon Jul 05 06:30:24 2010] [error] [client 127.0.0.1] mod_wsgi (pid=2716):
> Exception occurred processing WSGI script '/var/www/run/run.py'.
> [Mon Jul 05 06:30:24 2010] [error] [client 127.0.0.1] SyntaxError: 'return'
> with argument inside generator (run.py, line 14)
>
> using this code:
>
>     status    =    '200 OK'
>     response_headers    =    [('Content-type','text/plain')]
>     start_response(status, response_headers)
>     for x in range(0,10):
>         yield 'hey %s' % x
>         time.sleep(1)
>
> The error occurs when I use "return []" as opposed to simply "return";
> however, I now see that is a result of the yield statement itself.

In the code example I posted I never had a 'return' statement in the
same function as 'yield'. You shouldn't mix the two.
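
For illustration, a generator-based handler along these lines (a minimal
sketch, not the exact script from the earlier mail; the chunk text is
just a placeholder) streams with 'yield' alone and never uses 'return'
with a value:

```python
# Minimal sketch of a streaming WSGI application driven by 'yield'.
# There is no 'return <value>' anywhere in the generator; under
# Python 2 that is a SyntaxError, which is exactly the error quoted above.
import time

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    for x in range(0, 10):
        yield 'hey %s\n' % x
        time.sleep(0.1)  # stand-in for a long-running step
```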

Graham

> Using this method, the telnet interface returns immediately with:
>
> HTTP/1.1 200 OK
> Date: Mon, 05 Jul 2010 12:30:45 GMT
> Server: Apache/2.2.14 (Ubuntu)
> Vary: Accept-Encoding
> Connection: close
> Content-Type: text/plain
>
> 0
> Connection closed by foreign host.
>
> In fact, using yield or write produces the same result.
>
> If I'm not getting the results I should be, then obviously I'm doing
> something wrong.
>
> I understand the danger of having a long-running web process (hence the
> reason I have a lot of virtual machines in the live environment using
> mod_python right now) but unfortunately it's something I don't seem to be
> able to work around at the moment.
>
> Thanks to all.
>
> On Wed, Jun 30, 2010 at 5:19 PM, Graham Dumpleton
> <graham.dumpleton at gmail.com> wrote:
>>
>> On 30 June 2010 22:55, Aaron Fransen <aaron.fransen at gmail.com> wrote:
>> >
>> > I can see that this could potentially get very ugly very quickly.
>> >
>> > Using stock Apache on the current Ubuntu server, using yield produced a
>> > response error
>>
>> What error? If you aren't going to debug it enough to work out what
>> the error is in the browser or Apache error logs and post it here
>> for comment so we can say what may be wrong with your system, then
>> we can't exactly help you much, can we?
>>
>> > and using write() (over the telnet interface) returned the 0
>> > only and disconnected. Similar behavior in Firefox.
>>
>> All the scripts I provided you are conforming WSGI applications and
>> work on mod_wsgi. If you are having issues, then it is likely the
>> way your Apache/Python is set up or how you configured mod_wsgi to
>> host the scripts. Again, because you are providing no details about
>> how you configured mod_wsgi, we can't help you work out what is
>> wrong with your system.
>>
>> > How odd that nobody's come up with a simple streaming/update schema (at
>> > least to my mind).
>>
>> For response content they have, and it can be made to work. Just
>> because you can't get it working, or don't understand what we are
>> saying about the need to use a JavaScript/AJAX type client (eg.
>> Comet style) to make use of it, as opposed to trying to rely on
>> browser functionality that doesn't exist, doesn't change that.
>> Request content streaming is a different matter, as I will explain
>> below, but you haven't even mentioned that yet as far as I can see.
>>
>> > It would have been nice to be able to provide some kind of in-stream
>> > feedback for long running jobs, but it looks like I'm going to have to
>> > abandon that approach. The only issue with either of the other solutions
>> > is
>> > that each subsequent request depends on data provided by the prior, so
>> > the
>> > amount of traffic going back & forth could potentially become a problem.
>> >
>> > Alternatively I could simply create a session database that saves the
>> > required objects then each subsequent request simply fetches the
>> > required
>> > one from the table and...
>> >
>> > Well, you can see why streaming seemed like such a simple solution! Back
>> > to
>> > the drawing board, as it were.
>>
>> I'll try one last time to summarise a few issues for you, although
>> based on your attitude so far, I don't think it will change your
>> opinion or help your understanding.
>>
>> 1. Streaming of responses from a WSGI application works fine using
>> either yield or write(). If it doesn't work for a specific WSGI
>> hosting mechanism then that implementation may not be conforming to
>> WSGI requirements. Specifically, after each yield and/or write() an
>> implicit flush is required. This should ensure that the data is
>> written to the HTTP client connection and/or that the return of such
>> data to the client occurs in parallel to further actions occurring
>> in that request.
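
As a sketch of point 1, the write() path looks like this (a hypothetical
application; under WSGI the server is obliged to flush each write()
before returning from it, so the client sees data as it is produced):

```python
# Sketch of streaming via the write() callable that start_response()
# returns -- the WSGI spec's imperative alternative to yielding chunks.
def application(environ, start_response):
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    for x in range(0, 10):
        write('hey %s\n' % x)  # the server must flush each write()
    return []  # all content already sent via write()
```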
>>
>> 2. A WSGI middleware that caches response data can stuff this up.
>> One can't outright prohibit a WSGI middleware from holding on to
>> response data, albeit that for each yield or write() it is
>> technically still supposed to pass on at least an empty string down
>> the chain so as to allow control to get back to the underlying WSGI
>> implementation, which may use such windows to swap the request
>> context it is operating on, so as to allow a measure of concurrency
>> in situations where threads may not be used.
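
A sketch of what a well-behaved middleware does here (the class name is
made up): it forwards every chunk, including empty strings, as soon as
it arrives, rather than accumulating the response:

```python
# Sketch of a pass-through streaming middleware: every chunk, even an
# empty string, is yielded on immediately, so control returns to the
# underlying WSGI implementation between chunks.
class PassThrough(object):
    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        for chunk in self.application(environ, start_response):
            yield chunk  # never buffered, never dropped
```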
>>
>> 3. Where a WSGI adapter on top of an existing web server is used, eg.
>> various options that exist with Apache and nginx, then an output
>> filter configured into the web server may also stuff this up. For
>> example, an output filter that compresses response data may buffer up
>> response data into large blocks before compressing them and returning
>> them.
>>
>> 4. Although response content can be streamed subject to the above
>> caveats, streaming of request content is a totally different matter.
>> First off, WSGI requires that the request content have a
>> Content-Length specified. Thus technically an HTTP client can't
>> leave out Content-Length and instead use chunked request content.
>> Further, the way in which many web servers and WSGI servers are
>> implemented would prohibit streaming of request content anyway. This
>> is because many implementations, especially where proxying occurs,
>> eg. cgi, fastcgi, scgi, ajp, uwsgi, mod_proxy (??), and mod_wsgi
>> daemon mode, expect that the whole request content can be read in
>> and written across the proxy connection before any attempt is made
>> to start reading any data returned from the web application. The
>> request content therefore cannot be open ended in length, because
>> most implementations will never switch from reading that content to
>> expecting a response from the application. Thus it isn't possible to
>> use WSGI as a two-way streaming mechanism where some request content
>> is written, some response content is returned, and then the client
>> sends more request content based on that, etc.
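
The Content-Length constraint in point 4 can be illustrated with a small
helper (a hypothetical name, not part of any library): a conforming
application only ever reads the declared number of bytes from
wsgi.input, so an open-ended (chunked) request body has nowhere to fit:

```python
# Sketch of bounded request-content reading under WSGI: at most
# CONTENT_LENGTH bytes are read from wsgi.input, so the request body
# cannot be open ended in length.
def read_request_body(environ):
    try:
        length = int(environ.get('CONTENT_LENGTH', 0))
    except (TypeError, ValueError):
        length = 0
    return environ['wsgi.input'].read(length)
```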
>>
>> So what does this all mean? First up, response content streaming
>> should be able to be made to work; however, since request content
>> streaming isn't technically allowed with WSGI, if you need that you
>> are out of luck if you want to conform to the WSGI specification.
>> Second, with mod_wsgi embedded mode you can step slightly outside of
>> strict WSGI conformance and have request content streaming. You are
>> then bound to Apache/mod_wsgi, but whether you want to do that is
>> debatable for the reasons below.
>>
>> The bigger problem with both two-way streaming and long polling
>> applications which use the same HTTP request is that WSGI servers
>> tend to use processes and threads for concurrency. When you use
>> these mechanisms they will tie up a process or thread for the whole
>> time. Thus if you have lots of concurrent requests you need huge
>> numbers of processes and/or threads, which just isn't usually
>> practical because of resource usage such as memory. For that reason,
>> on the server one would instead usually use special purpose web
>> servers for these types of applications and use HTTP directly,
>> avoiding WSGI due to its blocking nature. Instead these servers
>> would use an event driven model, or another system which allows
>> concurrency without requiring a process or thread per request.
>>
>> In short, this is what Comet and the dedicated servers for it are
>> about: allowing large numbers of concurrent long requests with
>> minimal resources. That they are dedicated systems also allows them
>> to avoid limitations in other high level web application interfaces
>> such as CGI, FASTCGI, SCGI, AJP etc, which expect to be able to read
>> the whole request content before trying to deal with any response
>> from the web application that is handling the requests.
>>
>> Anyway, hopefully that explains things better. You can do what you
>> want; you just need to select the correct tool for the job.
>>
>> Graham
>
>

