[Web-SIG] Emulating req.write() in WSGI
aaron.fransen at gmail.com
Tue Jun 29 18:14:05 CEST 2010
On Tue, Jun 29, 2010 at 7:37 AM, Aaron Fransen <aaron.fransen at gmail.com>wrote:
> On Mon, Jun 28, 2010 at 5:42 PM, Graham Dumpleton <
> graham.dumpleton at gmail.com> wrote:
>> On 29 June 2010 05:01, Aaron Fransen <aaron.fransen at gmail.com> wrote:
>> > One of the nice things about mod_python is the req.write() function.
>> One thing I should warn you about with req.write() in Apache is that
>> for streaming data, as you seem to be using it, it will accumulate
>> memory against the request for each write call. That memory is not
>> reused, although it is released again at the end of the request.
>> The problem here isn't actually in mod_python but in the underlying
>> Apache ap_rwrite() call.
>> What this function does is, for each call, create what is called a
>> bucket to hold the data to be written. The memory for this bucket is
>> allocated from the per-request memory pool each time. The bucket is
>> then passed down the Apache output filter chain and eventually the
>> data gets written out.
>> Now, because the code doesn't attempt to reuse the bucket, that memory
>> then remains unused but still allocated against the memory pool, and
>> the memory pool is only destroyed at the end of the request.
>> The outcome of this is that if you have a long-running request which
>> continually writes out response data in small pieces using req.write(),
>> each call takes a little more memory from the per-request memory pool
>> without any of it being reused. So if the request runs for a very long
>> time, you will see a gradual increase in the overall memory usage of
>> the process. When the request finishes, the memory is reclaimed and
>> reused, but by then you have already set a high ceiling on the
>> process's ongoing memory use.
>> Anyway, I thought I should warn you about this. This issue may in part
>> even be why mod_python got a reputation for memory bloat in some
>> situations. That is, the fundamental way of returning response data
>> could cause an unnecessary increase in process size if called many
>> times for a request.
> Fortunately we're not talking about a huge amount of data here, basically
> just a couple of notices to keep the user happy (less than 1K usually).
> When using yield, it's as if the module where the yield statement runs is
> completely ignored. The page returned is a "default" page generated by the
> application. Errors are being trapped, but none are generated; it just
> exits without any kind of notice.
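A generator-based WSGI application that streams incrementally would, in sketch form, look like this (the application name and messages are illustrative; whether each chunk actually reaches the browser as it is yielded depends on the server's buffering):

```python
def application(environ, start_response):
    # No Content-Length header: the server is expected to stream each
    # yielded chunk, typically via chunked transfer encoding.
    start_response('200 OK', [('Content-Type', 'text/html')])
    yield b'<html><body>'
    for step in range(3):
        # Each progress notice is handed to the server as soon as it
        # is ready, rather than being accumulated into one string.
        yield ('<p>Working... step %d</p>' % step).encode('utf-8')
    yield b'</body></html>'
```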
> When using write() without a Content-Length header, nothing shows in the
> browser at all.
> When using write() with a Content-Length header, the first update shows
> (and only after the entire page has been generated), but none of the
> subsequent ones nor the final page.
> When using write() with a Content-Length header set large enough to
> encompass the entire final result, the final result page shows, but none of
> the informational messages leading up to the generation of the page appear.
> I haven't really done anything to the base wsgi installation; just set it
> up in daemon mode.
A couple more things I've been able to discern.
The first happened after I "fixed" the html code. Originally under
mod_python, I guess I was cheating more than a little by sending
<html></html> code blocks twice, once for the incremental notices and once
for the final content. Once I changed the code to send a single properly
formed block, the entire document showed up as expected; however, it still
did not send any part of the html incrementally.
Watching the line with Wireshark, all of the data was transmitted at the
same time, so nothing was sent to the browser incrementally.
(This is using the write() functionality, I haven't tried watching the line
with yield yet.)