[Web-SIG] CPU cache locality.

Thu Sep 9 15:24:06 CEST 2004

At 12:20 PM 9/9/04 +0100, Alan Kennedy wrote:
>I notice that Phillip has included a statement in PEP-0333 which states in 
>the section under "Buffering and Streaming":
>
>"""
>Generally speaking, applications will achieve the best throughput by 
>buffering their (modestly-sized) output and sending it all at once. When 
>this is the case, applications should simply return a single-element 
>iterable containing their entire output as a single string.
>
>[snip]
>
>For large files, however, or for specialized uses of HTTP streaming (such 
>as multipart "server push"), an application may need to provide output in 
>smaller blocks (e.g. to avoid loading a large file into memory). It's also 
>sometimes the case that part of a response may be time-consuming to 
>produce, but it would be useful to send ahead the portion of the response 
>that precedes it.
>"""
>
>Phillip, when you wrote about "performance" here, did you have CPU caches 
>in mind?

Actually, the word "performance" doesn't appear anywhere in the above; I 
referred only to "throughput".  Performance can affect throughput, but not 
really the other way around.

Returning a single-element iterable improves throughput in async 
architectures like Twisted and ZServer because they run application code in 
a thread pool.  If the application object returns an iterable containing 
the whole response body, then the application thread is immediately free to 
run a new application instance.  This allows greater "throughput" at the 
application level, because more requests can be run in a given period of 
time than if an application thread had to stay tied up for the rest of the 
response.
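
As a rough sketch of the two styles being contrasted (not taken from the PEP 
text), the following shows one application that returns its whole body as a 
single-element iterable and one that yields it in blocks.  The names 
buffered_app, streaming_app and fake_start_response are hypothetical, and 
the bodies are bytes as in the later Python 3 revision of the spec rather 
than the plain strings of the 2004 draft.

```python
# Illustrative sketch only; all names here are made up for the example.

def buffered_app(environ, start_response):
    """Build the whole (modestly sized) body, then return it as one element."""
    body = b"".join(b"line %d\n" % i for i in range(3))
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # Application code is finished as soon as this list is returned.
    return [body]


def streaming_app(environ, start_response):
    """Yield the body in blocks; application code runs until the last block."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    for i in range(3):
        yield b"line %d\n" % i


def fake_start_response(status, headers, exc_info=None):
    """Stand-in for the server-supplied callable; it ignores its arguments."""
    return None


buffered = buffered_app({}, fake_start_response)
streamed = streaming_app({}, fake_start_response)
```

Both produce the same bytes, but in the first case nothing of the 
application remains to be run once the list has been returned, whereas the 
generator in the second case still has to be resumed for every block.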