[Web-SIG] buffer used by socket, should also work with python stdlib Re: Request for Comments on upcoming WSGI Changes
renesd at gmail.com
Mon Sep 21 14:25:55 CEST 2009
On Mon, Sep 21, 2009 at 12:40 PM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> 2009/9/21 René Dudfield <renesd at gmail.com>:
> > On Mon, Sep 21, 2009 at 11:30 AM, Graham Dumpleton
> > <graham.dumpleton at gmail.com> wrote:
> >> 2009/9/21 René Dudfield <renesd at gmail.com>:
> >> > On Mon, Sep 21, 2009 at 9:46 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> >> >> René Dudfield schrieb:
> >> >>> On Mon, Sep 21, 2009 at 8:10 AM, Chris McDonough
> >> >>> <chrism-ccARneWBNkgAvxtiuMwx3w at public.gmane.org> wrote:
> >> >>>>
> >> >>>> OTOH, I suspect the Python 3 stdlib is still broken if it requires
> >> >>>> native
> >> >>>> strings in various places (and prohibits the use of bytes).
> >> >>>
> >> >>> yes, python3 stdlib should support 'str'(the old unicode), 'buffer'
> >> >>> and 'bytes' for web using stuff. Buffer is important because it's a
> >> >>> type also used for sockets(along with bytes) and it allows less memory
> >> >>> allocation (because you can reuse buffers).
> >> >>
> >> >> Please don't confuse readers and use the correct name, i.e. 'bytearray'
> >> >> instead of 'buffer'.
> >> >>
> >> >> Georg
> >> >>
> >> >
> >> > Let me try and reduce the confusion...
> >> >
> >> > There are two different python types the py3k socket module uses:
> >> > 'bytes' and 'buffer'. 'bytes' is kind of like str in python3... but
> >> > with reduced functionality (no formatting, less methods etc). buffer
> >> > is a Py_buffer from the c api.
> >> >
> >> > buffer, and bytes in socket:
> >> >
> >> > http://docs.python.org/3.1/library/socket.html#socket.socket.recvfrom_into
> >> > bytearray: http://docs.python.org/3.1/library/functions.html#bytearray
> >> > bytes: http://docs.python.org/3.1/library/functions.html#bytes
> >> > buffer: http://docs.python.org/3.1/c-api/buffer.html
> >> >
> >> > This is separate, but related to the point of bytes vs unicode. It is
> >> > really (bytes and buffer) vs unicode - since bytes and buffer can be
> >> > used with socket. socket never uses a python2 'unicode', or a python3
> >> > 'str' type.
> >> A WSGI adapter need not be sitting on top of a socket, it may be based
> >> on some lower level API which provides an abstract interface to the
> >> client connection. For example, in Apache the code handling a request
> >> doesn't deal with the socket. As such, requiring buffer/bytearray
> >> would likely stop you from using any embedded system within a web
> >> server, such as is the case for Apache/mod_wsgi. I would suspect that
> >> requiring buffer/bytearray would also prevent WSGI being used on top
> >> of CGI as well, since file objects likely don't deal in those types
> >> either.
> >> I would also suggest that pursuing these types is just a case of
> >> premature optimisation. Where is your proof that using them would give
> >> any benefit? The web server layer is never the bottleneck in a web
> >> stack, it is the web application, its routing and rendering systems
> >> and any interaction with a database that are the bottleneck. It would
> >> be a waste of time to overly complicate the WSGI specification for
> >> absolutely no reason. People could get much better performance by
> >> simply paying attention to their own web applications and making them
> >> run better rather than praying that the underlying server is somehow
> >> going to make their application 4 times faster than anything else
> >> around.
> >> Maybe we can call this rush to prematurely optimise or jump on the
> >> bandwagon of the latest asynchronous server Tornado syndrome. ;-)
> >> Graham
> > hi,
> > Below are the reasons why I think considering buffers for a future
> > post-wsgi-1.1 spec is useful. I don't think it should be considered for a
> > wsgi 1.1 - I'm now in agreement with Robert that a wsgi 1.1 should come out
> > very soon. My specific concern is that Python's stdlib should also support 'buffer'
> > (along with 'bytes' and 'str') - but that is separate from the new wsgi 1.1
> > spec discussion.
> > ---
> > I don't think *requiring* the use of buffers is needed... just making it
> > *possible* to use them.
> > buffer is one of the types that socket supports, so it makes sense to at
> > least consider them.
> > Using buffer would in no way make it impossible to use python in embedded
> > webservers. You can easily make a Py_buffer from the same memory apache
> > gives you to create python strings. In fact buffers allow you to support
> > more embedded systems more easily - since strings are immutable, but not all
> > embedded systems give you immutable data. Py_buffer also supports things
> > like strides, non-contiguous memory, read/write information and other stuff
> > which make it possible to use more types of memory. It's a lot more useful
> > to use for python embedded in things than a string type.
> > This is not just about performance, it's about considering the types used
> > these days. One of the things that has changed since wsgi 1.0 came out is
> > that python2.5 and above allow the use of a buffer with sockets. Python3
> > also changes the types to (str, buffer). mmap is also a very easy way to
> > share data between multiple python processes - using the Py_buffer allows
> > you to use mmap too.
> > Not all applications are the same, and some do require lots of performance.
> > My use case for requiring buffers is video over the network. When doing
> > 100s of megabytes or gigabytes per second on one machine, copying and
> > allocating strings is a waste of time. Even allocating and copying 200KB
> > jpeg images is a big waste of time. It's basic programming optimization
> > knowledge that allocating and copying memory is slow. So is converting
> > memory to various different encodings if it's not needed. Proof is by
> > timing a string allocation + copy + transcode versus just using the buffer
> > given. By having the server require allocating memory, require copying
> > memory or require transcoding the memory, that makes my use case a lot
> > slower than it needs to be.
> No, proof would be someone taking the CherryPy WSGI server, changing it
> to use buffer, and demonstrating that it works and that it wouldn't cause
> an issue for a high level WSGI application.
> A low level benchmark of the performance of a single type versus
> another in a mock up test case isn't going to prove anything as that
> doesn't necessarily translate into anything usable.
> Sorry, if I am setting the bar quite high on this one, but it is quite
> nebulous that it would at all be useful and so an actual working
> example would be much more convincing.
As I said, performance isn't the only reason to consider it (other
reasons already listed). You seem to have decided it's only an
optimization issue for some reason.
An actual working example showing allocating 4.9MB of memory being
slower than not allocating 4.9MB of memory?
The only difference would be this in pseudo code (without error checking etc):
    def recv(socket, nbytes, dest = None):
        if dest is None:
            # not passing in a buffer, we need to allocate the memory.
            buf = malloc(nbytes);
            dest = make_string_from_buffer(buf)
        else:
            # writing directly into the buffer supplied. No malloc needed.
            buf = dest
        # do the socket recv
Reusing the buffer lets you avoid the cost of malloc every time you
read from the socket. You can store buffers in memory pools
(http://en.wikipedia.org/wiki/Memory_pool) to avoid mallocing/freeing
all the time.
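
As a rough illustration of the buffer reuse described above, here is a minimal Python sketch. It assumes a local socket pair standing in for a real client connection; the helper name recv_exact_into and the 4096-byte buffer size are made up for the example. The only stdlib calls relied on are socket.socketpair(), socket.recv_into(), bytearray and memoryview.

```python
import socket


def recv_exact_into(sock, view):
    """Fill `view` from `sock` without allocating a new bytes object per read.

    `view` is a writable memoryview over a caller-owned buffer.
    (Hypothetical helper, not part of the stdlib.)
    """
    total = 0
    while total < len(view):
        # recv_into writes straight into the supplied memory and returns
        # the number of bytes received.
        n = sock.recv_into(view[total:])
        if n == 0:
            raise ConnectionError("peer closed before the buffer was filled")
        total += n
    return total


# A connected pair of local sockets stands in for a network connection.
a, b = socket.socketpair()

buf = bytearray(4096)      # allocated once, reused for every message
view = memoryview(buf)     # slicing a memoryview does not copy the data

results = []
for payload in (b"first frame", b"second frame!"):
    a.sendall(payload)
    n = recv_exact_into(b, view[:len(payload)])
    # Copying out here is only for the demo's bookkeeping; a real consumer
    # could work on view[:n] directly.
    results.append(bytes(view[:n]))

a.close()
b.close()
```

The same bytearray is written to on every iteration, so the receive path itself allocates no new storage for the payload. Whether that matters in practice depends on the workload.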
There are reasons why the socket interface was changed to allow passing
in buffers to use. It wasn't just added to python for no reason.
That's all the arguing and explaining I'll do on this - I'm not going
to rewrite cherrypy for you as proof.