[Web-SIG] buffer used by socket, should also work with python stdlib Re: Request for Comments on upcoming WSGI Changes

Mon Sep 21 13:40:31 CEST 2009

2009/9/21 René Dudfield <renesd at gmail.com>:
>
>
> On Mon, Sep 21, 2009 at 11:30 AM, Graham Dumpleton
> <graham.dumpleton at gmail.com> wrote:
>>
>> 2009/9/21 René Dudfield <renesd at gmail.com>:
>> > On Mon, Sep 21, 2009 at 9:46 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> >> René Dudfield schrieb:
>> >>> On Mon, Sep 21, 2009 at 8:10 AM, Chris McDonough
>> >>> <chrism-ccARneWBNkgAvxtiuMwx3w at public.gmane.org> wrote:
>> >>>>
>> >>>> OTOH, I suspect the Python 3 stdlib is still broken if it requires
>> >>>> native
>> >>>> strings in various places (and prohibits the use of bytes).
>> >>>
>> >>> yes, python3 stdlib should support 'str'(the old unicode), 'buffer'
>> >>> and 'bytes' for web using stuff.  Buffer is important because it's a
>> >>> type also used for sockets(along with bytes) and it allows less memory
>> >>> allocation (because you can reuse buffers).
>> >>
>> >> Please don't confuse readers and use the correct name, i.e. 'bytearray'
>> >> instead of 'buffer'.
>> >>
>> >> Georg
>> >>
>> >
>> > Let me try and reduce the confusion...
>> >
>> > There are two different python types the py3k socket module uses:
>> > 'bytes' and 'buffer'.  'bytes' is kind of like str in python3... but
>> > with reduced functionality (no formatting, fewer methods etc).  buffer
>> > is a Py_buffer from the c api.
>> >
>> > buffer, and bytes in socket:
>> >
>> > http://docs.python.org/3.1/library/socket.html#socket.socket.recvfrom_into
>> > bytearray: http://docs.python.org/3.1/library/functions.html#bytearray
>> > bytes: http://docs.python.org/3.1/library/functions.html#bytes
>> > buffer: http://docs.python.org/3.1/c-api/buffer.html
>> >
>> > This is separate, but related to the point of bytes vs unicode.  It is
>> > really (bytes and buffer) vs unicode - since bytes and buffer can be
>> > used with socket.  socket never uses a python2 'unicode', or a python3
>> > 'str' type.
>>
>> A WSGI adapter need not be sitting on top of a socket, it may be based
>> on some lower level API which provides an abstract interface to the
>> client connection. For example, in Apache the code handling a request
>> doesn't deal with the socket. As such, requiring buffer/bytearray
>> would likely stop you from using any embedded system within a web
>> server, such as is the case for Apache/mod_wsgi. I would suspect that
>> requiring buffer/bytearray would also prevent WSGI being used on top
>> of CGI as well, as file objects are unlikely to deal in those types
>> either.
>>
>> I would also suggest that pursuing these types is just a case of
>> premature optimisation. Where is your proof that using them would give
>> any benefit? The web server layer is never the bottleneck in a web
>> stack, it is the web application, its routing and rendering systems
>> and any interaction with a database that are the bottleneck. It would
>> be a waste of time to overly complicate the WSGI specification for
>> absolutely no reason. People could get much better performance by
>> simply paying attention to their own web applications and making them
>> run better rather than praying that the underlying server is somehow
>> going to make their application 4 times faster than anything else
>> around.
>>
>> Maybe we can call this rush to prematurely optimise, or to jump on the
>> bandwagon of the latest asynchronous server, "Tornado syndrome". ;-)
>>
>> Graham
>
>
> hi,
>
>
> Below are the reasons why I think considering buffers for a future
> post-wsgi-1.1 spec is useful.  I don't think it should be considered for a
> wsgi 1.1 - I'm now in agreement with Robert that a wsgi 1.1 should come out
> very soon.  My specific concern is that Python's stdlib also supports 'buffer'
> (along with 'bytes' and 'str') - but that is separate from the new wsgi 1.1
> spec discussion.
>
>
>
> ---
> I don't think *requiring* the use of buffers is needed... just making it
> *possible* to use them.
>
> buffer is one of the types that socket supports, so it makes sense to at
> least consider them.
>
> Using buffer would in no way make it impossible to use python in embedded
> webservers.  You can easily make a Py_buffer from the same memory apache
> gives you to create python strings.  In fact buffers allow you to support
> more embedded systems more easily - since strings are immutable, but not all
> embedded systems give you immutable data.  Py_buffer also supports things
> like strides, non-contiguous memory, read/write information and other stuff
> which make it possible to use more types of memory.  It's a lot more useful
> to use for python embedded in things than a string type.
>
> This is not just about performance, it's about considering the types used
> these days.  One of the things that has changed since wsgi 1.0 came out is
> that python2.5 and above allow the use of a buffer with sockets.  Python3
> also changes the types to (str, buffer).  mmap is also a very easy way to
> share data between multiple python processes - using the Py_buffer allows
> you to use mmap too.
>
>
> Not all applications are the same, and some do require lots of performance.
>
> My use case for requiring buffers is video over the network.  When doing
> 100s of megabytes or gigabytes per second on one machine, copying and
> allocating strings is a waste of time.  Even allocating and copying 200KB
> jpeg images is a big waste of time.  It's basic programming optimization
> knowledge that allocating and copying memory is slow.  So is converting
> memory to various different encodings if it's not needed.  Proof is by
> timing a string allocation + copy + transcode versus just using the buffer
> given.  By having the server require allocating memory, require copying
> memory or require transcoding the memory, my use case becomes a lot
> slower than it needs to be.

No, proof would be someone taking the CherryPy WSGI server, changing it
to use buffer, and demonstrating that it works and that it wouldn't
cause an issue for a high level WSGI application.

A low level benchmark of the performance of one type versus another
in a mock up test case isn't going to prove anything, as that doesn't
necessarily translate into anything usable.

Sorry if I am setting the bar quite high on this one, but it is quite
nebulous whether it would be useful at all, and so an actual working
example would be much more convincing.

Graham
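
For readers unfamiliar with the buffer reuse discussed in the quoted
message, here is a minimal sketch of reading from a socket into one
preallocated bytearray with socket.recv_into(). The helper names
(receive_into_reused_buffer, demo), the chunk size, and the socketpair
test payload are illustrative assumptions, not anything proposed in the
thread. It only shows the API in isolation; it is not the working
WSGI-server example the reply asks for and says nothing about
end-to-end benefit.

```python
import socket


def receive_into_reused_buffer(sock, total, chunk_size=512):
    """Read up to `total` bytes from `sock`, reusing one bytearray.

    Hypothetical helper: returns (bytes_received, sum_of_byte_values).
    """
    buf = bytearray(chunk_size)   # allocated once, reused for every read
    view = memoryview(buf)        # slicing a memoryview does not copy
    received = 0
    byte_sum = 0
    while received < total:
        want = min(chunk_size, total - received)
        n = sock.recv_into(view[:want])   # fills the existing buffer
        if n == 0:                        # peer closed the connection
            break
        byte_sum += sum(view[:n])         # "process" the data in place
        received += n
    return received, byte_sum


def demo():
    """Send a small payload over a local socket pair and read it back."""
    sender, receiver = socket.socketpair()
    payload = bytes(range(256)) * 8       # 2048 bytes, kept small on purpose
    try:
        sender.sendall(payload)
        sender.close()
        return receive_into_reused_buffer(receiver, len(payload)), payload
    finally:
        receiver.close()


if __name__ == "__main__":
    (received, byte_sum), payload = demo()
    print(received, byte_sum == sum(payload))
```

By contrast, sock.recv(n) returns a newly allocated bytes object on
every call, which is the allocation the quoted message wants to be able
to avoid.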