[Web-SIG] buffer used by socket, should also work with python stdlib Re: Request for Comments on upcoming WSGI Changes

Mon Sep 21 13:15:55 CEST 2009

On Mon, Sep 21, 2009 at 11:30 AM, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:

> 2009/9/21 René Dudfield <renesd at gmail.com>:
> > On Mon, Sep 21, 2009 at 9:46 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> >> René Dudfield schrieb:
> >>> On Mon, Sep 21, 2009 at 8:10 AM, Chris McDonough <chrism-ccARneWBNkgAvxtiuMwx3w at public.gmane.org> wrote:
> >>>>
> >>>> OTOH, I suspect the Python 3 stdlib is still broken if it requires native
> >>>> strings in various places (and prohibits the use of bytes).
> >>>
> >>> yes, python3 stdlib should support 'str'(the old unicode), 'buffer'
> >>> and 'bytes' for web using stuff.  Buffer is important because it's a
> >>> type also used for sockets(along with bytes) and it allows less memory
> >>> allocation (because you can reuse buffers).
> >>
> >> Please don't confuse readers and use the correct name, i.e. 'bytearray'
> >> instead of 'buffer'.
> >>
> >> Georg
> >>
> >
> > Let me try and reduce the confusion...
> >
> > There are two different python types the py3k socket module uses:
> > 'bytes' and 'buffer'.  'bytes' is kind of like str in python3... but
> > with reduced functionality (no formatting, less methods etc).  buffer
> > is a Py_buffer from the c api.
> >
> > buffer, and bytes in socket:
> >
> http://docs.python.org/3.1/library/socket.html#socket.socket.recvfrom_into
> > bytearray: http://docs.python.org/3.1/library/functions.html#bytearray
> > bytes: http://docs.python.org/3.1/library/functions.html#bytes
> > buffer: http://docs.python.org/3.1/c-api/buffer.html
> >
> > This is separate, but related to the point of bytes vs unicode.  It is
> > really (bytes and buffer) vs unicode - since bytes and buffer can be
> > used with socket.  socket never uses a python2 'unicode', or a python3
> > 'str' type.
>
> A WSGI adapter need not be sitting on top of a socket, it may be based
> on some lower level API which provides an abstract interface to the
> client connection. For example, in Apache the code handling a request
> doesn't deal with the socket. As such, requiring buffer/bytearray
> would likely stop you from using any embedded system within a web
> server, such as is the case for Apache/mod_wsgi. I would suspect that
> requiring buffer/bytearray would also prevent WSGI being used on top
> of CGI as well as file objects don't likely deal in those types
> either.
>
> I would also suggest that pursuing these types is just a case of
> premature optimisation. Where is your proof that using them would give
> any benefit? The web server layer is never the bottleneck in a web
> stack, it is the web application, its routing and rendering systems
> and any interaction with a database that are the bottleneck. It would
> be a waste of time to overly complicate the WSGI specification for
> absolutely no reason. People could get much better performance by
> simply paying attention to their own web applications and making them
> run better rather than praying that the underlying server is somehow
> going to make their application 4 times faster than anything else
> around.
>
> Maybe we can call this rush to prematurely optimise or jump on the
> bandwagon of the latest asynchronous server Tornado syndrome. ;-)
>
> Graham
>

hi,

Below are the reasons why I think buffers are worth considering for a future,
post-WSGI-1.1 spec.  I don't think they should be considered for WSGI 1.1; I
now agree with Robert that WSGI 1.1 should come out very soon.  My specific
concern is that Python's stdlib should also support 'buffer' (along with
'bytes' and 'str'), but that is separate from the WSGI 1.1 spec discussion.

---
I don't think *requiring* the use of buffers is needed... just making it
*possible* to use them.

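buffer (i.e. objects supporting the buffer protocol, such as bytearray and
memoryview) is one of the types the socket module accepts, for example in
socket.recv_into(), so it makes sense to at least consider it.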
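A minimal sketch of what buffer support looks like at the socket level in current Python 3: socket.recv_into() fills a caller-owned bytearray in place, so one allocation can be reused for every read instead of creating a new bytes object each time. socket.socketpair() stands in for a real client connection, and the helper name receive_into_reusable_buffer is hypothetical, not part of WSGI or the stdlib.

```python
import socket

def receive_into_reusable_buffer(sock, buf):
    """Hypothetical helper: fill the caller-owned bytearray `buf` from `sock`.

    No new data buffer is allocated per read. Returns the number of bytes
    received; fewer than len(buf) means the peer closed the connection.
    """
    total = 0
    with memoryview(buf) as view:             # shares buf's memory; slicing it does not copy
        while total < len(buf):
            n = sock.recv_into(view[total:])  # received bytes land directly in buf
            if n == 0:                        # peer closed the connection
                break
            total += n
    return total

# socketpair() stands in for a real client connection in this demo.
client, server = socket.socketpair()
buf = bytearray(16)                           # allocated once, reusable across requests
client.sendall(b"hello, buffers!!")
n = receive_into_reusable_buffer(server, buf)
print(n, bytes(buf[:n]))
client.close()
server.close()
```

Reusing `buf` across requests is the memory-allocation saving described above; the only per-read objects are small memoryview slices, not copies of the data.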

Using buffers would in no way make it impossible to use Python in embedded
web servers.  You can easily make a Py_buffer from the same memory Apache
gives you to create Python strings.  In fact, buffers make it easier to
support more embedded systems: strings are immutable, but not all embedded
systems hand you immutable data.  Py_buffer also supports strides,
non-contiguous memory, read/write flags and other features that make it
possible to expose more kinds of memory.  That makes it far more useful than
a string type when Python is embedded in another program.
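Py_buffer itself is a C structure, but its features are visible from Python through memoryview, which is built on the same buffer protocol. This sketch assumes a small bytearray standing in for memory handed over by a host program (the "RGB" contents are illustrative only) and shows a zero-copy slice, a write through the view, a strided non-contiguous view, and the read/write flag. Strided slicing of a memoryview needs Python 3.3 or later.

```python
# memoryview exposes the same buffer protocol that Py_buffer describes in C.
frame = bytearray(b"RGBRGBRGB")   # stand-in for pixel memory handed over by a host program
view = memoryview(frame)

header = view[:3]                 # zero-copy slice: shares frame's memory
header[0] = ord("X")              # writing through the view changes frame itself
print(frame)                      # bytearray(b'XGBRGBRGB')

reds = view[::3]                  # every third byte, still without copying
print(reds.strides, reds.contiguous, reds.tobytes())   # (3,) False b'XRR'

# Read/write information travels with the buffer.
print(view.readonly, memoryview(b"abc").readonly)       # False True
```

The strided view is the kind of non-contiguous memory a plain string type cannot describe without first copying it into a new, contiguous object.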

This is not just about performance; it's about considering the types in use
these days.  One thing that has changed since WSGI 1.0 came out is that
Python 2.5 and above allow a buffer to be used with sockets (recv_into() and
recvfrom_into() were added in 2.5).  Python 3 also changes the string types,
to str for text and bytes/buffer for binary data.  mmap is also a very easy
way to share data between multiple Python processes, and supporting Py_buffer
lets you use mmap too.
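A minimal sketch of the mmap point, assuming a temporary file as the shared backing store: mmap objects support the buffer protocol, so a memoryview can write into the mapping in place, and a second mapping of the same file (as another process would create) sees the same bytes. SIZE and the payload b"frame" are arbitrary illustrative choices.

```python
import mmap
import os
import tempfile

SIZE = 4096  # arbitrary mapping size for the demo

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, SIZE)            # a mapping needs a file at least SIZE bytes long
    writer = mmap.mmap(fd, SIZE)      # shared, writable mapping by default
    reader = mmap.mmap(fd, SIZE)      # a second mapping, as another process would create

    with memoryview(writer) as out:   # mmap exports the buffer protocol
        out[:5] = b"frame"            # write in place; no intermediate copy of the mapping

    data = reader[:5]                 # slicing an mmap returns a bytes copy
    print(data)

    writer.close()                    # safe: the memoryview above was released
    reader.close()
finally:
    os.close(fd)
    os.remove(path)
```

In a real deployment the second mapping would live in a separate worker process; the page sharing works the same way because both mappings refer to the same file.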

Not all applications are the same, and some really do need high performance.

My use case for buffers is video over the network.  When moving hundreds of
megabytes, or gigabytes, per second on one machine, allocating and copying
strings is wasted time.  Even allocating and copying 200 KB JPEG images wastes
a lot of time.  It's basic programming optimization knowledge that allocating
and copying memory is slow, and so is converting memory to a different
encoding when that isn't needed.  The proof is to time a string allocation +
copy + transcode versus just using the buffer you were given.  A server that
requires allocating, copying or transcoding memory makes my use case a lot
slower than it needs to be.
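The timing comparison described above can be sketched with timeit. Assumptions: a zero-filled 200 KB bytearray stands in for one JPEG frame in a server-owned buffer, and a latin-1 decode/encode round trip stands in for whatever transcoding a server might impose; neither comes from any real WSGI server.

```python
import timeit

# Assumption: one 200 KB frame already sitting in a server-owned buffer.
frame = bytearray(200 * 1024)

def copy_and_transcode():
    """Allocate a new object, copy the frame, then decode and re-encode it."""
    data = bytes(frame)                               # new allocation + full copy
    return data.decode("latin-1").encode("latin-1")  # transcode there and back

def use_buffer():
    """Hand the existing memory to the caller as a zero-copy view."""
    return memoryview(frame)

n = 1000
t_copy = timeit.timeit(copy_and_transcode, number=n)
t_view = timeit.timeit(use_buffer, number=n)
print(f"allocate + copy + transcode: {t_copy / n * 1e6:.1f} us per frame")
print(f"memoryview of existing data: {t_view / n * 1e6:.1f} us per frame")
```

The copy and transcode costs scale with the number of bytes in the frame, while creating a memoryview does not, so the gap widens with larger frames and higher frame rates.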

cheers,