[Python-Dev] The buffer interface

M.-A. Lemburg mal@lemburg.com
Tue, 17 Oct 2000 10:16:56 +0200


Greg Stein wrote:
> 
> On Mon, Oct 16, 2000 at 01:22:22PM -0700, Jeff Collins wrote:
> >...
> > I think that buffer objects are fairly important.  They provide a mechanism
> > for exposing arbitrary chunks of memory (eg, PyBuffer_FromMemory),
> > something that no other python object does, AFAIK.  Perhaps clarifying the
> > interface (such as the slice operator returning a buffer, as suggested
> > below) and providing more hooks from Python for creating buffers (via
> > newmodule, say) would be helpful.
> 
> There have been quite a few C extensions (and embedding Python!) where the
> buffer objects have been used in this fashion. For example, if you have a
> string argument that you wish to pass into Python, then you can avoid a copy
> by wrapping a Buffer Object around it and passing that.

Perhaps we ought to flesh out the current uses of buffer objects and
then decide how to proceed ?!

IMHO, the problem with buffer objects (apart from the sometimes strange
protocol behaviour) is that there are too many "features" built into
them. Simplification and possibly diversification are needed: instead
of trying to achieve every possible C hack with buffer objects
we should try to come up with a reasonably small set of types
which allow only very basic tasks, e.g. 

1. wrapping C memory areas with the possibility of accessing the
   raw bytes in a read-only way (this should be buffer()),

2. providing a non-copying object reference type (I'd call this
   reference()) and

3. maintaining a writeable C memory buffer (arrays provide this feature).

The buffer object currently tries to do all three.
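
To make the three roles concrete, here is a minimal sketch. It uses
present-day Python 3 types (memoryview, bytearray, array) as stand-ins;
these are an assumption for illustration only, not the buffer() or
reference() types discussed in this thread.

```python
from array import array

# 1. Read-only window onto existing bytes (the role proposed for buffer()).
raw = b"header:payload"
view = memoryview(raw)
try:
    view[0] = 0                  # a bytes-backed view rejects writes
except TypeError:
    write_rejected = True
else:
    write_rejected = False

# 2. Non-copying reference to another object's data (the proposed
#    reference() role): the view tracks the owner, it does not copy it.
owner = bytearray(b"header:payload")
ref = memoryview(owner)
owner[0] = ord("H")              # change the owner in place...
ref_sees_change = ref[0] == ord("H")   # ...and the reference sees it

# 3. Writable memory owned by the object itself (what arrays provide).
scratch = array("B", [0] * 4)
scratch[0] = 255
```

Each role is served by a separate type here, which is the kind of split
argued for above.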

> Many of the issues with the buffer object can be solved with simple changes.
> For example, the "mutable object" thing is easily dealt with by having the
> object not record the pointer, but just fetch it every time that it wants to
> do an operation.
> [ and if we extend the buffer API, we could potentially optimize the
>   behavior to avoid the ptr refetch on each operation ]
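
The "fetch it every time" idea can be sketched in pure Python. The class
name RefetchingView and its read() method are hypothetical, and the sketch
uses Python 3's memoryview rather than the C buffer API under discussion:
the helper stores only the owning object and acquires its memory for the
duration of each operation.

```python
class RefetchingView:
    """Hypothetical helper: remembers the owner, never a raw pointer."""

    def __init__(self, owner, offset=0):
        self.owner = owner
        self.offset = offset

    def read(self, size):
        # Acquire the owner's memory only for this call, then release it.
        with memoryview(self.owner) as mem:
            with mem[self.offset:self.offset + size] as part:
                return part.tobytes()


data = bytearray(b"abc")
view = RefetchingView(data)
first = view.read(3)
data.extend(b"defgh")            # the owner may reallocate here
second = view.read(8)            # still valid: memory is fetched again

# For contrast: a view that is held open makes the owner refuse to resize.
held = memoryview(data)
resize_blocked = False
try:
    data.extend(b"x")
except BufferError:
    resize_blocked = True
held.release()
```

The trade-off is one acquisition per operation, which is the cost the
bracketed remark above hopes to optimize away.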

Please don't extend the buffer API: the whole design is flawed
since it undermines data encapsulation in very dangerous ways.

If anything, we should consider a new API at the abstract API level
which doesn't return raw C pointers, but real Python objects
(e.g. type 2 reference objects).
 
> I don't recall the motivation for returning strings. I believe it was based
> on an attempt to make the buffer look as much like a string as possible (and
> slices and concats return strings). That was a poor choice :-)  ... so,
> again, some basic changes to return slices and concats as buffer objects
> would make sense.

+1.
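
As an illustration of slices that stay buffer-like instead of turning into
strings, the sketch below uses Python 3's memoryview as a stand-in (an
assumption; it is not the buffer object from this thread). Note that
memoryview does not define concatenation at all, so the concat half of the
proposal has no direct counterpart here.

```python
backing = bytearray(b"0123456789")

# Slicing a view yields another view over the same memory, not a copy.
window = memoryview(backing)[2:6]
slice_type = type(window)

# A change to the backing object is visible through the slice.
backing[2] = ord("X")
seen_through_slice = window.tobytes()

# Getting an independent string of bytes requires an explicit copy.
snapshot = bytes(window)
backing[3] = ord("Y")
```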
 
> Extending the buffer() builtin to create writeable buffer objects has been a
> reasonably common request. What seems to happen instead is that people
> developing C extensions (which desire buffer objects as their params) just
> add a new function to the extension to create buffer objects.

Please don't. Instead, either suggest using arrays or come up
with some new type with the sole purpose of providing read-write
access to a chunk of bytes.
 
> Re: the buffer API: At the time the "s"/"t" codes were introduced (before
> 1.5.2 was released), we had a very different concept of how Unicode objects
> would be implemented. At that time, Unicode objects had no 8-bit
> representation (just 16-bit chars), so the difference was important. I'm not
> clued in enough on the ramifications of torching the difference in the API,
> but it would be a nice simplification.

Well, think of it this way: Unicode was the first object to actually
try to make a difference between "s" and "t" -- and failed badly.
In the end, we reverted the decision to make any difference and
now special case Unicode objects in the getargs.c parser (so that
"s" and "t" work virtually the same for Unicode).

+1 on the idea of removing the difference altogether in 2.1.

If anyone needs a special representation of an object, the object
should provide a clearly defined C API for this instead. E.g.
Unicode has lots of APIs to encode Unicode into quite a few
new representations.

> Buffers vs arrays: this is a harder question. Which is the "recommended
> binary type [for series of bytes]" ? Buffers can refer to arbitrary memory.
> Arrays maintain their own memory. I believe the two models are needed, so
> I'd initially offer that both buffers and arrays need to be maintained.
> However, given that... what is the purpose of the array if a buffer can
> *also* maintain its own memory?

Right, and that's the problem: buffers shouldn't be able to
own memory. See above.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/