
GvR thought you guys might have some ideas on this one for me. If I don't get any replies, I may have to rely on my own instincts and judgment and no one knows what follies might ensue ;)
Raymond Hettinger
----- Original Message -----
From: "Raymond Hettinger" <python@rcn.com>
To: <python-dev@python.org>
Sent: Friday, June 21, 2002 1:16 PM
Subject: Behavior of buffer()

--- Raymond Hettinger <python@rcn.com> wrote:
I think buffers have a weird duality that they don't really want. In one case, the buffer object acts as a low level way to inspect some other object's PyBufferProcs. I'll call this BufferInspector. In the other case, the buffer object just acts like an array of bytes. I'll call this ByteArray.
So for a BufferInspector, you'd want slices to return new "views" into the same object, and repetition doesn't make any sense. If you wanted to copy the data out of the object you're mucking with, you'd be explicit about it - either creating a new string, or a new ByteArray.
For a ByteArray, I think you'd want slices to have copy behaviour and return a new ByteArray. Repetition also makes perfect sense.
Of course this all gets screwy when the object being inspected by the BufferInspector sense is created solely to provide a ByteArray. I see this as an ugly workaround for arraymodule.c not allowing one to supply a pointer/destructor when creating arrays.
The fact that either of these pretend to be strings is really convenient, but I don't think it has much to do with the weirdness. The fact that either of these returns strings for any operation is somewhat weird. For the ByteArray sense of the buffer object, it's analogous to a list slice/repetition returning a tuple.
Since the array module already has a way to create a ByteArray (and a ShortArray, and...), buffer objects don't really need to duplicate that effort. Except creating an array from your own "special memory" (mmap, DMA, third party API), and backwards compatibility in general. :-)
BTW: I chuckled when I saw you post this the first time. This topic seems to draw a lot of silence. I know that I would suggest deprecating the PyBufferObject to just being a BufferInspector, and taking what little extra functionality was in there and stuffing it into arraymodule.c. Another solution would be to factor PyBufferObject into PyBufferInspector and a "bytes" object.
A few months ago, I was tempted to submit a PEP saying as much, but I think that would have quietly fallen to the floor. Nobody seems to like this topic too much... If you do go in and make changes to bufferobject.c, I've already submitted two patches (fallen quietly to the floor) that fix some other classic "buffer problems". You might want to look at them. Or not :-) Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! - Official partner of 2002 FIFA World Cup http://fifaworldcup.yahoo.com
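The view-versus-copy split Scott describes can be sketched with later Python types. This is only an illustration under assumptions: `memoryview` stands in for the "BufferInspector" role and `bytearray` for the "ByteArray" role. Neither is the buffer object discussed in this thread, and neither existed when it was written.

```python
# Stand-ins only: memoryview plays the "BufferInspector" role and
# bytearray the "ByteArray" role. Neither is the 2002 buffer object.
data = bytearray(b"hello world")

# View semantics: slicing a memoryview shares memory with `data`.
view = memoryview(data)[0:5]
view[0] = ord("J")            # writes through to the underlying object
assert data == bytearray(b"Jello world")

# Copy semantics: slicing a bytearray yields an independent bytearray.
copy = data[0:5]
copy[0] = ord("C")
assert data == bytearray(b"Jello world")   # original is untouched

# Repetition is natural for the copying type.
doubled = copy * 2

# Copying data out of a view is an explicit conversion.
snapshot = bytes(view)
```

In this sketch every slice keeps the type of the object being sliced, which is the consistency the thread is arguing about.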

On Sun, Jun 23, 2002, Scott Gilbert wrote:
OTOH, for PEPs, silence may be construed as consent. Just don't be too surprised if an actual PEP generated a lot of noise. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/

I'm a little surprised. Raymond Hettinger checked in a change that makes all slices of buffer objects return strings. His comments on SF bug 546434 say that only one person replied and that they agreed returning strings was the better solution. But that's not how I read the only response to his query that I see in python-dev, from Scott Gilbert:
I read this as a recommendation to forget about returning strings. Am I mistaken? Also, I wish you'd submitted that PEP. IMO the reason that nobody likes this topic is that there is much confusion about why we have buffer objects in the first place. Any attempt at clarifying this (e.g. proposing separate byte arrays and buffer inspectors) would be welcome. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Interesting. I must have skipped that message. IMHO, all slices of buffer object should return buffer objects, but since all Python releases return strings, I guess this is too late to change. Note that the only case where a buffer object is returned in Python 2.x (x < 3) is if you write buffer()[:], i.e. you want a copy of the buffer object. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/

You blink, and you find that the world has changed.
That was my preference too, but Raymond disagreed and somehow tried to find support for his position :-). Since buffer objects (of course :-) support the C-level buffer protocol, they can still be used in most places where strings are needed. But it would be incompatible. But so is Raymond's solution (because it changes buffer()[:] to also return a string).
What does a copy of a buffer object buy you? It's not too late to revert Raymond's changes. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Indeed :-)
Nothing... since you only get a new reference, not an independent copy.
It's not too late to revert Raymond's changes.
Why not try the "buffer slice returns buffer" logic for a few alphas, then betas, and then, if no one complains, the final release? -- Marc-Andre Lemburg

Since nobody cares, we won't get complaints. But it's a waste of time. I'm going to deprecate it. --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Guido van Rossum wrote:
After the message you're referring to, Raymond Hettinger and I corresponded a little bit off of the list. I think these are probably the most relevant snippets: --- Raymond Hettinger:
For the problem at hand, do you recommend returning buffer objects or strings?
--- To which I responded:
Forgive the bit about "Guido not caring about it", it seemed that way to me at the time. Silence comes off as disinterest or annoyance. So my suggestion was that since taking away the implicit promotion of buffer slices/repetitions/concatenations to strings was going to break someone's code, that just can't be done. If we want sane behavior, then any slice, be it buf[1:2] or buf[:], ought to at least return the same type of object. Those two in conjunction mean they ought to always return strings. --- Raymond Hettinger also wrote:
Thanks for your input, this topic doesn't seem to interest anyone,
--- To which I responded:
I think there are others that are interested, but it's pretty tough to
get the
--- Back to Guido van Rossum:
I read this as a recommendation to forget about returning strings. Am I mistaken?
Only if breaking backwards compatibility is an option. I'd like to see that happen, but I think that would take a pronouncement from someone in authority. --- More of Guido van Rossum:
I'm glad to hear this. I'll submit the PEP sometime in the next week. Cheers, -Scott Gilbert

It seems we're still in the same boat. It would be saner to change buffer slices to return buffer objects, except for backward compatibility. I was hoping to hear from someone who uses buffer objects and knows that this would break his code. Scott apparently doesn't have this problem with his own code, so his opinion doesn't help. :-( Raymond's change still breaks compatibility, though, for slices without begin and end points. So we have a contradiction: out of fear of breaking compatibility, we make a change that breaks compatibility. Maybe we should do the same with the buffer object as we did with xrange(), and plan to remove all functionality that we aren't sure is useful? In 2.3, we would have to maintain compatibility but we could warn about features that will go away; in 2.4, we could remove unwanted features. Maybe the name 'buffer' suggests false expectations? It's not a buffer, it's an alias for a memory area. Maybe we should do something stronger, and deprecate the buffer type altogether. --Guido van Rossum (home page: http://www.python.org/~guido/)

I'm not too interested in this anymore (I _was_ a year ago, IIRC). I have given up using the buffer object myself, I've written my own (maybe in the same way as others).
Maybe the name 'buffer' suggests false expectations? It's not a buffer, it's an alias for a memory area.
Hm. The name could be right (and I could give up my own memory object) if there were a way to create a buffer owning its own memory.
Maybe we should do something stronger, and deprecate the buffer type altogether.
Or this. Thomas

Right.
Maybe your memory object could become a standard Python extension. Extra points if it works well with the mmap and the array modules. --Guido van Rossum (home page: http://www.python.org/~guido/)

What do you mean by 'works well with the mmap and array modules'?
I'm not sure, since I don't know what your memory object does (and frankly, I don't really understand what the mmap module does either :-). I was just mentioning these because they are other modules that have been used and/or proposed for buffering needs. --Guido van Rossum (home page: http://www.python.org/~guido/)

"Memory-mapped file objects behave like both strings and like file objects. Unlike normal string objects, however, these are mutable." More in the Python manual... Optionally they can be backed up by files in the file system, and optionally they can be shared between processes. At least that's what they are under Windows.
I was just mentioning these because they are other modules that have been used and/or proposed for buffering needs.
Now that you mention this, mmap could be used as a 'memory' object, although it would have to be converted into a new style class. My own memory object currently supports a private protocol which doesn't make sense for core Python. But that can be fixed. Thomas

[Guido, to Scott Gilbert]
Raymond did a survey on c.l.py, asking anyone who used buffer objects at *all* to speak up. IIRC, he got no replies. On Python-Dev, apart from musing whether they might conceivably use them, the only person who eventually said they actually used them was Marc-Andre. Fredrik pressed for details, but we haven't seen any concrete use cases. In the absence of the latter, it's impossible to guess what would be backward compatible for MAL's purposes.
I told everyone you forgot the essay you wrote suggesting this the last time this rose above everyone's pain threshold. It's a comfort to know that my channeling powers have not diminished with exponentially advancing age <wink>: http://mail.python.org/pipermail/python-dev/2000-October/009974.html

But at least I didn't change my mind. :-) So let's deprecate buffer(). I also suggest to roll back Raymond's changes to make slices more consistent -- there's no point in changing something that's only kept for backwards compatibility reasons. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
But at least I didn't change my mind. :-)
I would not have pointed out your previous position if you had <wink>.
I expect Raymond will be agreeable, but he announced he'll be missing in action for about another month. If rollback can wait, I prefer that to electing me to do it just because I replied <0.9 wink>.

Tim Peters wrote:
For my purposes, the strategy buffer slice returns a buffer would be more appropriate because it would save the buffer type information across the slicing operation... I mean, you don't want to get bananas when you slice an apple in real life either ;-) I use buffers to mean: this is a chunk of binary data. The purpose is to recognize this type of data for pickling via xml-rpc, soap and other rpc mechanisms etc. Strings don't provide this information (since they can be a mix of text and binary data). Buffers are compatible enough with most tools working on strings that they represent a good alternative to tag data as being binary while not losing all the nice advantages of strings. The downside is that most of these tools return their results as strings :-( Now it would be nice if at least the type itself would behave in a sane way.
Oh yeah, that was during the Unicode implementation wars... :-) -- Marc-Andre Lemburg

[Tim]
[M.-A. Lemburg]
How do you use buffers? Do you stick to their C API? Do you use the Python-level buffer() function? If the latter, what do you do in Python code with a buffer object after you get one? The only use I've seen made of a buffer object in Python code is as a way to trick the interpreter into crashing (via recycling the memory the buffer object points to). And from where do you get a buffer? There are darned few types in Python that buffer() accepts as an argument. Do your extension types implement tp_as_buffer? I'm blindly casting for a reason why your appreciation of the buffer object seems unique.
Overall, this reinforces the repeated observation that we don't know why the buffer object exists -- it doesn't appear to do what you really want, but you've found some way to get it to do part of what you want, up until the point you actually use it <0.7 wink>.

Here's the numarray perspective on things. Tim Peters wrote:
We use buffers in numarray to store our array data. We use readinto to load array buffers efficiently from a file. We operate on the buffer data in-place. Since numarrays are Python class instances, buffers provide a place for the data to live.
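As a rough illustration of the load-then-modify-in-place pattern Todd describes, here is a sketch using only the standard library. It does not show numarray's actual code: the `array` module stands in for numarray's storage, `io.BytesIO` stands in for a real file, and the sample values are made up.

```python
import io
from array import array

# Hypothetical data: four doubles serialized into an in-memory "file".
source = array("d", [1.0, 2.0, 3.0, 4.0])
fileobj = io.BytesIO(source.tobytes())

# Pre-allocate storage, then let readinto() fill it directly,
# without building an intermediate string first.
storage = array("d", [0.0] * 4)
nbytes = fileobj.readinto(storage)

# Operate on the loaded data in place.
for i in range(len(storage)):
    storage[i] *= 2.0
```

The point of the pattern is that the bytes land in storage the array already owns, so no temporary copy is needed between the file and the array.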
Do you stick to their C API?
We use the C-API, and currently use the buffer object too. Using the buffer object has always seemed like a necessary evil, but having reviewed numarray usage of buffer(), ditching it sounds good to me.
Do you use the Python-level buffer() function?
Yes. We go one step further, and expose writeable buffers using our own extension function. I had a feeling I was on thin ice when I did this.
I'm getting the following things by using the buffer object:
1. Knowledge that the C-type the buffer refers to meets the buffer C-API.
2. Mutable string behavior for any object which meets the buffer C-API.
3. Storage. At least we used to get storage until we found out that there's no guarantee on double alignment.
I plan to work around each of these uses as follows:
1. Create an extension function which determines whether an object meets the buffer C-API.
2. Create an extension function which copies from one buffer region to another buffer region.
3. We already have our own memory object which is now typically referenced by a buffer object. With the above extensions, I don't need a buffer "wrapper" object around it anymore.
And from where do you get a buffer? There are darned few types in Python
We get ours from mmap and our own homegrown memory object.
Numarray uses buffer() too, but dumping it sounds OK. Todd -- Todd Miller Space Telescope Science Institute

How do you use buffers?
AFAIK the buffer() function can only create read-only buffers. How do you create your buffers? If you're just using the C buffer API, that's not going away.
Good.
And from where do you get a buffer? There are darned few types in Python
We get ours from mmap and our own homegrown memory object.
Maybe instead of the buffer() function/type, there should be a way to allocate raw memory? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
We have a very small extension function which creates writeable buffer objects using the buffer type C-API. We also wrap suitable type instances with a "buffer object wrapper". I'm slowly gathering that this is unsafe. :-(
Yes. It would also be nice to be able to:
1. Know (at the python level) that a type supports the buffer C-API.
2. Copy bytes from one buffer to another (writeable buffer).
--Guido van Rossum (home page: http://www.python.org/~guido/)
Todd -- Todd Miller Space Telescope Science Institute

We have a very small extension function which creates writeable buffer objects using the buffer type C-API.
That's how the buffer API was supposed to be used.
We also wrap suitable type instances with a "buffer object wrapper". I'm slowly gathering that this is unsafe. :-(
I don't understand what you say, but I believe you.
Maybe instead of the buffer() function/type, there should be a way to allocate raw memory?
Yes. It would also be nice to be able to:
1. Know (at the python level) that a type supports the buffer C-API.
Good idea. (I guess right now you can see if calling buffer() with an instance as argument works. :-)
2. Copy bytes from one buffer to another (writeable buffer).
Maybe you would like to work on a requirements gathering for a memory object? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I meant we call PyBuffer_FromReadWriteObject and the resulting buffer lives longer than the extension function call that created it. I have heard that it is possible for the original object to "move" leaving the buffer object pointer to it dangling.
Sure. I'd be willing to poll comp.lang.python (python-list?) and collate the results of any discussion that ensues. Is that what you had in mind?
Todd

Yes, that can happen (depending on what kind of object it is).
Yes, but beware that you will have to decide which requirements make sense and which ones don't -- the community is so large these days that you can't get agreement any more. :-) Feel free to come back with results to python-dev any time. --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Todd Miller <jmiller@stsci.edu> wrote:
Yes. The PyBufferObject grabs the pointer from the PyBufferProcs supporting object when the PyBufferObject is created. If the PyBufferProcs supporting object reallocates the memory (possibly from a resize) the PyBufferObject can be left with a bad pointer. This is easily possible if you try to use the array module arrays as a buffer. I've submitted a patch to fix this particular problem (among others), but there are still enough things that the buffer object can't do that something new is needed.
And the copy operations shouldn't create any large temporaries:
    buf1 = memory(50000)
    buf2 = memory(50000)
    # no 10K temporary should be created in the next line
    buf1[10000:20000] = buf2[30000:40000]
The current buffer object could be used like this, but it would create a temporary string. So getting an efficient copy operation seems to require that slices just create new "views" to the same memory.
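The copy Scott asks for can be approximated in later Python when slices are views. This sketch rests on assumptions: `bytearray` stands in for the proposed `memory(...)` object and `memoryview` supplies the view-returning slices. It does not show how the proposed type would be implemented.

```python
# bytearray stands in for the hypothetical memory(50000) objects.
buf1 = bytearray(50000)
buf2 = bytearray(b"\x01" * 50000)

# Slicing a memoryview yields a view, so no bytes are copied here.
src = memoryview(buf2)[30000:40000]

# Slice assignment between equally sized views moves the bytes
# straight from one buffer into the other.
memoryview(buf1)[10000:20000] = src
```

Only the destination range changes; the bytes on either side of `buf1[10000:20000]` stay zero.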
In the PEP that I'm drafting, I've been calling the new object "bytes" (since it is just a simple array of bytes). Now that you guys are referring to it as the "memory object", should I change the name? Doesn't really matter, but it might avoid confusion to know we're all talking about the same thing.

Can you remind me of the patch#? (I'm curious how you plan to fix this...)
I like bytes just fine. PS, Todd, if you can, please don't send HTML-only mail to python-dev... --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Guido van Rossum <guido@python.org> wrote:
Patch number 552438. Instead of caching the pointer, it grabs it from the other object every time it is needed. Might be a little slower, but I think it's correct.
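The idea of fetching the data on every access, rather than caching a pointer at creation time, can be mimicked in pure Python. This is a hypothetical sketch and not the patch itself: the class name `LateBoundView` is invented here, and it copies bytes on each access for simplicity, which a C implementation need not do.

```python
from array import array

class LateBoundView:
    """Hypothetical sketch: re-fetch the target's bytes on every access."""

    def __init__(self, target, offset=0):
        self._target = target      # keep the object, not a pointer into it
        self._offset = offset

    def tobytes(self):
        # Looked up at call time, so a resize of the target cannot
        # leave this view holding stale storage.
        return self._target.tobytes()[self._offset:]

    def __len__(self):
        total = len(self._target) * self._target.itemsize
        return max(total - self._offset, 0)

target = array("B", [1, 2, 3])
view = LateBoundView(target, offset=1)
before = view.tobytes()
target.extend(range(10, 20))       # may reallocate the array's storage
after = view.tobytes()
```

Because nothing is cached, `after` reflects the array's contents following the resize. The price is one extra lookup per access, which matches the "might be a little slower" trade-off mentioned above.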
<chuckle> I'm bad at patience, but I'm not terribly naive. I fully expect everyone and their dog will find something to dislike before it gets approved/rejected. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Autos - Get free new car price quotes http://autos.yahoo.com

--- Guido van Rossum <guido@python.org> wrote:
Maybe instead of the buffer() function/type, there should be a way to allocate raw memory?
This is a part of my soon to be issued PEP. I've looked at their memory object, and Numarray is one of the use cases that I'm catering to.

This is a part of my soon to be issued PEP. I've looked at their memory object, and Numarray is one of the use cases that I'm catering to.
OK, then I guess Todd doesn't have to go to c.l.py for requirements. --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Guido van Rossum <guido@python.org> wrote:
More information couldn't hurt too much, and since Todd Miller volunteered* to herd the information, I'll be interested to see if any new perspectives come out.
* - Actually it looked like you volunteered him, but he seemed willing enough. :-)

There may be some breakage in the Win32 overlapped IO world. A common pattern is:
    buf = allocate_buffer_somehow(size)
    Perform_Async_Read(size)
    # wait for notification of read completing
    nbytes = Wait_For_Read_Notification()
    data = buf[:nbytes]
Currently, "data" is a string. Changing this to a buffer object will presumably break this code once "data" is passed to some other function that truly requires a string.
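The compatibility concern in this pattern can be sketched without the Win32 extensions. Everything here is an assumption made for illustration: `io.BytesIO.readinto` stands in for the asynchronous read and its completion notification, `bytearray` stands in for the pre-allocated buffer, and the helper name `read_into_preallocated` is hypothetical.

```python
import io

def read_into_preallocated(stream, size):
    """Fill a pre-allocated buffer and return (buffer, bytes_read)."""
    buf = bytearray(size)          # stand-in for allocate_buffer_somehow(size)
    nbytes = stream.readinto(buf)  # stand-in for the async read + notification
    return buf, nbytes

buf, nbytes = read_into_preallocated(io.BytesIO(b"payload"), 64)

# If slices are views, the result is no longer a string-like value...
data_view = memoryview(buf)[:nbytes]

# ...so callers that truly need one must convert explicitly.
data = bytes(data_view)
```

The explicit `bytes(...)` call is the step that existing code relying on implicit string slices would be missing after such a change.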
Maybe the name 'buffer' suggests false expectations? It's not a buffer, it's an alias for a memory area.
This distinction is a little gray. In my example, it is truly a buffer - but also an alias for a memory area. In my example though, it is *not* conceptually an alias for memory owned by another object.
Maybe we should do something stronger, and deprecate the buffer type altogether.
Maybe. However, as you have seen over the years, *something* from all this mess is a real requirement. This example of asynch IO is the only example I have ever used, but IMO, it is a real and reasonable requirement. My example *could* have been done with array() (assuming the array module had a C API exposed which it doesn't/didn't) but that too looks like a square peg in a round hole - my requirements call for a pre-allocated byte buffer, not an array. All that said: if the worst came to the worst, I could ensure that the Win32 extensions are left compatible with the way they are. All such buffers are allocated using a function inside one of my modules. Currently this just returns a buffer() object, but could be changed to a private object with the same semantics as the existing buffer() object. So consider this more a data point than an attempted veto. Mark.

--- Raymond Hettinger <python@rcn.com> wrote:
I think buffers have a weird duality that they don't really want. In one case, the buffer object acts as a low level way to inspect some other object's PyBufferProcs. I'll call this BufferInspector. In the other case, the buffer object just acts like an array of bytes. I'll call this ByteArray. So for a BufferInspector, you'd want slices to return new "views" into the same object, and repetition doesn't make any sense. If you wanted to copy the data out of the object you're mucking with, you'd be explicit about it - either creating a new string, or a new ByteArray. For a ByteArray, I think you'd want slices to have copy behaviour and return a new ByteArray. Repetition also makes perfect sense. Of course this all gets screwy when the object being inspected by the BufferInspector sense is created solely to provide a ByteArray. I see this as an ugly workaround for arraymodule.c not allowing one to supply a pointer/destructor when creating arrays. The fact that either of these pretend to be strings is really convenient, but I don't think it has much to do with the weirdness. The fact that either of these returns strings for any operation is somewhat weird. For the ByteArray sense of the buffer object, it's analagous to a list slice/repetition returning a tuple. Since the array module already has a way to create a ByteArray (and a ShortArray, and...), buffer objects don't really need to duplicate that effort. Except creating an array from your own "special memory" (mmap, DMA, third party API), and backwards compatibility in general. :-) BTW: I chuckled when I saw you post this the first time. This topic seems to draw a lot of silence. I know that I would suggest deprecating the PyBufferObject to just being a BufferInspector, and taking what little extra functionality was in there and stuffing it into arraymodule.c. Another solution would be to factor PyBufferObject into PyBufferInspector and a "bytes" object. 
A few months ago, I was tempted to submit a PEP saying as much, but I think that would have quietly fallen to the floor. Nobody seems to like this topic too much... If you do go in and make changes to bufferobject.c, I've already submitted two patches (fallen quietly to the floor) that fix some other classic "buffer problems". You might want to look at them. Or not :-) Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! - Official partner of 2002 FIFA World Cup http://fifaworldcup.yahoo.com

On Sun, Jun 23, 2002, Scott Gilbert wrote:
OTOH, for PEPs, silence may be construed as consent. Just don't be too surprised if an actual PEP generated a lot of noise. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/

I'm a little surprised. Raymond Hettinger checked in a change that makes all slices of buffer objects return strings. His comments on SF bug 546434 say that only one person replied and that they agreed returning strings was the better solution. But that's not how I read the only response to his query that I see in python-dev, from Scott Gilbert:
I read this as a recommendation to forget about returning strings. Am I mistaken? Also, I wish you'd submitted that PEP. IMO the reason that nobody likes this topic is that there is much confusion about why we have buffer objects in the first place. Any attempt at clarifying this (e.g. proposing separate byte arrays and buffer inspectors) would be welcome. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Interesting. I must have skipped that message. IMHO, all slices of buffer object should return buffer objects, but since all Python releases return strings, I guess this is too late to change. Note that the only case where a buffer object is returned in Python 2.x (x < 3) is if you write buffer()[:], i.e. you want a copy of the buffer object. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/

You blink, and you find that the world has changed.
That was my preference too, but Raymond disagreed and somehow tried to find support for his position :-). Since buffer objects (of course :-) support the C-level buffer protocol, they can still be used in most places where strings are needed. But it would be incompatible. But so is Raymond's solution (because it changes buffer()[:] to also return a string).
What does a copy of a buffer object buy you? It's not too late to revert Raymond's changes. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Indeed :-)
Nothing... since you only get a new reference, not an independent copy.
It's not too late to revert Raymond's changes.
Why not try the buffer slice returns buffer logic for a few alphas, then betas, and then if noone complains the final release ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/

Since nobody cares, we won't get complaints. But it's a waste of time. I'm going to deprecate it. --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Guido van Rossum wrote:
After the message you're referring to, Raymond Hettinger and I corresponded a little bit off of the list. I think these are probably the most relevant snippets: --- Raymond Hettinger:
For the problem at hand, do you recommend returning buffer objects or strings?
--- To which I responded:
Forgive the bit about "Guido not caring about it", it seemed that way to me at the time. Silence comes off as disinterest or annoyance. So my suggestion was that since taking away the implicit promotion of buffer slices/repetitions/concatenations to strings was going to break someone's code, that just can't be done. If we want sane behavior, then any slice, be it buf[1:2] or buf[:], ought to at least return the same type of object. Those two in conjunction mean they ought to always returns strings. --- Raymond Hettinger also wrote:
Thanks for your input, this topic doesn't seem to interest anyone,
--- To which I responded:
I think there are others that are interested, but it's pretty tough to
get the
--- Back to Guido van Rossum:
I read this as a recommendation to forget about returning strings. Am I mistaken?
Only if breaking backwards compatibility is an option. I'd like to see that happen, but I think that would take a pronouncement from someone in authority. --- More of Guido van Rossum:
I'm glad to hear this. I'll submit the PEP sometime in the next week. Cheers, -Scott Gilbert __________________________________________________ Do You Yahoo!? Sign up for SBC Yahoo! Dial - First Month Free http://sbc.yahoo.com

It seems we're still in the same boat. It would be saner to change buffer slices to return buffer objects, except for backward compatibility. I was hoping to hear from someone who uses buffer objects and knows that this would break his code. Scott apparently doesn't have this problem with his own code, so his opinion doesn't help. :-( Raymond's change still breaks compatibility, though, for slices without begin and end points. So we have a contradiction: out of fear of breaking compatibility, we make a change that breaks compatibility. Maybe we should do the same with the buffer object as we did with xrange(), and plan to remove all functionality that we aren't sure is useful? In 2.3, we would have to maintain compatibility but we could warn about features that will go away; in 2.4, we could remove unwanted features. Maybe the name 'buffer' suggests false expectations? It's not a buffer, it's an alias for a memory area. Maybe we should do something stronger, and deprecate the buffer type altogether. --Guido van Rossum (home page: http://www.python.org/~guido/)

I'm not too interested in this anymore (I _was_ a year ago, IIRC). I have given up using the buffer object myself, I've written my own (maybe in the same way as others).
Maybe the name 'buffer' suggests false expectations? It's not a buffer, it's an alias for a memory area.
Hm. The name could be right (and I cold give up my own memory object) if there were a way to create a buffer owning it's own memory.
Maybe we should do something stronger, and deprecate the buffer type altogether.
Or this. Thomas

Right.
Maybe your memory object could become a standard Python extension. Extra points if it works well with the memmap and the array modules. --Guido van Rossum (home page: http://www.python.org/~guido/)

What do you mean by 'works well with the mmap and array modules'?
I'm not sure, since I don't know what your memory object does (and frankly, I don't really understand what the mmap module does either :-). I was just mentioning these because they are other modules that have been used and/or proposed for buffering needs. --Guido van Rossum (home page: http://www.python.org/~guido/)

"Memory-mapped file objects behave like both strings and like file objects. Unlike normal string objects, however, these are mutable." More in the Python manual... Optionally they can be backed up by files in the file system, and optionally they can be shared between processes. At least that's what they are under Windows.
I was just mentioning these because they are other modules that have been used and/or proposed for buffering needs.
Now that you mention this, mmap could be used as a 'memory' object, although it would have to be converted into a new style class. My own memory object currently supports a private protocol which doesn't make sense for core Python. But that can be fixed. Thomas
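A minimal sketch of the string-like and file-like behaviour quoted from the manual above. It is written against the mmap module as found in current Python 3, which is an assumption, since this thread predates it. The anonymous mapping (fileno -1) and the helper name demo_anonymous_map are illustrative choices, not something proposed in this discussion.

```python
import mmap

def demo_anonymous_map(size=16):
    # fileno -1 requests an anonymous mapping that is not backed by a file.
    with mmap.mmap(-1, size) as m:
        m[0:5] = b"hello"        # string-like: slice assignment mutates in place
        sliced = m[0:5]          # string-like: slicing copies the bytes out
        m.seek(0)                # file-like: the mapping has a file position
        read_back = m.read(5)    # file-like: read from the current position
        return sliced, read_back
```

Both returned values hold the same five bytes, once obtained by slicing and once by reading.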

[Guido, to Scott Gilbert]
Raymond did a survey on c.l.py, asking anyone who used buffer objects at *all* to speak up. IIRC, he got no replies. On Python-Dev, apart from musing whether they might conceivably use them, the only person who eventually said they actually used them was Marc-Andre. Fredrik pressed for details, but we haven't seen any concrete use cases. In the absence of the latter, it's impossible to guess what would be backward compatible for MAL's purposes.
I told everyone you forgot the essay you wrote suggesting this the last time this rose above everyone's pain threshold. It's a comfort to know that my channeling powers have not diminished with exponentially advancing age <wink>: http://mail.python.org/pipermail/python-dev/2000-October/009974.html

But at least I didn't change my mind. :-) So let's deprecate buffer(). I also suggest to roll back Raymond's changes to make slices more consistent -- there's no point in changing something that's only kept for backwards compatibility reasons. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
But at least I didn't change my mind. :-)
I would not have pointed out your previous position if you had <wink>.
I expect Raymond will be agreeable, but he announced he'll be missing in action for about another month. If rollback can wait, I prefer that to electing me to do it just because I replied <0.9 wink>.

Tim Peters wrote:
For my purposes, the strategy "buffer slice returns a buffer" would be more appropriate because it would save the buffer type information across the slicing operation... I mean, you don't want to get bananas when you slice an apple in real life either ;-) I use buffers to mean: this is a chunk of binary data. The purpose is to recognize this type of data for pickling via xml-rpc, soap and other rpc mechanisms etc. Strings don't provide this information (since they can be a mix of text and binary data). Buffers are compatible enough with most tools working on strings that they represent a good alternative to tag data as being binary while not losing all the nice advantages of strings. The downside is that most of these tools return their results as strings :-( Now it would be nice if at least the type itself would behave in a sane way.
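A rough illustration of the "type as a binary tag" idea, assuming the xmlrpc.client module of current Python 3 rather than anything available when this was written. There the bytes type plays the tagging role described above; the helper name marshal_mixed is hypothetical.

```python
import xmlrpc.client

def marshal_mixed(binary_chunk, text):
    # bytes values are marshalled as <base64> and str values as <string>,
    # so the Python type alone tells the marshaller which data is binary.
    return xmlrpc.client.dumps((binary_chunk, text))
```

The marshaller needs no extra hint: the distinction survives only as long as operations on the binary value keep returning the same type, which is the point being made about slicing.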
Oh yeah, that was during the Unicode implementation wars... :-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH

[Tim]
[M.-A. Lemburg]
How do you use buffers? Do you stick to their C API? Do you use the Python-level buffer() function? If the latter, what do you do in Python code with a buffer object after you get one? The only use I've seen made of a buffer object in Python code is as a way to trick the interpreter into crashing (via recycling the memory the buffer object points to). And from where do you get a buffer? There are darned few types in Python that buffer() accepts as an argument. Do your extension types implement tp_as_buffer? I'm blindly casting for a reason why your appreciation of the buffer object seems unique.
Overall, this reinforces the repeated observation that we don't know why the buffer object exists -- it doesn't appear to do what you really want, but you've found some way to get it to do part of what you want, up until the point you actually use it <0.7 wink>.

Here's the numarray perspective on things. Tim Peters wrote:
We use buffers in numarray to store our array data. We use readinto to load array buffers efficiently from a file. We operate on the buffer data in-place. Since numarrays are Python class instances, buffers provide a place for the data to live.
Do you stick to their C API?
We use the C-API, and currently use the buffer object too. Using the buffer object has always seemed like a necessary evil, but having reviewed numarray usage of buffer(), ditching it sounds good to me.
Do you use the Python-level buffer() function?
Yes. We go one step further, and expose writeable buffers using our own extension function. I had a feeling I was on thin ice when I did this.
I'm getting the following things by using the buffer object: 1. Knowledge that the C-type the buffer refers to meets the buffer C-API. 2. Mutable string behavior for any object which meets the buffer C-API. 3. Storage. At least we used to get storage until we found out that there's no guarantee on double alignment. I plan to work around each of these uses as follows: 1. Create an extension function which determines whether an object meets the buffer C-API. 2. Create an extension function which copies from one buffer region to another buffer region. 3. We already have our own memory object which is now typically referenced by a buffer object. With the above extensions, I don't need a buffer "wrapper" object around it anymore.
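The first two workarounds can be sketched at the Python level as follows. This assumes memoryview from current Python 3, which did not exist when this was written; the function names supports_buffer and copy_region are hypothetical and are not numarray's actual extension functions.

```python
def supports_buffer(obj):
    """Return True if obj exposes its memory through the buffer protocol."""
    try:
        memoryview(obj).release()
    except TypeError:
        return False
    return True

def copy_region(src, src_start, dst, dst_start, count):
    """Copy count bytes from src into dst without an intermediate string."""
    source = memoryview(src).cast("B")   # view both objects as raw bytes
    target = memoryview(dst).cast("B")
    # Slices of a memoryview are views, so this is a direct memory copy.
    # A read-only dst raises TypeError; mismatched lengths raise ValueError.
    target[dst_start:dst_start + count] = source[src_start:src_start + count]
```

Offsets and counts are in bytes, whatever element type the underlying objects use.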
And from where do you get a buffer? There are darned few types in Python
We get ours from mmap and our own homegrown memory object.
Numarray uses buffer() too, but dumping it sounds OK. Todd -- Todd Miller Space Telescope Science Institute

How do you use buffers?
AFAIK the buffer() function can only create read-only buffers. How do you create your buffers? If you're just using the C buffer API, that's not going away.
Good.
And from where do you get a buffer? There are darned few types in Python
We get ours from mmap and our own homegrown memory object.
Maybe instead of the buffer() function/type, there should be a way to allocate raw memory? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
We have a very small extension function which creates writeable buffer objects using the buffer type C-API. We also wrap suitable type instances with a "buffer object wrapper". I'm slowly gathering that this is unsafe. :-(
Yes. It would also be nice to be able to: 1. Know (at the python level) that a type supports the buffer C-API. 2. Copy bytes from one buffer to another (writeable buffer).
--Guido van Rossum (home page: http://www.python.org/~guido/)
Todd -- Todd Miller Space Telescope Science Institute

We have a very small extension function which creates writeable buffer objects using the buffer type C-API.
That's how the buffer API was supposed to be used.
We also wrap suitable type instances with a "buffer object wrapper". I'm slowly gathering that this is unsafe. :-(
I don't understand what you say, but I believe you.
Maybe instead of the buffer() function/type, there should be a way to allocate raw memory?
Yes. It would also be nice to be able to:
1. Know (at the python level) that a type supports the buffer C-API.
Good idea. (I guess right now you can see if calling buffer() with an instance as argument works. :-)
2. Copy bytes from one buffer to another (writeable buffer).
Maybe you would like to work on a requirements gathering for a memory object? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I meant we call PyBuffer_FromReadWriteObject and the resulting buffer lives longer than the extension function call that created it. I have heard that it is possible for the original object to "move" leaving the buffer object pointer to it dangling.
Sure. I'd be willing to poll comp.lang.python (python-list?) and collate the results of any discussion that ensues. Is that what you had in mind?
Todd

Yes, that can happen (depending on what kind of object it is).
Yes, but beware that you will have to decide which requirements make sense and which ones don't -- the community is so large these days that you can't get agreement any more. :-) Feel free to come back with results to python-dev any time. --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Todd Miller <jmiller@stsci.edu> wrote:
Yes. The PyBufferObject grabs the pointer from the PyBufferProcs supporting object when the PyBufferObject is created. If the PyBufferProcs supporting object reallocates the memory (possibly from a resize) the PyBufferObject can be left with a bad pointer. This is easily possible if you try to use the array module arrays as a buffer. I've submitted a patch to fix this particular problem (among others), but there are still enough things that the buffer object can't do that something new is needed.
And the copy operations shouldn't create any large temporaries:

    buf1 = memory(50000)
    buf2 = memory(50000)
    # no 10K temporary should be created in the next line
    buf1[10000:20000] = buf2[30000:40000]

The current buffer object could be used like this, but it would create a temporary string. So getting an efficient copy operation seems to require that slices just create new "views" to the same memory.
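For comparison, here is roughly how the same copy looks with slices that are views. This is a sketch using bytearray and memoryview from current Python 3 as stand-ins for the hypothetical memory() type above; it is not the object proposed in the PEP.

```python
buf1 = bytearray(50000)                   # zero-filled destination
buf2 = bytearray(b"\x01") * 50000         # source filled with a marker byte

view1 = memoryview(buf1)
view2 = memoryview(buf2)

# Both slices are views onto the original memory, so the assignment copies
# 10000 bytes directly from buf2 into buf1 with no temporary string.
view1[10000:20000] = view2[30000:40000]

# A slice still refers to the object it was taken from.
slice_owner = view2[30000:40000].obj
```

Only the assigned range of buf1 changes, and slice_owner is buf2 itself rather than a copy.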
In the PEP that I'm drafting, I've been calling the new object "bytes" (since it is just a simple array of bytes). Now that you guys are referring to it as the "memory object", should I change the name? Doesn't really matter, but it might avoid confusion to know we're all talking about the same thing.

Can you remind me of the patch#? (I'm curious how you plan to fix this...)
I like bytes just fine. PS, Todd, if you can, please don't send HTML-only mail to python-dev... --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Guido van Rossum <guido@python.org> wrote:
Patch number 552438. Instead of caching the pointer, it grabs it from the other object every time it is needed. Might be a little slower, but I think it's correct.
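The idea of fetching the pointer on every use can be sketched in pure Python. This is only an analogy under stated assumptions: the class name LateBoundView is hypothetical, it relies on memoryview from current Python 3, and it copies data out for simplicity. It is not the C code of the patch.

```python
class LateBoundView:
    """Hypothetical view that never caches the owner's data pointer."""

    def __init__(self, owner, offset=0):
        self._owner = owner      # a reference keeps the owner alive
        self._offset = offset

    def __len__(self):
        with memoryview(self._owner) as current:
            return max(current.nbytes - self._offset, 0)

    def __getitem__(self, index):
        # Re-acquire the owner's buffer on every access and release it
        # before returning, so the owner stays free to resize in between.
        with memoryview(self._owner) as current:
            return current.tobytes()[self._offset:][index]
```

Because nothing is held between accesses, the owner can be resized (and its memory moved) without leaving the view with a stale pointer, at the cost of a lookup per access.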
<chuckle> I'm bad at patience, but I'm not terribly naive. I fully expect everyone and their dog will find something to dislike before it gets approved/rejected. Cheers, -Scott

--- Guido van Rossum <guido@python.org> wrote:
Maybe instead of the buffer() function/type, there should be a way to allocate raw memory?
This is a part of my soon to be issued PEP. I've looked at their memory object, and Numarray is one of the use cases that I'm catering to.

This is a part of my soon to be issued PEP. I've looked at their memory object, and Numarray is one of the use cases that I'm catering to.
OK, then I guess Todd doesn't have to go to c.l.py for requirements. --Guido van Rossum (home page: http://www.python.org/~guido/)

--- Guido van Rossum <guido@python.org> wrote:
More information couldn't hurt too much, and since Todd Miller volunteered* to herd the information, I'll be interested to see if any new perspectives come out. * - Actually it looked like you volunteered him, but he seemed willing enough. :-)

There may be some breakage in the Win32 overlapped IO world. A common pattern is:

    buf = allocate_buffer_somehow(size)
    Perform_Async_Read(size)
    # wait for notification of read completing
    nbytes = Wait_For_Read_Notification()
    data = buf[:nbytes]

Currently, "data" is a string. Changing this to a buffer object will presumably break this code once "data" is passed to some other function that truly requires a string.
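A sketch of how this pattern behaves when slices are views, and of the explicit copy that callers needing a real string would then have to make. It assumes bytearray and memoryview from current Python 3; the synchronous readinto() call is only a stand-in for the overlapped read and its completion notification, and read_chunk is a hypothetical name.

```python
import io

def read_chunk(stream, size):
    buf = bytearray(size)              # pre-allocated byte buffer
    nbytes = stream.readinto(buf)      # stand-in for the async read and wait
    view = memoryview(buf)[:nbytes]    # the slice is a view, nothing is copied
    data = bytes(view)                 # explicit copy for code that needs a string
    return view, data
```

With view semantics the conversion in the last step has to be written out, which is exactly the compatibility cost described above for existing code that relies on the slice already being a string.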
Maybe the name 'buffer' suggests false expectations? It's not a buffer, it's an alias for a memory area.
This distinction is a little gray. In my example, it is truly a buffer - but also an alias for a memory area. In my example though, it is *not* conceptually an alias for memory owned by another object.
Maybe we should do something stronger, and deprecate the buffer type altogether.
Maybe. However, as you have seen over the years, *something* from all this mess is a real requirement. This example of asynch IO is the only example I have ever used, but IMO, it is a real and reasonable requirement. My example *could* have been done with array() (assuming the array module had a C API exposed which it doesn't/didn't) but that too looks like a square peg in a round hole - my requirements call for a pre-allocated byte buffer, not an array. All that said: if the worst came to the worst, I could ensure that the Win32 extensions are left compatible with the way they are. All such buffers are allocated using a function inside one of my modules. Currently this just returns a buffer() object, but could be changed to a private object with the same semantics as the existing buffer() object. So consider this more a data point than an attempted veto. Mark.
participants (9): Aahz, Guido van Rossum, M.-A. Lemburg, Mark Hammond, Raymond Hettinger, Scott Gilbert, Thomas Heller, Tim Peters, Todd Miller