Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82

On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
Something that I ran into the other day...
The point is that strop uses the t# to get a ptr/len pair to do its work. Thus, it can work on many things that export the buffer interface. Dropping strop means we no longer have many of those functions. Instead, the functionality must be copied to *every* object that implements the buffer interface. We can say ob.find() now, but we can't say find(ob) any longer. And saying that all objects (which implement the buffer API) must now implement a bunch of "standard" methods is awfully burdensome. In my particular case, I was trying to do a find on a BufferObject referring to a subset of another object. Blam. No good. Thankfully, when I did a find() on a mmap object, it worked simply because mmaps happen to define a .find method. [ of course, the find method on an mmap was totally broken, but I checked in a fix for that (last week or so) ] So... my question is: is there any way that we can retain a generic find() (and similar functions from the string/strop module) that operates on any type that implements the buffer API? Maybe there is some way we can do a mixin for Python types? e.g. "this mixin implements some standard methods for 8-bit character data (using the buffer API), which can be mixed into new Python types" That would reduce the burden for new types. Thoughts? Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
I suppose that in 2.2 we'll be able to build a class/type hierarchy which then provides these possibilities. I haven't followed Guido's latest checkins closely though -- could be that types don't support multiple inheritance. BTW, wouldn't it suffice to add these methods to buffer objects? Then you could write: buffer(ob).find('.'). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
No idea either... that's why I asked.
BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.').
You're totally missing the point with that suggestion. It does *not* suffice to add them to buffer objects. What about array objects? mmap objects? Random Joe Object who implements the buffer interface? All of those are out of luck. With strop, I can pass any of those objects to strop.find(). That function has a polymorphic argument. In the current arrangement, every object must implement their own .find and .upper and .whatever. Cheers, -g -- Greg Stein, http://www.lyra.org/
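To make the find(ob) versus ob.find() distinction concrete, here is a minimal sketch in present-day Python 3. It assumes memoryview as a stand-in for the "t#" ptr/len lookup that strop relied on; memoryview postdates this thread, and the name generic_find is hypothetical. Unlike strop, this version copies the data before searching.

```python
from array import array


def generic_find(ob, sub, start=0):
    """Return the lowest byte offset of `sub` in `ob`, or -1 if absent.

    `ob` may be any object that exports the buffer protocol; it does not
    need a .find method of its own.
    """
    # memoryview() plays the role of the "t#" ptr/len lookup (assumption).
    with memoryview(ob) as view:
        # tobytes() copies the memory; strop searched it in place.
        return view.tobytes().find(bytes(sub), start)


print(generic_find(b"spam.eggs", b"."))
print(generic_find(bytearray(b"spam.eggs"), b"."))
print(generic_find(array("B", b"spam.eggs"), b"eggs"))
```

The single function accepts bytes, bytearray and array objects alike, which is the polymorphic-argument property being defended here.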

Greg Stein wrote:
That's the point: you can wrap all those into a buffer object and then use the buffer object methods to manipulate them. In that sense, buffer objects provide an adaptor to the underlying object which implements the needed methods.
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote:
That would certainly be a valid solution. And at the C level, we could share functions between PyBufferObject and PyStringObject. Cheers, -g -- Greg Stein, http://www.lyra.org/

"M.-A. Lemburg" wrote:
Sounds like you are trying to make the buffer object into something it is not. Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken. I like your idea of sharing functions, I just don't think the buffer object is the proper means. I think the buffer object should be removed from Python and something better put in its place. (I'm not talking about the buffer C/API, though this could also use an overhaul, since it doesn't provide enough information to the receiving method.) What I think we need is:
1) a malloc object which has a similar interface to the mmap object with access protection, etc. This object would be the fundamental way of getting memory. The string object would use it to allocate a chunk of 'read-only' memory. Other objects would then know not to modify the contents of the memory. If you wanted a reference or view of the memory/buffer, you would get a reference to this object.
2) objects supporting the buffer object should provide a view method which returns a copy of themselves (and hence all their methods) and can be used to get a pointer to a subset of its memory. In this way the type of memory/buffer being accessed is known compared to the current buffer object which only indicates the buffer is binary or char data. In essence information about how the buffer should be used is lost in the current buffer C/API.
-- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218

On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote:
The buffer object is intended to provide a Python-level object (with methods and behavior) for any other object which exports the buffer API (but not those particular methods/behavior). It was added for Python 1.5.2, but did not keep up with the methods added to the string object. Arguably, it is out of date rather than "[turning it into] something it is not."
Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken.
"badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well when using it with array objects or PIL's image objects. For most objects, it is fine. The buffer object is also very good for C/Python extensions and embedding code. It provides a Python-level view on a block of memory. Using a string object implies making a copy, and it removes the possibility for read/write access to that memory. And you state: "Not that I have the foggiest idea what it is now". If so, then wtf are you making statements about the buffer object's behavior?
You're talking about the buffer object that we have *today*. It can refer to another object (i.e. the memory exposed via the other object's buffer API), refer to memory, or it can allocate its own memory. The buffer object can be marked read-only, or read-write.
I'm not sure that I understand this paragraph. No... what needs to happen is to have the bug in PyBufferObject fixed. Then to refactor stringobject.c and stropmodule.c to move all of those byte-oriented processing functions into a new file such as Python/byteops.c (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c would be simple covers over the same functions. Those functions can then be used by PyBufferObject to implement the rest of the string methods on itself. This would leave us at MAL's suggested point: via the buffer object, we can perform all of the standard string methods/ops on any object that implements the buffer API. Cheers, -g -- Greg Stein, http://www.lyra.org/
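The "view on a block of memory" behaviour described above (no copy, optional read/write access, a subset of another object) can be sketched with Python 3's memoryview. This is an assumption made for illustration: memoryview is a later replacement, not the buffer object under discussion, and its API differs.

```python
data = bytearray(b"hello world")

# A view onto a subset of `data`; no bytes are copied.
view = memoryview(data)[6:]
view[0] = ord("W")            # writes go through to the underlying object
print(data)

# A view onto an immutable object is marked read-only.
frozen = memoryview(bytes(data))
print(frozen.readonly)
try:
    frozen[0] = 0
except TypeError as exc:
    print("write refused:", exc)
```

A string copy of the same bytes would not reflect later changes to data and could not be written through, which is the contrast drawn in the message.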

Greg Stein wrote:
I wonder how we could achieve this without copy&pasting all the needed methods from stringobject.c to bufferobject.c.... all the string methods use the string object layout directly rather than just dealing with a pointer and a length. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it? "The bug" has been known for years without any action taken to address it; the docs give up in spots and nobody addresses that either (like "The current policy seems to state that these characters may be multi-byte characters" -- well, yes or no?); the builtin buffer() function isn't called anywhere in the std test suite; the file object still has an undocumented readinto() method that just confuses people who bump into it; and it's so obscure in daily life that it appears Guido didn't even think of it when adding iterators for the other sequence types. I expect that answers my question <wink>. Is someone (Greg? MAL?) going to champion it now? That would be cool. About combining strop and buffers and strings, don't forget unicodeobject.c: that's got oodles of basically duplicate code too. /F suggested dealing with the minor differences via maintaining one code file that gets compiled multiple times w/ appropriate #defines.

Tim Peters wrote:
I believe that nobody really likes the buffer interface enough to let the world know about it, except maybe Greg ;-) Even the idea of replacing the usage of strings as data buffers with buffer objects didn't get very far; common habits are simply hard to break.
Hmm, that only saves us a few kB in source, but certainly not in the object files. The better idea would be making the types subclass from a generic abstract string object -- I just don't know how this will be possible with Guido's type patches. We'll just have to wait, I guess. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

[Tim]
[MAL]
Hmm, that only saves us a few kB in source, but certainly not in the object files.
That's not the point. Manually duplicated code blocks always get out of synch, as people fix bugs in, or enhance, one of them but don't even know about the others. /F brought this up after I pissed away a few hours trying to repair one of these in all places, and he noted that strop.replace() and string.replace() are woefully inefficient anyway.
Wait for what? If it were possible, is the chance that you'd take time to rework unicodeobject.c to "subclass from a generic abstract string object" greater than 0? The chance that I would is exactly 0.

Tim Peters wrote:
Ok, so what we'd need is a bunch of generic low-level string operations: one set for 8-bit and one for 16-bit code. Looking at unicodeobject.c it seems that the section "Helpers" would be a good start, plus perhaps a few bits from the method implementations refactored to form a low-level string template library. Perhaps we should move this code into a file stringhelpers.h which then gets included by stringobject.c and unicodeobject.c with appropriate #defines set up for 8-bit strings and for Unicode.
Well, that's hard to say. It would certainly be low-priority; same for the above refactoring. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

[Tim]
[MAL]
Well, that's hard to say. It would certainly be low-priority; same for the above refactoring.
I think you must have missed this when it first came up here: /F suggested that *he* had a non-zero chance of implementing his suggestion. That makes it far closer to reality than anything that's been suggested since <wink>.

Paul Barrett writes:
In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again. I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure.
I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection.
I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors).
PEPs are good; I'll look forward to seeing it! -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations

[Paul Barrett]
From the discussion so far, it appears that the buffer object is intended solely to support string-like objects.
Unsure where that impression came from. Since buffers wrap a slice "of memory", they don't make much sense except where raw memory makes sense. That includes the guts of strings, but also (in the core distribution) memory-mapped files (the mmap module) and arrays (the array module), which also support the buffer interface.
I've seen no mention of their use for binary data objects,
I mentioned two above. The use of buffers with mutable objects is dangerous, though, because of the dangling-pointer problem, and Python itself never uses buffers except for strings. Even arrays are stretching it; e.g.,
While of *some* conceivable use, that's not exactly destined to become wildly popular <wink>.
such as multidimensional arrays and matrices.
Since core Python has no such things, of course it doesn't use buffers for those either.
Will the buffer object also support these objects?
In what sense? If you have an implementation of such things, and believe that getting at raw memory slices is useful, sure -- fill in its tp_as_buffer slot.
Or do you mean redesigned?
AFAICT it's entirely unused; everything in the core that supports the buffer interface returns a segment count of 1, and the buffer object itself appears to raise exceptions whenever it sees a reference to a segment other than "the first". I don't know why it's there.
Second, the dangling pointer issue has not been resolved.
I expect Greg will fix that now.
To sell that (but please save it for the PEP <wink>) I expect you have to provide some compelling uses for it. The current uses have no need of it. In the absence of specific good uses, I'm afraid it just sounds like another variant of "I can't prove segments *won't* be useful, so let's toss them in too!".
Depends on what you want to do. You've only mentioned multidimensional arrays, and the need for umpteen flavors of access control there, beyond the current object's b_readonly flag, is simply unclear. Also unclear why you've dropped the current object's b_base pointer: without it, the buffer has no way to get back to the object from which the memory is borrowed, nor even a guarantee that the object won't die while the buffer is still active. If you do pursue this, please please please boost the rf_length field! An int is too small to hold real-life sizes anymore, and "large files" are becoming common even on 32-bit boxes. Python needs to grow a wholly supported way to pass 8-byte ints around (and it looks like I'll be adding that to the struct module, possibly to the array module and marshal too).
A PEP is always a good idea.
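The size concern can be illustrated with the struct module as it exists in later Python releases, where the "q" format code packs an 8-byte signed integer. This is a small sketch under that assumption; the variable names are illustrative only.

```python
import struct

size = 2**31  # one more than a signed 4-byte int can represent

# A 4-byte length field cannot hold this size.
try:
    struct.pack("<i", size)
except struct.error as exc:
    print("4-byte int:", exc)

# An 8-byte field holds it without trouble.
packed = struct.pack("<q", size)
print(len(packed), struct.unpack("<q", packed)[0])
```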

Tim Peters wrote:
Hey! Are you discriminating against 128-bit ints? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

[Aahz]
Hey! Are you discriminating against 128-bit ints?
Nope! I'm Guido's marketing guy: 128-bit ints will be the killer reason you need to upgrade to Python 3000, when the time comes. Python didn't get to where it is by giving away all the good stuff early <wink>.

complex: the support for multiple buffers does not appear necessary.
I seem to recall Guido telling me once that this was implemented for NumPy, specifically for some of their matrices. Not being a user of that package means that unfortunately I can not be any more specific... I am confident Guido will recall the specific details... Mark.

On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote:
That idea was shot down when Guido said that 'c' arrays should be the "official form of a data buffer." Cheers, -g -- Greg Stein, http://www.lyra.org/

[Tim]
The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it?
My take is a little different. I think people could be convinced to care about it, and indeed I do. However, it has one fatal flaw, and no one seems to know what to do about it. The problem is the one best demonstrated with the array module - if you get a pointer to the buffer interface for an array object, but the array then resizes itself, the buffer pointer dangles. There have been a few attempts over time to raise the buffer profile, but this design flaw leaves people scratching their heads - it is hard to press for adoption of a feature that has a known crash hiding away. However, addressing this problem is difficult. Guido appears unconvinced that buffer objects and interfaces are that worthwhile. It appears no one else knows how to proceed in the face of this ambivalence - that describes my take even if no one else's. The-buffer-is-dead,-long-live-the-buffer ly, Mark.

Mark Hammond wrote:
I guess there are three ways to "solve" this:
a) mutable types don't implement the getreadbuf interface
b) the getreadbuf interface is complemented with a callback interface, so that the buffer object can be notified of the change
c) calling getreadbuf on a mutable object causes this object to become immutable
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
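For comparison, present-day CPython behaves in a way that resembles a weaker form of option (c): while a memoryview of a bytearray is outstanding, the bytearray refuses to be resized, though its contents stay writable. The sketch below assumes Python 3, which postdates this thread, and is not the mechanism proposed here.

```python
data = bytearray(b"abc")
view = memoryview(data)       # an outstanding export pins the storage

try:
    data.extend(b"def")       # would have to reallocate the memory
except BufferError as exc:
    print("resize refused:", exc)

view.release()                # give the borrowed memory back
data.extend(b"def")           # now the resize is allowed
print(data)
```

Because the exporter cannot move its storage while the view exists, the pointer held by the view cannot dangle.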

[MAL]
I guess there are three ways to "solve" this:
a) mutable types don't implement the getreadbuf interface
Of the few types that implement it today, that would leave only strings (8-bit and Unicode). Too much machinery just for that. Besides, I once posted an example to c.l.py showing how to use regexps to search mmap'ed files, so *that* must continue to work forever <wink>.
I like this best, although there's no bound on the number of buffers that may need to be notified in case of change (i.e., the object would need to maintain a list of buffers to be notified).
c) calling getreadbuf on a mutable object causes this object to become immutable
Even easier, core dump as soon as getreadbuf is called <wink>.
[Greg Ewing]
I think it would be safe if:
1) it kept a reference to the underlying object, and
That much it already does.
2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface.
If after
    b = buffer(some_object)
b.__getitem__ needed to refetch the info between b[i] and b[i+1] I expect it would be so slow even Greg wouldn't want it anymore.
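The two conditions quoted above (keep a reference to the underlying object, and re-fetch pointer and length on every access) can be sketched at the Python level. This is a hypothetical wrapper written against Python 3's memoryview, not the PyBufferObject fix itself; the class name RefetchingView is invented for illustration, and only integer indices are handled.

```python
class RefetchingView:
    """Hypothetical wrapper that never caches a pointer or a length."""

    def __init__(self, base):
        # Condition 1: keep the underlying object alive.
        self._base = base

    def __len__(self):
        # Condition 2: ask the underlying object again on every call.
        with memoryview(self._base) as view:
            return len(view)

    def __getitem__(self, index):
        # The export is held only for the duration of one lookup.
        with memoryview(self._base) as view:
            return view[index]


data = bytearray(b"abc")
view = RefetchingView(data)
first = view[0]
data.extend(b"x" * 100000)    # growth like this may move the storage
print(first, len(view), view[len(view) - 1])
```

Each access pays for one extra lookup, which is the cost being debated in this exchange; in return, a resize between two accesses cannot leave a stale pointer behind.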

On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote:
Huh? I don't think it would be all that slow. It is just a function call. And I don't think that the getitem slot is really used all that frequently (in a loop) for buffer type objects. I've been thinking that refetching the ptr/len is the right fix. Cheers, -g -- Greg Stein, http://www.lyra.org/

[Tim]
[Greg]
I expect they index into the buffer memory directly then, right? Then for buffers obtained from mutable objects, any such loop is unsafe in the absence of the GIL, or even in its presence if the loop contains code that may call back into Python.
I've been thinking that refetching the ptr/len is the right fix.
So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle.
I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising <wink>.

On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote:
Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common.
Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake.
Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/

[Greg Stein]
Well, at the Python level buffer objects seem never to be used, probably because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now. I don't have any real objection to any way anyone wants to fix that, just so long as it gets fixed.
Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question.
I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe, the docs remain incomplete, there's random stuff like file.readinto() that's not documented at all (could be that's the only one -- it's certainly "discovered" on c.l.py often enough, though), and there are no buffer tests in the std test suite. The work to introduce the type wasn't completed, nobody works on it, and finishing work 3 years late doesn't count as "new feature" in my book <wink>.

On Sun, Jun 03, 2001 at 03:55:43AM -0400, Tim Peters wrote:
I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often.
because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now.
Easy? Depends on what you use them with.
I'll fix the code.
Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc.
Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/

[Tim]
because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now.
[Greg Stein]
Easy? Depends on what you use them with.
"Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the user's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way to fix it -- but it's still Python's fault" (for example, see the dict thread this weekend).
I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe
I'll fix the code.
Thank you!
I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess <wink>.
Now you're just being bothersome.
You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them.
You want all that stuff, then feel free.
"All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards.
I'll volunteer to do the code. You can go beat some heads, or find other volunteers.
Anyone else care to chip in?
OK, I feel guilty -- but if that's enough to make you feel joyful again, the psychology here is just sick <wink>.

However, it has one fatal flaw, and no one seems to know what to do about it.
I think it would be safe if:
1) it kept a reference to the underlying object, and
2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface.
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote:
The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it?
"Works for me" :-) Part of the neglect is also based on Guido's ambivalence. Part is that I haven't needed more from it. The day that I do, then I'll code it up :-) But that doesn't help the "generic" case, unfortunately. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein <gstein@lyra.org>
"badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well
But "doesn't work well" means "can crash the interpreter". I don't think "badly" is an overstatement here...
Greg Ewing

Greg> With strop, I can pass any of those objects to strop.find(). That
Greg> function has a polymorphic argument.
Where doesn't strop compile/run? If it works everywhere, either just rename it to be the string module (copying any bits from the existing string module that it doesn't yet have) or rename it something like buffer_funcs.
Skip

"M.-A. Lemburg" <mal@lemburg.com>:
BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.').
Aren't buffer objects as they're currently implemented inherently dangerous?
Greg Ewing

Greg Ewing wrote:
Why should they be ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote:
The buffer object caches the pointer from getreadbuffer and friends. If the target object changes that pointer (internally), then the buffer object's value is stale. But that is a bug fix; it is independent of the discussion at hand. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
I suppose that in 2.2 we'll be able to build a class/type hierarchy which then provides these possibilities. I haven't followed Guido's latest checkins closely though -- could be that types don't support multiple inheritence. BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.'). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
No idea either... that's why I asked.
BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.').
You're totally missing the point with that suggestion. It does *not* suffice to add them to buffer objects. What about array objects? mmap objects? Random Joe Object who implements the buffer interface? All of those are out of luck. With strop, I can pass any of those objects to strop.find(). That function has a polymorphic argument. In the current arrangement, every object must implement their own .find and .upper and .whatever. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
That's the point: you can wrap all those into a buffer object and then use the buffer object methods to manipulate them. In that sense, buffer objects provide an adaptor to the underlying object which implements the needed methods.
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote:
That would certainly be a valid solution. And at the C level, we could share functions between PyBufferObject and PyStringObject. Cheers, -g -- Greg Stein, http://www.lyra.org/

"M.-A. Lemburg" wrote:
Sounds like you are trying to make the buffer object into something it is not. Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken. I like your idea of sharing functions, I just don't think the buffer object is the proper means. I think the buffer object should be removed from Python and something better put in its place. (I'm not talking about the buffer C/API, though this could also use an overhaul, since it doesn't provide enough information to the receiving method.) What I think we need is: 1) a malloc object which has a similar interface to the mmap object with access protection, etc. This object would be the fundamental way of getting memory. The string object would use it to allocate a chunk of 'read-only' memory. Other objects would then know not to modify the contents of the memory. If you wanted a reference or view of the memory/buffer, you would get a reference to this object. 2) objects supporting the buffer object should provide a view method which returns a copy of themselves (and hence all their methods) and can be used to get a pointer to a subset of its memory. In this way the type of memory/buffer being accessed is known compared to the current buffer object which only indicates the buffer is binary or char data. In essence information about how the buffer should be used is lost in the current buffer C/API. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218

On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote:
The buffer object is intended to provide a Python-level object (with methods and behavior) for any other object which exports the buffer API (but not those particular methods/behavior). It was added for Python 1.5.2, but did not keep up with the methods added to the string object. Arguably, it is out of date rather than "[turning it into] something it is not."
Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken.
"badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well when using it with array objects or PIL's image objects. Most objects, it is fine. The buffer object is also very good for C/Python extensions and embedding code. It provides a Python-level view on a block of memory. Using a string object implies making a copy, and it removes the possibility for read/write access to that memory. And you state: "Not that I have the foggiest idea what it is now". If so, then wtf are you making statements about the buffer object's behavior?
You're talking about the buffer object that we have *today*. It can refer to another object (i.e. the memory exposed via the other object's buffer API), refer to memory, or it can allocate its own memory. The buffer object can be marked read-only, or read-write.
I'm not sure that I understand this paragraph. No... what needs to happen is to have the bug in PyBufferObject fixed. Then to refactor stringobject.c and stropmodule.c to move all of those byte-oriented processing functions into a new file such as Python/byteops.c (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c would be simple covers over the same functions. Those functions can then be used by PyBufferObject to implement the rest of the string methods on itself. This would leave us at MAL's suggested point: via the buffer object, we can perform all of the standard string methods/ops on any object that implements the buffer API. Cheers, -g -- Greg Stein, http://www.lyra.org/
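
[Editorial sketch] The goal described above, one set of byte-oriented functions usable on anything that exports the buffer API, can be illustrated roughly as follows. This is only a sketch under stated assumptions: it uses the memoryview type of current Python as a stand-in for the 2001 buffer API, and the name buffer_find is hypothetical, not a function from strop or from any proposed byteops.c.

```python
import array
import mmap


def buffer_find(ob, sub, start=0):
    """Return the lowest byte offset of `sub` in `ob` at or after `start`, else -1.

    Hypothetical helper: `ob` may be any object exporting the buffer
    protocol. Its contents are searched in place, without copying them.
    """
    with memoryview(ob) as raw, raw.cast("B") as flat:
        n = len(sub)
        for i in range(start, len(flat) - n + 1):
            # Slicing a memoryview yields another view, not a copy.
            if flat[i:i + n] == sub:
                return i
    return -1


# The same function serves unrelated exporters.
print(buffer_find(b"www.python.org", b"."))
print(buffer_find(bytearray(b"a.b"), b"."))
print(buffer_find(array.array("B", b"xx.y"), b"."))

mm = mmap.mmap(-1, 16)      # anonymous mapping, zero-filled
mm.write(b"hello.world")
print(buffer_find(mm, b"."))
mm.close()
```

The search is written once against a pointer-and-length style view, so a new type only has to export its memory to get find(ob); it does not have to grow a .find method of its own.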

Greg Stein wrote:
I wonder how we could achieve this without copy&pasting all the needed methods from stringobject.c to bufferobject.c.... all the string methods use the string object layout directly rather than just dealing with a pointer and a length. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it? "The bug" has been known for years without any action taken to address it; the docs give up in spots and nobody addresses that either (like "The current policy seems to state that these characters may be multi-byte characters" -- well, yes or no?); the builtin buffer() function isn't called anywhere in the std test suite; the file object still has an undocumented readinto() method that just confuses people who bump into it; and it's so obscure in daily life that it appears Guido didn't even think of it when adding iterators for the other sequence types. I expect that answers my question <wink>. Is someone (Greg? MAL?) going to champion it now? That would be cool. About combining strop and buffers and strings, don't forget unicodeobject.c: that's got oodles of basically duplicate code too. /F suggested dealing with the minor differences via maintaining one code file that gets compiled multiple times w/ appropriate #defines.

Tim Peters wrote:
I believe that nobody really likes the buffer interface enough to let the world know about it, except maybe Greg ;-) Even the idea of replacing the usage of strings as data buffers with buffer objects didn't get very far; common habits are simply hard to break.
Hmm, that only saves us a few kB in source, but certainly not in the object files. The better idea would be making the types subclass from a generic abstract string object -- I just don't know how this will be possible with Guido's type patches. We'll just have to wait, I guess. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

[Tim]
[MAL]
Hmm, that only saves us a few kB in source, but certainly not in the object files.
That's not the point. Manually duplicated code blocks always get out of synch, as people fix bugs in, or enhance, one of them but don't even know about the others. /F brought this up after I pissed away a few hours trying to repair one of these in all places, and he noted that strop.replace() and string.replace() are woefully inefficient anyway.
Wait for what? If it were possible, is the chance that you'd take time to rework unicodeobject.c to "subclass from a generic abstract string object" greater than 0? The chance that I would is exactly 0.

Tim Peters wrote:
Ok, so what we'd need is a bunch of generic low-level string operations: one set for 8-bit and one for 16-bit code. Looking at unicodeobject.c it seems that the section "Helpers" would be a good start, plus perhaps a few bits from the method implementations refactored to form a low-level string template library. Perhaps we should move this code into a file stringhelpers.h which then gets included by stringobject.c and unicodeobject.c with appropriate #defines set up for 8-bit strings and for Unicode.
Well, that's hard to say. It would certainly be low-priority; same for the above refactoring. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

[Tim]
[MAL]
Well, that's hard to say. It would certainly be low-priority; same for the above refactoring.
I think you must have missed this when it first came up here: /F suggested that *he* had a non-zero chance of implementing his suggestion. That makes it far closer to reality than anything that's been suggested since <wink>.

Paul Barrett writes:
In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again. I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure.
I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection.
I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors).
PEPs are good; I'll look forward to seeing it! -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations

[Paul Barrett]
From the discussion so far, it appears that the buffer object is intended solely to support string-like objects.
Unsure where that impression came from. Since buffers wrap a slice "of memory", they don't make much sense except where raw memory makes sense. That includes the guts of strings, but also (in the core distribution) memory-mapped files (the mmap module) and arrays (the array module), which also support the buffer interface.
I've seen no mention of their use for binary data objects,
I mentioned two above. The use of buffers with mutable objects is dangerous, though, because of the dangling-pointer problem, and Python itself never uses buffers except for strings. Even arrays are stretching it; e.g.,
While of *some* conceivable use, that's not exactly destined to become wildly popular <wink>.
such as multidimensional arrays and matrices.
Since core Python has no such things, of course it doesn't use buffers for those either.
Will the buffer object also support these objects?
In what sense? If you have an implementation of such things, and believe that getting at raw memory slices is useful, sure -- fill in its tp_as_buffer slot.
Or do you mean redesigned?
AFAICT it's entirely unused; everything in the core that supports the buffer interface returns a segment count of 1, and the buffer object itself appears to raise exceptions whenever it sees a reference to a segment other than "the first". I don't know why it's there.
Second, the dangling pointer issue has not been resolved.
I expect Greg will fix that now.
To sell that (but please save it for the PEP <wink>) I expect you have to provide some compelling uses for it. The current uses have no need of it. In the absence of specific good uses, I'm afraid it just sounds like another variant of "I can't prove segments *won't* be useful, so let's toss them in too!".
Depends on what you want to do. You've only mentioned multidimensional arrays, and the need for umpteen flavors of access control there, beyond the current object's b_readonly flag, is simply unclear. Also unclear why you've dropped the current object's b_base pointer: without it, the buffer has no way to get back to the object from which the memory is borrowed, nor even a guarantee that the object won't die while the buffer is still active. If you do pursue this, please please please boost the rf_length field! An int is too small to hold real-life sizes anymore, and "large files" are becoming common even on 32-bit boxes. Python needs to grow a wholly supported way to pass 8-byte ints around (and it looks like I'll be adding that to the struct module, possibly to the array module and marshal too).
A PEP is always a good idea.
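
[Editorial sketch] The remark above about a 4-byte length field being too small can be checked with the struct module. The "q" format code used here is the 8-byte signed integer code in current Python; that it corresponds to the addition Tim mentions is an assumption.

```python
import struct

size = 2**31    # one more than the largest signed 32-bit value

try:
    struct.pack("<i", size)         # 4-byte signed int
    fits_in_4_bytes = True
except struct.error:
    fits_in_4_bytes = False

packed = struct.pack("<q", size)    # 8-byte signed int
roundtrip = struct.unpack("<q", packed)[0]
print(fits_in_4_bytes, len(packed), roundtrip == size)
```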

Tim Peters wrote:
Hey! Are you discriminating against 128-bit ints? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

[Aahz]
Hey! Are you discriminating against 128-bit ints?
Nope! I'm Guido's marketing guy: 128-bit ints will be the killer reason you need to upgrade to Python 3000, when the time comes. Python didn't get to where it is by giving away all the good stuff early <wink>.

complex: the support for multiple buffers does not appear necessary.
I seem to recall Guido telling me once that this was implemented for NumPy, specifically for some of their matrices. Not being a user of that package means that unfortunately I can not be any more specific... I am confident Guido will recall the specific details... Mark.

On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote:
That idea was shot down when Guido said that 'c' arrays should be the "official form of a data buffer." Cheers, -g -- Greg Stein, http://www.lyra.org/

[Tim]
The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it?
My take is a little different. I think people could be convinced to care about it, and indeed I do. However, it has one fatal flaw, and no one seems to know what to do about it. The problem is the one best demonstrated with the array module - if you get a pointer to the buffer interface for an array object, but the array then resizes itself, the buffer pointer dangles. There have been a few attempts over time to raise the buffer profile, but this design flaw leaves people scratching their heads - it is hard to press for adoption of a feature that has a known crash hiding away. However, addressing this problem is difficult. Guido appears unconvinced that buffer objects and interfaces are that worthwhile. It appears no one else knows how to proceed in the face of this ambivalence - that describes my take even if no one else's. The-buffer-is-dead,-long-live-the-buffer ly, Mark.

Mark Hammond wrote:
I guess there are three ways to "solve" this:
a) mutable types don't implement the getreadbuf interface
b) the getreadbuf interface is complemented with a callback interface, so that the buffer object can be notified of the change
c) calling getreadbuf on a mutable object causes this object to become immutable
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
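
[Editorial sketch] Option (b), a callback interface through which the exporting object notifies its buffers of a change, might look roughly like this in pure Python. Every name here (NotifyingBytes, NotifiedView, on_change) is hypothetical, and the cached length merely stands in for the cached pointer/length pair of the C-level buffer object.

```python
import weakref


class NotifyingBytes:
    """Hypothetical mutable exporter that tells its views when it changes."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        # Weak references, so a discarded view does not stay registered.
        self._views = weakref.WeakSet()

    def view(self):
        v = NotifiedView(self)
        self._views.add(v)
        return v

    def extend(self, more):
        self._data.extend(more)     # may move the underlying storage
        for v in list(self._views):
            v.on_change()           # the "callback interface"


class NotifiedView:
    """Caches the exporter's length and refreshes it when notified."""

    def __init__(self, owner):
        self._owner = owner         # keeps the exporter alive
        self.notifications = 0
        self._refresh()

    def _refresh(self):
        self._length = len(self._owner._data)

    def on_change(self):
        self.notifications += 1
        self._refresh()

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not -self._length <= i < self._length:
            raise IndexError("view index out of range")
        return self._owner._data[i]


owner = NotifyingBytes(b"abc")
v = owner.view()
owner.extend(b"de")
print(len(v), v[4], v.notifications)
```

The exporter has to track an unbounded set of views, which is the bookkeeping cost of this approach; the weak set is one way to keep that list from holding views alive.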

[MAL]
I guess there are three ways to "solve" this:
a) mutable types don't implement the getreadbuf interface
Of the few types that implement it today, that would leave only strings (8-bit and Unicode). Too much machinery just for that. Besides, I once posted an example to c.l.py showing how to use regexps to search mmap'ed files, so *that* must continue to work forever <wink>.
I like this best, although there's no bound on the number of buffers that may need to be notified in case of change (i.e., the object would need to maintain a list of buffers to be notified).
c) calling getreadbuf on a mutable object causes this object to become immutable
Even easier, core dump as soon as getreadbuf is called <wink>.

[Greg Ewing]
I think it would be safe if:
1) it kept a reference to the underlying object, and
That much it already does.
2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface.
If after

    b = buffer(some_object)

b.__getitem__ needed to refetch the info between b[i] and b[i+1] I expect it would be so slow even Greg wouldn't want it anymore.

On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote:
Huh? I don't think it would be all that slow. It is just a function call. And I don't think that the getitem slot is really used all that frequently (in a loop) for buffer type objects. I've been thinking that refetching the ptr/len is the right fix. Cheers, -g -- Greg Stein, http://www.lyra.org/
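
[Editorial sketch] The "refetch the ptr/len each time" fix can be sketched at the Python level as below. It assumes the memoryview type of current Python in place of the 2001 buffer object, and RefetchingView is a hypothetical name; the real fix under discussion would live in C inside PyBufferObject.

```python
class RefetchingView:
    """Hypothetical view that never caches the underlying memory."""

    def __init__(self, base):
        # (1) keep a reference to the underlying object
        self._base = base

    def __len__(self):
        with memoryview(self._base) as raw:
            return raw.nbytes

    def __getitem__(self, i):
        # (2) re-fetch pointer and length on every access
        with memoryview(self._base) as raw, raw.cast("B") as flat:
            return flat[i]


data = bytearray(b"abc")
view = RefetchingView(data)
first = view[0]
data.extend(b"x" * 100000)      # the storage may be reallocated here
print(first, len(view), view[-1])
```

Each access pays for one extra acquisition of the buffer, which is the cost Tim and Greg disagree about; in exchange, no stale pointer survives a resize of the underlying object.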

[Tim]
[Greg]
I expect they index into the buffer memory directly then, right? Then for buffers obtained from mutable objects, any such loop is unsafe in the absence of the GIL, or even in its presence if the loop contains code that may call back into Python.
I've been thinking that refetching the ptr/len is the right fix.
So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle.
I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising <wink>.

On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote:
Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common.
Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake.
Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/

[Greg Stein]
Well, at the Python level buffer objects seem never to be used, probably because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now. I don't have any real objection to any way anyone wants to fix that, just so long as it gets fixed.
Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question.
I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe, the docs remain incomplete, there's random stuff like file.readinto() that's not documented at all (could be that's the only one -- it's certainly "discovered" on c.l.py often enough, though), and there are no buffer tests in the std test suite. The work to introduce the type wasn't completed, nobody works on it, and finishing work 3 years late doesn't count as "new feature" in my book <wink>.

On Sun, Jun 03, 2001 at 03:55:43AM -0400, Tim Peters wrote:
I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often.
because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now.
Easy? Depends on what you use them with.
I'll fix the code.
Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc.
Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/

[Tim]
because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now.
[Greg Stein]
Easy? Depends on what you use them with.
"Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the user's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way to fix it -- but it's still Python's fault" (for example, see the dict thread this weekend).
I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe
I'll fix the code.
Thank you!
I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess <wink>.
Now you're just being bothersome.
You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them.
You want all that stuff, then feel free.
"All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards.
I'll volunteer to do the code. You can go beat some heads, or find other volunteers.
Anyone else care to chip in?
OK, I feel guilty -- but if that's enough to make you feel joyful again, the psychology here is just sick <wink>.

However, it has one fatal flaw, and no one seems to know what to do about it.
I think it would be safe if:

1) it kept a reference to the underlying object, and

2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface.

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.

On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote:
The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it?
"Works for me" :-) Part of the neglect is also based on Guido's ambivalence. Part is that I haven't needed more from it. The day that I do, then I'll code it up :-) But that doesn't help the "generic" case, unfortunately. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein <gstein@lyra.org>
"badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well
But "doesn't work well" means "can crash the interpreter". I don't think "badly" is an overstatement here...

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.

Greg> With strop, I can pass any of those objects to strop.find(). That
Greg> function has a polymorphic argument.

Where doesn't strop compile/run? If it works everywhere, either just rename it to be the string module (copying any bits from the existing string module that it doesn't yet have) or rename it something like buffer_funcs.

Skip

"M.-A. Lemburg" <mal@lemburg.com>:
BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.').
Aren't buffer objects as they're currently implemented inherently dangerous?

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.

Greg Ewing wrote:
Why should they be ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote:
The buffer object caches the pointer from getreadbuffer and friends. If the target object changes that pointer (internally), then the buffer object's value is stale. But that is a bug fix; it is independent of the discussion at hand. Cheers, -g -- Greg Stein, http://www.lyra.org/

participants (9)
- aahz@rahul.net
- Fred L. Drake, Jr.
- Greg Ewing
- Greg Stein
- M.-A. Lemburg
- Mark Hammond
- Paul Barrett
- skip@pobox.com
- Tim Peters