Weird buffer "add" behaviour.
I just struck this, and wonder if it is intentional:

* Adding 2 buffer objects together yields a string. Fair enough.
* Adding a buffer and a string yields a type error! Eeek.

This yields the following strange behaviour:
>>> a = buffer('a')
>>> a + a
'aa'
>>> a + a + a
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
TypeError: cannot add type "buffer" to string
That doesn't seem correct to me?

Mark.
Mark Hammond wrote:
I just struck this, and wonder if it is intentional:
* Adding 2 buffer objects together yields a string. Fair enough.
* Adding a buffer and a string yields a type error! Eeek.
This just seems plain wrong. Adding two buffers should yield a new buffer -- in the long run, buffers should replace strings with respect to holding binary data. If not even buffers themselves implement this idea, I don't see any perspective for ever getting there...
This yields the following strange behaviour:
>>> a = buffer('a')
>>> a + a
'aa'
>>> a + a + a
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
TypeError: cannot add type "buffer" to string
That doesn't seem correct to me?
Neither to me.

-- 
Marc-Andre Lemburg
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
On Mon, Oct 16, 2000 at 09:50:22AM +0200, M.-A. Lemburg wrote:
Mark Hammond wrote: ...
This yields the following strange behaviour:
>>> a = buffer('a')
>>> a + a
'aa'
>>> a + a + a
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
TypeError: cannot add type "buffer" to string
That doesn't seem correct to me?
Neither to me.
It is caused by the non-commutative aspect of Python types. You end up with a string, and that type doesn't know how to add a buffer to itself.

Ideally, it might be nice to allow a string to append any object that exports the buffer interface. But when somebody goes and writes "abc"+my_array ... hoo boy, will we hear complaints. The alternative is to allow the buffer to resolve the type conflict and do the appending within the buffer code.

Of course, the choice of returning a string (from buf+buf) rather than a buffer was arguably the wrong choice.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
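[Editorial note: the operator-resolution problem Greg describes can be sketched in modern Python with a hypothetical toy class. `Buf` below is not the actual 2.0-era buffer implementation; it merely mimics the one relevant behaviour, returning a plain string from addition, to show why the second `+` then fails.]

```python
# Toy sketch of why a+a+a fails: Buf mimics the old buffer object by
# returning a *str* from addition, so the second "+" is evaluated as
# str + Buf, and str refuses the unknown right-hand operand.
class Buf:
    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        # Like the 2.0-era buffer object: buf + buf yields a plain string.
        if isinstance(other, Buf):
            return self.data + other.data
        return NotImplemented


a = Buf("a")
print(a + a)           # "aa" -- a plain str, not a Buf
try:
    a + a + a          # (a + a) is a str; str + Buf has no handler
except TypeError as exc:
    print("TypeError:", exc)
```

Since `+` is left-associative, `a + a + a` first produces a string, and the string type has no idea how to concatenate the remaining `Buf`: neither `str.__add__` nor a reflected `Buf.__radd__` applies, hence the TypeError.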
Greg Stein wrote:
On Mon, Oct 16, 2000 at 09:50:22AM +0200, M.-A. Lemburg wrote:
Mark Hammond wrote: ...
This yields the following strange behaviour:
>>> a = buffer('a')
>>> a + a
'aa'
>>> a + a + a
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
TypeError: cannot add type "buffer" to string
That doesn't seem correct to me?
Neither to me.
It is caused by the non-commutative aspect of Python types. You end up with a string, and that type doesn't know how to add a buffer to itself.
The problem is that buffer() objects coerce to strings in the first place... they should return new buffer objects instead of strings -- then we wouldn't have the above problems.
... Of course, the choice of returning a string (from buf+buf) rather than a buffer was arguably the wrong choice.
Right :-/

-- 
Marc-Andre Lemburg
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
The buffer interface is one of the most misunderstood parts of Python. I believe that if it were PEPped today, it would have a hard time getting accepted in its current form.

There are also two different parts that are commonly referred to by this name: the "buffer API", which is a C-only API, and the "buffer object", which has both a C API and a Python API.

Both were largely proposed, implemented and extended by others, and I have to admit that I'm still uneasy with defending them, especially the buffer object. Both are extremely implementation-dependent (in JPython, neither makes much sense).

The Buffer API
--------------

The C-only buffer API was originally intended to allow efficient binary I/O from and (in some cases) to large objects that have a relatively well-understood underlying memory representation. Examples of such objects include strings, array module arrays, memory-mapped files, NumPy arrays, and PIL objects.

It was created with the desire to avoid an expensive memory-copy operation when reading or writing large arrays. For example, if you have an array object containing several million double precision floating point numbers, and you want to dump it to a file, you might prefer to do the I/O directly from the array's memory buffer rather than first copying it to a string. (You lose portability of the data, but that's often not a problem the user cares about in these cases.)

An alternative solution for this particular problem was considered: object types in need of this kind of efficient I/O could define their own I/O methods, thereby allowing them to hide their internal representation. This was implemented in some cases (e.g. the array module has read() and write() methods) but rejected, because a simple-minded implementation of this approach would not work with "file-like" objects (e.g. StringIO files).
It was deemed important that file-like objects would not place restrictions on the kind of objects that could interact with them (compared to real file objects).

A possible solution would have been to require that each object implementing its own read and write methods should support both efficient I/O to/from "real" file objects and fall-back I/O to/from "file-like" objects. The fall-back I/O would have to convert the object's data to a string object which would then be passed to the write() method of the file-like object. This approach was rejected because it would make it impossible to implement an alternative file object that would be as efficient as the real file object, since large object I/O would be using the inefficient fallback interface.

To address these issues, we decided to define an interface that would let I/O operations ask the objects where their data bytes are in memory, so that the I/O can go directly to/from the memory allocated by the object. This is the classic buffer API. It has a read-only and a writable variant -- the writable variant is for mutable objects that will allow I/O directly into them. Because we expected that some objects might have an internal representation distributed over a (small) number of separately allocated pieces of memory, we also added the getsegcount() API. All objects that I know support the buffer API return a segment count of 1, and most places that use the buffer API give up if the segment count is larger; so this may be considered an unnecessary generalization (and source of complexity).

The buffer API has found significant use in a way that wasn't originally intended: as a sort of informal common base class for string-like objects in situations where a char[] or char* type must be passed (in a read-only fashion) to C code. This is in fact the most common use of the buffer API now, and appears to be the reason why the segment count must typically be 1.
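[Editorial note: the copy-free I/O path described above is still visible from the Python level today. A modern Python 3 sketch, not 2.0-era code: file objects accept any object exporting the buffer protocol, so an array module array can be written to a file directly, with no intermediate string/bytes copy.]

```python
# Sketch (modern Python 3): writing an array module array straight to
# a file.  file.write() accepts any object exporting the buffer
# protocol, so the I/O goes directly from the array's memory.
import array
import tempfile

a = array.array("d", [0.1 * i for i in range(1000)])  # 1000 doubles

with tempfile.TemporaryFile() as f:
    f.write(a)              # no intermediate copy to a bytes object
    f.seek(0)
    raw = f.read()

assert len(raw) == 1000 * a.itemsize   # 8 bytes per double

b = array.array("d")
b.frombytes(raw)
assert b == a               # the round trip preserved the data
```

As the message notes, the resulting file is a raw dump of the machine representation: fast, but not portable across platforms with different float formats or byte orders.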
In connection with this, the buffer API has grown a distinction between character and binary buffers (on the read-only end only). This may have been a mistake; it was intended to help with Unicode but it ended up not being used.

The Buffer Object
-----------------

The buffer object has a much less clear reason for its existence. When Greg Stein first proposed it, he wrote:

    The intent of this type is to expose a string-like interface from an
    object that supports the buffer interface (without making a copy). In
    addition, it is intended to support slices of the target object.

    My eventual goal here is to tweak the file object to support memory
    mapping and the buffer interface. The buffer object can then return
    slices of the file without making a new copy. Next step: change
    marshal.c, ceval.c, and compile.c to support a buffer for the co_code
    attribute. Net result is that copies of code streams don't need to be
    copied onto the heap, but can be left in an mmap'd file or a frozen
    file. I'm hoping there will be some perf gains (time and memory).

    Even without some of the co_code work, enabling mmap'd files and
    buffers onto them should be very useful. I can probably rattle off a
    good number of other uses for the buffer type.

I don't think that any of these benefits have been realized yet, and altogether I think that the buffer object causes a lot of confusion. The buffer *API* doesn't guarantee enough about the lifetime of the pointers for the buffer *object* to be able to safely preserve those pointers, even if the buffer object holds on to the base object. (The C-level buffer API informally guarantees that the data remains valid only until you do anything to the base object; this is usually fine as long as you don't release the global interpreter lock.)

The buffer object's approach to implementing the various sequence operations is strange: sometimes it behaves like a string, sometimes it doesn't. E.g. a slice returns a new string object unless it happens to address the whole buffer, in which case it returns a reference to the existing buffer object. It would seem more logical that a subslice would return a new buffer object. Concatenation and repetition of buffer objects are likewise implemented inconsistently; it would have been more consistent with the intended purpose if these weren't supported at all (i.e. if none of the buffer object operations would allocate new memory except for buffer object headers).

I would have concluded that the buffer object is entirely useless, if it weren't for some very light use that is being made of it by the Unicode machinery. I can't quite tell whether that was done just because it was convenient, or whether that shows there is a real need.

What Now?
---------

I'm not convinced that we need the buffer object at all. For example, the mmap module defines a sequence object, so it doesn't seem to need the buffer object to help it support slices.

Regarding the buffer API, it's clearly useful, although I'm not convinced that it needs the multiple segment count option or the char vs. binary buffer distinction, given that we're not using this for Unicode objects as we originally planned.

I also feel that it would be helpful if there was an explicit way to lock and unlock the data, so that a file object can release the global interpreter lock while it is doing the I/O. But that's not a high priority (and there are no *actual* problems caused by the lack of such an API -- just *theoretical*).

For Python 3000, I think I'd like to rethink this whole mess. Perhaps byte buffers and character strings should be different beasts, and maybe character strings could have Unicode and 8-bit subclasses (and maybe other subclasses that explicitly know about their encoding). And maybe we'd have a real file base class. And so on.

What to do in the short run? I'm still for severely simplifying the buffer object (ripping out the unused operations) and deprecating it.

--Guido van Rossum (home page: http://www.python.org/~guido/)
guido wrote:
What to do in the short run? I'm still for severely simplifying the buffer object (ripping out the unused operations) and deprecating it.
agreed.
(does this mean that we're in post-2.0 mode? ;-)
Yes :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Oct 16, 2000 at 01:51:51PM -0500, Guido van Rossum wrote:
(does this mean that we're in post-2.0 mode? ;-) Yes :-)
Then this is a good time to ask what strategy will be followed post-2.0. Do you want a moratorium on massive checkins for a time while 2.0 shakes out, re-opening the tree for larger changes in a few months? Or will you try to live with a CVS branch for 2.0 and re-open the tree immediately? --amk
Andrew M. Kuchling writes:
Then this is a good time to ask what strategy will be followed post-2.0. Do you want a moratorium on massive checkins for a time while 2.0 shakes out, re-opening the tree for larger changes in a few months? Or will you try to live with a CVS branch for 2.0 and re-open the tree immediately?
I think a maintenance branch for 2.0.1 (or whatever) should be created as part of the release process, in case we need a quick release for critical bug fixes. The tree should be re-opened via a message to python-dev after the release is published.

-Fred

-- 
Fred L. Drake, Jr. <fdrake at beopen.com>
BeOpen PythonLabs Team Member
[Andrew M. Kuchling]
Then this is a good time to ask what strategy will be followed post-2.0. Do you want a moratorium on massive checkins for a time while 2.0 shakes out ...
It's not a great time to ask, as we're still balls-to-the-wall *building* the release and doing last-second checkins (things like NEWS, not code). So there's an absolute ban on any checkins until further notice (on Python-Dev), except those explicitly approved by Jeremy. After that, I agree with Fred that we should make a 2.0.1 branch simultaneous with the release.
Then this is a good time to ask what strategy will be followed post-2.0. Do you want a moratorium on massive checkins for a time while 2.0 shakes out, re-opening the tree for larger changes in a few months? Or will you try to live with a CVS branch for 2.0 and re-open the tree immediately?
If we need to issue a patch release, we'll use a branch. That seems the only reasonable approach. The tree will be open for checkins as soon as 2.0 is released (tonight is looking good!), but I hope people exercise some restraint and discuss their plans for significant checkins on the python-dev list first. A lot of things should probably be discussed in the form of PEPs too!

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
... I would have concluded that the buffer object is entirely useless, if it weren't for some very light use that is being made of it by the Unicode machinery. I can't quite tell whether that was done just because it was convenient, or whether that shows there is a real need.
I used the buffer object since I thought that buffer() objects were to replace strings as the container for binary data. The buffer object wraps a memory buffer into a Python object for the purpose of decoding it into Unicode. 8-bit string objects would have worked just as well...
What Now? ---------
I'm not convinced that we need the buffer object at all. For example, the mmap module defines a sequence object so doesn't seem to need the buffer object to help it support slices.
It would be nice to have an object for "copy by reference" rather than "malloc + copy". This would be useful for strings (e.g. to access substrings of a large string), Unicode and binary data. The buffer object almost does this... it would only have to stick to always returning buffer objects in coercion, slicing etc.

I also think that the name "buffer" is misleading, since it really means "reference" in the context published by the Python interface (the C API also has a way of defining new malloc areas and referencing them through the buffer interface, but that is not published in Python).

The other missing data type in Python is one for binary data. Currently, string objects are in common use for this kind of data. The problems with this are obvious: in some contexts strings are expected to contain text data, in others binary data. When the two meet there's great confusion. I'd suggest either making arrays the Python standard type for holding binary data, or creating a completely new type (this should then be called something like "buffer").
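[Editorial note: the "copy by reference" object asked for here is essentially what Python 3's memoryview later became. A hedged hindsight sketch, not code from this thread: slicing a view copies nothing, and a copy is made only on explicit request.]

```python
# Sketch: a "copy by reference" view over a large chunk of binary
# data, using Python 3's memoryview.  Slicing the view allocates only
# a small view header; the underlying bytes are never copied until
# .tobytes() is called.
big = bytes(range(256)) * 4096            # ~1 MB of binary data

view = memoryview(big)
sub = view[1000:2000]                     # 1000-byte slice, zero-copy

assert len(sub) == 1000
assert sub.obj is big                     # still backed by the original
assert sub.tobytes() == big[1000:2000]    # explicit copy when needed
```

The design point matches the message: the expensive operation (copying) is opt-in, while the default behaviour of slicing is a cheap reference.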
Regarding the buffer API, it's clearly useful, although I'm not convinced that it needs the multiple segment count option or the char vs. binary buffer distinction, given that we're not using this for Unicode objects as we originally planned.
True.
I also feel that it would be helpful if there was an explicit way to lock and unlock the data, so that a file object can release the global interpreter lock while it is doing the I/O. But that's not a high priority (and there are no *actual* problems caused by the lack of such an API -- just *theoretical*).
How about adding a generic low-level lock type for this kind of task? The interpreter could be made aware of these types to allow a much more fine-grained locking mechanism, e.g. to check for acquired locks of certain objects only.
For Python 3000, I think I'd like to rethink this whole mess. Perhaps byte buffers and character strings should be different beasts, and maybe character strings could have Unicode and 8-bit subclasses (and maybe other subclasses that explicitly know about their encoding). And maybe we'd have a real file base class. And so on.
Great... but 3000 is a long way ahead :-(
What to do in the short run? I'm still for severely simplifing the buffer object (ripping out the unused operations) and deprecating it.
Since it isn't all that well known anyway, how about streamlining the buffer object implementations of the various protocols and removing the distinction between "s" and "t"?!

-- 
Marc-Andre Lemburg
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
All this just when I was getting accustomed to the thought of using buffer objects in the Palm Python port... I need buffer objects for many of the same reasons as Greg Stein originally proposed, as you quoted below.

On the Palm, the datamanager heap (used for permanent database storage and limited by the physical memory size) already stores the compiled Python module. Directly referencing the data of objects like bytecodes and strings would greatly reduce the dynamic heap requirements (current limit of 256K on PalmOS 3.5 on devices with 4M RAM or greater). Buffer objects seem like a natural choice.

A record in a Palm database is just a chunk of contiguous memory. Representing this chunk as a buffer object would allow direct referencing of it and any of its slices. So, the co_code of code objects could be unmarshalled with a reference to permanent storage. Further, with the appropriate modifications, string objects (char *ob_sval?) could access this memory as well, though this additional optimization is probably only appropriate for small platforms.

I think that the buffer object is fairly important. Buffer objects provide a mechanism for exposing arbitrary chunks of memory (e.g., PyBuffer_FromMemory), something that no other Python object does, AFAIK. Perhaps clarifying the interface (such as the slice operator returning a buffer, as suggested below) and providing more hooks from Python for creating buffers (via newmodule, say) would be helpful.

On Mon, 16 Oct 2000, Guido van Rossum wrote:
The buffer interface is one of the most misunderstood parts of Python. I believe that if it were PEPped today, it would have a hard time getting accepted in its current form.
There are also two different parts that are commonly referred by this name: the "buffer API", which is a C-only API, and the "buffer object", which has both a C API and a Python API.
Both were largely proposed, implemented and extended by others, and I have to admit that I'm still uneasy with defending them, especially the buffer object. Both are extremely implementation-dependent (in JPython, neither makes much sense).
The Buffer API --------------
The C-only buffer API was originally intended to allow efficient binary I/O from and (in some cases) to large objects that have a relatively well-understood underlying memory representation. Examples of such objects include strings, array module arrays, memory-mapped files, NumPy arrays, and PIL objects.
It was created with the desire to avoid an expensive memory-copy operation when reading or writing large arrays. For example, if you have an array object containing several millions of double precision floating point numbers, and you want to dump it to a file, you might prefer to do the I/O directly from the array's memory buffer rather than first copying it to a string. (You lose portability of the data, but that's often not a problem the user cares about in these cases.)
An alternative solution for this particular problem was considered: object types in need of this kind of efficient I/O could define their own I/O methods, thereby allowing them to hide their internal representation. This was implemented in some cases (e.g. the array module has read() and write() methods) but rejected, because a simple-minded implementation of this approach would not work with "file-like" objects (e.g. StringIO files). It was deemed important that file-like objects would not place restrictions on the kind of objects that could interact with them (compared to real file objects).
A possible solution would have been to require that each object implementing its own read and write methods should support both efficient I/O to/from "real" file objects and fall-back I/O to/from "file-like" objects. The fall-back I/O would have to convert the object's data to a string object which would then be passed to the write() method of the file-like object. This approach was rejected because it would make it impossible to implement an alternative file object that would be as efficient as the real file object, since large object I/O would be using the inefficient fallback interface.
To address these issues, we decided to define an interface that would let I/O operations ask the objects where their data bytes are in memory, so that the I/O can go directly to/from the memory allocated by the object. This is the classic buffer API. It has a read-only and a writable variant -- the writable variant is for mutable objects that will allow I/O directly into them. Because we expected that some objects might have an internal representation distributed over a (small) number of separately allocated pieces of memory, we also added the getsegcount() API. All objects that I know support the buffer API return a segment count of 1, and most places that use the buffer API give up if the segment count is larger; so this may be considered as an unnecessary generalization (and source of complexity).
The buffer API has found significant use in a way that wasn't originally intended: as a sort of informal common base class for string-like objects in situations where a char[] or char* type must be passed (in a read-only fashion) to C code. This is in fact the most common use of the buffer API now, and appears to be the reason why the segment count must typically be 1.
In connection with this, the buffer API has grown a distinction between character and binary buffers (on the read-only end only). This may have been a mistake; it was intended to help with Unicode but it ended up not being used.
The Buffer Object -----------------
The buffer object has a much less clear reason for its existence. When Greg Stein first proposed it, he wrote:
The intent of this type is to expose a string-like interface from an object that supports the buffer interface (without making a copy). In addition, it is intended to support slices of the target object.
My eventual goal here is to tweak the file object to support memory mapping and the buffer interface. The buffer object can then return slices of the file without making a new copy. Next step: change marshal.c, ceval.c, and compile.c to support a buffer for the co_code attribute. Net result is that copies of code streams don't need to be copied onto the heap, but can be left in an mmap'd file or a frozen file. I'm hoping there will be some perf gains (time and memory).
Even without some of the co_code work, enabling mmap'd files and buffers onto them should be very useful. I can probably rattle off a good number of other uses for the buffer type.
I don't think that any of these benefits have been realized yet, and altogether I think that the buffer object causes a lot of confusion. The buffer *API* doesn't guarantee enough about the lifetime of the pointers for the buffer *object* to be able to safely preserve those pointers, even if the buffer object holds on to the base object. (The C-level buffer API informally guarantees that the data remains valid only until you do anything to the base object; this is usually fine as long as you don't release the global interpreter lock.)
The buffer object's approach to implementing the various sequence operations is strange: sometimes it behaves like a string, sometimes it doesn't. E.g. a slice returns a new string object unless it happens to address the whole buffer, in which case it returns a reference to the existing buffer object. It would seem more logical that a subslice would return a new buffer object. Concatenation and repetition of buffer objects are likewise implemented inconsistently; it would have been more consistent with the intended purpose if these weren't supported at all (i.e. if none of the buffer object operations would allocate new memory except for buffer object headers).
I would have concluded that the buffer object is entirely useless, if it weren't for some very light use that is being made of it by the Unicode machinery. I can't quite tell whether that was done just because it was convenient, or whether that shows there is a real need.
What Now? ---------
I'm not convinced that we need the buffer object at all. For example, the mmap module defines a sequence object, so it doesn't seem to need the buffer object to help it support slices.
Regarding the buffer API, it's clearly useful, although I'm not convinced that it needs the multiple segment count option or the char vs. binary buffer distinction, given that we're not using this for Unicode objects as we originally planned.
I also feel that it would be helpful if there was an explicit way to lock and unlock the data, so that a file object can release the global interpreter lock while it is doing the I/O. But that's not a high priority (and there are no *actual* problems caused by the lack of such an API -- just *theoretical*).
For Python 3000, I think I'd like to rethink this whole mess. Perhaps byte buffers and character strings should be different beasts, and maybe character strings could have Unicode and 8-bit subclasses (and maybe other subclasses that explicitly know about their encoding). And maybe we'd have a real file base class. And so on.
What to do in the short run? I'm still for severely simplifying the buffer object (ripping out the unused operations) and deprecating it.
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://www.python.org/mailman/listinfo/python-dev
-- 
Jeffery D. Collins
Sr. Software Developer
Endeavors Technology, Inc.
On Mon, Oct 16, 2000 at 01:22:22PM -0700, Jeff Collins wrote:
... I think that the buffer object is fairly important. Buffer objects provide a mechanism for exposing arbitrary chunks of memory (e.g., PyBuffer_FromMemory), something that no other Python object does, AFAIK. Perhaps clarifying the interface (such as the slice operator returning a buffer, as suggested below) and providing more hooks from Python for creating buffers (via newmodule, say) would be helpful.
There have been quite a few C extensions (and embedding Python!) where the buffer objects have been used in this fashion. For example, if you have a string argument that you wish to pass into Python, then you can avoid a copy by wrapping a Buffer Object around it and passing that.

Many of the issues with the buffer object can be solved with simple changes. For example, the "mutable object" thing is easily dealt with by having the object not record the pointer, but just fetch it every time that it wants to do an operation. [ and if we extend the buffer API, we could potentially optimize the behavior to avoid the ptr refetch on each operation ]

I don't recall the motivation for returning strings. I believe it was based on an attempt to make the buffer look as much like a string as possible (and slices and concats return strings). That was a poor choice :-) ... so, again, some basic changes to return slices and concats as buffer objects would make sense.

Extending the buffer() builtin to create writeable buffer objects has been a reasonably common request. What seems to happen instead is that people developing C extensions (which desire buffer objects as their params) just add a new function to the extension to create buffer objects.

Re: the buffer API: At the time the "s"/"t" codes were introduced (before 1.5.2 was released), we had a very different concept of how Unicode objects would be implemented. At that time, Unicode objects had no 8-bit representation (just 16-bit chars), so the difference was important. I'm not clued in enough on the ramifications of torching the difference in the API, but it would be a nice simplification.

Buffers vs arrays: this is a harder question. Which is the "recommended binary type [for series of bytes]"? Buffers can refer to arbitrary memory. Arrays maintain their own memory. I believe the two models are needed, so I'd initially offer that both buffers and arrays need to be maintained. However, given that...
what is the purpose of the array if a buffer can *also* maintain its own memory?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
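[Editorial note: the buffers-refer vs. arrays-own distinction Greg draws can be illustrated with a modern Python 3 sketch. memoryview, the later successor of the buffer object in this role, refers to memory owned by another object, while an array module array owns its memory; this is hindsight, not code from this era.]

```python
# Sketch: an array owns and manages its memory; a memoryview merely
# refers to memory owned by some other object.  Writing through the
# view mutates the array in place -- no copy is ever made.
import array

owner = array.array("B", [0, 0, 0, 0])   # the array owns these 4 bytes
view = memoryview(owner)                 # the view only refers to them

view[1] = 42                             # write through the reference...
assert owner[1] == 42                    # ...and the owner sees the change

view[2:4] = b"\x07\x08"                  # slice assignment: still in place
assert owner.tolist() == [0, 42, 7, 8]
```

This is why both models are plausibly needed: the view cannot exist without something else managing the storage, and the owning type gains nothing from pretending to be a reference.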
Greg Stein wrote:
On Mon, Oct 16, 2000 at 01:22:22PM -0700, Jeff Collins wrote:
... I think that the buffer object is fairly important. Buffer objects provide a mechanism for exposing arbitrary chunks of memory (e.g., PyBuffer_FromMemory), something that no other Python object does, AFAIK. Perhaps clarifying the interface (such as the slice operator returning a buffer, as suggested below) and providing more hooks from Python for creating buffers (via newmodule, say) would be helpful.
There have been quite a few C extensions (and embedding Python!) where the buffer objects have been used in this fashion. For example, if you have a string argument that you wish to pass into Python, then you can avoid a copy by wrapping a Buffer Object around it and passing that.
Perhaps we ought to flesh out the current uses of buffer objects and then decide how to proceed?!

IMHO, the problem with buffer objects (apart from the sometimes strange protocol behaviour) is that there are too many "features" built into them. Simplification and possibly diversification is needed: instead of trying to achieve every possible C hack with buffer objects we should try to come up with a reasonably small set of types which allow only very basic tasks, e.g.

1. wrapping C memory areas with the possibility of accessing the raw bytes in a read-only way (this should be buffer()),

2. providing a non-copying object reference type (I'd call this reference()) and

3. maintaining a writeable C memory buffer (arrays provide this feature).

The buffer object currently tries to do all three.
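[Editorial note: in hindsight, Python 3 arrived at a split very close to this three-way proposal. The sketch below uses today's types to illustrate the three roles; bytes, memoryview and bytearray are not the types proposed in this thread, but they fill the same slots.]

```python
# Sketch: the three roles separated above, as they eventually shook
# out in Python 3 (hindsight, not a design from this thread):
#   1. bytes      -- read-only chunk of binary data
#   2. memoryview -- non-copying *reference* to someone else's memory
#   3. bytearray  -- writable buffer maintaining its own memory
data = bytes([1, 2, 3, 4])               # role 1: read-only binary data

ref = memoryview(data)                   # role 2: reference, no copy
assert ref[1:3].tobytes() == b"\x02\x03"

buf = bytearray(4)                       # role 3: owns writable memory
buf[0:4] = data                          # copy into the owned buffer
buf[0] = 255                             # mutate it freely
assert bytes(buf) == b"\xff\x02\x03\x04"
```

Each type does exactly one of the three jobs, which is the simplification being argued for: no single object both owns memory and masquerades as a reference to other objects' memory.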
Many of the issues with the buffer object can be solved with simple changes. For example, the "mutable object" thing is easily dealt with by having the object not record the pointer, but just fetch it every time that it wants to do an operation. [ and if we extend the buffer API, we could potentially optimize the behavior to avoid the ptr refetch on each operation ]
Please don't extend the buffer API: the whole design is flawed, since it undermines data encapsulation in very dangerous ways. If anything, we should consider a new API at the abstract API level which doesn't return raw C pointers, but real Python objects (e.g. type 2 reference objects).
I don't recall the motivation for returning strings. I believe it was based on an attempt to make the buffer look as much like a string as possible (and slices and concats return strings). That was a poor choice :-) ... so, again, some basic changes to return slices and concats as buffer objects would make sense.
+1.
Extending the buffer() builtin to create writeable buffer objects has been a reasonably common request. What seems to happen instead is that people developing C extensions (which desire buffer objects as their params) just add a new function to the extension to create buffer objects.
Please don't. Instead either suggest to use arrays or come up with some new type with the sole purpose of providing read-write access to a chunk of bytes.
Re: the buffer API: At the time the "s"/"t" codes were introduced (before 1.5.2 was released), we had a very different concept of how Unicode objects would be implemented. At that time, Unicode objects had no 8-bit representation (just 16-bit chars), so the difference was important. I'm not clued in enough on the ramifications of torching the difference in the API, but it would be a nice simplification.
Well, think of it this way: Unicode was the first object to actually try to make a difference between "s" and "t" -- and it failed badly. In the end, we reverted the decision to make any difference and now special-case Unicode objects in the getargs.c parser (so that "s" and "t" work virtually the same for Unicode). +1 on the idea of removing the difference altogether in 2.1. If anyone needs a special representation of an object, the object should provide a clearly defined C API for it instead. E.g. Unicode has lots of APIs to encode Unicode into quite a few representations.
Buffers vs arrays: this is a harder question. Which is the "recommended binary type [for series of bytes]" ? Buffers can refer to arbitrary memory. Arrays maintain their own memory. I believe the two models are needed, so I'd initially offer that both buffers and arrays need to be maintained. However, given that... what is the purpose of the array if a buffer can *also* maintain its own memory?
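The distinction can be sketched in later Python terms, where an array object manages its own memory and a view merely refers to it (again anachronistic relative to this discussion):

```python
import array

a = array.array("b", b"abc")   # the array allocates and owns its memory
view = memoryview(a)           # the view refers to that memory, owns none

a[0] = ord("x")                # mutate through the owner...
assert view[0] == ord("x")     # ...and the referring view sees it directly
```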
Right, and that's the problem: buffers shouldn't be able to own memory. See above.
Guido:
The buffer *API* doesn't guarantee enough about the lifetime of the pointers for the buffer *object* to be able to safely preserve those pointers, even if the buffer object holds on to the base object.
This seems like a fatal flaw to me, which should have prevented the buffer object in its present form from ever having been implemented. I suggest that this problem MUST be fixed, or the buffer object removed entirely. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
IMHO, the whole buffer interface is weird.

- Only the Buffer_FromObject() function is exposed at the Python level (as buffer()); the other functions are missing. This means that one can only use the buffer interface in extension modules. Was this intentional?

- There is no way a Python class can expose the buffer interface.

- There is a bug in the buffer interface (which recently prevented me from using buffer objects at all; I had to implement my own):

      PyObject *base = PyBuffer_New(100);
      PyObject *buffer = PyBuffer_FromObject(base, 0, 100);
      Py_DECREF(base);

  After this code is executed, buffer points to deallocated memory (because buffer no longer holds a reference to base). (Guido classified this as a wish, but at least it contradicts the documentation.)

Thomas
IMHO, the whole buffer interface is weird.
I'm not sure that weird is the correct term for your points (it is for my spelling, though :-)
- Only the Buffer_FromObject() function is exposed to the ... - No way a python class can expose the buffer interface
These are simply oversights, I would suggest. Nothing in the design prevents this.
- There is a bug in the buffer interface (which prevented recently that I could use buffer objects at all, I had to implement my own)
This looks like quite a simple bug. bufferobject.c, line 77:

    /* if the base object is another buffer, then "deref" it */
    if ( PyBuffer_Check(base) )
        base = ((PyBufferObject *)base)->b_base;

If the condition were changed to:

    if ( PyBuffer_Check(base) && ((PyBufferObject *)base)->b_base )

then I think this one would be solved?

A more serious design flaw is the one Fredrik pointed out quite some time ago. If you have (e.g.) an "array" object and query for its buffer, life is good. If the array then resizes itself, your pointer is now dangling. The simplest solution to this is probably to simply define the lifetimes of these pointers as very, very short ;-)

Mark.
Mark Hammond wrote:
[Problems with the buffer interface] If you have (eg.) an "array" object and query for its buffer, life is good. If the array then resizes itself, your pointer is now dangling. The simplest solution to this is probably to simply define the lifetimes of these pointers as very very short ;-)
...or disable the buffer interface for mutable types altogether!
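For the record, the resolution eventually adopted (in the revised buffer protocol of PEP 3118, long after this thread) was neither very short pointer lifetimes nor banning mutable types: an exporter simply refuses to resize while a view of its memory is outstanding. A sketch with modern types:

```python
a = bytearray(b"abc")
view = memoryview(a)

try:
    a.append(0)                # resizing while a view is exported...
except BufferError:
    pass                       # ...is refused rather than left dangling
else:
    raise AssertionError("resize unexpectedly allowed")

view.release()                 # once the view is explicitly released...
a.append(0)                    # ...resizing works again
assert len(a) == 4
```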
participants (11)

- Andrew M. Kuchling
- Fred L. Drake, Jr.
- Fredrik Lundh
- Greg Ewing
- Greg Stein
- Guido van Rossum
- Jeff Collins
- M.-A. Lemburg
- Mark Hammond
- Thomas Heller
- Tim Peters