Scott, Thank you for your detailed explanations. This is starting to make more sense to me. It is obvious that you understand what we are trying to do, and I pretty much agree with you in how you think it should be done. I think you do a great job of explaining things. I agree we should come up with a set of names for the interface to arrayobjects. I'm even convinced that offset should be an optional part of the interface (implied 0 if it's not there).
However, other PyBufferProcs objects like array.array will never allow themselves to be wrapped by a __builtins__.bytes since they realloc their memory and violate the promises that the __builtins__.bytes object makes. I think you disagree with me on this part, so more on that later in this message.
I think I agree with you: array.array shouldn't allow itself to be wrapped by a bytes object because it reallocates without tracking what it has shared.
Another consideration that might sway you is that the existing N-Dimensional array packages could easily add attribute methods to implement the interface, and they could do this without changing any part of their implementation. The .data attribute, when requested, would call a "get method" that returns a buffer. This allows user-defined objects which do not implement the PyBufferProcs protocol themselves, but which contain a buffer inside of them, to participate in the "ndarray protocol". Neither version_two nor version_three allows this: the object being passed must *be* a buffer.
I am not at all against the ndarray protocol you describe. In fact, I'm quite a fan. I think we should start doing it now. I was just wondering if adding attributes to the bytes object was useful in any case. Your arguments have persuaded me that it is not worth the trouble. Underscore names are a good idea. We already have __array__, which is a protocol for returning an array object. Numeric3 already implements this protocol, minus name differences. So, let's come up with names. I'm happy with __array__XXXXX type names, as they dovetail nicely with the already established __array__ name, which Numeric3 expects will return an actual array object. As I've already said, it would be easy to check for the more specialized attributes at object creation time to bootstrap an array from an arbitrary object. In addition to what you state, why not also have the protocol look at the object itself and use its PyBufferProcs protocol if it doesn't expose a .__array__data method?
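To make the lookup order concrete, here is a rough sketch of how a consumer might bootstrap from an arbitrary object. It is only an illustration under assumptions: the attribute names (__array_data__, __array_shape__, __array_itemtype__, __array_offset__) are placeholders since no names have been agreed on, describe_array_source and Wrapped are hypothetical, and memoryview stands in for the PyBufferProcs fallback.

```python
import struct


def describe_array_source(obj):
    """Return (buffer, shape, itemtype, offset) for an array-like object.

    The attribute names are placeholders, not settled protocol names.
    """
    if hasattr(obj, "__array_data__"):
        # The object only needs to *contain* a buffer, not be one.
        data = memoryview(obj.__array_data__)
        shape = tuple(obj.__array_shape__)
        itemtype = obj.__array_itemtype__
        # The offset is optional; a missing attribute implies 0.
        offset = getattr(obj, "__array_offset__", 0)
    else:
        # Fallback: the object itself must expose the buffer interface.
        # With no metadata available, describe it as flat unsigned bytes.
        data = memoryview(obj)
        shape = (data.nbytes,)
        itemtype = "B"
        offset = 0
    return data, shape, itemtype, offset


class Wrapped:
    """Hypothetical third-party object that holds a buffer internally."""

    def __init__(self):
        self._buffer = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)

    __array_data__ = property(lambda self: self._buffer)
    __array_shape__ = property(lambda self: (2, 2))
    __array_itemtype__ = property(lambda self: "<d")
```

A plain bytearray passed to the same function takes the fallback branch, so both kinds of object can take part without either one changing its implementation.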
The reference count on the PyObject pointer is different than the number of users using the memory. In Python you could have:
Your examples explaining this are good, but I did realize this; that's why I stated that the check in arr.resize is overkill and will disallow situations that could actually work. Do you think the Numeric3 arrayobject should have a "memory pointer count" added to the PyArrayObject structure?
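As a point of comparison for the "memory pointer count" idea, current CPython's bytearray keeps a count of exported buffers that is separate from its reference count, and refuses to resize while any export is alive. The sketch below only demonstrates that behavior in Python 3; it is not how Numeric3 does or would implement it.

```python
buf = bytearray(b"abcd")

# A memoryview is a user of the memory, tracked separately from
# ordinary references to the bytearray object.
view = memoryview(buf)

try:
    buf.extend(b"efgh")   # growing may reallocate, so it is refused
    resized_while_shared = True
except BufferError:
    resized_while_shared = False

# Once the last export is released, resizing is safe again.
view.release()
buf.extend(b"efgh")
```

The check here is tied to outstanding users of the memory rather than to the object's reference count, which is the distinction under discussion.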
Please don't think I'm offering you resistance. I'm only trying to point out some things that I think you might have overlooked. Lots of people ignore my suggestions all the time. You'd be in good company if you did too, and I wouldn't even hold a grudge against you.
I very much appreciate the pointers. I had overlooked some things and I believe your suggestions are better.
    class Numarray:
        #
        # lots of array implementing code
        #

        # Down here at the end, add the "well-known" interface
        # (I haven't embraced the @property decorator syntax yet.)

        def __get_shape(self):
            return self._shape
        __array_shape__ = property(__get_shape)

        def __get_data(self):
            # Note that they use a different name internally
            return self._buffer
        __array_data__ = property(__get_data)

        def __get_itemtype(self):
            # Perform an on the fly conversion from the class
            # hierarchy type to the struct module typecode that
            # closest matches
            return self._type._to_typecode()
        __array_itemtype__ = property(__get_itemtype)
Changing class Numarray to a PyBufferProcs supporting object would be harder.
I think they just did this, though...
The C version for Numeric3 arrays would be similar, and there is no wasted space on a per instance basis in either case.
Doing this in C would be extremely easy: a simple binding of a name to an already available function (and disallowing any set attribute).
The real advantage to the struct module typecodes comes in two forms. First and most important is that it's already documented and in place: a de facto standard. Second is that Python script code could use those typecodes directly with the struct module to pull apart pieces of data. The disadvantage is that a few new typecodes would be needed...
I would even go as far as to recommend their '>' '<' prefix codes for big-endian and little-endian for just this reason...
Hmm.. an interesting idea. I don't know if I agree or not.
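For reference, this is how the struct module's '<' and '>' prefixes behave today; a small sketch showing that a typecode string carrying the byte order can be handed to struct as-is (the variable names are illustrative only).

```python
import struct

# Two 32-bit integers packed explicitly as little-endian.
raw = struct.pack("<2i", 1, 2)

# Reading with the matching prefix recovers the values.
little = struct.unpack("<2i", raw)

# Reading the same bytes as big-endian gives byte-swapped values.
big = struct.unpack(">2i", raw)

# With an explicit prefix, sizes are the standard ones (a double is 8 bytes).
double_size = struct.calcsize("<d")
```

Here little is (1, 2) while big is (16777216, 33554432), so an item type without a byte-order prefix would leave the interpretation of the buffer ambiguous across machines.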
This would be wonderful. Third party libraries could produce data that is sufficiently ndarray like without hassle, and users of that library could promote it to a Numeric3 array with no headaches.
By the way, it looks like the "bytes" concept has been revisited recently. There is a new PEP dated Aug 11, 2004:
Thanks for the pointer.
Thanks for your attention and patience with me on this. I really appreciate the work you are doing. I wish I could explain my understanding of things more clearly.
As I said before, you do a really good job of explaining. I'm pretty much on your side now :-) Let's go ahead and get some __array__XXXXX attribute names decided on. I'll put them in the Numeric3 code base. (I could also put them in old Numeric and make a 24.0 release as well; I need to do that because of a horrible bug in the new empty method: Numeric.empty(<shape>, 'O').)

-Travis