[Numpy-discussion] Re: Bytes Object and Metadata

Travis Oliphant oliphant at ee.byu.edu
Mon Mar 28 01:32:12 EST 2005


Scott,

Thank you for your detailed explanations.  This is starting to make more 
sense to me.  It is obvious that you understand what we are trying to 
do, and I pretty much agree with you on how it should be done.  I think 
you do a great job of explaining things. 

I agree we should come up with a set of names for the interface to 
arrayobjects.  I'm even convinced that offset should be an optional part 
of the interface (implied 0 if it's not there).
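
A minimal sketch of what "implied 0" could look like for a consumer of 
the interface.  The attribute names __array_offset__ and __array_data__, 
and the choice to measure the offset in bytes, are assumptions made for 
illustration only; no names have been agreed on yet.

```python
import struct


class Exporter(object):
    """Hypothetical object exposing a data buffer and an optional offset."""

    def __init__(self, data, offset=None):
        self.__array_data__ = data
        # The offset attribute is only set when the caller supplies one.
        if offset is not None:
            self.__array_offset__ = offset


def data_start(obj):
    # A missing offset attribute is treated as an offset of 0.
    return getattr(obj, "__array_offset__", 0)


def first_item(obj, fmt="<i"):
    # Decode one item starting at the (possibly implied) byte offset.
    return struct.unpack_from(fmt, obj.__array_data__, data_start(obj))[0]


raw = struct.pack("<3i", 10, 20, 30)
plain = Exporter(raw)               # no offset attribute at all
shifted = Exporter(raw, offset=4)   # skips the first 4-byte item
```

Here first_item(plain) reads the first integer and first_item(shifted) 
reads the second, so objects that never mention an offset keep working.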

>However, other PyBufferProcs objects like array.array will never allow
>themselves to be wrapped by a __builtins__.bytes since they realloc their
>memory and violate the promises that the __builtins__.bytes object makes. 
>I think you disagree with me on this part, so more on that later in this
>message.
>  
>
I think I agree with you:  array.array shouldn't allow itself to be 
wrapped by a bytes object because it reallocates without tracking what 
it has shared.

>Another consideration that might sway you is that the existing
>N-Dimensional array packages could easily add attribute methods to
>implement the interface, and they could do this without changing any part
>of their implementation.  The .data attribute when requested would call a
>"get method" that returns a buffer.  This allows user defined objects which
>do not implement the PyBufferProcs protocol themselves, but which contain a
>buffer inside of them to participate in the "ndarray protocol".  Both
>version_two and version_three do not allow this - the object being passed
>must *be* a buffer.
>  
>
I am not at all against the ndarray protocol you describe.   In fact, 
I'm quite a fan.  I think we should start doing it now.   I was just 
wondering if adding attributes to the bytes object was useful in any 
case.  Your arguments have persuaded me that it is not worth the trouble.

Underscore names are a good idea.   We already have __array__, which is 
a protocol for returning an array object.

Numeric3 already implements this protocol, apart from naming 
differences.  So, let's come up with names.  I'm happy with 
__array__XXXXX  type names, as they dovetail nicely with the already 
established __array__  name, which Numeric3 expects will return an actual 
array object.

As I've already said, it would be easy to check for the more specialized 
attributes at object creation time to boot-strap an array from an 
arbitrary object.  

In addition to what you state, why not also have the protocol look at 
the object itself to see whether it exposes the PyBufferProcs protocol 
when it doesn't expose a .__array__data method?
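
The lookup order discussed above can be sketched roughly as follows.  
This is only an illustration under stated assumptions: the function name 
find_array_source is hypothetical, the attribute names are borrowed from 
Scott's example (the final names are still undecided), and Python's 
memoryview stands in for "the object supports the buffer protocol".

```python
def find_array_source(obj):
    """Return (kind, payload) describing how an array could be built."""
    # 1. Established hook: the object produces an array itself.
    if hasattr(obj, "__array__"):
        return ("array", obj.__array__())
    # 2. Attribute protocol: the object describes a buffer it contains.
    if hasattr(obj, "__array_data__"):
        return ("attributes", {
            "shape": obj.__array_shape__,
            "itemtype": obj.__array_itemtype__,
            "data": obj.__array_data__,
        })
    # 3. Fallback: the object itself is the buffer.
    try:
        return ("buffer", memoryview(obj))
    except TypeError:
        raise TypeError("object offers no array interface and no buffer")


class Wrapped(object):
    """Hypothetical third-party object that merely contains a buffer."""
    __array_shape__ = (2, 2)
    __array_itemtype__ = "B"
    __array_data__ = b"\x01\x02\x03\x04"


kind_attrs, info = find_array_source(Wrapped())
kind_buf, view = find_array_source(b"abc")
```

With this ordering an object that only contains a buffer and an object 
that is a buffer are both accepted, and the specialized attributes win 
when both are present.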

>The reference count on the PyObject pointer is different than the number of
>users using the memory.  In Python you could have:
>  
>
Your examples explaining this are good, but I did realize this.  That's 
why I stated that the check in arr.resize is overkill and will disallow 
situations that could actually work.

Do you think the Numeric3 arrayobject should have a "memory pointer 
count" added to the PyArrayObject structure?
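
To make the question concrete, here is a rough Python model of such a 
count.  It is a sketch of the idea only, not the PyArrayObject layout: 
the class names MemoryBlock and View and the users field are 
hypothetical.  The point is that the count tracks arrays pointing at the 
memory, which is independent of how many Python references exist to any 
one array object.

```python
class MemoryBlock(object):
    """Hypothetical owner of a resizable chunk of memory."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)
        self.users = 0  # number of views pointing into self.data

    def resize(self, nbytes):
        # Reallocation is only safe when no other view shares the memory.
        if self.users > 1:
            raise ValueError("memory is shared by %d users" % self.users)
        self.data = self.data[:nbytes].ljust(nbytes, b"\0")


class View(object):
    """Hypothetical array object that registers itself as a memory user."""

    def __init__(self, block):
        self.block = block
        block.users += 1


block = MemoryBlock(8)
a = View(block)
alias = a            # a second Python reference, but still one memory user
block.resize(16)     # allowed: only one view points at the memory
b = View(block)      # now a second view shares the same memory
```

After the last line, block.resize(...) raises ValueError, whereas a check 
based on the reference count of a would already have refused the first 
resize because of alias.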

>Please don't think I'm offering you resistance.  I'm only trying to point
>out some things that I think you might have overlooked.  Lots of people
>ignore my suggestions all the time.  You'd be in good company if you did
>too, and I wouldn't even hold a grudge against you.
>  
>
I very much appreciate the pointers.  I had overlooked some things and I 
believe your suggestions are better.

>    class Numarray:
>        #
>        # lots of array implementing code
>        #
>
>        # Down here at the end, add the "well-known" interface
>        # (I haven't embraced the @property decorator syntax yet.)
>
>        def __get_shape(self):
>            return self._shape
>        __array_shape__ = property(__get_shape)
>
>        def __get_data(self):
>            # Note that they use a different name internally
>            return self._buffer
>        __array_data__ = property(__get_data)
>
>        def __get_itemtype(self):
>            # Perform an on the fly conversion from the class
>            # hierarchy type to the struct module typecode that
>            # closest matches
>            return self._type._to_typecode()
>        __array_itemtype__ = property(__get_itemtype)
>
>
>Changing class Numarray to a PyBufferProcs supporting object would be
>harder.
>  
>
I think they just did this, though...

>The C version for Numeric3 arrays would be similar, and there is no wasted
>space on a per instance basis in either case.
>  
>
Doing this in C would be extremely easy:  a simple binding of a name to 
an already available function (and disallowing any attempt to set the 
attribute). 
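
For comparison, the same "bind a name to an existing getter and refuse 
assignment" behaviour in Python is just a property with no setter.  The 
class below is a hypothetical stand-in, not Numeric3 code, and it says 
nothing about how the C side would actually be written.

```python
class ArrayLike(object):
    """Hypothetical array type with an internal _shape field."""

    def __init__(self, shape):
        self._shape = shape

    def _get_shape(self):
        # The already available function.
        return self._shape

    # Getter only: reading works, assignment raises AttributeError.
    __array_shape__ = property(_get_shape)


arr = ArrayLike((3, 4))
shape = arr.__array_shape__
try:
    arr.__array_shape__ = (1,)
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
```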

>The real advantage to the struct module typecodes comes in two forms. 
>First and most important is that it's already documented and in place - a
>de facto standard.  Second is that Python script code could use those
>typecodes directly with the struct module to pull apart pieces of data. 
>The disadvantage is that a few new typecodes would be needed...
>  
>
>I would even go as far as to recommend their '>' '<' prefix codes for
>big-endian and little-endian for just this reason...
>  
>
Hmm..  an interesting idea.  I don't know if I agree or not.
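
For reference, this is how the struct module treats those prefixes 
today.  The calls are standard struct functions; the variable names and 
sample values are made up for the illustration.

```python
import struct

# The same two integers packed with explicit byte order.
raw_big = struct.pack(">2i", 1, 2)
raw_little = struct.pack("<2i", 1, 2)

# Reading with the matching prefix recovers the values.
values_big = struct.unpack(">2i", raw_big)

# Reading big-endian bytes as little-endian gives byte-swapped values.
misread = struct.unpack("<2i", raw_big)

# With an explicit prefix the item size is the standard size,
# independent of the platform.
item_size = struct.calcsize(">i")
```

An itemtype string that carries the prefix would therefore let script 
code hand the buffer straight to struct.unpack without knowing the byte 
order of the machine that produced it.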

>This would be wonderful.  Third party libraries could produce data that is
>sufficiently ndarray like without hassle, and users of that library could
>promote it to a Numeric3 array with no headaches.
>  
>
>By the way, it looks like the "bytes" concept has been revisited recently. 
>There is a new PEP dated Aug 11, 2004:
>
>    http://www.python.org/peps/pep-0332.html
>  
>
Thanks for the pointer.

>Thanks for your attention and patience with me on this.  I really
>appreciate the work you are doing.  I wish I could explain my understanding
>of things more clearly.
>  
>

As I said before, you do a really good job of explaining.   I'm pretty 
much on your side now :-)

Let's go ahead and get some __array__XXXXX  attribute names decided on.  
I'll put them in the Numeric3 code base.  I could also put them in old 
Numeric and make a 24.0 release as well; I need to do that because of a 
horrible bug in the new empty method:   Numeric.empty(<shape>, 'O'). 

-Travis





