Hierarchical vs non-hierarchical ndarray.base and __array_interface__

Hi all,
I posted on the change in semantics of ndarray.base here:
https://github.com/numpy/numpy/commit/6c0ad59#commitcomment-2153047
And some folks asked me to post my question to the numpy mailing list. I've implemented a tool for mapping processes in parallel applications to nodes in Cartesian networks. It uses hierarchies of numpy arrays to represent the domain decomposition of the application, as well as corresponding groups of processes on the network. You can "map" an application to the network using assignment through views. The tool is here if anyone is curious: https://github.com/tgamblin/rubik. I used numpy to implement this because I wanted to be able to do mappings for arbitrary-dimensional networks. Blue Gene/Q, for example, has a 5-D network.
I bring this up because I rely on the ndarray.base pointer and some of the semantics in __array_interface__ to translate indices within my hierarchy of views. For example, if a value is at (0,0) in a view, I want to know that it's actually at (4,4) in its immediate parent array.
After looking over the commit I linked to above, I realized I'm actually relying on a lot of stuff that's not guaranteed by numpy. I rely on .base pointing to its closest parent, and I rely on __array_interface__ exposing the address of the array's memory (in its data entry) and its strides. None of these is guaranteed by the API docs:
http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
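To make the index translation concrete, here is a minimal sketch of the kind of computation described above. It is not the code from rubik: the function name view_to_parent_index is hypothetical, and it assumes the parent has positive strides and that the view was produced by basic slicing of that parent. It reads the address from __array_interface__['data'] and takes strides from the .strides attribute, because the interface's strides entry can be None for C-contiguous arrays.

```python
import numpy as np


def view_to_parent_index(view, parent, index):
    """Translate `index` in `view` to the matching index in `parent`.

    Hypothetical helper. Assumes `view` shares memory with `parent`,
    and that `parent` has positive strides (e.g. it is C-contiguous
    or a positive-step slice of a C-contiguous array).
    """
    view_addr = view.__array_interface__["data"][0]
    parent_addr = parent.__array_interface__["data"][0]

    # Byte offset of the element, measured from the parent's first element.
    offset = view_addr - parent_addr
    offset += sum(i * s for i, s in zip(index, view.strides))

    # Peel off one coordinate per axis, largest stride first.
    result = [0] * parent.ndim
    for axis in sorted(range(parent.ndim), key=lambda a: -parent.strides[a]):
        result[axis], offset = divmod(offset, parent.strides[axis])

    in_bounds = all(0 <= r < n for r, n in zip(result, parent.shape))
    if offset != 0 or not in_bounds:
        raise ValueError("element does not lie inside the parent array")
    return tuple(result)


parent = np.arange(64).reshape(8, 8)
view = parent[4:, 4:]
print(view_to_parent_index(view, parent, (0, 0)))  # (4, 4)
```

Because the helper only compares addresses and strides, it works against any ancestor passed as parent and does not need .base at all; .base only matters for discovering which array the immediate parent is.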
So I guess I have a few questions:
1. Is translating indices between base arrays and views something that would be useful to other people?
2. Is there some better way to do this than using ndarray.base and __array_interface__?
3. What's the numpy philosophy on this? Should views know about their parents or not? They obviously have to know a little bit about their memory, but whether or not they know how they were derived from their owning array is a different question. There was some discussion on the vagueness of .base here:
http://thread.gmane.org/gmane.comp.python.numeric.general/51688/focus=51703
But it doesn't look like you're deprecating .base in 1.7, only changing its behavior, which I tend to agree is worse than deprecating it.
After thinking about all this, I'm not sure what I would like to happen. I can see the value of not keeping extra references around within numpy, and my domain is pretty different from the ways that I imagine people use numpy. I wouldn't have to change my code much to make it work without .base, but I do rely on __array_interface__. If that doesn't include the address and strides, I think I'm screwed as far as translating indices goes.
Any suggestions?
Thanks! -Todd
______________________________________________________________________ Todd Gamblin, tgamblin@llnl.gov, http://people.llnl.gov/gamblin2 CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA

On Sat, Nov 24, 2012 at 8:34 PM, Gamblin, Todd gamblin2@llnl.gov wrote:
Hi all,
I posted on the change in semantics of ndarray.base here:
https://github.com/numpy/numpy/commit/6c0ad59#commitcomment-2153047
And some folks asked me to post my question to the numpy mailing list. I've implemented a tool for mapping processes in parallel applications to nodes in Cartesian networks. It uses hierarchies of numpy arrays to represent the domain decomposition of the application, as well as corresponding groups of processes on the network. You can "map" an application to the network using assignment through views. The tool is here if anyone is curious: https://github.com/tgamblin/rubik. I used numpy to implement this because I wanted to be able to do mappings for arbitrary-dimensional networks. Blue Gene/Q, for example, has a 5-D network.
I bring this up because I rely on the ndarray.base pointer and some of the semantics in __array_interface__ to translate indices within my hierarchy of views. For example, if a value is at (0,0) in a view, I want to know that it's actually at (4,4) in its immediate parent array.
After looking over the commit I linked to above, I realized I'm actually relying on a lot of stuff that's not guaranteed by numpy. I rely on .base pointing to its closest parent, and I rely on __array_interface__ exposing the address of the array's memory (in its data entry) and its strides. None of these is guaranteed by the API docs:
http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
Are you saying that data/strides aren't guaranteed because they're marked optional on that page? My interpretation of "optional" is that these fields don't have to be present for all objects implementing something that qualifies as an array interface (for example ndarrays don't need a mask), but it does not mean that everything marked optional can be changed without warning for ndarrays.
So I guess I have a few questions:
- Is translating indices between base arrays and views something that would be useful to other people?
- Is there some better way to do this than using ndarray.base and __array_interface__?
- What's the numpy philosophy on this? Should views know about their parents or not? They obviously have to know a little bit about their memory, but whether or not they know how they were derived from their owning array is a different question. There was some discussion on the vagueness of .base here:
http://thread.gmane.org/gmane.comp.python.numeric.general/51688/focus=51703
But it doesn't look like you're deprecating .base in 1.7, only changing its behavior, which I tend to agree is worse than deprecating it.
After thinking about all this, I'm not sure what I would like to happen. I can see the value of not keeping extra references around within numpy, and my domain is pretty different from the ways that I imagine people use numpy. I wouldn't have to change my code much to make it work without .base, but I do rely on __array_interface__. If that doesn't include the address and strides, I think I'm screwed as far as translating indices goes.
Any suggestions?
The discussion on .base seems to have converged. As for __array_interface__, you could write a test which captures the essence of your use of ndarray.__array_interface__ and send a PR for it.
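As a rough illustration of what such a test might look like, here is a sketch that pins down only the behaviour mentioned in this thread: that a view's data address is offset from its parent's by the expected number of bytes, and that a non-contiguous view reports its strides. The test name is made up, and this is not taken from the numpy test suite.

```python
import numpy as np


def test_array_interface_address_and_strides():
    # Hypothetical test name; the checks mirror the use case in this thread.
    parent = np.zeros((8, 8), dtype=np.int64)
    view = parent[4:, 4:]

    p = parent.__array_interface__
    v = view.__array_interface__

    # 'data' is a (address, read_only) pair; the view starts 4 rows and
    # 4 columns into the parent's buffer.
    expected_offset = 4 * parent.strides[0] + 4 * parent.strides[1]
    assert v["data"][0] - p["data"][0] == expected_offset

    # A non-contiguous view reports explicit strides in the interface.
    assert v["strides"] == view.strides
    assert v["shape"] == (4, 4)


test_array_interface_address_and_strides()
```

A test along these lines would fail loudly if a future release stopped exposing the address or strides for ndarrays, which is the guarantee in question.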
Ralf
participants (2): Gamblin, Todd; Ralf Gommers