Mailman 3 Proposal of new function: iteraxis() - NumPy-Discussion

Proposal of new function: iteraxis()

andrew giessel

April 24, 2013

9:37 p.m.

Hello all-

A while back I emailed the list about a function for the numpy namespace, iteraxis(), which allows you to generalize the default iteration behavior of numpy arrays over any axis.

I've implemented this function more cleanly, and the pull request, which includes passing tests and documentation, is here: https://github.com/numpy/numpy/pull/3262

This is very simple code, which uses np.rollaxis() to bring the desired dimension to the front, and then allows you to loop over slices in this re-structured view of the array. While little more than an alias, I feel this is a very useful function because looping over iterators is a core pattern in Python, and it makes working with slices of any multidimensional array very pythonic. Adding this function makes this more visible for users, new and old, and I hope members of this list will agree it is worth adding to the namespace.

Generalizing this to iterate over multiple axes is something that might be worthwhile, but the specifics of how to implement the axis ordering would take some thought. I'm happy to discuss and tackle this if people are really interested.

Hoping for some nice feedback,

ag

--
Andrew Giessel, PhD
Department of Neurobiology, Harvard Medical School
220 Longwood Ave Boston, MA 02115
ph: 617.432.7971
email: andrew_giessel@hms.harvard.edu
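For readers following along, the pattern described in this message can be sketched roughly as below: roll the chosen axis to the front with np.rollaxis(), then loop over the result. The function body, the `axis=0` default, and the sample array are assumptions made for illustration; they are not copied from the pull request.

```python
import numpy as np


def iteraxis(a, axis=0):
    """Yield slices of `a` taken along `axis` (illustrative sketch only)."""
    a = np.asarray(a)
    # rollaxis returns a view with `axis` moved to position 0; no data is copied.
    rolled = np.rollaxis(a, axis)
    # Default array iteration then walks the (new) first axis.
    for sub in rolled:
        yield sub


a = np.zeros((2, 3, 4))
shapes = [s.shape for s in iteraxis(a, axis=2)]       # four slices of shape (2, 3)
shapes_neg = [s.shape for s in iteraxis(a, axis=-1)]  # negative axes work the same way
```

Written this way, the helper is a thin wrapper: everything it does is already available from `for sub in np.rollaxis(a, axis)`.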

Robert Kern

April 2013

5:14 p.m.

On Wed, Apr 24, 2013 at 10:37 PM, andrew giessel <andrew.giessel@gmail.com> wrote:

...

Hello all-

A while back I emailed the list about a function for the numpy namespace, iteraxis(), which allows you to generalize the default iteration behavior of numpy arrays over any axis.

I've implemented this function more cleanly and the pull request is here: https://github.com/numpy/numpy/pull/3262, and includes passing tests and documentation.

This is very simple code, which uses np.rollaxis() to bring the desired dimension to the front, and then allows you to loop over slices in this re-structured view of the array. While little more than an alias, I feel this is a very useful function because looping over iterators is a core pattern in python, and makes working with slices of any multidimensional array very pythonic. Adding this function makes this more visible for users, new and old, and I hope members of this list will agree it is worth adding to the namespace.

I'm afraid I don't. It's just a reduced-functionality version of rollaxis(). I don't think the additional name adds anything substantial.

-- Robert Kern

Matthew Brett

5:30 p.m.

Hi, On Thu, Apr 25, 2013 at 10:14 AM, Robert Kern <robert.kern@gmail.com> wrote:

...

I'm afraid I don't. It's just a reduced-functionality version of rollaxis(). I don't think the additional name adds anything substantial.

There's a little more on this in the pull request discussion for those of y'all that are interested.

So the decision has to be based on some estimate of:

1) Cost for adding a new function to the namespace
2) Benefit: some combination of:
   Likelihood of needing to iterate over arbitrary axis.
   Likelihood of not finding rollaxis / transpose as a solution to this.
   Increased likelihood of finding iteraxis in this situation.

As a data point - Gael pointed me to rollaxis back in the day, I didn't find it myself, and although it was completely obvious in retrospect, it had not previously occurred to me to use transposing for this task.

Cheers,

Matthew
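To make the "rollaxis / transpose" remark concrete, here is a small sketch showing that both give the same reordered view and that looping over it yields slices along the chosen axis. The array contents and variable names are made up for the example.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Two ways of bringing axis 1 to the front.
by_transpose = a.transpose(1, 0, 2)
by_rollaxis = np.rollaxis(a, 1)

same_view = np.array_equal(by_transpose, by_rollaxis)

# Iterating over the reordered array gives a[:, i, :] for each i.
slices_match = all(
    np.array_equal(s, a[:, i, :]) for i, s in enumerate(by_transpose)
)
```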

Robert Kern

5:42 p.m.

On Thu, Apr 25, 2013 at 6:30 PM, Matthew Brett <matthew.brett@gmail.com> wrote:

...

Hi,

There's a little more on this in the pull request discussion for those of y'all that are interested.

So the decision has to be based on some estimate of:

1) Cost for adding a new function to the namespace
2) Benefit: some combination of:
   Likelihood of needing to iterate over arbitrary axis.
   Likelihood of not finding rollaxis / transpose as a solution to this.
   Increased likelihood of finding iteraxis in this situation.

3) Comparison with other solutions that might obtain the same benefits without the attendant costs: i.e. additional documentation in any number of forms.

-- Robert Kern

Matthew Brett

5:54 p.m.

Hi, On Thu, Apr 25, 2013 at 10:42 AM, Robert Kern <robert.kern@gmail.com> wrote:

...

3) Comparison with other solutions that might obtain the same benefits without the attendant costs: i.e. additional documentation in any number of forms.

Right, good point. That would also need to be weighted with the likelihood that people will find and read that documentation. Cheers, Matthew

Robert Kern

7:10 p.m.

On Thu, Apr 25, 2013 at 6:54 PM, Matthew Brett <matthew.brett@gmail.com> wrote:

...

Hi,

Right, good point. That would also need to be weighted with the likelihood that people will find and read that documentation.

In my opinion, duplicating functionality under different aliases just so people can supposedly find things without reading the documentation is not a viable strategy for building out an API.

My suggestion is to start building out a "How do I ...?" section to the User's Guide that answers small questions like this. "How do I iterate over an arbitrary axis of an array?" should be sufficiently discoverable. This is precisely the kind of problem that documentation solves better than anything else. This is what we write documentation for. Let's make use of it before trying something else. If we add such a section, and still see many people not finding it, then we can consider adding aliases.

-- Robert Kern

Andrew Giessel

7:21 p.m.

I respect this opinion. However (and maybe this is legacy), while reading through the numeric.py source file, I was surprised at how short many of the functions are, generally. Functions like ones() and zeros() are pretty simple wrappers which call empty() and then copy over values.

FWIW, I had used numpy for over two years before realizing that the default behavior of iterating on a numpy array was to return slices over the first axis (although, this makes sense because it makes a 1d array like a list), and I think it is generally left out of any tutorials or guides.

If nothing else I learned how to build the numpy source and how to make tests. And how to iterate over axes with np.rollaxis() ;)

Any other opinions from people that haven't commented on the PR thread already?

ag

On Thu, Apr 25, 2013 at 3:10 PM, Robert Kern <robert.kern@gmail.com> wrote:

...

In my opinion, duplicating functionality under different aliases just so people can supposedly find things without reading the documentation is not a viable strategy for building out an API.

My suggestion is to start building out a "How do I ...?" section to the User's Guide that answers small questions like this. "How do I iterate over an arbitrary axis of an array?" should be sufficiently discoverable. This is precisely the kind of problem that documentation solves better than anything else. This is what we write documentation for. Let's make use of it before trying something else. If we add such a section, and still see many people not finding it, then we can consider adding aliases.

-- Robert Kern
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Robert Kern

7:40 p.m.

On Thu, Apr 25, 2013 at 8:21 PM, Andrew Giessel <andrew_giessel@hms.harvard.edu> wrote:

...

I respect this opinion. However (and maybe this is legacy), while reading through the numeric.py source file, I was surprised at how short many of the functions are, generally. Functions like ones() and zeros() are pretty simple wrappers which call empty() and then copy over values.

Many of these are short, but they do tend to do at least two things that someone would otherwise have to do. This really isn't the case for iteraxis() and rollaxis(). One can use rollaxis() pretty much everywhere you would use iteraxis(), but not vice-versa.

...

FWIW, I had used numpy for over two years before realizing that the default behavior of iterating on a numpy array was to return slices over the first axis (although, this makes sense because it makes a 1d array like a list), and I think it is generally left out of any tutorials or guides.

Then let's add it.

-- Robert Kern
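As a rough illustration of the default behavior discussed in this exchange (iterating over an array yields slices along the first axis, so a 1-d array behaves much like a list), a short example with invented values could look like this:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Plain iteration walks the first axis: a[0], then a[1].
rows = [row.tolist() for row in a]

# Transposing first makes the same loop walk the last axis instead.
cols = [col.tolist() for col in a.T]

# A 1-d array iterates element by element, like a list.
one_d = [int(x) for x in np.arange(3)]
```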

josef.pktd＠gmail.com

7:51 p.m.

On Thu, Apr 25, 2013 at 3:40 PM, Robert Kern <robert.kern@gmail.com> wrote:

...

On Thu, Apr 25, 2013 at 8:21 PM, Andrew Giessel <andrew_giessel@hms.harvard.edu> wrote:

...
I respect this opinion. However (and maybe this is legacy), while reading through the numeric.py source file, I was surprised at how short many of the functions are, generally. Functions like ones() and zeros() are pretty simple wrappers which call empty() and then copy over values.

Many of these are short, but they do tend to do at least two things that someone would otherwise have to do. This really isn't the case for iteraxis() and rollaxis(). One can use rollaxis() pretty much everywhere you would use iteraxis(), but not vice-versa.

...
FWIW, I had used numpy for over two years before realizing that the default behavior of iterating on a numpy array was to return slices over the first axis (although, this makes sense because it makes a 1d array like a list), and I think it is generally left out of any tutorials or guides.

That definitely sounds like a documentation problem. I often rely on the fact that an array is a Python iterator over its first dimension, so it can be used with *args and tuple unpacking. (I didn't need it with anything other than axis=0 or axis=-1, for matplotlib IIRC.)

I never used rollaxis, but I have seen it a lot when I was still reading the nipy source.

In general, I think that there are already too many aliases in numpy, or functions where it's not really clear whether they are aliases or something slightly different.

It took me more than a year to remember what `expand_dims` is called (I always tried add_axis), until I bookmarked it for a while.

Josef
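A brief sketch of the idioms mentioned here, under the assumption of small made-up arrays: tuple unpacking and *args both consume the first axis, and `np.expand_dims` adds a length-1 axis. The `span` helper is hypothetical and exists only to show *args unpacking.

```python
import numpy as np

xy = np.array([[0.0, 1.0, 2.0],
               [5.0, 6.0, 7.0]])

# Tuple unpacking consumes the first axis: one row per name.
x, y = xy


def span(lo, hi):
    # Hypothetical helper, only here to demonstrate *args unpacking.
    return hi - lo


width = span(*np.array([2.0, 5.0]))

# For points stored with shape (n, 2), transpose before unpacking columns.
pts = xy.T
px, py = pts.T

# expand_dims inserts a new axis of length 1 at the given position.
col = np.expand_dims(x, axis=1)
```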

...

Then let's add it.

-- Robert Kern

Charles R Harris

8:04 p.m.

On Thu, Apr 25, 2013 at 1:51 PM, <josef.pktd@gmail.com> wrote:

...

That definitely sounds like a documentation problem. I'm using often that it's a python iterator in the first dimension, and can be used with *args and tuple unpacking. (I didn't need it with anything else than axis=0 or axis=-1 for matplotlib IIRC)

I never used rollaxis, but I have seen it a lot when I was still reading the nipy source.

In general, I think that there are already too many aliases in numpy, or function whether it's not really clear if they are aliases or something slightly different.

It took me more than a year to remember what `expand_dims` is called, (I always tried, add_axis) until I bookmarked it for a while.

After thinking about it, I'm in favor of this small function. Rollaxis takes a bit of thought and documentation reading to figure out how to use, whereas this function covers a common use with an easy-to-understand API. I'm not completely satisfied with the name; it isn't as memorable as I'd like, but that is a small quibble.

Chuck
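For context on why rollaxis() "takes a bit of thought", the sketch below shows how its optional `start` argument decides where the rolled axis lands. The array shape is an arbitrary choice for the example.

```python
import numpy as np

a = np.ones((3, 4, 5, 6))

# With the default start=0, the chosen axis moves to the front.
front = np.rollaxis(a, 2).shape

# With start=1, axis 3 is rolled back until it sits in position 1.
middle = np.rollaxis(a, 3, 1).shape

# With start=a.ndim, axis 1 ends up last.
back = np.rollaxis(a, 1, 4).shape
```

Only the first form is needed for iterating over an axis; the other two show the extra behavior that a narrower helper would hide.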

Sebastian Berg

8:38 p.m.

On Thu, 2013-04-25 at 14:04 -0600, Charles R Harris wrote:

...

After thinking about it, I'm in favor of this small function. Rollaxis takes a bit of thought and document reading to figure out how to use it, whereas this function covers a common use with an easy to understand API. I'm not completely satisfied with the name, it isn't as memorable as I'd like, but that is a small quibble.

What I am not quite happy with is that, if we want to keep it open to understanding multiple axes (and maybe also be a method of that same name) at some point, defaulting to flat iteration may be better. So from a future point of view, maybe it should have axes=None as default?

I.e. (oh, evil code!) the long term goal could be something like this (obviously it would be preferable and much faster in C...):

    def iteraxes(arr, axis=None, order='C'):
        view_shape = []
        view_strides = []
        op_axes = []
        count_axes = 0
        if axis is None:
            op_axes = range(arr.ndim)
        else:
            if not isinstance(axis, tuple):
                axis = {axis}
            else:
                axis = set(axis)  # ignores duplicates...
            for ax in range(arr.ndim):
                if ax in axis:
                    axis.remove(ax)
                    op_axes.append(ax)
                else:
                    view_shape.append(arr.shape[ax])
                    view_strides.append(arr.strides[ax])
            if len(axis) != 0:
                raise ValueError
        i = np.nditer(arr, op_axes=[op_axes], order=order)
        for s in i:
            view = np.lib.stride_tricks.as_strided(s, view_shape, view_strides)
            yield view

Robert Kern

9:42 a.m.

On Thu, Apr 25, 2013 at 9:04 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

...

After thinking about it, I'm in favor of this small function. Rollaxis takes a bit of thought and document reading to figure out how to use it, whereas this function covers a common use with an easy to understand API.

It seems to me that just an additional example in the rollaxis() docstring solves that problem:

    ``rollaxis()`` can be used to iterate over a given axis of a
    multidimensional array:

    >>> for x in np.rollaxis(a, 2):
    ...     print(x.shape)
    ...
    (3, 4, 6)
    (3, 4, 6)
    (3, 4, 6)
    (3, 4, 6)
    (3, 4, 6)

-- Robert Kern

Andrew Giessel

11:26 a.m.

I agree with Charles that rollaxis() isn't immediately intuitive.

It seems to me that documentation like this doesn't belong in rollaxis() but instead wherever people talk about indexing and/or iterating over an array. Nothing about the iteration depends on rollaxis(), rollaxis is just giving you a different view of the array to call __getitem__() on, if I understand correctly.

I'm counting 2 for (Me, Charles), 2 against (Robert, Josef), and two or three neutral parties (based on interest/comments: Matthew, Sebastian, and Phil Elson (who commented on the PR)).

I don't know how to best proceed from here.

Best,

Andrew

On Fri, Apr 26, 2013 at 5:42 AM, Robert Kern <robert.kern@gmail.com> wrote:

...

It seems to me that just an additional example in the rollaxis() docstring solves that problem:

``rollaxis()`` can be used to iterate over a given axis of a multidimensional array:

    >>> for x in np.rollaxis(a, 2):
    ...     print(x.shape)
    ...
    (3, 4, 6)
    (3, 4, 6)
    (3, 4, 6)
    (3, 4, 6)
    (3, 4, 6)

-- Robert Kern

Robert Kern

11:33 a.m.

On Fri, Apr 26, 2013 at 12:26 PM, Andrew Giessel <andrew_giessel@hms.harvard.edu> wrote:

...

I agree with Charles that rollaxis() isn't immediately intuitive.

It seems to me that documentation like this doesn't belong in rollaxis() but instead wherever people talk about indexing and/or iterating over an array. Nothing about the iteration depends on rollaxis(), rollaxis is just giving you a different view of the array to call __getitem__() on, if I understand correctly.

Docstrings are perfect places to briefly describe and demonstrate common use cases for a function. There is no problem with including the example that I wrote in the rollaxis() docstring.

In any case, whether you put the documentation in the rollaxis() docstring or in one of the indexing/iteration sections, or (preferably) both, I strongly encourage you to do that first and see how it goes before adding a new alias.

-- Robert Kern

Jason Grout

1:37 p.m.

On 4/26/13 6:33 AM, Robert Kern wrote:

...

In any case, whether you put the documentation in the rollaxis() docstring or in one of the indexing/iteration sections, or (preferably) both, I strongly encourage you to do that first and see how it goes before adding a new alias.

+1 (for what it's worth) to being conservative with API changes as a first resort. Jason

josef.pktd＠gmail.com

1:52 p.m.

the "new" documentation

http://stackoverflow.com/questions/1589706/iterating-over-arbitrary-dimensio...

second answer; the 1st answer is what I usually use

search term: "[numpy] iterate over axis"

Josef

On Fri, Apr 26, 2013 at 9:37 AM, Jason Grout <jason-sage@creativetrax.com> wrote:

...

+1 (for what it's worth) to being conservative with API changes as a first resort.

Jason

Phil Elson

1:56 p.m.

I didn't find the rollaxis solution particularly obvious and also had to think about what rollaxis did before understanding its usefulness for iteration. Now that I've understood it, I'm +1 for the statement that, as it stands, the proposed iteraxis method doesn't add enough to warrant its inclusion.

That said, I do think array iteration could be made simpler (or the function I've missed better documented!). I've put together an implementation of a "slices" function which can return subsets of an array based on the axes provided (a generalisation of iteraxis but implemented slightly differently):

    def slices(a, axes=-1):
        indices = np.repeat(slice(None), a.ndim)

        # turn axes into a 1d array of axes indices
        axes = np.array(axes).flatten()

        # each comparison needs its own parentheses, since | binds tighter than < and >
        bad_indices = (axes < -a.ndim) | (axes > (a.ndim - 1))
        if np.any(bad_indices):
            raise ValueError('The axis index/indices were out of range.')

        # Turn negative indices into real indices
        axes[axes < 0] = a.ndim + axes[axes < 0]

        if np.unique(axes).shape != axes.shape:
            raise ValueError('Repeated axis indices were given.')

        indexing_shape = np.array(a.shape)[axes]

        for ind in np.ndindex(*indexing_shape):
            indices[axes] = ind
            yield a[tuple(indices)]

This can be used simply with:

    >>> a = np.ones([2, 3, 4, 5])
    >>> for s in slices(a, 2):
    ...     print(s.shape)
    ...
    (2, 3, 5)
    (2, 3, 5)
    (2, 3, 5)
    (2, 3, 5)

Or with the slightly more complex:

    >>> len(list(slices(a, [2, -1])))
    20

Without focusing on my actual implementation, would this kind of interface be more desirable?

Cheers,

On 26 April 2013 12:33, Robert Kern <robert.kern@gmail.com> wrote:

...

Docstrings are perfect places to briefly describe and demonstrate common use cases for a function. There is no problem with including the example that I wrote in the rollaxis() docstring.

In any case, whether you put the documentation in the rollaxis() docstring or in one of the indexing/iteration sections, or (preferably) both, I strongly encourage you to do that first and see how it goes before adding a new alias.

-- Robert Kern

Andrew Giessel

5:02 p.m.

I like this, thank you Phil.

From what I can see, the ordering of the returned slices when you use more than one axis (i.e. slices(a, [1, 2])) increments the last axis fastest. Does this make sense based on the default ordering of, say, nditer()? I know that C-order (row major) and Fortran order (column major) are two ways of ordering the returned values; which does this default to? Is there a default across numpy?

best,

On Fri, Apr 26, 2013 at 9:56 AM, Phil Elson <pelson.pub@gmail.com> wrote:

...

I didn't find the rollaxis solution particularly obvious and also had to think about what rollaxis did before understanding its usefulness for iteration. Now that I've understood it, I'm +1 for the statement that, as it stands, the proposed iteraxis method doesn't add enough to warrant its inclusion.

That said, I do think array iteration could be made simpler (or the function I've missed better documented!). I've put together an implementation of a "slices" function which can return subsets of an array based on the axes provided (a generalisation of iteraxis but implemented slightly differently):

    def slices(a, axes=-1):
        indices = np.repeat(slice(None), a.ndim)

        # turn axes into a 1d array of axes indices
        axes = np.array(axes).flatten()

        # each comparison needs its own parentheses, since | binds tighter than < and >
        bad_indices = (axes < -a.ndim) | (axes > (a.ndim - 1))
        if np.any(bad_indices):
            raise ValueError('The axis index/indices were out of range.')

        # Turn negative indices into real indices
        axes[axes < 0] = a.ndim + axes[axes < 0]

        if np.unique(axes).shape != axes.shape:
            raise ValueError('Repeated axis indices were given.')

        indexing_shape = np.array(a.shape)[axes]

        for ind in np.ndindex(*indexing_shape):
            indices[axes] = ind
            yield a[tuple(indices)]

This can be used simply with:

    >>> a = np.ones([2, 3, 4, 5])
    >>> for s in slices(a, 2):
    ...     print(s.shape)
    ...
    (2, 3, 5)
    (2, 3, 5)
    (2, 3, 5)
    (2, 3, 5)

Or with the slightly more complex:

    >>> len(list(slices(a, [2, -1])))
    20

Without focusing on my actual implementation, would this kind of interface be more desirable?

Cheers,

Matthew Brett

8:45 p.m.

Hi,

On Fri, Apr 26, 2013 at 10:02 AM, Andrew Giessel <andrew_giessel@hms.harvard.edu> wrote:

...

I like this, thank you Phil.

From what I can see, the ordering of the returned slices when you use more than one axis (i.e. slices(a, [1, 2])) increments the last axis fastest. Does this make sense based on the default ordering of, say, nditer()? I know that C order (row major) and Fortran order (column major) are two ways of ordering the returned values. Which does this default to? Is there a default across numpy?

There was a thread on the distinction between index ordering and memory layout starting here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg40956.html

The answer is that C-like index ordering (last index changing fastest) is the default across numpy, and that you can typically (always?) change this to Fortran-like ordering (first index changing fastest) with an 'order' keyword to the function or method.

Cheers,

Matthew
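A small illustration of the two index orderings, using ravel() as one example of a method that takes the 'order' keyword. The array and variable names are made up for the example.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# C-like: the last index changes fastest (the default).
c_like = a.ravel(order='C')

# Fortran-like: the first index changes fastest.
f_like = a.ravel(order='F')

# np.ndindex walks an index grid C-style as well.
grid = list(np.ndindex(2, 2))
```

Here c_like reads the array row by row, f_like reads it column by column, and grid lists the index tuples with the last index changing fastest.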

Gael Varoquaux

8:50 p.m.

On Thu, Apr 25, 2013 at 08:10:32PM +0100, Robert Kern wrote:

...

In my opinion, duplicating functionality under different aliases just so people can supposedly find things without reading the documentation is not a viable strategy for building out an API.

+1. It's been my experience over and over again. Richer APIs are actually less used and known than simple APIs. People don't find their way around them.

...

My suggestion is to start building out a "How do I ...?" section to the User's Guide that answers small questions like this. "How do I iterate over an arbitrary axis of an array?" should be sufficiently discoverable.

Indeed, I agree that this is a documentation problem. It does not make it a simple problem. G
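As a sketch of what such a "How do I iterate over an arbitrary axis of an array?" entry might show, using the np.rollaxis() approach discussed in this thread (the array and variable names are illustrative only):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Plain iteration walks axis 0.
default_shapes = [s.shape for s in a]

# Bring axis 2 to the front, then iterate as usual.
rolled = np.rollaxis(a, 2)
axis2_shapes = [s.shape for s in rolled]

# The rolled array shares memory with the original: no data is copied.
is_view = np.shares_memory(a, rolled)

# The first item of the rolled array is the a[:, :, 0] slice.
first_matches = np.array_equal(rolled[0], a[:, :, 0])
```

In later NumPy releases, np.moveaxis(a, 2, 0) gives the same view and may read more clearly.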

Andrew Giessel

6:10 p.m.

Matthew: Thanks for the link to the array order discussion.

Any more thoughts on Phil's slices() function?

On Fri, Apr 26, 2013 at 4:50 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:

...

On Thu, Apr 25, 2013 at 08:10:32PM +0100, Robert Kern wrote:

...
In my opinion, duplicating functionality under different aliases just so people can supposedly find things without reading the documentation is not a viable strategy for building out an API.

+1. It's been my experience over and over again. Richer APIs are actually less used and known than simple APIs. People don't find their way around them.

...
My suggestion is to start building out a "How do I ...?" section to the User's Guide that answers small questions like this. "How do I iterate over an arbitrary axis of an array?" should be sufficiently discoverable.

Indeed, I agree that this is a documentation problem. It does not make it a simple problem.

G


Benjamin Root

May 2013

5:50 p.m.

On Mon, Apr 29, 2013 at 2:10 PM, Andrew Giessel <andrew_giessel@hms.harvard.edu> wrote:

...

Matthew: Thanks for the link to array order discussion.

Any more thoughts on Phil's slices() function?

I rather like Phil's solution. Just some caveats: will it always return views, or copies? It should be one or the other (I haven't looked closely enough to check), and it should be documented to that effect. Plus, tests should be added to make sure it does that.

Cheers!
Ben Root
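A quick way to check the view-or-copy question for any indexing expression is np.shares_memory(). The sketch below only demonstrates the general NumPy rule (an index made of integers and slices is basic indexing and returns a view; a list or array in the index is advanced indexing and returns a copy). Whether Phil's function stays on the basic-indexing path has not been checked here.

```python
import numpy as np

a = np.zeros((2, 3, 4))

# Integers and slices only: basic indexing, which returns a view.
view = a[(slice(None), slice(None), 1)]
view[...] = 7.0            # writes through to `a`

# A list in the index triggers advanced indexing, which returns a copy.
copy = a[:, :, [1]]
copy[...] = -1.0           # leaves `a` untouched

view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, copy)
```

A test along these lines (assert shared memory, then assert that a write shows up in the parent array) would pin the behaviour down either way.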
