
On Fri, Oct 17, 2014 at 10:56 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Yesterday I created a GitHub issue proposing adding an axis argument to numpy's gufuncs: https://github.com/numpy/numpy/issues/5197
I was told I should repost this on the mailing list, so here's the recap:
I would like to write generalized ufuncs (probably using numba), to create fast functions such as nanmean (signature '(n)->()') or rolling_mean (signature '(n),()->(n)') that take the axis along which to aggregate as a keyword argument, e.g., nanmean(x, axis=0) or rolling_mean(x, window=5, axis=0).
Of course, I could write my own wrapper for this that reorders dimensions using swapaxes or transpose. But I also think that an "axis" argument to allow for specifying the core dimensions of gufuncs would be more generally useful, and we should consider adding it to numpy.
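The wrapper approach mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: with_axis and _nanmean_last are hypothetical names, np.nanmean stands in for a compiled gufunc with signature '(n)->()', and np.moveaxis (available in NumPy 1.11 and later) is used instead of swapaxes so that the remaining dimensions keep their original order.

```python
import numpy as np


def with_axis(core_func):
    """Give a last-axis reduction an ``axis`` keyword (hypothetical helper)."""
    def wrapped(x, axis=-1, **kwargs):
        x = np.asarray(x)
        # Move the requested axis to the end, where the core function
        # expects its single core dimension; other axes keep their order.
        moved = np.moveaxis(x, axis, -1)
        return core_func(moved, **kwargs)
    return wrapped


def _nanmean_last(x):
    # Stand-in for a compiled gufunc with signature '(n)->()'.
    return np.nanmean(x, axis=-1)


nanmean = with_axis(_nanmean_last)

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[0, 1, 2] = np.nan
result = nanmean(x, axis=1)  # reduces the middle axis, leaving shape (2, 4)
```

A function like rolling_mean, whose output keeps the core dimension, would additionally need to move that axis back to its original position in the result.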
Nathaniel and Jaime added some good points, noting that such an axis argument should cleanly handle multiple input and output arguments and have a plan for handling optional dimensions (e.g., (m?,n),(n,p?)->(m?,p?) for the new dot).
Here are my initial thoughts on the syntax:
(1) Generally speaking, I think the "nested tuple" syntax (e.g., axis=[(0, 1), (2, 3)]) would be most congruous with the axis arguments numpy already supports.
(2) For gufuncs with simpler signatures, we should support supplying an integer or an unnested tuple, e.g.,
- axis=0 for (n)->()
- axis=(0, 1) for (n),(m)->() or (n,m)->()
- axis=[(0, 1), 2] for (n,m),(o)->().
(3) If we require a full axis specification for core dimensions, we could use the axis argument for unambiguous control of optional core dimensions: e.g., axis=(0, 1) would indicate that you want the "vectorized inner product" version of the new dot operator, rather than matrix multiplication, and axis=[(-2, -1), -1] would mean that you want the "vectorized matrix-vector product". This seems relatively tidy, although I admit I am not convinced that optional core dimensions are necessary.
(4) We can either include the output axis as part of the signature, or add another argument "axis_out" or "out_axis". I think I prefer the separate argument, particularly if we require "axis" to specify all core dimensions, which may be a good idea even if we don't use "axis" for controlling optional core dimensions.
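To make the shorthand in points (1) and (2) concrete, here is one possible way to expand an integer or unnested tuple into the fully nested form. It is a sketch only: normalize_axis and its core_ndims parameter are hypothetical and not a NumPy API, and it assumes that the nested form carries one entry per input argument, each listing the axes that hold that argument's core dimensions.

```python
def normalize_axis(axis, core_ndims):
    """Expand a shorthand axis spec to one tuple of ints per input.

    ``core_ndims`` gives the number of core dimensions of each input,
    e.g. [2, 1] for the signature '(n,m),(o)->()'.
    Hypothetical helper for illustration; not part of NumPy.
    """
    if isinstance(axis, int):
        axis = (axis,)
    if isinstance(axis, tuple):
        # Unnested tuple: hand the axes out to the inputs in order.
        if len(axis) != sum(core_ndims):
            raise ValueError("wrong number of axes for this signature")
        nested, start = [], 0
        for n in core_ndims:
            nested.append(tuple(axis[start:start + n]))
            start += n
        return nested
    # Nested form: one entry per input, each an int or a tuple of ints.
    if len(axis) != len(core_ndims):
        raise ValueError("need one axis entry per input argument")
    nested = []
    for entry, n in zip(axis, core_ndims):
        entry = (entry,) if isinstance(entry, int) else tuple(entry)
        if len(entry) != n:
            raise ValueError("axis entry does not match core dimensions")
        nested.append(entry)
    return nested


single = normalize_axis(0, [1])              # (n)->()
pair = normalize_axis((0, 1), [1, 1])        # (n),(m)->()
matrix = normalize_axis((0, 1), [2])         # (n,m)->()
mixed = normalize_axis([(0, 1), 2], [2, 1])  # (n,m),(o)->()
```

Under this reading, the same unnested tuple (0, 1) means different things for '(n),(m)->()' and '(n,m)->()', so the signature has to be known before the shorthand can be interpreted.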
Might want to contact Continuum Analytics also. They recently created a gufunc <https://github.com/ContinuumIO/blaze> repository.

Chuck