PR Idea: Allowing multiple axis arguments (axis = tuple of ints) in the stats package?

Hello!

I use scipy.stats.sem a lot, and I would love it if it could take multiple axis arguments, as many NumPy functions can. Looking at the source code, sem and other functions in scipy.stats are already implemented mostly in terms of NumPy functions, so it seems it would only require changing some parts of the logic.

Taking stats.sem as an example, I think it would require changing n = a.shape[axis] to something like n = product(a.shape[ax] for ax in axis). For masked arrays, a.count(axis) is used, which already works with multiple axes.

Before I start work on a PR, I wanted to ask whether there is some reason this change would be considered a bad idea. Otherwise, if I write a decent PR with tests, benchmarks and documentation that updates all the functions in scipy.stats that might reasonably take multiple axis arguments, is it likely to be accepted?

Thanks,
Tom
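A minimal sketch of the change proposed above, written against plain NumPy rather than the actual scipy.stats source. The function name sem_multi_axis is hypothetical, and the sketch ignores masked arrays and nan_policy; it only illustrates computing n as the product of the reduced dimensions.

```python
import numpy as np


def sem_multi_axis(a, axis=0, ddof=1):
    """Standard error of the mean over one or more axes (illustrative only)."""
    a = np.asarray(a)
    if axis is None:
        # Mirror the usual convention: axis=None means "use the flattened array".
        a = a.ravel()
        axis = 0
    # Normalize to a tuple so int and tuple inputs share one code path.
    axes = (axis,) if np.isscalar(axis) else tuple(axis)
    # The number of observations is the product of the reduced dimensions.
    n = int(np.prod([a.shape[ax] for ax in axes]))
    # np.std accepts a tuple of ints for axis.
    return np.std(a, axis=axes, ddof=ddof) / np.sqrt(n)


a = np.arange(24.0).reshape(2, 3, 4)
over_two_axes = sem_multi_axis(a, axis=(0, 1))  # reduces axes 0 and 1 together
over_one_axis = sem_multi_axis(a, axis=0)       # single-axis behaviour is unchanged
```

Reducing over axes (0, 1) here should agree with merging those two axes by reshaping and then reducing over the single merged axis, which is an easy property to test.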

On Sat, Oct 3, 2020 at 2:02 PM Thomas Hodson <thomas.c.hodson@gmail.com> wrote:
> Hello!
>
> I use scipy.stats.sem a lot, and I would love it if it could take multiple axis arguments, as many NumPy functions can. Looking at the source code, sem and other functions in scipy.stats are already implemented mostly in terms of NumPy functions, so it seems it would only require changing some parts of the logic.
>
> Taking stats.sem as an example, I think it would require changing n = a.shape[axis] to something like n = product(a.shape[ax] for ax in axis). For masked arrays, a.count(axis) is used, which already works with multiple axes.
>
> Before I start work on a PR, I wanted to ask whether there is some reason this change would be considered a bad idea. Otherwise, if I write a decent PR with tests, benchmarks and documentation that updates all the functions in scipy.stats that might reasonably take multiple axis arguments, is it likely to be accepted?

Good question, thanks for asking. At first sight it does seem appealing, but it is also a bit worrying that we may end up with something less consistent. Right now in scipy.stats there are O(100) instances of the axis keyword, and they all work the same way (int or None), with the exception of gstd and iqr, which take multiple integers. NumPy, on the other hand, is far messier: whether a tuple of ints is accepted is less predictable, and it's also more common to use "axes" when a tuple of ints is accepted.

So I'd think it's fine if you indeed do it for all functions in scipy.stats, and if there are no unexpected complications in the implementation (e.g. if it makes functions with nan_policy or shape prediction much harder to get right).

Cheers,
Ralf

> Thanks,
> Tom
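
To make the point about NumPy's inconsistency concrete, here is a small illustration using three NumPy functions chosen as examples (they are not named in the thread): np.sum accepts a tuple for axis, np.cumsum is expected to reject one, and np.transpose spells the tuple-valued keyword "axes".

```python
import numpy as np

a = np.arange(24.0).reshape(2, 3, 4)

# Reductions such as np.sum accept a tuple of ints for `axis`.
total = np.sum(a, axis=(0, 1))

# Cumulative functions such as np.cumsum take a single integer axis only.
try:
    np.cumsum(a, axis=(0, 1))
    cumsum_accepts_tuple = True
except (TypeError, ValueError):
    cumsum_accepts_tuple = False

# Other functions use the plural keyword `axes` for a tuple of ints.
moved = np.transpose(a, axes=(2, 0, 1))
```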
Participants (2): Ralf Gommers, Thomas Hodson