It’s nice that this is pure Python, vectorized with NumPy, whereas generic_filter needs a compiled kernel to get good performance. (Tim, although your implementation is nice and readable, it would have been very slow on volumes of any significant size.)
However, my feeling is that this function is too specialized for a foundational package like NumPy. As Sebastian Berg pointed out on one of the PRs, having many ways of achieving the same outcome can cause confusion. IMHO, the One Way to do this kind of operation is to use generic_filter together with LowLevelCallable. I wrote two blog posts on the topic:
https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/
https://ilovesymposia.com/2017/03/15/prettier-lowlevelcallables-with-numba-jit-and-decorators/
The generic_filter approach has the advantage of being even more general. (In fact, it avoids the repeated-applications-vs-diagonal-application argument altogether. These are simply two different kernels.)
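As a rough illustration of the "different kernels" point, here is a minimal sketch, not the implementation discussed in the PRs. The helper name local_range and the 5x5 test image are made up for the example, and the two footprints (a full 3x3 square versus a cross without diagonals) are only my assumption about what two such kernels might look like. It uses a plain Python callable for readability, which is slow; for real data one would swap in a compiled function wrapped in a LowLevelCallable, as described in the blog posts.

```python
import numpy as np
from scipy import ndimage as ndi

# Hypothetical test image, just for illustration.
image = np.arange(25, dtype=float).reshape(5, 5)


def local_range(values):
    # generic_filter passes the neighborhood selected by the footprint
    # as a flat 1D array of floats.
    return values.max() - values.min()


# Kernel 1: full 3x3 neighborhood, diagonals included.
full = ndi.generic_filter(image, local_range, size=3, mode="nearest")

# Kernel 2: cross-shaped neighborhood, no diagonals.
cross_footprint = ndi.generate_binary_structure(2, 1)
cross = ndi.generic_filter(
    image, local_range, footprint=cross_footprint, mode="nearest"
)

print(full[2, 2], cross[2, 2])
```

The same call handles both cases; only the footprint (or the function) changes, so no extra API is needed to choose between them.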
Perhaps ndimage is hard to discover for people in other fields, but I think that is better solved with documentation than by duplicating functionality and cluttering the NumPy API.
Sorry!
Juan.