[Numpy-discussion] PR added: frozen dimensions in gufunc signatures

Thu Aug 28 20:40:06 EDT 2014

On Fri, Aug 29, 2014 at 1:14 AM, Jaime Fernández del Río
<jaime.frio at gmail.com> wrote:
> Hi,
>
> I have just sent a PR (https://github.com/numpy/numpy/pull/5015), adding the
> possibility of having frozen dimensions in gufunc signatures. As a proof of
> concept, I have added a `cross1d` gufunc to `numpy.core.umath_tests`:
>
> In [1]: import numpy as np
> In [2]: from numpy.core.umath_tests import cross1d
>
> In [3]: cross1d.signature
> Out[3]: '(3),(3)->(3)'
>
> In [4]: a = np.random.rand(1000, 3)
> In [5]: b = np.random.rand(1000, 3)
>
> In [6]: np.allclose(np.cross(a, b), cross1d(a, b))
> Out[6]: True
>
> In [7]: %timeit np.cross(a, b)
> 10000 loops, best of 3: 76.1 us per loop
>
> In [8]: %timeit cross1d(a, b)
> 100000 loops, best of 3: 13.1 us per loop
>
> In [9]: c = np.random.rand(1000, 2)
> In [10]: d = np.random.rand(1000, 2)
>
> In [11]: cross1d(c, d)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-11-72c66212e40c> in <module>()
> ----> 1 cross1d(c, d)
>
> ValueError: cross1d: Operand 0 has a mismatch in its core dimension 0, with
> gufunc signature (3),(3)->(3) (size 2 is different from 3)
>
> The speed up over `np.cross` is nice, and while `np.cross` is not the best
> of examples, as it needs to handle more sizes, in many cases this will allow
> producing gufuncs that work without a Python wrapper redoing checks that are
> best left to the iterator, such as dimension sizes.
>
> It still needs tests, but before embarking on fully developing those, I
> wanted to make sure that there is an interest on this.
>
> I would also like to further enhance gufuncs providing computed dimensions,
> e.g. making it possible to e.g. define `pairwise_cross` with signature '(n,
> 3)->($m, 3)', where the $ indicates that m is a computed dimension, that
> would have to be calculated by a function passed to the gufunc constructor
> and stored in the gufunc object, based on the other core dimensions. In this
> case it would make $m be n*(n-1), so that all pairwise cross products
> between 3D vectors could be computed.
>
> The syntax with '$' is kind of crappy, so any suggestions on how to better
> express this in the signature are more than welcome, as well as any feedback
> on the merits (or lack of them) of implementing this.

Some thoughts:

When I first saw the PR my first reaction was that maybe we should be
allowing more general hooks for a gufunc to choose its core
dimensions. Reading the code convinced me that this is a relatively
minimal enhancement over what we're currently doing, so your current
PR looks fine to me.

But, for your computed dimension idea I'm wondering if what we should
do instead is just let a gufunc provide a C callback that looks at the
input array dimensions and explicitly says somehow which dimensions it
wants to treat as the core dimensions and what its output shapes will
be. There's no rule that we have to extend the signature mini-language
to be Turing complete, we can just use C :-).

It would be good to have a better motivation for computed gufunc
dimensions, though. Your "all pairwise cross products" example would
be *much* better handled by implementing the .outer method for binary
gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make
gufuncs more consistent with ufuncs, plus let you do
all-pairwise-cross-products between two different sets of cross
products, plus give us all-pairwise-matrix-products for free, etc.

While you're messing around with the gufunc dimension matching logic,
any chance we can tempt you to implement the "optional dimensions"
needed to handle '@', solve, etc. elegantly? The rule would be that
you can write something like
   (n?,k),(k,m?)->(n?,m?)
and the ? dimensions are allowed to take on an additional value
"nothing at all". If there's no dimension available in the input, then
we act like it was reshaped to add a dimension with shape 1, and then
in the output we squeeze this dimension out again. I guess the rules
would be that (1) in the input, you can have ? dimensions at the
beginning or the end of your shape, but not both at the same time, (2)
any dimension that has a ? in one place must have it in all places,
(3) when checking argument conformity, "nothing at all" only matches
against "nothing at all", not against 1; this is because if we allowed
(n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and
(1, 5), then it would be ambiguous whether the output should have
shape (5,) or (1, 5).

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org