On Mar 17, 2014 5:54 PM, "Nathaniel Smith" <njs@pobox.com> wrote:
>
> On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <njs@pobox.com> wrote:
> > Mathematica: instead of having an associativity, a @ b @ c gets
> > converted into mdot([a, b, c])
>
> out), and wondering if Mathematica's approach is worth following up
> more. (It would need to make it past python-dev, of course, but worst
> case is just that they say no and we're back where we are now, so we
> might as well think it through.)
>
> Here's how it would work:
>
> Currently Python has 3 different kinds of ops: left-associative (most
> of them), right-associative (**), and "chaining". Chaining is used for
> comparison ops. Example:
>
>    a < b < c
>
> gets parsed to something like
>
>    do_comparison(args=[a, b, c], ops=[lt, lt])
>
> Notice this is very different from either of
>
>   (a < b) < c
>   a < (b < c)
>
> Which means that comparisons aren't left- OR right-associative,
> they're this other thing, "chaining".
>
> So we could propose adding a 4th kind of op, calling "grouping", which
> would be only @. And the idea is that
>
>   a @ b @ c
>
> would be equivalent to
>
>   operator.matmul((a, b, c))
>
> which eventually (see below) becomes a call to
>
>   a.__matmul__((a, b, c))
>
> We'd use exactly the same parsing rules as the chaining ops, so you
> can still control evaluation order with parentheses if you want:
>
>   a @ (b @ c) -> matmul((a, matmul((b, c))))
>   (a @ b) @ c -> matmul((matmul((a, c)), c))
>
> ...but if you don't specify, then each contiguous group of @ operators
> gets collected up and handed to __matmul__ together, and the
> __matmul__ implementation gets to decide which evaluation strategy to
> use.
>
> It's trivially fast for the computer to figure out the best evaluation
> order for matrix multiplication, so in practice I think this would
> mean that you could just stop worrying about parentheses for multiple
> contiguous calls to matmul. Fancier versions of __matmul__ defined on
> more specialized non-ndarray classes might even take into account
> their specialized knowledge of how expensive different evaluation
> orders are for their specific type -- I'm not sure if this actually
> happens in practice, but it might. (Like maybe the best way to
> evaluate a @ b @ c depends on the sparsity pattern in the various
> matrices, or maybe it depends on which matrices are currently on the
> GPU and which are in memory? Anyone have any real examples of this?)
>
> (Of course, this same evaluation-order problem arises for *all*
> expressions using numpy; being able to optimize whole expressions at a
> time is exactly what numexpr/numba/theano/etc. are useful for. So one
> could argue that "baking it in" to @ is pointless, if anyone gets
> tired of writing parentheses they should just use one of these
> libraries. Or maybe evaluation order problems arise so rarely for @
> that no-one cares. But OTOH it would be so nice for once to just have
> a single best solution -- "always use @ and be happy, it just works"
> -- instead of all the caveats we normally do -- "@ is good in some
> cases, but in other cases mdot is better, or if you know you can just
> use @ with the right parentheses...".)
>
> Of course, we still have to say something about what value a @ b @ c
> actually computes. In the PEP semantics, it isn't always associative
> -- specifically not if we do Mat @ vec @ Mat. So in this approach, we
> still need to decide what
>   matmul((Mat, vec, Mat))
> should return.
>
> But, this is actually a feature! Because obviously what *should* be
> returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @
> Mat). Both of those answers are terrible; it's just, if you have an
> ordinary left-/right-associative operator, those are your only
> options. What *should* be returned is an error. And in this scheme we
> get to see the whole @ expression at once, so we actually can raise an
> error for such things.
>
> So, this possibly has nicer performance characteristics, and is also
> possibly has nicer semantics.
>
> Now, how would this look in terms of the language definition?
>
> As far as the parser and AST go, this would use exactly the same rules
> as the chaining ops, so that's easy.
>
> Having parsed, we must evaluate. Likely the most contentious part of
> this approach is that we now have an n-arg operator, so the standard
> __X__/__rX__ dichotomy won't work, we need to do something like
> multiple dispatch. I haven't followed the debate on this issue in
> detail, but what I'd propose for this limited context is not to do
> anything like "real" multiple dispatch, but just directly generalize
> the familiar __rX__ rule to n arguments. The __rX__ rule is how
> Python's existing binary operators work: usually to evaluate a # b,
> you try a.__foo__, and then b.__foo__ EXCEPT if b is a proper subclass
> of a, you try b first. Generalized to >2 arguments, this looks like:
>
> def operator.matmul(args):
>     candidates = list(args)
>     while candidates:
>         candidate = pop_next_candidate(candidates)
>         if hasattr(candidate, "__matmul__"):
>             result = candidate.__matmul__(args)
>             if result is not NotImplemented:
>                 return result
>     raise TypeError
>
> def pop_next_candidate(candidates):
>     classes = [c.__class__ for c in candidates]
>     # We'll try the left-most remaining candidate...
>     for i in range(len(candidates)):
>         # ...unless there's a later, untried candidate that's a proper subclass.
>         if not has_proper_subclass(classes[i], classes):
>             return candidates.pop(i)
>     assert False
>
> def has_proper_subclass(class_, other_classes):
>     for other_class in other_classes:
>         if (issubclass(other_class, class_)
>             and not issubclass(class_, other_class)):
>             return True
>     return False
>
> ...which, it turns out, is exactly the lookup rule that
> __numpy_ufunc__ will use, so at least it isn't too weird from our
> point of view:
>     http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html#numpy.class.__numpy_ufunc__
>
> There are still plenty of details to think about, e.g.:
>
> What does the in-place operator do? I think
>   a @= b @ c
> would have to be the same as
>   a = a @ (b @ c)
> and NOT
>   a = a @ b @ c
> because otherwise what do you do with
>   a @= b @ c + d.

You cannot do inplace matrix multiplication without an intermediate copy of the first array, or at least of each row of the first array in turns. I don't like the idea of providing syntax that looks faster but isn't, I'd rather see @= return an error in the context of matrix multiplication.

> I'm not sure how this would interact with implementing np.dot (which
> we'd still need for its out= argument, and will I guess do dispatch
> through the __numpy_ufunc__ mechanism?). We should probably work
> through this in detail.
>
> We'd still need to pick a precedence for @: grouping is different than
> left-associativity, so we can't do same-group, we'd have to pick
> either tight-group or weak-group. My gut feeling is that tight-group
> makes more sense that weak-group, because if @ is a magical thing that
> collects up a group of items, then it is helpful if there's a simple
> visual mapping where the group starts just to the left of the first @
> and extends just to the right of the last @.
>
> I'm still not at all sure this rigmorale is worth it -- I still think
> we need some data on how often people chain together multiple @ or
> np.dot calls.
>
> Still, I thought I'd throw this out here and see what people think of it.
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion