On Mar 17, 2014 5:54 PM, "Nathaniel Smith" <njs@pobox.com> wrote:

>

> On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <njs@pobox.com> wrote:

> > Mathematica: instead of having an associativity, a @ b @ c gets

> > converted into mdot([a, b, c])

>

> So, I've been thinking about this (thanks to @rfateman for pointing it

> out), and wondering if Mathematica's approach is worth following up

> more. (It would need to make it past python-dev, of course, but worst

> case is just that they say no and we're back where we are now, so we

> might as well think it through.)

>

> Here's how it would work:

>

> Currently Python has 3 different kinds of ops: left-associative (most

> of them), right-associative (**), and "chaining". Chaining is used for

> comparison ops. Example:

>

> a < b < c

>

> gets parsed to something like

>

> do_comparison(args=[a, b, c], ops=[lt, lt])

>

> Notice this is very different from either of

>

> (a < b) < c

> a < (b < c)

>

> Which means that comparisons aren't left- OR right-associative,

> they're this other thing, "chaining".

>

> So we could propose adding a 4th kind of op, calling "grouping", which

> would be only @. And the idea is that

>

> a @ b @ c

>

> would be equivalent to

>

> operator.matmul((a, b, c))

>

> which eventually (see below) becomes a call to

>

> a.__matmul__((a, b, c))

>

> We'd use exactly the same parsing rules as the chaining ops, so you

> can still control evaluation order with parentheses if you want:

>

> a @ (b @ c) -> matmul((a, matmul((b, c))))

> (a @ b) @ c -> matmul((matmul((a, c)), c))

>

> ...but if you don't specify, then each contiguous group of @ operators

> gets collected up and handed to __matmul__ together, and the

> __matmul__ implementation gets to decide which evaluation strategy to

> use.

>

> It's trivially fast for the computer to figure out the best evaluation

> order for matrix multiplication, so in practice I think this would

> mean that you could just stop worrying about parentheses for multiple

> contiguous calls to matmul. Fancier versions of __matmul__ defined on

> more specialized non-ndarray classes might even take into account

> their specialized knowledge of how expensive different evaluation

> orders are for their specific type -- I'm not sure if this actually

> happens in practice, but it might. (Like maybe the best way to

> evaluate a @ b @ c depends on the sparsity pattern in the various

> matrices, or maybe it depends on which matrices are currently on the

> GPU and which are in memory? Anyone have any real examples of this?)

>

> (Of course, this same evaluation-order problem arises for *all*

> expressions using numpy; being able to optimize whole expressions at a

> time is exactly what numexpr/numba/theano/etc. are useful for. So one

> could argue that "baking it in" to @ is pointless, if anyone gets

> tired of writing parentheses they should just use one of these

> libraries. Or maybe evaluation order problems arise so rarely for @

> that no-one cares. But OTOH it would be so nice for once to just have

> a single best solution -- "always use @ and be happy, it just works"

> -- instead of all the caveats we normally do -- "@ is good in some

> cases, but in other cases mdot is better, or if you know you can just

> use @ with the right parentheses...".)

>

> Of course, we still have to say something about what value a @ b @ c

> actually computes. In the PEP semantics, it isn't always associative

> -- specifically not if we do Mat @ vec @ Mat. So in this approach, we

> still need to decide what

> matmul((Mat, vec, Mat))

> should return.

>

> But, this is actually a feature! Because obviously what *should* be

> returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @

> Mat). Both of those answers are terrible; it's just, if you have an

> ordinary left-/right-associative operator, those are your only

> options. What *should* be returned is an error. And in this scheme we

> get to see the whole @ expression at once, so we actually can raise an

> error for such things.

>

> So, this possibly has nicer performance characteristics, and is also

> possibly has nicer semantics.

>

> Now, how would this look in terms of the language definition?

>

> As far as the parser and AST go, this would use exactly the same rules

> as the chaining ops, so that's easy.

>

> Having parsed, we must evaluate. Likely the most contentious part of

> this approach is that we now have an n-arg operator, so the standard

> __X__/__rX__ dichotomy won't work, we need to do something like

> multiple dispatch. I haven't followed the debate on this issue in

> detail, but what I'd propose for this limited context is not to do

> anything like "real" multiple dispatch, but just directly generalize

> the familiar __rX__ rule to n arguments. The __rX__ rule is how

> Python's existing binary operators work: usually to evaluate a # b,

> you try a.__foo__, and then b.__foo__ EXCEPT if b is a proper subclass

> of a, you try b first. Generalized to >2 arguments, this looks like:

>

> def operator.matmul(args):

> candidates = list(args)

> while candidates:

> candidate = pop_next_candidate(candidates)

> if hasattr(candidate, "__matmul__"):

> result = candidate.__matmul__(args)

> if result is not NotImplemented:

> return result

> raise TypeError

>

> def pop_next_candidate(candidates):

> classes = [c.__class__ for c in candidates]

> # We'll try the left-most remaining candidate...

> for i in range(len(candidates)):

> # ...unless there's a later, untried candidate that's a proper subclass.

> if not has_proper_subclass(classes[i], classes):

> return candidates.pop(i)

> assert False

>

> def has_proper_subclass(class_, other_classes):

> for other_class in other_classes:

> if (issubclass(other_class, class_)

> and not issubclass(class_, other_class)):

> return True

> return False

>

> ...which, it turns out, is exactly the lookup rule that

> __numpy_ufunc__ will use, so at least it isn't too weird from our

> point of view:

> http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html#numpy.class.__numpy_ufunc__

>

> There are still plenty of details to think about, e.g.:

>

> What does the in-place operator do? I think

> a @= b @ c

> would have to be the same as

> a = a @ (b @ c)

> and NOT

> a = a @ b @ c

> because otherwise what do you do with

> a @= b @ c + d.

You cannot do inplace matrix multiplication without an intermediate copy of the first array, or at least of each row of the first array in turns. I don't like the idea of providing syntax that looks faster but isn't, I'd rather see @= return an error in the context of matrix multiplication.

> I'm not sure how this would interact with implementing np.dot (which

> we'd still need for its out= argument, and will I guess do dispatch

> through the __numpy_ufunc__ mechanism?). We should probably work

> through this in detail.

>

> We'd still need to pick a precedence for @: grouping is different than

> left-associativity, so we can't do same-group, we'd have to pick

> either tight-group or weak-group. My gut feeling is that tight-group

> makes more sense that weak-group, because if @ is a magical thing that

> collects up a group of items, then it is helpful if there's a simple

> visual mapping where the group starts just to the left of the first @

> and extends just to the right of the last @.

>

> I'm still not at all sure this rigmorale is worth it -- I still think

> we need some data on how often people chain together multiple @ or

> np.dot calls.

>

> Still, I thought I'd throw this out here and see what people think of it.

>

> -n

>

> --

> Nathaniel J. Smith

> Postdoctoral researcher - Informatics - University of Edinburgh

> http://vorpus.org

> _______________________________________________

> NumPy-Discussion mailing list

> NumPy-Discussion@scipy.org

> http://mail.scipy.org/mailman/listinfo/numpy-discussion