[Numpy-discussion] [help needed] associativity and precedence of '@'

Nathaniel Smith njs at pobox.com
Fri Mar 14 23:41:50 EDT 2014


Hi all,

Here's the main blocker for adding a matrix multiply operator '@' to
Python: we need to decide what we think its precedence and associativity
should be. I'll explain what that means so we're on the same page, and what
the choices are, and then we can all argue about it. But even better would
be if we could get some data to guide our decision, and this would be a lot
easier if some of you all can help; I'll suggest some ways you might be
able to do that.

So! Precedence and left- versus right-associativity. If you already know
what these are you can skim down until you see CAPITAL LETTERS.

We all know what precedence is. Code like this:
  a + b * c
gets evaluated as:
  a + (b * c)
because * has higher precedence than +. It "binds more tightly", as they
say. Python's complete precedence table is here:
  http://docs.python.org/3/reference/expressions.html#operator-precedence

Associativity, in the parsing sense, is less well known, though it's just
as important. It's about deciding how to evaluate code like this:
  a * b * c
Do we use
  a * (b * c)    # * is "right associative"
or
  (a * b) * c    # * is "left associative"
? Here all the operators have the same precedence (because, uh... they're
the same operator), so precedence doesn't help. And mostly we can ignore
this in day-to-day life, because both versions give the same answer, so who
cares. But a programming language has to pick one (consider what happens if
one of those objects has a non-default __mul__ implementation). And of
course it matters a lot for non-associative operations like
  a - b - c
or
  a / b / c
So when figuring out order of evaluation, what you do first is check the
precedence, and then if you have multiple operators next to each other with
the same precedence, you check their associativity. Notice that this means
that if you have different operators that share the same precedence level
(like + and -, or * and /), then they have to all have the same
associativity. All else being equal, it's generally considered nice to have
fewer precedence levels, because these have to be memorized by users.

Right now in Python, every precedence level is left-associative, except for
'**'. If you write these formulas without any parentheses, then what the
interpreter will actually execute is:
  (a * b) * c
  (a - b) - c
  (a / b) / c
but
  a ** (b ** c)
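
(If you don't want to take my word for it, here's a little throwaway class
you can paste into an interpreter -- its operators just build up a string
recording how Python grouped things:)

  class Op:
      def __init__(self, name):
          self.name = name
      def __mul__(self, other):
          return Op("(%s * %s)" % (self.name, other.name))
      def __sub__(self, other):
          return Op("(%s - %s)" % (self.name, other.name))
      def __pow__(self, other):
          return Op("(%s ** %s)" % (self.name, other.name))

  a, b, c = Op("a"), Op("b"), Op("c")
  print((a * b * c).name)    # -> (a * b) * c
  print((a - b - c).name)    # -> (a - b) - c
  print((a ** b ** c).name)  # -> (a ** (b ** c))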

Okay, that's the background. Here's the question. We need to decide on
precedence and associativity for '@'. In particular, there are three
different options that are interesting:

OPTION 1 FOR @:
Precedence: same as *
Associativity: left
My shorthand name for it: "same-left" (yes, very creative)

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  (a @ b) @ c
   a * b @ c  ->  (a * b) @ c
   a @ b * c  ->  (a @ b) * c

OPTION 2 FOR @:
Precedence: more-weakly-binding than *
Associativity: right
My shorthand name for it: "weak-right"

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  a @ (b @ c)
   a * b @ c  ->  (a * b) @ c
   a @ b * c  ->  a @ (b * c)

OPTION 3 FOR @:
Precedence: more-tightly-binding than *
Associativity: right
My shorthand name for it: "tight-right"

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  a @ (b @ c)
   a * b @ c  ->  a * (b @ c)
   a @ b * c  ->  (a @ b) * c

We need to pick which of these options we think is best, based on whatever
reasons we can think of, ideally more than "hmm, weak-right gives me warm
fuzzy feelings" ;-). (In principle the other 2 possible options are
tight-left and weak-left, but there doesn't seem to be any argument in
favor of either, so we'll leave them out of the discussion.)
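
(Side note: once '@' actually lands with the __matmul__ hook that PEP 465
proposes, anyone will be able to check which rule got picked using the same
trick as the Op class above. This sketch is purely hypothetical and won't
parse on any current Python:)

  class Op:
      def __init__(self, name):
          self.name = name
      def __mul__(self, other):
          return Op("(%s * %s)" % (self.name, other.name))
      def __matmul__(self, other):   # proposed hook for '@'
          return Op("(%s @ %s)" % (self.name, other.name))

  a, b, c = Op("a"), Op("b"), Op("c")
  print((a @ b @ c).name)  # same-left -> ((a @ b) @ c); either right option -> (a @ (b @ c))
  print((a * b @ c).name)  # same-left/weak-right -> ((a * b) @ c); tight-right -> (a * (b @ c))
  print((a @ b * c).name)  # same-left/tight-right -> ((a @ b) * c); weak-right -> (a @ (b * c))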

Some things to consider:

* and @ are actually not associative (in the math sense) with respect to
each other, i.e., (a * b) @ c and a * (b @ c) in general give different
results when 'a' is not a scalar. So considering the two expressions 'a * b
@ c' and 'a @ b * c', we can see that each of the three options produces
different results in some cases.
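
(A quick numpy demonstration of the non-associativity, writing '@' as
np.dot and using made-up 2x2 matrices:)

  import numpy as np
  a = np.array([[1., 2.], [3., 4.]])
  b = np.array([[0., 1.], [1., 0.]])
  c = np.array([[1., 2.], [3., 4.]])
  print(np.dot(a * b, c))   # "(a * b) @ c"  -> [[6., 8.], [3., 6.]]
  print(a * np.dot(b, c))   # "a * (b @ c)"  -> [[3., 8.], [3., 8.]]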

"Same-left" is the easiest to explain and remember, because it's just, "@
acts like * and /". So we already have to know the rule in order to
understand other non-associative expressions like a / b / c or a - b - c,
and it'd be nice if the same rule applied to things like a * b @ c so we
only had to memorize *one* rule. (Of course there's ** which uses the
opposite rule, but I guess everyone internalized that one in secondary
school; that's not true for * versus @.) This is definitely the default we
should choose unless we have a good reason to do otherwise.

BUT: there might indeed be a good reason to do otherwise, which is the
whole reason this has come up. Consider:
    Mat1 @ Mat2 @ vec
Obviously this will execute much more quickly if we do
    Mat1 @ (Mat2 @ vec)
because that results in two cheap matrix-vector multiplies, while
    (Mat1 @ Mat2) @ vec
starts out by doing an expensive matrix-matrix multiply. So: maybe @ should
be right associative, so that we get the fast behaviour without having to
use explicit parentheses! That is, /if/ these kinds of expressions are
common enough that having to remember to put explicit parentheses in all
the time is more of a programmer burden than having to memorize a special
associativity rule for @. Obviously Mat @ Mat @ vec is more common than
vec @ Mat @ Mat, but
maybe they're both so rare that it doesn't matter in practice -- I don't
know.
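
(If you want to see the size of the effect on your own machine, here's a
quick-and-dirty timing sketch, again writing '@' as np.dot; the matrix size
and repeat count are arbitrary:)

  import numpy as np
  import timeit

  n = 1000   # arbitrary, but big enough for the difference to show
  Mat1, Mat2 = np.random.rand(n, n), np.random.rand(n, n)
  vec = np.random.rand(n)

  # (Mat1 @ Mat2) @ vec: one O(n**3) matrix-matrix product up front
  print(timeit.timeit(lambda: np.dot(np.dot(Mat1, Mat2), vec), number=10))
  # Mat1 @ (Mat2 @ vec): two O(n**2) matrix-vector products
  print(timeit.timeit(lambda: np.dot(Mat1, np.dot(Mat2, vec)), number=10))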

Also, if we do want @ to be right associative, then I can't think of any
clever reasons to prefer weak-right over tight-right, or vice-versa. For
the scalar multiplication case, I believe both options produce the same
result in the same amount of time. For the non-scalar case, they give
different answers. Do people have strong intuitions about what expressions
like
  a * b @ c
  a @ b * c
should actually do? (I'm guessing not, but hey, you never know.)
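
(Here's a quick sanity check of that scalar-case claim, once more in np.dot
spelling with made-up inputs:)

  import numpy as np
  a = 2.5                                   # scalar
  b, c = np.random.rand(3, 3), np.random.rand(3, 3)
  # weak-right reads 'a * b @ c' as (a * b) @ c, tight-right as
  # a * (b @ c); for scalar 'a' these coincide:
  print(np.allclose(np.dot(a * b, c), a * np.dot(b, c)))  # -> True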

And, while intuition is useful, it would be really *really* nice to be
basing these decisions on more than *just* intuition, since whatever we
decide will be subtly influencing the experience of writing linear algebra
code in Python for the rest of time. So here's where I could use some help.
First, of course, if you have any other reasons why one or the other of
these options is better, then please share! But second, I think we need to
know something about how often the Mat @ Mat @ vec type cases arise in
practice. How often do non-scalar * and np.dot show up in the same
expression? How often does it look like a * np.dot(b, c), and how often
does it look like np.dot(a * b, c)? How often do we see expressions like
np.dot(np.dot(a, b), c), and how often do we see expressions like np.dot(a,
np.dot(b, c))? This would really help guide the debate. I don't have this
data, and I'm not sure the best way to get it. A super-fancy approach would
be to write a little script that uses the 'ast' module to count things
automatically. A less fancy approach would be to just pick some code you've
written, or a well-known package, grep through for calls to 'dot', and make
notes on what you see. (An advantage of the less-fancy approach is that as
a human you might be able to tell the difference between scalar and
non-scalar *, or check whether it actually matters what order the 'dot'
calls are done in.)
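
(To get the ball rolling on the super-fancy approach, here's a rough sketch
of such a script. The matching is deliberately crude -- it only catches
calls literally named 'dot', and it can't tell scalar from non-scalar '*':)

  import ast, sys

  def is_dot(call):
      # matches dot(...), np.dot(...), numpy.dot(...), etc.
      if not isinstance(call, ast.Call):
          return False
      f = call.func
      return ((isinstance(f, ast.Name) and f.id == "dot")
              or (isinstance(f, ast.Attribute) and f.attr == "dot"))

  counts = {"dot(dot(_, _), _)": 0, "dot(_, dot(_, _))": 0, "other dot": 0}
  tree = ast.parse(open(sys.argv[1]).read())
  for node in ast.walk(tree):
      if is_dot(node) and len(node.args) == 2:
          if is_dot(node.args[0]):
              counts["dot(dot(_, _), _)"] += 1
          elif is_dot(node.args[1]):
              counts["dot(_, dot(_, _))"] += 1
          else:
              counts["other dot"] += 1
  print(counts)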

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org