Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 90, Issue 45
I would like to see the case made for @. Yes, I know that Guido has accepted the idea, but he has changed his mind before. The PEP seems neutral to retaining both np.matrix and @.

Nearly ten years ago, Tim Peters <http://legacy.python.org/dev/peps/pep-0020/> gave us:

    There should be one-- and preferably only one --obvious way to do it.

We now have:

    C = A * B

C becomes an instance of the Matrix class, of shape (m, p), when A and B are matrices of shape (m, n) and (n, p) respectively. Actually, the rules are a little more general than the above.

The PEP proposes:

    C = A @ B

where the types or classes of A, B and C are not clear.

We also have A.I for the inverse (for a square matrix) and A.T for the transpose of a matrix.

The Zen of Python recommends one way. Of the two, which is the obvious way?

Colin W.

On 15-Mar-2014 9:25 PM, numpy-discussion-request@scipy.org wrote:
Send NumPy-Discussion mailing list submissions to numpy-discussion@scipy.org
To subscribe or unsubscribe via the World Wide Web, visit http://mail.scipy.org/mailman/listinfo/numpy-discussion or, via email, send a message with subject or body 'help' to numpy-discussion-request@scipy.org
Today's Topics:
1. Re: [help needed] associativity and precedence of '@' (josef.pktd@gmail.com)
2. Re: [RFC] should we argue for a matrix power operator, @@? (josef.pktd@gmail.com)
----------------------------------------------------------------------
Message: 1
Date: Sat, 15 Mar 2014 21:20:40 -0400
From: josef.pktd@gmail.com
Subject: Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Message-ID: <CAMMTP+Ahag9fN3XPtS4uDRThBknVXzudc0G8TtJ7G3w3dWbBWw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. I'll explain what that means so we're on the same page, and what the choices are, and then we can all argue about it. But even better would be if we could get some data to guide our decision, and this would be a lot easier if some of you all can help; I'll suggest some ways you might be able to do that.
So! Precedence and left- versus right-associativity. If you already know what these are you can skim down until you see CAPITAL LETTERS.
We all know what precedence is. Code like this:
    a + b * c
gets evaluated as:
    a + (b * c)
because * has higher precedence than +. It "binds more tightly", as they say. Python's complete precedence table is here: http://docs.python.org/3/reference/expressions.html#operator-precedence
Associativity, in the parsing sense, is less well known, though it's just as important. It's about deciding how to evaluate code like this:
    a * b * c
Do we use
    a * (b * c)    # * is "right associative"
or
    (a * b) * c    # * is "left associative"
? Here all the operators have the same precedence (because, uh... they're the same operator), so precedence doesn't help. And mostly we can ignore this in day-to-day life, because both versions give the same answer, so who cares. But a programming language has to pick one (consider what happens if one of those objects has a non-default __mul__ implementation). And of course it matters a lot for non-associative operations like
    a - b - c
or
    a / b / c

So when figuring out order of evaluations, what you do first is check the precedence, and then if you have multiple operators next to each other with the same precedence, you check their associativity. Notice that this means that if you have different operators that share the same precedence level (like + and -, or * and /), then they have to all have the same associativity. All else being equal, it's generally considered nice to have fewer precedence levels, because these have to be memorized by users.
Right now in Python, every precedence level is left-associative, except for '**'. If you write these formulas without any parentheses, then what the interpreter will actually execute is:
    (a * b) * c
    (a - b) - c
    (a / b) / c
but
    a ** (b ** c)
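The groupings listed above can be checked directly in an interpreter. The following is a minimal sketch, assuming a made-up tracing class (`Sym` is a hypothetical name, not part of Python or NumPy) whose operators record how they were called instead of computing anything:

```python
class Sym:
    """Hypothetical helper: records how Python groups an expression."""

    def __init__(self, text):
        self.text = text

    def _binop(self, op, other):
        # Wrap every operation in parentheses so the grouping is visible.
        return Sym(f"({self.text} {op} {other.text})")

    def __mul__(self, other):
        return self._binop("*", other)

    def __sub__(self, other):
        return self._binop("-", other)

    def __truediv__(self, other):
        return self._binop("/", other)

    def __pow__(self, other):
        return self._binop("**", other)

    def __repr__(self):
        return self.text


a, b, c = Sym("a"), Sym("b"), Sym("c")

# Each value is the fully parenthesized form the interpreter actually used.
grouped = {
    "a * b * c": repr(a * b * c),
    "a - b - c": repr(a - b - c),
    "a / b / c": repr(a / b / c),
    "a ** b ** c": repr(a ** b ** c),
}

for source, result in grouped.items():
    print(f"{source:12} -> {result}")
```

Only `**` should come out grouped to the right; the other three group to the left.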
Okay, that's the background. Here's the question. We need to decide on precedence and associativity for '@'. In particular, there are three different options that are interesting:
OPTION 1 FOR @:
    Precedence: same as *
    Associativity: left
    My shorthand name for it: "same-left" (yes, very creative)

This means that if you don't use parentheses, you get:
    a @ b @ c  ->  (a @ b) @ c
    a * b @ c  ->  (a * b) @ c
    a @ b * c  ->  (a @ b) * c
OPTION 2 FOR @:
    Precedence: more-weakly-binding than *
    Associativity: right
    My shorthand name for it: "weak-right"

This means that if you don't use parentheses, you get:
    a @ b @ c  ->  a @ (b @ c)
    a * b @ c  ->  (a * b) @ c
    a @ b * c  ->  a @ (b * c)
OPTION 3 FOR @:
    Precedence: more-tightly-binding than *
    Associativity: right
    My shorthand name for it: "tight-right"

This means that if you don't use parentheses, you get:
    a @ b @ c  ->  a @ (b @ c)
    a * b @ c  ->  a * (b @ c)
    a @ b * c  ->  (a @ b) * c
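The three tables above can be reproduced mechanically. Here is a rough sketch, assuming a toy precedence-climbing parser that only understands names, '*' and '@'. The names `OPTIONS` and `group` and the numeric precedence levels are invented for illustration; this is not how CPython's parser is implemented.

```python
import re

# Hypothetical encoding of the three options: (precedence of '@', associativity).
# '*' is fixed at level 2 and left-associative; only the relative order matters.
OPTIONS = {
    "same-left": (2, "left"),
    "weak-right": (1, "right"),
    "tight-right": (3, "right"),
}


def group(expr, option):
    """Return expr fully parenthesized under the given option for '@'."""
    table = {"*": (2, "left"), "@": OPTIONS[option]}
    tokens = re.findall(r"[A-Za-z_]\w*|[*@]", expr)
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]  # operand
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            prec, assoc = table[op]
            if prec < min_prec:
                break
            pos += 1
            # Left-associative: the right operand may only contain tighter
            # operators. Right-associative: it may contain the same level.
            right = parse(prec + 1 if assoc == "left" else prec)
            left = f"({left} {op} {right})"
        return left

    return parse(0)


for option in OPTIONS:
    for expr in ("a @ b @ c", "a * b @ c", "a @ b * c"):
        print(f"{option:12} {expr}  ->  {group(expr, option)}")
```

The output carries one extra pair of outer parentheses compared with the tables, but the inner grouping is the part to compare.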
We need to pick which of these options we think is best, based on whatever reasons we can think of, ideally more than "hmm, weak-right gives me warm fuzzy feelings" ;-). (In principle the other 2 possible options are tight-left and weak-left, but there doesn't seem to be any argument in favor of either, so we'll leave them out of the discussion.)
Some things to consider:
* and @ are actually not associative (in the math sense) with respect to each other, i.e., (a * b) @ c and a * (b @ c) in general give different results when 'a' is not a scalar. So considering the two expressions 'a * b @ c' and 'a @ b * c', we can see that each of these three options produces different results in some cases.
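A small numeric check of that claim, as a sketch only: plain nested lists stand in for arrays so that it runs without NumPy, and `elementwise`, `matmul` and `scale` are hypothetical helper names standing in for non-scalar '*', '@' and scalar '*'.

```python
def elementwise(a, b):
    """Stand-in for non-scalar '*': multiply matching entries."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Stand-in for '@': ordinary matrix product of nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def scale(s, m):
    """Stand-in for scalar '*': multiply every entry by s."""
    return [[s * x for x in row] for row in m]


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
c = [[1, 1], [0, 1]]

# Non-scalar a: the two groupings disagree.
left = matmul(elementwise(a, b), c)    # (a * b) @ c
right = elementwise(a, matmul(b, c))   # a * (b @ c)
print(left, right)

# Scalar factor: both groupings agree.
s = 2
scalar_left = matmul(scale(s, b), c)   # (s * b) @ c
scalar_right = scale(s, matmul(b, c))  # s * (b @ c)
print(scalar_left == scalar_right)
```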
"Same-left" is the easiest to explain and remember, because it's just, "@ acts like * and /". So we already have to know the rule in order to understand other non-associative expressions like a / b / c or a - b - c, and it'd be nice if the same rule applied to things like a * b @ c so we only had to memorize *one* rule. (Of course there's ** which uses the opposite rule, but I guess everyone internalized that one in secondary school; that's not true for * versus @.) This is definitely the default we should choose unless we have a good reason to do otherwise.
BUT: there might indeed be a good reason to do otherwise, which is the whole reason this has come up. Consider:
    Mat1 @ Mat2 @ vec
Obviously this will execute much more quickly if we do
    Mat1 @ (Mat2 @ vec)
because that results in two cheap matrix-vector multiplies, while
    (Mat1 @ Mat2) @ vec
starts out by doing an expensive matrix-matrix multiply. So: maybe @ should be right associative, so that we get the fast behaviour without having to use explicit parentheses! /If/ these kinds of expressions are common enough that having to remember to put explicit parentheses in all the time is more of a programmer burden than having to memorize a special associativity rule for @. Obviously Mat @ Mat @ vec is more common than vec @ Mat @ Mat, but maybe they're both so rare that it doesn't matter in practice -- I don't know.
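To put rough numbers on the speed difference, here is a back-of-the-envelope sketch. It assumes the naive cost model of m*n*p scalar multiplications for an (m, n) by (n, p) product, treats vec as an (n, 1) column, and uses made-up helper names (`matmul_cost`, `cost_left`, `cost_right`). Real BLAS timings will differ in detail.

```python
def matmul_cost(shape_a, shape_b):
    """Naive multiplication count and result shape for (m, n) @ (n, p)."""
    m, n = shape_a
    n2, p = shape_b
    if n != n2:
        raise ValueError(f"shapes {shape_a} and {shape_b} are not aligned")
    return m * n * p, (m, p)


def cost_left(s1, s2, s3):
    """Cost of (A @ B) @ C."""
    first, shape_ab = matmul_cost(s1, s2)
    second, _ = matmul_cost(shape_ab, s3)
    return first + second


def cost_right(s1, s2, s3):
    """Cost of A @ (B @ C)."""
    first, shape_bc = matmul_cost(s2, s3)
    second, _ = matmul_cost(s1, shape_bc)
    return first + second


n = 1000
mat, col_vec, row_vec = (n, n), (n, 1), (1, n)

# Mat1 @ Mat2 @ vec: right grouping avoids the matrix-matrix product.
print(cost_left(mat, mat, col_vec), cost_right(mat, mat, col_vec))

# vec @ Mat1 @ Mat2: here the left grouping is the cheap one.
print(cost_left(row_vec, mat, mat), cost_right(row_vec, mat, mat))
```

Under this model the gap is n**3 + n**2 versus 2 * n**2, so neither associativity is cheapest for every chain; it depends on which end the vector sits at.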
Also, if we do want @ to be right associative, then I can't think of any clever reasons to prefer weak-right over tight-right, or vice-versa. For the scalar multiplication case, I believe both options produce the same result in the same amount of time. For the non-scalar case, they give different answers. Do people have strong intuitions about what expressions like
    a * b @ c
    a @ b * c
should do actually? (I'm guessing not, but hey, you never know.)
And, while intuition is useful, it would be really *really* nice to be basing these decisions on more than *just* intuition, since whatever we decide will be subtly influencing the experience of writing linear algebra code in Python for the rest of time. So here's where I could use some help. First, of course, if you have any other reasons why one or the other of these options is better, then please share! But second, I think we need to know something about how often the Mat @ Mat @ vec type cases arise in practice. How often do non-scalar * and np.dot show up in the same expression? How often does it look like a * np.dot(b, c), and how often does it look like np.dot(a * b, c)? How often do we see expressions like np.dot(np.dot(a, b), c), and how often do we see expressions like np.dot(a, np.dot(b, c))? This would really help guide the debate. I don't have this data, and I'm not sure the best way to get it. A super-fancy approach would be to write a little script that uses the 'ast' module to count things automatically. A less fancy approach would be to just pick some code you've written, or a well-known package, grep through for calls to 'dot', and make notes on what you see. (An advantage of the less-fancy approach is that as a human you might be able to tell the difference between scalar and non-scalar *, or check whether it actually matters what order the 'dot' calls are done in.)
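As a starting point for the "super-fancy approach", the following is a minimal sketch using the standard-library 'ast' module. It assumes that any two-argument call spelled `dot(...)` or `<something>.dot(...)` is a matrix product; the helper names `is_dot_call` and `count_nested_dots` and the sample source are invented for illustration. It cannot tell scalar from non-scalar operands, which is the limitation noted above.

```python
import ast


def is_dot_call(node):
    """True for two-argument calls spelled dot(...) or <something>.dot(...)."""
    if not isinstance(node, ast.Call) or len(node.args) != 2:
        return False
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "dot"
    if isinstance(func, ast.Attribute):
        return func.attr == "dot"
    return False


def count_nested_dots(source):
    """Count dot(dot(a, b), c) as 'left' and dot(a, dot(b, c)) as 'right'."""
    counts = {"left": 0, "right": 0}
    for node in ast.walk(ast.parse(source)):
        if is_dot_call(node):
            first, second = node.args
            if is_dot_call(first):
                counts["left"] += 1
            if is_dot_call(second):
                counts["right"] += 1
    return counts


# Made-up sample input; in practice you would read real source files.
sample = """
H = np.dot(np.dot(a, b), c)
v = np.dot(Mat1, np.dot(Mat2, vec))
w = np.dot(a * b, c)
"""

counts = count_nested_dots(sample)
print(counts)
```

To survey a real package, one could feed each file's text to `count_nested_dots` and sum the results; method-style calls such as `a.dot(b)` take one argument and are deliberately not counted here.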
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
I'm in favor of same-left because it's the easiest to remember. With scalar factors, it is how I read formulas.
Both calculating dot @ first or calculating elementwise * first sound logical, but I wouldn't know which should go first. (My "feeling" would be @ first.)
Two cases I remembered in statsmodels:
    H = np.dot(results.model.pinv_wexog, scale[:,None] * results.model.pinv_wexog.T)
    se = (exog * np.dot(covb, exog.T).T).sum(1)
we are mixing * and dot pretty freely in all combinations AFAIR
My guess is that I wouldn't trust any sequence without parentheses for a long time. (And I don't trust a sequence of dots @ without parentheses either, in our applications.)
    x @ (W.T @ W) @ x    (W.shape = (10000, 5))
or
    x * (W.T @ W) * x

    (w * x) @ x    weighted sum of squares
Josef
On Sun, Mar 16, 2014 at 4:37 PM, Colin J. Williams <cjwilliams43@gmail.com> wrote:
I would like to see the case made for @. Yes, I know that Guido has accepted the idea, but he has changed his mind before.
I'm not sure how to usefully respond to this, since I already wrote a ~20 page document making the case for @. Maybe if you think the arguments in it aren't good, it would be more helpful to explain which ones and why?
The PEP seems neutral to retaining both np.matrix and @.
I'm not sure what gives you this impression. The main point of the whole first section of the PEP is to explain why the existence of np.matrix causes problems and why a substantial majority of developers hate it, and how adding @ will let us solve these problems. Whether we actually get rid of np.matrix is a more complicated question (we'll need some sort of compatibility/transition strategy, it will depend on how quickly Python versions with @ support are adopted, etc.), but at the very least the goal is that @ eventually replace it in all new code.

-n

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Participants (2): Colin J. Williams, Nathaniel Smith