[help needed] associativity and precedence of '@'
Hi all,

Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. I'll explain what that means so we're on the same page, and what the choices are, and then we can all argue about it. But even better would be if we could get some data to guide our decision, and this would be a lot easier if some of you all can help; I'll suggest some ways you might be able to do that.

So! Precedence and left- versus right-associativity. If you already know what these are you can skim down until you see CAPITAL LETTERS.

We all know what precedence is. Code like this:
  a + b * c
gets evaluated as:
  a + (b * c)
because * has higher precedence than +. It "binds more tightly", as they say. Python's complete precedence table is here:
  http://docs.python.org/3/reference/expressions.html#operator-precedence

Associativity, in the parsing sense, is less well known, though it's just as important. It's about deciding how to evaluate code like this:
  a * b * c
Do we use
  a * (b * c)    # * is "right associative"
or
  (a * b) * c    # * is "left associative"
? Here all the operators have the same precedence (because, uh... they're the same operator), so precedence doesn't help. And mostly we can ignore this in day-to-day life, because both versions give the same answer, so who cares. But a programming language has to pick one (consider what happens if one of those objects has a non-default __mul__ implementation). And of course it matters a lot for non-associative operations like
  a - b - c
or
  a / b / c

So when figuring out order of evaluation, what you do first is check the precedence, and then if you have multiple operators next to each other with the same precedence, you check their associativity. Notice that this means that if you have different operators that share the same precedence level (like + and -, or * and /), then they all have to have the same associativity. All else being equal, it's generally considered nice to have fewer precedence levels, because these have to be memorized by users.

Right now in Python, every precedence level is left-associative, except for '**'. If you write these formulas without any parentheses, then what the interpreter will actually execute is:
  (a * b) * c
  (a - b) - c
  (a / b) / c
but
  a ** (b ** c)

Okay, that's the background. Here's the question. We need to decide on precedence and associativity for '@'. In particular, there are three different options that are interesting:

OPTION 1 FOR @:
Precedence: same as *
Associativity: left
My shorthand name for it: "same-left" (yes, very creative)
This means that if you don't use parentheses, you get:
  a @ b @ c  ->  (a @ b) @ c
  a * b @ c  ->  (a * b) @ c
  a @ b * c  ->  (a @ b) * c

OPTION 2 FOR @:
Precedence: more-weakly-binding than *
Associativity: right
My shorthand name for it: "weak-right"
This means that if you don't use parentheses, you get:
  a @ b @ c  ->  a @ (b @ c)
  a * b @ c  ->  (a * b) @ c
  a @ b * c  ->  a @ (b * c)

OPTION 3 FOR @:
Precedence: more-tightly-binding than *
Associativity: right
My shorthand name for it: "tight-right"
This means that if you don't use parentheses, you get:
  a @ b @ c  ->  a @ (b @ c)
  a * b @ c  ->  a * (b @ c)
  a @ b * c  ->  (a @ b) * c

We need to pick which of these options we think is best, based on whatever reasons we can think of, ideally more than "hmm, weak-right gives me warm fuzzy feelings" ;-).
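To make the associativity choice visible, here is a tiny sketch (a hypothetical Trace class, nothing to do with NumPy) whose __mul__ simply records the grouping the parser picked:

    class Trace:
        """Hypothetical helper: wraps a label and records how * groups."""
        def __init__(self, label):
            self.label = label

        def __mul__(self, other):
            # Python evaluates a chain like a * b * c using its associativity
            # rule, so the label that comes out shows the implied grouping.
            return Trace("({} * {})".format(self.label, other.label))

    a, b, c = Trace("a"), Trace("b"), Trace("c")
    print((a * b * c).label)    # prints ((a * b) * c), i.e. * is left associative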
(In principle the other 2 possible options are tight-left and weak-left, but there doesn't seem to be any argument in favor of either, so we'll leave them out of the discussion.) Some things to consider: * and @ are actually not associative (in the math sense) with respect to each other, i.e., (a * b) @ c and a * (b @ c) in general give different results when 'a' is not a scalar. So considering the two expressions 'a * b @ c' and 'a @ b * c', we can see that each of these three options gives produces different results in some cases. "Same-left" is the easiest to explain and remember, because it's just, "@ acts like * and /". So we already have to know the rule in order to understand other non-associative expressions like a / b / c or a - b - c, and it'd be nice if the same rule applied to things like a * b @ c so we only had to memorize *one* rule. (Of course there's ** which uses the opposite rule, but I guess everyone internalized that one in secondary school; that's not true for * versus @.) This is definitely the default we should choose unless we have a good reason to do otherwise. BUT: there might indeed be a good reason to do otherwise, which is the whole reason this has come up. Consider: Mat1 @ Mat2 @ vec Obviously this will execute much more quickly if we do Mat1 @ (Mat2 @ vec) because that results in two cheap matrix-vector multiplies, while (Mat1 @ Mat2) @ vec starts out by doing an expensive matrix-matrix multiply. So: maybe @ should be right associative, so that we get the fast behaviour without having to use explicit parentheses! /If/ these kinds of expressions are common enough that having to remember to put explicit parentheses in all the time is more of a programmer burden than having to memorize a special associativity rule for @. Obviously Mat @ Mat @ vec is more common than vec @ Mat @ Mat, but maybe they're both so rare that it doesn't matter in practice -- I don't know. Also, if we do want @ to be right associative, then I can't think of any clever reasons to prefer weak-right over tight-right, or vice-versa. For the scalar multiplication case, I believe both options produce the same result in the same amount of time. For the non-scalar case, they give different answers. Do people have strong intuitions about what expressions like a * b @ c a @ b * c should do actually? (I'm guessing not, but hey, you never know.) And, while intuition is useful, it would be really *really* nice to be basing these decisions on more than *just* intuition, since whatever we decide will be subtly influencing the experience of writing linear algebra code in Python for the rest of time. So here's where I could use some help. First, of course, if you have any other reasons why one or the other of these options is better, then please share! But second, I think we need to know something about how often the Mat @ Mat @ vec type cases arise in practice. How often do non-scalar * and np.dot show up in the same expression? How often does it look like a * np.dot(b, c), and how often does it look like np.dot(a * b, c)? How often do we see expressions like np.dot(np.dot(a, b), c), and how often do we see expressions like np.dot(a, np.dot(b, c))? This would really help guide the debate. I don't have this data, and I'm not sure the best way to get it. A super-fancy approach would be to write a little script that uses the 'ast' module to count things automatically. 
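A rough sketch of what such a counting script might look like (a hypothetical helper; it only recognizes calls literally spelled dot or np.dot, and only the two nesting patterns):

    import ast
    import sys

    def count_dot_nesting(source):
        """Count dot(dot(a, b), c) vs dot(a, dot(b, c)) patterns in some source."""
        def is_dot(node):
            return isinstance(node, ast.Call) and (
                (isinstance(node.func, ast.Name) and node.func.id == "dot")
                or (isinstance(node.func, ast.Attribute) and node.func.attr == "dot"))

        left = right = 0
        for node in ast.walk(ast.parse(source)):
            if is_dot(node) and len(node.args) == 2:
                if is_dot(node.args[0]):
                    left += 1    # dot(dot(a, b), c)
                if is_dot(node.args[1]):
                    right += 1   # dot(a, dot(b, c))
        return left, right

    if __name__ == "__main__":
        for fname in sys.argv[1:]:
            with open(fname) as f:
                print(fname, count_dot_nesting(f.read()))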
A less fancy approach would be to just pick some code you've written, or a well-known package, grep through for calls to 'dot', and make notes on what you see. (An advantage of the less-fancy approach is that as a human you might be able to tell the difference between scalar and non-scalar *, or check whether it actually matters what order the 'dot' calls are done in.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Hi all,

Let me preface my two cents by saying that I think the best part of @ being accepted is the potential for deprecating the matrix class -- the syntactic beauty of infix for matrix multiply is a nice side effect IMHO :) This may be why my basic attitude is:

I don't think it matters very much but I would vote (weakly) for weak-right. Where there is ambiguity, I suspect most practitioners will just put in parentheses anyway -- especially with combinations of * and @, where I don't think there is a natural intuitive precedence relationship. At least, element-wise multiplication is very rare in math/physics texts as an explicitly defined elementary operation, so I'd be surprised if anybody had a strong intuition about the precedence of the '*' operator. And the binding order doesn't matter if it is scalar multiplication.

I have quite a bit of code with large matrices where the order of matrix-vector multiplies is an important optimization, and I would certainly have a few simpler-looking expressions for op @ op @ vec, hence the weak preference for right-associativity. That said, I routinely come across situations where the optimal matrix multiplication order is more complicated than can be expressed as left-right or right-left (because some matrices might be diagonal, CSR or CSC), which is why the preference is only weak. I don't see a down-side in the use-case where it is actually associative (as in matrix-matrix-vector).

Best, Chris

-- Chris Laumann Sent with Airmail
On Fri, Mar 14, 2014 at 9:15 PM, Chris Laumann <chris.laumann@gmail.com>wrote:
Hi all,
Let me preface my two cents by saying that I think the best part of @ being accepted is the potential for deprecating the matrix class -- the syntactic beauty of infix for matrix multiply is a nice side effect IMHO :) This may be why my basic attitude is:
I don't think it matters very much but I would vote (weakly) for weak-right. Where there is ambiguity, I suspect most practitioners will just put in parentheses anyway -- especially with combinations of * and @, where I don't think there is a natural intuitive precedence relationship.
At least, element-wise multiplication is very rare in math/physics texts as
an explicitly defined elementary operation so I'd be surprised if anybody had a strong intuition about the precedence of the '*' operator.
My take on this is that if you mix * and @, you are probably using * to build the matrices you want to __matmul__ with @. So weak-right would be the way to go from that point of view. Jaime
Hi Chris, On Sat, Mar 15, 2014 at 4:15 AM, Chris Laumann <chris.laumann@gmail.com> wrote:
Hi all,
Let me preface my two cents by saying that I think the best part of @ being accepted is the potential for deprecating the matrix class — the syntactic beauty of infix for matrix multiply is a nice side effect IMHO :) This may be why my basic attitude is:
I don’t think it matters very much but I would vote (weakly) for weak-right. Where there is ambiguity, I suspect most practitioners will just put in parentheses anyway — especially with combinations of * and @, where I don’t think there is a natural intuitive precedence relationship. At least, element-wise multiplication is very rare in math/physics texts as an explicitly defined elementary operation so I’d be surprised if anybody had a strong intuition about the precedence of the ‘*’ operator. And the binding order doesn’t matter if it is scalar multiplication.
"It doesn't matter" and "no-one has strong intuitions" are generally arguments for same-left, since that allows everyone to reason about @ in the same way they reason about all of Python's operators.
I have quite a bit of code with large matrices where the order of matrix-vector multiplies is an important optimization and I would certainly have a few simpler looking expressions for op @ op @ vec, hence the weak preference for right-associativity. That said, I routinely come across situations where the optimal matrix multiplication order is more complicated than can be expressed as left-right or right-left (because some matrices might be diagonal, CSR or CSC), which is why the preference is only weak. I don’t see a down-side in the use-case that it is actually associative (as in matrix-matrix-vector).
Would you mind taking a more systematic look through this code, or sharing some examples so the rest of us can look? "Certainly have a few simpler looking expressions" is a good start, but when we're talking about changing the grammar of one of the most popular programming languages in the world it seems worth the effort to gather some more careful data :-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
I tend to favor tight-right. The general scheme of precedence more or less puts "heavier" operations higher than "lighter" operations (+ < * < **), and @ is "heavier" than * in my mind. I think tight (either -right or -left) has a good correspondence with current dot() expressions, so it will make translation a bit more straightforward, IMO.

  s * dot(A, b) == s * A @ b
  dot(s * A, b) == (s * A) @ b
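To get a concrete feel for why the grouping matters when s is not a scalar, here is a small sketch using dot() (since @ doesn't exist yet); the two groupings agree for scalar s but differ for an elementwise-array s:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[5., 6.], [7., 8.]])

    # Scalar s: both groupings give the same answer.
    c = 2.0
    print(np.allclose(c * A.dot(B), (c * A).dot(B)))   # True

    # Elementwise-array s: the groupings differ in general.
    s = np.array([[1., 10.], [100., 1000.]])
    print(np.allclose(s * A.dot(B), (s * A).dot(B)))   # False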
-- Robert Kern
On Sat, Mar 15, 2014 at 11:44 AM, Robert Kern <robert.kern@gmail.com> wrote:
I tend to favor tight-right. The general scheme of precedence more or less puts "heavier" operations higher than "lighter" operations (+ < * < **) and @ is "heavier" than * in my mind. I think tight (either -right or -left) has a good correspondence with current dot() expressions, so it will make translation a bit more straightforward, IMO.
  s * dot(A, b) == s * A @ b
  dot(s * A, b) == (s * A) @ b
I'm not sure if this is a convincing argument, but I'll throw it out there: in most of my programming fonts, @ is a bigger, more visually salient character than *, so in any expression that mixes the two, my visual system is going to end up grouping the @-connected terms anyways. -- Robert Kern
I favor the weak right option.

1) Giving '*' higher precedence than `@` makes it easier, to my mind, to parse out what is going to happen: all the element-wise multiplications, followed by the matrix operations. I'd probably still use parentheses for clarity.

2) Right associative has the advantage of efficiency in many common use cases, plus I tend to read matrix expressions from right to left.

Chuck
On Sat, Mar 15, 2014 at 2:49 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I favor the weak right option.
1) Giving '*' higher precedence than `@` makes it easier, to my mind, to parse out what is going to happen: all the element-wise multiplications, followed by the matrix operations. I'd probably still use parenthesis for clarity.
It seems to me that 'tight' gives the same benefit. Any reasoning for the preference of 'weak' over 'tight'? -- Robert Kern
On Sat, Mar 15, 2014 at 9:58 AM, Robert Kern <robert.kern@gmail.com> wrote:
On Sat, Mar 15, 2014 at 2:49 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I favor the weak right option.
1) Giving '*' higher precedence than `@` makes it easier, to my mind, to parse out what is going to happen: all the element-wise multiplications, followed by the matrix operations. I'd probably still use parenthesis for clarity.
It seems to me that 'tight' gives the same benefit. Any reasoning for the preference of 'weak' over 'tight'?
Two other reasons come to mind. First, '*' is right associative, so I think it is nicer to first view the expression as parsed into blocks separated by '@', which act somewhat like parenthesis at that point, and then evaluate the blocks. Second, and somewhat weaker, it might make it easier to track how arrays are broadcast. Chuck
Oops, make that '*' is *left* associative. <snip> Chuck
On Sat, Mar 15, 2014 at 4:40 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Mar 15, 2014 at 9:58 AM, Robert Kern <robert.kern@gmail.com> wrote:
On Sat, Mar 15, 2014 at 2:49 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I favor the weak right option.
1) Giving '*' higher precedence than `@` makes it easier, to my mind, to parse out what is going to happen: all the element-wise multiplications, followed by the matrix operations. I'd probably still use parenthesis for clarity.
It seems to me that 'tight' gives the same benefit. Any reasoning for the preference of 'weak' over 'tight'?
Two other reasons come to mind. First, '*' is right associative, so I think it is nicer to first view the expression as parsed into blocks separated by '@', which act somewhat like parenthesis at that point, and then evaluate the blocks.
Again, I think tight does the same amount of separation, just with blocks of matrix multiplication broken up by elementwise multiplication; this is just an argument against 'same', not for 'weak' over 'tight'. As I mentioned elsewhere, my visual system seems to break things up with @-tight anyways. Does it not for you?
Second, and somewhat weaker, it might make it easier to track how arrays are broadcast.
I think this point is going to ultimately determine this, if we can break down all the cases and if they actually do favor one choice over another. -- Robert Kern
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
I am not ready to form my own opinion, but I hope the following will help shape the discussion.

Currently [1], Python operator precedence is (from lower to higher):

  +, -                                     Addition and subtraction
  *, /, //, %                              Multiplication, division, remainder
  +x, -x, ~x                               Positive, negative, bitwise NOT
  **                                       Exponentiation
  x[index], x[index:index],
  x(arguments...), x.attribute             Subscription, slicing, call, attribute reference

We need to decide whether @ belongs to one of the existing rows or deserves one of its own.

The associativity debate is one of those debates [2] where there is no right answer. Guido has very wisely left it for the numeric community to decide. I would start with surveying the prior art of using right associativity and the reasons it was chosen, and see if those reasons apply. (An example of a choice made for wrong reasons is our decimal system. We write our numbers backwards - from high to low place value - only because we took them from people who write text from right to left. As a result, computer parsers have to skip to the last digit or count the number of digits before they can start evaluating the number.) Here is the start:

1. APL uses right-to-left associativity for all operators, and all operators have the same precedence.
2. The exponentiation operator is right associative in most languages, with MATLAB being a notable exception.

[1] http://docs.python.org/3/reference/expressions.html#evaluation-order
[2] http://en.wikipedia.org/wiki/Lilliput_and_Blefuscu
[3] http://www.tcl.tk/cgi-bin/tct/tip/274.html
On Sat, Mar 15, 2014 at 2:25 PM, Alexander Belopolsky <ndarray@mac.com>wrote:
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
I am not ready to form my own opinion, but I hope the following will help shaping the discussion.
One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator.
On Sat, Mar 15, 2014 at 1:01 PM, Alexander Belopolsky <ndarray@mac.com>wrote:
On Sat, Mar 15, 2014 at 2:25 PM, Alexander Belopolsky <ndarray@mac.com>wrote:
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
I am not ready to form my own opinion, but I hope the following will help shaping the discussion.
One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator.
My impression is that the pyoperator folks would prefer right associativity as it corresponds to function composition, which also proceeds right to left. I don't think the sympy folks have expressed an opinion, except perhaps that they are more in the sage camp where matrices are symbolic and not to be confused with arrays. That is, they don't depend on having two operators, one for the Hadamard product and another for the matrix product. Chuck
On Sat, Mar 15, 2014 at 7:01 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Sat, Mar 15, 2014 at 2:25 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
I am not ready to form my own opinion, but I hope the following will help shaping the discussion.
One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator.
The general rule in Python is that in a binary operation A # B, then first we try A.__special__, and if that doesn't exist or it returns NotImplemented, then we try B.__rspecial__. (The exception is that if B.__class__ is a proper subclass of A.__class__, then we do it in the reverse order.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
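A minimal sketch of that dispatch rule (hypothetical classes, written with * and __mul__/__rmul__ since @ doesn't exist yet; the same protocol would apply to __matmul__/__rmatmul__):

    class A:
        def __mul__(self, other):
            return "A.__mul__"
        def __rmul__(self, other):
            return "A.__rmul__"

    class B:
        def __mul__(self, other):
            return NotImplemented    # decline; Python falls back to the other operand
        def __rmul__(self, other):
            return "B.__rmul__"

    class SubA(A):                   # a proper subclass of A that overrides the reflected method
        def __mul__(self, other):
            return "SubA.__mul__"
        def __rmul__(self, other):
            return "SubA.__rmul__"

    print(A() * B())      # A.__mul__     -- the left operand is tried first
    print(B() * A())      # A.__rmul__    -- B.__mul__ declined, so the reflected method runs
    print(A() * SubA())   # SubA.__rmul__ -- the subclass on the right goes first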
On Mon, Mar 17, 2014 at 11:48 AM, Nathaniel Smith <njs@pobox.com> wrote:
One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator.
The general rule in Python is that in a binary operation A # B, then first we try A.__special__, and if that doesn't exist or it returns NotImplemented, then we try B.__rspecial__. (The exception is that if B.__class__ is a proper subclass of A.__class__, then we do it in the reverse order.)
This is the simple case. My question was: "what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__?" Are we going to recommend that other projects adopt numpy's __array_priority__? In mixed-type expressions, do you expect A @ B @ C to have the type of A, B, or C? Does the __matmul__-first-then-__rmatmul__ rule make sense if @ becomes right-associative, or should the order be reversed?
On Mon, Mar 17, 2014 at 4:09 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Mon, Mar 17, 2014 at 11:48 AM, Nathaniel Smith <njs@pobox.com> wrote:
One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator.
The general rule in Python is that in a binary operation A # B, then first we try A.__special__, and if that doesn't exist or it returns NotImplemented, then we try B.__rspecial__. (The exception is that if B.__class__ is a proper subclass of A.__class__, then we do it in the reverse order.)
This is the simple case. My question was: "what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__?"
The point of associativity is that the complex case A @ B @ C gets turned into either A @ (B @ C) or else (A @ B) @ C, and then you're back in the simple case.
Are we going to recommend that other projects adopt numpy's __array_priority__?
In mixed-type expressions, do you expect A @ B @ C to have type of A, B, or C?
Does __matmul__ first then __rmatmul__ rule makes sense if @ becomes right-associative or should the order be reversed?
** is right-associative and uses the left-then-right rule, so it seems fine to me. In general the left-then-right rule has no particular logic behind it, it's just chosen so as to have *some* rule. In practice all well-behaved classes have to make sure that they implement __special__ methods in such a way that all the different variations work, no matter which class ends up actually handling the operation. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
In practice all well-behaved classes have to make sure that they implement __special__ methods in such a way that all the different variations work, no matter which class ends up actually handling the operation.
"Well-behaved classes" are hard to come by in practice. The @ operator may fix the situation with np.matrix, so take a look at MaskedArray with its 40-line __array_wrap__ and no end of bugs. Requiring superclass __method__ to handle creation of subclass results correctly is turning Liskov principle on its head. With enough clever tricks and tight control over the full class hierarchy you can make it work in some cases, but it is not a good design. I am afraid that making @ special among other binary operators that implement mathematically associative operations will create a lot of confusion. (The pow operator is special because the corresponding mathematical operation is non-associative.) Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a @ (b @ c). What are the chances that they will correctly figure out what a // b // c means after this?
On Mon, Mar 17, 2014 at 12:50 PM, Alexander Belopolsky <ndarray@mac.com>wrote:
On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
In practice all well-behaved classes have to make sure that they implement __special__ methods in such a way that all the different variations work, no matter which class ends up actually handling the operation.
"Well-behaved classes" are hard to come by in practice. The @ operator may fix the situation with np.matrix, so take a look at MaskedArray with its 40-line __array_wrap__ and no end of bugs.
Requiring superclass __method__ to handle creation of subclass results correctly is turning Liskov principle on its head. With enough clever tricks and tight control over the full class hierarchy you can make it work in some cases, but it is not a good design.
I am afraid that making @ special among other binary operators that implement mathematically associative operations will create a lot of confusion. (The pow operator is special because the corresponding mathematical operation is non-associative.)
Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a @ (b @ c). What are the chances that they will correctly figure out what a // b // c means after this?
One case where we need to keep track of left or right is type promotion:

>>> a.shape
(100,)
>>> 1. * a.dot(a)
-98.0
>>> (1.*a).dot(a)
328350.0
>>> a.dtype
dtype('int8')

>>> 1. * a @ a
???

similar to

>>> 1. * 2 / 3
0.6666666666666666
>>> 1. * (2 / 3)  # I'm not in the `future`
0.0

Josef
I thought of sending a message saying I'm +-1 on either, but I'm not.

I'm again in favor of "left", because it's the simplest to understand: A.dot(B).dot(C) with some * mixed in.

I understand now the computational argument in favor of right:
  x @ inv(x.T @ x) @ x.T @ y    (with shapes (T,k) (k,k) (k,T) (T,1))
or
  x @ pinv(x) @ y               (with shapes (T,k) (k,T) (T,1))
with T >> k (the last 1 could be an m > 1 with T >> m).

However, we don't write code like that most of the time. Alan's students won't care much if some intermediate arrays blow up. In library code like in statsmodels it's almost always a conscious choice of where to set the parentheses and, more often, which part of a long array expression is taken out as a temporary or permanent variable.

I think almost the only use of chain_dot(A, B, C) (which is "right") is for quadratic forms:

  xtxi = pinv(np.dot(exog.T, exog))               # (k,k)
  xtdx = np.dot(exog.T * d[np.newaxis, :], exog)  # (k,k)
  vcov = chain_dot(xtxi, xtdx, xtxi)              # (k,k), (k,k), (k,k)

(from Quantreg). I think optimizing this way is relatively easy.

On the other hand, I worry a lot more about messy cases with different dtypes or different classes involved, as Alexander has pointed out. Cases that might trip up medium to medium-advanced numpy users.

(Let's see, I have to read @ back to front, and * front to back, and why did I put a sparse matrix in the middle and a masked array at the end. Oh no, that's not a masked array, it's a panda.)

compared to

(Somewhere there is a mistake, let's go through all the terms from the beginning to the end.)

Josef
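To put rough numbers on the computational argument, here is a small sketch (illustrative sizes only, my own example rather than anything from statsmodels) comparing the two groupings of the x @ pinv(x) @ y case with dot():

    import numpy as np
    from timeit import timeit

    T, k = 2000, 5
    x = np.random.randn(T, k)
    y = np.random.randn(T, 1)
    px = np.linalg.pinv(x)                   # shape (k, T)

    def left():
        # "left" grouping: x @ pinv(x) builds a big (T, T) intermediate first
        return x.dot(px).dot(y)

    def right():
        # "right" grouping: only small (k, 1)-sized intermediates
        return x.dot(px.dot(y))

    print(timeit(left, number=10))
    print(timeit(right, number=10))
    print(np.allclose(left(), right()))      # same answer, very different cost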
On Mon, Mar 17, 2014 at 2:55 PM, <josef.pktd@gmail.com> wrote:
I'm again in favor of "left", because it's the simplest to understand A.dot(B).dot(C)
+1

Note that for many years to come the best option for repeated matrix products will be A.dot(B).dot(C) ... People who convert their dot(dot(dot(...)))'s to the more readable method-call syntax now should not be forced to change the order or add parentheses when they switch to @. (Full disclosure: I am one of those people, having recently converted a large Numeric-based project to NumPy.)
On Mon, Mar 17, 2014 at 3:48 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Mar 15, 2014 at 7:01 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator.
The general rule in Python is that in a binary operation A # B, then first we try A.__special__, and if that doesn't exist or it returns NotImplemented, then we try B.__rspecial__. (The exception is that if B.__class__ is a proper subclass of A.__class__, then we do it in the reverse order.)
Assuming that all combinations are possible and give no error:

  A @ B @ C == A.__matmul__(B.__matmul__(C))    # right
  A @ B @ C == A.__matmul__(B).__matmul__(C)    # left

Did you want to specify which permutations of X.__matmul__(Y) return NotImplemented?

-- Robert Kern
On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
Another data point that might be useful:

  Matlab: same-left
  R: tight-left
  IDL: same-left
  GAUSS: same-left (IIUC -- any GAUSS experts please correct me if I misunderstood the fine manual)
  Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c])

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Sat, Mar 15, 2014 at 1:28 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
Another data point that might be useful:
Matlab: same-left
R: tight-left
I was going to ask this earlier, but I was worried I was missing something major. Why was "tight-left" not an option?

This means that if you don't use parentheses, you get:
  a @ b @ c -> (a @ b) @ c
  a * b @ c -> a * (b @ c)
  a @ b * c -> (a @ b) * c

In my (very inexperienced) opinion, it seems like the most intuitive option.

Cheers, -Joe
IDL: same-left
GAUSS: same-left (IIUC -- any GAUSS experts please correct me if I misunderstood the fine manual)
Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c])
On Sat, Mar 15, 2014 at 6:33 PM, Joe Kington <joferkington@gmail.com> wrote:
On Sat, Mar 15, 2014 at 1:28 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be.
Another data point that might be useful:
Matlab: same-left
R: tight-left
I was going to ask this earlier, but I was worried I was missing something major.
Why was "tight-left" not an option?
This means that if you don't use parentheses, you get:
  a @ b @ c -> (a @ b) @ c
  a * b @ c -> a * (b @ c)
  a @ b * c -> (a @ b) * c
In my (very inexperienced) opinion, it seems like the most intuitive option.
Because tight-left doesn't seem to have much to recommend it over same-left, and all else being equal having fewer levels of precedence is usually considered a good thing. Unless I'm missing something. If we do decide that tight-left is best then we could certainly advocate for it. I wouldn't read too much into R's choice; they don't actually define a separate precedence level for matrix multiplication specifically. They have a single precedence level for all "special" (user-defined) operators, and matrix multiplication happens to be one of these. (Their versions of // and % are also "special", but I don't think anyone would expect // to bind more tightly than / if one were choosing precedences on a case-by-case basis.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Just to throw something new into the mix:

  u@v@w = u@(v@w) -- u@v is a dyadic matrix
  u@v -- is a scalar

It would be nice if u@v@None, or some such, would evaluate as a dyad. Or else we will still need the concept of row and column 1-D matrices. I still think v.T should set a flag so that one can distinguish u@v.T (dyad) from u.T@v (inner product), where 1-D arrays are normally treated as column vectors.

Chuck
On 15 Mar 2014 19:02, "Charles R Harris" <charlesr.harris@gmail.com> wrote:

Just to throw something new into the mix:

  u@v@w = u@(v@w) -- u@v is a dyadic matrix
  u@v -- is a scalar

It would be nice if u@v@None, or some such, would evaluate as a dyad. Or else we will still need the concept of row and column 1-D matrices. I still think v.T should set a flag so that one can distinguish u@v.T (dyad) from u.T@v (inner product), where 1-D arrays are normally treated as column vectors.

This sounds important but I have no idea what any of it means :-) (What's a dyadic matrix?) Can you elaborate?

-n
On Sat, Mar 15, 2014 at 3:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
It would be nice if u@v@None, or some such, would evaluate as a dyad. Or else we will still need the concept of row and column 1-D matrices. I still think v.T should set a flag so that one can distinguish u@v.T (dyad) from u.T@v (inner product), where 1-D arrays are normally treated as column vectors.
This sounds important but I have no idea what any of it means :-) (What's a dyadic matrix?) Can you elaborate?
I assume dyadic means 2d.

This discussion gave me an idea that is only tangentially relevant to the discussion at hand. It looks like numpy operators commonly need to make a choice whether to treat an Nd array as a unit (atom) or as a list to broadcast itself over.

APL-derived languages solve this problem by using operator modifiers. Applied to our case, given a dot-product operator @, an each[@] operator works on 2d arrays by "dotting" them pair-wise and returning a 1d array. Similarly, eachleft[@] would operate on 2d, 1d operands by broadcasting itself over the left operand (incidentally reproducing the mat @ vec behavior) and eachright[@] would treat its left operand atomically and broadcast over the right operand.

My idea is inspired by Guido's "use facade" suggestion. We can define an ndarray.each(axes=(0,)) method that would return a light-weight proxy object so that

  a each[@] b       is spelled  a.each() @ b.each()
  a eachleft[@] b   is spelled  a.each() @ b
  a eachright[@] b  is spelled  a @ b.each()
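For what it's worth, here is a very rough sketch of what such a facade could look like (a hypothetical Each wrapper, using dot() in place of the not-yet-existing @; not a proposal for the actual API):

    import numpy as np

    class Each:
        """Hypothetical facade: mark an array as 'iterate over my first axis'."""
        def __init__(self, arr):
            self.arr = np.asarray(arr)

        def dot(self, other):
            # Stand-in for what Each.__matmul__ might do once @ exists.
            if isinstance(other, Each):
                # each[@]: pair the stacks up and "dot" them pair-wise
                return np.array([a.dot(b) for a, b in zip(self.arr, other.arr)])
            # eachleft[@]: iterate over the wrapped (left) operand,
            # treating the right operand atomically
            return np.array([a.dot(np.asarray(other)) for a in self.arr])

    stack = np.arange(24.).reshape(4, 2, 3)    # four 2x3 matrices
    vecs = np.arange(12.).reshape(4, 3)        # four length-3 vectors
    print(Each(stack).dot(Each(vecs)).shape)   # (4, 2) -- dotted pair-wise
    print(Each(stack).dot(np.ones(3)).shape)   # (4, 2) -- one vector applied across the stack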
On Sat, Mar 15, 2014 at 1:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 15 Mar 2014 19:02, "Charles R Harris" <charlesr.harris@gmail.com> wrote:
Just to throw something new into the mix
u@v@w = u@(v@w) -- u@v is a dyadic matrix
u@v -- is a scalar
It would be nice if u@v@None, or some such, would evaluate as a dyad. Or else we will still need the concept of row and column 1-D matrices. I still think v.T should set a flag so that one can distinguish u@v.T(dyad) from u.T@v(inner product), where 1-D arrays are normally treated as column vectors.
This sounds important but I have no idea what any of it means :-) (What's a dyadic matrix?) Can you elaborate?
Dyadic matrices date back to the beginning of vector calculus and J. W. Gibbs. These days they are usually written as v*w.T, i.e., the outer product of two vectors and are a fairly common occurrence in matrix expressions. For instance, covariance matrices are defined as E(v * v.T) Chuck
On Sat, Mar 15, 2014 at 4:00 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
These days they are usually written as v*w.T, i.e., the outer product of two vectors and are a fairly common occurrence in matrix expressions. For instance, covariance matrices are defined as E(v * v.T)
With the current numpy, we can do
>>> x = arange(1, 5)
>>> x[:,None].dot(x[None,:])
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
I assume once @ becomes available, we will have
>>> x[:,None] @ x[None,:]
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
On Sat, Mar 15, 2014 at 2:12 PM, Alexander Belopolsky <ndarray@mac.com>wrote:
On Sat, Mar 15, 2014 at 4:00 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
These days they are usually written as v*w.T, i.e., the outer product of two vectors and are a fairly common occurrence in matrix expressions. For instance, covariance matrices are defined as E(v * v.T)
With the current numpy, we can do
>>> x = arange(1, 5)
>>> x[:,None].dot(x[None,:])
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
I assume once @ becomes available, we will have
>>> x[:,None] @ x[None,:]
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
Yes, that works. I was thinking more of easy translation of the forms found in textbooks. The Householder reflection, for instance, is usually written as I - 2 * v * v.T, where v is a unit vector. Chuck
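For instance, a quick sketch of that textbook form in today's numpy (outer() standing in for the v * v.T dyad; v is assumed to be a unit vector):

    import numpy as np

    v = np.array([1., 2., 2.])
    v = v / np.linalg.norm(v)                  # make v a unit vector

    H = np.eye(3) - 2 * np.outer(v, v)         # Householder reflection I - 2 v v.T

    print(np.allclose(H.dot(v), -v))           # reflects v to -v: True
    print(np.allclose(H.dot(H), np.eye(3)))    # applying it twice gives back I: True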
On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <njs@pobox.com> wrote:
Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c])
So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up more. (It would need to make it past python-dev, of course, but worst case is just that they say no and we're back where we are now, so we might as well think it through.)

Here's how it would work:

Currently Python has 3 different kinds of ops: left-associative (most of them), right-associative (**), and "chaining". Chaining is used for comparison ops. Example:

a < b < c

gets parsed to something like

do_comparison(args=[a, b, c], ops=[lt, lt])

Notice this is very different from either of

(a < b) < c
a < (b < c)

Which means that comparisons aren't left- OR right-associative, they're this other thing, "chaining".

So we could propose adding a 4th kind of op, called "grouping", which would apply only to @. The idea is that

a @ b @ c

would be equivalent to

operator.matmul((a, b, c))

which eventually (see below) becomes a call to

a.__matmul__((a, b, c))

We'd use exactly the same parsing rules as the chaining ops, so you can still control evaluation order with parentheses if you want:

a @ (b @ c)  ->  matmul((a, matmul((b, c))))
(a @ b) @ c  ->  matmul((matmul((a, b)), c))

...but if you don't specify, then each contiguous group of @ operators gets collected up and handed to __matmul__ together, and the __matmul__ implementation gets to decide which evaluation strategy to use.

It's trivially fast for the computer to figure out the best evaluation order for matrix multiplication, so in practice I think this would mean that you could just stop worrying about parentheses for multiple contiguous calls to matmul. Fancier versions of __matmul__ defined on more specialized non-ndarray classes might even take into account their specialized knowledge of how expensive different evaluation orders are for their specific type -- I'm not sure if this actually happens in practice, but it might. (Like maybe the best way to evaluate a @ b @ c depends on the sparsity pattern in the various matrices, or maybe it depends on which matrices are currently on the GPU and which are in memory? Anyone have any real examples of this?)

(Of course, this same evaluation-order problem arises for *all* expressions using numpy; being able to optimize whole expressions at a time is exactly what numexpr/numba/theano/etc. are useful for. So one could argue that "baking it in" to @ is pointless: anyone who gets tired of writing parentheses should just use one of these libraries. Or maybe evaluation order problems arise so rarely for @ that no-one cares. But OTOH it would be so nice for once to just have a single best solution -- "always use @ and be happy, it just works" -- instead of all the caveats we normally give -- "@ is good in some cases, but in other cases mdot is better, or if you know you can just use @ with the right parentheses...".)

Of course, we still have to say something about what value a @ b @ c actually computes. In the PEP semantics, it isn't always associative -- specifically not if we do Mat @ vec @ Mat. So in this approach, we still need to decide what matmul((Mat, vec, Mat)) should return.

But, this is actually a feature! Because obviously what *should* be returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @ Mat). Both of those answers are terrible; it's just, if you have an ordinary left-/right-associative operator, those are your only options. What *should* be returned is an error.
And in this scheme we get to see the whole @ expression at once, so we actually can raise an error for such things.

So, this possibly has nicer performance characteristics, and also possibly has nicer semantics.

Now, how would this look in terms of the language definition?

As far as the parser and AST go, this would use exactly the same rules as the chaining ops, so that's easy.

Having parsed, we must evaluate. Likely the most contentious part of this approach is that we now have an n-arg operator, so the standard __X__/__rX__ dichotomy won't work; we need to do something like multiple dispatch. I haven't followed the debate on this issue in detail, but what I'd propose for this limited context is not to do anything like "real" multiple dispatch, but just directly generalize the familiar __rX__ rule to n arguments. The __rX__ rule is how Python's existing binary operators work: usually to evaluate a # b, you try a.__foo__ and then b.__foo__, EXCEPT if b is a proper subclass of a, you try b first. Generalized to >2 arguments, this looks like:

def matmul(args):
    # (this is the lookup that the proposed operator.matmul(args) would do)
    candidates = list(args)
    while candidates:
        candidate = pop_next_candidate(candidates)
        if hasattr(candidate, "__matmul__"):
            result = candidate.__matmul__(args)
            if result is not NotImplemented:
                return result
    raise TypeError

def pop_next_candidate(candidates):
    classes = [c.__class__ for c in candidates]
    # We'll try the left-most remaining candidate...
    for i in range(len(candidates)):
        # ...unless there's a later, untried candidate that's a proper subclass.
        if not has_proper_subclass(classes[i], classes):
            return candidates.pop(i)
    assert False

def has_proper_subclass(class_, other_classes):
    for other_class in other_classes:
        if (issubclass(other_class, class_)
                and not issubclass(class_, other_class)):
            return True
    return False

...which, it turns out, is exactly the lookup rule that __numpy_ufunc__ will use, so at least it isn't too weird from our point of view: http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html#numpy.clas...

There are still plenty of details to think about, e.g.:

What does the in-place operator do? I think a @= b @ c would have to be the same as a = a @ (b @ c) and NOT a = a @ b @ c, because otherwise what do you do with a @= b @ c + d?

I'm not sure how this would interact with implementing np.dot (which we'd still need for its out= argument, and which will, I guess, do dispatch through the __numpy_ufunc__ mechanism?). We should probably work through this in detail.

We'd still need to pick a precedence for @: grouping is different than left-associativity, so we can't do same-group, we'd have to pick either tight-group or weak-group. My gut feeling is that tight-group makes more sense than weak-group, because if @ is a magical thing that collects up a group of items, then it is helpful if there's a simple visual mapping where the group starts just to the left of the first @ and extends just to the right of the last @.

I'm still not at all sure this rigmarole is worth it -- I still think we need some data on how often people chain together multiple @ or np.dot calls.

Still, I thought I'd throw this out here and see what people think of it.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
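A quick usage example of the dispatch sketch above, with two hypothetical toy classes, showing that a subclass instance sitting to the right still gets first crack:

class Base:
    def __matmul__(self, args):
        return "Base saw %d operands" % len(args)

class Sub(Base):
    def __matmul__(self, args):
        return "Sub saw %d operands" % len(args)

matmul((Base(), Base(), Sub()))   # -> "Sub saw 3 operands"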
On Mon, Mar 17, 2014 at 8:54 PM, Nathaniel Smith <njs@pobox.com> wrote:
Currently Python has 3 different kinds of ops: left-associative (most of them), right-associative (**), and "chaining". Chaining is used for comparison ops. Example:
a < b < c
gets parsed to something like
do_comparison(args=[a, b, c], ops=[lt, lt])
The actual parse tree is more like Compare(a, [lt, lt], [b, c]), with the first argument playing a distinct role:
>>> ast.dump(ast.parse('a<b<c'), annotate_fields=False)
"Module([Expr(Compare(Name('a', Load()), [Lt(), Lt()], [Name('b', Load()), Name('c', Load())]))])"
Your idea is very interesting and IMO, worth considering independently from the @ operator. I always wanted a vector "between" operator to be available in numpy as low < x < high. The only problem I see here is with mixed types, but we can follow the pow precedent [1]: "Note that ternary pow() will not try calling __rpow__() (the coercion rules would become too complicated)." [1] http://docs.python.org/3/reference/datamodel.html#emulating-numeric-types
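For what it's worth, here is why the vectorized spelling doesn't work with plain arrays today: the chained form needs a bool() of the intermediate array. (Small illustrative example, names arbitrary.)

import numpy as np

x = np.array([1, 5, 9])
low, high = 2, 8
(low < x) & (x < high)   # array([False,  True, False]) -- the explicit spelling
# low < x < high         # raises ValueError: chaining calls bool() on (low < x),
#                        # and the truth value of a multi-element array is ambiguous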
On Mar 17, 2014 5:54 PM, "Nathaniel Smith" <njs@pobox.com> wrote:
What does the in-place operator do? I think a @= b @ c would have to be the same as a = a @ (b @ c) and NOT a = a @ b @ c because otherwise what do you do with a @= b @ c + d.
You cannot do in-place matrix multiplication without an intermediate copy of the first array, or at least of each row of the first array in turn. I don't like the idea of providing syntax that looks faster but isn't; I'd rather see @= return an error in the context of matrix multiplication.
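A tiny sketch of why: if you try to write results straight back into the first operand with no temporary at all, you clobber values you still need. (Names and shapes here are arbitrary.)

import numpy as np

a = np.array([[0., 1.], [2., 3.]])
b = np.array([[1., 2.], [3., 4.]])
expected = a @ b

bad = a.copy()
for i in range(2):
    for j in range(2):
        # overwrites bad[i, j] while the rest of row i still needs the old value
        bad[i, j] = bad[i, :] @ b[:, j]

np.allclose(bad, expected)   # False -- hence the need for at least a row copy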
On Mon, Mar 17, 2014 at 8:54 PM, Nathaniel Smith <njs@pobox.com> wrote:
But, this is actually a feature! Because obviously what *should* be returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @ Mat). Both of those answers are terrible; it's just, if you have an ordinary left-/right-associative operator, those are your only options. What *should* be returned is an error. And in this scheme we get to see the whole @ expression at once, so we actually can raise an error for such things.
Sorry if this is a little off topic. But there's still something about the "vector" examples that bugs me: "matrix@vector" and "vector@@2" keep popping up (this also applies to the matrix@matrix examples to a lesser extent).

I'm a little uncomfortable looking at the shape to decide what's a matrix and what's a vector. (Matlab has some problems like this.) If it only has one or two dimensions it's easy, but I always find that if I've written code that works for 1 matrix or vector, 5 minutes later I want it to work for fields of matrices or vectors. If we're just going by shape, there's no way to distinguish between a 2d field of matrices and a 3d field of vectors. I guess this is a repeat of part of what Eelco Hoogendoorn was saying a few posts back.

I was just wondering if anyone sees a place, to get @ a little closer to einsum, for some sort of array class that understands the difference between a 4D array of scalars, a 3D array of vectors, and a 2D array of matrices... the difference between the axes that broadcast and the axes that can sum when you hit them with an @... or something like that. Just a thought.

Einsum is fantastic by the way, totally worth learning and using.

Mark Daoust
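A small example of the ambiguity: by shape alone you can't tell a stack of vectors from a single matrix, but einsum lets you label which axes are "field" axes and which get contracted. (Names and shapes here are arbitrary.)

import numpy as np

mats = np.random.rand(5, 3, 3)   # a field of five 3x3 matrices
vecs = np.random.rand(5, 3)      # a field of five 3-vectors...
one_mat = np.random.rand(5, 3)   # ...which has the same shape as one 5x3 matrix
out = np.einsum('nij,nj->ni', mats, vecs)   # apply each matrix to its own vector
out.shape                                    # (5, 3)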
Perhaps this is a bit of a thread hijack, but this discussion got me thinking about how to arrive at a more vectorized/tensorified way of specifying linear algebra operations, in an elegant manner. I probably got a little carried away, but what about this syntax?

- indexing/calling an ndarray with a string returns a TensorExpression object
- these TensorExpression objects can be combined into a graph using operator overloads
- and these graphs are translated to calls to BLAS or einsum, as appropriate

#declare some symbols
i,j,ij,k = 'i','j','ij','k'

#we may force evaluation of a (sub) TensorExpression by calling it
#this is trivial to translate to a call to einsum
#but such special cases could be dispatched to BLAS as well
b = (A(ij) * x(j)) (i)

#alternatively, we can predeclare a LHS which is automatically sized later
#note that this translates into the same call as the above; just some syntactic sugar
b = np.empty(())
b[i] = A(ij) * x(j)

#more complex TensorExpression graphs of this form are trivial to translate to a call to einsum as well
a(i)*b(j)*c(k)

#conceptually, there is no need to limit this scheme to multiplications only!
#although such generalizations would require a more complex execution engine
#however, the revamped nditer should make this quite manageable to implement
a(i)*b(j) + c(k)

#if axes strings are omitted, standard numpy broadcasting rules are applied to the expression graph created
#this is identical to a*b+c, except that we have the opportunity to eliminate temporaries
a()*b()+c()

Note that such an approach kills quite a few birds with one stone: it allows for the elimination of temporaries along the lines of numexpr. But if I could write

b[i] = A[ij] * x[j]

I would much prefer that over

b = A @ x

even though the latter is shorter. Now if I had n input and output vectors, it would be easy to see what to do with them:

b[ni] = A[ij] * x[nj]

As I argued earlier, I much prefer this form of explicitness over conventions about what constitutes a row or column vector. And vectorization of linear algebra is a trivial extension in this manner, which in itself is just a subset of even more general multilinear products, which themselves are a subset of more general expressions involving things other than products.

It's a somewhat ambitious idea, and there are probably reasons why it isn't a good idea as well, but it does not require Python language modifications, and it does not clash with any other functionality or syntax of numpy, as far as I can tell. Calling of arrays is not yet defined, and alternatively array indexing could be overloaded on string type. Either way, something to chew on when deciding on the best way to go forward.
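A minimal, hypothetical sketch of the idea, just enough to show how labelled operands could be lowered to a single einsum call; the Labeled name and the constructor-based spelling here are made up for illustration:

import numpy as np

class Labeled:
    # hypothetical: an array tagged with axis labels, standing in for A('ij')
    def __init__(self, arr, axes):
        self.arr, self.axes = np.asarray(arr), axes

    def __mul__(self, other):
        # translate Labeled * Labeled into one einsum contraction
        return np.einsum('%s,%s' % (self.axes, other.axes), self.arr, other.arr)

A = np.arange(6.).reshape(2, 3)
x = np.array([1., 2., 3.])
b = Labeled(A, 'ij') * Labeled(x, 'j')   # contracts over j
np.allclose(b, A @ x)                     # True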
Just to add one vote: I am for *right association*, because 1) I think of matrix multiplication more like operators, which I also learned to apply from right to left, and because 2) I would put a vector to the right, which would result in better performance.

I don't have an opinion on tight/same/weak... (maybe that means 'same', because it's easier to remember!?)

My two cents,
Sebastian Haase
To elaborate a little on such a more general and explicit method of specifying linear operations (perhaps 'expressions with named axes' is a good name for this topic):

I think indexing rather than calling is preferable. I worried at first about the performance overhead of checking for strings at every indexing op, but ndarray.__getitem__ is already quite a complex beast anyway, and adding (yet another) type test on its args isn't a significant difference. For those who disagree, we could also approach strings with a 'forgiveness is better than permission' attitude.

The general rules could be: if there are no string args, everything works as normal. In case of string args, we may think of the effect of __getitem__ as indexing with the strings replaced by colons first, and then creating a NamedAxisIndexExpression (NAIE), associating the given string label with each corresponding axis. Thus, we can write things like A[0:3,'i'].

As some additional rules: string arguments can be 'expanded'; the string is split on commas if present, and otherwise split into characters, which are then the axis labels. In expressions, all non-labeled axes are treated in sequential order, similar to the ... construct, and have standard numpy broadcasting semantics.

The only problem with [] notation is field name lookup, though I have always felt that tables with named columns should be an ndarray subtype, given their fundamentally different indexing semantics.

Realizing the full potential of such an approach would be a complex undertaking, but to start with, a more elegant interface to np.einsum would be rather easy to implement.
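The label-expansion rule just described is simple enough to state in a few lines; the helper name here is hypothetical:

def expand_axis_labels(arg):
    # hypothetical helper: 'ij' -> ['i', 'j'], 'batch,i' -> ['batch', 'i']
    if ',' in arg:
        return arg.split(',')
    return list(arg)

expand_axis_labels('ij')        # ['i', 'j']
expand_axis_labels('batch,i')   # ['batch', 'i']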
*About weak-left.* You need to define a priority for @, the matrix product, relative to *, the elementwise product, because (A*B)@C != A*(B@C): see the example below. I say that also from a mathematical point of view. Using mathematics-like notation, Matrix1 * Matrix2 * 3 can be written because (Matrix1 * Matrix2) * 3 = Matrix1 * (Matrix2 * 3). That's why I think that weak-left is the better choice.

*About group implementation.* I think the idea of calculating A@B@C as __matmul__([A,B,C]) is a very good idea because this allows efficient implementations.

---------------------
    [1 2]
A = [3 4]

    [5 6]
B = [7 8]

    [a d]
C = [b c]
---------------------
(A*B)@C
=
[ 5 12]   [a d]
[21 32] @ [b c]
=
[ 5a+12b   5d+12c ]
[21a+32b  21d+32c ]
---------------------
A*(B@C)
=
[1 2]   [5a+6b  5d+6c]
[3 4] * [7a+8b  7d+8c]
=
[ 5a+6b   10d+12c]
[21a+24b  28d+32c]
---------------------
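The symbolic example above is easy to confirm numerically:

import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
C = np.random.rand(2, 2)
np.allclose((A * B) @ C, A * (B @ C))   # False in general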
On Tue, Mar 18, 2014 at 3:22 PM, Christophe Bal <projetmbc@gmail.com> wrote:
About weak-left. You need to define a priority of @ the matrix product regarding to * the elementwise product because (A*B)@C <> A*(B@C) : see the example above. I say that also from a mathematical point of view.
What example above?
Using mathematical like notations, Matrix1 * Matrix2 * 3 can be written because (Matrix1 * Matrix2) * 3 = Matrix1 * (Matrix2 * 3).
This seems to argue against what you just said.
That's why I think that the weak-left is the better choice.
But this is true as well: 3 * Matrix1 * Matrix2 = (3 * Matrix1) * Matrix2 = 3 * (Matrix1 * Matrix2) Does that expression argue for tight-left? -- Robert Kern
Strange, Gmail has cut my example. Here it is normally.

    [1 2]
A = [3 4]

    [5 6]
B = [7 8]

    [a d]
C = [b c]

(A*B)@C
=
[ 5 12]   [a d]
[21 32] @ [b c]
=
[ 5a+12b   5d+12c ]
[21a+32b  21d+32c ]

A*(B@C)
=
[1 2]   [5a+6b  5d+6c]
[3 4] * [7a+8b  7d+8c]
=
[ 5a+6b   10d+12c]
[21a+24b  28d+32c]
When I write "using mathematical like notations...", Matrix1 * Matrix2 is a matrix multiplication.
I'm still bothered by what Nathaniel mentioned about mixing 1d and 2d arrays
>>> c = np.arange(4)
>>> a = np.arange(16).reshape(4,4)
>>> cc = c[:,None]

>>> a.dot(c).dot(c.T)
420

>>> a.dot(c.dot(c.T))
array([[  0,  14,  28,  42],
       [ 56,  70,  84,  98],
       [112, 126, 140, 154],
       [168, 182, 196, 210]])

>>> a.dot(cc).dot(cc.T)
array([[  0,  14,  28,  42],
       [  0,  38,  76, 114],
       [  0,  62, 124, 186],
       [  0,  86, 172, 258]])

>>> a.dot(cc.dot(cc.T))
array([[  0,  14,  28,  42],
       [  0,  38,  76, 114],
       [  0,  62, 124, 186],
       [  0,  86, 172, 258]])

hint:

>>> c.dot(c.T)
14
And I expect it will be a lot more fun if we mix in some 3d or nd arrays. I think some of the decisions should not be driven by what is most convenient for the usual cases, but by how easy it is to read your code and find the bugs where we made "silly" mistakes. A biased view from someone who learned how to use numpy and scipy by debugging. Matlab and GAUSS are user friendly; they don't allow for reduced dimensions and never steal an axis in a reduce operation. I didn't manage to come up with more difficult examples (and ran out of time).
>>> (a.dot(c).dot(c.T)).dot(2*a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.int32' object has no attribute 'dot'
If we make too many mistakes, then numpy tells us. In the above examples, the scalar dot matrix would raise according to the PEP. I cannot come up with examples where we mix 3d and 2d and 1d because dot currently doesn't do it. "sane-left" Josef
On Tue, Mar 18, 2014 at 3:22 PM, Christophe Bal <projetmbc@gmail.com> wrote:
About weak-left. You need to define a priority of @ the matrix product regarding to * the elementwise product because (A*B)@C <> A*(B@C)
This doesn't follow. (a / b) * c != a / (b * c), but / and * in Python have the same priority. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
This is a different situation because / is indeed a hidden multiplication: a/b = a*inv(b). The same is true for + and -: a-b = a+opp(b). What I'm saying is that the operations * and / are indeed of the very same kind. This is not the same for * and @.
On 18 Mar 2014 17:32, "Christophe Bal" <projetmbc@gmail.com> wrote:
This is a different situation because / is indeed an hidden
multiplication : a/b = a*inv(b). The same is true for + and - : a-b=a+opp(b). What I'm saying is that these operations * and / are indeed of the very same j-kind.
This is not the same for * and @.
// (floordiv) isn't equivalent to a multiplication, but it is also at the same level. << and >> aren't inverses, but they are at the same level. 'in' and 'is' are not even the same type (they have totally different requirements on their right argument) but they are at the same level. Whatever choice we make needs to be something we can justify, and our justification should probably not imply that all of python's other operators are wrong ;-). -n
I think that there is a very big misunderstanding. My point of view is both a mathematical and a programming one.
I'm not saying that Python's choices are wrong; I'm just saying that if * is the elementwise product, then (A*B)@C != A*(B@C) should imply choosing different levels for * and @.

By choosing A*B@C = (A*B)@C, which is a convention, we just say that a human who wants to calculate something like A*B@C*D@E@G*H*K will have to start with the elementwise products and then finish with the matrix products; that is to say, this human will evaluate (A*B)@(C*D)@E@(G*H*K). Maybe you could argue for calculating the @-products first, but that would not be a good choice if * is the product of a scalar with a matrix, as in 2*A @ 3*B.

*On the other hand, if you calculate from left to right, there will be a lot of isolated @-products to do instead of a single one. This will not allow the very good grouping technique you have proposed to be used.*

1) If A*B@C*D@E@G*H*K = (A*B)@(C*D)@E@(G*H*K), you quickly evaluate X = A*B, Y = C*D and Z = G*H*K first, and then you can do an efficient @-product of X, Y and Z.

2) If you calculate from left to right, you will do three @-products on pairs without having the possibility to choose the most efficient way to evaluate the @-products.

Christophe BAL

PS1: // is an approximate calculation of the exact mathematical inversion, so it is not really a counter-example.

PS2: here is my example showing that (A*B)@C != A*(B@C) a second time.

    [1 2]
A = [3 4]

    [5 6]
B = [7 8]

    [a d]
C = [b c]

(A*B)@C
=
[ 5 12]   [a d]
[21 32] @ [b c]
=
[ 5a+12b   5d+12c ]
[21a+32b  21d+32c ]

A*(B@C)
=
[1 2]   [5a+6b  5d+6c]
[3 4] * [7a+8b  7d+8c]
=
[ 5a+6b   10d+12c]
[21a+24b  28d+32c]
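Concretely, the "reduce the elementwise products first, then hand the factors to one n-ary matrix product" strategy can already be emulated with today's numpy; np.linalg.multi_dot is used here only as a stand-in for the proposed grouped @:

import numpy as np

A, B, C, D, E, G, H, K = (np.random.rand(3, 3) for _ in range(8))
X, Y, Z = A * B, C * D, G * H * K       # elementwise products done first
r1 = np.linalg.multi_dot([X, Y, E, Z])  # n-ary product, free to pick the cheapest order
r2 = X @ Y @ E @ Z                      # plain left-to-right chain
np.allclose(r1, r2)                     # True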
On Tue, Mar 18, 2014 at 9:50 AM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:
To elaborate a little on such a more general and explicit method of specifying linear operations (perhaps 'expressions with named axes' is a good nomer to cover this topic). [...]
This is a good topic to bring up on numpy-discussion, but maybe you should start a new thread? That way it's both more likely to be noticed by interested parties, and also it will make it easier for me to keep track of what's going on in this thread, which is about a specific concrete decision we need to make ;-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <njs@pobox.com> wrote:
Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c])
So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up more. (It would need to make it past python-dev, of course, but worst case is just that they say no and we're back where we are now, so we might as well think it through.)
I predict with near-certainty that this will be rejected, but that doesn't prevent it from derailing the discussion. This proposal is unlike anything else in Python. Chained comparisons are *not* similar to this proposal. The chaining only happens at the syntax level, not the semantics. `a < b < c` gets compiled down to `a.__lt__(b) and b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`. We have approval for a binary @ operator. Take the win. -- Robert Kern
On Tue, Mar 18, 2014 at 9:14 AM, Robert Kern <robert.kern@gmail.com> wrote:
On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <njs@pobox.com> wrote:
Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c])
So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up more. (It would need to make it past python-dev, of course, but worst case is just that they say no and we're back where we are now, so we might as well think it through.)
I predict with near-certainty that this will be rejected,
I guess that's what everyone thought about @ too? ;-)
but that doesn't prevent it from derailing the discussion. This proposal is unlike anything else in Python. Chained comparisons are *not* similar to this proposal. The chaining only happens at the syntax level, not the semantics. `a < b < c` gets compiled down to `a.__lt__(b) and b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`.
Yes, the syntax is the same as chained comparisons, and the dispatch is a generalization of regular operators. It is unusual; OTOH, @ is unusual in that no other operators in Python have the property that evaluating in the wrong order can cost you seconds of time and gigabytes of memory. Perhaps.
We have approval for a binary @ operator. Take the win.
We have approval, and we have a request: that we figure out how @ should work in detail to be most useful to us. Maybe that's this proposal; maybe not. Ultimately rejected-or-not-rejected comes down to how strong the arguments for something are. And while we can make some guesses about that, it's impossible to know how strong an argument will be until one sits down and works it out. So I still would like to hear what people think, even if it just ends in the conclusion that it's a terrible idea ;-).

As for arguments against the "grouping" semantics, I did think of one other case where @ is not associative, though it's pretty weird:

In [9]: a = np.arange(16, dtype=np.int8).reshape((4, 4))

In [10]: np.dot(a, np.dot(a, a.astype(float)))
Out[10]:
array([[  1680.,   1940.,   2200.,   2460.],
       [  4880.,   5620.,   6360.,   7100.],
       [  8080.,   9300.,  10520.,  11740.],
       [ 11280.,  12980.,  14680.,  16380.]])

In [12]: np.dot(np.dot(a, a), a.astype(float))
Out[12]:
array([[ 1680.,  1940.,  2200.,  2460.],
       [-1264., -1548., -1832., -2116.],
       [ 1936.,  2132.,  2328.,  2524.],
       [-1008., -1100., -1192., -1284.]])

(What's happening is that we have int8 @ int8 @ float, so (int8 @ int8) @ float has overflows in the first computation, but int8 @ (int8 @ float) does all the computations in float, with no overflows.)

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
On Wed, Mar 19, 2014 at 2:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
We have approval, and we have a request: that we figure out how @ should work in detail to be most useful to us. Maybe that's this proposal; maybe not. Ultimately rejected-or-not-rejected comes down to how strong the arguments for something are. And while we can make some guesses about that, it's impossible to know how strong an argument will be until one sits down and works it out. So I still would like to hear what people think, even if it just ends in the conclusion that it's a terrible idea ;-).
What happens if you have 5 @ in a row? My head hurts if I have to think about what would actually be going on. And don't forget, the sparse matrix is stuck in the middle. But I would be happy to have an optimizing multi_dot or chain_dot function when it feels safe enough.
As for arguments against the "grouping" semantics, I did think of one another case where @ is not associative, though it's pretty weird:
<snip>
That's similar to my example before that mixes in some scalar *. I thought of it as an argument for same-left.

Josef
On 03/19/2014 08:45 PM, josef.pktd@gmail.com wrote:
On Wed, Mar 19, 2014 at 2:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
What happens if you have 5 @ in a row?
My head hurts if I had to think about what would actually be going on. and don't forget, the sparse matrix is stuck in the middle.
Order-of-matrix-multiplication is literally my textbook example of a dynamic programming problem with complexity O(n^2) where n is number of terms (as in, it's how dynamic programming is introduced in my textbook).

I don't think adding sparse or diagonal matrices changes this as long as you only deal with chained @ and make some simple assumptions of the cost of a FLOP in sparse @ dense, sparse @ sparse, dense @ dense, and so on.

Where you need anything more than very simple dynamic programming algorithms is when you add + into the mix ("whether to use the distributive rule or not" and so on).

I'm positive to the chained @ idea, I think it's the answer to "what we really want".

Dag Sverre
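[To make the chain-ordering idea concrete, here is a minimal sketch of the classic matrix-chain dynamic program that an mdot([a, b, c])-style function could use. mdot, chain_order, and the pure FLOP-count cost model are illustrative assumptions for this sketch, not an existing NumPy API.]

import numpy as np

def chain_order(dims):
    # dims has length n+1 for a chain of n matrices: matrix i has shape
    # (dims[i], dims[i+1]).  cost[i][j] = cheapest FLOP count for the
    # product of matrices i..j; split[i][j] = where to cut that subchain.
    n = len(dims) - 1
    cost = [[0.0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return split

def mdot(mats):
    # Multiply the whole chain in the cheapest order found above.
    dims = [mats[0].shape[0]] + [m.shape[1] for m in mats]
    split = chain_order(dims)
    def mult(i, j):
        if i == j:
            return mats[i]
        k = split[i][j]
        return np.dot(mult(i, k), mult(k + 1, j))
    return mult(0, len(mats) - 1)

A, B = np.ones((500, 500)), np.ones((500, 500))
v = np.ones((500, 1))
out = mdot([A, B, v])

[For [A, B, v] with v a column vector, the planner picks the A @ (B @ v) grouping automatically, which is exactly the Mat1 @ Mat2 @ vec case discussed earlier.]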
On 03/20/2014 02:26 PM, Dag Sverre Seljebotn wrote:
On 03/19/2014 08:45 PM, josef.pktd@gmail.com wrote:
On Wed, Mar 19, 2014 at 2:24 PM, Nathaniel Smith <njs@pobox.com <mailto:njs@pobox.com>> wrote:
On Tue, Mar 18, 2014 at 9:14 AM, Robert Kern <robert.kern@gmail.com <mailto:robert.kern@gmail.com>> wrote: > On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith <njs@pobox.com <mailto:njs@pobox.com>> wrote: >> On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith <njs@pobox.com <mailto:njs@pobox.com>> wrote: >>> Mathematica: instead of having an associativity, a @ b @ c gets >>> converted into mdot([a, b, c]) >> >> So, I've been thinking about this (thanks to @rfateman for pointing it >> out), and wondering if Mathematica's approach is worth following up >> more. (It would need to make it past python-dev, of course, but worst >> case is just that they say no and we're back where we are now, so we >> might as well think it through.) > > I predict with near-certainty that this will be rejected,
I guess that's what everyone thought about @ too? ;-)
> but that > doesn't prevent it from derailing the discussion. This proposal is > unlike anything else in Python. Chained comparisons are *not* similar > to this proposal. The chaining only happens at the syntax level, not > the semantics. `a < b < c` gets compiled down to `a.__lt__(b) and > b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`.
Yes, the syntax is the same as chained comparisons, and the dispatch is a generalization of regular operators. It is unusual; OTOH, @ is unusual in that no other operators in Python have the property that evaluating in the wrong order can cost you seconds of time and gigabytes of memory. Perhaps.
> We have approval for a binary @ operator. Take the win.
We have approval, and we have a request: that we figure out how @ should work in detail to be most useful to us. Maybe that's this proposal; maybe not. Ultimately rejected-or-not-rejected comes down to how strong the arguments for something are. And while we can make some guesses about that, it's impossible to know how strong an argument will be until one sits down and works it out. So I still would like to hear what people think, even if it just ends in the conclusion that it's a terrible idea ;-).
What happens if you have 5 @ in a row?
My head hurts if I had to think about what would actually be going on. and don't forget, the sparse matrix is stuck in the middle.
Order-of-matrix-multiplication is literally my textbook example of a dynamic programming problem with complexity O(n^2) where n is number of terms (as in, it's how dynamic programming is introduced in my textbook).
I don't think adding sparse or diagonal matrices changes this as long as you only deal with chained @ and make some simple assumptions of the cost of a FLOP in sparse @ dense, sparse @ sparse, dense @ dense, and so on.
Where you need anything more than very simple dynamic programming algorithms is when you add + into the mix ("whether to use the distributive rule or not" and so on).
I'm positive to the chained @ idea, I think it's the answer to "what we really want".
Sorry, I totally misunderstood this. The question is of course how you dispatch technically (where the __matmul__ function lives and which one to use), not figuring out what you want done.

I think you'd need to keep this very simple; for instance, just require the leftmost matrix to implement __matmul__ that takes a list, ditch __rmatmul__, and then solve the rest on the library level.

In our case, everyone would delegate __matmul__ to something in NumPy that then supports hooks and solves this on the library level. That would work as I say above + hooks to plug in cost estimators and compute functions for various matrix products.

(I've thought too much about these things as I wasted at least a month of my PhD on the now abandoned "oomatrix" project to find the optimal way of computing linear algebra expressions.)

Dag Sverre
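[A rough sketch of the kind of dispatch being described, just to make it concrete: the "__matmul__ takes a list" protocol is hypothetical, and the explicit method call below stands in for the chained-@ syntax that would trigger it.]

import numpy as np

class Matrix:
    # Toy wrapper: under the hypothetical protocol, a @ b @ c would call
    # a.__matmul__([b, c]) -- only the leftmost object dispatches.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __matmul__(self, others):
        chain = [self.data] + [o.data for o in others]
        # A real library would plan the evaluation order here (cost
        # estimators, hooks, ...); this stand-in just folds left.
        result = chain[0]
        for m in chain[1:]:
            result = np.dot(result, m)
        return Matrix(result)

a, b = Matrix(np.eye(3)), Matrix(np.eye(3))
c = Matrix(np.ones((3, 1)))
out = a.__matmul__([b, c])   # what 'a @ b @ c' would become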
On Thu, Mar 20, 2014 at 1:36 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 03/20/2014 02:26 PM, Dag Sverre Seljebotn wrote:
I'm positive to the chained @ idea, I think it's the answer to "what we really want".
Sorry, I totally misunderstood this. The question is of course how you dispatch technically (where the __matmul__ function lives and which one to use), not figuring out what you want done.
I think you'd need to keep this very simple; for instance, just require the leftmost matrix to implement __matmul__ that takes a list, ditch __rmatmul__, and then solve the rest on the library level.
In our case, everyone would delegate __matmul__ to something in NumPy that then supports hooks and solves this on the library level. That would work as I say above + hooks to plug in cost estimators and compute functions for various matrix products.
To me, that signals that it's time to drop the operator for a library of functions, just like I prefer solve() and company to a matrix division operator. -- Robert Kern
On Thu, Mar 20, 2014 at 1:36 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
<snip>
Sorry, I totally misunderstood this. The question is of course how you dispatch technically (where the __matmul__ function lives and which one to use), not figuring out what you want done.
Or even more specifically, the question is whether getting the chance to use dynamic programming on chains of @'s (and only @'s!) is so valuable that we want to have a special parsing+dispatch rule to allow it.

I have to say that after glancing at a few hundred 'dot' calls, I'm not as convinced that this is useful in practice. There are lots of complex expressions out there involving 'dot', and relatively few of them involve long chains of 'dot' calls [1][2].

There are strategies for doing whole-expression optimization that work for more general expressions, not just @ -- e.g. numexpr, numba, theano -- at the cost of a bit more intrusiveness. And as numpy gets better at supporting non-ndarray types, then it'll be easier to seamlessly support low-impact deferred computation APIs like:

  a, b, c = defer(a, b, c)
  d = np.sin(a) + a @ b @ c
  e = d / (a + b + c + d)
  return force(e)

Having a special dispatch for @ would only help with one of the computations here.

-n

[1] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069565.html
[2] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069578.html

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
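[For what it's worth, a deferred API along those lines can be sketched in a few lines. defer, force, and the Deferred class below are inventions for illustration only (with np.dot standing in for '@'), not an existing NumPy interface; a real implementation would optimize the recorded graph inside force().]

import numpy as np

class Deferred:
    # Minimal expression-graph node: record operations, evaluate on force().
    def __init__(self, op, args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Deferred(np.add, [self, other])

    def __mul__(self, other):
        return Deferred(np.multiply, [self, other])

    def dot(self, other):
        # stand-in for the eventual '@'
        return Deferred(np.dot, [self, other])

def defer(*arrays):
    return tuple(Deferred(None, [a]) for a in arrays)

def force(node):
    # A real implementation would optimize the whole recorded graph here
    # (reorder chains of dots, fuse elementwise ops, ...); this one just
    # evaluates it as written.
    if node.op is None:
        return node.args[0]
    return node.op(*[force(arg) for arg in node.args])

a, b, c = defer(np.ones((3, 3)), np.ones((3, 3)), np.ones((3, 1)))
d = a.dot(b).dot(c) + c    # nothing is computed yet
print(force(d))            # graph evaluated here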
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
I'm in favor of same-left because it's the easiest to remember. With scalar factors it is how I read formulas.

Both calculating dot @ first and calculating elementwise * first sound logical, but I wouldn't know which should go first. (My "feeling" would be @ first.)

Two cases I remembered in statsmodels:

H = np.dot(results.model.pinv_wexog, scale[:,None] * results.model.pinv_wexog.T)
se = (exog * np.dot(covb, exog.T).T).sum(1)

We are mixing * and dot pretty freely in all combinations, AFAIR.

My guess is that I wouldn't trust any sequence without parentheses for a long time. (And I don't trust a sequence of dots @ without parentheses either, in our applications.)

x @ (W.T @ W) @ x    (W.shape = (10000, 5))
or
x * (W.T @ W) * x

(w * x) @ x    weighted sum of squares

Josef
On Sat, Mar 15, 2014 at 7:20 PM, <josef.pktd@gmail.com> wrote:
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
I'm in favor of same-left because it's the easiest to remember. with scalar factors it is how I read formulas.
Note that if there are no (interior) vectors involved then the two methods of association give theoretically identical results. But when there is a vector on the right and no vector on the left, then right association is more efficient and likely more numerically accurate.
Both calculating dot @ first or calculating elementwise * first sound logical, but I wouldn't know which should go first. (My "feeling" would be @ first.)
two cases I remembered in statsmodels H = np.dot(results.model.pinv_wexog, scale[:,None] * results.model.pinv_wexog.T) se = (exog * np.dot(covb, exog.T).T).sum(1)
we are mixing * and dot pretty freely in all combinations AFAIR
my guess is that I wouldn't trust any sequence without parenthesis for a long time. (and I don't trust a sequence of dots @ without parenthesis either, in our applications.)
x @ (W.T @ W) @ x ( W.shape = (10000, 5) ) or x * (W.T @ W) * x
Judicious use of parenthesis is definitely recommended no matter what is decided.
(w * x) @ x weighted sum of squares
Chuck
On Sat, Mar 15, 2014 at 11:30 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sat, Mar 15, 2014 at 7:20 PM, <josef.pktd@gmail.com> wrote:
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
I'm in favor of same-left because it's the easiest to remember. with scalar factors it is how I read formulas.
Note that if there are no (interior) vectors involved then the two methods of association give theoretically identical results. But when there is a vector on the right and no vector on the left, then right association is more efficient and likely more numerically accurate.
What's so special about a vector on the right? What if I have a vector on the left, or, as is pretty common, a quadratic form?

Having a different associativity rule between a * b * c and A @ B @ C looks confusing to me.

There is something special about the last array (in numpy):

np.arange(5).dot(np.diag(np.ones(5))).dot(np.arange(10).reshape(5, 2, order="F"))
np.arange(5).dot(np.diag(np.ones(5))).dot(np.arange(5))
np.arange(5).dot(np.diag(np.ones(5))).dot(np.arange(5).reshape(5, 1))

Chains go left to right.

Josef
On Sat, Mar 15, 2014 at 10:53 PM, <josef.pktd@gmail.com> wrote:
<snip>
What's so special about a vector on the right? What if I have a vector on the left, or, as is pretty common, a quadratic form?
A vector on the right is a fairly common pattern. If one were to use parenthesis as you did in your example to gain efficiency, one would write 'A@(B@(C@v))' as all the multiplications are then matrix -- vector, which is computationally cheaper than matrix -- matrix. When the '@' operator is right associative the parenthesis don't need to be used to get the same result.

Of course, for the less common pattern of a single (row) vector on the left, left associativity would be preferred. For vectors on both ends it doesn't matter. So the choice is driven simply by which pattern is the most common.

Function composition works the same way: g@f@h(x) = g(f(h(x))). That is, the traditional notation goes right to left.

<snip>

Chuck
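[To put rough numbers on the cost difference, here is a small timing sketch using np.dot (since '@' doesn't exist yet); the matrix size is arbitrary.]

import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)
v = np.random.rand(n)

t0 = time.time()
left = np.dot(np.dot(A, B), v)    # (A @ B) @ v: one ~n**3 matrix-matrix product
t_left = time.time() - t0

t0 = time.time()
right = np.dot(A, np.dot(B, v))   # A @ (B @ v): two ~n**2 matrix-vector products
t_right = time.time() - t0

print(t_left, t_right, np.allclose(left, right))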
In article <CAPJVwBkLww7-ysZB76LMRZ+mmbyN_5T=ym_VU1pJGakRLBqOkw@mail.gmail.com>, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
After seeing all the traffic on this thread, I am in favor of "same-left" because it is easiest to remember:

- It introduces no new rules.
- It is unambiguous. If we pick option 2 or 3 we have no strong reason to favor one over the other, leaving users to guess.

To my mind, being able to easily reason about code you are reading is more important than hoping to increase efficiency for one common case when not using parentheses.

It also has the advantage that it needs the least justification.

-- Russell
Hello, and what about something like that ?

a @ b @ c -> (a @ b) @ c
a * b @ c -> (a * b) @ c
a @ b * c -> a @ (b * c)

Easy to remember. The *-product has priority to @-product, and then we just to @-product from left to right.

An advantage of this is that parsers do job from left to right so I realy think that is a better choice than the weak-right.

Christophe BAL

2014-03-17 21:37 GMT+01:00 Russell E. Owen <rowen@uw.edu>:
<snip>
Sorry for all the misspellings...

2014-03-17 22:32 GMT+01:00 Christophe Bal <projetmbc@gmail.com>:
<snip>
Here is the translation. ;-)

Hello, and what about something like that ?

a @ b @ c -> (a @ b) @ c
a * b @ c -> (a * b) @ c
a @ b * c -> a @ (b * c)

Easy to remember: the *-product has priority regarding to the @-product, and we just do @-product from left to right.

An advantage of this is that most parsers do analyze from left to right. So I really think that it is a better choice than the weak-right one.

Christophe BAL

2014-03-17 22:34 GMT+01:00 Christophe Bal <projetmbc@gmail.com>:
<snip>
On Mon, Mar 17, 2014 at 9:38 PM, Christophe Bal <projetmbc@gmail.com> wrote:
Here is the translation. ;-)
Hello, and what about something like that ?
a @ b @ c -> (a @ b) @ c a * b @ c -> (a * b) @ c a @ b * c -> a @ (b * c)
Easy to remember: the *-product has priority regarding to the @-product, and we just do @-product from left to right.
In the terminology we've been using in this thread, this is "weak-left".
An advantage of this is that most parsers analyze from left to right.
So I really think that it is a better choice than the weak-right one.
We've mostly ignored this option because we assumed that if we want left-associativity, we should go with "same-left" instead of "weak-left". Same-left is:

a @ b @ c -> (a @ b) @ c
a * b @ c -> (a * b) @ c
a @ b * c -> (a @ b) * c

i.e., even more left-to-right than weak-left :-)

Do you think weak-left is better than same-left?

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
I think that weak-left is a little strange; just think a little about the operators used by mathematicians, which always follow a hierarchy. A parser is mostly built using a grammar: see http://docs.python.org/3.1/reference/grammar.html. Defining the *-product to have stronger priority than the @-product, and the latter stronger priority than +, will make the changes to the grammar easier. I'm now convinced of the usefulness of @ and @@ too, but I also think that you must consider uses other than just numpy. In other words, numpy is a good argument for these new operators, but they can also open new perspectives for other uses.
On Mon, Mar 17, 2014 at 6:33 PM, Christophe Bal <projetmbc@gmail.com> wrote:
Defining the *-product to have stronger priority than the @-product, and the latter stronger priority than +, will make the changes to the grammar easier.
The easiest is to give @ the same precedence as *. This will only require changing

term: factor (('*'|'/'|'%'|'//') factor)*

to

term: factor (('*'|'/'|'%'|'//'|'@') factor)*

Anything else will require an extra rule, but in any case implementation is trivial. I don't think we need to worry about implementation details at this point.
First of all, I must be very tired, because I wrote "I think that weak-left is a little strange..." instead of "I think that same-left is a little strange...". It's nighttime in France... ;-) So I'm definitely for weak-left! Here is my answer to Alexander Belopolsky: you are right from a grammar point of view, but for a human this looks too weird, because * and @ are of different kinds, contrary to * and / for example.
I'm now convinced of the usefulness of @ and @@ too, but I also think that you must consider uses other than just numpy. In other words, numpy is a good argument for these new operators, but they can also open new perspectives for other uses.
Speaking of `@@`, would the relative precedence of @ vs * be the same as @@ vs **?
If you see the operators as following a hierarchy, the answer is simply yes. 2014-03-18 0:16 GMT+01:00 Bago <mrbago@gmail.com>:
I'm now convinced of the usefulness of @ and @@ too, but I also think that you must consider uses other than just numpy. In other words, numpy is a good argument for these new operators, but they can also open new perspectives for other uses.
Speaking of `@@`, would the relative precedence of @ vs * be the same as @@ vs **?
On Mon, Mar 17, 2014 at 11:16 PM, Bago <mrbago@gmail.com> wrote:
Speaking of `@@`, would the relative precedence of @ vs * be the same as @@ vs **?
This is one of the concerns that made Guido leery of @@ (but only one of them). Since we seem to be dropping @@: http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069502.html we don't have to come up with an answer :-). -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Mon, Mar 17, 2014 at 6:33 PM, Christophe Bal <projetmbc@gmail.com> wrote:
I think that weak-left is a little strange; just think a little about the operators used by mathematicians, which always follow a hierarchy.
A parser is mostly built using a grammar: see http://docs.python.org/3.1/reference/grammar.html.
Defining the *-product to have stronger priority than the @-product, and the latter stronger priority than +, will make the changes to the grammar easier.
I'm now convinced of the usefulness of @ and @@ too, but I also think that you must consider uses other than just numpy. In other words, numpy is a good argument for these new operators, but they can also open new perspectives for other uses.
My main problem with weak-left (* higher) and tight-left (@ higher) compared to same-left is that I don't see any obvious choice between the weak and tight. I don't think I would have problems with readability.

Wikipedia doesn't say anything about the precedence of the Hadamard versus the matrix product. Matlab, IDL and Gauss (I checked the manual) all use same-left, as Nathaniel pointed out. For scalar * together with the dot product, which is more common in formulas, we would just read it sequentially, i.e. same-left. I don't remember when I have seen dot-in-a-circle in a paper, but I don't think there was any precedence either.

---

I guess the same applies for other (mis)uses of @

from math import sqrt

class MyOp(object):
    def __init__(self, func):
        self.func = func

    def __at__(self, x):
        return [self.func(xi) for xi in x]

myop = MyOp(lambda x: sqrt(x))

print myop.__at__(range(3))          # myop @ range(3)
print myop.__at__(range(3) * 2)      # myop @ (range(3) * 2)
print myop.__at__(range(3)) * 3      # myop @ range(3) * 3

'''
[0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951]
'''

-------------

Josef
On Mon, Mar 17, 2014 at 10:33 PM, Christophe Bal <projetmbc@gmail.com> wrote:
I think that weak-left is a little strange; just think a little about the operators used by mathematicians, which always follow a hierarchy.
Not sure what you mean -- I don't think most mathematicians think that scalar and matrix multiplication are above or below each other in precedence, for example. (Well, it's a strange question because scalar multiplication commutes, but even so, people often forget that these are even different operations.)
A parser is mostly built using a grammar: see http://docs.python.org/3.1/reference/grammar.html.
Defining the *-product to have stronger priority than the @-product, and the latter stronger priority than +, will make the changes to the grammar easier.
I'm now convinced of the usefulness of @ and @@ too, but I also think that you must consider uses other than just numpy. In other words, numpy is a good argument for these new operators, but they can also open new perspectives for other uses.
No, that's not how this game is played :-). The way it works is, we figure out the best possible way to handle the use case that we've demonstrated a need for (matrix multiplication), and then once we've done that someone might or might not find some other uses too. If they do then cool, if not then too bad. This follows the principle that it's better to be great at some things than to be mediocre at everything. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
This follows the principle that it's better to be great at some things than to be mediocre at everything.
You're right.
I think that weak-left is a little strange; just think a little about the operators used by mathematicians, which always follow a hierarchy.
Not sure what you mean -- I don't think most mathematicians think that scalar and matrix multiplication are above or below each other in precedence, for example.
You're right, but on the other hand, I've never seen mixed use of matrix and scalar products without parentheses... Indeed, in math we can write < Au, Bv > for the scalar product of two matrix-vector products.

But here, I think that the situation is different, because we are talking about operators from arrays to arrays: mainly @, * and + (elementwise for the last two). Whereas in the preceding example, the scalar product goes from arrays to a scalar. As a math user, I think at this point that the arrays-to-arrays operators must follow a hierarchy.

Who is the guy who asked such a complicated question about precedence? :-)

2014-03-18 0:30 GMT+01:00 Nathaniel Smith <njs@pobox.com>:
On Mon, Mar 17, 2014 at 10:33 PM, Christophe Bal <projetmbc@gmail.com> wrote:
I think that weak-left is a little strange; just think a little about the operators used by mathematicians, which always follow a hierarchy.
Not sure what you mean -- I don't think most mathematicians think that scalar and matrix multiplication are above or below each other in precedence, for example. (Well, it's a strange question because scalar multiplication commutes, but even so, people often forget that these are even different operations.)
A parser is mostly built using a grammar: see http://docs.python.org/3.1/reference/grammar.html.
Defining the *-product to have stronger priority than the @-product, and the latter stronger priority than +, will make the changes to the grammar easier.
I'm now convinced of the usefulness of @ and @@ too, but I also think that you must consider uses other than just numpy. In other words, numpy is a good argument for these new operators, but they can also open new perspectives for other uses.
No, that's not how this game is played :-). The way it works is, we figure out the best possible way to handle the use case that we've demonstrated a need for (matrix multiplication), and then once we've done that someone might or might not find some other uses too. If they do then cool, if not then too bad. This follows the principle that it's better to be great at some things than to be mediocre at everything.
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Tue, Mar 18, 2014 at 12:16 AM, Christophe Bal <projetmbc@gmail.com> wrote:
> I think that weak-left is a little strange; just think a little about the operators used by mathematicians, which always follow a hierarchy.
Not sure what you mean -- I don't think most mathematicians think that scalar and matrix multiplication are above or below each other in precedence, for example.
You're right, but on the other hand, I've never seen mixed use of matrix and scalar products without parentheses... Indeed, in math we can write < Au, Bv > for the scalar product of two matrix-vector products.
Not scalar product, scalar multiplication -- you're saying (I think) that

3 * Matrix1 * Matrix2

is just like

3 * Matrix1 + Matrix2

in the sense that mathematicians think of the 3 * Matrix1 part as very different from, and higher precedence than, the Matrix1 + Matrix2 part. And similarly that

Matrix1 * Matrix2 * 3

is just like

Matrix1 + Matrix2 * 3

But in fact I think if you asked most mathematicians which of the "*"'s in Matrix1 * Matrix2 * 3 is higher precedence, they would think this question very odd!

-n

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Mon, Mar 17, 2014 at 8:37 PM, Russell E. Owen <rowen@uw.edu> wrote:
After seeing all the traffic on this thread, I am in favor of "same-left" because it is easiest to remember:
- It introduces no new rules.
- It is unambiguous. If we pick option 2 or 3 we have no strong reason to favor one over the other, leaving users to guess.
To my mind, being able to easily reason about code you are reading is more important than hoping to increase efficiency for one common case when not using parentheses.
Personally I'm leaning in a similar direction (at least as far as left- versus right-associativity goes; I'm not sure yet what I think about the magic "grouping" thing I just posted :-)). The more I think about it, the weaker I find the avoiding-parentheses argument. If you're going to take the trouble to think about which ordering is best, you should write that down with parentheses no matter what the associativity is, so that when I have to read your code I'll see the parentheses and know that you thought about it! And certainly the slow part of this is not typing the parentheses, it's figuring out what order is best. (The potential advantage of "grouping" isn't that you don't have to write as many parentheses, it's that you don't have to *think* about parentheses.) The fact that Matlab et al get along fine with same-left also strikes me as strong evidence that right-associativity's benefits are at least not overwhelmingly compelling... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith <njs@pobox.com> wrote:
I think we need to know something about how often the Mat @ Mat @ vec type cases arise in practice. How often do non-scalar * and np.dot show up in the same expression? How often does it look like a * np.dot(b, c), and how often does it look like np.dot(a * b, c)? How often do we see expressions like np.dot(np.dot(a, b), c), and how often do we see expressions like np.dot(a, np.dot(b, c))? This would really help guide the debate. I don't have this data, and I'm not sure the best way to get it. A super-fancy approach would be to write a little script that uses the 'ast' module to count things automatically. A less fancy approach would be to just pick some code you've written, or a well-known package, grep through for calls to 'dot', and make notes on what you see. (An advantage of the less-fancy approach is that as a human you might be able to tell the difference between scalar and non-scalar *, or check whether it actually matters what order the 'dot' calls are done in.)
Okay, I wrote a little script [1] to scan Python source files looking for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is:
- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))

Running it on a bunch of projects, I get:

| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |

(Any other projects worth trying? This is something that could vary a lot between different projects, so it seems more important to get lots of projects here than to get a few giant projects. Or if anyone wants to run the script on their own private code, please do! Running it on my personal pile of random junk finds 3 left-associative and 1 right.)

Two flaws with this approach:

1) Probably some proportion of those nested dot calls are places where it doesn't actually matter which evaluation order one uses -- dot() forces you to pick one, so you have to. If people prefer to, say, use the "left" form in cases where it doesn't matter, then this could bias the left-vs-right results -- hard to say. (Somewhere in this thread it was suggested that the use of the .dot method could create such a bias, because a.dot(b).dot(c) is more natural than a.dot(b.dot(c)), but only something like 6% of the dot calls here use the method form, so this probably doesn't matter.) OTOH, this also means that the total frequency of @ expressions where associativity even matters at all is probably *over*-estimated by the above.

2) This approach misses cases where the cumbersomeness of dot has caused people to introduce temporary variables, like 'foo = np.dot(a, b); bar = np.dot(foo, c)'. So this causes us to *under*-estimate how often associativity matters. I did read through the 'dot' uses in scikit-learn and nipy, though, and only caught a handful of such cases, so I doubt it changes anything much.

-n

[1] https://gist.github.com/njsmith/9157645#file-grep-dot-dot-py

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
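(For anyone who wants to run a similar count on their own code, here is a minimal sketch of this kind of AST-based counter. It is not the actual gist script from [1]: the helper names are invented, it only checks nesting for the dot(...) / np.dot(...) call forms, and it merely counts method-form a.dot(b) calls without inspecting how they nest.)

import ast

def is_dot_call(node):
    """True for calls spelled dot(...), np.dot(...), or a.dot(...)."""
    return (isinstance(node, ast.Call)
            and ((isinstance(node.func, ast.Name) and node.func.id == "dot")
                 or (isinstance(node.func, ast.Attribute) and node.func.attr == "dot")))

def count_nestings(source):
    """Return (total, left, right) counts of dot calls in Python source text."""
    total = left = right = 0
    for node in ast.walk(ast.parse(source)):
        if not is_dot_call(node):
            continue
        total += 1
        args = node.args
        # dot(dot(a, b), c): the first argument is itself a dot call
        if len(args) >= 1 and is_dot_call(args[0]):
            left += 1
        # dot(a, dot(b, c)): the second argument is itself a dot call
        if len(args) >= 2 and is_dot_call(args[1]):
            right += 1
    return total, left, right

print(count_nestings("d = dot(dot(a, b), c)\ne = np.dot(a, np.dot(b, c))"))
# -> (4, 1, 1): four dot calls, one left-nested, one right-nested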
On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith <njs@pobox.com> wrote:
Okay, I wrote a little script [1] to scan Python source files looking for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is:
- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))
Running it on a bunch of projects, I get:
| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |
Another way to visualize this, converting each contiguous "chain" of calls to np.dot into a parenthesized expression, and then counting how often we see each pattern:

1943 (_ @ _)
 100 ((_ @ _) @ _)            # left
  86 (_ @ (_ @ _))            # right
   2 (_ @ ((_ @ _) @ _))
   2 (((_ @ _) @ _) @ _)      # left
   1 ((_ @ (_ @ _)) @ _)
   1 ((_ @ _) @ (_ @ _))
   1 (((_ @ _) @ _) @ (_ @ _))
   1 ((_ @ ((_ @ _) @ _)) @ _)
   1 ((_ @ _) @ (_ @ (_ @ _)))

(This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3 different patterns that have a consistent associativity.
From this I'm leaning towards the conclusions that:
- Expressions with complex parenthesization do happen, but probably not often enough to justify elaborate stuff like my 'chaining' proposal -- only 8.7% of these cases involve more than one @.
- There's very little support here for the intuition that right-associativity is more useful than left-associativity on a day-to-day basis.

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Thu, Mar 20, 2014 at 1:25 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith <njs@pobox.com> wrote:
Okay, I wrote a little script [1] to scan Python source files looking for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is:
- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))
Running it on a bunch of projects, I get:
| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |
Another way to visualize this, converting each contiguous "chain" of calls to np.dot into a parenthesized expression, and then counting how often we see each pattern.
1943 (_ @ _)
 100 ((_ @ _) @ _)            # left
  86 (_ @ (_ @ _))            # right
   2 (_ @ ((_ @ _) @ _))
   2 (((_ @ _) @ _) @ _)      # left
   1 ((_ @ (_ @ _)) @ _)
   1 ((_ @ _) @ (_ @ _))
   1 (((_ @ _) @ _) @ (_ @ _))
   1 ((_ @ ((_ @ _) @ _)) @ _)
   1 ((_ @ _) @ (_ @ (_ @ _)))
(This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3 different patterns that have a consistent associativity.
From this I'm leaning towards the conclusions that:
- Expressions with complex parenthesization do happen, but probably not often enough to justify elaborate stuff like my 'chaining' proposal -- only 8.7% of these cases involve more than one @.
Just for statsmodels: we do have a very large amount of chaining, but in many cases parts of the chain have been taken out of a single expression into a temporary or permanent variable (similar to the quadratic form example in the PEP), either for clarity (a temp variable), or because one dot product shows up several times in the same expression (quadratic forms), or because we need to keep it around for reuse in other expressions. (A small illustrative sketch of this pattern follows after this message.)

That's what I tried to explain before: chaining and breaking up larger multi-dot expressions is most of the time an intentional choice, and not just random because the dot function forces us. The most convincing argument for me for @ is that it makes parentheses visible (until I realized that I didn't really care about @). This reduces the cases where we separate out a dot product for clarity and readability, but still leaves us with the other two cases, where our chaining won't change whatever numpy provides additionally.

Josef
- There's very little support here for the intuition that right-associativity is more useful than left-associativity on a day-to-day basis.
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
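(A hypothetical sketch of the temporary-variable pattern Josef describes above -- an OLS-style example, purely illustrative and not code from statsmodels:)

import numpy as np

x = np.random.randn(100, 3)
y = np.random.randn(100)

# The same dot product X'X shows up in both the coefficients and their
# covariance, so it gets its own name instead of being nested inline.
xtx = np.dot(x.T, x)
beta = np.linalg.solve(xtx, np.dot(x.T, y))   # (X'X)^{-1} X'y
cov_beta = np.linalg.inv(xtx)                 # reused here, up to a scale factor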
On Mar 15, 2014, at 4:41 AM, Nathaniel Smith wrote:
OPTION 1 FOR @: ... "same-left"
OPTION 2 FOR @: ... "weak-right"
OPTION 3 FOR @: ... "tight-right"
(In addition to more unusual forms, like 'grouping'.) There's another option, which is to "refuse the temptation to guess", and not allow X @ Y @ Z or mixing with any other operators. After all, several have pointed out that it should be in parenthesis anyway, in order to avoid likely confusion. There's even a bit of precedent for something like this in Python:
>>> f(1, 2 for i in range(10))
  File "<stdin>", line 1
SyntaxError: Generator expression must be parenthesized if not sole argument
I haven't seen this non-associative option come up in the discussion. To be frank, I don't think it is a good idea, but Nathaniel wrote "In principle the other 2 possible options are ...", so I wanted to mention it for completeness.

My preference is for same-left. I rarely work with numpy, and it's more likely that I'll see '@' used in a non-numpy context. That is, people in general will see "@" as a sort of free-for-all operator, to use and abuse as they wish. [1] (For example, Pyparsing has a lot of operator overloads to help make a grammar definition, and they make good sense in that context, but '<<' for recursive definitions is perhaps past the edge.)

Someone looking at a "@", without any intuition about the precedence or associativity of matrix operations in a mathematical package, will have to figure things out from the documentation or (more likely) experimentation. If and when that happens, then
"Same-left" is the easiest to explain and remember, because it's just, "@ acts like * and /".
Cheers,

Andrew
dalke@dalkescientific.com

[1] I came up with two possible ways people might (ab)use it:

1) Since "@" is server-like, then a service resolver:

    service = XMLRPCServer @ "http://localhost:1234/endpoint"

There's no real need for this, since there are other equally good ways to structure this sort of call. But someone creative might come up with a good example of using '@' to mean some sort of routing. Interestingly, that creative person might prefer right-associative, to support the 'natural'

    ("janet" @ "moria" @ "uunet" @ uucpserver).send(message)

rather than the inverted:

    (uucpserver @ "uunet" @ "moria" @ "janet").send(message)

This would likely fall under the definition of "cute and ignorable".

2) "@" in XPath indicates an attribute. An XML tree API might support something like:

    tree = load_xml_tree(...)
    for node in tree.select("//item[@price > 2*@discount]"):
        print node @ price, node @ discount

That might even be a reasonable short-hand, compared to, say, etree's node.attrib["price"]. XML doesn't allow nodes as attributes, so that provides no guidance as to what node @ price @ 1950 might mean.
On Thu, Mar 20, 2014 at 4:01 AM, Andrew Dalke <dalke@dalkescientific.com> wrote:
My preference is for same-left. I rarely work with numpy, and it's more likely that I'll see '@' used in a non-numpy context. That is, people in general will see "@" as a sort of free-for-all operator, to use and abuse as they wish. [1]
(For example, Pyparsing has a lot of operator overloads to help make a grammar definition, and they make good sense in that context, but '<<' for recursive definitions is perhaps past the edge.)
Someone looking at a "@", without any intuition on precedence or associativity of matrix operations in a mathematical package, will have to figure things out from the documentation or (more likely) experimentation.
I think the one thing this discussion has settled is that there is *no* common intuition about the precedence or associativity of matrix operations in a mathematical package. :-) I think the operator-overload-as-DSL use cases actually argue somewhat for right-associativity. There is no lack of left-associative operators for these use cases to choose from since they usually don't have numeric or bitwise operations defined for them. Right-associativity adds some diversity into the ecosystem and opens up some design space. -- Robert Kern
On Mar 20, 2014, at 10:07 AM, Robert Kern wrote:
I think the operator-overload-as-DSL use cases actually argue somewhat for right-associativity. ... Right-associativity adds some diversity into the ecosystem and opens up some design space.
You say that like it's a good thing.

My argument is that anything which adds another line to Python's precedence table is a bad idea, unless there's a really good reason for it. The two examples were the best I could come up with, and I don't think they are that persuasive.

Looking at the table, the only places for it are on the *, /, //, % line or on the ** line. Since ** is right-associative, the diversity argument combined with the "no new line" argument means @ should be on the same line, and with the same precedence, as **.

In DSL space, that means @ could be used as the inverse of ** by those who want to discard any ties to its use in numerics. Considering it now, I agree this would indeed open up some design space.

I don't see anything disastrously wrong for that in matrix/vector use, though my intuition on this is very limited. I believe this gives results like the "strong right" option, no?

As an observation, if this is done, and if you want operators to look symmetrical at the syntax level, and if a matrix exponentiation operator isn't critical, then perhaps use '@@' for matrix multiplication, and leave '@' for decorators? It would be a small reminder that '@@' has higher precedence than '*', and may help reduce the momentary confusion upon seeing something like

@a@b @c
def f():
    pass

Of course, "f(*x, **y)" contains perfectly good unary-looking uses of '*' and '**' that don't cause confusion with the binary forms, so this isn't all that strong of an objection.

Cheers,

Andrew
dalke@dalkescientific.com
On Thu, Mar 20, 2014 at 1:10 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Mar 20, 2014, at 10:07 AM, Robert Kern wrote:
I think the operator-overload-as-DSL use cases actually argue somewhat for right-associativity. ... Right-associativity adds some diversity into the ecosystem and opens up some design space.
You say that like it's a good thing.
Hey, I just want a multiplication operator. You're the one who wants a new DSL toy. ;-)
My argument is that anything which adds another line to Python's precedence table is a bad idea. Unless there's a really good reason for it. The two examples were the best I could come up with, and I don't think they are that persuasive.
Really? I mean, |, ^, and & *each* get their own precedence level. I'm not really sure that there is an aversion to adding precedence levels. In fact, it almost seems like the opposite: everything gets its own level unless they obviously go together.
Looking at the table, the only places for it are on the *, /, //, % line or on the ** line. Since ** is right-associative, then the diversity argument combined with the "no new line" argument means @ should be on the same line, and with the same precedence, as **
In DSL space, that means @ could be used as the inverse of ** by those who want to discard any ties to its use in numerics. Considering it now, I agree this would indeed open up some design space.
I don't see anything disastrously wrong for that in matrix/vector use, though my intuition on this is very limited. I believe this gives results like the "strong right" option, no?
As an observation, if this is done, and if you want operators to look symmetrical at the syntax level, and if a matrix exponentiation operator isn't critical, then perhaps use '@@' for matrix multiplication, and leave '@' for decorators? It would be a small reminder that '@@' has higher precedence than '*', and may help reduce the momentary confusion upon seeing something like
@a@b @c
def f():
    pass
The decorator syntax is deliberately limited to prevent this: you can't put an arbitrary expression after the initiating @, just a (potentially dotted) name that may or may not be called, maybe with some arguments. One couldn't do this, for example:

[~/scratch]
|1> s = """
..> @a+b
..> def f():
..>     pass
..> """
[~/scratch]
|4> compile(s, '<string>', 'exec')
  File "<string>", line 2
    @a+b
       ^
SyntaxError: invalid syntax

-- Robert Kern
On Thu, Mar 20, 2014 at 9:10 AM, Andrew Dalke <dalke@dalkescientific.com>wrote:
In DSL space, that means @ could be used as the inverse of ** by those who want to discard any ties to its use in numerics. Considering it now, I agree this would indeed open up some design space.
I don't see anything disastrously wrong for that in matrix/vector use, though my intuition on this is very limited. I believe this gives results like the "strong right" option, no?
It is not uncommon to have v**2 @ u in numerical code for a weighted sum of u with weights from v-squared. With @ on the same line as **, this would be interpreted as v ** (2 @ u) and most likely be an error.
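(To spell out that example in code -- a hedged illustration written with np.dot, since @ does not exist yet; the array values are made up:)

import numpy as np

v = np.array([1.0, 2.0, 3.0])
u = np.array([0.5, 0.5, 0.5])

# What "v**2 @ u" is meant to compute: a weighted sum of u with weights v**2.
weighted = np.dot(v**2, u)   # 0.5*(1 + 4 + 9) == 7.0

# If @ shared a (right-associative) precedence level with **, the expression
# "v ** 2 @ u" would instead group as v ** (2 @ u), evaluating 2 @ u first --
# which, as noted above, would most likely just be an error.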
On Thu, Mar 20, 2014 at 9:07 AM, Robert Kern <robert.kern@gmail.com> wrote:
I think the operator-overload-as-DSL use cases actually argue somewhat for right-associativity. There is no lack of left-associative operators for these use cases to choose from since they usually don't have numeric or bitwise operations defined for them. Right-associativity adds some diversity into the ecosystem and opens up some design space.
Whether or not this is true, I think we should assign this argument ~zero weight for purposes of the present discussion. That's because:

- We haven't been asked to figure out the best design of @ for Python overall; we've been asked to report back on what design of @ will be best for the numeric community, since that's where we have special expertise that python-dev lacks. Python-dev is entirely capable of then taking our report as input and having a debate about how much weight to give to these other possible uses.

- And anyway, my impression is that python-dev will give these other possible uses ~zero weight anyway -- if they thought random DSL operators were important for their own sake, they would have added @ long ago :-). Maybe if we say "we literally do not care at all what @'s precedence and associativity are", then it will matter as a tie-breaker, but first I don't think it's true that we don't care, and second, even if it were, my guess is that the argument for consistency with the other operators would be a stronger tie-breaker.

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Mar 20, 2014, at 3:02 PM, Nathaniel Smith wrote:
- And anyway, my impression is that python-dev will give these other possible uses ~zero weight anyway -- if they thought random DSL operators were important for their own sake, they would have added @ long ago :-).
Unlike what you all seem to think, I *don't* want '@' as a DSL. As I said, I'm a same-left person. There's very little additional power in same-left for a DSL, over the other 4 available operators. I think that weakness is a good thing. I'm saying that it will be (ab)used in a DSL.

Given no strong bias one way or another, I prefer to minimize any temptations or weird effects that a new precedence level might cause. Hence I prefer that it act either the same as "*" or as "**", and to keep it from being more widely used, I more strongly prefer the same as "*".

You say "we've been asked to report back on what design of @ will be best for the numeric community, since that's where we have special expertise that python-dev lacks." I don't really think that goal means you can avoid considering all non-numeric consequences. Nor do I think I'm making any weighty arguments here. My observation, though, is that the signal so far is low enough that second-order effects are likely to have some influence on the final decision, either here or with python-dev.

On Mar 20, 2014, at 2:31 PM, Robert Kern wrote:
Really? I mean, |, ^, and & *each* get their own precedence level. I'm not really sure that there is an aversion to adding precedence levels. In fact, it almost seems like the opposite: everything gets its own level unless they obviously go together.
No new expression precedence level has been added since ** in January 1996, at least not that I can identify. Those boolean ones you mentioned were added in October 1991 and follow the C operator precedence. Given that a number of other languages do the same, and Python's goals, this makes sense. The ** was a secondary influence from Fortran.

The closest Python has come since 1996 is support for the if/else expression, but that has the same precedence as lambda. Eg, compare 3.4:

test: or_test ['if' or_test 'else' test] | lambdef

to 1.5.2:

test: and_test ('or' and_test)* | lambdef

(The documentation says they are at different levels, but they are actually at the same. Consider:

lambda: 1 if print('hi') else print('bye')

This is not the same as

(lambda: 1) if print('hi') else print('bye') .)

If you came from a Pascal background then you would expect '+', '-', 'or' and 'xor' to have the same precedence, since the adding operators obviously go together. While, oddly enough, C's == is not at the same level as <=, which suggests that Dennis Ritchie at one time didn't think it was obvious that those two go together. (Then again, he also started off with a =- 1 instead of a -= 1.)

The arguments against a complex precedence table are well-trod ground. See:
http://stackoverflow.com/questions/6320424/
http://c2.com/cgi/wiki?OperatorPrecedenceConsideredHarmful

Larry Wall's summary about operator precedence in http://perl6.org/archive/doc/design/apo/A03.html is apropos to the general topic of deciding upon operator precedence.

Cheers,

Andrew
dalke@dalkescientific.com
On Thu, Mar 20, 2014 at 8:38 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
You say "we've been asked to report back on what design of @ will be best for the numeric community, since that's where we have special expertise that python-dev lacks." I don't really think that goal means you can avoid considering all non-numeric consequences.
Sure, but that discussion will (and should) happen on python-ideas. When Nathaniel says that "we have been asked" to answer this very specific question, he means that literally. Guido has asked us to answer this specific question, not for our community's collective judgement call on the total question. Our answer will be considered along with other concerns (like the operator abuse-case) on python-ideas to end up at the final decision weighing all of the factors. We're trying to avoid the Abilene Paradox.
On Mar 20, 2014, at 2:31 PM, Robert Kern wrote:
Really? I mean, |, ^, and & *each* get their own precedence level. I'm not really sure that there is an aversion to adding precedence levels. In fact, it almost seems like the opposite: everything gets its own level unless if they obviously go together.
No new expression precedence level has been added since ** in January 1996, at least not that I can identify. Those boolean ones you mentioned were added in October 1991 and follow the C operator precedence. Given that a number of other languages do the same, and Python's goals, this makes sense. The ** was a secondary influence from Fortran.
The closest Python came since 1996 is support for the if/else expression, but that has the same precedence as lambda. Eg, compare 3.4:
test: or_test ['if' or_test 'else' test] | lambdef
to 1.5.2:
test: and_test ('or' and_test)* | lambdef
(The documentation says they are at different levels, but they are actually at the same. Consider:

lambda: 1 if print('hi') else print('bye')

This is not the same as

(lambda: 1) if print('hi') else print('bye') .)
The observed parse is consistent with `if-else` having a higher precedence than lambda, as documented. If they *were* at the same level, the parse would be the parenthesized expression that you cite (just as 5%2*3 == (5%2)*3 != 5%(2*3)).
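(For what it's worth, the parse can be checked directly with the ast module; this snippet is just illustrative and not from the thread:)

import ast

tree = ast.parse("lambda: 1 if x else y", mode="eval")
# The body of the lambda is the whole conditional expression, i.e.
# lambda: (1 if x else y), not (lambda: 1) if x else y.
print(isinstance(tree.body, ast.Lambda))        # True
print(isinstance(tree.body.body, ast.IfExp))    # True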
If you came from a Pascal background then you would expect '+' '-' 'or' and 'xor' to have the same precedence, since the adding operators obviously go together.
While, oddly enough, C's == is not at the same level as <=, which suggests that the Dennis Ritchie at one time didn't think it was obvious that those two go together. (Then again, he also started off with a =- 1 instead of a -= 1.)
The arguments against a complex precedence table are well-trod ground. See: http://stackoverflow.com/questions/6320424/ http://c2.com/cgi/wiki?OperatorPrecedenceConsideredHarmful
Larry Wall's summary about operator precedence in http://perl6.org/archive/doc/design/apo/A03.html is apropos to the general topic of deciding upon operator precedence.
I see a lot of assertions of fact there, but no empirical studies that actually make them facts. On that note, I wonder if we can convince Stefik and Siebert et al. to run a study for us to help answer these questions. If I were in their group, I'd jump at the chance to use their empirical methodology to help design an actual production language. http://neverworkintheory.org/2014/01/29/stefik-siebert-syntax.html -- Robert Kern
On Mar 20, 2014, at 10:39 PM, Robert Kern wrote:
Sure, but that discussion will (and should) happen on python-ideas. When Nathaniel says that "we have been asked" to answer this very specific question, he means that literally.
Ah, now I understand. Thanks! Andrew dalke@dalkescientific.com
participants (16)
- Alexander Belopolsky
- Andrew Dalke
- Bago
- Charles R Harris
- Chris Laumann
- Christophe Bal
- Dag Sverre Seljebotn
- Eelco Hoogendoorn
- Jaime Fernández del Río
- Joe Kington
- josef.pktd@gmail.com
- Mark Daoust
- Nathaniel Smith
- Robert Kern
- Russell E. Owen
- Sebastian Haase