On Mon, Mar 17, 2014 at 1:18 PM, <josef.pktd@gmail.com> wrote:

On Mon, Mar 17, 2014 at 12:50 PM, Alexander Belopolsky <ndarray@mac.com> wrote:

On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
In practice all
well-behaved classes have to make sure that they implement __special__
methods in such a way that all the different variations work, no
matter which class ends up actually handling the operation.

"Well-behaved classes" are hard to come by in practice.  The @ operator may fix the situation with np.matrix, so take a look at MaskedArray with its 40-line __array_wrap__ and no end of bugs.

Requiring a superclass __method__ to correctly handle creation of subclass results turns the Liskov principle on its head.  With enough clever tricks and tight control over the full class hierarchy you can make it work in some cases, but it is not a good design.

I am afraid that making @ special among other binary operators that implement mathematically associative operations will create a lot of confusion.  (The pow operator is special because the corresponding mathematical operation is non-associative.)

Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a @ (b @ c).  What are the chances that they will correctly figure out what a // b // c means after this? 
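For reference, here is how the existing operators behave in a quick interpreter session (** groups to the right, % and // group to the left):

>>> 2 ** 3 ** 2       # right-associative: 2 ** (3 ** 2)
512
>>> (2 ** 3) ** 2
64
>>> 100 % 7 % 3       # left-associative: (100 % 7) % 3
2
>>> 100 % (7 % 3)
0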

One case where we need to keep track of left or right is type promotion:

>>> import numpy as np
>>> a = np.arange(100, dtype=np.int8)   # reconstructed setup, consistent with the outputs below
>>> a.shape
(100,)
>>> 1. * a.dot(a)     # int8 dot overflows first, then the result is cast to float
-98.0
>>> (1.*a).dot(a)     # upcast to float first, so no overflow
328350.0
>>> a.dtype
dtype('int8')

>>> 1. * a @ a
???            # left reading (1.*a) @ a gives 328350.0; right reading 1. * (a @ a) gives -98.0

similar to:
>>> 1. * 2 / 3
0.6666666666666666
>>> 1. * (2 / 3)   # Python 2, without `from __future__ import division`
0.0

I thought of sending a message saying I'm +/-1 on either option, but I'm not indifferent.

I'm again in favor of "left", because it's the simplest to understand:
A.dot(B).dot(C), possibly with some * mixed in.

I understand now the computational argument in favor of right:

x @ inv(x.T @ x) @ x.T @ y    (with shapes (T,k), (k,k), (k,T), (T,1))
or
x @ pinv(x) @ y    (with shapes (T,k), (k,T), (T,1))

with T >> k (the final 1 could be an m > 1 with T >> m). Evaluated right to left, every intermediate stays k-sized or (T,1); evaluated left to right, the x @ ... @ x.T part materializes a (T,T) array.
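A rough sketch of the difference, with np.dot making the grouping explicit (the variable names and sizes here are made up for illustration):

import numpy as np

T, k = 10000, 5
rng = np.random.default_rng(0)               # illustrative data, not from the thread
x = rng.standard_normal((T, k))
y = rng.standard_normal((T, 1))
xtxi = np.linalg.inv(np.dot(x.T, x))         # (k,k)

# "left" grouping: ((x . xtxi) . x.T) . y  materializes a (T,T) intermediate
left = np.dot(np.dot(np.dot(x, xtxi), x.T), y)

# "right" grouping: x . (xtxi . (x.T . y))  keeps every intermediate (k,1) or (T,1)
right = np.dot(x, np.dot(xtxi, np.dot(x.T, y)))

print(np.allclose(left, right))              # same result, very different cost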

However, we don't write code like that most of the time.
Alan's students won't care much if some intermediate arrays blow up.
In library code like statsmodels it's almost always a conscious choice where to place the parentheses and, more often, which part of a long array expression is pulled out as a temporary or permanent variable.

I think almost the only uses of chain_dot(A, B, C) (which is "right") are for quadratic forms:

            xtxi = pinv(np.dot(exog.T, exog))                 # (k,k)
            xtdx = np.dot(exog.T * d[np.newaxis, :], exog)    # (k,k)
            vcov = chain_dot(xtxi, xtdx, xtxi)                # (k,k) @ (k,k) @ (k,k)
(from statsmodels' QuantReg)
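For reference, a right-grouping chain_dot fits in a few lines; this is only a sketch consistent with the grouping described here, not necessarily statsmodels' actual implementation:

from functools import reduce
import numpy as np

def chain_dot(*arrs):
    # group from the right: chain_dot(A, B, C) == np.dot(A, np.dot(B, C))
    return reduce(lambda acc, arr: np.dot(arr, acc), reversed(arrs))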

I think optimizing a case like this by hand is relatively easy.


On the other hand, I worry a lot more about messy cases with different dtypes or different classes involved, as Alexander has pointed out: cases that might trip up intermediate to moderately advanced numpy users.

(Let's see: I have to read @ back to front and * front to back. And why did I put a sparse matrix in the middle and a masked array at the end? Oh no, that's not a masked array, it's a panda.)
compared to
(Somewhere there is a mistake; let's go through all the terms from beginning to end.)

Josef

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
