On Thu, Mar 20, 2014 at 1:25 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith <njs@pobox.com> wrote:
Okay, I wrote a little script [1] to scan Python source files and look for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is:
- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))
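A minimal sketch of how such a scan could be done with the standard-library `ast` module. This is a hypothetical reimplementation, not the script referenced as [1], and the helper names are made up for illustration. It treats `dot(x, y)`, `np.dot(x, y)` and `x.dot(y)` as dot calls and checks whether the left or right operand is itself a dot call:

```python
import ast


def _is_dot_call(node):
    """True for calls spelled dot(x, y), np.dot(x, y) or x.dot(y)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "dot" and len(node.args) == 2
    if isinstance(func, ast.Attribute) and func.attr == "dot":
        # np.dot(a, b) passes two arguments; the method form a.dot(b) passes one.
        return len(node.args) in (1, 2)
    return False


def _operands(node):
    """Return the (left, right) operands of a dot call."""
    if len(node.args) == 2:
        return node.args[0], node.args[1]
    # Method form: the object the method is called on is the left operand.
    return node.func.value, node.args[0]


def count_dots(source):
    """Return (dots, left_nestings, right_nestings) for one source string."""
    dots = left = right = 0
    for node in ast.walk(ast.parse(source)):
        if not _is_dot_call(node):
            continue
        dots += 1
        lhs, rhs = _operands(node)
        if _is_dot_call(lhs):
            left += 1   # dot(dot(a, b), c) or a.dot(b).dot(c)
        if _is_dot_call(rhs):
            right += 1  # dot(a, dot(b, c))
    return dots, left, right
```

Running it over every `.py` file in a project and summing the tuples gives the three counts. Because it is purely syntactic, it misses calls made through aliases (e.g. `from numpy import dot as d`) and will also count unrelated methods that happen to be named `dot`.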

Running it on a bunch of projects, I get:

| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |


Another way to visualize this is to convert each contiguous "chain" of calls to np.dot into a parenthesized expression, and then count how often we see each pattern:

      1943  (_ @ _)
       100  ((_ @ _) @ _) # left
        86  (_ @ (_ @ _)) # right
         2  (_ @ ((_ @ _) @ _))
         2  (((_ @ _) @ _) @ _) # left
         1  ((_ @ (_ @ _)) @ _)
         1  ((_ @ _) @ (_ @ _))
         1  (((_ @ _) @ _) @ (_ @ _))
         1  ((_ @ ((_ @ _) @ _)) @ _)
         1  ((_ @ _) @ (_ @ (_ @ _)))

(This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3 different patterns that have a consistent associativity.
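The chain-to-pattern conversion described above can be sketched the same way. Again, this is a hypothetical reimplementation rather than the original script: each maximal nest of dot calls is rendered with every non-dot operand replaced by `_`, and identical patterns are tallied with `collections.Counter`. It assumes that a dot call buried inside some other function call (e.g. `dot(inv(dot(X.T, X)), y)`) starts a new chain.

```python
import ast
from collections import Counter


def _is_dot_call(node):
    """True for calls spelled dot(x, y), np.dot(x, y) or x.dot(y)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "dot" and len(node.args) == 2
    if isinstance(func, ast.Attribute) and func.attr == "dot":
        return len(node.args) in (1, 2)
    return False


def _operands(node):
    """Return the (left, right) operands of a dot call."""
    if len(node.args) == 2:
        return node.args[0], node.args[1]
    return node.func.value, node.args[0]


def _pattern(node):
    """Render a dot chain as a parenthesized pattern; other operands become '_'."""
    if not _is_dot_call(node):
        return "_"
    lhs, rhs = _operands(node)
    return f"({_pattern(lhs)} @ {_pattern(rhs)})"


def chain_patterns(source):
    """Count the parenthesization pattern of every dot chain in the source."""
    tree = ast.parse(source)
    # Dot calls that are direct operands of another dot call are interior
    # links of a chain, so they must not be counted as chains of their own.
    inner = set()
    for node in ast.walk(tree):
        if _is_dot_call(node):
            inner.update(id(op) for op in _operands(node) if _is_dot_call(op))
    counts = Counter()
    for node in ast.walk(tree):
        if _is_dot_call(node) and id(node) not in inner:
            counts[_pattern(node)] += 1
    return counts
```

For example, `a.dot(b).dot(c).dot(d)` is counted once, as `(((_ @ _) @ _) @ _)`, matching the notation used in the listing above.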

From this I'm leaning towards the conclusions that:

- Expressions with complex parenthesization do happen, but probably not often enough to justify elaborate stuff like my 'chaining' proposal -- only 8.7% of these cases involve more than one @.

Just for statsmodels:

We do have a very large amount of chaining, but in many cases parts of the chain have been pulled out of a single expression into a temporary or permanent variable (similar to the quadratic form example in the PEP): either for clarity (a temp variable), because one dot product shows up several times in the same expression (quadratic forms), or because we need to keep it around for reuse in other expressions.
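As an illustration of a temporary that exists because a product is reused within one expression, here is a minimal sketch of a heteroskedasticity-robust (HC0) sandwich covariance written with the Python 3.5+ `@` operator. The function and argument names are hypothetical and are not statsmodels' API:

```python
import numpy as np


def sandwich_cov(X, resid):
    """HC0 sandwich covariance: inv(X'X) X' diag(resid**2) X inv(X'X).

    The "bread" inv(X'X) appears on both sides of the "meat", so it is
    computed once and stored in a temporary instead of being written twice.
    """
    bread = np.linalg.inv(X.T @ X)          # reused twice below
    # X' diag(resid**2) X without materializing the n-by-n diagonal matrix
    meat = (X * resid[:, None] ** 2).T @ X
    return bread @ meat @ bread
```

Inlining `bread` would mean writing and computing `np.linalg.inv(X.T @ X)` twice, so the temporary stays whether the product is spelled `dot` or `@`.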

That's what I tried to explain before: chaining and breaking up larger multi-dot expressions is, most of the time, an intentional choice and not just something the dot function forces on us.

For me, the most convincing argument for @ is that it makes the parentheses visible (until I realized that I didn't really care about @).
This reduces the cases where we separate out a dot product for clarity and readability, but it still leaves the other two cases, where our chaining won't change no matter what numpy provides in addition.

Josef

 

- There's very little support here for the intuition that right-associativity is more useful than left-associativity on a day-to-day basis.
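For context on where the right-associativity intuition usually comes from: with a vector as the last operand, `dot(a, dot(b, c))` does far less work than `dot(dot(a, b), c)`. A minimal sketch, assuming dense 2-D shapes and the naive cost of m*n*p scalar multiplications for an (m, n) by (n, p) product; the function names are hypothetical:

```python
def matmul_cost(m, n, p):
    """Scalar multiplications for a naive (m, n) @ (n, p) product."""
    return m * n * p


def chain_costs(a_shape, b_shape, c_shape):
    """Return (left_cost, right_cost) for (a @ b) @ c versus a @ (b @ c)."""
    (m, n), (n2, p), (p2, q) = a_shape, b_shape, c_shape
    if n != n2 or p != p2:
        raise ValueError("inner dimensions do not match")
    left = matmul_cost(m, n, p) + matmul_cost(m, p, q)   # a @ b is (m, p)
    right = matmul_cost(n, p, q) + matmul_cost(m, n, q)  # b @ c is (n, q)
    return left, right
```

`chain_costs((1000, 1000), (1000, 1000), (1000, 1))` gives `(1001000000, 2000000)`, but with a row vector in front, `chain_costs((1, 1000), (1000, 1000), (1000, 1000))` gives `(2000000, 1001000000)`: which grouping is cheaper depends on the operand shapes rather than on a fixed rule.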

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion