Get rid of special scalar arithmetic

Hi All,

I've opened issue #7002 <https://github.com/numpy/numpy/issues/7002>, reproduced below, for discussion.
Numpy umath has a file scalarmath.c.src that implements scalar arithmetic using special functions that are about 10x faster than the equivalent ufuncs.
In [1]: a = np.float64(1)

In [2]: timeit a*a
10000000 loops, best of 3: 69.5 ns per loop

In [3]: timeit np.multiply(a, a)
1000000 loops, best of 3: 722 ns per loop
I contend that in large programs this improvement in execution time is not worth the complexity and maintenance overhead; it is unlikely that scalar-scalar arithmetic is a significant part of their execution time. Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic. This would also bring the benefits of __numpy_ufunc__ to scalars with minimal effort.
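For what it's worth, a quick sanity check that the two spellings already agree on result and type -- the proposal only changes which implementation (the scalarmath fast path or the ufunc machinery) services the operator:

    import numpy as np

    a = np.float64(1)

    # Same value and same return type either way; only the code path differs.
    assert a * a == np.multiply(a, a)
    assert type(a * a) is type(np.multiply(a, a)) is np.float64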
Thoughts?

Chuck

On Tue, Jan 12, 2016 at 9:18 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> [...]
> Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic. This would also bring the benefits of __numpy_ufunc__ to scalars with minimal effort.
> Thoughts?
+1e6, scalars are a maintenance disaster in so many ways. But can we actually pull it off? IIRC there were complaints about scalars getting slower at some point (and not 10x slower), because it's not actually too hard to have code that is heavy on scalar arithmetic. (Indexing an array returns a numpy scalar rather than a Python object, even if these look similar, so any code that, say, does a Python loop over the elements of an array may well be bottlenecked by scalar arithmetic. Obviously it's better not to do such loops, but...)

It still seems to me that surely we can speed up ufuncs on scalars / small arrays? Also I am somewhat encouraged that, like you, I get ~700 ns for multiply(scalar, scalar) versus ~70 ns for scalar * scalar, but I also get ~380 ns for 0d-array * 0d-array. (I guess probably for multiply(scalar, scalar) we're first calling asarray on both scalar objects, which is certainly avoidable.)

Here's a profile of zerod * zerod [0]: http://vorpus.org/~njs/tmp/zerod.svg (Click on PyNumber_Multiply to zoom in on the relevant part)

And here's multiply(scalar, scalar) [1]: http://vorpus.org/~njs/tmp/scalar.svg

In principle it feels like tons of this stuff is fat that can be trimmed -- even in the first, faster, profile, we're allocating a 0d array and then converting it to a scalar, and the latter conversion in PyArray_Return takes 12% of the time on its own; something like 14% of the time is spent trying to figure out from scratch the complicated type resolution and casting procedure needed to multiply two float64s, ...

[0]
a = np.array(1, dtype=float)
for i in range(...):
    a * a

[1]
s = np.float64(1)
m = np.multiply
for i in range(...):
    m(s, s)

-n

--
Nathaniel J. Smith -- http://vorpus.org
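Nathaniel's [0] and [1] loops, plus the plain scalar case, can be packaged into one rough timeit script for anyone who wants to reproduce the ~70 / ~380 / ~700 ns comparison; the numbers are illustrative only and depend on the machine and NumPy version:

    import timeit

    setup = ("import numpy as np; "
             "s = np.float64(1); z = np.array(1, dtype=float); m = np.multiply")
    n = 100000

    cases = [
        ("s * s    (scalar fast path)", "s * s"),
        ("z * z    (0-d array)",        "z * z"),
        ("m(s, s)  (explicit ufunc)",   "m(s, s)"),
    ]
    for label, stmt in cases:
        t = min(timeit.repeat(stmt, setup=setup, number=n, repeat=3))
        print("%-30s %6.0f ns per call" % (label, t / n * 1e9))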

On Wed, Jan 13, 2016 at 5:18 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> [...]
> I contend that in large programs this improvement in execution time is not worth the complexity and maintenance overhead; it is unlikely that scalar-scalar arithmetic is a significant part of their execution time.
> [...]
Not all important-to-optimize programs are large in our field; interactive use is rampant. The scalar optimizations weren't added speculatively: people noticed that their Numeric code ran much slower under numpy and were reluctant to migrate. I was forever responding on comp.lang.python, "It's because scalar arithmetic hasn't been optimized yet. We know how to do it, we just need a volunteer to do the work. Contributions gratefully accepted!" The most critical areas tended to be optimization, where you are often working with implicit scalars that pop out in the optimization loop.

--
Robert Kern

Just thought I would add here a general comment I made in the thread: replacing scalars everywhere with array scalars (i.e., ndim=0) would be great also from the perspective of ndarray subclasses; as is, it is quite annoying to have to special-case, e.g., getting a single subclass element, and rewrapping the scalar in the subclass.

-- Marten
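A minimal (made-up) illustration of the special-casing Marten describes: with a trivial ndarray subclass, slicing preserves the subclass but pulling out a single element silently returns a bare numpy scalar, which the subclass then has to rewrap by hand:

    import numpy as np

    class MyArray(np.ndarray):
        """Trivial subclass, used only to illustrate the point."""
        pass

    m = np.arange(3.0).view(MyArray)

    print(type(m[1:]))  # <class '__main__.MyArray'>  -- slices stay wrapped
    print(type(m[1]))   # <class 'numpy.float64'>     -- single elements do not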

On Wed, 2016-01-13 at 10:33 -0500, Marten van Kerkwijk wrote:
> [...] replacing scalars everywhere with array scalars (i.e., ndim=0) would be great also from the perspective of ndarray subclasses [...]
I understand the sentiment, and right now I think we usually give the subclass the chance to rewrap itself around 0-d arrays. But ideally I think this is incorrect. Either you want the scalar to be a scalar, or the array actually holds information which is associated with the dtype (i.e. units) and thus should survive conversion to scalar.

To me personally, I don't think that we can really remove scalars, due to things such as mutability, sequence ABC registration and with that also hashability. My gut feeling is that there is actually an advantage in having a scalar object, even if internally this scalar object could reuse a lot. Note that, e.g., a read-only 0-d array would raise an error on `a += 1`....

Now practicality beating purity and all that, but to me it is not obvious that it would be the best thing to get rid of scalars completely (getting rid of the code duplication is a different issue).

- Sebastian
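A quick sketch of the behavioural differences Sebastian is pointing at, using current semantics (error messages paraphrased in the comments):

    import numpy as np

    s = np.float64(1.0)        # scalar: immutable and hashable
    hash(s)                    # fine
    s += 1                     # fine too -- rebinds s to a new scalar object

    a = np.array(1.0)          # 0-d array: mutable, and therefore not hashable
    # hash(a)                  # would raise TypeError: unhashable type
    a += 1                     # in-place modification works

    b = np.array(1.0)
    b.flags.writeable = False  # a read-only 0-d array...
    # b += 1                   # ...would raise a ValueError because b is not writeable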
participants (5)
- Charles R Harris
- Marten van Kerkwijk
- Nathaniel Smith
- Robert Kern
- Sebastian Berg