[Numpy-discussion] is __array_ufunc__ ready for prime-time?

Ryan May rmay31 at gmail.com
Thu Nov 2 12:23:44 EDT 2017

On Thu, Nov 2, 2017 at 6:46 AM, <josef.pktd at gmail.com> wrote:

> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>> See Ryan May's talk at scipy about using an ndarray subclass for units
>> and the issues he's run into:
>> https://www.youtube.com/watch?v=qCo9bkT9sow
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)

A general library doesn't have to do anything--just not do annoying things
like isinstance() checks and calling np.asarray() everywhere. Honestly one
of those is the core of most of the problems I run into. It's currently
more complicated when doing things in compiled land, but that's an
implementation detail, not any kind of fundamental incompatibility.
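
To make the np.asarray() point concrete, here is a minimal sketch. The
`Tagged` class and its `units` attribute are hypothetical, invented only
for illustration; they are not any real units library. It shows that
np.asarray() returns a plain ndarray and drops the subclass, while
np.asanyarray() passes the subclass through.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass that carries a `units` attribute."""

    def __new__(cls, data, units=None):
        obj = np.asarray(data).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Called for views and new-from-template; copy units if present.
        self.units = getattr(obj, "units", None)


speed = Tagged([1.0, 2.0, 3.0], units="m/s")

stripped = np.asarray(speed)    # base ndarray: subclass and units are gone
kept = np.asanyarray(speed)     # subclass instance is passed through

print(type(stripped).__name__, type(kept).__name__, kept.units)
```

A library that only needs array semantics can usually call
np.asanyarray() instead and leave the subclass to handle the rest.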

For basic mathematical operations, units have perfectly well defined
semantics that many of us encountered in an introductory physics or
chemistry class:
- Want to add or subtract two things? They need to have the same units; a
units library can handle conversion provided they have the same
dimensionality (e.g. length, time)
- Multiplication/Division: combine and cancel units (m/s * s -> m)

Everything else we do on a computer with data in some way boils down to:
add, subtract, multiply, divide.
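
The two rules above can be sketched in a few lines. This is an
illustration only, under the assumption that a unit is represented as a
dict of base-unit exponents (so m/s is {"m": 1, "s": -1}); the function
names are made up here, and conversion between compatible units (e.g.
inches to meters) is deliberately left out.

```python
def mul_units(a, b):
    """Multiply: add exponents, dropping anything that cancels to zero."""
    out = dict(a)
    for base, exp in b.items():
        out[base] = out.get(base, 0) + exp
    return {base: exp for base, exp in out.items() if exp != 0}


def div_units(a, b):
    """Divide: multiply by the inverse (negated exponents)."""
    return mul_units(a, {base: -exp for base, exp in b.items()})


def add_units(a, b):
    """Add/subtract: operands must match; the result keeps the same units."""
    if a != b:
        raise ValueError(f"incompatible units: {a} vs {b}")
    return dict(a)


metres_per_second = {"m": 1, "s": -1}
second = {"s": 1}

print(mul_units(metres_per_second, second))             # m/s * s
print(div_units(metres_per_second, metres_per_second))  # dimensionless
```

An empty dict stands for a dimensionless quantity in this representation.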

Average keeps the same units -- it's just a sum and division by a unit-less
count.

Integration (in 1-D) involves *two* variables, your data as well as the
time/space coordinates (or dx or dt); fundamentally it's a multiplication
by dx and a summation. The resulting units are then e.g. data.units *
dx.units. This works just like it does in Physics 101 where you integrate
velocity (e.g. m/s) over time (e.g. s) and get displacement (e.g. m).
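
As a small worked example of the averaging and integration cases, the
sketch below uses plain NumPy arrays with made-up sample values and
tracks the units in comments only. The trapezoid rule is written out by
hand so the "multiply by dx, then sum" structure is visible.

```python
import numpy as np

velocity = np.array([0.0, 2.0, 4.0])  # m/s (made-up sample values)
time = np.array([0.0, 1.0, 2.0])      # s

# Average: a sum (m/s) divided by a unit-less count -> still m/s.
mean_velocity = velocity.sum() / velocity.size

# Integration: interval averages (m/s) times dt (s), then summed -> m.
dt = np.diff(time)
displacement = np.sum(0.5 * (velocity[1:] + velocity[:-1]) * dt)

print(mean_velocity, displacement)
```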

> What are units of covariance or correlation between two variables with the
> same units, and what are they between variables with different units?

Well, covariance is subtracting the mean from each variable and multiplying
the residuals; therefore the units for cov(x, y):

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard
deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.
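
One way to check this numerically, as a sketch with made-up data:
re-expressing x in centimeters instead of meters should scale the
covariance by the same factor of 100 (it carries x.units * y.units),
while the correlation should not change at all (it is dimensionless).

```python
import numpy as np

x_m = np.array([1.0, 2.0, 3.0, 4.0])   # meters (made-up values)
y_s = np.array([2.0, 1.0, 4.0, 3.0])   # seconds (made-up values)
x_cm = x_m * 100.0                     # same data expressed in centimeters

# Off-diagonal entry of the 2x2 matrix is cov(x, y) / corr(x, y).
cov_m = np.cov(x_m, y_s)[0, 1]         # units: m * s
cov_cm = np.cov(x_cm, y_s)[0, 1]       # units: cm * s
corr_m = np.corrcoef(x_m, y_s)[0, 1]   # dimensionless
corr_cm = np.corrcoef(x_cm, y_s)[0, 1]

print(cov_cm / cov_m)    # ratio follows the unit change
print(corr_cm - corr_m)  # unaffected by the unit change
```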

> How do you concatenate and operate arrays with different units?

If all arrays have compatible dimensionality (say meters, inches, miles),
you convert to one (say the first) and concatenate like normal. If they're
not compatible, you error out.
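
A rough sketch of that rule follows. The `TO_SI` table and the
`concatenate_with_units` function are hypothetical names made up for this
example, not part of NumPy or any units package; arrays are passed as
(values, unit name) pairs to keep the example self-contained.

```python
import numpy as np

# Hypothetical lookup table: unit name -> (dimension, factor to the SI unit).
TO_SI = {
    "m": ("length", 1.0),
    "inch": ("length", 0.0254),
    "mile": ("length", 1609.344),
    "s": ("time", 1.0),
}


def concatenate_with_units(arrays):
    """Concatenate (values, unit) pairs, converting to the first pair's unit."""
    _, target = arrays[0]
    target_dim, target_factor = TO_SI[target]
    converted = []
    for values, unit in arrays:
        dim, factor = TO_SI[unit]
        if dim != target_dim:
            # Incompatible dimensionality: error out rather than guess.
            raise ValueError(f"cannot concatenate {unit!r} with {target!r}")
        converted.append(np.array(values, dtype=float) * (factor / target_factor))
    return np.concatenate(converted), target


values, unit = concatenate_with_units([([1.0], "m"), ([100.0], "inch")])
print(values, unit)
```

Passing a time array alongside a length array raises ValueError, which is
the "error out" branch described above.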

> interpolation or prediction would work with using the existing units.

I'm sure you wrote that thinking units didn't play a role, but the math
behind those operations works perfectly fine with units, with things
cancelling out properly to give the same units out as in.


Ryan May