On Thu, Nov 2, 2017 at 12:23 PM, Ryan May rmay31@gmail.com wrote:

On Thu, Nov 2, 2017 at 6:46 AM, josef.pktd@gmail.com wrote:

On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum nathan12343@gmail.com wrote:

I think the biggest issues could be resolved if __array_concatenate__ were finished. Unfortunately I don't feel like I can take that on right now.

See Ryan May's talk at scipy about using an ndarray subclass for units and the issues he's run into:

Interesting talk, but I don't see how general library code is supposed to know what units the output has. For example, if the units are some flow per unit of time and we average, sum, or integrate over time, then what are the new units? (e.g. pandas time aggregation)

A general library doesn't have to do anything--it just needs to avoid annoying things like isinstance() checks and calling np.asarray() everywhere. Honestly, one of those two is at the core of most of the problems I run into. It's currently more complicated when doing things in compiled land, but those are implementation details, not any kind of fundamental incompatibility.
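To illustrate the np.asarray() point: a minimal sketch (using a trivial do-nothing ndarray subclass as a stand-in for a units array) showing how np.asarray() silently discards subclass information while np.asanyarray() passes it through:

```python
import numpy as np

class Tagged(np.ndarray):
    """Minimal ndarray subclass standing in for a units-aware array."""
    pass

a = np.arange(3.0).view(Tagged)

# np.asarray() coerces to a plain ndarray, discarding the subclass
stripped = np.asarray(a)
# np.asanyarray() passes subclasses through untouched
kept = np.asanyarray(a)

print(type(stripped) is np.ndarray)   # True: subclass info lost
print(isinstance(kept, Tagged))       # True: subclass preserved
```

Library code that uses np.asanyarray() (or no coercion at all) lets unit-carrying subclasses flow through unharmed.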

For basic mathematical operations, units have perfectly well defined semantics that many of us encountered in an introductory physics or chemistry class:

- Addition/Subtraction: the two operands need to have the same units; a units library can handle the conversion automatically, provided they have the same dimensionality (e.g. length, time)

- Multiplication/Division: combine and cancel units (m/s * s -> m)

Everything else we do on a computer with data in some way boils down to add, subtract, multiply, and divide.
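Those two rules can be sketched with a tiny, hypothetical Quantity class (not any real library's API) that tracks units as a dict of base-dimension exponents, e.g. {'m': 1, 's': -1} for m/s:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    # Hypothetical minimal units carrier: a value plus a dict of
    # base-dimension exponents, e.g. {'m': 1, 's': -1} for m/s.
    value: float
    units: dict

    def _merged(self, other, sign):
        # Combine exponent dicts; zero exponents cancel out entirely.
        out = dict(self.units)
        for dim, exp in other.units.items():
            out[dim] = out.get(dim, 0) + sign * exp
            if out[dim] == 0:
                del out[dim]
        return out

    def __add__(self, other):
        # Addition requires identical dimensionality.
        if self.units != other.units:
            raise ValueError("incompatible units")
        return Quantity(self.value + other.value, dict(self.units))

    def __mul__(self, other):
        # Multiplication combines exponents; cancellation falls out.
        return Quantity(self.value * other.value, self._merged(other, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, self._merged(other, -1))

speed = Quantity(3.0, {'m': 1, 's': -1})   # 3 m/s
time = Quantity(4.0, {'s': 1})             # 4 s
print((speed * time).units)                # {'m': 1}: m/s * s -> m
```

Real units libraries (pint, astropy.units, etc.) implement essentially this bookkeeping, plus conversion factors within a dimensionality.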

Averaging keeps the same units -- it's just a sum followed by division by a unit-less constant.

Integration (in 1-D) involves *two* variables: your data as well as the time/space coordinates (or dx or dt). Fundamentally it's a multiplication by dx and a summation, so the resulting units are data.units * dx.units. This works just like it does in Physics 101, where you integrate velocity (m/s) over time (s) and get displacement (m).
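As a sketch of that 1-D integration argument, with units tracked as hypothetical exponent dicts alongside the arithmetic (the function names here are illustrative, not from any library):

```python
from collections import Counter

def mul_units(u, v):
    # Units of a product: add the dimension exponents, drop zeros.
    out = Counter(u)
    out.update(v)
    return {d: e for d, e in out.items() if e}

def integrate(values, value_units, dx, dx_units):
    # 1-D trapezoidal rule: fundamentally sum(f * dx), so the
    # result units are value_units * dx_units.
    total = sum((a + b) / 2 * dx for a, b in zip(values, values[1:]))
    return total, mul_units(value_units, dx_units)

# Velocity in m/s sampled every 1 s -> displacement in m.
disp, units = integrate([0.0, 2.0, 4.0], {'m': 1, 's': -1}, 1.0, {'s': 1})
print(disp, units)   # 4.0 {'m': 1}
```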

What are the units of covariance or correlation between two variables with the same units, and what are they between variables with different units?

Well, covariance subtracts the mean from each variable and averages the product of the residuals (division by a unit-less count doesn't change anything), so the units for cov(x, y) are:

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.
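The same unit bookkeeping can be checked numerically: a plain-stdlib sketch of covariance and correlation, with the units reasoning carried in the comments:

```python
import statistics

def cov(x, y):
    # Residual products carry units x.units * y.units; dividing by
    # the dimensionless (n - 1) leaves cov in x.units * y.units.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    # stdev(x) * stdev(y) also carries x.units * y.units, so the
    # ratio is dimensionless, whatever units x and y started with.
    return cov(x, y) / (statistics.stdev(x) * statistics.stdev(y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(round(corr(x, y), 6))   # 1.0 for perfectly linear data
```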

How do you concatenate and operate on arrays with different units?

If all the arrays have compatible dimensionality (say meters, inches, miles), you convert them to one unit (say the first's) and concatenate like normal. If they're not compatible, you error out.
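A sketch of that convert-then-concatenate rule, using a small hypothetical conversion table for length units (any unit missing from the table stands in for incompatible dimensionality):

```python
# Hypothetical conversion factors to a common base unit (meters)
# for units sharing the length dimensionality.
TO_METERS = {'m': 1.0, 'in': 0.0254, 'mi': 1609.344}

def concat_with_units(arrays):
    """arrays: list of (values, unit) pairs. Convert everything to
    the first array's unit, then concatenate like normal."""
    target = arrays[0][1]
    if any(u not in TO_METERS for _, u in arrays):
        raise ValueError("incompatible units")  # wrong dimensionality
    out = []
    for values, unit in arrays:
        factor = TO_METERS[unit] / TO_METERS[target]
        out.extend(v * factor for v in values)
    return out, target

vals, unit = concat_with_units([([1.0], 'm'), ([100.0], 'in'), ([1.0], 'mi')])
print(unit, [round(v, 3) for v in vals])   # m [1.0, 2.54, 1609.344]
```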

Interpolation or prediction would work using the existing units.

I'm sure you wrote that thinking units didn't play a role, but the math behind those operations works perfectly fine with units, with things cancelling out properly to give the same units out as in.
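For instance, in linear interpolation the slope carries units y/x; multiplying it by (x - x0) cancels the x units, so the result comes out in y's units, same as the input. A one-line sketch:

```python
def lerp(x0, y0, x1, y1, x):
    # Slope (y1 - y0)/(x1 - x0) has units y/x; the factor (x - x0)
    # cancels the x part, leaving the result in y's units.
    return y0 + (y1 - y0) / (x1 - x0) * (x - x0)

print(lerp(0.0, 10.0, 2.0, 30.0, 1.0))   # 20.0
```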

Some of it is in my reply to Marten.

Regression and polyfit require an X matrix whose columns have different units, and then some linear algebra like solve, pinv, or svd.

So, while the predicted values have well defined units, the computation involves some messier operations, unless you want to forgo linear algebra in all intermediate steps and reduce it to sums, division, and an inverse.
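For the simplest case (a straight-line fit), that reduction to sums and division does keep the units tractable: slope = cov(x, y) / var(x) carries units y/x, and the intercept carries y's units. A plain-stdlib sketch:

```python
import statistics

def linfit(x, y):
    # Straight-line least squares via sums and division, with no
    # general linear algebra: slope = cov(x, y) / var(x) carries
    # units y.units / x.units; the intercept carries y.units.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

print(linfit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))   # (2.0, 1.0)
```

Higher-degree polyfit is where the mixed-unit X matrix (1, x, x**2, ...) makes the intermediate linear algebra messy, as noted above.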

Josef

Ryan

-- Ryan May

NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion