[Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

Chris Meyer cmeyer1969 at gmail.com
Tue Mar 17 18:34:10 EDT 2020


> On Mar 17, 2020, at 1:02 PM, Sebastian Berg <sebastian at sipsolutions.net> wrote:
> 
> in the spirit of trying to keep this moving, can I assume that the main
> reason for little discussion is that the actual changes proposed are
> not very far reaching as of now?  Or is the reason that this is a
> fairly complex topic that you need more time to think about it?
> If it is the latter, is there some way I can help with it?  I tried to
> minimize how much is part of this initial NEP.

One reason for not responding is that much of the discussion seems to have already taken place, and this NEP reads more as a summary of conclusions than as a starting point for discussion.

I implement scientific imaging software and overall this NEP looks useful.

My only caveat is that I don’t think tracking physical units should be a primary use case. Units are fundamentally different from data types, even though there are libraries out there that treat them more like data types.

First, it makes sense to pair the same physical unit with different storage types. For example, data in nanometers can be stored as float32 or as int16 and be equally useful.

In addition, a unit is something that an operation can change. For instance, downsampling a 2D image with physical units by a factor of two in each dimension produces a different unit scaling (1 km/pixel becomes 2 km/pixel), whereas cropping to the center half does not (1 km/pixel stays 1 km/pixel).
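The difference between the two operations can be sketched in a few lines of NumPy. This is only an illustration: `Calibration`, `bin2x`, and `crop_center` are hypothetical names made up for this example, not part of NumPy or any existing library.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Calibration:
    """Hypothetical calibration record: one pixel spans `scale` `units`."""
    scale: float
    units: str


def bin2x(image, cal):
    """Average 2x2 blocks; each output pixel covers twice the distance."""
    h, w = image.shape
    # Drop a trailing row/column if the size is odd so blocks divide evenly.
    trimmed = image[: h - h % 2, : w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3)), Calibration(cal.scale * 2, cal.units)


def crop_center(image, cal):
    """Keep the center half along each axis; the calibration is unchanged."""
    h, w = image.shape
    top, left = h // 4, w // 4
    return image[top : top + h // 2, left : left + w // 2], cal


image = np.arange(64, dtype=np.float32).reshape(8, 8)
cal = Calibration(1.0, "km")

binned, binned_cal = bin2x(image, cal)         # scale becomes 2 km/pixel
cropped, cropped_cal = crop_center(image, cal)  # scale stays 1 km/pixel
```

Both results have the same shape and dtype, so nothing in the array itself distinguishes them; only the operation knows whether the calibration must change.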

Finally, units may be different for each axis in multidimensional data. For instance, we may want a two-dimensional float32 array where the units on one dimension are time and on the other are spatial (3 seconds x 50 nm).

I’m not sure these comments take away from this NEP, but maybe there is another approach for units: metadata about the shape of the data rather than a new datatype for physical units. We do this in our software already, but it would be helpful if NumPy had a built-in mechanism for that.
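As a rough sketch of what per-axis metadata could look like, the following keeps the calibration next to the array instead of inside the dtype. `AxisCalibration` and `CalibratedArray` are hypothetical names, and the 3 x 50 shape with 1 s and 1 nm per sample is an assumption chosen only to echo the example above; none of this is an existing or proposed NumPy API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class AxisCalibration:
    """Hypothetical per-axis record: one sample spans `scale` `units`."""
    scale: float
    units: str


@dataclass(frozen=True, eq=False)
class CalibratedArray:
    """Hypothetical wrapper: plain ndarray plus one calibration per axis."""
    data: np.ndarray
    axes: tuple

    def __post_init__(self):
        if len(self.axes) != self.data.ndim:
            raise ValueError("need exactly one calibration per axis")

    def astype(self, dtype):
        # Changing the storage type leaves the per-axis units untouched.
        return CalibratedArray(self.data.astype(dtype), self.axes)


scan = CalibratedArray(
    np.zeros((3, 50), dtype=np.float32),
    (AxisCalibration(1.0, "s"), AxisCalibration(1.0, "nm")),
)
as_int = scan.astype(np.int16)  # same units, different storage type
```

Here the dtype only describes how values are stored, while the units travel with the axes, so float32 and int16 versions of the same data share one description.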


