[Numpy-discussion] numpy grant update

Nathaniel Smith njs at pobox.com
Thu Oct 26 21:20:47 EDT 2017

On Thu, Oct 26, 2017 at 2:11 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> My understanding of this is that the dtype will only hold the unit metadata.
> So that means units would propagate through calculations automatically, but
> the dtype wouldn't be able to manipulate the array data (in an in-place unit
> conversion for example).

I think that'd be fine actually... dtypes have methods[1] that are
invoked to do any operation that involves touching the actual array
data. For example, when you copy array data from one place to another
(because someone called arr.copy(), or did x[...] = y, or because the
ufunc internals need to copy part of the array into a temporary bounce
buffer, etc.), you have to let the dtype do that, because only the
dtype knows how to safely copy entries of this dtype. (For many dtypes
it's just a simple (strided) memmove, but then for the object dtype
you have to take care of refcounting...)

Similarly, if your unit dtype implemented casting, then array(...,
dtype=WithUnits(float, meters)).astype(WithUnits(float, feet)) would
Just Work.
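WithUnits doesn't exist in numpy. Below is a hypothetical pure-Python sketch of the conversion such a dtype's cast function would have to perform. It assumes a made-up table of conversion factors to meters. It works on plain float arrays rather than registering a real dtype, so it only illustrates the arithmetic a unit cast would do, not the dtype machinery itself.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical conversion table (length unit -> meters); not from numpy or astropy.
TO_METERS = {"meters": 1.0, "feet": 0.3048}


@dataclass(frozen=True)
class WithUnits:
    """Hypothetical stand-in for a unit-carrying dtype (not a real numpy dtype)."""
    base: type
    unit: str

    def cast(self, data, target):
        """Convert raw values stored in this unit into target's unit and base type."""
        factor = TO_METERS[self.unit] / TO_METERS[target.unit]
        return (np.asarray(data, dtype=self.base) * factor).astype(target.base)


meters = WithUnits(float, "meters")
feet = WithUnits(float, "feet")
lengths_m = np.array([1.0, 0.3048, 3.048])
lengths_ft = meters.cast(lengths_m, feet)   # roughly [3.28, 1.0, 10.0]
```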

It looks like we don't currently expose a user-level API for doing
in-place dtype conversions, but there's no reason we can't add one;
all the underlying casting machinery already exists and works on
arbitrary memory buffers. (And in the meantime there's a cute trick
here [2] you could use to implement it yourself.) And if we do add
one, then you could use it equally well to do in-place conversion from
float64->int64 as for float64-in-meters to float64-in-feet.
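Here is a minimal sketch of one way to do an in-place float64->int64 conversion with today's numpy. It is not necessarily identical to the answer linked in [2]. It assumes both dtypes have the same itemsize (8 bytes here). The array is viewed as the target dtype, and numpy's ordinary casting machinery then writes the converted values back into that same memory. cast_inplace is a hypothetical helper name, not a numpy function.

```python
import numpy as np


def cast_inplace(arr, dtype):
    """Hypothetical helper: cast arr to dtype while reusing arr's own buffer.

    Only valid when both dtypes have the same itemsize (e.g. float64 -> int64).
    Afterwards the original array object sees the reinterpreted bytes.
    """
    dtype = np.dtype(dtype)
    if dtype.itemsize != arr.dtype.itemsize:
        raise ValueError("in-place cast requires equal itemsizes")
    out = arr.view(dtype)   # same memory, new interpretation of the bytes
    out[...] = arr          # numpy casts the values and writes them into that memory
    return out


x = np.array([1.5, 2.5, -3.7])
y = cast_inplace(x, np.int64)   # y holds 1, 2, -3 (truncation toward zero)
```

For the meters-to-feet case on a plain float64 array no reinterpretation is needed. Scaling in place (x /= 0.3048) already reuses the buffer. The point of a dtype-level API would be to make both cases go through the same casting path.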

[1] Well, technically right now they're not methods, but instead a
bunch of instance attributes holding C level function pointers that
act like methods. But basically this is just an obfuscated way of
implementing methods; it made sense at the time, but in retrospect
making them use the more usual Python machinery for this will make
things easier.
[2] https://stackoverflow.com/a/4396247/

> In this world, astropy quantities and yt's YTArray would become containers
> around an ndarray that would make use of the dtype metadata but also
> implement all of the unit semantics that they already implement. Since they
> would become container classes and would no longer be ndarray subclasses,
> that avoids most of the pitfalls one encounters these days.

I don't think you'd need a container class for basic functionality,
but it might turn out to be useful for some kind of
convenience/backwards-compatibility issues. For example, right now
with Quantity you can do 'arr.unit' to get the unit and 'arr.value' to
get the raw values with units stripped. It should definitely be
possible to support these with spellings like 'arr.dtype.unit' and
'asarray(arr, dtype=float)' (or 'astropy.quantities.value(arr)'), but
maybe not the short array attribute based spellings? We'll have to
have the discussion about whether we want to provide some mechanism
for *dtypes* to add new attributes to the *ndarray* namespace.
(There's some precedent in numpy's built-in .real and .imag, but OTOH
this is a kind of 'import *' feature that can easily be confusing and
create backwards compatibility issues -- what if ndarray and the dtype
have a name clash? Keeping in mind that it could be a clash between a
third-party dtype we don't even know about and a new ndarray attribute
that didn't exist when the third-party dtype was created...)


Nathaniel J. Smith -- https://vorpus.org
