[Numpy-discussion] numpy grant update

Nathaniel Smith njs at pobox.com
Thu Oct 26 20:56:51 EDT 2017

On Thu, Oct 26, 2017 at 1:14 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:
> Hi Nathaniel,
> Thanks for the link. The plans sound great! You'll not be surprised
> to hear I'm particularly interested in the units aspect (and, no, I
> don't mind at all if we can stop subclassing ndarray...). Is the idea
> that there will be a general way to allow a dtype to define how to
> convert an array to one with another dtype? (Just as one now
> implicitly is able to convert between, say, int and float.) And, if
> so, is the idea that one of those conversion possibilities might
> involve checking units? Or were you thinking of implementing units
> more directly? The former would seem most sensible, if only so you can
> initially focus on other things than deciding how to support, say, esu
> vs emu units, or whether or not to treat radians as equal to
> dimensionless (which they formally are, but it is not always handy to
> do so).

Well, to some extent the answers here are going to be "you tell me"
:-). I'm not an expert in unit handling, and these plans are pretty
high-level right now -- there will be lots more discussions to work
out details once we've hired people and they're ramping up, and as we
work out the larger context around how to improve the dtype system.

But, generally, yeah, one of the things that a custom dtype will need
to be able to do is to hook into the casting and ufunc dispatch
systems. That means, when you define a dtype, you get to answer
questions like "can you cast yourself into float32 without loss of
precision?", or "can you cast yourself into int64, truncating values
if you have to?". (Or even, "can you cast yourself to <this other unit
type>?", which would presumably trigger unit conversion.) And you'd
also get to override how things like np.add and np.multiply work for
your dtype -- it's already the case that ufuncs
have multiple implementations for different dtypes and there's
machinery to pick the best one; this would just be extending that to
these new dtypes as well.
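
For comparison, here is roughly what the existing machinery looks like from Python for the built-in dtypes. This is only an illustration using np.can_cast and the ufunc .types attribute, which exist today; it does not show how a custom dtype would plug in, since that interface is still to be designed.

```python
import numpy as np

# "Can you cast yourself into float32 without loss of precision?"
lossless = np.can_cast(np.float64, np.float32, casting="safe")

# "Can you cast within the same kind (float -> float), allowing precision loss?"
same_kind = np.can_cast(np.float64, np.float32, casting="same_kind")

# "Can you cast yourself into int64, truncating values if you have to?"
truncating = np.can_cast(np.float64, np.int64, casting="unsafe")

print(lossless, same_kind, truncating)  # False True True

# Each ufunc already carries a table of per-dtype inner loops, written as
# type-character signatures such as "dd->d" (float64, float64 -> float64).
has_double_loop = "dd->d" in np.add.types
print(np.add.types)
```

The idea sketched in this message is that a user-defined dtype would get to supply its own answers to these casting questions and its own entries in the ufunc loop table.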

One possible approach that I think might be particularly nice would be
to implement units as a "wrapper dtype". The idea would be that if we
have a standard interface that dtypes implement, then not only can you
implement those methods yourself to make a new dtype, but you can also
call those methods on an existing dtype. So you could do something like:

class WithUnits(np.dtype):
    def __init__(self, inner_dtype, unit):
        self.inner_dtype = np.dtype(inner_dtype)
        self.unit = unit

    # Simple operations like bulk data copying are delegated to the inner dtype
    # (Invoked by arr.copy(), making temporary buffers for calculations, etc.)
    def copy_data(self, source, dest):
        return self.inner_dtype.copy_data(source, dest)

    # Other operations like casting can do some unit-specific stuff and then
    # delegate
    def cast_to(self, other_dtype, source, dest):
        if isinstance(other_dtype, WithUnits):
            if other_dtype.unit == self.unit:
                # Something like casting WithUnits(float64, meters) ->
                # WithUnits(float32, meters), so no unit trickiness is
                # needed; delegate to the inner dtype to handle the
                # storage conversion (e.g. float64 -> float32)
                self.inner_dtype.cast_to(other_dtype.inner_dtype, source, dest)
            # ... other cases to handle unit conversion, etc. ...

And then as a user you'd use it like np.array([1, 2, 3],
dtype=WithUnits(float, meters)) or whatever. (Or some convenience
function that ultimately does this.)

This is obviously a hand-wavy sketch; I'm sure the actual details
will look very different. But hopefully it gives some sense of the
kind of possibilities here?
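
To make the delegation pattern a bit more concrete, here is a toy version that runs against today's NumPy. It is only a sketch under loud assumptions: the class is a plain Python object rather than a real np.dtype subclass, the TO_METERS table and the unit strings are invented for illustration, and the copy_data/cast_to method names are the hypothetical ones from the sketch in this message, not an existing NumPy interface.

```python
import numpy as np

# Hypothetical scale factors to a base unit (meters); not part of NumPy.
TO_METERS = {"m": 1.0, "cm": 0.01, "km": 1000.0}


class WithUnits:
    """Toy stand-in for a wrapper dtype. Does NOT subclass np.dtype."""

    def __init__(self, inner_dtype, unit):
        self.inner_dtype = np.dtype(inner_dtype)
        self.unit = unit

    def copy_data(self, source, dest):
        # Bulk copying needs no unit logic, so just move the raw values.
        np.copyto(dest, source)

    def cast_to(self, other_dtype, source, dest):
        if not isinstance(other_dtype, WithUnits):
            raise TypeError("can only cast to another WithUnits in this toy")
        if other_dtype.unit == self.unit:
            # Same unit: only the storage type changes (e.g. float64 -> float32).
            np.copyto(dest, source, casting="unsafe")
        else:
            # Different unit: rescale, then let NumPy handle the storage cast.
            factor = TO_METERS[self.unit] / TO_METERS[other_dtype.unit]
            np.multiply(source, factor, out=dest, casting="unsafe")


km64 = WithUnits(np.float64, "km")
m32 = WithUnits(np.float32, "m")

src = np.array([1.0, 2.0, 3.0], dtype=km64.inner_dtype)
dst = np.empty(3, dtype=m32.inner_dtype)
km64.cast_to(m32, src, dst)
print(dst)  # values 1000, 2000, 3000 stored as float32
```

In a real design the array itself would carry the WithUnits dtype and NumPy would call these hooks internally; here the caller has to pass the buffers by hand.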


Nathaniel J. Smith -- https://vorpus.org
