[Numpy-discussion] NEP 42 status – Store quantity in a NumPy array and convert it :)

Fri Mar 26 10:44:42 EDT 2021

Thanks Sebastian, I have your example running and will start experimenting
with DType.

Lee

On Thu, Mar 25, 2021 at 5:32 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Wed, 2021-03-17 at 17:12 -0500, Sebastian Berg wrote:
> > On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:
>
> <snip>
>
> > 3. In parallel, I will create a small "toy" DType based on that
> >    experimental API.  Probably in a separate repo (in the NumPy
> >    organization?).
> >
>
> So this is started. What you need to do right now if you want to try is
> work off this branch in NumPy:
>
>
> https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api
>
> Install NumPy with `NPY_USE_NEW_CASTINGIMPL=1 python -mpip install .`
> or your favorite alternative.
> (The `NPY_USE_NEW_CASTINGIMPL=1` should be unnecessary very soon;
> working off a branch and not "main" will hopefully also be unnecessary
> soon.)
>
>
> Then fetch: https://github.com/seberg/experimental_user_dtypes
> and install it as well in the same environment.
>
>
> After that, you can jump through the hoop of setting:
>
>     NUMPY_EXPERIMENTAL_DTYPE_API=1
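
For anyone who would rather set the flag from inside a script or notebook
than in the shell, a minimal sketch. It assumes the variable is read when
the experimental module is imported, so the assignment has to come before
that import. The commented-out imports are the ones from the example in
this thread.

```python
import os

# Set the opt-in flag before importing anything that checks it.
os.environ["NUMPY_EXPERIMENTAL_DTYPE_API"] = "1"

# Only now import the experimental pieces (left commented out here so the
# snippet also runs in an environment without the branch installed):
# import numpy as np
# from experimental_user_dtypes import float64unit as u
```

Exporting the variable in the shell before starting Python has the same
effect.
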
>
> And you can enjoy these types of examples (while expecting hard crashes
> when going too far beyond them!):
>
>     from experimental_user_dtypes import float64unit as u
>     import numpy as np
>
>     F = np.array([u.Quantity(70., "Fahrenheit")])
>     C = F.astype(u.Float64UnitDType("Celsius"))
>     print(repr(C))
>     # array([21.11111111111115 °C], dtype='Float64UnitDType(degC)')
>
>     m = np.array([u.Quantity(5., "m")])
>     m_squared = u.multiply(m, m)
>     print(repr(m_squared))
>     # array([25.0 m**2], dtype='Float64UnitDType(m**2)')
>
>     # Or conversion to SI the long route:
>     pc = np.arange(5., dtype="float64").view(u.Float64UnitDType("pc"))
>     pc.astype(pc.dtype.si())
>     # array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m,
>     #        9.257032742886974e+16 m, 1.23427103238493e+17 m],
>     #       dtype='Float64UnitDType(m)')
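
A quick way to sanity-check the numbers above without the DType, in plain
Python. The parsec constant is built from the IAU definitions
(1 au = 149597870700 m, 1 pc = 648000/pi au), which is an assumption on my
part; the unit library behind the example may use slightly different
constants, so expect agreement to several digits rather than bit-for-bit.

```python
import math


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Assumed IAU definitions, not taken from the example package.
AU_IN_M = 149_597_870_700.0
PC_IN_M = AU_IN_M * 648_000.0 / math.pi

celsius = fahrenheit_to_celsius(70.0)          # compare with the °C output
metres = [n * PC_IN_M for n in range(5)]       # compare with the SI output

print(celsius)
print(metres)
```
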
>
>
> Yes, the code has some horrible hacks around creating the DType, but
> the basic mechanism, i.e. the "functions you need to implement", is not
> expected to change a lot.
>
> Right now, it forces you to use and implement the scalar `u.Quantity`
> and the code sample uses it. But you can also do:
>
>     np.arange(3.).view(u.Float64UnitDType("m"))
>
> I do have plans to "not have a scalar" so the 0-D result would still be
> an array.  But that option doesn't exist yet (and right now the scalar
> is used for printing).
>
>
> (There is also a `string_equal` "ufunc-like" that works on "S" dtypes.)
>
> Cheers,
>
> Sebastian
>
>
>
> PS: I need to figure out some details about how to create DTypes and
> DType instances with regards to our stable ABI.  The current "solution"
> is some weird subclassing hoops which are probably not good.
>
> That is painful unfortunately and any ideas would be great :).
> Unfortunately, it requires a grasp of the C-API and metaclassing...
>
>
>
> >
> > Anyone using the API should expect bugs, crashes and changes for a
> > while, but hopefully only small code modifications will be needed
> > when the API becomes public.
> >
> > My personal plan for a toy example is currently a "scaled integer".
> > E.g. a uint8 where you can set a range `[min_double, max_double]`
> > that it maps to (which makes the DType "parametric").
> > We discussed some other examples, such as a "modernized" rational
> > DType, that could be nice as well, let's see...
> >
> > Units would be a great experiment, but seem a bit complex to me (I
> > don't know units well though). So to keep it baby steps :) I would
> > aim for doing the above and then we can experiment on Units together!
> >
> >
> > Since it came up:  I agree that a Python API would be great to have.
> > It is something I firmly kept on the back-burner...  It should not be
> > very hard (if rudimentary), but unless it would help experiments a
> > lot, I would tend to leave it on the back-burner for now.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> > [1]  Maybe a `uint8` storage that maps to evenly spaced values on a
> > parametric range `[double_min, double_max]`.  That seems like a good
> > trade-off in complexity.
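
To make the "scaled integer" idea concrete, here is a minimal pure-Python
sketch of the value mapping only: 256 evenly spaced values on a range
`[lo, hi]`, stored as a code in 0..255. The class name `ScaledUInt8` and
its `encode`/`decode` methods are hypothetical and have nothing to do with
the actual DType API; a real parametric DType would carry `lo` and `hi` on
the dtype instance and do this per element in C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledUInt8:
    """Toy model: a uint8 code standing for a value in [lo, hi]."""

    lo: float
    hi: float

    def encode(self, value: float) -> int:
        # Map value to the nearest of the 256 evenly spaced levels,
        # clipping anything outside the range to the end points.
        frac = (value - self.lo) / (self.hi - self.lo)
        return min(255, max(0, round(frac * 255)))

    def decode(self, code: int) -> float:
        # Inverse mapping from the stored code back to a float.
        return self.lo + (self.hi - self.lo) * code / 255


scale = ScaledUInt8(-1.0, 1.0)
code = scale.encode(0.5)
approx = scale.decode(code)
print(code, approx)
```

Two instances with different ranges are different "dtypes" in this picture,
which is what makes a cast between them more than a plain copy of the
stored bytes.
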
> >
> >
> >
> > > On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg
> > > <sebastian at sipsolutions.net> wrote:
> > >
> > > > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> > > > > Is the work on NEP 42 custom DTypes far enough along to
> > > > > experiment with?
> > > > >
> > > >
> > > > TL;DR:  It's not quite ready, but if we work together I think we
> > > > could experiment a fair bit.  Mainly ufuncs are still limited
> > > > (though not quite completely missing).  The main problem is that
> > > > we need to find a way to expose the currently private API.
> > > >
> > > > I would be happy to discuss this also in a call.
> > > >
> > > >
> > > > ** The long story: **
> > > >
> > > > There is one more PR related to casting, which should be merged
> > > > soon and which would bring a lot of bang to such an experiment:
> > > >
> > > > https://github.com/numpy/numpy/pull/18398
> > > >
> > > >
> > > > At that point, the new machinery supports (or is used for):
> > > >
> > > > * Array-coercion: `np.array([your_scalar])` or
> > > >   `np.array([1], dtype=your_dtype)`.
> > > >
> > > > * Casting (practically full support).
> > > >
> > > > * UFuncs do not quite work. But short of writing
> > > >   `np.add(arr1, arr2)` with your DType involved, you can try a
> > > >   whole lot. (see below)
> > > >
> > > > * Promotion `np.result_type` should work very soon, but probably
> > > >   is not very relevant anyway until ufuncs are fully implemented.
> > > >
> > > > That should allow you to do a lot of good experimentation, but
> > > > due to the ufunc limitation, maybe not well on "existing" Python
> > > > code.
> > > >
> > > >
> > > > The long story about limitations is:
> > > >
> > > > We are missing exposure of the new public API.  I think I should
> > > > be able to provide a solution for this pretty quickly, but it
> > > > might require working off a NumPy branch.  (I will write another
> > > > email about it, hopefully we can find a better solution.)
> > > >
> > > >
> > > > Limitations for UFuncs:  UFuncs are the next big project, so to
> > > > try it fully you will need some patience, unfortunately.
> > > >
> > > > But, there is some good news!  You can write most of the "ufunc"
> > > > already, you just can't "register" it.
> > > > So what I can already offer you is a "DType-specific UFunc",
> > > > e.g.:
> > > >
> > > >    unit_dtype_multiply(np.array([1.], dtype=Float64UnitDType("m")),
> > > >                        np.array([2.], dtype=Float64UnitDType("s")))
> > > >
> > > > And get out `np.array([2.], dtype=Float64UnitDType("m s"))`.
> > > >
> > > > But you can't write `np.multiply(arr1, arr2)` or `arr1 * arr2`
> > > > yet.
> > > > Both registration and "promotion" logic are missing.
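
As a way to picture what such a DType-specific multiply has to decide, here
is a toy model in plain Python: the output unit is worked out from the two
input units first, and only then are the values multiplied. The names
`combine_units` and `unit_multiply` and the dict-of-exponents
representation are illustrative assumptions, not the experimental API.

```python
def combine_units(left, right):
    """Add unit exponents, e.g. {"m": 1} and {"s": 1} -> {"m": 1, "s": 1}."""
    out = dict(left)
    for name, power in right.items():
        out[name] = out.get(name, 0) + power
    # Drop units whose exponents cancelled out.
    return {name: power for name, power in out.items() if power != 0}


def unit_multiply(a, b):
    """Multiply two (values, unit) pairs element by element."""
    values_a, unit_a = a
    values_b, unit_b = b
    # Step 1: resolve the result "dtype" from the input "dtypes".
    result_unit = combine_units(unit_a, unit_b)
    # Step 2: run the plain float loop on the stored values.
    result_values = [x * y for x, y in zip(values_a, values_b)]
    return result_values, result_unit


metre_second = unit_multiply(([1.0], {"m": 1}), ([2.0], {"s": 1}))
metre_squared = unit_multiply(([5.0], {"m": 1}), ([5.0], {"m": 1}))
print(metre_second)
print(metre_squared)
```

The first call mirrors the "m" times "s" example above; the second shows
exponents accumulating when both inputs share a unit.
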
> > > >
> > > > I admit promotion may be one of the trickiest things, but trying
> > > > this a bit might help with getting a clearer picture for
> > > > promotion as well.
> > > >
> > > >
> > > > The last main limitation is that I did not replace or create
> > > > "fallback" solutions and/or replacements for the legacy
> > > > `dtype->f-><slots>` yet.
> > > > This is not a serious limitation for experimentation, though.  It
> > > > might even make sense to keep some of them around and replace
> > > > them slowly.
> > > >
> > > >
> > > > And of course, all the small issues/limitations that are not
> > > > fixed because nobody tried yet...
> > > >
> > > >
> > > >
> > > > I hope this doesn't scare you away, or at least not for long :/.
> > > > It could be very useful to start experimentation soon to push
> > > > things forward a bit quicker.  And I really want to have at least
> > > > an experimental version in NumPy 1.21.
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > > Lee
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion