<div dir="ltr">Thanks Sebastian, I have your example running and will start experimenting with DType.<div><br></div><div>Lee</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 25, 2021 at 5:32 PM Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net">sebastian@sipsolutions.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 2021-03-17 at 17:12 -0500, Sebastian Berg wrote:<br>

> On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:<br>

<br>

<snip><br>

<br>

> 3. In parallel, I will create a small "toy" DType based on that<br>

>    experimental API.  Probably in a separate repo (in the NumPy<br>

>    organization?).<br>

> <br>

<br>

So this is started. What you need to do right now, if you want to try it, is<br>

work off this branch in NumPy:<br>

<br>

     <a href="https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api" rel="noreferrer" target="_blank">https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api</a><br>

<br>

Install NumPy with `NPY_USE_NEW_CASTINGIMPL=1 python -mpip install .`<br>

or your favorite alternative.<br>

(The `NPY_USE_NEW_CASTINGIMPL=1` should be unnecessary very soon;<br>

working off a branch and not "main" will hopefully also be unnecessary<br>

soon.)<br>

<br>

<br>

Then fetch: <a href="https://github.com/seberg/experimental_user_dtypes" rel="noreferrer" target="_blank">https://github.com/seberg/experimental_user_dtypes</a><br>

and install it as well in the same environment.<br>

<br>

<br>

After that, you can jump through the hoop of setting:<br>

<br>

    NUMPY_EXPERIMENTAL_DTYPE_API=1<br>

<br>

And you can enjoy these types of examples (while expecting hard crashes<br>

when going too far beyond!):<br>

<br>

    from experimental_user_dtypes import float64unit as u<br>

    import numpy as np<br>

<br>

    F = np.array([u.Quantity(70., "Fahrenheit")])<br>

    C = F.astype(u.Float64UnitDType("Celsius"))<br>

    print(repr(C))<br>

    # array([21.11111111111115 °C], dtype='Float64UnitDType(degC)')<br>

<br>

    m = np.array([u.Quantity(5., "m")])<br>

    m_squared = u.multiply(m, m)<br>

    print(repr(m_squared))<br>

    # array([25.0 m**2], dtype='Float64UnitDType(m**2)')<br>

<br>

    # Or conversion to SI the long route:<br>

    pc = np.arange(5., dtype="float64").view(u.Float64UnitDType("pc"))<br>

    pc.astype(pc.dtype.si())<br>

    # array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m,<br>

    #        9.257032742886974e+16 m, 1.23427103238493e+17 m],<br>

    #       dtype='Float64UnitDType(m)')<br>
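<br>
(As a sanity check, the first conversion above can be reproduced without the<br>
experimental DType. This is only a plain-NumPy sketch; `fahrenheit_to_celsius`<br>
is a made-up helper name and not part of `experimental_user_dtypes`.)<br>

```python
import numpy as np


def fahrenheit_to_celsius(values):
    """Convert Fahrenheit to Celsius with plain float64 arithmetic."""
    values = np.asarray(values, dtype="float64")
    return (values - 32.0) * 5.0 / 9.0


F = np.array([70.0])
C = fahrenheit_to_celsius(F)
# Agrees with the 21.11... printed above up to floating point rounding.
print(C)
```

The difference is that with the DType the conversion happens inside `astype`,<br>
so the unit travels with the array instead of living in a helper function.<br>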

<br>

<br>

Yes, the code has some horrible hacks around creating the DType, but<br>

the basic mechanism, i.e. the "functions you need to implement", is not<br>

expected to change a lot.<br>

<br>

Right now, it forces you to use and implement the scalar `u.Quantity`<br>

and the code sample uses it. But you can also do:<br>

<br>

    np.arange(3.).view(u.Float64UnitDType("m"))<br>

<br>

I do have plans to "not have a scalar" so the 0-D result would still be<br>

an array.  But that option doesn't exist yet (and right now the scalar<br>

is used for printing).<br>
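<br>
(The `.view()` route presumably works because the storage is already float64:<br>
a view only reinterprets the existing buffer and converts nothing. A sketch of<br>
that behaviour, using the built-in `int64` as a stand-in for the unit DType so<br>
that it runs on a stock NumPy; the stand-in is an assumption for illustration.)<br>

```python
import numpy as np

arr = np.arange(3.0)            # float64 storage
as_int = arr.view(np.int64)     # same 8-byte items, reinterpreted

# A view shares memory with the original array, nothing is copied.
print(np.shares_memory(arr, as_int))

# Writing through the original is visible in the view (as raw bits).
arr[0] = 1.0
print(hex(as_int[0]))
```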

<br>

<br>

(There is also a `string_equal` "ufunc-like" that works on "S" dtypes.)<br>

<br>

Cheers,<br>

<br>

Sebastian<br>

<br>

<br>

<br>

PS: I need to figure out some details about how to create DTypes and<br>

DType instances with regard to our stable ABI.  The current "solution"<br>

is to jump through some weird subclassing hoops, which is probably not good.<br>

<br>

That is painful, unfortunately, and any ideas would be great :). <br>

It does require a grasp of the C-API and metaclassing, though...<br>

<br>

<br>

<br>

> <br>

> Anyone using the API should expect bugs, crashes and changes for a<br>

> while.  But hopefully it will only require small code modifications when<br>

> the API becomes public.<br>

> <br>

> My personal plan for a toy example is currently a "scaled integer".<br>

> E.g. a uint8 where you can set a range `[min_double, max_double]` that<br>

> it maps to (which makes the DType "parametric").<br>

> We discussed some other examples, such as a "modernized" rational<br>

> DType, that could be nice as well, let's see...<br>

> <br>

> Units would be a great experiment, but seem a bit complex to me (I<br>

> don't know units well though). So to keep it baby steps :) I would aim<br>

> for doing the above and then we can experiment on Units together!<br>

> <br>

> <br>

> Since it came up:  I agree that a Python API would be great to have. It<br>

> is something I firmly kept on the back-burner...  It should not be very<br>

> hard (if rudimentary), but unless it would help experiments a lot, I<br>

> would tend to leave it on the back-burner for now.<br>

> <br>

> Cheers,<br>

> <br>

> Sebastian<br>

> <br>

> <br>

> [1]  Maybe a `uint8` storage that maps to evenly spaced values on a<br>

> parametric range `[double_min, double_max]`.  That seems like a good<br>

> trade-off in complexity.<br>
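<br>
(A rough picture of that mapping in plain NumPy. This is not a DType, only the<br>
arithmetic such a DType would have to do when casting; `encode` and `decode`<br>
are invented names, and the rounding and clipping choices are assumptions.)<br>

```python
import numpy as np


def encode(values, lo, hi):
    """Map floats in [lo, hi] to uint8 codes 0..255 (evenly spaced)."""
    scaled = (np.asarray(values, dtype="float64") - lo) / (hi - lo) * 255.0
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


def decode(codes, lo, hi):
    """Map uint8 codes back to the float64 values they represent."""
    return lo + codes.astype("float64") * (hi - lo) / 255.0


lo, hi = -1.0, 1.0          # the "parameters" of the would-be DType
codes = encode([-1.0, 0.0, 1.0], lo, hi)
print(codes, decode(codes, lo, hi))
```

Two such arrays with different ranges share the same uint8 storage but mean<br>
different values, which is what makes the range part of the dtype instance.<br>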

> <br>

> <br>

> <br>

> > On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg <<br>

> > <a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>><br>

> > wrote:<br>

> > <br>

> > > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:<br>

> > > > Is the work on NEP 42 custom DTypes far enough along to experiment<br>

> > > > with?<br>

> > > > <br>

> > > <br>

> > > TL;DR:  It's not quite ready, but if we work together I think we could<br>

> > > experiment a fair bit.  Mainly ufuncs are still limited (though not<br>

> > > quite completely missing).  The main problem is that we need to find a<br>

> > > way to expose the currently private API.<br>

> > > <br>

> > > I would be happy to discuss this also in a call.<br>

> > > <br>

> > > <br>

> > > ** The long story: **<br>

> > > <br>

> > > There is one more PR related to casting, whose merge should be around<br>

> > > the corner and which would bring a lot of bang to such an experiment:<br>

> > > <br>

> > > <a href="https://github.com/numpy/numpy/pull/18398" rel="noreferrer" target="_blank">https://github.com/numpy/numpy/pull/18398</a><br>

> > > <br>

> > > <br>

> > > At that point, the new machinery supports (or is used for):<br>

> > > <br>

> > > * Array-coercion: `np.array([your_scalar])` or<br>

> > >   `np.array([1], dtype=your_dtype)`.<br>

> > > <br>

> > > * Casting (practically full support).<br>

> > > <br>

> > > * UFuncs do not quite work. But short of writing `np.add(arr1, arr2)`<br>

> > >   with your DType involved, you can try a whole lot (see below).<br>

> > > <br>

> > > * Promotion (`np.result_type`) should work very soon, but probably<br>

> > >   is not very relevant anyway until ufuncs are fully implemented.<br>
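<br>
(For reference, these are the same three operations with built-in dtypes,<br>
i.e. the entry points a user DType would eventually hook into. Only stock<br>
NumPy calls are used here; no custom DType is involved.)<br>

```python
import numpy as np

# Array-coercion: the dtype is discovered from the scalars or given explicitly.
discovered = np.array([1.5])
explicit = np.array([1], dtype=np.float32)

# Casting: `astype` performs the cast, `np.can_cast` asks whether it is safe.
as_int = discovered.astype(np.int64)
safe = np.can_cast(np.float64, np.int64, casting="safe")

# Promotion: the common dtype of two inputs.
common = np.result_type(np.int8, np.float32)

print(discovered.dtype, explicit.dtype, as_int.dtype, safe, common)
```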

> > > <br>

> > > That should allow you to do a lot of good experimentation, but due to<br>

> > > the ufunc limitation, maybe not well on "existing" Python code.<br>

> > > <br>

> > > <br>

> > > The long story about limitations is:<br>

> > > <br>

> > > We are missing exposure of the new public API.  I think I should be<br>

> > > able to provide a solution for this pretty quickly, but it might<br>

> > > require working off a NumPy branch.  (I will write another email about<br>

> > > it; hopefully we can find a better solution.)<br>

> > > <br>

> > > <br>

> > > Limitations for UFuncs:  UFuncs are the next big project, so to try it<br>

> > > fully you will need some patience, unfortunately.<br>

> > > <br>

> > > But, there is some good news!  You can write most of the "ufunc"<br>

> > > already, you just can't "register" it.<br>

> > > So what I can already offer you is a "DType-specific UFunc", e.g.:<br>

> > > <br>

> > >    unit_dtype_multiply(np.array([1.], dtype=Float64UnitDType("m")),<br>

> > >                        np.array([2.], dtype=Float64UnitDType("s")))<br>

> > > <br>

> > > And get out `np.array([2.], dtype=Float64UnitDType("m s"))`.<br>

> > > <br>

> > > But you can't write `np.multiply(arr1, arr2)` or `arr1 * arr2` yet.<br>

> > > Both registration and "promotion" logic are missing.<br>
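<br>
(Such a "DType-specific UFunc" can be pictured roughly as two steps: work out<br>
the result unit from the input units, then run an ordinary float64 multiply<br>
on the values. The toy below assumes units are dicts of exponents;<br>
`multiply_units` and `toy_unit_multiply` are invented names and say nothing<br>
about how `unit_dtype_multiply` is actually implemented.)<br>

```python
from collections import Counter

import numpy as np


def multiply_units(unit1, unit2):
    """Add the exponents of two units given as {base_unit: exponent}."""
    result = Counter(unit1)
    for base, exponent in unit2.items():
        result[base] += exponent
    # Drop base units whose exponents cancelled out.
    return {base: exp for base, exp in result.items() if exp != 0}


def toy_unit_multiply(values1, unit1, values2, unit2):
    """Multiply the float64 values and work out the resulting unit."""
    return np.multiply(values1, values2), multiply_units(unit1, unit2)


values, unit = toy_unit_multiply(
    np.array([1.0]), {"m": 1}, np.array([2.0]), {"s": 1})
print(values, unit)
```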

> > > <br>

> > > I admit promotion may be one of the trickiest things, but trying this a<br>

> > > bit might help with getting a clearer picture for promotion as well.<br>

> > > <br>

> > > <br>

> > > The last main limitation is that I did not yet create "fallback"<br>

> > > solutions and/or replacements for the legacy `dtype->f-><slots>`.<br>

> > > This is not a serious limitation for experimentation, though.  It might<br>

> > > even make sense to keep some of them around and replace them slowly.<br>

> > > <br>

> > > <br>

> > > And of course, all the small issues/limitations that are not fixed<br>

> > > because nobody tried yet...<br>

> > > <br>

> > > <br>

> > > <br>

> > > I hope this doesn't scare you away, or at least not for long :/.  It<br>

> > > could be very useful to start experimentation soon to push things<br>

> > > forward a bit quicker.  And I really want to have at least an<br>

> > > experimental version in NumPy 1.21.<br>

> > > <br>

> > > Cheers,<br>

> > > <br>

> > > Sebastian<br>

> > > <br>

> > > <br>

> > > > Lee<br>

> > > > _______________________________________________<br>

> > > > NumPy-Discussion mailing list<br>

> > > > <a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

> > > > <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

> > > <br>


</blockquote></div>