DType Roadmap/NEP Discussion

Hi all,
To try and make some progress towards a decision, since the broad design is pretty much settled from my side, I am thinking about setting up a meeting and suggest Monday at 11am Pacific Time (I am open to other times, though).
My hope is to get everyone interested on board, so that we can make an informed decision about the general direction very soon. So just reach out, or discuss on the mailing list as well.
The current draft for an NEP is here: https://hackmd.io/kxuh15QGSjueEKft5SaMug?both
There are some design goals that I would like to clear up. I would prefer to avoid deep discussions of specific issues, since I think the important decision right now is whether my general starting point is in the right direction.
It is not an easy topic, so my plan would be to briefly summarize it, then hopefully clarify any questions, and then we can discuss why alternatives were rejected. The most important thing is perhaps gathering concerns that need to be clarified before we can move towards accepting the general design ideas.
The main point of the NEP draft is actually captured by the picture in the linked document: DTypes are classes (such as Float64) and what is attached to the array is an instance of that class "<float64" or ">float64". Additionally, we would have AbstractDType classes which cannot be instantiated but define a type hierarchy.
To list the main points:
* DTypes are classes (corresponding to the current type number)
* `arr.dtype` is an instance of its class, which allows storing additional information such as a physical unit or the string length.
* Most things are defined in special dtype slots similar to Python's type and number slots. They will be hidden and can be set through an init function similar to `PyType_FromSpec` [1].
* Promotion is defined primarily on the DType classes
* Casting from one DType to another DType is defined by a new CastingImpl object (should become a special ufunc) - e.g. for strings, the CastingImpl is in charge of finding the correct string length
* The AbstractDType hierarchy will be used to decide the signature when calling UFuncs.
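To make the class/instance split above more concrete, here is a minimal pure-Python sketch. It is only a toy model under stated assumptions: `AbstractDType`, `Floating`, `Float64`, `String`, the `abstract` flag and the `byteorder`/`length` attributes are hypothetical names chosen for illustration, not the API proposed in the NEP or anything NumPy provides.

```python
# Toy model of the proposed design: DType *classes* vs. dtype *instances*.
# All names here are hypothetical and do not reflect NumPy's actual API.


class AbstractDType:
    """Root of the hierarchy; classes flagged `abstract` cannot be instantiated."""

    abstract = True

    def __new__(cls, *args, **kwargs):
        # Only look at the flag defined on `cls` itself, not an inherited one.
        if cls.__dict__.get("abstract", False):
            raise TypeError(f"{cls.__name__} is abstract and cannot be instantiated")
        return super().__new__(cls)


class Floating(AbstractDType):
    abstract = True  # only defines the hierarchy


class Float64(Floating):
    abstract = False

    def __init__(self, byteorder="<"):
        self.byteorder = byteorder  # per-instance information

    def __repr__(self):
        return f"{self.byteorder}float64"


class String(AbstractDType):
    abstract = False

    def __init__(self, length):
        self.length = length  # per-instance information, like the 3 in "S3"

    def __repr__(self):
        return f"S{self.length}"


little, big = Float64("<"), Float64(">")
print(type(little) is type(big) is Float64)  # True: one class, two instances
print(isinstance(little, Floating))          # True: the abstract hierarchy
print(repr(String(3)))                       # S3
```

The abstract classes play the role that `dtype.kind` plays today, while everything specific to one array (byte order, string length, unit) lives on the instance.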
The main iffier points I can think of are:
* NumPy currently uses value-based promotion in some cases, which requires special AbstractDTypes to describe (and some legacy paths). (These are used more like instances than like typical classes.)
* Casting between flexible dtypes (such as strings) is a multi-step process to figure out the actual output dtype. - An example is: `np.can_cast("float64", "S3")` first finding that `Float64->String` is possible in principle and then asking the CastingImpl to find that `float64->S3` is not.
* We have to break ABI compatibility in a very minor, back-portable way. More small incompatibilities are likely [2].
* Since it is a major redesign, a lot of code has to be added/touched, although it is possible to channel much of it back into the old machinery.
* A largish amount of new API around new DType type objects and also DTypeMeta type objects, which users can (although usually do not have to) subclass.
However, most other designs will have similar issues. Basically, I currently really think this is "right", even if some details may end up a bit tricky.
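The multi-step cast resolution from the list above (`np.can_cast("float64", "S3")`) could look roughly like the following toy sketch. `CASTING_IMPLS`, `float64_to_string` and `can_cast_sketch` are hypothetical names, and the 32-character requirement for a float64 is an assumed number for illustration only.

```python
# Hypothetical toy DType classes and a class-level registry of CastingImpls.
class Float64:
    pass


class String:
    def __init__(self, length=None):
        self.length = length  # None plays the role of the unsized "S"


def float64_to_string(from_dtype, to_dtype):
    """Toy CastingImpl resolver: return the output instance, or None if impossible."""
    needed = 32  # assumed worst-case length of a float64 repr, for illustration
    if to_dtype.length is None:
        return String(needed)  # unsized target: the CastingImpl picks the length
    return to_dtype if to_dtype.length >= needed else None


CASTING_IMPLS = {(Float64, String): float64_to_string}


def can_cast_sketch(from_dtype, to_dtype):
    # Step 1 (classes): is Float64 -> String possible in principle?
    impl = CASTING_IMPLS.get((type(from_dtype), type(to_dtype)))
    if impl is None:
        return False
    # Step 2 (instances): let the CastingImpl decide for this exact pair.
    return impl(from_dtype, to_dtype) is not None


print(can_cast_sketch(Float64(), String(3)))          # False: "S3" is too short
print(can_cast_sketch(Float64(), String(64)))         # True
print(float64_to_string(Float64(), String()).length)  # 32
```

The class-level lookup is cheap and can be done without knowing any lengths; only the CastingImpl needs to know how long the resulting string has to be.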
Best,
Sebastian
PS: The one thing outside the more general list above that I may want to discuss is how acceptable a global dict/mapping for dtype discovery during `np.array` coercion is (mapping python type -> dtype)...
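A global mapping like the one in the PS might be sketched as below. This is only an illustration assuming a plain dict keyed by Python type; `register_scalar_type`, `discover_dtype` and the `"Rational"` label are hypothetical, and walking the MRO is one possible choice, not a decided one.

```python
from fractions import Fraction

# Hypothetical global registry: Python scalar type -> DType (names as strings here).
_PYTYPE_TO_DTYPE = {}


def register_scalar_type(pytype, dtype_name):
    _PYTYPE_TO_DTYPE[pytype] = dtype_name


def discover_dtype(obj):
    # Walk the MRO so that subclasses of a registered type are also found
    # (one possible design choice; an exact-type lookup would be stricter).
    for klass in type(obj).__mro__:
        if klass in _PYTYPE_TO_DTYPE:
            return _PYTYPE_TO_DTYPE[klass]
    return None  # unknown type: the caller would fall back, e.g. to an object dtype


register_scalar_type(float, "Float64")
register_scalar_type(Fraction, "Rational")  # e.g. registered by a rational dtype package

print(discover_dtype(1.5))             # Float64
print(discover_dtype(Fraction(1, 3)))  # Rational
print(discover_dtype("abc"))           # None
```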
[1] https://docs.python.org/3/c-api/type.html#c.PyType_FromSpec
[2] One possible issue may be "S0", which is normally used to denote what in the new API would be the `String` DType class.

Hi Sebastian,
On Wed, Sep 18, 2019 at 4:35 PM Sebastian Berg sebastian@sipsolutions.net wrote:
The design itself seems very sensible to me insofar as I understand it. After having read your document again, I think you're still missing the actual goals though. "structure of class layout" and "type hierarchy" are important, but they're not the goals. You're touching on the real goals in places, but it may be valuable to be much more explicit there.
Here are some example goals:
1. Make creating new dtypes via the NumPy C API take >4x fewer lines of code on average (in practice: for rational/quaternion, hard to measure otherwise).
2. Make it possible to create new dtypes with full functionality via the NumPy Python API. Performance may be up to 1-2 orders of magnitude worse than when creating the same dtype via the C API; the main purpose is to allow easier prototyping of new dtypes.
3. Make the NumPy codebase more maintainable by removing special-casing of datetime dtypes in many places.
4. Enable creation of a units library whose arrays *are* numpy arrays rather than a subclass or duck array. This will make such a library work much better with SciPy and other existing libraries that use np.asarray extensively.
5. Hide currently exposed implementation details in the C API so long-term .... (you have this one, but it would be nice to work it out a little more - after all we recently considered reverting the deprecation for direct field access, so how important is this?)
6. Improve casting behavior for external dtypes
7. Make np.char behavior better <in ... ways> (you mention fixed length strings work poorly now, but not what would change)
Listing non-goals would also be useful:
1. Performance: no significant performance improvements are expected. We aim for no performance regressions.
2. Introducing new dtypes into NumPy itself
3. Pandas ExtensionArrays? You mention them, but does this dtype redesign help Pandas in any way or not?
4. Changes to NumPy's current casting rules
5. Allow creation of dtypes that don't fit the current NumPy model of what a dtype is (e.g. ref [1]), such as a variable-length string dtype.
Many of those (and there can be more, this is just what came to mind now) can/should be a paragraph or section. In my experience describing these goals and requirements well takes about 15-30% of the length of the design description. Think of for example a Pandas or units library maintainer reading this: they should be able to stop reading at where you now have "Overview Graphic" and have a pretty clear high-level understanding of what this whole redesign will mean for them. Same for a NumPy maintainer who wants to get a sense of what the benefits and impacts will be: reading only (the expanded version of) your Abstract, Motivation and Scope, and Backwards Compatibility sections should be enough.
Here's a concrete question, the type of thing I'd like to understand without having to understand the whole design in detail:
```
>>> import datetime
>>> import numpy as np
>>> import pandas as pd
>>> dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'),
...                       datetime.datetime(2018, 1, 1)])
>>> dti.values
array(['2018-01-01T00:00:00.000000000', '2018-01-01T00:00:00.000000000',
       '2018-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> dti.values.dtype
dtype('<M8[ns]')
>>> isinstance(dti.values.dtype, np.dtype)
True
>>> dti.dtype == dti.values.dtype  # okay, that's nice
True
>>> start = pd.to_datetime('2015-02-24')
>>> rng = pd.date_range(start, periods=3)
>>> t = pd.Series(rng)
>>> t_withzone = t.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
>>> t_withzone
0   2015-02-24 05:30:00+05:30
1   2015-02-25 05:30:00+05:30
2   2015-02-26 05:30:00+05:30
dtype: datetime64[ns, Asia/Kolkata]
>>> t_withzone.dtype
datetime64[ns, Asia/Kolkata]
>>> t_withzone.values.dtype
dtype('<M8[ns]')
>>> t_withzone.dtype == t_withzone.values.dtype  # could this be True in the future?
False
```
So can Pandas create timezone-aware numpy dtypes in the future if they want to, or would they still be better off rolling their own?
Also one question/comment about the design content. When looking at the current external dtypes (e.g. [2]), a large part of the work of implementing a new dtype now deals with ufunc behavior. It's not clear from your document how that changes with the new design, can you add something about that?
Cheers, Ralf
[1] http://scipy-lectures.org/advanced/advanced_numpy/index.html#the-descriptor
[2] https://github.com/moble/quaternion/blob/master/numpy_quaternion.c

On Wed, 2019-09-18 at 21:33 -0700, Ralf Gommers wrote:
Good points, I will try and incorporate some. I had answers to a few of them, but I do not think they are too helpful here and now; this got a bit longer than expected, but also more general...
There is a bit of a clash between long-term and mid-term goals. My goal is to enable pretty much any conceivable long-term goal, but in the mid/short term, that means:
1. Convince you (and me) that the proposed API can handle everything we can think of now and can grow easily (e.g. optimization, new features).
2. Convince everyone that the current state is unacceptable enough that any added maintenance burden (during the transition phase) is acceptable. I personally think the maintenance situation will definitely improve quickly, even if we reuse a lot of old code. The main issue is the initial massive set of changes.
3. Any necessary ABI/API breakage that may happen is acceptable. The DType breakage itself is very limited. Specific UFuncs may break more, but only in hidden features whose only user I know of is astropy (and they are OK with us breaking it); numba might also be affected, but I think less so.
The main point right now is reorganizing everything from monolithic -> operator based, improving long-term maintainability and extensibility. We would be dogfooding the new API ourselves for the same reason.
E.g. the AbstractDType hierarchy... it is something we could discuss. I think it is right, since it replaces `dtype.kind` and makes for a powerful organization of dispatching in ufuncs. But we could limit it initially! To give one example: say ora creates many DTypes with different datetime representations. ora could create an AbstractOraDType, so that you can do easy isinstance checks. In particular, during ufunc dispatch ora can use it to write a single function for figuring out promotion: `OraDType1 + OraDType2 -> OraDType1 + OraDType2.astype(OraDType1)`.
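A toy version of the ora example, assuming hypothetical classes `AbstractOraDType`, `OraDType1` and `OraDType2` and an assumed promotion rule (mixed ora inputs are cast to `OraDType1`), shows how a single function keyed on the abstract base can cover every pair:

```python
# Hypothetical ora DTypes that share an abstract base class.
class AbstractOraDType:
    pass


class OraDType1(AbstractOraDType):
    pass


class OraDType2(AbstractOraDType):
    pass


def ora_common_dtype(cls1, cls2):
    """One promotion function for every pair of ora DTypes (assumed rule)."""
    if not (issubclass(cls1, AbstractOraDType) and issubclass(cls2, AbstractOraDType)):
        return NotImplemented  # not ora's business; let someone else decide
    if cls1 is cls2:
        return cls1
    return OraDType1  # assumption: mixed ora inputs are cast to OraDType1


def add_loop_signature(cls1, cls2):
    common = ora_common_dtype(cls1, cls2)
    if common is NotImplemented:
        raise TypeError("no common ora DType")
    # Inputs whose class differs from `common` would be cast before the loop runs.
    return (common, common, common)


signature = add_loop_signature(OraDType1, OraDType2)
print([cls.__name__ for cls in signature])
```

Adding an `OraDType3` later would need no new promotion code, as long as it subclasses `AbstractOraDType`.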
I agree that this is probably missing: UFunc dispatch is a major reason for splitting "common DType" (class) and "common dtype instance" (e.g. of strings with different lengths) functionality. I think it is a reasonable split in any case, but for dispatching the first is sufficient, while the second is more naturally found after dispatching (only once you know you have Unit * Unit can you reasonably figure out the actual output `Unit("m*m")`).
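The split between finding the common DType class (enough for dispatch) and resolving the output instance (after dispatch) can be sketched like this. `Unit`, `LOOPS`, `dispatch` and `resolve_multiply` are hypothetical names, and the unit string is simply concatenated, with no real unit algebra.

```python
class Unit:
    """Toy unit dtype instance; `name` is the physical unit, e.g. "m"."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Unit) and self.name == other.name

    def __repr__(self):
        return f"Unit({self.name!r})"


# Step 1 (dispatch): only the DType *classes* of the inputs matter.
LOOPS = {("multiply", Unit, Unit): "unit_multiply_loop"}


def dispatch(ufunc_name, *dtypes):
    return LOOPS[(ufunc_name, *(type(d) for d in dtypes))]


# Step 2 (after dispatch): the chosen loop resolves the output *instance*.
def resolve_multiply(d1, d2):
    return Unit(f"{d1.name}*{d2.name}")


m = Unit("m")
print(dispatch("multiply", m, m))  # unit_multiply_loop
print(resolve_multiply(m, m))      # Unit('m*m')
```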
Best,
Sebastian
PS: The only real limitation that I see right now is allowing promotion to inspect array values. For example (probably not a very good one), `int_arr.astype(Categorical)` wants to find `Categorical(np.unique(int_arr))`. I think not providing that is acceptable, because categorical can provide its own function to find the actual categorical instance. Or it could implement a Categorical and a FrozenCategorical, so that the dtype instance is mutable in the sense that it can add new categories. (For array coercion from a list of items, the issue is different, and allowing such things can be provided or added later.)
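The workaround mentioned in the PS, a categorical package providing its own function to find the dtype instance from the values, might look like this minimal sketch; `Categorical` and `categorical_from_values` are hypothetical names, and only `np.unique` is real NumPy API.

```python
import numpy as np


class Categorical:
    """Toy categorical dtype instance holding a fixed tuple of categories."""

    def __init__(self, categories):
        self.categories = tuple(categories)


def categorical_from_values(arr):
    # Inspect the values to build the dtype instance: something the generic
    # cast/promotion machinery would not do on its own in this design.
    return Categorical(np.unique(arr).tolist())


int_arr = np.array([3, 1, 3, 2])
print(categorical_from_values(int_arr).categories)  # (1, 2, 3)
```

The user would then call `int_arr.astype(...)` with the instance returned by such a helper, so promotion itself never needs to look at the data.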
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

On 19/9/19 2:34 am, Sebastian Berg wrote:
Mon Sept 23 sounds good. Please reach out to the possible consumers of the API to get wider input.
- Pandas
- Astropy
- Numba
- ???
It may be a bit too short notice, but it seems like there is enough to talk about even if only the NumPy community shows up.
Where/how will the meeting take place?
Matti

On Thu, 2019-09-19 at 21:35 +0300, Matti Picus wrote:
Let's go for the usual Zoom link. I will probably add a few points later, but so that things can be updated easily, see:
https://hackmd.io/5S3ADAdOSIeaUwFxlvajMA
Best,
Sebastian

On Fri, Sep 20, 2019 at 3:10 PM Sebastian Berg sebastian@sipsolutions.net wrote:
Is there a time set for this meeting? I'll try to attend from the pandas side of things.

On Mon, Sep 23, 2019 at 1:40 PM Tom Augspurger tom.w.augspurger@gmail.com wrote:
The HackMD link above says 11am PST (so ~6 hours from now), and also contains a Zoom link to join the call.
Cheers, Ralf

On Mon, 2019-09-23 at 13:44 +0200, Ralf Gommers wrote:
Just to let you know: unfortunately our room is in use, so we will have to use a different Zoom link: https://zoom.us/j/6398421986 (the HackMD is updated)
Cheers,
Sebastian

Since it probably got lost: I am currently developing things at:
https://github.com/seberg/numpy/tree/dtypemeta
Please do not expect the tidiest code at the moment. The public API is not yet available, and things are currently mainly at a proof-of-concept stage, where the following work:
* Promotion
* Casting
* Array creation – coercion `np.array(...)` (fairly far along)
* AbstractDTypes for value-based casting
Also, of course, having a DTypeMeta class in general, and "<f8" being an instance of a Float64 class, works well.
The largest missing piece is that UFuncs need to be restructured for new DTypes to be useful to the user. Until then, the only gain we have is structuring some code better.
Most of the time it should build and pass the tests (currently all of them, in the future maybe most). I may force push occasionally (if only to rebase on master).
If you would like direct push access just send me a mail.
Maybe a side note: datetimes work of course, and they have a unit attached to them, so they are a simple version of a "unit dtype" proof-of-concept.
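The datetime analogy can be checked with existing NumPy API: datetime64 dtypes with different units share one class but carry a different unit per instance, and promotion resolves to the finer unit. The sketch below uses only `np.dtype`, `np.datetime_data` and `np.promote_types`.

```python
import numpy as np

# Two datetime64 dtype instances that differ only in their unit.
seconds = np.dtype("M8[s]")
nanoseconds = np.dtype("M8[ns]")

# Same class, different per-instance information.
print(type(seconds) is type(nanoseconds))  # True

# np.datetime_data exposes the unit stored on each instance.
print(np.datetime_data(seconds))      # ('s', 1)
print(np.datetime_data(nanoseconds))  # ('ns', 1)

# Promotion picks the finer unit, much as a unit dtype might resolve units.
print(np.promote_types(seconds, nanoseconds) == np.dtype("M8[ns]"))  # True
```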
Best,
Sebastian
On Mon, 2019-09-23 at 10:43 -0700, Sebastian Berg wrote:

On 9/18/19, Sebastian Berg sebastian@sipsolutions.net wrote:
That works for me.
Warren
participants (5)
- Matti Picus
- Ralf Gommers
- Sebastian Berg
- Tom Augspurger
- Warren Weckesser