type and kind for custom dtypes
Hi,

I'm working on building a number of related custom dtypes, and I'm not sure how to set the type and kind fields in PyArray_Descr. I tried using type='V' and choosing a single unused kind for all my dtypes; this mostly worked, except I found that coercions would sometimes treat values of two different dtypes as if they were the same. But not always... sometimes my registered cast functions would be called.

Through trial and error, I've found that if I choose an unused type code for each dtype, coercion seems to work as I expect it to (no coercion unless I've provided a cast). The kind doesn't seem to matter.

I couldn't find any guidance in the docs for how to choose these values. Apologies if I've overlooked something. Could someone please advise me?

More widely, is there some global registry of these codes? Is the number of NumPy dtypes limited to the number of (UTF-8-encodable) chars? It seems like common practice to use dtype.kind in user code. If I use one or more for my custom dtypes, is there any mechanism to ensure they do not collide with others'? Are there any other semantics for either field I should take into account?

Thanks,
Alex
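For context, the two fields in question are set on the descriptor that is passed to PyArray_RegisterDataType. A registration along the lines of the public quaternion/rational extensions looks roughly like the sketch below; all mytime_* names and codes are illustrative, not Ora's actual choices.

    /* Illustrative sketch of registering one custom dtype with the legacy
     * C API, along the lines of the quaternion/rational extensions.
     * All mytime_* names and codes are made up for this example. */
    #include <Python.h>
    #include <stddef.h>
    #include <numpy/arrayobject.h>

    typedef struct { npy_int64 ticks; } mytime;          /* element layout */
    typedef struct { char c; mytime t; } mytime_align;   /* for offsetof   */

    /* Python scalar type for the dtype (definition not shown here). */
    extern PyTypeObject MyTimeScalar_Type;

    /* getitem/setitem/copyswap/nonzero/... go in here (not shown). */
    static PyArray_ArrFuncs mytime_arrfuncs;

    static PyArray_Descr mytime_descr = {
        PyObject_HEAD_INIT(NULL)
        &MyTimeScalar_Type,        /* typeobj                          */
        'V',                       /* kind: one field discussed above  */
        'V',                       /* type: the other                  */
        '=',                       /* byteorder                        */
        0,                         /* flags (e.g. NPY_NEEDS_PYAPI)     */
        0,                         /* type_num: assigned on register   */
        sizeof(mytime),            /* elsize                           */
        offsetof(mytime_align, t), /* alignment                        */
        NULL,                      /* subarray                         */
        NULL,                      /* fields                           */
        NULL,                      /* names                            */
        &mytime_arrfuncs,          /* f                                */
    };

    /* Called from the module init function, after import_array(). */
    static int
    register_mytime_dtype(void)
    {
        PyArray_InitArrFuncs(&mytime_arrfuncs);
        /* ... fill in mytime_arrfuncs.getitem, .setitem, .copyswap, ... */

        /* Returns the new type number (>= NPY_USERDEF), or -1 on error. */
        return PyArray_RegisterDataType(&mytime_descr);
    }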
On May 5, 2019, at 10:58, Alex Samuel <alex@alexsamuel.net> wrote:
Through trial and error, I've found that if I choose an unused type code for each dtype, coercion seems to work as I expect it to (no coercion unless I've provided a cast). The kind doesn't seem to matter.
Apologies, a correction: I mixed up kind and type above. I meant that I've found I need to choose distinct kinds for the coercion rules to treat my dtypes as distinct, rather than the type.
Hi Alex,

On Sun, 2019-05-05 at 11:03 -0400, Alex Samuel wrote:
On May 5, 2019, at 10:58, Alex Samuel <alex@alexsamuel.net> wrote:
Through trial and error, I've found that if I choose an unused type code for each dtype, coercion seems to work as I expect it to (no coercion unless I've provided a cast). The kind doesn't seem to matter.
Apologies, a correction: I mixed up kind and type above. I meant that I've found I need to choose distinct kinds for the coercion rules to treat my dtypes as distinct, rather than the type.
It is cool to hear about interest in custom dtypes.

NumPy has the concept of "same-kind" casting, which may be what bites you here? So you have unsafe casting, but because you pick the same "kind", NumPy thinks it is OK to do it in ufuncs? There may also be issues surrounding 0-D arrays casting differently.

I honestly do not think there is any way to ensure you do not collide with other kinds right now, but I will check more closely tomorrow. I am currently not even quite sure how the type code really interacts when we have usertypes, and I am a bit surprised about what you describe.

We are now starting the process of trying to improve the situation with creating custom dtypes. There will actually be discussions about this at the end of next week (in Berkeley). But in any case I would be very interested in your specific use case and needs, and hopefully we can also help you on your end with the current situation. We can discuss on the list, or get in contact privately.

Best Regards,

Sebastian
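If it helps with debugging, one can ask NumPy directly how it classifies casts between two of the registered descriptors from the C side. A small sketch; descr_a and descr_b are placeholders for two of the custom descriptors:

    /* Sketch: ask NumPy how it classifies casting between two (already
     * registered) custom descriptors; descr_a/descr_b are placeholders. */
    #include <Python.h>
    #include <stdio.h>
    #include <numpy/arrayobject.h>

    static void
    probe_casting(PyArray_Descr *descr_a, PyArray_Descr *descr_b)
    {
        printf("equivalent:           %d\n",
               (int)PyArray_EquivTypes(descr_a, descr_b));
        printf("castable (safe):      %d\n",
               (int)PyArray_CanCastTypeTo(descr_a, descr_b, NPY_SAFE_CASTING));
        printf("castable (same_kind): %d\n",
               (int)PyArray_CanCastTypeTo(descr_a, descr_b, NPY_SAME_KIND_CASTING));
        printf("castable (unsafe):    %d\n",
               (int)PyArray_CanCastTypeTo(descr_a, descr_b, NPY_UNSAFE_CASTING));
    }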
We are now starting the process of trying to improve the situation with creating custom dtypes. There will actually be discussions about this at the end of next week (in Berkeley). But in any case I would be very interested in your specific use case and needs, and hopefully we can also help you on your end with the current situation. We can discuss on the list, or get in contact privately.
Unfortunately, I'm in NYC, but I'd be happy to participate however I can, whether it is to describe my use case, or help writing docs, or just chat.

Here's some info about my project: Ora (https://github.com/alexhsamuel/ora/) is a new date/time implementation. The intention is to provide types with a ticks-since-epoch representation (rather than YMD, HMS) with full functionality for both standalone scalar (i.e. no NumPy) and ndarray use cases. Essentially, the convenience of datetime, with the performance of datetime64, and much of dateutil rolled in. I've also experimented with a number of other matters, including variable width/precision/range types. As a result I provide various time, date, and time-of-day types, for instance 32-, 64-, and 128-bit time types, and each has a corresponding dtype and complete NumPy support. It's possible to adjust this set of types, if you are willing to recompile (C++). That's why I'm interested in how dtypes are managed globally.

Ora has a lot of functionality that works well, and performance is good, though it's so far a solo project and there are still lots of rough edges / missing features / bugs. I'd love to get feedback from people who work with dates and times a lot, either scalar or vectorized.

My wish list for NumPy's dtype support is:

- better docs on writing dtypes (though they are not bad)
- ability to use a scalar type that doesn't derive from a NumPy base type, so that the scalar type can be used without importing NumPy
- clear management for dtypes

Please let me know how best I could participate or help.

Regards,
Alex
OK, I looked into the code, so here is a small followup.

On Sun, 2019-05-05 at 10:58 -0400, Alex Samuel wrote:
Hi,
I'm working on building a number of related custom dtypes, and I'm not sure how to set the type and kind fields in PyArray_Descr. I tried using type='V' and choosing a single unused kind for all my dtypes; this mostly worked, except I found that coercions would sometimes treat values of two different dtypes as if they were the same. But not always... sometimes my registered cast functions would be called.
The reason is that when the "kind" and "itemsize" and "byte order" are identical, the numpy code decides that data types can be cast (because they are equivalent). So basically, the "kind" must not be equal unless the "type"/dtype only differs in precision or similar. (The relevant code is in multiarraymodule.c in PyArray_EquivTypes)
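In condensed form, that rule behaves roughly like the following; this is a paraphrase of the logic, not the verbatim NumPy source, and the extra checks for void/datetime descriptors are omitted:

    /* Paraphrase of the equivalence rule described above; not the verbatim
     * NumPy source.  Flexible types (void, datetime, ...) get extra checks. */
    static int
    equiv_types_sketch(const PyArray_Descr *a, const PyArray_Descr *b)
    {
        if (a == b) {
            return 1;
        }
        if (a->elsize != b->elsize) {
            return 0;                /* different itemsize   -> not equivalent */
        }
        if (PyArray_ISNBO(a->byteorder) != PyArray_ISNBO(b->byteorder)) {
            return 0;                /* different byte order -> not equivalent */
        }
        /* ... special cases for void/datetime omitted ... */
        return a->kind == b->kind;   /* same kind -> treated as equivalent */
    }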
Through trial and error, I've found that if I choose an unused type code for each dtype, coercion seems to work as I expect it to (no coercion unless I've provided a cast). The kind doesn't seem to matter.
I couldn't find any guidance in the docs for how to choose these values. Apologies if I've overlooked something. Could someone please advise me?
Frankly, I do not think there is any, because nobody ever created many types (there are only quaternions and rationals publicly available).
More widely, is there some global registry of these codes? Is the number of NumPy dtypes limited to the number of (UTF-8-encodable) chars? It seems like common practice to use dtype.kind in user code. If I use one or more for my custom dtypes, is there any mechanism to ensure they do not collide with others'? Are there any other semantics for either field I should take into account?
I have checked the code, and no, there appears to be no such thing currently. I suppose (on the C side) you could find all types by using their type number and then asking them.

dtype.kind is indeed used a lot, mostly to decide that a type is e.g. an integer. My best guess right now is that the rule you saw above is the only thing you have to take into account.

Best,

Sebastian
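A rough, untested sketch of that "ask each type number" idea from the C side; it assumes PyArray_DescrFromType returns NULL (with an error set) once it is past the last registered user type number:

    /* Untested sketch: list the kind/type codes of every registered user
     * dtype by probing type numbers upward from NPY_USERDEF. */
    #include <Python.h>
    #include <stdio.h>
    #include <numpy/arrayobject.h>

    static void
    dump_user_dtypes(void)
    {
        int tn;
        for (tn = NPY_USERDEF; ; tn++) {
            PyArray_Descr *d = PyArray_DescrFromType(tn);
            if (d == NULL) {
                PyErr_Clear();       /* nothing registered under this number */
                break;
            }
            printf("type_num=%d  kind='%c'  type='%c'  elsize=%d\n",
                   tn, d->kind, d->type, d->elsize);
            Py_DECREF(d);
        }
    }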
Thanks very much for looking into this!
The reason is that when the "kind" and "itemsize" and "byte order" are identical, the numpy code decides that data types can be cast (because they are equivalent). So basically, the "kind" must not be equal unless the "type"/dtype only differs in precision or similar.
(The relevant code is in multiarraymodule.c in PyArray_EquivTypes)
That makes sense, and explains why the cast-less coercion takes place for some type pairs and not for others.
Frankly, I do not think there is any, because nobody ever created many types (there are only quaternions and rationals publicly available).
OK. I'm a bit surprised to hear this, as the API for adding dtypes is actually rather straightforward! For now, then, I will stick with my current scheme of assigning successive kind values to my dtypes, and hope for the best when running with other extension dtypes (which, it seems, may be unlikely).