
Hi folks,

Apologies if this is documented somewhere, but I haven't been able to find it. I've read through NEP-42 [1] and skimmed NEP-41 [2], but I'm not sure:

(a) at what point of implementation we are, and
(b) if it's pretty much done, *how* to define a custom categorical dtype.

In my use case, I'd need a dtype that is implemented as some int scalar where only certain values are allowed, i.e. the NumPy equivalent of:

    from enum import Enum

    class Label(Enum):
        CAR = 1
        DOG = 45
        NULL = 255

But with the ability to specify that I only need a uint8 in this case.

Is that possible today using Python (no C/Cython) and if so, is there some documentation or user example or StackOverflow answer that shows how to do this? If not, is it a design goal of the NEPs to allow such a thing? (I can be patient 😂)

Thank you!

Juan.

[1]: https://numpy.org/neps/nep-0042-new-dtypes.html
[2]: https://numpy.org/neps/nep-0041-improved-dtype-support.html

On Mon, 2023-05-29 at 10:55 +1000, Juan Nunez-Iglesias wrote:
The NEP is pretty far along and we have some examples of use here:

https://github.com/numpy/numpy-user-dtypes

There are still kinks to be ironed out though, and nobody has tried "categorical" type functionality yet.

However, without C/Cython it is not possible at this time. What we need is a Categorical or Enum DType implemented in C, which would then allow creating the specific `LabelDType` in Python. [1]

On the other hand, writing that single C implementation for a minimal `IntEnum` DType factory is likely quite reasonably scoped. (As a prototype implementation, but I expect adapting to a final version should be smooth.)

- Sebastian

[1] Maybe as a DType factory in C to create arbitrary `IntEnum`-likes, maybe as a parametric DType. I suspect the first is the right way, but it may be tedious or even very hard right now; that is a kink that needs ironing out eventually. Python 3.12 has some fixes around metaclass instantiation in C (with backcompat hacks) which hopefully make this less of a drain on sanity.
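
Until such a DType exists, the closest pure-Python approximation is probably to keep the data in a plain uint8 array and validate it against an `IntEnum` on the way in. This is only a stopgap sketch, not a real DType and not what NEP 42 provides: the array carries no knowledge of the enum, so later writes are not checked. The helper name `as_label_array` is hypothetical and not part of NumPy.

```python
from enum import IntEnum

import numpy as np


class Label(IntEnum):
    CAR = 1
    DOG = 45
    NULL = 255


def as_label_array(values, enum_cls=Label, dtype=np.uint8):
    """Hypothetical helper (not a NumPy API): validate, then store compactly.

    Validation happens before the cast so that out-of-range inputs such as
    257 are rejected instead of silently wrapping around in uint8.
    """
    arr = np.asarray(values)
    allowed = np.array([member.value for member in enum_cls])
    invalid = ~np.isin(arr, allowed)
    if invalid.any():
        bad = np.unique(arr[invalid]).tolist()
        raise ValueError(f"values not in {enum_cls.__name__}: {bad}")
    return arr.astype(dtype)


labels = as_label_array([1, 45, 255, 45])

# Map stored integers back to enum members when names are needed.
names = [Label(v).name for v in labels.tolist()]
```

Here `labels` is an ordinary `uint8` array, so anything written to it afterwards (for example `labels[0] = 2`) bypasses the check; enforcing that is exactly what a C-implemented Enum DType would add.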

participants (2): Juan Nunez-Iglesias, Sebastian Berg