[Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

Matti Picus matti.picus at gmail.com
Tue Mar 24 07:12:10 EDT 2020


On 24/3/20 11:48 am, Francesc Alted wrote:
>
> What I am trying to say is that NumPy should be rather agnostic about 
> providing data types beyond the relatively simple set that it already 
> supports.  I am suggesting that focusing on providing a way to allow 
> the storage (not only in-memory, but also persisted arrays via 
> .npy/.npz files) of user-defined data types (or any other kind of 
> metadata), and letting 3rd party libraries use this machinery to 
> serialize/deserialize them, might be a better use of resources.
>
> ...
> Cheers,
> Francesc
>
I agree that the goal is to enable user-defined data types, and even to 
make creating them from Python possible (with some caveats about 
performance). But I think this should be done in steps, and as the 
subject line says, this is the first step. There are many scary details 
to work out around promotion and casting, what to do when the output 
might overflow, how to mark missing values, and more. The question at 
hand is, as I understand it, one of finding the right way to create a 
data type object that will enable exactly what you propose. I think 
this is the correct path, as most large refactor-in-one-step efforts I 
have seen leave both the old code and the new code in an unusable state 
for years until the bugs are worked out.
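
To make the promotion, casting, and overflow questions concrete, here is 
a minimal sketch of how the existing built-in integer types behave. It 
only uses the built-in dtypes and is an illustration of current 
behaviour, not a statement about what a new datatype system would do.

```python
import numpy as np

# Promotion: the result dtype has to be able to hold both inputs.
# int8 and uint8 have no common 8-bit type, so NumPy picks int16.
promoted = np.result_type(np.int8, np.uint8)

# Casting: whether a conversion is allowed depends on the casting rule.
safe = np.can_cast(np.int64, np.int32, casting="safe")            # may lose data
same_kind = np.can_cast(np.int64, np.int32, casting="same_kind")  # int -> int

# Overflow: fixed-width integer array arithmetic wraps around silently.
a = np.array([127], dtype=np.int8)
wrapped = a + a  # int8 + int8 stays int8, so 254 does not fit

print(promoted, safe, same_kind, wrapped)
```

A user-defined data type would need its own answer to each of these 
three questions.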


As for serialization protocols: I think that is a separate issue. We 
already have the npy/npz format, the PEP 3118 buffer protocol, and the 
pickle protocol 5 out-of-band buffers. Each of them handles user-defined 
data types in a different way, with differing amounts of success.
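
As a rough illustration of the three mechanisms, the sketch below sends 
one array through each of them. It assumes a structured dtype as a 
stand-in for a user-defined data type (the field names "id" and "value" 
are made up for the example). Data types registered from C may well 
behave differently.

```python
import io
import pickle

import numpy as np

# Hypothetical record type standing in for a user-defined data type.
dt = np.dtype([("id", "<i4"), ("value", "<f8")])
arr = np.zeros(3, dtype=dt)
arr["id"] = [1, 2, 3]
arr["value"] = [0.5, 1.5, 2.5]

# 1. npy format: the dtype description is stored in the file header.
stream = io.BytesIO()
np.save(stream, arr)
stream.seek(0)
from_npy = np.load(stream)

# 2. PEP 3118 buffer protocol: the dtype is exported as a format string.
view = memoryview(arr)
buffer_format = view.format
buffer_itemsize = view.itemsize

# 3. pickle protocol 5: the array data can travel as out-of-band buffers.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
from_pickle = pickle.loads(payload, buffers=buffers)

print(from_npy.dtype, buffer_format, from_pickle.dtype)
```

Each path carries the dtype description in its own way: in the npy 
header, in the buffer format string, and in the pickled dtype object.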


Matti


