[Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

Francesc Alted faltet at gmail.com
Tue Mar 24 09:31:30 EDT 2020

On Tue, Mar 24, 2020 at 12:12 PM Matti Picus <matti.picus at gmail.com> wrote:

> On 24/3/20 11:48 am, Francesc Alted wrote:
> >
> > What I am trying to say is that NumPy should be rather agnostic about
> > providing data types beyond the relatively simple set that it already
> > supports.  I am suggesting that focusing on providing a way to allow
> > the storage (not only in-memory, but also persisted arrays via
> > .npy/.npz files) of user-defined data types (or any other kind of
> > metadata)  and let 3rd party libraries use this machinery to
> > serialize/deserialize them might be a better use of resources.
> >
> > ...
> > Cheers,
> > Francesc
> >
> I agree that the goal is to enable user-defined data types, and even
> make the creation of them from python possible (with some caveats about
> performance). But I think this should be done in steps, and as the
> subject line says this is the first step. There are many scary details
> to work out around the problems of promotion and casting, what to do
> when the output might overflow, how to mark missing values and more. The
> question at hand is, as I understand it, one of finding the right way to
> create a data type object that will enable exactly what you propose. I
> think this is the correct path, as most large refactor-in-one-step
> efforts I have seen leave both the old code and the new code in an
> unusable state for years until the bugs are worked out.

Thanks Matti for clarifying the goals of the NEP; having the phrase "New
Datatype System" in the title sounded scary to my ears indeed, and I share
your concerns about new code lingering in a 'beta' stage for a long time.
Before shutting up, I'll just reiterate that providing pretty shallow
machinery for allowing the integration with user-defined data types should
avoid big headaches: the simpler, the better.  But this is of course up to
the maintainers.
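
To make the "shallow machinery" idea concrete, here is a minimal sketch of
how a 3rd party library could persist a user-defined type today: the raw
values are stored under a built-in dtype and the description of the type
travels next to them as JSON inside the same .npz file.  The helper names
(save_with_metadata, load_with_metadata) and the metadata keys are
hypothetical and only for illustration; nothing like them is proposed in
the NEP.

```python
import io
import json

import numpy as np


def save_with_metadata(fileobj, values, metadata):
    """Hypothetical helper: store values plus a JSON description in one .npz."""
    # The description is saved as a 0-d unicode array, so no pickling is needed.
    np.savez(fileobj, values=values, metadata=np.array(json.dumps(metadata)))


def load_with_metadata(fileobj):
    """Hypothetical helper: read back the values and the decoded description."""
    with np.load(fileobj) as archive:
        values = archive["values"]
        metadata = json.loads(archive["metadata"].item())
    return values, metadata


# Round trip through an in-memory file; a real path works the same way.
buf = io.BytesIO()
save_with_metadata(
    buf,
    np.arange(3, dtype="int64"),
    {"kind": "timestamp", "unit": "ms"},  # made-up description of a user type
)
buf.seek(0)
values, metadata = load_with_metadata(buf)
```

In this sketch NumPy only ever sees an int64 array and a string; giving
meaning to the "kind" and "unit" keys is left entirely to the library that
wrote them.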

> As for serialization protocols: I think that is a separate issue. We
> already have the npy/npz protocol, PEP3118 buffer protocol, and the
> pickle 5 buffering protocol. Each of them handles user-defined data types
> in different ways, with differing amounts of success.

Yup, I forgot the buffer protocol and pickle 5.  Thanks for the reminder.
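
For reference, the three protocols mentioned above can be exercised on the
same array roughly as follows.  This is only a sketch: it uses a structured
dtype (the field names "x" and "y" are made up) as a stand-in for a
user-defined type, and it assumes Python 3.8 or later for pickle protocol 5.

```python
import io
import pickle

import numpy as np

# A structured dtype standing in for a user-defined type (illustrative only).
point = np.dtype([("x", "<f8"), ("y", "<f8")])
arr = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=point)

# 1. .npy format: the dtype description is written into the file header.
buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)
from_npy = np.load(buf)

# 2. PEP 3118 buffer protocol: the record layout is exposed as a format string.
view = memoryview(arr)
buffer_format = view.format
from_buffer = np.asarray(view)

# 3. pickle protocol 5: the array data can be handed over out-of-band.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
from_pickle = pickle.loads(payload, buffers=buffers)
```

Each path carries the dtype information in a different place (file header,
format string, pickled dtype object), which is one reason they can differ
in how well they cope with types NumPy does not know about.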

Francesc Alted