[Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type

Stephan Hoyer shoyer at gmail.com
Tue Sep 23 02:42:13 EDT 2014

On Sun, Sep 21, 2014 at 8:31 PM, Nathaniel Smith <njs at pobox.com> wrote:

> For cases where people genuinely want to implement new array-like
> types (e.g. DataFrame or scipy.sparse) then numpy provides a fair
> amount of support for this already (e.g., the various hooks that allow
> things like np.asarray(mydf) or np.sin(mydf) to work), and we're
> working on adding more over time (e.g., __numpy_ufunc__).

Agreed, numpy does a great job of this. It has been a surprising pleasure
to integrate with numpy for my custom array-like types in xray.
__numpy_ufunc__ will let us add a few more neat tricks.
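
To make the hooks mentioned above concrete, here is a minimal sketch of an array-like object that works with np.asarray and np.sin by defining __array__. The class name LabeledArray and its attributes are hypothetical, chosen only for illustration; this is not how xray implements its types.

```python
import numpy as np


class LabeledArray:
    """Hypothetical array-like: an ndarray of values plus a name."""

    def __init__(self, values, name):
        self.values = np.asarray(values)
        self.name = name

    def __array__(self, dtype=None, copy=None):
        # np.asarray(obj) and ufuncs call this hook to get a plain ndarray.
        if dtype is None:
            return self.values
        return self.values.astype(dtype)


arr = LabeledArray([1.0, 2.0, 3.0], name="x")
plain = np.asarray(arr)   # plain ndarray holding the wrapped values
sines = np.sin(arr)       # the ufunc converts the input via __array__
```

Note that the result of np.sin here is a plain ndarray, so the name is lost; preserving such metadata through ufuncs is the kind of thing a hook like __numpy_ufunc__ is meant to address.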

> My feeling though is that in most of the cases you mention,
> implementing a new array-like type is huge overkill. ndarray's
> interface is vast and reimplementing even 90% of it is a huge effort.
> For most of the cases that people seem to run into in practice, the
> solution is to enhance numpy's dtype interface so that it's possible
> for mere mortals to implement new dtypes, e.g. by just subclassing
> np.dtype. This is totally doable and would enable a ton of
> awesomeness, but it requires someone with the time to sit down and
> work on it, and no-one has volunteered yet. Unfortunately it does
> require hacking on C code though.

Something to allow mere mortals such as myself to implement new dtypes
sounds wonderful!

Would it be useful to prototype something like this in pure Python? That
sounds like a task that I could be up for. Like I said, I expect a (mostly)
pure Python solution, at least for categorical and datetime, would be
more maintainable and even performant enough for use in pandas (given that
this is basically the current approach), as long as the bottlenecks are
dealt with appropriately. Anyone else interested in hacking on this with me?
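
As a rough idea of what a pure Python categorical could build on, the sketch below stores integer codes alongside a table of categories, using only np.unique and fancy indexing. The function names encode_categorical and decode_categorical are hypothetical, and this is not a description of how pandas implements its categorical type.

```python
import numpy as np


def encode_categorical(values):
    """Return (categories, codes): sorted unique values and integer codes."""
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return categories, codes


def decode_categorical(categories, codes):
    """Recover the original values by indexing the category table."""
    return categories[codes]


categories, codes = encode_categorical(["b", "a", "b", "c"])
roundtrip = decode_categorical(categories, codes)
```

The heavy lifting (sorting, lookup) stays inside numpy, which is where the bottlenecks for such a type would most likely need attention.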

For what it's worth, I am not convinced that it is that terrible to
reimplement most of the ndarray interface. As long as your object looks
pretty much like an ndarray with a custom dtype, it should be quite
straightforward to wrap the underlying array's methods/properties. So I'm
not too scared of that option, although I agree that it is a complete waste
to do it again and again.
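
The wrapping approach described above might look roughly like the following sketch, assuming the object holds an ordinary ndarray of integer codes and forwards a few properties and methods to it. CategoricalArray is a hypothetical name, and only a small part of the ndarray interface is shown.

```python
import numpy as np


class CategoricalArray:
    """Hypothetical ndarray-like wrapper: integer codes plus categories."""

    def __init__(self, codes, categories):
        self.codes = np.asarray(codes)
        self.categories = np.asarray(categories)

    # Properties forwarded directly to the underlying codes array.
    @property
    def shape(self):
        return self.codes.shape

    @property
    def ndim(self):
        return self.codes.ndim

    def __len__(self):
        return len(self.codes)

    def __getitem__(self, key):
        result = self.codes[key]
        if isinstance(result, np.ndarray):
            # Array-valued indexing returns a new wrapper around the codes.
            return CategoricalArray(result, self.categories)
        # Scalar indexing returns the category value itself.
        return self.categories[result]

    def reshape(self, *shape):
        # Delegate to ndarray.reshape and re-wrap the result.
        return CategoricalArray(self.codes.reshape(*shape), self.categories)

    def __array__(self, dtype=None, copy=None):
        # Materialize the category values as a plain ndarray.
        values = self.categories[self.codes]
        return values if dtype is None else values.astype(dtype)


cat = CategoricalArray([1, 0, 1, 2], ["a", "b", "c"])
```

Each forwarded method is only a line or two, but the full ndarray interface has a great many of them, which is the repeated effort both sides of this thread would like to avoid.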

Nathaniel and Jeff -- thank you so much for the detailed replies.
