[Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type

Charles R Harris charlesr.harris at gmail.com
Sun Sep 21 20:13:39 EDT 2014


On Sun, Sep 21, 2014 at 5:50 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> pandas has some hacks to support custom types of data that numpy
> can't handle well enough or at all. Examples include datetime and
> Categorical [1], and others like GeoArray [2] that haven't made it into
> pandas yet.
>
> Most of these look like numpy arrays but with custom dtypes and type
> specific methods/properties. But clearly nobody is particularly excited
> about writing the C necessary to implement custom dtypes [3]. Nor do
> we need the ndarray ABI.
>
> In many cases, writing C may not actually even be necessary for
> performance reasons, e.g., categorical can be fast enough just by wrapping
> an integer ndarray for the internal storage and using vectorized
> operations. And even if it is necessary, I think we'd all rather write
> Cython than C.
>
> It's great for pandas to write its own ndarray-like wrappers (*not*
> subclasses) that work with pandas, but it's a shame that there isn't a
> standard interface like the ndarray to make these arrays useable for the
> rest of the scientific Python ecosystem. For example, pandas has loads of
> fixes for np.datetime64, but nobody seems to be up for porting them to
> numpy (I doubt it would be easy).
>
> I know these sorts of concerns are not new, but I wish I had a sense of
> what the solution looks like. Is anyone actively working on these issues?
> Does the fix belong in numpy, pandas, blaze or a new project? I'd love to
> get a sense of where things stand and how I could help -- without writing
> any C :).
>
>
I haven't thought much about this myself, but others (Nathaniel?) have, and
it would be good to explore the topic and maybe put together some
examples/templates to make this approach easier. Input from someone with
some experience would be *much* appreciated.
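
As a possible starting point for such a template, here is a minimal sketch of
the wrapper approach described in the quoted message for the categorical case:
an integer ndarray holds the codes, and comparisons are vectorized over those
codes. The class name CategoricalArray and its attributes are hypothetical and
chosen only for illustration; this is not the pandas implementation or any
existing numpy API, only the `__array__` hook is a real numpy convention.

```python
import numpy as np


class CategoricalArray:
    """Hypothetical ndarray-like wrapper (deliberately not an ndarray subclass)."""

    def __init__(self, values):
        # np.unique returns the sorted distinct values and, with
        # return_inverse=True, the integer code of each input element.
        flat = np.asarray(values).ravel()
        self.categories, codes = np.unique(flat, return_inverse=True)
        self.codes = codes.reshape(-1).astype(np.intp)

    @property
    def shape(self):
        return self.codes.shape

    def __len__(self):
        return len(self.codes)

    def __getitem__(self, key):
        result = self.codes[key]
        if np.ndim(result) == 0:
            # Scalar index: return the category value itself.
            return self.categories[result]
        # Slice or fancy index: wrap the selected codes, reusing the categories.
        new = object.__new__(CategoricalArray)
        new.categories = self.categories
        new.codes = result
        return new

    def __eq__(self, other):
        # Look up the code of `other` once, then compare integer codes.
        matches = np.flatnonzero(self.categories == other)
        if matches.size == 0:
            return np.zeros(len(self), dtype=bool)
        return self.codes == matches[0]

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(obj) materialize the values as a plain ndarray.
        out = self.categories[self.codes]
        return out if dtype is None else out.astype(dtype)


arr = CategoricalArray(["b", "a", "b", "c"])
mask = arr == "b"          # boolean ndarray computed from the integer codes
dense = np.asarray(arr)    # plain ndarray of the original values
```

The sketch covers only indexing, equality and conversion. Type-specific
methods and arithmetic are left out, and nothing here addresses how other
libraries would discover or agree on such an interface.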

The datetime problem persists and I've been thinking it would be nice to replace
the current implementation with something simpler that can be stolen from
elsewhere. It would be nice to hear how someone else dealt with the problem.

Chuck