[Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type

Stephan Hoyer shoyer at gmail.com
Sun Sep 21 19:50:12 EDT 2014

pandas has some hacks to support custom types of data that numpy can't
handle well enough or at all. Examples include datetime and Categorical
[1], and others like GeoArray [2] that haven't made it into pandas yet.

Most of these look like numpy arrays but with custom dtypes and type
specific methods/properties. But clearly nobody is particularly excited
about writing the C necessary to implement custom dtypes [3]. Nor do
we need the ndarray ABI.

In many cases, writing C may not even be necessary for performance,
e.g., categorical can be fast enough just by wrapping an integer
ndarray for the internal storage and using vectorized operations. And even
if it is necessary, I think we'd all rather write Cython than C.
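
As a rough illustration of the wrapping idea, here is a minimal sketch of a
categorical type that stores integer codes in a plain ndarray and does its
work with vectorized numpy calls. The class name SimpleCategorical and its
methods are hypothetical, made up for this example; this is not the pandas
Categorical implementation or any numpy API.

```python
import numpy as np


class SimpleCategorical:
    """Hypothetical ndarray-like wrapper (not an ndarray subclass)."""

    def __init__(self, values):
        # np.unique returns the sorted distinct labels and, for each input
        # element, the position of its label in that sorted array.
        self.categories, codes = np.unique(np.asarray(values), return_inverse=True)
        # Internal storage is just a flat integer ndarray.
        self.codes = codes.astype(np.int64).reshape(-1)

    def __len__(self):
        return len(self.codes)

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(obj) materialize the labels; fancy indexing
        # always produces a new array.
        out = self.categories[self.codes]
        return out if dtype is None else out.astype(dtype)

    def __eq__(self, label):
        # Look the label up once, then compare integers elementwise.
        matches = np.flatnonzero(self.categories == label)
        if matches.size == 0:
            return np.zeros(len(self), dtype=bool)
        return self.codes == matches[0]

    def value_counts(self):
        # np.bincount tallies the integer codes in a single pass.
        counts = np.bincount(self.codes, minlength=len(self.categories))
        return dict(zip(self.categories.tolist(), counts.tolist()))
```

With this sketch, SimpleCategorical(["a", "b", "a"]) == "a" is answered by
comparing integers rather than strings, and np.asarray() recovers the
original labels. Nothing here requires C; what it lacks is a standard way
for the rest of the ecosystem to recognize such an object as array-like.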

It's great for pandas to write its own ndarray-like wrappers (*not*
subclasses) that work with pandas, but it's a shame that there isn't a
standard interface like the ndarray to make these arrays useable for the
rest of the scientific Python ecosystem. For example, pandas has loads of
fixes for np.datetime64, but nobody seems to be up for porting them to
numpy (I doubt it would be easy).

I know these sorts of concerns are not new, but I wish I had a sense of what
the solution looks like. Is anyone actively working on these issues? Does
the fix belong in numpy, pandas, blaze or a new project? I'd love to get a
sense of where things stand and how I could help -- without writing any C.


[1] https://github.com/pydata/pandas/pull/7217
[2] https://github.com/geopandas/geopandas/issues/166
[3] https://github.com/numpy/numpy-dtypes