<div dir="ltr">pandas has some hacks to support custom types of data that numpy can't handle well enough, or at all. Examples include datetime and Categorical [1], and others like GeoArray [2] that haven't made it into pandas yet.<div><div><br></div><div>Most of these look like numpy arrays but with custom dtypes and type-specific methods/properties. But clearly nobody is particularly excited about writing the C necessary to implement custom dtypes [3]. Nor do we need the ndarray ABI.</div><div><br></div><div>In many cases, writing C may not even be necessary for performance reasons, e.g., categorical can be fast enough just by wrapping an integer ndarray for the internal storage and using vectorized operations. And even if it is necessary, I think we'd all rather write Cython than C.</div><div><br></div><div>It's great for pandas to write its own ndarray-like wrappers (*not* subclasses) that work with pandas, but it's a shame that there isn't a standard interface like the ndarray to make these arrays usable for the rest of the scientific Python ecosystem. For example, pandas has loads of fixes for np.datetime64, but nobody seems to be up for porting them to numpy (I doubt it would be easy).</div><div><br></div><div>I know these sorts of concerns are not new, but I wish I had a sense of what the solution looks like. Is anyone actively working on these issues? Does the fix belong in numpy, pandas, blaze or a new project?
I'd love to get a sense of where things stand and how I could help -- without writing any C :).</div><div><div><br></div><div>Thanks,</div><div>Stephan</div><div><br></div><div>[1] <a href="https://github.com/pydata/pandas/pull/7217">https://github.com/pydata/pandas/pull/7217</a><br></div><div>[2] <a href="https://github.com/geopandas/geopandas/issues/166">https://github.com/geopandas/geopandas/issues/166</a></div></div><div><div>[3] <a href="https://github.com/numpy/numpy-dtypes">https://github.com/numpy/numpy-dtypes</a></div></div><div><br></div></div></div>