On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane email@example.com wrote:
On 01/26/2018 06:01 PM, firstname.lastname@example.org wrote:
I thought recarrays were pretty cool back in the day, but pandas is a much better option. So I pretty much only use structured arrays for data exchange with C code....
My impression is that this turns into an issue of deprecating recarrays and supporting recfunctions.
*should* we have any dataframe-like functionality in numpy?
We get requests every once in a while about how to sort rows, or about adding a "groupby" function. I myself have used recarrays in a dataframe-like way, when I wanted a quick multiple-array object that supported numpy indexing. So there is some demand to have minimal "dataframe-like" behavior in numpy itself.
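For reference, the row sorting and "groupby" requests mentioned above can already be approximated with plain structured arrays. This is only a rough sketch: the field names and data are made up for illustration, and the "groupby" is just a sum built from `np.unique` and `np.bincount`, not a general solution.

```python
import numpy as np

# Hypothetical example data: field names and values are invented for illustration.
data = np.array(
    [("b", 2.0), ("a", 1.0), ("b", 4.0), ("a", 3.0)],
    dtype=[("key", "U1"), ("value", "f8")],
)

# Sort rows by the "key" field, breaking ties with "value".
sorted_rows = np.sort(data, order=["key", "value"])

# A minimal "groupby sum": label each row with its group index,
# then accumulate the values per label.
keys, inverse = np.unique(data["key"], return_inverse=True)
sums = np.bincount(inverse, weights=data["value"])
```

Here `keys` holds the distinct group labels and `sums` the per-group totals in the same order. Anything beyond simple reductions like this is where pandas becomes the better tool.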
recarrays play part of this role currently, though imperfectly due to padding and cache issues. I think I'm comfortable with supporting some minor use of structured/recarrays as dataframe-like, with a warning in docs that the user should really look at pandas/xarray, and that structured arrays are primarily for data exchange.
Well, I think we should either:
1) Deprecate recarrays -- i.e. explicitly not support DataFrame-like functionality in numpy, and maintain only the data-exchange functionality.
2) Properly support it -- which doesn't mean re-implementing pandas or xarray, but would mean addressing any bug-like issues, such as not dealing properly with padding.
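To make the padding point concrete, here is a small sketch of how padding shows up in a structured dtype. The field names are hypothetical, and the exact aligned size depends on the platform's C alignment rules, so the code only assumes the aligned layout is at least as large as the packed one.

```python
import numpy as np

fields = [("flag", "u1"), ("x", "f8")]  # hypothetical field names

# Default layout: fields are packed back to back, no padding bytes.
packed = np.dtype(fields)

# align=True mimics a C struct: padding is inserted so that each field
# starts at a multiple of its alignment (platform dependent).
aligned = np.dtype(fields, align=True)

# Byte offset of "x" in each layout; fields[name] is (dtype, offset).
packed_offset = packed.fields["x"][1]
aligned_offset = aligned.fields["x"][1]

# Number of padding bytes per record in the aligned layout.
padding_bytes = aligned.itemsize - packed.itemsize
```

The padding bytes carry no data, but code that views, copies, or compares records has to decide what to do with them, which is where the bug-like issues tend to come from.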
Personally, I don't need/want it enough to contribute, but if someone does, great.
This reminds me a bit of the old numpy.matrix issue -- it was ALMOST there, but not quite, with issues, and there was essentially no overlap between the people that wanted it and the people that had the time and skills to really make it work.
(If we want to dream, maybe one day we should make a minimal multiple-array container class. I imagine it would look pretty similar to recarray, but stored as a set of arrays instead of a structured array. But maybe recarrays are good enough, and let's not reimplement pandas either.)
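As a rough illustration of what such a container might look like, here is a sketch under my own assumptions. `ArrayBundle` is a made-up name, not anything that exists in numpy: each column is a separate array, a string key returns a column, and any other key is applied to every column.

```python
import numpy as np


class ArrayBundle:
    """Hypothetical container: named, equal-length arrays stored separately."""

    def __init__(self, **columns):
        # Each column becomes its own array (at least 1-D), with no shared
        # record layout and therefore no padding between fields.
        self._cols = {name: np.atleast_1d(col) for name, col in columns.items()}
        if len({len(col) for col in self._cols.values()}) > 1:
            raise ValueError("all columns must have the same length")

    def __getitem__(self, key):
        # A string selects one column; anything else (slice, boolean mask,
        # index array, integer) is applied to every column.
        if isinstance(key, str):
            return self._cols[key]
        return ArrayBundle(**{name: col[key] for name, col in self._cols.items()})

    def __len__(self):
        return len(next(iter(self._cols.values()))) if self._cols else 0


# Usage: sort all columns by "x", then filter rows with a boolean mask.
t = ArrayBundle(x=[3, 1, 2], y=[0.3, 0.1, 0.2])
by_x = t[np.argsort(t["x"])]
big = t[t["x"] > 1]
```

Storing columns separately keeps each one contiguous, at the cost of losing the single-buffer layout that makes structured arrays convenient for data exchange with C code.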
Exactly -- we really don't need to re-implement pandas....
(except its CSV reading capability :-) )