On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
I think that there's a lot of confusion going around about recarrays vs structured arrays.

[`recarray`](https://github.com/numpy/numpy/blob/v1.13.0/numpy/core/records.py) is a wrapper around structured arrays that provides:
* Attribute access to fields as `arr.field` in addition to the normal `arr['field']`
* Automatic datatype-guessing for nested lists of tuples (which needs a little work, but seems like a justifiable feature)
* An undocumented `field` method that behaves like the 1.14 indexing behavior (!)
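
To make the three points above concrete, here is a minimal sketch. The field names `x` and `y` and the sample values are made up for illustration, and the dtype that `np.rec.fromrecords` guesses for integers can vary by platform.

```python
import numpy as np

# A plain structured array: fields are reached by name through indexing.
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('x', 'i4'), ('y', 'f8')])
print(arr['x'])

# Viewing the same memory as a recarray adds attribute access to fields.
rec = arr.view(np.recarray)
print(rec.x)

# The `field` method is another way to reach a field, by name or position.
print(rec.field('y'))

# np.rec.fromrecords guesses the field dtypes from a list of tuples.
guessed = np.rec.fromrecords([(1, 2.5), (2, 3.5)], names='x,y')
print(guessed.dtype)
```

Since `rec` is only a view, writing through `rec.x` also changes `arr['x']`.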

Meanwhile, `recfunctions` is a collection of functions that work on normal structured arrays - so it is misleadingly named.
The only link to recarrays is that most of the functions have an `asrecarray` parameter which applies `.view(recarray)` to the result.
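
As a small illustration of that point, the sketch below calls `append_fields` from `numpy.lib.recfunctions` on a plain structured array, once with and once without `asrecarray=True`. The field names and values are made up, and `usemask=False` is passed only to keep masked arrays out of the picture.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('x', 'i4'), ('y', 'f8')])
extra = np.array([10, 20])

# The input is an ordinary structured array, and so is the default result.
plain = rfn.append_fields(arr, 'z', extra, usemask=False)
print(type(plain), plain.dtype.names)

# asrecarray=True only changes the type of what comes back.
as_rec = rfn.append_fields(arr, 'z', extra, usemask=False, asrecarray=True)
print(type(as_rec), as_rec.z)
```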

deprecate recarrays

Given how thin an abstraction they are over structured arrays, I don't think you mean this.
Are you advocating for deprecating structured arrays entirely, or just deprecating recfunctions?

First, statsmodels is in the pandas camp for dataframes, so I don't have any vested interest in recarrays/structured dtypes anymore.

What I meant was that structured dtypes with implicit (hidden?) padding become unintuitive for the recarray/dataframe usecase. (At least I won't try to update my intuition about having extra things in there that are not specified by the main structured dtype.) Also the dataframe_like usage of structured dtypes doesn't seem to be much under consideration anymore.
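
A minimal sketch of the hidden padding in question, assuming NumPy 1.16 or later (where multi-field indexing returns a view) and made-up field names. `repack_fields` is the `recfunctions` helper for dropping the padding again.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8'), ('c', 'i4')])

# Assumption: NumPy >= 1.16. Selecting two fields gives a view that keeps the
# original field offsets, so the bytes of the dropped field 'b' remain in the
# dtype as unnamed padding.
sub = arr[['a', 'c']]
print(sub.dtype.names)
print(arr.dtype.itemsize, sub.dtype.itemsize)

# repack_fields copies the data into a dtype without that padding.
packed = rfn.repack_fields(sub)
print(packed.dtype.itemsize)
```

The view names only `a` and `c`, yet each element still occupies as many bytes as an element of the full array.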

So, my **impression** is that the recent changes make the recarray/dataframe usecase for structured dtypes more difficult.

Given that there are pandas, xarray, dask and more, numpy might as well drop any pretense of supporting dataframe_likes. Or, adjust the recfunctions so we can still work dataframe_like with structured dtypes/recarrays/recfunctions.




On Mon, 29 Jan 2018 at 09:39 Chris Barker <chris.barker@noaa.gov> wrote:
On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
On 01/26/2018 06:01 PM, josef.pktd@gmail.com wrote:
    I thought recarrays were pretty cool back in the day, but pandas is
    a much better option.

    So I pretty much only use structured arrays for data exchange with C

My impression is that this turns into a "deprecate recarrays and support recfunctions" issue.

*should* we have any dataframe-like functionality in numpy?

We get requests every once in a while about how to sort rows, or about adding a "groupby" function. I myself have used recarrays in a dataframe-like way, when I wanted a quick multiple-array object that supported numpy indexing. So there is some demand to have minimal "dataframe-like" behavior in numpy itself.
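
For reference, both requests can already be approximated with existing NumPy calls. The sketch below is one possible approach, not an established recipe; the table contents and the field names `key` and `value` are made up.

```python
import numpy as np

table = np.array(
    [('b', 3.0), ('a', 1.0), ('b', 5.0), ('a', 2.0)],
    dtype=[('key', 'U1'), ('value', 'f8')],
)

# Sort rows by one or more fields.
by_key = np.sort(table, order=['key', 'value'])
print(by_key)

# A minimal "groupby sum": label each row with its group index,
# then accumulate the values per label.
keys, inverse = np.unique(table['key'], return_inverse=True)
sums = np.bincount(inverse, weights=table['value'])
print(dict(zip(keys.tolist(), sums.tolist())))
```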

recarrays play part of this role currently, though imperfectly due to padding and cache issues. I think I'm comfortable with supporting some minor use of structured/recarrays as dataframe-like, with a warning in docs that the user should really look at pandas/xarray, and that structured arrays are primarily for data exchange.

Well, I think we should either:

* deprecate recarrays -- i.e. explicitly not support DataFrame-like functionality in numpy, keeping only the data-exchange functionality as maintained.

* Properly support it -- which doesn't mean re-implementing Pandas or xarray, but it would mean addressing any bug-like issues like not dealing properly with padding.

Personally, I don't need/want it enough to contribute, but if someone does, great.

This reminds me a bit of the old numpy.matrix issue -- it was ALMOST there, but not quite, with issues, and there was essentially no overlap between the people that wanted it and the people that had the time and skills to really make it work.

(If we want to dream, maybe one day we should make a minimal multiple-array container class. I imagine it would look pretty similar to recarray, but stored as a set of arrays instead of a structured array. But maybe recarrays are good enough, and let's not reimplement pandas either.)
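
Purely as an illustration of what such a container might look like: the class below is hypothetical (the name `ArrayBundle` and its whole interface are invented here, nothing like it exists in numpy). It stores each column as its own array and forwards numpy-style indexing to every column.

```python
import numpy as np


class ArrayBundle:
    """Hypothetical container of named, equal-length arrays stored separately."""

    def __init__(self, **columns):
        arrays = {name: np.atleast_1d(col) for name, col in columns.items()}
        if len({a.shape[0] for a in arrays.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = arrays

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self.__dict__['_columns'][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, index):
        # A string selects one column; anything else indexes every column.
        if isinstance(index, str):
            return self._columns[index]
        return ArrayBundle(**{n: a[index] for n, a in self._columns.items()})

    def __len__(self):
        return next((a.shape[0] for a in self._columns.values()), 0)


bundle = ArrayBundle(x=[1, 2, 3], y=[1.5, 2.5, 3.5])
subset = bundle[bundle.x > 1]
print(len(subset), subset.y)
```

Unlike a recarray, selecting a subset of columns here would involve no padding, since no column shares a buffer with another.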

Exactly -- we really don't need to re-implement Pandas....

(except its CSV reading capability :-) )



Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

NumPy-Discussion mailing list
