[Numpy-discussion] Setting custom dtypes and 1.14

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jan 29 14:10:56 EST 2018

On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser <wieser.eric+numpy at gmail.com> wrote:

> I think that there's a lot of confusion going around about recarrays vs
> structured arrays.
> [`recarray`](https://github.com/numpy/numpy/blob/v1.13.0/numpy/core/records.py)
> objects are a wrapper around structured arrays that provides:
> * Attribute access to fields as `arr.field` in addition to the normal
> `arr['field']`
> * Automatic datatype-guessing for nested lists of tuples (which needs a
> little work, but seems like a justifiable feature)
> * An undocumented `field` method that behaves like the 1.14 indexing
> behavior (!)
> Meanwhile, `recfunctions` is a collection of functions that work on normal
> structured arrays - so is misleadingly named.
> The only link to recarrays is that most of the functions have an
> `asrecarray` parameter which applies `.view(recarray)` to the result.
> > deprecate recarrays
> Given how thin an abstraction they are over structured arrays, I don't
> think you mean this.
> Are you advocating for deprecating structured arrays entirely, or just
> deprecating recfunctions?
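
For anyone who wants to see the access styles from the list above side by side, here is a minimal sketch. The field names `id` and `value` are made up for illustration; the only calls used are `ndarray.view`, normal field indexing, and the recarray `field` method mentioned above.

```python
import numpy as np

# A plain structured array with two named fields (names are illustrative).
arr = np.array([(1, 2.5), (2, 3.5)],
               dtype=[('id', 'i4'), ('value', 'f8')])

# Viewing it as a recarray adds attribute access without copying the data.
rec = arr.view(np.recarray)

by_key = arr['value']           # normal structured-array indexing
by_attr = rec.value             # recarray attribute access
by_method = rec.field('value')  # the recarray `field` method

print(by_key, by_attr, by_method)
print(np.shares_memory(arr, rec))
```

All three lookups refer to the same column, and the recarray shares its buffer with the original array, which is what makes it such a thin layer.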

First, statsmodels is in the pandas camp for dataframes, so I don't have
any vested interest in recarrays/structured dtypes anymore.

What I meant was that structured dtypes with implicit (hidden?) padding
become unintuitive for the recarray/dataframe use case. (At least I won't
try to update my intuition about having extra things in there that are not
specified by the main structured dtype.) Also, the dataframe_like usage of
structured dtypes doesn't seem to be under much consideration anymore.
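
To make "extra things in there" concrete, here is a small sketch of padding in a structured dtype. It uses `align=True` only as an easy way to produce padding; it is not meant to reproduce the specific 1.14 indexing change. The field names `flag` and `x` are made up, and the exact aligned sizes depend on the platform ABI.

```python
import numpy as np

fields = [('flag', 'i1'), ('x', 'f8')]  # illustrative field names

packed = np.dtype(fields)               # fields stored back to back
aligned = np.dtype(fields, align=True)  # fields placed as a C compiler would

# Bytes occupied by the named fields themselves.
named_bytes = sum(packed[name].itemsize for name in packed.names)

# Bytes per record that belong to no named field, i.e. padding.
packed_padding = packed.itemsize - named_bytes
aligned_padding = aligned.itemsize - named_bytes

print(packed.names == aligned.names)       # same field names either way
print(packed.itemsize, packed_padding)
print(aligned.itemsize, aligned_padding)   # exact values are platform dependent
print(aligned.fields['x'][1])              # byte offset of 'x' in each record
```

Both dtypes list the same two fields, yet the aligned one carries bytes per record that no field name accounts for.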

So, my **impression** is that the recent changes make the
recarray/dataframe use case for structured dtypes more difficult.

Given that there are pandas, xarray, dask and more, numpy might as well drop
any pretense of supporting dataframe_likes. Or, adjust the recfunctions so
we can still work dataframe_like with structured dtypes.
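
As a rough illustration of what working dataframe_like with structured arrays looks like today, here is a short sketch using `np.sort` with `order=` and two functions from `numpy.lib.recfunctions`. The column names `key`, `score` and `row` are made up, and this is only one possible way to combine these calls.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A small "table" with two columns (names are illustrative).
data = np.array([(3, 0.5), (1, 2.0), (2, 1.5)],
                dtype=[('key', 'i4'), ('score', 'f8')])

# Sort rows by one column.
by_key = np.sort(data, order='key')

# Add a column; usemask=False asks for a plain ndarray, not a masked array.
with_row = rfn.append_fields(by_key, 'row', np.arange(len(by_key)),
                             usemask=False)

# Drop a column and get the result back as a recarray.
trimmed = rfn.drop_fields(with_row, 'score', usemask=False, asrecarray=True)

print(by_key)
print(with_row.dtype.names)
print(trimmed.dtype.names, type(trimmed).__name__)
```

Note that each recfunctions call builds a new array, so this style is convenient for small tables rather than a replacement for pandas.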


> Eric
> On Mon, 29 Jan 2018 at 09:39 Chris Barker <chris.barker at noaa.gov> wrote:
>> On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane <allanhaldane at gmail.com>
>> wrote:
>>> On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote:
>>>>     I thought recarrays were pretty cool back in the day, but pandas is
>>>>     a much better option.
>>>>     So I pretty much only use structured arrays for data exchange with C
>>>>     code....
>>>> My impression is that this turns into a question of deprecating
>>>> recarrays and of supporting recfunctions.
>>> *should* we have any dataframe-like functionality in numpy?
>>> We get requests every once in a while about how to sort rows, or about
>>> adding a "groupby" function. I myself have used recarrays in a
>>> dataframe-like way, when I wanted a quick multiple-array object that
>>> supported numpy indexing. So there is some demand to have minimal
>>> "dataframe-like" behavior in numpy itself.
>>> recarrays play part of this role currently, though imperfectly due to
>>> padding and cache issues. I think I'm comfortable with supporting some
>>> minor use of structured/recarrays as dataframe-like, with a warning in docs
>>> that the user should really look at pandas/xarray, and that structured
>>> arrays are primarily for data exchange.
>> Well, I think we should either:
>> deprecate recarrays -- i.e. explicitly not support DataFrame-like
>> functionality in numpy, keeping only the data-exchange functionality as
>> maintained.
>> or
>> Properly support it -- which doesn't mean re-implementing Pandas or
>> xarray, but it would mean addressing any bug-like issues like not dealing
>> properly with padding.
>> Personally, I don't need/want it enough to contribute, but if someone
>> does, great.
>> This reminds me a bit of the old numpy.matrix issue -- it was ALMOST
>> there, but not quite, with issues, and there was essentially no overlap
>> between the people that wanted it and the people that had the time and
>> skills to really make it work.
>>> (If we want to dream, maybe one day we should make a minimal
>>> multiple-array container class. I imagine it would look pretty similar to
>>> recarray, but stored as a set of arrays instead of a structured array. But
>>> maybe recarrays are good enough, and let's not reimplement pandas either.)
>> Exactly -- we really don't need to re-implement Pandas....
>> (except its CSV reading capability :-) )
>> -CHB
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>> Chris.Barker at noaa.gov
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion