[Numpy-discussion] Setting custom dtypes and 1.14
josef.pktd at gmail.com
Mon Jan 29 14:10:56 EST 2018
On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser <wieser.eric+numpy at gmail.com> wrote:
> I think that there's a lot of confusion going around about recarrays vs
> structured arrays.
> Recarrays (numpy/core/records.py) are a wrapper around structured arrays that
> provide:
> * Attribute access to fields as `arr.field` in addition to the normal
> `arr['field']`
> * Automatic datatype-guessing for nested lists of tuples (which needs a
> little work, but seems like a justifiable feature)
> * An undocumented `field` method that behaves like the 1.14 indexing
> behavior (!)
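
For reference, a minimal sketch of the first two features listed above, assuming a small two-field array whose field names (`id`, `value`, `label`) are made up for illustration:

```python
import numpy as np

# A plain structured array: fields are reached by name through indexing.
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('id', 'i4'), ('value', 'f8')])
print(arr['value'])

# Viewing the same memory as a recarray adds attribute access.
rec = arr.view(np.recarray)
print(rec.value)

# The recarray is only a view, so writes show up in the original array.
rec.value[0] = 10.0
print(arr['value'][0])

# np.rec.fromrecords guesses a dtype from a list of tuples.
guessed = np.rec.fromrecords([(1, 2.5, 'a'), (2, 3.5, 'b')],
                             names='id,value,label')
print(guessed.dtype)
```

The exact integer and string dtypes that `fromrecords` guesses depend on the platform, so the last line is best read as a demonstration rather than a fixed result.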
> Meanwhile, `recfunctions` is a collection of functions that work on normal
> structured arrays - so is misleadingly named.
> The only link to recarrays is that most of the functions have an
> `asrecarray` parameter which applies `.view(recarray)` to the result.
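
A small sketch of that point, using `drop_fields` from `numpy.lib.recfunctions` as one example; the array and its field names are illustrative only:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('id', 'i4'), ('value', 'f8')])

# With usemask=False the function returns an ordinary structured ndarray.
dropped = rfn.drop_fields(arr, 'value', usemask=False)
print(type(dropped).__name__, dropped.dtype.names)

# asrecarray=True only changes the type of the returned object.
dropped_rec = rfn.drop_fields(arr, 'value', usemask=False, asrecarray=True)
print(type(dropped_rec).__name__, dropped_rec.id)
```

Nothing here requires a recarray as input, which is why the module name is easy to misread.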
> > deprecate recarrays
> Given how thin an abstraction they are over structured arrays, I don't
> think you mean this.
> Are you advocating for deprecating structured arrays entirely, or just
> deprecating recfunctions?
First, statsmodels is in the pandas camp for dataframes, so I don't have
any vested interest in recarrays/structured dtypes anymore.
What I meant was that structured dtypes with implicit (hidden?) padding
become unintuitive for the recarray/dataframe usecase. (At least I won't
try to update my intuition about having extra things in there that are not
specified by the main structured dtype.) Also the dataframe_like usage of
structured dtypes doesn't seem to be much under consideration anymore.
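
To make the padding point concrete, here is a rough sketch. It assumes NumPy 1.16 or later, where selecting several fields returns a view, and the field names are invented for the example:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8'), ('c', 'i4')])
print(arr.dtype.itemsize)          # 4 + 8 + 4 bytes per record, packed

# Assumption: NumPy >= 1.16, so this is a view onto the original memory.
sub = arr[['a', 'c']]
print(sub.dtype.names)
print(sub.dtype.itemsize)          # the view keeps the original record size
print(sub.dtype.fields['c'][1])    # byte offset of 'c' inside each record

# repack_fields copies the data into a dtype without the gap left by 'b'.
packed = rfn.repack_fields(sub)
print(packed.dtype.itemsize)
```

The view lists only `a` and `c`, yet each record still spans the bytes that `b` occupied; that unlisted gap is the hidden padding.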
So, my **impression** is that the recent changes make the
recarray/dataframe usecase for structured dtypes more difficult.
Given that there is pandas, xarray, dask and more, numpy could as well drop
any pretense of supporting dataframe_likes. Or, adjust the recfunctions so
we can still work dataframe_like with structured
> On Mon, 29 Jan 2018 at 09:39 Chris Barker <chris.barker at noaa.gov> wrote:
>> On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane <allanhaldane at gmail.com> wrote:
>>> On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote:
>>>> I thought recarrays were pretty cool back in the day, but pandas is
>>>> a much better option.
>>>> So I pretty much only use structured arrays for data exchange with C
>>>> My impression is that this turns into a deprecate recarrays and
>>>> supporting recfunction issue.
>>> *should* we have any dataframe-like functionality in numpy?
>>> We get requests every once in a while about how to sort rows, or about
>>> adding a "groupby" function. I myself have used recarrays in a
>>> dataframe-like way, when I wanted a quick multiple-array object that
>>> supported numpy indexing. So there is some demand to have minimal
>>> "dataframe-like" behavior in numpy itself.
>>> recarrays play part of this role currently, though imperfectly due to
>>> padding and cache issues. I think I'm comfortable with supporting some
>>> minor use of structured/recarrays as dataframe-like, with a warning in docs
>>> that the user should really look at pandas/xarray, and that structured
>>> arrays are primarily for data exchange.
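
As a sketch of the kind of minimal dataframe-like use described above, row sorting and a simple grouped sum can already be done with plain NumPy calls. The data and field names are invented, and this is only one possible way to do it:

```python
import numpy as np

data = np.array(
    [('b', 2.0), ('a', 1.0), ('b', 4.0), ('a', 3.0)],
    dtype=[('key', 'U1'), ('x', 'f8')],
)

# Sort rows by one or more fields.
by_key = np.sort(data, order=['key', 'x'])
print(by_key)

# A minimal "groupby sum": map each row to a group index, then accumulate.
keys, inverse = np.unique(data['key'], return_inverse=True)
sums = np.bincount(inverse, weights=data['x'])
for k, s in zip(keys, sums):
    print(k, s)
```

This covers only a sum per group; anything richer is where pandas or xarray become the better tool.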
>> Well, I think we should either:
>> deprecate recarrays -- i.e. explicitly not support DataFrame-like
>> functionality in numpy, keeping only the data-exchange functionality as
>> Properly support it -- which doesn't mean re-implementing Pandas or
>> xarray, but it would mean addressing any bug-like issues like not dealing
>> properly with padding.
>> Personally, I don't need/want it enough to contribute, but if someone
>> does, great.
>> This reminds me a bit of the old numpy.matrix issue -- it was ALMOST
>> there, but not quite, with issues, and there was essentially no overlap
>> between the people that wanted it and the people that had the time and
>> skills to really make it work.
>>> (If we want to dream, maybe one day we should make a minimal
>>> multiple-array container class. I imagine it would look pretty similar to
>>> recarray, but stored as a set of arrays instead of a structured array. But
>>> maybe recarrays are good enough, and let's not reimplement pandas either.)
>> Exactly -- we really don't need to re-implement Pandas....
>> (except its CSV reading capability :-) )
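
Purely as an illustration of the "minimal multiple-array container" idea quoted above, one possible shape is sketched below. The class name `ArrayBundle` and its interface are hypothetical and are not part of NumPy or any proposal in this thread:

```python
import numpy as np


class ArrayBundle:
    """Hypothetical container: named equal-length arrays, stored separately."""

    def __init__(self, **columns):
        self._columns = {n: np.atleast_1d(c) for n, c in columns.items()}
        if len({len(c) for c in self._columns.values()}) > 1:
            raise ValueError("all columns must have the same length")

    def __getattr__(self, name):
        # Called only when normal lookup fails; gives recarray-like access.
        try:
            return self.__dict__['_columns'][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, index):
        # A string selects one column; any other index is applied to each column.
        if isinstance(index, str):
            return self._columns[index]
        return ArrayBundle(**{n: c[index] for n, c in self._columns.items()})

    def __len__(self):
        return len(next(iter(self._columns.values()), ()))


b = ArrayBundle(name=np.array(['x', 'y', 'z']), score=np.array([3.0, 1.0, 2.0]))
top = b[b.score > 1.5]
print(top.name, top.score)
```

Because each column is its own array, there is no record layout and therefore no padding to reason about, at the cost of losing the single contiguous buffer that makes structured arrays useful for data exchange.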
>> Christopher Barker, Ph.D.
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax
>> Seattle, WA 98115 (206) 526-6317 main reception
>> Chris.Barker at noaa.gov
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org