On Mon, Jan 29, 2018 at 5:50 PM, <josef.pktd@gmail.com> wrote:

On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
On 01/29/2018 04:02 PM, josef.pktd@gmail.com wrote:
>
>
> On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root <ben.v.root@gmail.com> wrote:
>
> I <3 structured arrays. I love the fact that I can access data by
> row and then by fieldname, or vice versa. There are times when I
> need to pass just a column into a function, and there are times when
> I need to process things row by row. Yes, pandas is nice if you want
> the specialized indexing features, but it becomes a bear to deal
> with if all you want is normal indexing, or even the ability to
> easily loop over the dataset.
>
>
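A minimal sketch of the access patterns Ben describes. The field names and values are made up for illustration and are not taken from anyone's real dataset:

```python
import numpy as np

# Hypothetical three-field table: one integer id and two float columns.
data = np.array(
    [(1, 2.5, 10.0), (2, 3.5, 20.0), (3, 4.5, 30.0)],
    dtype=[("id", "i4"), ("b", "f8"), ("c", "f8")],
)

# Row first, then field name ...
row_then_field = data[1]["b"]
# ... or field name first, then row: both reach the same element.
field_then_row = data["b"][1]

# A single field is an ordinary 1-D array that can be passed to any function.
column_mean = data["c"].mean()

# Looping row by row yields one record per iteration.
row_sums = [rec["b"] + rec["c"] for rec in data]
```

Both indexing orders return the same scalar, and `data["c"]` can be handed to any function that expects a plain 1-D float array.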
> I don't think there is a doubt that structured arrays, arrays with
> structured dtypes, are a useful container. The question is whether they
> should be more or the foundation for more.
>
> For example, computing a mean, or other reduce operation, over numeric
> elements ("columns"). Before padded views it was possible to index by
> selecting the relevant "columns" and view them as a standard array. With
> padded views that breaks and, AFAICS, there is no way in numpy 1.14.0 to
> compute a mean of some "columns". (I don't have numpy 1.14 to try or find
> a workaround, like maybe looping over all relevant columns.)
>
> Josef

Just to clarify, structured types have always had padding bytes, that
isn't new.

What *is* new (which we are pushing to 1.15, I think) is that it may be
somewhat more common to end up with padding than before, and only if you
are specifically using multi-field indexing, which is a fairly
specialized case.

I think recfunctions already account properly for padding bytes. Except
for the bug in #8100, which we will fix, padding-bytes in recarrays are
more or less invisible to a non-expert who only cares about
dataframe-like behavior.

In other words, padding is no obstacle at all to computing a mean over a
column, and single-field indexes in 1.15 behave exactly as before.
The only thing that will change in 1.15 is multi-field indexing, and it
has never been possible to compute a mean (or any binary operation) on
multiple fields.
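To illustrate the single-field point, here is a small sketch under assumed field names. It builds a dtype with `align=True` so that padding bytes are present, then takes the mean of one field. The exact amount of padding depends on the platform, so the code only checks that there is some:

```python
import numpy as np

# align=True asks NumPy to lay fields out like a C struct, which inserts
# padding after the 1-byte field so the float64 fields are aligned.
dt = np.dtype([("a", "i1"), ("b", "f8"), ("c", "f8")], align=True)

arr = np.zeros(4, dtype=dt)
arr["a"] = [1, 2, 3, 4]
arr["b"] = [1.0, 2.0, 3.0, 4.0]
arr["c"] = [10.0, 20.0, 30.0, 40.0]

# Bytes actually used by the fields (1 + 8 + 8 = 17) versus the record size.
packed_size = sum(dt[name].itemsize for name in dt.names)
padding = dt.itemsize - packed_size  # platform dependent, but greater than 0

# Single-field indexing returns a plain float64 array; padding is invisible.
mean_b = arr["b"].mean()
```

The padding only shows up in `dt.itemsize` and the field offsets. The column `arr["b"]` is a strided float64 array, so `mean()` works on it without any special handling.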

from the example in the other thread
a[['b', 'c']].view(('f8', 2)).mean(0)

(from the statsmodels usecase:
read csv with genfromtxt to get recarray or structured array
select/index the numeric columns
view them as a standard array
do whatever we can do with standard numpy arrays
)

Or, to phrase it as a question:

How do we get a standard array with homogeneous dtype from the corresponding elements of a structured dtype in numpy 1.14.0?
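One possible answer, sketched here as an assumption rather than as the recommended NumPy approach, is the "loop over the relevant columns" workaround mentioned earlier in the thread: index each field on its own and stack the results. This copies the data instead of making a view, but it does not depend on how the fields are laid out in memory. The CSV text and column names below are invented for the example:

```python
import io
import numpy as np

# Hypothetical CSV with one string column and two numeric columns.
csv_text = "name,b,c\nx,1.0,2.0\ny,3.0,4.0\nz,5.0,6.0\n"

# names=True takes field names from the header row; dtype=None lets
# genfromtxt choose a dtype per column, giving a structured array.
data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

# Select the numeric fields by inspecting each field's dtype.
numeric_names = [
    name for name in data.dtype.names
    if np.issubdtype(data.dtype[name], np.number)
]

# Stack the single-field arrays into one homogeneous 2-D array (a copy).
plain = np.stack([data[name] for name in numeric_names], axis=-1)

col_means = plain.mean(axis=0)
```

Because each `data[name]` is a single-field index, this does not rely on multi-field indexing or on `.view()` over the structured dtype.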

Josef

Allan

>
> Cheers!
> Ben Root
>
> On Mon, Jan 29, 2018 at 3:24 PM, <josef.pktd@gmail.com> wrote:
>
>
>
> On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt
> <stefanv@berkeley.edu> wrote:
>
> On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd@gmail.com wrote:
>
> Given that there is pandas, xarray, dask and more, numpy could as well
> drop any pretense of supporting dataframe_likes. Or, adjust the
> recfunctions so we can still work dataframe_like with structured
> dtypes/recarrays/recfunctions.
>
>
> I haven't been following the duckarray discussion carefully, but could
> this be an opportunity for a dataframe protocol, so that we can have
> libraries ingest structured arrays, record arrays, pandas dataframes,
> etc. without too much specialized code?
>
>
> AFAIU while not being in the data handling area, pandas defines
> the interface and other libraries provide pandas compatible
> interfaces or implementations.
>
> statsmodels currently still has recarray support and usage. In
> some interfaces we support pandas, recarrays and plain arrays,
> or anything where asarray works correctly.
>
> But recarrays became messy to support: one rewrite of some
> functions last year converts recarrays to pandas, does the
> manipulation, and then converts back to recarrays.
> Also we need to adjust our recarray usage with new numpy
> versions. But there is no real benefit because I doubt that
> statsmodels still has any recarray/structured dtype users. So,
> we only have to remove our own uses in the datasets and unit tests.
>
> Josef
>
>
>
>
> Stéfan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion