[Numpy-discussion] Setting custom dtypes and 1.14

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jan 29 17:50:20 EST 2018


On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 01/29/2018 04:02 PM, josef.pktd at gmail.com wrote:
> >
> >
> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root <ben.v.root at gmail.com
> > <mailto:ben.v.root at gmail.com>> wrote:
> >
> >     I <3 structured arrays. I love the fact that I can access data by
> >     row and then by fieldname, or vice versa. There are times when I
> >     need to pass just a column into a function, and there are times when
> >     I need to process things row by row. Yes, pandas is nice if you want
> >     the specialized indexing features, but it becomes a bear to deal
> >     with if all you want is normal indexing, or even the ability to
> >     easily loop over the dataset.
> >
> >
> > I don't think there is a doubt that structured arrays, arrays with
> > structured dtypes, are a useful container. The question is whether they
> > should be more or the foundation for more.
> >
> > For example, computing a mean, or reduce operation, over numeric element
> > ("columns"). Before padded views it was possible to index by selecting
> > the relevant "columns" and view them as standard array. With padded
> > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute
> > a mean of some "columns". (I don't have numpy 1.14 to try or find a
> > workaround, like maybe looping over all relevant columns.)
> >
> > Josef
>
> Just to clarify, structured types have always had padding bytes, that
> isn't new.
>
> What *is* new (which we are pushing to 1.15, I think) is that it may be
> somewhat more common to end up with padding than before, and only if you
> are specifically using multi-field indexing, which is a fairly
> specialized case.
>
> I think recfunctions already account properly for padding bytes. Except
> for the bug in #8100, which we will fix, padding-bytes in recarrays are
> more or less invisible to a non-expert who only cares about
> dataframe-like behavior.
>
> In other words, padding is no obstacle at all to computing a mean over a
> column, and single-field indexes in 1.15 behave identically as before.
> The only thing that will change in 1.15 is multi-field indexing, and it
> has never been possible to compute a mean (or any binary operation) on
> multiple fields.
>

From the example in the other thread:

    a[['b', 'c']].view(('f8', 2)).mean(0)
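
As a rough sketch of the "looping over all relevant columns" workaround I mentioned earlier (untested against 1.14 on my side): the array `a`, its field names and its values here are made up for illustration and are not taken from the other thread. It relies only on single-field indexing, which is said above to be unchanged, and stacks the selected columns into a plain 2-D array.

```python
import numpy as np

# Hypothetical structured array: one integer field, two float64 fields.
a = np.array([(1, 1.0, 2.0), (2, 3.0, 4.0), (3, 5.0, 6.0)],
             dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Index one field at a time (each gives a plain 1-D array) and stack the
# results as columns of an ordinary 2-D float array.
cols = np.column_stack([a[name] for name in ('b', 'c')])

# Column-wise mean, the same reduction as .mean(0) in the one-liner above.
col_means = cols.mean(0)
print(col_means)
```

Unlike a view, this makes a copy, so it costs memory and changes to `cols` do not propagate back to `a`.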


(From the statsmodels use case:
  - read a csv with genfromtxt to get a recarray or structured array
  - select/index the numeric columns
  - view them as a standard array
  - do whatever we can do with standard numpy arrays
)
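
A minimal sketch of that workflow, with assumptions stated up front: the CSV text, the column names and the rule for what counts as "numeric" (integer, unsigned or float dtype kinds) are all invented for illustration, and the last step builds a copy by stacking single fields rather than taking a view of a multi-field selection.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a real data file.
csv_text = "name,x,y\nfoo,1.0,2.0\nbar,3.0,4.0\n"

# names=True takes field names from the header row; dtype=None lets
# genfromtxt infer a per-column dtype, giving a structured array.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=True, dtype=None, encoding='utf-8')

# Select the numeric columns by dtype kind (i: int, u: unsigned, f: float).
numeric = [n for n in data.dtype.names if data.dtype[n].kind in 'iuf']

# Build a standard 2-D array from the selected fields (a copy, not a view).
X = np.column_stack([data[n] for n in numeric])

# From here on, anything that works on a plain ndarray works on X.
print(numeric, X.mean(0))
```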

Josef


>
> Allan
>
> >
> >     Cheers!
> >     Ben Root
> >
> >     On Mon, Jan 29, 2018 at 3:24 PM, <josef.pktd at gmail.com
> >     <mailto:josef.pktd at gmail.com>> wrote:
> >
> >
> >
> >         On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt
> >         <stefanv at berkeley.edu <mailto:stefanv at berkeley.edu>> wrote:
> >
> >             On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com
> >             <mailto:josef.pktd at gmail.com> wrote:
> >
> >                 Given that there is pandas, xarray, dask and more, numpy
> >                 could as well drop
> >                 any pretense of supporting dataframe_likes. Or, adjust
> >                 the recfunctions so
> >                 we can still work dataframe_like with structured
> >                 dtypes/recarrays/recfunctions.
> >
> >
> >             I haven't been following the duckarray discussion carefully,
> >             but could
> >             this be an opportunity for a dataframe protocol, so that we
> >             can have
> >             libraries ingest structured arrays, record arrays, pandas
> >             dataframes,
> >             etc. without too much specialized code?
> >
> >
> >         AFAIU while not being in the data handling area, pandas defines
> >         the interface and other libraries provide pandas compatible
> >         interfaces or implementations.
> >
> >         statsmodels currently still has recarray support and usage. In
> >         some interfaces we support pandas, recarrays and plain arrays,
> >         or anything where asarray works correctly.
> >
> >         But recarrays became messy to support, one rewrite of some
> >         functions last year converts recarrays to pandas, does the
> >         manipulation and then converts back to recarrays.
> >         Also we need to adjust our recarray usage with new numpy
> >         versions. But there is no real benefit because I doubt that
> >         statsmodels still has any recarray/structured dtype users. So,
> >         we only have to remove our own uses in the datasets and unit
> >         tests.
> >
> >         Josef
> >
> >
> >
> >
> >             Stéfan
> >
> >             _______________________________________________
> >             NumPy-Discussion mailing list
> >             NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >             https://mail.python.org/mailman/listinfo/numpy-discussion
> >             <https://mail.python.org/mailman/listinfo/numpy-discussion>
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>