[Numpy-discussion] `np.array()`, array-likes, nested sequences and subclasses
sebastian at sipsolutions.net
Thu Jun 18 10:49:35 EDT 2020
tl;dr: `np.array()` is somewhat ill-defined, also creating issues for
Quantities. In a recent PR I am cementing, and slightly broadening,
its definition. So we have to decide how we wish to handle code such
as in the long run:
Traditionally, we have two meanings of "array-like" as understood by
`np.array()` (In the text I use array-like for the second point here):
1. Nested sequences of scalars.
2. A single array-like object, meaning a buffer-interface, an array
subclass, a pandas dataframe (`__array__()`), etc.
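
To make the two meanings concrete, here is a minimal sketch. The class `OnlyArrayProtocol` is a hypothetical stand-in for any object that exposes `__array__()` (it is not part of NumPy), and the `copy=None` keyword is only there so the signature is also accepted by newer NumPy versions:

```python
import numpy as np


class OnlyArrayProtocol:
    """Hypothetical array-like exposing nothing but __array__()."""

    def __array__(self, dtype=None, copy=None):
        return np.arange(3, dtype=dtype)


# Meaning 1: a nested sequence of scalars.
from_nested = np.array([[1, 2], [3, 4]])

# Meaning 2: a single array-like object.
from_protocol = np.array(OnlyArrayProtocol())   # via __array__()
from_buffer = np.array(memoryview(b"abc"))      # via the buffer interface

print(from_nested.shape, from_protocol, from_buffer)
```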
However, the boundaries between these are fuzzy, and over the years
they have become fuzzier. The reason is that a NumPy array (like many
array-likes) is also a nested sequence of scalars.
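
As a small illustration of that overlap (a sketch, not a statement about how `np.array()` treats arrays internally): an ndarray supports `len()` and iteration, so it can be unrolled into plain nested lists and rebuilt from them.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Iterating an ndarray yields its rows, and each row yields scalars,
# so the array can be read as a nested sequence.
as_lists = [list(row) for row in arr]
rebuilt = np.array(as_lists)

print(len(arr), as_lists, np.array_equal(arr, rebuilt))
```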
I defined the current behaviour slightly more clearly in my PR, but in
doing so also subtly broadened it:
1. Any array-like embedded in the nested sequences is converted to a
NumPy array.  (An array-like is never interpreted as a sequence.)
2. Any array-like's elements will be elements of the output.
We never enter array-likes recursively (including object arrays).
3. The `subok=True` parameter is implicitly ignored, unless the input
is a single ndarray subclass.
Now to the issues at hand:
* We should make sure those definitions are good. They mainly cement
current behaviour, but if we want to roll back on features,
we should do it now.
* There are some issues around Quantity and masked arrays,
because their "scalars" are (sometimes) 0-D arrays. And they
currently rely on NumPy considering them to be scalars.
This has its own set of long-term issues.
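
For masked arrays the "(sometimes)" can be seen directly. The sketch below only uses `numpy.ma`; Quantity comes from a third-party package and is left out here:

```python
import numpy as np

m = np.ma.array([1, 2], mask=[False, True])

unmasked_item = m[0]   # an ordinary NumPy scalar
masked_item = m[1]     # the np.ma.masked constant

# The masked "scalar" is really a 0-D array subclass instance.
print(type(unmasked_item), type(masked_item), masked_item.ndim)
```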
For now, I can simply roll the changes to 0-D array behaviour back.
But in the mid-to-long run, we have to make a decision, or perpetually
live with array subclasses being subtly broken:
1. Define Quantity and masked arrays as wrong. They must use a
special DType, which consistently tells NumPy that the elements
cannot simply be copied by converting the Quantity to an array.
The upside is that it generalizes to N-D.
2. Independently, but partially addressing the Quantity issue, we have
to decide what `np.array()` should actually do. A sequence
containing array-likes, in most cases is better written using
`np.stack()`, but due to the fuzzy boundaries, code like
`np.array([dataframe, dataframe])` is probably common.
We could try to deprecate this, though.
The downside to deprecation, it seems to me, is that I feel we have to
reject viewing array-likes as sequences. To me, doing that has its own
set of issues, if only that `np.array([arraylike])` seems perfectly
reasonable, but may be very slow.
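
For reference, a minimal sketch of the two spellings using plain ndarrays (standing in for the dataframes mentioned above). For equally shaped inputs they give the same result, with `np.stack()` stating the intent explicitly:

```python
import numpy as np

a = np.arange(3)
b = np.arange(3, 6)

via_array = np.array([a, b])   # relies on the fuzzy "sequence of arrays" reading
via_stack = np.stack([a, b])   # explicit: join along a new first axis

print(via_array.shape, np.array_equal(via_array, via_stack))
```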
 It is hard to list exactly how it is broadened, because the
current behaviour has some very subtle quirks, such as actually
iterating a `memoryview()`, which always does the same thing, but only
works for 1-D memoryviews, and fails for both 0-D and N-D.
 There are some subtleties which are not important here, such as that I
do anticipate the possibility of having array-likes which are
considered scalars with respect to a given dtype, such as
`np.array([poly], dtype=Polynomial)` where a poly object itself is an
works, by ending up calling:
    res = float(zero_d_array)  # quantity.__float__ is used!
which works nicely for the typical float/int dtypes, but is tricky to
get right for general dtypes (e.g. longdouble/clongdouble). This is a
small issue now, but it could become a problem when more user-dtypes