[Numpy-discussion] NEP: array API standard adoption (NEP 47)
ralf.gommers at gmail.com
Thu Mar 11 07:49:33 EST 2021
On Wed, Mar 10, 2021 at 11:41 PM Sebastian Berg <sebastian at sipsolutions.net>
> On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:
> > On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
> > <sebastian at sipsolutions.net> wrote:
> > >
> > > 2. `np.result_type` special cases array-scalars (the current PR),
> > > NEP 47 promises it will not. The PR could attempt to work around
> > > that using `arr.dtype` in `result_type`; I expect there are more
> > > details to fight with there, but I am not sure.
> > The idea is to work around it everywhere, so that it follows the
> > rules in the spec (no array scalars, no value-based casting). I
> > haven't started it yet, though, so I don't know yet how hard it will
> > be. If it ends up being too hard we could put it in the same camp as
> > device support and dlpack support, where it needs some basic
> > implementation in numpy itself first before we can properly do it in
> > the array API namespace.
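For concreteness, here is a small illustration of the workaround mentioned above: passing dtypes rather than arrays to `np.result_type` sidesteps NumPy's value-based special-casing of 0-d arrays. This is a sketch, not the actual PR code:

```python
import numpy as np

x = np.asarray(1.0, dtype=np.float64)        # 0-d array
y = np.asarray([1.0, 2.0], dtype=np.float32)

# Depending on NumPy version, passing the 0-d array itself can trigger
# value-based casting (giving float32 here); passing the dtypes instead
# always gives purely dtype-based promotion, as the spec requires:
dtype_based = np.result_type(x.dtype, y.dtype)
print(dtype_based)  # float64
```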
> Quite frankly, if you really want to implement a minimal API, it may
> be best to just write it yourself and ditch NumPy. (Of course, I
> currently doubt that the NEP 47 implementation should be minimal.)
I'm not really sure what to say other than that I don't think anyone
will be served by "ditching NumPy".
The goal for this "minimal" part is to provide an API that you can
write code against that will work portably across other array
libraries. That seems like a valuable goal, right? And if you want
NumPy-specific things that other libraries don't commonly (or at all)
implement and are not supported by array_api, then you don't use this
API but the existing main namespace.
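As a sketch of what "writing code against the portable API" could look like for a library author (`get_namespace` here is a hypothetical helper for illustration, not a finalized NumPy API):

```python
import numpy as np

def get_namespace(*xs):
    # Hypothetical helper (the real proposal may differ): find the one
    # array namespace shared by all inputs, raising on mixed types.
    modules = {type(x).__module__.split('.')[0] for x in xs}
    if len(modules) != 1:
        raise TypeError("mixed array inputs are not supported")
    # For plain ndarrays, fall back to NumPy itself as the namespace.
    return np

def portable_softmax(x):
    # Library code uses only functions in the (minimal) namespace, so it
    # would work unchanged with any compliant array library.
    xp = get_namespace(x)
    e = xp.exp(x - xp.max(x))
    return e / xp.sum(e)

print(portable_softmax(np.asarray([0.0, 1.0, 2.0])))
```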
> About doing promotion yourself ("promotion" as in what ufuncs do; I
> call `np.result_type` "common DType", because it is used e.g. in
> Ufuncs have at least one more rule for true-division, plus there may
> be mixed float-int loops, etc. Since the standard is very limited and
> you only have numeric dtypes, that might be all, though.
> In any case, my point is: if NumPy does strange things (and it does
> with 0-D arrays currently), you could cook your own soup there also,
> and implement it in NumPy by using `signature=...` in the ufunc call.
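The `signature=...` mechanism referred to here lets a caller pin the ufunc's inner loop dtypes, so a library could compute promotion itself and then force NumPy to honor it. A minimal illustration:

```python
import numpy as np

a = np.asarray([1, 2], dtype=np.int8)
b = np.asarray([3, 4], dtype=np.int8)

# Without `signature`, NumPy selects the loop via its own promotion
# rules (int8 + int8 -> int8). Pinning the signature forces a specific
# inner loop, so a library can compute promotion itself and make NumPy
# honor the result:
out = np.add(a, b, signature=(np.int64, np.int64, np.int64))
print(out.dtype)  # int64
```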
> > > 4. Now that I looked at the above, I do not feel it's reasonable
> > > to limit this functionality to numeric dtypes. If someone uses a
> > > NumPy rational-dtype, why should a SciPy function currently
> > > implemented in pure NumPy reject that? In other words, I think
> > > this is the point where trying to be "minimal" is
> > > counterproductive.
SciPy would still be free to implement *both* a portable code path and a
numpy-specific path (if that makes sense, which I doubt in many cases).
There's just no way those two code paths can be 100% common, because no
other library implements a rational dtype.
> > The idea of minimality is to make it so users can be sure they will
> > be able to use other libraries, once they also have array API
> > compliant namespaces. A rational-dtype wouldn't ever be implemented
> > in those other libraries, because it isn't part of the standard, so
> > if a user is using those, that is a sign they are using things that
> > aren't in the array API, so they can't expect to be able to swap
> > out their dtypes. If a user wants to use something that's only in
> > NumPy, then they should just use NumPy.
> This is not about the "user"; in your scenario the end-user does use
> NumPy. The way I understand it, this is not a prerequisite. If it is,
> a lot of things will be simpler though, and most of my doubts will go
> away (but be replaced with uncertainty about the usefulness).
> The problem is that SciPy as the "library author" wants to use NEP 47
> without limiting the end-user (or the end-user even noticing!).
> The distinction between end-user and library author (someone who
> writes a function that should work with numpy, pytorch, etc.) is very
> important here and to all of these "protocol" discussions.
The example feels a little forced. >99% of end user code written
against libraries like SciPy uses standard numerical dtypes. Things
like a rational dtype are very niche. A rational dtype works with most
NumPy functions, but is not at all guaranteed to work with SciPy
functions - and if it does, it's accidental, untested, and may break if
SciPy changes its implementation (e.g. moves from pure Python + NumPy
to Cython or C++).
> I assume that SciPy should be able to have its cake and eat it too:
> * Use the limited array API and make sure to only rely on the minimal
>   API.
> * Not artificially limit end-users who pass in NumPy arrays.
> The second point can also be read as: SciPy would be able to support
> practically all current NumPy array use cases without jumping through
> any additional hoops (or well, maybe a bit of churn, but churn that
> is made easy by the as-of-now undefined API).
I suspect you have things in mind that are not actually supported by
SciPy today. The rational dtype is one example, but so are ndarray
subclasses. Take masked arrays as an example - these are not supported
today, except for scipy.stats.mstats functionality - where support is
intentional, special-cased and tested.
For masked arrays as well as other arbitrary fancy subclasses, there's some
not-well-defined subset of functionality that may work today, but that is
fragile, untested and can break without warning in any release. Only
Liskov-substitutable ndarray subclasses are not fragile - those are simply
coerced to ndarray via the ubiquitous `np.asarray` pattern, and ndarrays
are returned. That must and will remain working.
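For example, a behavior-preserving subclass survives the usual coercion pattern unchanged (a minimal sketch; `MyArray` is a made-up class for illustration):

```python
import numpy as np

class MyArray(np.ndarray):
    # A behavior-preserving subclass: it changes no ndarray semantics,
    # so it is Liskov-substitutable for ndarray.
    pass

sub = np.asarray([1.0, 2.0, 3.0]).view(MyArray)

# The ubiquitous library pattern: coerce inputs to a plain ndarray up
# front. Substitutable subclasses survive this unchanged (as a base
# ndarray view), which is why this code path keeps working for them.
coerced = np.asarray(sub)
print(type(coerced).__name__)  # ndarray
```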
This is a complex topic, and it's possible that I'm missing other use
cases you have in mind, so I thought I'd make a diagram to explain the
difference between the custom dtypes & subclasses that are supported
by NumPy itself but not by downstream libraries (see the attached
diagram).
> > > 6. Is there any provision on how to deal with mixed array-like
> > > inputs?
> > > CuPy+numpy, etc.?
> > Neither of these are defined in the spec. The spec only deals with
> > staying inside of the compliant namespace. It doesn't require any
> > behavior mixing things from other namespaces. That's generally
> > considered a much harder problem, and there is the data interchange
> > protocol to deal with it.
> OK, maybe you can get away with it, since the current proposal seems to
> be that `get_namespace()` raises on mixed input. Still seems like
> something that should probably raise an error rather than coerce to
> NumPy when calling: `nep47_array_object + dask_array`.
Agreed, this must raise too.
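A sketch of what "raising on mixed input" could look like for such a wrapper object (`ArrayAPIObject` is a hypothetical class named only for illustration, not the proposed implementation):

```python
import numpy as np

class ArrayAPIObject:
    # Hypothetical wrapper: arithmetic with arrays from other libraries
    # raises instead of silently coercing to NumPy.
    def __init__(self, data):
        self._data = np.asarray(data)

    def __add__(self, other):
        if isinstance(other, ArrayAPIObject):
            return ArrayAPIObject(self._data + other._data)
        raise TypeError("mixing array types is not supported; use the "
                        "data interchange protocol to convert first")

x = ArrayAPIObject([1, 2])
y = ArrayAPIObject([3, 4])
print((x + y)._data)  # [4 6]
```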