From melissawm at gmail.com  Fri Apr  3 17:42:11 2020
From: melissawm at gmail.com (Melissa Mendonça)
Date: Fri, 3 Apr 2020 18:42:11 -0300
Subject: [Numpy-discussion] Documentation Team Meeting - Monday April 6
Message-ID:

Hi all,

This is a reminder that we're having a Documentation Team Meeting next Monday, April 6th, at 3PM UTC**. If you wish to join on Zoom, you need to use this link:

https://zoom.us/j/420005230

Here's the permanent hackmd document with the meeting notes:

https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around!

** You can click this link to get the correct time in your timezone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20200406T15&p1=1440&ah=1

- Melissa

From warren.weckesser at gmail.com  Sat Apr  4 23:27:16 2020
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sat, 4 Apr 2020 23:27:16 -0400
Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?
Message-ID:

It would be handy if in scipy we can use the function `numpy.lib.shape_base.normalize_axis_index` as a consistent method for validating an `axis` argument. Is this function considered part of the public API?

There are modules in numpy that do not have leading underscores but are still usually considered private. I'm not sure if `numpy.lib.shape_base` is one of those. `normalize_axis_index` is not in the top-level `numpy` namespace, and it is not included in the API reference (https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default), so I'm not sure if we can safely consider this function to be public.

Warren

From warren.weckesser at gmail.com  Sun Apr  5 00:43:02 2020
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sun, 5 Apr 2020 00:43:02 -0400
Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?
In-Reply-To:
References:
Message-ID:

On 4/4/20, Warren Weckesser wrote:
> It would be handy if in scipy we can use the function
> `numpy.lib.shape_base.normalize_axis_index` as a consistent method for
> validating an `axis` argument. Is this function considered part of
> the public API?
>
> There are modules in numpy that do not have leading underscores but
> are still usually considered private. I'm not sure if
> `numpy.lib.shape_base` is one of those. `normalize_axis_index` is not
> in the top-level `numpy` namespace, and it is not included in the API
> reference
> (https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default),
> so I'm not sure if we can safely consider this function to be public.
>
> Warren
>

Answering my own question:

"shape_base.py" is not where `normalize_axis_index` is originally defined, so that module can be ignored.

The function is actually defined in `numpy.core.multiarray`. The pull request in which the function was created is https://github.com/numpy/numpy/pull/8584. Whether or not the function was to be public is discussed starting here: https://github.com/numpy/numpy/pull/8584#issuecomment-281179399. A leading underscore was discussed and intentionally not added to the function. On the other hand, it was not added to the top-level namespace, and Eric Wieser wrote "Right now, it is only accessible via np.core.multiarray.normalize_axis_index, so yes, an internal function".
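For a sense of the semantics being discussed, here is a rough, pure-Python sketch of what an axis normalizer of this kind does (illustrative only; the actual NumPy function is implemented in C, and the name `normalize_axis_index_sketch` is made up here):

    import numpy as np

    def normalize_axis_index_sketch(axis, ndim):
        # Wrap a possibly-negative axis into [0, ndim); raise for out-of-range.
        if not (-ndim <= axis < ndim):
            raise np.AxisError(
                "axis {} is out of bounds for array of dimension {}".format(axis, ndim))
        return axis % ndim

    normalize_axis_index_sketch(-1, 3)   # -> 2
    normalize_axis_index_sketch(1, 3)    # -> 1
    # normalize_axis_index_sketch(3, 3)  raises AxisError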
There is another potentially useful function, `normalize_axis_tuple`, defined in `numpy.core.numeric`. This function is also not in the top-level numpy namespace. So it looks like neither of these functions is currently intended to be public. For the moment, I think we'll create our own utility functions in scipy. We can switch to using the numpy functions if those functions are ever intentionally made public. Warren From sebastian at sipsolutions.net Sun Apr 5 10:00:47 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 05 Apr 2020 09:00:47 -0500 Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API? In-Reply-To: References: Message-ID: On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote: > On 4/4/20, Warren Weckesser wrote: > > It would be handy if in scipy we can use the function > > `numpy.lib.shape_base.normalize_axis_index` as a consistent method > > for > > validating an `axis` argument. Is this function considered part of > > the public API? > > > > There are modules in numpy that do not have leading underscores but > > are still usually considered private. I'm not sure if > > `numpy.lib.shape_base` is one of those. `normalize_axis_index` is > > not > > in the top-level `numpy` namespace, and it is not included in the > > API > > reference > > ( > > https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default > > ), > > so I'm not sure if we can safely consider this function to be > > public. > > I do not see a reason why we should not make those functions public. The only thing I see is that they are maybe not really required in the main namespace, i.e. you can be expected to use:: from numpy.something import normalize_axis_tuple I think, since this is a function for library authors more than end- users. And we do not have much prior art around where to put something like that. Cheers, Sebastian > > Warren > > > > Answering my own question: > > "shape_base.py" is not where `normalize_axis_index` is originally > defined, so that module can be ignored. > > The function is actually defined in `numpy.core.multiarray`. The > pull > request in which the function was created is > https://github.com/numpy/numpy/pull/8584. Whether or not the function > was to be public is discussed starting here: > https://github.com/numpy/numpy/pull/8584#issuecomment-281179399. A > leading underscore was discussed and intentionally not added to the > function. On the other hand, it was not added to the top-level > namespace, and Eric Wieser wrote "Right now, it is only accessible > via > np.core.multiarray.normalize_axis_index, so yes, an internal > function". > > There is another potentially useful function, `normalize_axis_tuple`, > defined in `numpy.core.numeric`. This function is also not in the > top-level numpy namespace. > > So it looks like neither of these functions is currently intended to > be public. For the moment, I think we'll create our own utility > functions in scipy. We can switch to using the numpy functions if > those functions are ever intentionally made public. > > Warren > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From warren.weckesser at gmail.com Mon Apr 6 09:05:22 2020 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 6 Apr 2020 09:05:22 -0400 Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API? In-Reply-To: References: Message-ID: On 4/5/20, Sebastian Berg wrote: > On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote: >> On 4/4/20, Warren Weckesser wrote: >> > It would be handy if in scipy we can use the function >> > `numpy.lib.shape_base.normalize_axis_index` as a consistent method >> > for >> > validating an `axis` argument. Is this function considered part of >> > the public API? >> > >> > There are modules in numpy that do not have leading underscores but >> > are still usually considered private. I'm not sure if >> > `numpy.lib.shape_base` is one of those. `normalize_axis_index` is >> > not >> > in the top-level `numpy` namespace, and it is not included in the >> > API >> > reference >> > ( >> > https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default >> > ), >> > so I'm not sure if we can safely consider this function to be >> > public. >> > > > I do not see a reason why we should not make those functions public. > The only thing I see is that they are maybe not really required in the > main namespace, i.e. you can be expected to use:: > > from numpy.something import normalize_axis_tuple > > I think, since this is a function for library authors more than end- > users. And we do not have much prior art around where to put something > like that. > > Cheers, > > Sebastian Thanks, Sebastian. For now, I proposed a private Python implementation in scipy: https://github.com/scipy/scipy/pull/11797 If the numpy version is added to the public numpy API, it will be easy to change scipy to use it. Warren > > > >> > Warren >> > >> >> Answering my own question: >> >> "shape_base.py" is not where `normalize_axis_index` is originally >> defined, so that module can be ignored. >> >> The function is actually defined in `numpy.core.multiarray`. The >> pull >> request in which the function was created is >> https://github.com/numpy/numpy/pull/8584. Whether or not the function >> was to be public is discussed starting here: >> https://github.com/numpy/numpy/pull/8584#issuecomment-281179399. A >> leading underscore was discussed and intentionally not added to the >> function. On the other hand, it was not added to the top-level >> namespace, and Eric Wieser wrote "Right now, it is only accessible >> via >> np.core.multiarray.normalize_axis_index, so yes, an internal >> function". >> >> There is another potentially useful function, `normalize_axis_tuple`, >> defined in `numpy.core.numeric`. This function is also not in the >> top-level numpy namespace. >> >> So it looks like neither of these functions is currently intended to >> be public. For the moment, I think we'll create our own utility >> functions in scipy. We can switch to using the numpy functions if >> those functions are ever intentionally made public. 
>> >> Warren >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > From wieser.eric+numpy at gmail.com Mon Apr 6 09:30:35 2020 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 6 Apr 2020 14:30:35 +0100 Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API? In-Reply-To: References: Message-ID: When I added this function, it was always my intent for it to be consumed by downstream packages, but as Sebastian remarks, it wasn't really desirable to put it in the top-level namespace. I think I would be reasonably happy to make the guarantee that it would not be removed (or more likely, moved) without a lengthy deprecation cycle. Perhaps worth opening a github issue, so we can keep track of how many downstream projects are already using it. Eric On Sun, 5 Apr 2020 at 15:06, Sebastian Berg wrote: > On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote: > > On 4/4/20, Warren Weckesser wrote: > > > It would be handy if in scipy we can use the function > > > `numpy.lib.shape_base.normalize_axis_index` as a consistent method > > > for > > > validating an `axis` argument. Is this function considered part of > > > the public API? > > > > > > There are modules in numpy that do not have leading underscores but > > > are still usually considered private. I'm not sure if > > > `numpy.lib.shape_base` is one of those. `normalize_axis_index` is > > > not > > > in the top-level `numpy` namespace, and it is not included in the > > > API > > > reference > > > ( > > > > https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default > > > ), > > > so I'm not sure if we can safely consider this function to be > > > public. > > > > > I do not see a reason why we should not make those functions public. > The only thing I see is that they are maybe not really required in the > main namespace, i.e. you can be expected to use:: > > from numpy.something import normalize_axis_tuple > > I think, since this is a function for library authors more than end- > users. And we do not have much prior art around where to put something > like that. > > Cheers, > > Sebastian > > > > > > Warren > > > > > > > Answering my own question: > > > > "shape_base.py" is not where `normalize_axis_index` is originally > > defined, so that module can be ignored. > > > > The function is actually defined in `numpy.core.multiarray`. The > > pull > > request in which the function was created is > > https://github.com/numpy/numpy/pull/8584. Whether or not the function > > was to be public is discussed starting here: > > https://github.com/numpy/numpy/pull/8584#issuecomment-281179399. A > > leading underscore was discussed and intentionally not added to the > > function. On the other hand, it was not added to the top-level > > namespace, and Eric Wieser wrote "Right now, it is only accessible > > via > > np.core.multiarray.normalize_axis_index, so yes, an internal > > function". > > > > There is another potentially useful function, `normalize_axis_tuple`, > > defined in `numpy.core.numeric`. This function is also not in the > > top-level numpy namespace. > > > > So it looks like neither of these functions is currently intended to > > be public. For the moment, I think we'll create our own utility > > functions in scipy. We can switch to using the numpy functions if > > those functions are ever intentionally made public. 
> > > > Warren > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Apr 6 09:52:14 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 6 Apr 2020 15:52:14 +0200 Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API? In-Reply-To: References: Message-ID: On Mon, Apr 6, 2020 at 3:31 PM Eric Wieser wrote: > When I added this function, it was always my intent for it to be consumed > by downstream packages, but as Sebastian remarks, it wasn't really > desirable to put it in the top-level namespace. > This is a nice function indeed, +1 for making it public. Regarding namespace, it would be nice to decouple the `numpy` and `numpy.lib` namespaces, so we can put this in `numpy.lib` and say that's where library author functions go from now on. That'd be better than making all `numpy.lib.*` submodules public. Cheers, Ralf > > I think I would be reasonably happy to make the guarantee that it would > not be removed (or more likely, moved) without a lengthy deprecation cycle. > > Perhaps worth opening a github issue, so we can keep track of how many > downstream projects are already using it. > > Eric > > On Sun, 5 Apr 2020 at 15:06, Sebastian Berg > wrote: > >> On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote: >> > On 4/4/20, Warren Weckesser wrote: >> > > It would be handy if in scipy we can use the function >> > > `numpy.lib.shape_base.normalize_axis_index` as a consistent method >> > > for >> > > validating an `axis` argument. Is this function considered part of >> > > the public API? >> > > >> > > There are modules in numpy that do not have leading underscores but >> > > are still usually considered private. I'm not sure if >> > > `numpy.lib.shape_base` is one of those. `normalize_axis_index` is >> > > not >> > > in the top-level `numpy` namespace, and it is not included in the >> > > API >> > > reference >> > > ( >> > > >> https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default >> > > ), >> > > so I'm not sure if we can safely consider this function to be >> > > public. >> > > >> >> I do not see a reason why we should not make those functions public. >> The only thing I see is that they are maybe not really required in the >> main namespace, i.e. you can be expected to use:: >> >> from numpy.something import normalize_axis_tuple >> >> I think, since this is a function for library authors more than end- >> users. And we do not have much prior art around where to put something >> like that. >> >> Cheers, >> >> Sebastian >> >> >> >> > > Warren >> > > >> > >> > Answering my own question: >> > >> > "shape_base.py" is not where `normalize_axis_index` is originally >> > defined, so that module can be ignored. >> > >> > The function is actually defined in `numpy.core.multiarray`. The >> > pull >> > request in which the function was created is >> > https://github.com/numpy/numpy/pull/8584. Whether or not the function >> > was to be public is discussed starting here: >> > https://github.com/numpy/numpy/pull/8584#issuecomment-281179399. 
A >> > leading underscore was discussed and intentionally not added to the >> > function. On the other hand, it was not added to the top-level >> > namespace, and Eric Wieser wrote "Right now, it is only accessible >> > via >> > np.core.multiarray.normalize_axis_index, so yes, an internal >> > function". >> > >> > There is another potentially useful function, `normalize_axis_tuple`, >> > defined in `numpy.core.numeric`. This function is also not in the >> > top-level numpy namespace. >> > >> > So it looks like neither of these functions is currently intended to >> > be public. For the moment, I think we'll create our own utility >> > functions in scipy. We can switch to using the numpy functions if >> > those functions are ever intentionally made public. >> > >> > Warren >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Apr 6 10:15:29 2020 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 6 Apr 2020 10:15:29 -0400 Subject: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API? In-Reply-To: References: Message-ID: On 4/6/20, Ralf Gommers wrote: > On Mon, Apr 6, 2020 at 3:31 PM Eric Wieser > wrote: > >> When I added this function, it was always my intent for it to be consumed >> by downstream packages, but as Sebastian remarks, it wasn't really >> desirable to put it in the top-level namespace. >> > > This is a nice function indeed, +1 for making it public. > > Regarding namespace, it would be nice to decouple the `numpy` and > `numpy.lib` namespaces, so we can put this in `numpy.lib` and say that's > where library author functions go from now on. That'd be better than making > all `numpy.lib.*` submodules public. > > Cheers, > Ralf > Thanks all. So far, it looks like folks are in favor of ensuring that `normalize_axis_index` is public. So I'll remove the implementation from the scipy PR, and use the one in numpy. For the current and older releases of numpy, scipy can import the function `numpy.core.multiarray`. If a newer version of numpy is found, scipy can grab it from wherever it is decided its public home should be. Can we also make `normalize_axis_tuple` public? Currently it resides in `numpy.core.numeric`. Warren > > >> >> I think I would be reasonably happy to make the guarantee that it would >> not be removed (or more likely, moved) without a lengthy deprecation >> cycle. >> >> Perhaps worth opening a github issue, so we can keep track of how many >> downstream projects are already using it. >> >> Eric >> >> On Sun, 5 Apr 2020 at 15:06, Sebastian Berg >> wrote: >> >>> On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote: >>> > On 4/4/20, Warren Weckesser wrote: >>> > > It would be handy if in scipy we can use the function >>> > > `numpy.lib.shape_base.normalize_axis_index` as a consistent method >>> > > for >>> > > validating an `axis` argument. Is this function considered part of >>> > > the public API? 
>>> > > >>> > > There are modules in numpy that do not have leading underscores but >>> > > are still usually considered private. I'm not sure if >>> > > `numpy.lib.shape_base` is one of those. `normalize_axis_index` is >>> > > not >>> > > in the top-level `numpy` namespace, and it is not included in the >>> > > API >>> > > reference >>> > > ( >>> > > >>> https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default >>> > > ), >>> > > so I'm not sure if we can safely consider this function to be >>> > > public. >>> > > >>> >>> I do not see a reason why we should not make those functions public. >>> The only thing I see is that they are maybe not really required in the >>> main namespace, i.e. you can be expected to use:: >>> >>> from numpy.something import normalize_axis_tuple >>> >>> I think, since this is a function for library authors more than end- >>> users. And we do not have much prior art around where to put something >>> like that. >>> >>> Cheers, >>> >>> Sebastian >>> >>> >>> >>> > > Warren >>> > > >>> > >>> > Answering my own question: >>> > >>> > "shape_base.py" is not where `normalize_axis_index` is originally >>> > defined, so that module can be ignored. >>> > >>> > The function is actually defined in `numpy.core.multiarray`. The >>> > pull >>> > request in which the function was created is >>> > https://github.com/numpy/numpy/pull/8584. Whether or not the function >>> > was to be public is discussed starting here: >>> > https://github.com/numpy/numpy/pull/8584#issuecomment-281179399. A >>> > leading underscore was discussed and intentionally not added to the >>> > function. On the other hand, it was not added to the top-level >>> > namespace, and Eric Wieser wrote "Right now, it is only accessible >>> > via >>> > np.core.multiarray.normalize_axis_index, so yes, an internal >>> > function". >>> > >>> > There is another potentially useful function, `normalize_axis_tuple`, >>> > defined in `numpy.core.numeric`. This function is also not in the >>> > top-level numpy namespace. >>> > >>> > So it looks like neither of these functions is currently intended to >>> > be public. For the moment, I think we'll create our own utility >>> > functions in scipy. We can switch to using the numpy functions if >>> > those functions are ever intentionally made public. >>> > >>> > Warren >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> > >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > From sargam18262 at iiitd.ac.in Mon Apr 6 07:36:06 2020 From: sargam18262 at iiitd.ac.in (Sargam Monga) Date: Mon, 6 Apr 2020 17:06:06 +0530 Subject: [Numpy-discussion] unsubscribe from mailing list Message-ID: please unsubscribe me from the mailing list -- Sargam Monga 2018262 CSAM Undergraduate | IIIT-D -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From robert.kern at gmail.com  Mon Apr  6 13:41:08 2020
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 6 Apr 2020 13:41:08 -0400
Subject: [Numpy-discussion] unsubscribe from mailing list
In-Reply-To:
References:
Message-ID:

Unsubscription is a self-serve operation. Go to this page and enter your email address down at the bottom where it talks about unsubscribing: https://mail.python.org/mailman/listinfo/numpy-discussion

On Mon, Apr 6, 2020 at 1:07 PM Sargam Monga wrote:
> please unsubscribe me from the mailing list
>
> --
> Sargam Monga
> 2018262
> CSAM Undergraduate | IIIT-D

--
Robert Kern

From sebastian at sipsolutions.net  Tue Apr  7 16:48:33 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 07 Apr 2020 15:48:33 -0500
Subject: [Numpy-discussion] NumPy Development Meeting - Triage Focus
Message-ID:

Hi all,

Our bi-weekly triage-focused NumPy development meeting is tomorrow (Wednesday, April 8) at 11 am Pacific Time (18:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should be prioritized or simply discussed briefly. Just comment on it so we can label it, or add your PR/issue to this week's topics for discussion.

Best regards

Sebastian

From chris.barker at noaa.gov  Wed Apr  8 15:37:08 2020
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 8 Apr 2020 12:37:08 -0700
Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?
In-Reply-To:
References: <9716a1b7-871b-1776-11fe-59e095b89cf0@gmail.com>
Message-ID:

sorry to have fallen off the numpy grid for a bit, but:

On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <sebastian at sipsolutions.net> wrote:

> On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > But, backward compatibility aside, could we have ONLY Scalars?
>
> Well, it is hard to write functions that work on N-Dimensions (where N
> can be 0), if the 0-D array does not exist. You can get away with
> scalars in most cases, because they pretend to be arrays in most cases
> (aside from mutability).
>
> But I am pretty sure we have a bunch of cases that need
> `res = np.asarray(res)` simply because `res` is N-D but could then be
> silently converted to a scalar. E.g. see
> https://github.com/numpy/numpy/issues/13105 for an issue about this
> (although it does not actually list any specific problems).

I'm not sure this is insolvable (again, backwards compatibility aside) -- after all, one of the key issues is that it's undetermined what the rank should be of: array(a_scalar) -- 0-d is the only unambiguous answer, but then it's not really an array in the usual sense anyway. So in theory, we could not allow that conversion without specifying a rank.

at the end of the day, there has to be some endpoint on how far you can reduce the rank of an array and have it work -- why not have 1 be the lower limit?
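To make the asymmetry concrete, a small illustration of how a NumPy scalar and a 0-d array behave differently today (a sketch of current behavior, not a proposal):

    import numpy as np

    s = np.float64(1.0)   # a NumPy scalar
    a = np.array(1.0)     # a 0-d array

    s.ndim == a.ndim == 0   # both report zero dimensions
    hash(s)                 # works: scalars are immutable and hashable
    # hash(a)               # TypeError: ndarray (even 0-d) is unhashable
    a += 1                  # mutates the 0-d array in place
    s += 1                  # rebinds s to a new scalar; nothing is mutated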
-CHB > - Sebastian > > > > There is certainly a need for more numpy-like scalars: more than the > > built > > in data types, and some handy attributes and methods, like dtype, > > .itemsize, etc. But could we make an enhanced scalar that had > > everything we > > actually need from a zero-d array? > > > > The key point would be mutability -- but do we really need mutable > > scalars? > > I can't think of any time I've needed that, when I couldn't have used > > a 1-d > > array of length 1. > > > > Is there a use case for zero-d arrays that could not be met with an > > enhanced scalar? > > > > -CHB > > > > > > > > > > > > > > > > On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane < > > allanhaldane at gmail.com> > > wrote: > > > > > I have some thoughts on scalars from playing with ndarray ducktypes > > > (__array_function__), eg a MaskedArray ndarray-ducktype, for which > > > I > > > wanted an associated "MaskedScalar" type. > > > > > > In summary, the ways scalars currently work makes ducktyping > > > (duck-scalars) difficult: > > > > > > * numpy scalar types are not subclassable, so my duck-scalars > > > aren't > > > subclasses of numpy scalars and aren't in the type hierarchy > > > * even if scalars were subclassable, I would have to subclass > > > each > > > scalar datatype individually to make masked versions > > > * lots of code checks `np.isinstance(var, np.float64)` which > > > breaks > > > for my duck-scalars > > > * it was difficult to distinguish between a duck-scalar and a > > > duck-0d > > > array. The method I used in the end seems hacky. > > > > > > This has led to some daydreams about how scalars should work, and > > > also > > > led me last to read through your NEPs 40/41 with specific focus on > > > what > > > you said about scalars, and was about to post there until I saw > > > this > > > discussion. I agree with what you said in the NEPs about not making > > > scalars be dtype instances. > > > > > > Here is what ducktypes led me to: > > > > > > If we are able to do something like define a `np.numpy_scalar` type > > > covering all numpy scalars, which has a `.dtype` attribute like you > > > describe in the NEPs, then that would seem to solve the ducktype > > > problems above. Ducktype implementors would need to make a "duck- > > > scalar" > > > type in parallel to their "duck-ndarray" type, but I found that to > > > be > > > pretty easy using an abstract class in my MaskedArray ducktype, > > > since > > > the MaskedArray and MaskedScalar share a lot of behavior. > > > > > > A numpy_scalar type would also help solve some object-array > > > problems if > > > the object scalars are wrapped in the np_scalar type. A long time > > > ago I > > > started to try to fix up various funny/strange behaviors of object > > > datatypes, but there are lots of special cases, and the main > > > problem was > > > that the returned objects (eg from indexing) were not numpy types > > > and > > > did not support numpy attributes or indexing. Wrapping the returned > > > object in `np.numpy_scalar` might add an extra slight annoyance to > > > people who want to unwrap the object, but I think it would make > > > object > > > arrays less buggy and make code using object arrays easier to > > > reason > > > about and debug. > > > > > > Finally, a few random votes/comments based on the other emails on > > > the list: > > > > > > I think scalars have a place in numpy (rather than just reusing 0d > > > arrays), since there is a clear use in having hashable, immutable > > > scalars. 
Structured scalars should probably be immutable. > > > > > > I agree with your suggestion that scalars should not be indexable. > > > Thus, > > > my duck-scalars (and proposed numpy_scalar) would not be indexable. > > > However, I think they should encode their datatype though a .dtype > > > attribute like ndarrays, rather than by inheritance. > > > > > > Also, something to think about is that currently numpy scalars > > > satisfy > > > the property `isinstance(np.float64(1), float)`, i.e they are > > > within the > > > python numerical type hierarchy. 0d arrays do not have this > > > property. My > > > proposal above would break this. I'm not sure what to think about > > > whether this is a good property to maintain or not. > > > > > > Cheers, > > > Allan > > > > > > > > > > > > On 2/21/20 8:37 PM, Sebastian Berg wrote: > > > > Hi all, > > > > > > > > When we create new datatypes, we have the option to make new > > > > choices > > > > for the new datatypes [0] (not the existing ones). > > > > > > > > The question is: Should every NumPy datatype have a scalar > > > > associated > > > > and should operations like indexing return a scalar or a 0-D > > > > array? > > > > > > > > This is in my opinion a complex, almost philosophical, question, > > > > and we > > > > do not have to settle anything for a long time. But, if we do not > > > > decide a direction before we have many new datatypes the decision > > > > will > > > > make itself... > > > > So happy about any ideas, even if its just a gut feeling :). > > > > > > > > There are various points. I would like to mostly ignore the > > > > technical > > > > ones, but I am listing them anyway here: > > > > > > > > * Scalars are faster (although that can be optimized likely) > > > > > > > > * Scalars have a lower memory footprint > > > > > > > > * The current implementation incurs a technical debt in NumPy. > > > > (I do not think that is a general issue, though. We could > > > > automatically create scalars for each new datatype probably.) > > > > > > > > Advantages of having no scalars: > > > > > > > > * No need to keep track of scalars to preserve them in ufuncs, > > > > or > > > > libraries using `np.asarray`, do they need > > > > `np.asarray_or_scalar`? > > > > (or decide they return always arrays, although ufuncs may > > > > not) > > > > > > > > * Seems simpler in many ways, you always know the output will > > > > be an > > > > array if it has to do with NumPy. > > > > > > > > Advantages of having scalars: > > > > > > > > * Scalars are immutable and we are used to them from Python. > > > > A 0-D array cannot be used as a dictionary key consistently > > > > [1]. > > > > > > > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > > > > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is > > > > defined, > > > > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. > > > > [2] > > > > > > > > * Object arrays as we have them now make sense, `arr1d[0]` can > > > > reasonably return a Python object. I.e. arrays feel more like > > > > container if you can take elements out easily. > > > > > > > > Could go both ways: > > > > > > > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the > > > > array > > > > without scalars. With scalars `arr1d[0, ...]` clarifies the > > > > meaning. (In principle it is good to never use `arr2d[0]` to > > > > get a 1D slice, probably more-so if scalars exist.) > > > > > > > > Note: array-scalars (the current NumPy scalars) are not useful in > > > > my > > > > opinion [3]. 
A scalar should not be indexed or have a shape. I do > > > > not > > > > believe in scalars pretending to be arrays. > > > > > > > > I personally tend towards liking scalars. If Python was a > > > > language > > > > where the array (array-programming) concept was ingrained into > > > > the > > > > language itself, I would lean the other way. But users are used > > > > to > > > > scalars, and they "put" scalars into arrays. Array objects are in > > > > some > > > > ways strange in Python, and I feel not having scalars detaches > > > > them > > > > further. > > > > > > > > Having scalars, however also means we should preserve them. I > > > > feel in > > > > principle that is actually fairly straight forward. E.g. for > > > > ufuncs: > > > > > > > > * np.add(scalar, scalar) -> scalar > > > > * np.add.reduce(arr, axis=None) -> scalar > > > > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > > > > * np.add.reduce(scalar, axis=()) -> array > > > > > > > > Of course libraries that do `np.asarray` would/could basically > > > > chose to > > > > not preserve scalars: Their signature is defined as taking > > > > strictly > > > > array input. > > > > > > > > Cheers, > > > > > > > > Sebastian > > > > > > > > > > > > [0] At best this can be a vision to decide which way they may > > > > evolve. > > > > > > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is > > > > arguably > > > > strange. E.g. Quantity defines hash correctly, but does not fully > > > > ensure immutability for 0-D Quantities. Ensuring immutability in > > > > a > > > > world where "views" are a central concept requires a write-only > > > > copy. > > > > > > > > [2] Arguably `.item()` would always return a scalar, but it would > > > > be a > > > > second class citizen. (Although if it returns a scalar, at least > > > > we > > > > already have a scalar implementation.) > > > > > > > > [3] They are necessary due to technical debt for NumPy datatypes > > > > though. > > > > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From sebastian at sipsolutions.net  Wed Apr  8 15:38:04 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 08 Apr 2020 14:38:04 -0500
Subject: [Numpy-discussion] Accepting NEP 41: First step towards a new Datatype System
In-Reply-To: <3ee04012570e418a874e4c436b5b6f8ac5f56cdb.camel@sipsolutions.net>
References: <3ee04012570e418a874e4c436b5b6f8ac5f56cdb.camel@sipsolutions.net>
Message-ID:

Hi all,

I propose to officially accept NEP 41: "First step towards a new Datatype System"

If you have any concerns please let me know or discuss here within a week. If there are no concerns voiced, the NEP may be accepted.

I realize that there may be some who need time to think about this individually and will of course wait, but at this time I hope that no larger discussion on the mailing list will be necessary.

Again, the main immediate effect/design choice is that there will be classes for each NumPy dtype:

    float64 = np.dtype("float64")        # Native byteorder float64
    Float64DType = type(float64)         # np.dtype[float64]
    issubclass(Float64DType, np.dtype)   # True
    isinstance(float64, np.dtype)        # True (as before)

And in the above, `float64.newbyteorder()` will also be an instance of the same `Float64DType` class. As such, the class `Float64DType` in the above represents what is currently represented by the type number: `float64.num`

This does admittedly mean that `Float64DType` effectively is a class with only a singleton instance in most cases, since non-native byte order or metadata are rarely used. Multiple instances are mainly necessary for datatypes such as current strings (with varying length) or datetimes (with a unit).

There are probably alternatives, and the boundaries between instances and classes can be drawn at different places (even within this framework), but I believe that it is the practical and intuitive approach to draw them at the current type numbers.

Best,

Sebastian

On Wed, 2020-03-11 at 17:02 -0700, Sebastian Berg wrote:
> Hi all,
>
> I am pleased to propose NEP 41: First step towards a new Datatype
> System https://numpy.org/neps/nep-0041-improved-dtype-support.html
>
> This NEP motivates the larger restructure of the datatype machinery in
> NumPy and defines a few fundamental design aspects. The long-term user
> impact will be allowing easier and more richly featured user-defined
> datatypes.
>
> As this is a large restructure, the NEP represents only the first steps,
> with some additional information in further NEPs being drafted [1]
> (this may be helpful to look at depending on the level of detail you
> are interested in).
> The NEP itself does not propose to add significant new public API.
> Instead it proposes to move forward with an incremental internal
> refactor and lays the foundation for this process.
>
> The main user-facing change at this time is that datatypes will become
> classes (e.g. ``type(np.dtype("float64"))`` will be a float64-specific
> class).
> For most users, the main impact should be many new datatypes in the
> long run (see the user impact section). However, for those interested
> in API design within NumPy or with respect to implementing new
> datatypes, this and the following NEPs are important decisions in the
> future roadmap for NumPy.
>
> The current full text is reproduced below, although the above link is
> probably a better way to read it.
>
> Cheers
>
> Sebastian
>
>
> [1] NEP 40 gives some background information about the current systems
> and issues with it:
> https://github.com/numpy/numpy/blob/1248cf7a8765b7b53d883f9e7061173817533aac/doc/neps/nep-0040-legacy-datatype-impl.rst
> and NEP 42 being a first draft of how the new API may look like:
> https://github.com/numpy/numpy/blob/f07e25cdff3967a19c4cc45c6e1a94a38f53cee3/doc/neps/nep-0042-new-dtypes.rst
> (links to current rendered versions, check
> https://github.com/numpy/numpy/pull/15505 and
> https://github.com/numpy/numpy/pull/15507 for updates)

From sebastian at sipsolutions.net  Wed Apr  8 16:16:13 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 08 Apr 2020 15:16:13 -0500
Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?
In-Reply-To:
References: <9716a1b7-871b-1776-11fe-59e095b89cf0@gmail.com>
Message-ID: <4e456f252d12c6b8634ac9c5b1476c3e13e5133e.camel@sipsolutions.net>

On Wed, 2020-04-08 at 12:37 -0700, Chris Barker wrote:
> sorry to have fallen off the numpy grid for a bit, but:
>
> On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <sebastian at sipsolutions.net> wrote:
>
> > On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > > But, backward compatibility aside, could we have ONLY Scalars?
> >
> > Well, it is hard to write functions that work on N-Dimensions (where N
> > can be 0), if the 0-D array does not exist. You can get away with
> > scalars in most cases, because they pretend to be arrays in most cases
> > (aside from mutability).
> >
> > But I am pretty sure we have a bunch of cases that need
> > `res = np.asarray(res)` simply because `res` is N-D but could then be
> > silently converted to a scalar. E.g. see
> > https://github.com/numpy/numpy/issues/13105 for an issue about this
> > (although it does not actually list any specific problems).
>
> I'm not sure this is insolvable (again, backwards compatibility aside) --
> after all, one of the key issues is that it's undetermined what the rank
> should be of: array(a_scalar) -- 0-d is the only unambiguous answer, but
> then it's not really an array in the usual sense anyway. So in theory, we
> could not allow that conversion without specifying a rank.

So as a (silly) example, the following does not generalize to 0d, even though it should:

    import numpy as np

    def weird_normalize_by_trace_inplace(stacked_matrices):
        """Divides matrices by their trace but retains sign
        (works in-place, and thus e.g. not for integer arrays)

        Parameters
        ----------
        stacked_matrices : (..., N, N) ndarray
        """
        assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]
        trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)
        # For a single 2-D input, `trace` comes back as a scalar rather than
        # a 0-D array, so this conditional in-place modification fails:
        trace[trace < 0] *= -1
        # (broadcast the per-matrix traces back over the last two axes)
        stacked_matrices /= trace[..., np.newaxis, np.newaxis]

Sure that function does not make sense and you could rewrite it, but the fact is that in that function you want to conditionally modify trace in-place, but trace can be 0d and the "conditional" modification breaks down.

- Sebastian

>
> at the end of the day, there has to be some endpoint on how far you can
> reduce the rank of an array and have it work -- why not have 1 be the
> lower limit?
> > -CHB > > > > > > > > > - Sebastian > > > > > > > There is certainly a need for more numpy-like scalars: more than > > > the > > > built > > > in data types, and some handy attributes and methods, like dtype, > > > .itemsize, etc. But could we make an enhanced scalar that had > > > everything we > > > actually need from a zero-d array? > > > > > > The key point would be mutability -- but do we really need > > > mutable > > > scalars? > > > I can't think of any time I've needed that, when I couldn't have > > > used > > > a 1-d > > > array of length 1. > > > > > > Is there a use case for zero-d arrays that could not be met with > > > an > > > enhanced scalar? > > > > > > -CHB > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane < > > > allanhaldane at gmail.com> > > > wrote: > > > > > > > I have some thoughts on scalars from playing with ndarray > > > > ducktypes > > > > (__array_function__), eg a MaskedArray ndarray-ducktype, for > > > > which > > > > I > > > > wanted an associated "MaskedScalar" type. > > > > > > > > In summary, the ways scalars currently work makes ducktyping > > > > (duck-scalars) difficult: > > > > > > > > * numpy scalar types are not subclassable, so my duck-scalars > > > > aren't > > > > subclasses of numpy scalars and aren't in the type > > > > hierarchy > > > > * even if scalars were subclassable, I would have to subclass > > > > each > > > > scalar datatype individually to make masked versions > > > > * lots of code checks `np.isinstance(var, np.float64)` which > > > > breaks > > > > for my duck-scalars > > > > * it was difficult to distinguish between a duck-scalar and a > > > > duck-0d > > > > array. The method I used in the end seems hacky. > > > > > > > > This has led to some daydreams about how scalars should work, > > > > and > > > > also > > > > led me last to read through your NEPs 40/41 with specific focus > > > > on > > > > what > > > > you said about scalars, and was about to post there until I saw > > > > this > > > > discussion. I agree with what you said in the NEPs about not > > > > making > > > > scalars be dtype instances. > > > > > > > > Here is what ducktypes led me to: > > > > > > > > If we are able to do something like define a `np.numpy_scalar` > > > > type > > > > covering all numpy scalars, which has a `.dtype` attribute like > > > > you > > > > describe in the NEPs, then that would seem to solve the > > > > ducktype > > > > problems above. Ducktype implementors would need to make a > > > > "duck- > > > > scalar" > > > > type in parallel to their "duck-ndarray" type, but I found that > > > > to > > > > be > > > > pretty easy using an abstract class in my MaskedArray ducktype, > > > > since > > > > the MaskedArray and MaskedScalar share a lot of behavior. > > > > > > > > A numpy_scalar type would also help solve some object-array > > > > problems if > > > > the object scalars are wrapped in the np_scalar type. A long > > > > time > > > > ago I > > > > started to try to fix up various funny/strange behaviors of > > > > object > > > > datatypes, but there are lots of special cases, and the main > > > > problem was > > > > that the returned objects (eg from indexing) were not numpy > > > > types > > > > and > > > > did not support numpy attributes or indexing. 
Wrapping the > > > > returned > > > > object in `np.numpy_scalar` might add an extra slight annoyance > > > > to > > > > people who want to unwrap the object, but I think it would make > > > > object > > > > arrays less buggy and make code using object arrays easier to > > > > reason > > > > about and debug. > > > > > > > > Finally, a few random votes/comments based on the other emails > > > > on > > > > the list: > > > > > > > > I think scalars have a place in numpy (rather than just reusing > > > > 0d > > > > arrays), since there is a clear use in having hashable, > > > > immutable > > > > scalars. Structured scalars should probably be immutable. > > > > > > > > I agree with your suggestion that scalars should not be > > > > indexable. > > > > Thus, > > > > my duck-scalars (and proposed numpy_scalar) would not be > > > > indexable. > > > > However, I think they should encode their datatype though a > > > > .dtype > > > > attribute like ndarrays, rather than by inheritance. > > > > > > > > Also, something to think about is that currently numpy scalars > > > > satisfy > > > > the property `isinstance(np.float64(1), float)`, i.e they are > > > > within the > > > > python numerical type hierarchy. 0d arrays do not have this > > > > property. My > > > > proposal above would break this. I'm not sure what to think > > > > about > > > > whether this is a good property to maintain or not. > > > > > > > > Cheers, > > > > Allan > > > > > > > > > > > > > > > > On 2/21/20 8:37 PM, Sebastian Berg wrote: > > > > > Hi all, > > > > > > > > > > When we create new datatypes, we have the option to make new > > > > > choices > > > > > for the new datatypes [0] (not the existing ones). > > > > > > > > > > The question is: Should every NumPy datatype have a scalar > > > > > associated > > > > > and should operations like indexing return a scalar or a 0-D > > > > > array? > > > > > > > > > > This is in my opinion a complex, almost philosophical, > > > > > question, > > > > > and we > > > > > do not have to settle anything for a long time. But, if we do > > > > > not > > > > > decide a direction before we have many new datatypes the > > > > > decision > > > > > will > > > > > make itself... > > > > > So happy about any ideas, even if its just a gut feeling :). > > > > > > > > > > There are various points. I would like to mostly ignore the > > > > > technical > > > > > ones, but I am listing them anyway here: > > > > > > > > > > * Scalars are faster (although that can be optimized > > > > > likely) > > > > > > > > > > * Scalars have a lower memory footprint > > > > > > > > > > * The current implementation incurs a technical debt in > > > > > NumPy. > > > > > (I do not think that is a general issue, though. We could > > > > > automatically create scalars for each new datatype > > > > > probably.) > > > > > > > > > > Advantages of having no scalars: > > > > > > > > > > * No need to keep track of scalars to preserve them in > > > > > ufuncs, > > > > > or > > > > > libraries using `np.asarray`, do they need > > > > > `np.asarray_or_scalar`? > > > > > (or decide they return always arrays, although ufuncs may > > > > > not) > > > > > > > > > > * Seems simpler in many ways, you always know the output > > > > > will > > > > > be an > > > > > array if it has to do with NumPy. > > > > > > > > > > Advantages of having scalars: > > > > > > > > > > * Scalars are immutable and we are used to them from > > > > > Python. > > > > > A 0-D array cannot be used as a dictionary key > > > > > consistently > > > > > [1]. 
> > > > > > > > > > I.e. without scalars as first class citizen > > > > > `dict[arr1d[0]]` > > > > > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is > > > > > defined, > > > > > and e.g. `dict[arr1d[0].frozen()]` could make a copy to > > > > > work. > > > > > [2] > > > > > > > > > > * Object arrays as we have them now make sense, `arr1d[0]` > > > > > can > > > > > reasonably return a Python object. I.e. arrays feel more > > > > > like > > > > > container if you can take elements out easily. > > > > > > > > > > Could go both ways: > > > > > > > > > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the > > > > > array > > > > > without scalars. With scalars `arr1d[0, ...]` clarifies > > > > > the > > > > > meaning. (In principle it is good to never use `arr2d[0]` > > > > > to > > > > > get a 1D slice, probably more-so if scalars exist.) > > > > > > > > > > Note: array-scalars (the current NumPy scalars) are not > > > > > useful in > > > > > my > > > > > opinion [3]. A scalar should not be indexed or have a shape. > > > > > I do > > > > > not > > > > > believe in scalars pretending to be arrays. > > > > > > > > > > I personally tend towards liking scalars. If Python was a > > > > > language > > > > > where the array (array-programming) concept was ingrained > > > > > into > > > > > the > > > > > language itself, I would lean the other way. But users are > > > > > used > > > > > to > > > > > scalars, and they "put" scalars into arrays. Array objects > > > > > are in > > > > > some > > > > > ways strange in Python, and I feel not having scalars > > > > > detaches > > > > > them > > > > > further. > > > > > > > > > > Having scalars, however also means we should preserve them. I > > > > > feel in > > > > > principle that is actually fairly straight forward. E.g. for > > > > > ufuncs: > > > > > > > > > > * np.add(scalar, scalar) -> scalar > > > > > * np.add.reduce(arr, axis=None) -> scalar > > > > > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > > > > > * np.add.reduce(scalar, axis=()) -> array > > > > > > > > > > Of course libraries that do `np.asarray` would/could > > > > > basically > > > > > chose to > > > > > not preserve scalars: Their signature is defined as taking > > > > > strictly > > > > > array input. > > > > > > > > > > Cheers, > > > > > > > > > > Sebastian > > > > > > > > > > > > > > > [0] At best this can be a vision to decide which way they may > > > > > evolve. > > > > > > > > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is > > > > > arguably > > > > > strange. E.g. Quantity defines hash correctly, but does not > > > > > fully > > > > > ensure immutability for 0-D Quantities. Ensuring immutability > > > > > in > > > > > a > > > > > world where "views" are a central concept requires a write- > > > > > only > > > > > copy. > > > > > > > > > > [2] Arguably `.item()` would always return a scalar, but it > > > > > would > > > > > be a > > > > > second class citizen. (Although if it returns a scalar, at > > > > > least > > > > > we > > > > > already have a scalar implementation.) > > > > > > > > > > [3] They are necessary due to technical debt for NumPy > > > > > datatypes > > > > > though. 
> > > > > > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From t3kcit at gmail.com Wed Apr 8 17:04:29 2020 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 8 Apr 2020 17:04:29 -0400 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> Message-ID: Hey all. Is there any update on this? Is there any input we can provide as users? I'm not entirely sure where you are in the decision making process right now :) Cheers, Andy On 3/3/20 6:34 PM, Sebastian Berg wrote: > On Fri, 2020-02-28 at 11:28 -0500, Allan Haldane wrote: >> On 2/23/20 6:59 PM, Ralf Gommers wrote: >>> One of the main rationales for the whole NEP, and the argument in >>> multiple places >>> ( >>> https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users >>> ) >>> is that it's now opt-in while __array_function__ was opt-out. This >>> isn't >>> really true - the problem is simply *moved*, from the duck array >>> libraries to the array-consuming libraries. The end user will still >>> see >>> the backwards incompatible change, with no way to turn it off. It >>> will >>> be easier with __array_module__ to warn users, but this should be >>> expanded on in the NEP. >> Might it be possible to flip this NEP back to opt-out while keeping >> the >> nice simplifications and configurabile array-creation routines, >> relative >> to __array_function__? >> >> That is, what if we define two modules, "numpy" and "numpy_strict". >> "numpy_strict" would raise an exception on duck-arrays defining >> __array_module__ (as numpy currently does). "numpy" would be a >> wrapper >> around "numpy_strict" that decorates all numpy methods with a call to >> "get_array_module(inputs).func(inputs)". > This would be possible, but I think we strongly leaned against the > idea. Basically, if you have to opt-out, from a library perspective > there may be `np.asarray` calls, which for example later call into C > and expect arrays. > So, I have large doubts that an opt-out solution works easily for > library authors. Array function is opt-out, but effectively most clean > library code already opted out... 
>
> We had previously discussed the opposite, of having a namespace of
> implicit dispatching based on get_array_module, but if we keep array
> function around, I am not sure there is much reason for it.
>
>> Then end-user code that did "import numpy as np" would accept
>> ducktypes by default, while library developers who want to signal
>> they don't support ducktypes can opt-out by doing "import
>> numpy_strict as np".
>> Issues with `np.asarray` seem mitigated compared to
>> __array_function__ since that method would now be ducktype-aware.
>
> My tendency is that if we want to go there, we would need to push
> ahead with the `np.duckarray()` idea instead.
>
> To be clear: I currently very much prefer the get_array_module()
> idea. It just seems much cleaner for library authors, and they are
> the primary issue at the moment in my opinion.
>
> - Sebastian
>
>> -Allan
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Wed Apr 8 17:56:35 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 08 Apr 2020 16:56:35 -0500
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like
 modules
In-Reply-To: 
References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com>
 <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net>
 <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com>
Message-ID: 

On Wed, 2020-04-08 at 17:04 -0400, Andreas Mueller wrote:
> Hey all.
> Is there any update on this? Is there any input we can provide as
> users?
> I'm not entirely sure where you are in the decision-making process
> right now :)

Hey,

thanks for the ping. Things are a bit stuck right now. I think what we
need is some clarity on the implications and alternatives.
I was thinking about organizing a small conference call with the main
people interested in the coming weeks.

There are also still some alternatives to this NEP on the table, and
we may need to clarify which ones are actually still in the race...

Maybe to see some of the possible sticking points:

1. What do we do about SciPy: do we bring it under this umbrella? And
   how would we want to design that?

2. Context managers have some composition issues, maybe less so if
   they are in the downstream package. Or should we have global
   defaults as well?

3. How do we ensure safe transitions for users as much as possible?
   * If you use this, can functions suddenly return a different type
     in the future?
   * Should we force you to cast to NumPy arrays in a transition
     period, or force you to somehow silence a transition warning?

4. Is there a serious push to have a "reduced" API or even a versioned
   API?

But I am probably forgetting some other things.

In my personal opinion, I think NEP 37 with minor modifications is
still the best duck in the race. I feel we should be able to find a
reasonable solution for SciPy.
Point 2 about context managers may be true, but this is much smaller
in scope than the ones uarray proposed IIRC, and I could not figure
out major scoping issues with it yet (the sklearn draft).

About the safe transition, that may be the stickiest point. But e.g.
if you enable `get_array_module` sklearn could limit a certain function to error out if it finds something other than NumPy? The main problem is how to do opt-in into future behaviour. A context manager can do that, although the danger is that someone just uses that everywhere... On the reduced/versioned API front, I would hope that we can defer that as a semi-orthogonal issue, basically saying that for now you have to provide a NumPy API that faithfully reproduces whatever NumPy version is installed on the system. Cheers, Sebastian > Cheers, > Andy > > On 3/3/20 6:34 PM, Sebastian Berg wrote: > > On Fri, 2020-02-28 at 11:28 -0500, Allan Haldane wrote: > > > On 2/23/20 6:59 PM, Ralf Gommers wrote: > > > > One of the main rationales for the whole NEP, and the argument > > > > in > > > > multiple places > > > > ( > > > > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users > > > > ) > > > > is that it's now opt-in while __array_function__ was opt-out. > > > > This > > > > isn't > > > > really true - the problem is simply *moved*, from the duck > > > > array > > > > libraries to the array-consuming libraries. The end user will > > > > still > > > > see > > > > the backwards incompatible change, with no way to turn it off. > > > > It > > > > will > > > > be easier with __array_module__ to warn users, but this should > > > > be > > > > expanded on in the NEP. > > > Might it be possible to flip this NEP back to opt-out while > > > keeping > > > the > > > nice simplifications and configurabile array-creation routines, > > > relative > > > to __array_function__? > > > > > > That is, what if we define two modules, "numpy" and > > > "numpy_strict". > > > "numpy_strict" would raise an exception on duck-arrays defining > > > __array_module__ (as numpy currently does). "numpy" would be a > > > wrapper > > > around "numpy_strict" that decorates all numpy methods with a > > > call to > > > "get_array_module(inputs).func(inputs)". > > This would be possible, but I think we strongly leaned against the > > idea. Basically, if you have to opt-out, from a library perspective > > there may be `np.asarray` calls, which for example later call into > > C > > and expect arrays. > > So, I have large doubts that an opt-out solution works easily for > > library authors. Array function is opt-out, but effectively most > > clean > > library code already opted out... > > > > We had previously discussed the opposite, of having a namespace of > > implicit dispatching based on get_array_module, but if we keep > > array > > function around, I am not sure there is much reason for it. > > > > > Then end-user code that did "import numpy as np" would accept > > > ducktypes > > > by default, while library developers who want to signal they > > > don't > > > support ducktypes can opt-out by doing "import numpy_strict as > > > np". > > > Issues with `np.as_array` seem mitigated compared to > > > __array_function__ > > > since that method would now be ducktype-aware. > > My tendency is that if we want to go there, we would need to push > > ahead > > with the `np.duckarray()` idea instead. > > > > To be clear: I currently very much prefer the get_array_module() > > idea. > > It just seems much cleaner for library authors, and they are the > > primary issue at the moment in my opinion. 
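For orientation, the library-side shape of the `get_array_module()`
idea discussed here would be roughly the following (a sketch only;
`np.get_array_module` is the NEP 37 proposal, not part of any released
NumPy):

    import numpy as np

    def library_func(*arrays):
        # NEP 37: ask the inputs which module should handle them,
        # falling back to plain NumPy for ndarrays and scalars.
        module = np.get_array_module(*arrays, default=np)
        arrays = [module.asarray(a) for a in arrays]
        return module.concatenate(arrays)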
> > > > - Sebastian > > > > > > > -Allan > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From chris.barker at noaa.gov Wed Apr 8 18:45:58 2020 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 8 Apr 2020 15:45:58 -0700 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: <4e456f252d12c6b8634ac9c5b1476c3e13e5133e.camel@sipsolutions.net> References: <9716a1b7-871b-1776-11fe-59e095b89cf0@gmail.com> <4e456f252d12c6b8634ac9c5b1476c3e13e5133e.camel@sipsolutions.net> Message-ID: On Wed, Apr 8, 2020 at 1:17 PM Sebastian Berg wrote: > > > > But, backward compatibility aside, could we have ONLY Scalars? > > > Well, it is hard to write functions that work on N-Dimensions > > > (where N > > > can be 0), if the 0-D array does not exist. > > So as a (silly) example, the following does not generalize to 0d, even > though it should: > > def weird_normalize_by_trace_inplace(stacked_matrices) > """Devides matrices by their trace but retains sign > (works in-place, and thus e.g. not for integer arrays) > > Parameters > ---------- > stacked_matrices : (..., N, M) ndarray > """ > assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2] > > trace = np.trace(stacked_matrices, axis1=-2, axis2=-1) > trace[trace < 0] *= -1 > stacked_matrices /= trace > > Sure that function does not make sense and you could rewrite it, but > the fact is that in that function you want to conditionally modify > trace in-place, but trace can be 0d and the "conditional" modification > breaks down. > I guess that's what I'm getting at -- there is always an endpoint to reducing the rank. a function that's designed to work on a "stack" of something doesn't have to work on a single something, when it can, instead, work on a "stack" of hight one. Isn't the trace of a matrix always a scalar? and thus the trace(s) of a stack of matrixes would always be 1-D? So that function should do something like: stacked_matrixes.shape = (-1, M, M) yes? and then it would always work. Again, backwards compatibility, but there is a reason the np.atleast_*() functions exist -- you often need to make sure your inputs have the dimensionality expected. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Thu Apr 9 07:52:05 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 9 Apr 2020 13:52:05 +0200 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: <90692cc4a2f009b061e55d8fe9b118f709ff1375.camel@sipsolutions.net> References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <90692cc4a2f009b061e55d8fe9b118f709ff1375.camel@sipsolutions.net> Message-ID: On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg wrote: > On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote: > > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers > > wrote: > > > > > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer > > > wrote: > > > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg < > > > > sebastian at sipsolutions.net> wrote: > > > > > > > > > I don't think NumPy needs to do anything about warnings. It is > > > > straightforward for libraries that want to use use > > > > get_array_module() to issue their own warnings before calling > > > > get_array_module(), if desired. > > > > > > > > Or alternatively, if a library is about to add a new > > > > __array_module__ method, it is straightforward to issue a warning > > > > inside the new __array_module__ method before returning the NumPy > > > > functions. > > > > > > > > > > I don't think this is quite enough. Sebastian points out a fairly > > > important issue. One of the main rationales for the whole NEP, and > > > the argument in multiple places ( > > > > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users > > > ) is that it's now opt-in while __array_function__ was opt-out. > > > This isn't really true - the problem is simply *moved*, from the > > > duck array libraries to the array-consuming libraries. The end user > > > will still see the backwards incompatible change, with no way to > > > turn it off. It will be easier with __array_module__ to warn users, > > > but this should be expanded on in the NEP. > > > > > > > Ralf, thanks for sharing your thoughts. > Sorry, this never made it back to the top of my todo list. > > > I'm not quite I understand the concerns about backwards > > incompatibility: > > 1. The intention is that implementing a __array_module__ method > > should be backwards compatible with all current uses of NumPy. This > > satisfies backwards compatibility concerns for an array-implementing > > library like JAX. > > 2. In contrast, calling get_array_module() offers no guarantees about > > backwards compatibility. This seems nearly impossible, because the > > entire point of the protocol is to make it possible to opt-in to new > > behavior. Indeed, it is nearly impossible. Except if there's a context manager or some other control mechanism exposed to the end user. Hence that should be a part of the design I think. Otherwise you're just solving something for the JAX devs, but not for the scikit-learn/scipy/etc devs who will then each have to invent their own wheel for backwards compat. So backwards compatibility isn't solved for Scikit-Learn > > switching to use get_array_module(), and after Scikit-Learn does so, > > adding __array_module__ to new types of arrays could potentially have > > backwards incompatible consequences for Scikit-Learn (unless sklearn > > uses default=None). > > > > Are you suggesting just adding something like what I'm writing here > > into the NEP? 
Perhaps along with advice to consider issuing warnings > > inside __array_module__ and falling back to legacy behavior when > > first implementing it on a new type? > > I think that should be sufficient, personally. We could mention that > scikit-learn will likely use a context manager to do this. > We can also think about providing a global default (which sklearn can > use as its own default if they wish so, but that is reserved to the > end-user). > +1 That would be a small amendment, and I think we could add it even after > accepting the NEP as it is. > > > > > We could also potentially make a few changes to make backwards > > compatibility even easier, by making the protocol less aggressive > > about assuming that NumPy is a safe fallback. Some non-exclusive > > options: > > a. We could switch the default value of "default" on > > get_array_module() to None, so an exception is raised if nothing > > implements __array_module__. > > I am not sure that I feel switching the default to None makes much of a > difference to be honest. Unless we use it to signal a super strict mode > similar to b. below. > I agree, that doesn't make a difference. > > b. We could includes *all* argument types in "types", not just types > > that implement __array_module__. NumPy's ndarray.__array_module__ > > could then recognize and refuse to return an implementation if there > > are other arguments that might implement __array_module__ in the > > future (e.g., anything outside the standard library?). > > That is a good point, anything that is not NumPy recognized could > simply be rejected. It does mean that you have to call > `module.asarray()` manually more often though. > For `list`, it could also make sense to just add np.ndarray to types. > > If we want to be conservative, maybe we could also just error before > calling `__array_module__`. Whenever there is something that we do not > know how to interpret force the user to clarify? > > > > > The downside of making either of these choices is that it would > > potentially make get_array_function() a bit less usable, because it > > is more likely to fail, e.g., if called on a float, or some custom > > type that should be treated as a scalar. > > Right, although we could relax it later if it seems overly annoying. > Interesting point. Not accepting sequences could be considered here. It may help a lot with robustness and typing to only accept ndarray, other objects with __array__, and scalars. > > > > > Also, I'm still not sure I agree with the tone of the discussion on > > > this topic. It's very heavily inspired by what the JAX devs are > > > telling you (the NEP still says PyTorch and scipy.sparse as well, > > > but that's not true in both cases). If you ask Dask and CuPy for > > > example, they're quite happy with __array_function__ and there > > > haven't been many complaints about backwards compat breakage. > > > > > > > I'm linking to comments you wrote in reference to PyTorch and > > scipy.sparse in the current draft of the NEP, so I certainly want to > > make sure that you agree my characterization :). > > > > Would it be fair to say: > > - JAX is reluctant to implement __array_function__ because of > > concerns about breaking existing code. JAX developers think that when > > users use NumPy functions on JAX arrays, they are explicitly choosing > > to convert from JAX to NumPy. This model is fundamentally > > incompatible __array_function__, which we chose to override the > > existing numpy namespace. 
> agreed
>
> > - PyTorch and scipy.sparse are not yet in a position to implement
> > __array_function__ (due to a lack of a direct implementation of
> > NumPy's API), but these projects take backwards compatibility
> > seriously.

True. I would say though that scipy.sparse will never implement either
__array_function__ or __array_module__ due to semantic
incompatibilities (it acts like np.matrix). So it's kind of irrelevant.
And if PyTorch gets around to adding a numpy-compatible API, they're
fine with __array_function__.

> > Does "take backwards compatibility seriously" sound about right to
> > you? I'm very open to specific suggestions here. (TensorFlow could
> > probably also be safely added to this second list.)
>
> This will need input from Ralf, my personal main concern is backward
> compatibility in libraries: I am pretty sure sklearn would only use a
> potential `np.asduckarray` when the user opted in. But in that case
> my personal feeling is that the `get_array_module` solution is
> cleaner and makes it easier to expand functionality slowly (for
> libraries).
>
> Two other points:
>
> First, I am wondering if we should add something like a
> `__qualname__` to the contract. I.e. a returned module must have a
> well defined `module.__name__` (that is usually already correct), so
> that sklearn could do:
>
>     module = np.get_array_module(*arrays)
>     if module.__name__ not in ("numpy", "sparse"):
>         raise TypeError("Currently only numpy and sparse are supported")
>
> if they wish so (that is trivial, but if you return a class acting as
> a module it may be important).
>
> Second, we have to make progress on whether or not the "restricted"
> namespace idea should have priority. My personal opinion is tending
> strongly towards no.

I think it's quite important, and __array_module__ gives a chance to
introduce it. However, it's not ready - so I'd say that if the
__array_module__ implementation is ready and there's no well-defined
restricted API proposal (I expect to have that in August), then we can
move ahead without it.

> The NumPy version should normally be older than other libraries, and
> if NumPy updates the API so do the downstream implementers.
> E.g. dask may have to provide multiple versions of the same function
> depending on the installed NumPy version, but that seems OK to me?

That seems unworkable, and I don't think any libraries do this.
Coupling the semantics of a single Dask function to the installed numpy
version is odd.

> It is just as downstream libraries currently have to support multiple
> NumPy versions.
> We could add a contract that the first time `get_array_module` is
> used to e.g. get the dask namespace and the NumPy version is too new,
> a warning should be given.

I think we can't solve this until we have a well-defined API, which is
the restricted API + API versioning. Until then it just remains with
the current status, compatibility is implementation-defined.

Cheers,
Ralf

> The practical thing seems to me that we ignore this for the moment
> (as something we can do later on)? If there is missing API, in most
> cases an AttributeError will be raised which could provide some
> additional information to the user?
> The only alternative seems the complete opposite?: Create a new
> module, and make even NumPy only one of the implementers of that new
> (restricted) module. That may be cleaner, but I fear that it is
> impractical to be honest.
>
> I will put this on the agenda for tomorrow, even if we discuss it
> only very briefly.
My feeling (and hope) is that we are nearing a point > where we can make a final decision. > > Best, > > Sebastian > > > > > > Best, > > Stephan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Apr 9 07:52:12 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 9 Apr 2020 13:52:12 +0200 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> Message-ID: On Thu, Apr 9, 2020 at 12:02 AM Sebastian Berg wrote: > On Wed, 2020-04-08 at 17:04 -0400, Andreas Mueller wrote: > > Hey all. > > Is there any update on this? Is there any input we can provide as > > users? > > I'm not entirely sure where you are in the decision making process > > right > > now :) > > > > Hey, > > thanks for the ping. Things are a bit stuck right now. I think what we > need is some clarity on the implications and alternatives. > I was thinking about organizing a small conference call with the main > people interested in the next weeks. > > There are also still some alternatives to this NEP in the race, and we > may need to clarify which ones are actually still in the race... > > > Maybe to see some of the possible sticking points: > > 1. What do we do about SciPy, have it under this umbrella? And how > would we want to design that. > Current feeling: best to ignore it for now. It's quite a bit of work to fix API incompatibilities for linalg that no one currently seems interested in tackling. We can revisit once that's done. > 2. Context managers have some composition issues, maybe less so if they > are in the downstream package. Or should we have global defaults as > well? > +1 for adding this right next to get_array_module(). > 3. How do we ensure safe transitions for users as much as possible. > * If you use this, can functions suddenly return a different type > in the future? > * Should we force you to cast to NumPy arrays in a transition > period, or force you to somehow silence a transition warning? > > 4. Is there a serious push to have a "reduced" API or even a versioned > API? > There is, it'll take a few months. > > But I am probably forgetting some other things. > > > In my personal opinion, I think NEP 37 with minor modifications is > still the best duck in the race. I feel we should be able to find a > reasonable solution for SciPy. > Point 2. about Context managers may be true, but this is much smaller > in scope from the ones uarray proposed IIRC, and I could not figure out > major scoping issues with it yet (the sklearn draft). > > About the safe transition, that may be the stickiest point. But e.g. if > you enable `get_array_module` sklearn could limit a certain function to > error out if it finds something other than NumPy? > The main problem is how to do opt-in into future behaviour. A context > manager can do that, although the danger is that someone just uses that > everywhere... 
> > On the reduced/versioned API front, I would hope that we can defer that > as a semi-orthogonal issue, basically saying that for now you have to > provide a NumPy API that faithfully reproduces whatever NumPy version > is installed on the system. > I think it would be nice to have a separate NEP 37 implementation outside of NumPy to play with. Unlike __array_function__, I don't think it has to go into NumPy immediately. This avoids the whole "experimental API" issue, it would be quite useful to test this with, e.g., CuPy + scikit-learn without being stuck with any decisions in a released NumPy version. Also makes switching on/off very easy for users, just (don't) `pip install numpy-array-module`. Cheers, Ralf > Cheers, > > Sebastian > > > > Cheers, > > Andy > > > > On 3/3/20 6:34 PM, Sebastian Berg wrote: > > > On Fri, 2020-02-28 at 11:28 -0500, Allan Haldane wrote: > > > > On 2/23/20 6:59 PM, Ralf Gommers wrote: > > > > > One of the main rationales for the whole NEP, and the argument > > > > > in > > > > > multiple places > > > > > ( > > > > > > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users > > > > > ) > > > > > is that it's now opt-in while __array_function__ was opt-out. > > > > > This > > > > > isn't > > > > > really true - the problem is simply *moved*, from the duck > > > > > array > > > > > libraries to the array-consuming libraries. The end user will > > > > > still > > > > > see > > > > > the backwards incompatible change, with no way to turn it off. > > > > > It > > > > > will > > > > > be easier with __array_module__ to warn users, but this should > > > > > be > > > > > expanded on in the NEP. > > > > Might it be possible to flip this NEP back to opt-out while > > > > keeping > > > > the > > > > nice simplifications and configurabile array-creation routines, > > > > relative > > > > to __array_function__? > > > > > > > > That is, what if we define two modules, "numpy" and > > > > "numpy_strict". > > > > "numpy_strict" would raise an exception on duck-arrays defining > > > > __array_module__ (as numpy currently does). "numpy" would be a > > > > wrapper > > > > around "numpy_strict" that decorates all numpy methods with a > > > > call to > > > > "get_array_module(inputs).func(inputs)". > > > This would be possible, but I think we strongly leaned against the > > > idea. Basically, if you have to opt-out, from a library perspective > > > there may be `np.asarray` calls, which for example later call into > > > C > > > and expect arrays. > > > So, I have large doubts that an opt-out solution works easily for > > > library authors. Array function is opt-out, but effectively most > > > clean > > > library code already opted out... > > > > > > We had previously discussed the opposite, of having a namespace of > > > implicit dispatching based on get_array_module, but if we keep > > > array > > > function around, I am not sure there is much reason for it. > > > > > > > Then end-user code that did "import numpy as np" would accept > > > > ducktypes > > > > by default, while library developers who want to signal they > > > > don't > > > > support ducktypes can opt-out by doing "import numpy_strict as > > > > np". > > > > Issues with `np.as_array` seem mitigated compared to > > > > __array_function__ > > > > since that method would now be ducktype-aware. > > > My tendency is that if we want to go there, we would need to push > > > ahead > > > with the `np.duckarray()` idea instead. 
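A standalone prototype package like the one suggested would let
libraries soft-depend on it; as a sketch (the import name
`numpy_array_module` is only a guess derived from the package name
mentioned above):

    try:
        # Hypothetical standalone NEP 37 prototype package.
        from numpy_array_module import get_array_module
    except ImportError:
        import numpy as np

        def get_array_module(*arrays):
            # Prototype not installed: behave exactly like plain NumPy.
            return np

    def library_func(x):
        module = get_array_module(x)
        return module.mean(module.asarray(x))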
> > > > > > To be clear: I currently very much prefer the get_array_module() > > > idea. > > > It just seems much cleaner for library authors, and they are the > > > primary issue at the moment in my opinion. > > > > > > - Sebastian > > > > > > > > > > -Allan > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Apr 9 12:48:46 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2020 11:48:46 -0500 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <90692cc4a2f009b061e55d8fe9b118f709ff1375.camel@sipsolutions.net> Message-ID: On Thu, 2020-04-09 at 13:52 +0200, Ralf Gommers wrote: > On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg < > sebastian at sipsolutions.net> > wrote: > > > On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote: > > > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers < > > > ralf.gommers at gmail.com> > > > wrote: > > > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer > > > > > > > > wrote: > > > > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg < > > > > > sebastian at sipsolutions.net> wrote: > > > > > > > I don't think NumPy needs to do anything about warnings. It > > > > > is > > > > > straightforward for libraries that want to use use > > > > > get_array_module() to issue their own warnings before calling > > > > > get_array_module(), if desired. > > > > > > > > > > Or alternatively, if a library is about to add a new > > > > > __array_module__ method, it is straightforward to issue a > > > > > warning > > > > > inside the new __array_module__ method before returning the > > > > > NumPy > > > > > functions. > > > > > > > > > > > > > I don't think this is quite enough. Sebastian points out a > > > > fairly > > > > important issue. One of the main rationales for the whole NEP, > > > > and > > > > the argument in multiple places ( > > > > > > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users > > > > ) is that it's now opt-in while __array_function__ was opt-out. > > > > This isn't really true - the problem is simply *moved*, from > > > > the > > > > duck array libraries to the array-consuming libraries. The end > > > > user > > > > will still see the backwards incompatible change, with no way > > > > to > > > > turn it off. It will be easier with __array_module__ to warn > > > > users, > > > > but this should be expanded on in the NEP. > > > > > > > > > > Ralf, thanks for sharing your thoughts. > > Sorry, this never made it back to the top of my todo list. 
> > > > I'm not quite I understand the concerns about backwards > > > incompatibility: > > > 1. The intention is that implementing a __array_module__ method > > > should be backwards compatible with all current uses of NumPy. > > > This > > > satisfies backwards compatibility concerns for an array- > > > implementing > > > library like JAX. > > > 2. In contrast, calling get_array_module() offers no guarantees > > > about > > > backwards compatibility. This seems nearly impossible, because > > > the > > > entire point of the protocol is to make it possible to opt-in to > > > new > > > behavior. > > Indeed, it is nearly impossible. Except if there's a context manager > or > some other control mechanism exposed to the end user. Hence that > should be > a part of the design I think. Otherwise you're just solving something > for > the JAX devs, but not for the scikit-learn/scipy/etc devs who will > then > each have to invent their own wheel for backwards compat. > > So backwards compatibility isn't solved for Scikit-Learn > > > switching to use get_array_module(), and after Scikit-Learn does > > > so, > > > adding __array_module__ to new types of arrays could potentially > > > have > > > backwards incompatible consequences for Scikit-Learn (unless > > > sklearn > > > uses default=None). > > > > > > Are you suggesting just adding something like what I'm writing > > > here > > > into the NEP? Perhaps along with advice to consider issuing > > > warnings > > > inside __array_module__ and falling back to legacy behavior when > > > first implementing it on a new type? > > > > I think that should be sufficient, personally. We could mention > > that > > scikit-learn will likely use a context manager to do this. > > We can also think about providing a global default (which sklearn > > can > > use as its own default if they wish so, but that is reserved to the > > end-user). > > > > +1 > > That would be a small amendment, and I think we could add it even > after > > accepting the NEP as it is. > > > > > We could also potentially make a few changes to make backwards > > > compatibility even easier, by making the protocol less aggressive > > > about assuming that NumPy is a safe fallback. Some non-exclusive > > > options: > > > a. We could switch the default value of "default" on > > > get_array_module() to None, so an exception is raised if nothing > > > implements __array_module__. > > > > I am not sure that I feel switching the default to None makes much > > of a > > difference to be honest. Unless we use it to signal a super strict > > mode > > similar to b. below. > > > > I agree, that doesn't make a difference. > > > > > b. We could includes *all* argument types in "types", not just > > > types > > > that implement __array_module__. NumPy's ndarray.__array_module__ > > > could then recognize and refuse to return an implementation if > > > there > > > are other arguments that might implement __array_module__ in the > > > future (e.g., anything outside the standard library?). > > > > That is a good point, anything that is not NumPy recognized could > > simply be rejected. It does mean that you have to call > > `module.asarray()` manually more often though. > > For `list`, it could also make sense to just add np.ndarray to > > types. > > > > If we want to be conservative, maybe we could also just error > > before > > calling `__array_module__`. Whenever there is something that we do > > not > > know how to interpret force the user to clarify? 
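A duck-array author following the "warn inside __array_module__ and
fall back to legacy behavior" advice quoted above might, as a rough
sketch, implement the protocol like this (the class is hypothetical;
per NEP 37, returning NotImplemented defers to the caller's default
module, typically plain NumPy):

    import warnings

    class DuckArray:
        def __array_module__(self, types):
            # Transition strategy: warn, then defer, so NumPy
            # functions keep returning plain ndarrays for now.
            warnings.warn(
                "NumPy functions called on DuckArray will return "
                "DuckArray results in a future release.",
                FutureWarning)
            return NotImplemented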
> > > > > The downside of making either of these choices is that it would > > > potentially make get_array_function() a bit less usable, because > > > it > > > is more likely to fail, e.g., if called on a float, or some > > > custom > > > type that should be treated as a scalar. > > > > Right, although we could relax it later if it seems overly > > annoying. > > > > Interesting point. Not accepting sequences could be considered here. > It may > help a lot with robustness and typing to only accept ndarray, other > objects > with __array__, and scalars. > > > > > > Also, I'm still not sure I agree with the tone of the > > > > discussion on > > > > this topic. It's very heavily inspired by what the JAX devs are > > > > telling you (the NEP still says PyTorch and scipy.sparse as > > > > well, > > > > but that's not true in both cases). If you ask Dask and CuPy > > > > for > > > > example, they're quite happy with __array_function__ and there > > > > haven't been many complaints about backwards compat breakage. > > > > > > > > > > I'm linking to comments you wrote in reference to PyTorch and > > > scipy.sparse in the current draft of the NEP, so I certainly want > > > to > > > make sure that you agree my characterization :). > > > > > > Would it be fair to say: > > > - JAX is reluctant to implement __array_function__ because of > > > concerns about breaking existing code. JAX developers think that > > > when > > > users use NumPy functions on JAX arrays, they are explicitly > > > choosing > > > to convert from JAX to NumPy. This model is fundamentally > > > incompatible __array_function__, which we chose to override the > > > existing numpy namespace. > > agreed > > > - PyTorch and scipy.sparse are not yet in position to implement > > > __array_function__ (due to a lack of a direct implementation of > > > NumPy's API), but these projects take backwards compatibility > > > seriously. > > True. I would say though that scipy.sparse will never implement > either > __array_function__ or array_module__ due to semantic > imcompatibilities (it > acts like np.matrix). So it's kind of irrelevant. And if PyTorch gets > around to adding a numpy-compatible API, they're fine with > __array_function__. > > > > Does "take backwards compatibility seriously" sound about right > > > to > > > you? I'm very open to specific suggestions here. (TensorFlow > > > could > > > probably also be safely added to this second list.) > > > > This will need input from Ralf, my personal main concern is > > backward > > compatibility in libraries: I am pretty sure sklearn would only use > > a > > potential `np.asduckarray` when the user opted in. But in that case > > my > > personal feeling is that the `get_array_module` solution is cleaner > > and > > makes it easier to expand functionality slowly (for libraries). > > > > Two other points: > > > > First, I am wondering if we should add something like a > > `__qualname__` > > to the contract. I.e. a returned module must have a well defined > > `module.__name__` (that is usually already correct), so that > > sklearn > > could do: > > > > module = np.get_array_module(*arrays) > > if module.__name__ not in ("numpy", "sparse"): > > raise TypeError("Currently only numpy and sparse are > > supported") > > > > if they wish so (that is trivial, but if you return a class acting > > as a > > module it may be important). > > > > Second, we have to make progress on whether or not the "restricted" > > namespace idea should have priority. 
My personal opinion is > > tending > > strongly towards no. > > > > I think it's quite important, and __array_module__ gives a chance to > introduce it. However, it's not ready - so I'd say that if > __array_module__ > implementation is ready and there's no well-defined restricted API > proposal > (I expect to have that in August), then we can move ahead without it. > > The NumPy version should normally be older than other libraries, and > if > > NumPy updates the API so do the downstream implementers. > > E.g. dask may have to provide multiple version of the same function > > depending on the installed NumPy version, but that seems OK to me? > > That seems unworkable, and I don't think any libraries do this. > Coupling > the semantics of a single Dask function to the installed numpy > version is > odd. Is it all that odd? Libraries (not array providers) already need to test for NumPy version occasionally due to API changes, so they also have two versions of the same thing around (e.g. a fallback). This simply would move the burden to the array-object implementer to some degree. Assume that we have a versioned API in some form or another, it seems to me we either require: module = np.get_array_module(..., api_version=2) or define `module.__api_version__`. Where the latter means that sklearn/SciPy may have to check `__api_version__` on every function call, while currently such checks usually happen at import time. On the other hand, the former means that sklearn/scipy can only opt-in to new API after 3+ years easily? Saying that the NumPy version is what pins the api-version, is not much more than assuming/requiring that NumPy will be the least up-to-date package? Of course it is unworkable to get 100% right in practice but are you saying that because it seems like an impractical approach, or because the API surface is currently so large that, of course, we will never get it 100% right (but that is generally true, nobody will be able to implement NumPy 100% compatible)? `__array_function__` has same issue? If we change our API, Dask has to catch up. If SciPy expects it to be the old version though (based on the NumPy import) it will incorrectly assume the old-api will be used. - Sebastian > > It is just as downstream libraries currently have to support multiple > > NumPy versions. > > We could add a contract that the first time `get_array_module` is > > used > > to e.g. get the dask namespace and the NumPy version is too new, a > > warning should be given. > > > > I think we can't solve this until we have a well-defined API, which > is the > restricted API + API versioning. Until then it just remains with the > current status, compatibility is implementation-defined. > > Cheers, > Ralf > > > > The practical thing seems to me that we ignore this for the moment > > (as > > something we can do later on)? If there is missing API, in most > > cases > > an AttributeError will be raised which could provide some > > additional > > information to the user? > > The only alternative seems the complete opposite?: Create a new > > module, > > and make even NumPy only one of the implementers of that new > > (restricted) module. That may be cleaner, but I fear that it is > > impractical to be honest. > > > > > > I will put this on the agenda for tomorrow, even if we discuss it > > only > > very briefly. My feeling (and hope) is that we are nearing a point > > where we can make a final decision. 
> > > > Best, > > > > Sebastian > > > > > > > Best, > > > Stephan > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Thu Apr 9 23:11:59 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2020 22:11:59 -0500 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> Message-ID: On Thu, 2020-04-09 at 13:52 +0200, Ralf Gommers wrote: > On Thu, Apr 9, 2020 at 12:02 AM Sebastian Berg < > sebastian at sipsolutions.net> > wrote: > > > > > I think it would be nice to have a separate NEP 37 implementation > outside > of NumPy to play with. Unlike __array_function__, I don't think it > has to > go into NumPy immediately. This avoids the whole "experimental API" > issue, Fair enough, I have created a hopefully working start here: https://github.com/seberg/numpy_dispatch (this is not tested much at all yet, so it could be very buggy). There are a couple of additional features that I added. 1. A global opt-in (it is impossible to opt-out once opted in!) 2. A local opt-in (to guarantee opt-in if global flag is not set) 3. I added features to allow transitioning:: get_array_module(*arrays, modules="numpy", future_modules=("dask.array", "cupy"), fallback="warn") Will give FutureWarning/DeprecationWarning where necessary, in the above "numpy" is supported, dask and cupy are supported but not enabled by default. `None` works to say "all modules". Once the transition is done, just move dask and cupy into `modules` and remove `fallback=None`. 4. If there are FutureWarnings/DeprecationWarnigs the user needs to be able to opt-in to future behaviour. Opting out can be done by casting inputs. Opting-in is done using:: with future_dispatch_behavior(): call_library_function() Obviously, we may not want these features, but I was curious how we could provide the tools to allow clean transitions. Both context managers should be thread-safe, but I did not test that. The best try would probably be cupy and sklearn again, so I will give a ping on the sklearn PR. To make that easier, I tried to hack a bit of a "util" to allow testing (please scroll down on the readme on github). Best, Sebastian > it would be quite useful to test this with, e.g., CuPy + scikit-learn > without being stuck with any decisions in a released NumPy version. > Also > makes switching on/off very easy for users, just (don't) `pip install > numpy-array-module`. > > Cheers, > Ralf -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From sebastian at sipsolutions.net Thu Apr 9 23:35:38 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 09 Apr 2020 22:35:38 -0500
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like
 modules
In-Reply-To: 
References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com>
 <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net>
 <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com>
Message-ID: 

On Thu, 2020-04-09 at 22:11 -0500, Sebastian Berg wrote:
> On Thu, 2020-04-09 at 13:52 +0200, Ralf Gommers wrote:
> > On Thu, Apr 9, 2020 at 12:02 AM Sebastian Berg <
> > sebastian at sipsolutions.net>
> > wrote:
> >
> > I think it would be nice to have a separate NEP 37 implementation
> > outside of NumPy to play with. Unlike __array_function__, I don't
> > think it has to go into NumPy immediately. This avoids the whole
> > "experimental API" issue,
>
> Fair enough, I have created a hopefully working start here:
>
>     https://github.com/seberg/numpy_dispatch
>
> (this is not tested much at all yet, so it could be very buggy).
>
> There are a couple of additional features that I added:
>
> 1. A global opt-in (it is impossible to opt out once opted in!)
> 2. A local opt-in (to guarantee opt-in if the global flag is not set)
> 3. I added features to allow transitioning::
>
>        get_array_module(*arrays, modules="numpy",
>                         future_modules=("dask.array", "cupy"),
>                         fallback="warn")

There is no immediate need to put `modules`, `future_modules`, and
`fallback` in there. The main convenience it gives is that we can more
easily provide the user with an opt-in context manager for the new
behaviour.
Without that, libraries will have to do these checks themselves, which
is not difficult. But if we wish to provide a context manager to opt
all of that in, the library will need additional API to query our
context manager state. Or every library needs its own solution, which
does not seem desirable (although it would mean you cannot
accidentally opt internal functions in to newer behaviour).

- Sebastian

>    This will give a FutureWarning/DeprecationWarning where necessary;
>    in the above, "numpy" is supported, while dask and cupy are
>    supported but not enabled by default. `None` works to say "all
>    modules". Once the transition is done, just move dask and cupy
>    into `modules` and remove the `fallback` argument.
> 4. If there are FutureWarnings/DeprecationWarnings the user needs to
>    be able to opt in to the future behaviour. Opting out can be done
>    by casting inputs. Opting in is done using::
>
>        with future_dispatch_behavior():
>            call_library_function()
>
> Obviously, we may not want these features, but I was curious how we
> could provide the tools to allow clean transitions.
>
> Both context managers should be thread-safe, but I did not test that.
>
> The best test would probably be cupy and sklearn again, so I will
> give a ping on the sklearn PR. To make that easier, I tried to hack a
> bit of a "util" to allow testing (please scroll down on the readme on
> github).
>
> Best,
>
> Sebastian
>
> > it would be quite useful to test this with, e.g., CuPy + scikit-
> > learn without being stuck with any decisions in a released NumPy
> > version. Also makes switching on/off very easy for users, just
> > (don't) `pip install numpy-array-module`.
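Putting the described pieces together, library and user code built on
the prototype might look like this (sketched purely from the feature
list above; the actual numpy_dispatch API may differ):

    from numpy_dispatch import get_array_module, future_dispatch_behavior

    def library_func(*arrays):
        module = get_array_module(
            *arrays,
            modules="numpy",                        # dispatched to today
            future_modules=("dask.array", "cupy"),  # warn now, dispatch later
            fallback="warn")                        # FutureWarning on fallback
        return module.stack([module.asarray(a) for a in arrays])

    # End users would opt in to the future behaviour explicitly:
    # with future_dispatch_behavior():
    #     result = library_func(a, b)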
> > Cheers,
> > Ralf

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From ralf.gommers at gmail.com Fri Apr 10 06:11:51 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 10 Apr 2020 12:11:51 +0200
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like
 modules
In-Reply-To: 
References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com>
 <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net>
 <90692cc4a2f009b061e55d8fe9b118f709ff1375.camel@sipsolutions.net>
Message-ID: 

On Thu, Apr 9, 2020 at 6:54 PM Sebastian Berg wrote:

> On Thu, 2020-04-09 at 13:52 +0200, Ralf Gommers wrote:
> > On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg <
> > sebastian at sipsolutions.net>
> > wrote:
> >
> > > The NumPy version should normally be older than other libraries,
> > > and if NumPy updates the API so do the downstream implementers.
> > > E.g. dask may have to provide multiple versions of the same
> > > function depending on the installed NumPy version, but that seems
> > > OK to me?
> >
> > That seems unworkable, and I don't think any libraries do this.
> > Coupling the semantics of a single Dask function to the installed
> > numpy version is odd.
>
> Is it all that odd? Libraries (not array providers) already need to
> test for the NumPy version occasionally due to API changes, so they
> also have two versions of the same thing around (e.g. a fallback).

That's completely different; it's internal to a library and not visible
to end users via different signatures/behavior.

> This simply would move the burden to the array-object implementer to
> some degree. Assume that we have a versioned API in some form or
> another, it seems to me we either require:
>
>     module = np.get_array_module(..., api_version=2)

Yes, this is the version I was thinking about.

> or define `module.__api_version__`.
>
> Where the latter means that sklearn/SciPy may have to check
> `__api_version__` on every function call, while currently such checks
> usually happen at import time. On the other hand, the former means
> that sklearn/scipy can only opt in to new API after 3+ years easily?

That's anyway the case; it has very little to do with API versioning I
think - it's simply determined by the minimum NumPy version supported.

> Saying that the NumPy version is what pins the api-version, is not
> much more than assuming/requiring that NumPy will be the least
> up-to-date package?
>
> Of course it is unworkable to get 100% right in practice but are you
> saying that because it seems like an impractical approach,

Yes, this - impractical and undesired.

> or because the API surface is currently so large that, of course, we
> will never get it 100% right (but that is generally true, nobody will
> be able to implement NumPy 100% compatibly)?

That's true too, we *don't want* anyone to start adding compat features
for outdated or "wish we could deprecate" NumPy features.

> `__array_function__` has the same issue? If we change our API, Dask
> has to catch up.

Yes, that's true. The restricted API should be more stable than the
whole NumPy API, otherwise no one will be able to be fully compatible.
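The two versioning styles being weighed here could be compared with a
small sketch (both the `api_version=` argument and `__api_version__`
attribute are hypothetical):

    def check_api_version(module, minimum):
        # Implementers would advertise `__api_version__`, letting a
        # library check once, instead of pinning a version at every
        # dispatch via get_array_module(..., api_version=...).
        found = getattr(module, "__api_version__", 1)
        if found < minimum:
            raise RuntimeError(
                f"{module.__name__} provides array-API version "
                f"{found}, but version {minimum} is required")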
If SciPy expects it to be the old version though (based on > the NumPy import) it will incorrectly assume the old-api will be used. > That's not incorrect unless it's a backwards-incompatible change, which should be rare. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Apr 10 06:27:28 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 10 Apr 2020 12:27:28 +0200 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> Message-ID: On Fri, Apr 10, 2020 at 5:17 AM Sebastian Berg wrote: > On Thu, 2020-04-09 at 13:52 +0200, Ralf Gommers wrote: > > On Thu, Apr 9, 2020 at 12:02 AM Sebastian Berg < > > sebastian at sipsolutions.net> > > wrote: > > > > > > > > > > I think it would be nice to have a separate NEP 37 implementation > > outside > > of NumPy to play with. Unlike __array_function__, I don't think it > > has to > > go into NumPy immediately. This avoids the whole "experimental API" > > issue, > > Fair enough, I have created a hopefully working start here: > > https://github.com/seberg/numpy_dispatch > > (this is not tested much at all yet, so it could be very buggy). > Thanks! > There are a couple of additional features that I added. > > 1. A global opt-in (it is impossible to opt-out once opted in!) > 2. A local opt-in (to guarantee opt-in if global flag is not set) > 3. I added features to allow transitioning:: > > get_array_module(*arrays, modules="numpy", > future_modules=("dask.array", "cupy"), fallback="warn") > > Will give FutureWarning/DeprecationWarning where necessary, in the > above "numpy" is supported, dask and cupy are supported but not > enabled by default. `None` works to say "all modules". > Once the transition is done, just move dask and cupy into `modules` > and remove `fallback=None`. > So future_modules explicitly excludes compatible libraries that are not listed. Why would you want anyone to do that? I don't understand "supported but not enabled", and it looks undesirable to me to special-case any library in this mechanism. Cheers, Ralf 4. If there are FutureWarnings/DeprecationWarnigs the user needs to be > able to opt-in to future behaviour. Opting out can be done by > casting inputs. Opting-in is done using:: > > with future_dispatch_behavior(): > call_library_function() > > Obviously, we may not want these features, but I was curious how we > could provide the tools to allow clean transitions. > > Both context managers should be thread-safe, but I did not test that. > > The best try would probably be cupy and sklearn again, so I will give a > ping on the sklearn PR. To make that easier, I tried to hack a bit of a > "util" to allow testing (please scroll down on the readme on github). > > Best, > > Sebastian > > > > > it would be quite useful to test this with, e.g., CuPy + scikit-learn > > without being stuck with any decisions in a released NumPy version. > > Also > > makes switching on/off very easy for users, just (don't) `pip install > > numpy-array-module`. > > > > Cheers, > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From sebastian at sipsolutions.net Fri Apr 10 09:01:32 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 10 Apr 2020 08:01:32 -0500
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like
 modules
In-Reply-To: 
References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com>
 <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net>
 <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com>
Message-ID: <4fe14beb671d2d8201321f2d10e9e6913c3fb723.camel@sipsolutions.net>

On Fri, 2020-04-10 at 12:27 +0200, Ralf Gommers wrote:
> > 3. I added features to allow transitioning::
> >
> >        get_array_module(*arrays, modules="numpy",
> >                         future_modules=("dask.array", "cupy"),
> >                         fallback="warn")
> >
> >    This will give a FutureWarning/DeprecationWarning where
> >    necessary; in the above, "numpy" is supported, while dask and
> >    cupy are supported but not enabled by default. `None` works to
> >    say "all modules". Once the transition is done, just move dask
> >    and cupy into `modules` and remove the `fallback` argument.
>
> So future_modules explicitly excludes compatible libraries that are
> not listed. Why would you want anyone to do that? I don't understand
> "supported but not enabled", and it looks undesirable to me to
> special-case any library in this mechanism.

We have two (or three) types of modules (either could be "all"):

1. Supported modules that we dispatch to.
2. Modules that are supported but will be dispatched to by default
   only in the future. So if the user got a future_module, they will
   get a FutureWarning. They have to choose either to cast the inputs
   or to opt in to the future behaviour.
3. Unsupported modules: if dispatch resolves to one of these, it is an
   error. I currently assume that this does not need to be a negative
   list.

You need to distinguish those somehow, since you need a way to
transition. Even if you expect that modules would always be *all*
modules, `numpy` is still the only accepted module originally.

So, as I said, `future_modules` is only about transitioning and
enabling `FutureWarning`s. It does not have to live there, but we need
a way to transition.

These options do not have to be handled by us; it only helps here with
having context managers to opt in to new behaviour, and maybe to get
an idea of how transitions can look.
Alternatively, we could allow each project to create its own
project-specific context manager to do the same, and avoid possible
scoping issues even more.

- Sebastian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From ralf.gommers at gmail.com Fri Apr 10 12:19:04 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 10 Apr 2020 18:19:04 +0200
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like
 modules
In-Reply-To: <4fe14beb671d2d8201321f2d10e9e6913c3fb723.camel@sipsolutions.net>
References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com>
 <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net>
 <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com>
 <4fe14beb671d2d8201321f2d10e9e6913c3fb723.camel@sipsolutions.net>
Message-ID: 

On Fri, Apr 10, 2020 at 3:03 PM Sebastian Berg wrote:

> On Fri, 2020-04-10 at 12:27 +0200, Ralf Gommers wrote:
> > > 3.
I added features to allow transitioning:: > > > > > > get_array_module(*arrays, modules="numpy", > > > future_modules=("dask.array", "cupy"), fallback="warn") > > > > > > Will give FutureWarning/DeprecationWarning where necessary, in > > the > > > above "numpy" is supported, dask and cupy are supported but not > > > enabled by default. `None` works to say "all modules". > > > Once the transition is done, just move dask and cupy into > > `modules` > > > and remove `fallback=None`. > > > > > > > So future_modules explicitly excludes compatible libraries that are > > not > > listed. Why would you want anyone to do that? I don't understand > > "supported > > but not enabled", and it looks undesirable to me to special-case any > > library in this mechanism. > > We hav two (or three) types of modules (either could be "all"). > I think we only have modules that implement __array_module__, and ones that don't. > 1. Supported modules that we dispatch to. > 2. Modules that are supported but will be dispatched to by default only > in the future. So if the user got a future_module, they will get a > FutureWarning. They have to chose to cast the inputs or opt-in to > the future behaviour. > 3. Unsupported modules: If this is resolved it is an error. I currently > assume that this does not need to be a negative list. > > You need to distinguish those somehow, since you need a way to > transition. Even if you expect that modules would always be *all* > modules, `numpy` is still the only accepted module originally. > > So, as I said, `future_modules` is only about transitioning and > enabling `FutureWarning`s. Does not have to live there, but we need a > way to transition. > Sorry, I still don't get it - transition what? You seem to be operating on the assumption that the users of get_array_module want or need to control which numpy-like libraries they allow and which they don't. That seems fundamentally wrong. How would you treat, for example, an array library that is developed privately inside some company? Cheers, Ralf > These options do not have to be handled by us, it only helps here with > having context managers to opt-in to new behaviour, and maybe to get an > idea for how transitions can look like. > Alternatively, we could all to create project specific context managers > to do the same and avoid possible scoping issues even more. > > - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Apr 10 13:17:35 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 10 Apr 2020 12:17:35 -0500 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> <4fe14beb671d2d8201321f2d10e9e6913c3fb723.camel@sipsolutions.net> Message-ID: <49b5e334439bd7acded63dd6acf085caffb6f984.camel@sipsolutions.net> On Fri, 2020-04-10 at 18:19 +0200, Ralf Gommers wrote: > On Fri, Apr 10, 2020 at 3:03 PM Sebastian Berg < > sebastian at sipsolutions.net> > wrote: > > > On Fri, 2020-04-10 at 12:27 +0200, Ralf Gommers wrote: > > > > 3. 
I added features to allow transitioning::
> > > > >
> > > > >      get_array_module(*arrays, modules="numpy",
> > > > >                       future_modules=("dask.array", "cupy"),
> > > > >                       fallback="warn")
> > > > >
> > > > >    Will give FutureWarning/DeprecationWarning where necessary, in the
> > > > >    above "numpy" is supported, dask and cupy are supported but not
> > > > >    enabled by default. `None` works to say "all modules".
> > > > >    Once the transition is done, just move dask and cupy into `modules`
> > > > >    and remove `fallback=None`.
> > > > >
> > > >
> > > > So future_modules explicitly excludes compatible libraries that are
> > > > not listed. Why would you want anyone to do that? I don't understand
> > > > "supported but not enabled", and it looks undesirable to me to
> > > > special-case any library in this mechanism.
> > >
> > > We have two (or three) types of modules (either could be "all").
> >
> > I think we only have modules that implement __array_module__, and
> > ones that don't.
> >
> > > 1. Supported modules that we dispatch to.
> > > 2. Modules that are supported but will be dispatched to by default
> > >    only in the future. So if the user got a future_module, they will
> > >    get a FutureWarning. They have to choose to cast the inputs or
> > >    opt in to the future behaviour.
> > > 3. Unsupported modules: If this is resolved it is an error. I
> > >    currently assume that this does not need to be a negative list.
> > >
> > > You need to distinguish those somehow, since you need a way to
> > > transition. Even if you expect that modules would always be *all*
> > > modules, `numpy` is still the only accepted module originally.
> > >
> > > So, as I said, `future_modules` is only about transitioning and
> > > enabling `FutureWarning`s. Does not have to live there, but we need
> > > a way to transition.
> >
> > Sorry, I still don't get it - transition what? You seem to be operating
> > on the assumption that the users of get_array_module want or need to
> > control which numpy-like libraries they allow and which they don't.
> > That seems fundamentally wrong. How would you treat, for example, an
> > array library that is developed privately inside some company?

Well, you still need to transition from NumPy -> allow everything, so
for now please just ignore that part if you like and use/assume:

    get_array_module(...,
        modules="numpy", future_modules=None, fallback="warn")

during the transition, and:

    get_array_module(...)

after it. After all, this is a draft project right now, so it is just as
much about trying out what can be done.
It is not unlikely that this transition burden will be put more on the
library in any case, but it shows that it can be done.
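For illustration, a minimal sketch of how a library function might adopt
this during such a transition. It assumes the experimental
`numpy_dispatch` prototype linked earlier in this thread; the exact
import path and keyword semantics are part of the draft and may still
change:

    from numpy_dispatch import get_array_module  # experimental prototype

    def weighted_sum(arr, weights):
        # During the transition: dispatch to NumPy silently, warn for
        # dask/cupy until they are moved into `modules`.
        module = get_array_module(
            arr, weights,
            modules="numpy",
            future_modules=("dask.array", "cupy"),
            fallback="warn")
        arr = module.asarray(arr)
        weights = module.asarray(weights)
        return module.sum(arr * weights)

    # After the transition this becomes simply:
    #     module = get_array_module(arr, weights)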
As to my "fundamentally wrong" assumption: should libraries' goal be to
support everything? Definitely!

But... I do not want to make that decision for libraries, so if
library authors tell me that they have no interest in it, all the
better. Until then I am more than happy to keep that option on the
table, even if just as a thought for library authors to consider their
options.

Possible brainstorming reasons could be:

1. Say I currently heavily use cython code, so I am limited to NumPy
   (or at least arrays that can expose a buffer/`__array_interface__`).
   Now if someone adds a CUDA implementation, I would support cupy
   arrays, but not distributed arrays.
   I admit maybe checking that at function entry like this is the wrong
   approach there.
2. To limit to certain types is to say "We know (and test) that our
   library works with xarray, Dask, NumPy, and CuPy". Now you can say
   that is also a misconception, because if you stick to just the NumPy
   API you should know that it will "just work" with everything. But in
   practice it seems like it might happen?
   In that case you may want to actually allow any odd array and just
   put a warning, a bit like the transition warnings I put in for
   testing.

---

There are two other things I am wondering about.

1. Subclasses may want to return their superclass's module (even by
   default?), in which case their behaviour depends on the superclass
   module behaviour. Further, a library would need to use
   `np.asanyarray()` to prevent the subclass from being cast to the
   superclass.

2. There is one transition that does not quite exist. What if an
   array-like starts implementing or expands `__array_module__`?
   That seems fine, but in that case the array-like will have to provide
   the `opt-in` context manager with a FutureWarning.
   The transition from no `__array_module__` to implementing it may need
   some thought, but I expect it is fine: The array-like simply always
   gives a FutureWarning, although it cannot know what will actually
   happen in the future (no change, error, or array-like takes control).

- Sebastian

> Cheers,
> Ralf
>
> > These options do not have to be handled by us; it only helps here to
> > have context managers to opt in to new behaviour, and maybe to get an
> > idea of how transitions can look.
> > Alternatively, we could allow projects to create project-specific
> > context managers to do the same and avoid possible scoping issues
> > even more.
> >
> > - Sebastian
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From ralf.gommers at gmail.com  Sun Apr 12 07:00:05 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 12 Apr 2020 13:00:05 +0200
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules
In-Reply-To: <49b5e334439bd7acded63dd6acf085caffb6f984.camel@sipsolutions.net>
References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> <4fe14beb671d2d8201321f2d10e9e6913c3fb723.camel@sipsolutions.net> <49b5e334439bd7acded63dd6acf085caffb6f984.camel@sipsolutions.net>
Message-ID: 

On Fri, Apr 10, 2020 at 7:18 PM Sebastian Berg wrote:

> On Fri, 2020-04-10 at 18:19 +0200, Ralf Gommers wrote:
> > On Fri, Apr 10, 2020 at 3:03 PM Sebastian Berg <
> > sebastian at sipsolutions.net> wrote:
> >
> > > On Fri, 2020-04-10 at 12:27 +0200, Ralf Gommers wrote:
> > > > > 3. I added features to allow transitioning::
> > > > >
> > > > >      get_array_module(*arrays, modules="numpy",
> > > > >                       future_modules=("dask.array", "cupy"),
> > > > >                       fallback="warn")
> > > > >
> > > > >    Will give FutureWarning/DeprecationWarning where necessary, in
> > > > >    the above "numpy" is supported, dask and cupy are supported
> > > > >    but not enabled by default.
`None` works to say "all modules". > > > > > Once the transition is done, just move dask and cupy into > > > > `modules` > > > > > and remove `fallback=None`. > > > > > > > > > > > > > So future_modules explicitly excludes compatible libraries that > > > > are > > > > not > > > > listed. Why would you want anyone to do that? I don't understand > > > > "supported > > > > but not enabled", and it looks undesirable to me to special-case > > > > any > > > > library in this mechanism. > > > > > > We hav two (or three) types of modules (either could be "all"). > > > > > > > I think we only have modules that implement __array_module__, and > > ones that > > don't. > > > > > > > 1. Supported modules that we dispatch to. > > > 2. Modules that are supported but will be dispatched to by default > > > only > > > in the future. So if the user got a future_module, they will get > > > a > > > FutureWarning. They have to chose to cast the inputs or opt-in > > > to > > > the future behaviour. > > > 3. Unsupported modules: If this is resolved it is an error. I > > > currently > > > assume that this does not need to be a negative list. > > > > > > You need to distinguish those somehow, since you need a way to > > > transition. Even if you expect that modules would always be *all* > > > modules, `numpy` is still the only accepted module originally. > > > > > > So, as I said, `future_modules` is only about transitioning and > > > enabling `FutureWarning`s. Does not have to live there, but we need > > > a > > > way to transition. > > > > > > > Sorry, I still don't get it - transition what? You seem to be > > operating on > > the assumption that the users of get_array_module want or need to > > control > > which numpy-like libraries they allow and which they don't. That > > seems > > fundamentally wrong. How would you treat, for example, an array > > library > > that is developed privately inside some company? > > > > Well, you still need to transition from NumPy -> allow everything, so > for now please just ignore that part if you like and use/assume: > > get_array_module(..., > modules="numpy", future_modules=None, fallback="warn") > > during the transition, and: > > get_array_module(...) > > after it. After all this is a draft-project right now, so it is just as > much about trying out what can be done. > It is not unlikely that this transition burden will be put more on the > library in any case, but it shows that it can be done. > > > As to my "fundamentally wrong" assumption. Should libraries goal be to > support everything? Definitely! > > But... I do not want to make that decision for libraries, so I if > library authors tell me that they have no interest in it, all the > better. Until then I am more than happy to keep that option on the > table. If just as a thought for library authors to consider their > options. > > Possible, brainstorming, reasons could be: > > 1. Say I currently heavily use cython code, so I am limited to NumPy > (or at least arrays that can expose a buffer/`__array_interface__`). > Now if someone adds a CUDA implementation, I would support cupy arrays, > but not distributed arrays. > I admit maybe checking that at function entry like this is the wrong > approach there. > If you need a particular feature, then checking for that feature (e.g. `hasattr(__array_interface__)`, and same for __cuda_array_interface__) seems like the right thing to do. Ralf > 2. To limit to certain types is to say "We know (and test) that our > library works with xarray, Dask, NumPy, and CuPy". 
Now you can say that > is also a misconception, because if you stick to just NumPy API you > should know that it will "just work" with everything. But in practice > it seems like it might happen? > In that case you may want to actually allow any odd array and just put > a warning, a bit like the transition warnings I put in for testing. > > > --- > > There are two other things I am wondering about. > > 1. Subclasses may want to return their superclasses module (even by > default?), in which case their behaviour depends on the superclass > module behaviour. Further a library would need to use `np.asanyarray()` > to prevent the subclass from being cast to the superclass. > > 2. There is one transition that does not quite exists. What if an > array-like starts implementing or expands `array-module`? > That seems fine, but in that case the array-like will have to provide > the `opt-in` context manager with a FutureWarning. > The transition from no `__array_module__` to implementing it may need > some thought, but I expect it is fine: The array-like simply always > gives a FutureWarning, although it cannot know what will actually > happen in the future (no change, error, or array-like takes control). > > - Sebastian > > > > Cheers, > > Ralf > > > > > > > > > These options do not have to be handled by us, it only helps here > > > with > > > having context managers to opt-in to new behaviour, and maybe to > > > get an > > > idea for how transitions can look like. > > > Alternatively, we could all to create project specific context > > > managers > > > to do the same and avoid possible scoping issues even more. > > > > > > - Sebastian > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Apr 14 17:56:05 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 14 Apr 2020 16:56:05 -0500 Subject: [Numpy-discussion] (Two hours later!) NumPy Community Meeting Wednesday Message-ID: Hi all, There will be a NumPy Community meeting Wednesday April 15th at 1pm Pacific Time (20:00 UTC). Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From josh.craig.wilson at gmail.com  Sun Apr 19 14:46:50 2020
From: josh.craig.wilson at gmail.com (Joshua Wilson)
Date: Sun, 19 Apr 2020 11:46:50 -0700
Subject: [Numpy-discussion] Using scalar constructors to produce arrays
Message-ID: 

Over in the NumPy stubs there's an issue

https://github.com/numpy/numpy-stubs/issues/41

which points out that you can in fact do something like

```
np.float32([1.0, 0.0, 0.0])
```

to construct an ndarray of float32. It seems to me that though you can
do that, it is not a best practice, and one should instead do

```
np.array([1.0, 0.0, 0.0], dtype=np.float32)
```

Do people agree with that assessment of what the best practice is? If
so, it seems to make the most sense to continue banning constructs
like `np.float32([1.0, 0.0, 0.0])` in the type stubs (as they should
promote making easy-to-understand, scalable NumPy code).

- Josh
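For illustration, a quick check (runnable as-is) of the behaviour under
discussion: both spellings produce the same float32 ndarray, which is
exactly what makes the scalar-constructor form surprising.

```
import numpy as np

a = np.float32([1.0, 0.0, 0.0])                  # scalar type used as an array constructor
b = np.array([1.0, 0.0, 0.0], dtype=np.float32)  # explicit, preferred spelling

assert isinstance(a, np.ndarray)  # not an np.float32 scalar!
assert a.dtype == b.dtype == np.float32
assert np.array_equal(a, b)
```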
From ralf.gommers at gmail.com  Sun Apr 19 15:07:56 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 19 Apr 2020 21:07:56 +0200
Subject: [Numpy-discussion] Using scalar constructors to produce arrays
In-Reply-To: 
References: 
Message-ID: 

On Sun, Apr 19, 2020 at 8:47 PM Joshua Wilson wrote:

> Over in the NumPy stubs there's an issue
>
> https://github.com/numpy/numpy-stubs/issues/41
>
> which points out that you can in fact do something like
>
> ```
> np.float32([1.0, 0.0, 0.0])
> ```
>
> to construct an ndarray of float32. It seems to me that though you can
> do that, it is not a best practice, and one should instead do
>
> ```
> np.array([1.0, 0.0, 0.0], dtype=np.float32)
> ```
>
> Do people agree with that assessment of what the best practice is? If
> so, it seems to make the most sense to continue banning constructs
> like `np.float32([1.0, 0.0, 0.0])` in the type stubs (as they should
> promote making easy-to-understand, scalable NumPy code).
>

+1 for banning that construct, that's really ugly

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Sun Apr 19 15:16:36 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 19 Apr 2020 14:16:36 -0500
Subject: [Numpy-discussion] Using scalar constructors to produce arrays
In-Reply-To: 
References: 
Message-ID: <336a6feb35493d0a503bc500b4d7c757fc62df64.camel@sipsolutions.net>

On Sun, 2020-04-19 at 21:07 +0200, Ralf Gommers wrote:
> On Sun, Apr 19, 2020 at 8:47 PM Joshua Wilson <
> josh.craig.wilson at gmail.com> wrote:
>
> > Over in the NumPy stubs there's an issue
> >
> > https://github.com/numpy/numpy-stubs/issues/41
> >
> > which points out that you can in fact do something like
> >
> > ```
> > np.float32([1.0, 0.0, 0.0])
> > ```
> >
> > to construct an ndarray of float32. It seems to me that though you can
> > do that, it is not a best practice, and one should instead do
> >
> > ```
> > np.array([1.0, 0.0, 0.0], dtype=np.float32)
> > ```
> >
> > Do people agree with that assessment of what the best practice is? If
> > so, it seems to make the most sense to continue banning constructs
> > like `np.float32([1.0, 0.0, 0.0])` in the type stubs (as they should
> > promote making easy-to-understand, scalable NumPy code).
> >
>
> +1 for banning that construct, that's really ugly
>

I personally always considered it bad practice. Unfortunately, I think
it may not be uncommon in use, so I am not sure we should spend our
deprecation chips/pain on it (if someone wants to try, we can see).

But at least in my opinion it should not be advertised or used in
docs/tutorials, and thus also not typed.

- Sebastian

> Cheers,
> Ralf
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From charlesr.harris at gmail.com  Sun Apr 19 16:44:05 2020
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 19 Apr 2020 14:44:05 -0600
Subject: [Numpy-discussion] NumPy 1.18.3 released.
Message-ID: 

Hi All,

On behalf of the NumPy team I am pleased to announce that NumPy 1.18.3
has been released. This release contains various bug/regression fixes
for the 1.18 series.

The Python versions supported in this release are 3.5-3.8. Downstream
developers should use Cython >= 0.29.15 for Python 3.8 support and
OpenBLAS >= 3.7 to avoid errors on the Skylake architecture. Wheels for
this release can be downloaded from PyPI, source archives and release
notes are available from Github.

*Highlights*

Fix for the method='eigh' and method='cholesky' options in
numpy.random.multivariate_normal. Those were producing samples from the
wrong distribution.

*Contributors*

A total of 6 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

- Charles Harris
- Max Balandat +
- @Mibu287 +
- Pan Jan +
- Sebastian Berg
- @panpiort8 +

*Pull requests merged*

A total of 5 pull requests were merged for this release.

- #15916: BUG: Fix eigh and cholesky methods of numpy.random.multivariate_normal
- #15929: BUG,MAINT: Remove incorrect special case in string to number...
- #15930: BUG: Guarantee array is in valid state after memory error occurs...
- #15954: BUG: Check that pvals is 1D in _generator.multinomial.
- #16017: BUG: Alpha parameter must be 1D in _generator.dirichlet

Cheers,

Charles Harris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From warren.weckesser at gmail.com  Sun Apr 19 17:46:02 2020
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sun, 19 Apr 2020 17:46:02 -0400
Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.18.3 released.
In-Reply-To: 
References: 
Message-ID: 

On 4/19/20, Charles R Harris wrote:
> Hi All,
>
> On behalf of the NumPy team I am pleased to announce that NumPy 1.18.3 has
> been released. This release contains various bug/regression fixes for the
> 1.18 series.

Thanks Chuck!

Warren

>
> The Python versions supported in this release are 3.5-3.8. Downstream
> developers should use Cython >= 0.29.15 for Python 3.8 support and OpenBLAS
> >= 3.7 to avoid errors on the Skylake architecture. Wheels for this
> release can be downloaded from PyPI, source archives and release notes
> are available from Github.
>
> *Highlights*
>
> Fix for the method='eigh' and method='cholesky' options in
> numpy.random.multivariate_normal. Those were producing samples from the
> wrong distribution.
>
> *Contributors*
>
> A total of 6 people contributed to this release. People with a "+" by
> their names contributed a patch for the first time.
>
> - Charles Harris
> - Max Balandat +
> - @Mibu287 +
> - Pan Jan +
> - Sebastian Berg
> - @panpiort8 +
>
> *Pull requests merged*
>
> A total of 5 pull requests were merged for this release.
>
> - #15916: BUG: Fix eigh and cholesky methods of
> numpy.random.multivariate_normal
> - #15929: BUG,MAINT: Remove incorrect special case in string to
> number...
> - #15930: BUG: Guarantee array is in valid state after memory error
> occurs...
> - #15954: BUG: Check that pvals is 1D in _generator.multinomial.
> - #16017: BUG: Alpha parameter must be 1D in _generator.dirichlet
>
> Cheers,
>
> Charles Harris
>

From melissawm at gmail.com  Mon Apr 20 13:15:49 2020
From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=)
Date: Mon, 20 Apr 2020 14:15:49 -0300
Subject: [Numpy-discussion] Google Season of Docs Ideas
Message-ID: 

Hello all,

As some of you may know, we are aiming to participate in the Google
Season of Docs program [1] again this year. The deadline for
applications from open source organizations is May 4, so I've started on
a proposal (heavily based on last year's version) here:

https://github.com/numpy/numpy/wiki/Google-Season-of-Docs-2020-Project-Ideas

If you have suggestions, especially for concrete project ideas, that
would be great. Also, keep in mind this project is aimed at technical
writers and not developers, so the focus should be mainly on
writing/reviewing/organizing documentation content.

Cheers,

Melissa

[1] https://developers.google.com/season-of-docs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Mon Apr 20 16:37:00 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 20 Apr 2020 15:37:00 -0500
Subject: [Numpy-discussion] NEP 41: Is there still need to discuss DTypes vs. Scalars (or DType classes)?
Message-ID: 

Hi all,

the week has passed, and the NEP has been discussed for quite a bit
longer than that, so I assume that NEP 41 can effectively be accepted.
Even then, I will bring up one point again.

I hope that if there is still need for discussion, it will happen in a
timely manner, so that I can go ahead with some changes proposed in NEP
41, and so that in the event of more concrete doubts/issues there will
only be a few changes that need to be undone. I would hate to revert a
large amount of work simply because an important point/issue is raised
in two months instead of two weeks.

This whole thing is fairly complex, so please do not hesitate to ask
for clarifications! I am also very happy to do a video conference with
anyone interested at any time, or chat in private on Slack. So just in
case: I will be available around 11:00 PDT (18 UTC) this Thursday on
the NumPy Community Call zoom link [0].

As far as I am aware, there was only one discussion point (maybe two;
see point 2 below, which may be independent).

In my proposal the DType class (i.e. `type(np.dtype("float64"))`) is
the core concept and is different for every scalar type. It holds all
the information on how to deal with array elements.

This is some duplication of scalar types and it means that there would
be (usually) exactly one DType for each (NumPy) scalar, possibly
exposed using:

    np.dtype[scalar_type]
    e.g. np.dtype[np.float64]

That does create a certain duality. For each scalar type/class, there
is a corresponding DType class. And in theory the scalar does not even
need to know that NumPy has a DType for it.

From a typing theoretical point of view this is also a bit strange.
The type of each array element is identical to the scalar type!
But although there is only one type, there are two distinct classes:
one for the scalar value, and one to explain those values to NumPy and
store them in an array.

I lean in that direction because:

1. I wanted to modify scalars as little as possible (I am not sure we
   will enable this initially), but this is so that:

   * In principle you can create a DType for every Python type without
     touching the original Python scalar.
   * The scalar need not know about NumPy or DTypes, thus creating no
     new dependency (you can use the scalar without installing NumPy).

2. I somewhat like that DType classes have methods that get a "self"
   instance argument and are provided with the data by the array.

   * This means a function such as
     `dtype.__get_array_item__(item_memory)` is implemented like a
     method:

         class DType:
             def __get_array_item__(self, item_memory):
                 item = ...  # unpack `item_memory` into a scalar
                 return item

   * There is an alternative approach to this, that I did not think
     about much, though. `item_memory` really is much like a scalar
     instance (it holds the actual value), so you can argue that
     `item_memory` is `self` here, and the dtype instance is the type
     of `item_memory` (the self). E.g. making `__get_array_item__` live
     on the dtype (not on the class). The dtype thus is the type/class
     of the array element.
     This is beautiful, but in general you still need to pass the dtype
     instance itself. For example, strings cannot be interpreted
     without knowing their length.
     In other words, the scalar `self` is actually the tuple
     `(item_memory, dtype)`, which I think is why at least I do not
     have a clear grasp here. [1]

3. There may be `dtypes` without specific scalar types. I am not sure
   this is actually a tidy theoretical concept, but an example is the
   current Pandas Categorical: the type of the scalars within a
   categorical array is arbitrary.
   E.g. Python uses `enum.Enum`, a class factory, for a similar
   purpose, and you have to use the `.value` attribute.
   But, desirable or not, this would seem less straightforward to allow
   if we design this around the scalar type.

The main downside to using DTypes as proposed in NEP 41 in my opinion
is what I mentioned first: We must have a DType class for every scalar
class, even though at least most scalars (i.e. all NumPy scalars,
except the `object` dtype) can easily be expanded into including all
necessary information; maybe they already include almost all of it.
In the NEP 41 framework the scalar could be built from the DType in
practice, which may seem a bit strange.

In general Scalar<->DType will form a unit of sorts, and this means
that somewhere we have to map scalars to DTypes.

So, in many ways, I actually do find the scalar version tidier myself.
But I also find "there is a DType class for every scalar type/class" a
straightforward user story, even if there will be subtle differences
between DType and scalar class/type.

Point 2 may be independent of the whole scalar story; I am conflating
it here because to me it applies more naturally in that context.

Cheers,

Sebastian


[0] See the community meeting agenda document for the link:
https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg

[1] These are thoughts mainly from:
https://gist.github.com/eric-wieser/49c55bcab744b0e782f6c2740603180b#what-this-could-mean-for-dtypes
and a discussion on the pull request, and I will not claim to represent
them quite correctly and especially fully here.
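For illustration of point 2 above, a self-contained Python mock of the
"one DType class per scalar type" idea. All names here are hypothetical
stand-ins for the proposed (C-level) machinery, not actual NumPy API:

    import struct

    class DType:
        """Stand-in for the proposed DType classes, type(np.dtype(...))."""

    class Float64DType(DType):
        # hypothetical DType class corresponding to the np.float64 scalar
        itemsize = 8

        def __get_array_item__(self, item_memory):
            # unpack one element's raw memory into a scalar
            return struct.unpack("<d", item_memory)[0]

    class StringDType(DType):
        # a parametric dtype: the *instance* carries the string length,
        # which is why the dtype instance is needed to interpret memory
        def __init__(self, length):
            self.itemsize = length

        def __get_array_item__(self, item_memory):
            return item_memory[:self.itemsize].rstrip(b"\x00").decode("ascii")

    dt = Float64DType()
    raw = struct.pack("<d", 3.5)  # stand-in for one array element's memory
    assert dt.__get_array_item__(raw) == 3.5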
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From matti.picus at gmail.com  Tue Apr 21 00:48:33 2020
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 21 Apr 2020 07:48:33 +0300
Subject: [Numpy-discussion] NEP 41: Is there still need to discuss DTypes vs. Scalars (or DType classes)?
In-Reply-To: 
References: 
Message-ID: <114f5a88-15a3-9e24-1d4b-28a7c60bfe71@gmail.com>

On 20/4/20 11:37 pm, Sebastian Berg wrote:
> Hi all,
>
> ...
> In my proposal the DType class (i.e. `type(np.dtype("float64"))`) is
> the core concept and is different for every scalar type. It holds all
> the information on how to deal with array elements.
>
> This is some duplication of scalar types and it means that there would
> be (usually) exactly one DType for each (NumPy) scalar, possibly
> exposed using:
>
>     np.dtype[scalar_type]
>     e.g. np.dtype[np.float64]
>
> That does create a certain duality. For each scalar type/class, there
> is a corresponding DType class. And in theory the scalar does not even
> need to know that NumPy has a DType for it.
>
> ...
> Cheers,
>
> Sebastian


I think this is the correct choice. As we have only a little time
before the 1.19 release, the refactoring will at the earliest reach
users in 1.20. This gives us time to see how the whole refactoring
works out, so the choice can be reevaluated in the future.

Without diving into detail, this is the approach taken in the current
version of the NEP, correct? If so, I suggest we accept the NEP in its
current form and publish it one week from now.

Matti

From jni at fastmail.com  Tue Apr 21 03:06:12 2020
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Tue, 21 Apr 2020 17:06:12 +1000
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
Message-ID: 

Hello NumPy-ers!

The __array__ method is a great little tool to allow interoperability
with NumPy. Briefly, by calling `np.array()` or `np.asarray()` on an
object with an `__array__` method, one can get a NumPy representation
of that object, which may or may not involve data copying (this is up
to the object's implementation of `__array__`). Some references:

https://numpy.org/devdocs/user/basics.dispatch.html
https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__
https://numpy.org/devdocs/reference/generated/numpy.array.html
https://numpy.org/devdocs/reference/generated/numpy.asarray.html

(I couldn't find an authoritative guide on good and bad practices with
`__array__`, btw.)

For people writing e.g. visualisation libraries, this is a wonderful
thing, because if we know how to visualise NumPy arrays, we can
suddenly visualise anything with an `__array__` method. As an example,
napari, while not being aware of dask, can visualise large dask arrays
out of the box, which allows us to view 100GB out-of-core datasets
easily.

However, in many cases, instantiating a NumPy array is an expensive
operation (for example, copying an array from GPU to CPU memory), or
involves substantial loss of information. Some library authors are
reluctant to allow implicit execution of such an operation, such as
PyOpenCL [1], PyTorch [2], or even scipy.sparse.

My proposal is to add an optional argument to `__array__` that would
signal to the downstream library that we *really* want a NumPy array
and are willing to wait for it.
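For illustration, a minimal sketch of what this could mean on the
producer side. The `copy` keyword on `__array__` is hypothetical (today
`__array__` only receives `dtype`), and the class is a stand-in for
e.g. a GPU-backed array:

    import numpy as np

    class MyDuckArray:
        def __init__(self, data):
            self._data = np.asarray(data)  # stand-in for expensive-to-reach memory

        def __array__(self, dtype=None, copy=False):
            # Hypothetical protocol: refuse the expensive implicit
            # conversion unless the caller explicitly opted in, e.g. via
            # np.asarray(x, copy=True).
            if not copy:
                raise TypeError(
                    "conversion to a NumPy array is expensive; "
                    "pass copy=True (or force=True) to allow it")
            arr = self._data.copy()
            return arr if dtype is None else arr.astype(dtype, copy=False)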
In the PyTorch issue I proposed `force=True`, and they are somewhat
receptive of this, but, reading more about the existing NumPy APIs, I
think `copy=True` would be a nice alternative:

- np.array already has a copy= keyword argument. Under this proposal,
it would attempt to pass it to the downstream library, and, if that
failed, it would try again without it and run its own copy.
- np.asarray could get a new copy= keyword argument that would match
np.array's.
- It would neatly express the idea that the array is going to e.g. get
passed around between devices.

Or, we could just go with `force=`.

One bit of expressivity we would miss is "copy if necessary, but
otherwise don't bother", but there are workarounds to this.

What do people think? I would be happy to write a PR and/or NEP for
this if there is general consensus that this would be useful.

Thanks,

Juan.

Refs:
[1]: https://github.com/inducer/pyopencl/pull/301
[2]: https://github.com/pytorch/pytorch/issues/36560
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Tue Apr 21 17:55:44 2020
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 21 Apr 2020 15:55:44 -0600
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: 

On Tue, Apr 21, 2020 at 1:07 AM Juan Nunez-Iglesias wrote:

> Hello NumPy-ers!
>
> The __array__ method is a great little tool to allow interoperability
> with NumPy. Briefly, by calling `np.array()` or `np.asarray()` on an
> object with an `__array__` method, one can get a NumPy representation
> of that object, which may or may not involve data copying (this is up
> to the object's implementation of `__array__`). Some references:
>
> https://numpy.org/devdocs/user/basics.dispatch.html
>
> https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__
> https://numpy.org/devdocs/reference/generated/numpy.array.html
> https://numpy.org/devdocs/reference/generated/numpy.asarray.html
>
> (I couldn't find an authoritative guide on good and bad practices with
> `__array__`, btw.)
>
> For people writing e.g. visualisation libraries, this is a wonderful
> thing, because if we know how to visualise NumPy arrays, we can
> suddenly visualise anything with an `__array__` method. As an example,
> napari, while not being aware of dask, can visualise large dask arrays
> out of the box, which allows us to view 100GB out-of-core datasets
> easily.
>
> However, in many cases, instantiating a NumPy array is an expensive
> operation (for example, copying an array from GPU to CPU memory), or
> involves substantial loss of information. Some library authors are
> reluctant to allow implicit execution of such an operation, such as
> PyOpenCL [1], PyTorch [2], or even scipy.sparse.
>
> My proposal is to add an optional argument to `__array__` that would
> signal to the downstream library that we *really* want a NumPy array
> and are willing to wait for it. In the PyTorch issue I proposed
> `force=True`, and they are somewhat receptive of this, but, reading
> more about the existing NumPy APIs, I think `copy=True` would be a
> nice alternative:
>
> - np.array already has a copy= keyword argument. Under this proposal,
> it would attempt to pass it to the downstream library, and, if that
> failed, it would try again without it and run its own copy.
> - np.asarray could get a new copy= keyword argument that would match
> np.array's.
> - It would neatly express the idea that the array is going to e.g. get
> passed around between devices.
>
> Or, we could just go with `force=`.
>
> One bit of expressivity we would miss is "copy if necessary, but
> otherwise don't bother", but there are workarounds to this.
>
> What do people think? I would be happy to write a PR and/or NEP for
> this if there is general consensus that this would be useful.
>

This sounds like the sort of thing that is use case driven. If enough
projects want to use it, then I have no objections to adding the
keyword. OTOH, we need to be careful about adding too many
interoperability tricks, as they complicate the code and make it hard
for folks to determine the best solution. Interoperability is a hot
topic and we need to be careful not to leave behind too many
experiments in the NumPy code. Do you have any other ideas of how to
achieve the same effect?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Tue Apr 21 18:54:30 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 21 Apr 2020 17:54:30 -0500
Subject: [Numpy-discussion] NumPy Development Meeting - Triage Focus
Message-ID: <6b9815bb8a8ea1f9a1953414d2355f188f0adaa8.camel@sipsolutions.net>

Hi all,

Our bi-weekly triage-focused NumPy development meeting is tomorrow
(Wednesday, April 22) at 11 am Pacific Time (18:00 UTC). Everyone is
invited to join in and edit the work-in-progress meeting topics and
notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should
be prioritized or simply discussed briefly. Just comment on it so we
can label it, or add your PR/issue to this week's topics for
discussion.

Best regards

Sebastian

From mikofski at berkeley.edu  Thu Apr 23 14:35:33 2020
From: mikofski at berkeley.edu (Dr. Mark Alexander Mikofski PhD)
Date: Thu, 23 Apr 2020 11:35:33 -0700
Subject: [Numpy-discussion] ANN: pvlib v0.7.2
In-Reply-To: <2D02E2D3-3110-4616-A93E-90D4EB390667@sandia.gov>
References: <2D02E2D3-3110-4616-A93E-90D4EB390667@sandia.gov>
Message-ID: 

pvlib has a new minor release, v0.7.2

Release Notes: https://pvlib-python.readthedocs.io/en/v0.7.2/whatsnew.html
PyPI: https://pypi.org/project/pvlib/
Read the Docs: https://pvlib-python.readthedocs.io/en/latest/
GitHub: https://github.com/pvlib/pvlib-python

Highlights:
- add new module pvlib.snow to contain models related to snow coverage
and effects on a PV system. (GH764
<https://github.com/pvlib/pvlib-python/pull/764>)
- Renamed pvlib.losses to pvlib.soiling. Additional loss models will go
into code modules named for the loss or effect type. (GH935
<https://github.com/pvlib/pvlib-python/issues/935>, GH891
<https://github.com/pvlib/pvlib-python/issues/891>)
- updated compatibility with cftime 1.1. (GH895
<https://github.com/pvlib/pvlib-python/issues/895>)

There are a few breaking API changes and bug fixes. Users are advised
to read the release notes before updating.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Thu Apr 23 17:34:49 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 23 Apr 2020 16:34:49 -0500
Subject: [Numpy-discussion] Update the Code of Conduct Committee Membership (new members wanted)
Message-ID: <6ca65ab6bb3c75184a053ec0ec0831ed4ff77fa3.camel@sipsolutions.net>

Hi all,

it has come up in the last community call that many of our committee
membership lists have not been updated in a while. This is not a big
issue as such. But, while these committees are not very active on a
day-to-day basis, they are an important part of the community, and it
is better to update them regularly and thus also ensure they remain
representative of the community.

We would like to start by updating the members of the Code of Conduct
(CoC) committee. The CoC committee is in charge of responding to and
following up on any reports of CoC breaches, as stated in:
https://docs.scipy.org/doc/numpy/dev/conduct/code_of_conduct.html#incident-reporting-resolution-code-of-conduct-enforcement

If you are interested in or happy to serve on our CoC committee, please
let me or e.g. Ralf Gommers know, join the next community meeting
(April 29th, 11:00 PDT/18:00 UTC), or reply on the list.

I hope we will be able to discuss and reach a consensus between those
interested and involved quickly (possibly already on the next community
call). In either case, any changes will be run by the mailing list
before they are made, to ensure community consensus.

Cheers,

Sebastian

From jni at fastmail.com  Thu Apr 23 21:59:50 2020
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Fri, 24 Apr 2020 11:59:50 +1000
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: 

Hi everyone,

> One bit of expressivity we would miss is "copy if necessary, but
> otherwise don't bother", but there are workarounds to this.
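For illustration, one such workaround, assuming the hypothetical
`allow_copy=` keyword proposed just below: try the cheap path first and
only then opt in to a copy:

    import numpy as np

    def to_numpy(duck):
        # "copy if necessary, but otherwise don't bother":
        try:
            return np.asarray(duck)                   # zero-copy/cheap path
        except TypeError:
            return np.asarray(duck, allow_copy=True)  # explicitly allow the copy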
After a side discussion with Stéfan van der Walt, we came up with
`allow_copy=True`, which would express to the downstream library that
we don't mind waiting, but that zero-copy would also be ok.

This sounds like the sort of thing that is use case driven. If enough
projects want to use it, then I have no objections to adding the
keyword. OTOH, we need to be careful about adding too many
interoperability tricks, as they complicate the code and make it hard
for folks to determine the best solution. Interoperability is a hot
topic and we need to be careful not to leave behind too many
experiments in the NumPy code. Do you have any other ideas of how to
achieve the same effect?

Personally, I don't have any other ideas, but would be happy to hear
some!

My view regarding API/experiment creep is that `__array__` is the
oldest and most basic of all the interop tricks and that this can be
safely maintained for future generations. Currently it only takes
`dtype=` as a keyword argument, so it is a very lean API. I think this
particular use case is very natural and I've encountered the reluctance
to implicitly copy twice, so I expect it is reasonably common.

Regarding difficulty in determining the best solution, I would be happy
to contribute to the dispatch basics guide together with the new kwarg.
I agree that the protocols are getting quite numerous and I couldn't
find a single place that gathers all the best practices together. But,
to reiterate my point: `__array__` is the simplest of these and I think
this keyword is pretty safe to add.

For ease of discussion, here are the API options discussed so far, as
well as a few extra that I don't like but might trigger other ideas:

np.asarray(my_duck_array, allow_copy=True)  # default is False, or None
-> leave it to the duck array to decide
np.asarray(my_duck_array, copy=True)  # always copies, but, if supported
by the duck array, defers to it for the copy
np.asarray(my_duck_array, copy='allow')  # could take values 'allow',
'force', 'no', True(='force'), False(='no')
np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate
concepts, but unclear what force_copy=True, allow_copy=False means!
np.asarray(my_duck_array, force=True)

Juan.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wieser.eric+numpy at gmail.com  Fri Apr 24 06:34:28 2020
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Fri, 24 Apr 2020 11:34:28 +0100
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: 

Perhaps worth mentioning that we've discussed this sort of API before,
in https://github.com/numpy/numpy/pull/11897.

Under that proposal, the api would be something like:

* `copy=True` - always copy, like it is today
* `copy=False` - copy if needed, like it is today
* `copy=np.never_copy` - never copy, throw an exception if not possible

I think the discussion stalled on the precise spelling of the third
option.

`__array__` was not discussed there, but it seems like adding the
`copy` argument to `__array__` would be a perfectly reasonable
extension.

Eric

On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias wrote:

> Hi everyone,
>
> One bit of expressivity we would miss is "copy if necessary, but otherwise
> don't bother", but there are workarounds to this.
>
> After a side discussion with Stéfan van der Walt, we came up with
> `allow_copy=True`, which would express to the downstream library that
> we don't mind waiting, but that zero-copy would also be ok.
>
> This sounds like the sort of thing that is use case driven. If enough
> projects want to use it, then I have no objections to adding the
> keyword. OTOH, we need to be careful about adding too many
> interoperability tricks, as they complicate the code and make it hard
> for folks to determine the best solution. Interoperability is a hot
> topic and we need to be careful not to leave behind too many
> experiments in the NumPy code. Do you have any other ideas of how to
> achieve the same effect?
>
> Personally, I don't have any other ideas, but would be happy to hear
> some!
>
> My view regarding API/experiment creep is that `__array__` is the
> oldest and most basic of all the interop tricks and that this can be
> safely maintained for future generations. Currently it only takes
> `dtype=` as a keyword argument, so it is a very lean API. I think this
> particular use case is very natural and I've encountered the
> reluctance to implicitly copy twice, so I expect it is reasonably
> common.
>
> Regarding difficulty in determining the best solution, I would be
> happy to contribute to the dispatch basics guide together with the new
> kwarg. I agree that the protocols are getting quite numerous and I
> couldn't find a single place that gathers all the best practices
> together. But, to reiterate my point: `__array__` is the simplest of
> these and I think this keyword is pretty safe to add.
>
> For ease of discussion, here are the API options discussed so far, as
> well as a few extra that I don't like but might trigger other ideas:
>
> np.asarray(my_duck_array, allow_copy=True)  # default is False, or
> None -> leave it to the duck array to decide
> np.asarray(my_duck_array, copy=True)  # always copies, but, if
> supported by the duck array, defers to it for the copy
> np.asarray(my_duck_array, copy='allow')  # could take values 'allow',
> 'force', 'no', True(='force'), False(='no')
> np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> separate concepts, but unclear what force_copy=True, allow_copy=False
> means!
> np.asarray(my_duck_array, force=True)
>
> Juan.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Fri Apr 24 09:26:32 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 24 Apr 2020 08:26:32 -0500
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: <6a48c73fd4b3754bff8020029ac80a6afc159803.camel@sipsolutions.net>

On Fri, 2020-04-24 at 11:34 +0100, Eric Wieser wrote:
> Perhaps worth mentioning that we've discussed this sort of API
> before, in
> https://github.com/numpy/numpy/pull/11897.
>
> Under that proposal, the api would be something like:
>
> * `copy=True` - always copy, like it is today
> * `copy=False` - copy if needed, like it is today
> * `copy=np.never_copy` - never copy, throw an exception if not
> possible
>
> I think the discussion stalled on the precise spelling of the third
> option.
> > `__array__` was not discussed there, but it seems like adding the > `copy` > argument to `__array__` would be a perfectly reasonable extension. > One thing to note is that `__array__` is actually asked to return a copy AFAIK. I doubt it always does, but if it does not I assume the object should and could provide `__array_interface__`. Under that assumption, it would be an opt-out right now since NumPy allows copies by default here. Defining things along copy does seem sensible, though I do not know how it would play with some of the current array-likes choosing to refuse `__array__`. - Sebastian > Eric > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias > wrote: > > > Hi everyone, > > > > One bit of expressivity we would miss is ?copy if necessary, but > > otherwise > > > don?t bother?, but there are workarounds to this. > > > > > > > After a side discussion with St?fan van der Walt, we came up with > > `allow_copy=True`, which would express to the downstream library > > that we > > don?t mind waiting, but that zero-copy would also be ok. > > > > This sounds like the sort of thing that is use case driven. If > > enough > > projects want to use it, then I have no objections to adding the > > keyword. > > OTOH, we need to be careful about adding too many interoperability > > tricks > > as they complicate the code and makes it hard for folks to > > determine the > > best solution. Interoperability is a hot topic and we need to be > > careful > > not put too leave behind too many experiments in the NumPy > > code. Do you > > have any other ideas of how to achieve the same effect? > > > > > > Personally, I don?t have any other ideas, but would be happy to > > hear some! > > > > My view regarding API/experiment creep is that `__array__` is the > > oldest > > and most basic of all the interop tricks and that this can be > > safely > > maintained for future generations. Currently it only takes `dtype=` > > as a > > keyword argument, so it is a very lean API. I think this particular > > use > > case is very natural and I?ve encountered the reluctance to > > implicitly copy > > twice, so I expect it is reasonably common. > > > > Regarding difficulty in determining the best solution, I would be > > happy to > > contribute to the dispatch basics guide together with the new > > kwarg. I > > agree that the protocols are getting quite numerous and I couldn?t > > find a > > single place that gathers all the best practices together. But, to > > reiterate my point: `__array__` is the simplest of these and I > > think this > > keyword is pretty safe to add. > > > > For ease of discussion, here are the API options discussed so far, > > as well > > as a few extra that I don?t like but might trigger other ideas: > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or > > None -> > > leave it to the duck array to decide > > np.asarray(my_duck_array, copy=True) # always copies, but, if > > supported > > by the duck array, defers to it for the copy > > np.asarray(my_duck_array, copy=?allow?) # could take values > > ?allow?, > > ?force?, ?no?, True(=?force?), False(=?no?) > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # > > separate > > concepts, but unclear what force_copy=True, allow_copy=False means! > > np.asarray(my_duck_array, force=True) > > > > Juan. 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From josh.craig.wilson at gmail.com  Fri Apr 24 11:45:17 2020
From: josh.craig.wilson at gmail.com (Joshua Wilson)
Date: Fri, 24 Apr 2020 08:45:17 -0700
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
Message-ID: 

Hey everyone,

Over in numpy-stubs we've been working on typing "array like":

https://github.com/numpy/numpy-stubs/pull/66

It would be nice if the type were public so that downstream projects
could use it (e.g. it would be very helpful in SciPy). Originally the
plan was to only make it publicly available at typing time and not
runtime, which would mean that no changes to NumPy are necessary; see

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-618784833

for more information on how that works. But, Stephan pointed out that
it might be confusing to users for objects to only exist at typing
time, so we came around to the question of whether people are open to
the idea of including the type aliases in NumPy itself. Ralf's
concrete proposal was to make a module numpy.types (or maybe
numpy.typing) to hold the aliases so that they don't pollute the
top-level namespace. The module would initially contain the types

- ArrayLike
- DtypeLike
- (maybe) ShapeLike

Note that we would not need to make changes to NumPy right away;
instead it would probably be done when numpy-stubs is merged into
NumPy itself.

What do people think?

- Josh
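For illustration, how such aliases would be used downstream once
public. The module name follows the proposal above; at the time of
writing these names exist only in numpy-stubs:

    import numpy as np
    from numpy.typing import ArrayLike, DtypeLike  # hypothetical location

    def normalize(x: ArrayLike, dtype: DtypeLike = "float64") -> np.ndarray:
        arr = np.asarray(x, dtype=dtype)
        return arr / arr.sum()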
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
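For context on the `__array_interface__` alternative discussed above: an
object that directly wraps an ndarray can expose zero-copy access by
forwarding the wrapped array's own interface dict. A minimal sketch:

import numpy as np

class Wrapper:
    def __init__(self, arr):
        self._arr = np.asarray(arr)

    @property
    def __array_interface__(self):
        # Describes the underlying buffer; NumPy can build a view from it
        # without copying, as long as self._arr stays alive.
        return self._arr.__array_interface__

w = Wrapper(np.arange(4))
view = np.asarray(w)
assert np.shares_memory(view, w._arr)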
From stefanv at berkeley.edu  Fri Apr 24 14:10:15 2020
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Fri, 24 Apr 2020 11:10:15 -0700
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: 
References: 
Message-ID: 

On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
> But, Stephan pointed out that it might be confusing to users for
> objects to only exist at typing time, so we came around to the
> question of whether people are open to the idea of including the type
> aliases in NumPy itself. Ralf's concrete proposal was to make a module
> numpy.types (or maybe numpy.typing) to hold the aliases so that they
> don't pollute the top-level namespace. The module would initially
> contain the types

That sounds very sensible. Having types available with NumPy should also
encourage their use, especially if we can add some documentation around it.

Stéfan

From sebastian at sipsolutions.net  Fri Apr 24 14:23:35 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 24 Apr 2020 13:23:35 -0500
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: <6a48c73fd4b3754bff8020029ac80a6afc159803.camel@sipsolutions.net>
Message-ID: <8638791fbc5f4c23d95fdf96a76d22345a7eb287.camel@sipsolutions.net>

On Fri, 2020-04-24 at 10:12 -0700, Stephan Hoyer wrote:
> On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg wrote:
>
> > One thing to note is that `__array__` is actually asked to return a
> > copy AFAIK.
>
> The documentation on __array__ seems to be quite limited, unfortunately.
> The most I can find are a few sentences here:
> https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__
>
> I don't see anything about returning copies. My interpretation has
> always been that __array__ can return either a copy or a view, like the
> np.asarray() constructor.

Hmmm, right, I am not quite sure why I thought this was the case.

The more important part is behaviour. And the fact is that if you do
`np.array(array_like)` with an array-like that implements `__array__`
then we ensure a copy is made by default (`copy=True` by default), even
though `__array__()` may already return a copy.

In any case, the current default for `np.asarray`, i.e. `copy=False`, is
"copy if necessary". So if PyTorch uses a new parameter to opt in to
copying, the default behaviour will depend on the object. The definition
would then be: Copy if necessary but error if a copy is necessary and
the object doesn't want to be copied silently.

To be honest, that seems not totally terrible to me... The old statement
remains true with the small caveat that it will sometimes cause a loud
error explaining things. The only problem is that some users may want an
explicit `np.copy_if_necessary` to get PyTorch to do what most already do
on `copy=False`.

I guess the new behaviour would then be:

if copy is np.never_copy:  # or however we signal it
    try:
        arr = obj.__array__(copy=np.never_copy)
    except TypeError as e:
        raise TypeError("no copy appears unsupported by ...!") from e
elif copy is np.copy_if_necessary:
    # Some users may want to tell PyTorch not to error, but
    # tell pandas, that a view is OK:
    try:
        arr = obj.__array__(copy=np.copy_if_necessary)
    except TypeError:
        arr = obj.__array__()
elif not copy:
    # Behaviour here may depend on the array-like!
    # current array-likes may or may not return a copy,
    # new ones may choose to raise an error when a view
    # is not possible.
    arr = obj.__array__()
else:
    try:
        arr = obj.__array__(copy=True)
    except TypeError:
        arr = obj.__array__()
        arr = arr.copy()  # make sure it's a copy

PyTorch can then implement copy, but raise an error if `copy=False`
(which must be the default). Current objects will error for
`np.never_copy` but otherwise be fine. And they can implement `copy` to
avoid an unnecessary double copy if they wish so.

We could add the `np.copy_if_necessary` to be an explicit replacement
for the current `copy=False`. This will be necessary, or nicer, unless
everyone is happy to copy by default.

Another side note: calls such as `np.array([arr1, arr2])` probably must
always fail if `copy=np.never_copy` since a view is not guaranteed.

- Sebastian

> > I doubt it always does, but if it does not I assume the
> > object should and could provide `__array_interface__`.
>
> Objects like xarray.DataArray and pandas.Series sometimes directly wrap
> NumPy arrays and sometimes don't.
>
> They both implement __array__ but not __array_interface__. It's very
> obvious how to implement a "forwarding" __array__ method (just call
> `np.asarray()` on an argument that might implement it). I guess
> something similar could be done for __array_interface__, but it's not
> clear to me that it's right to implement __array_interface__ when doing
> so might require a copy.

Yes, I do not think you should implement __array_interface__ then,
unless "simplifying the array" is for some reason beneficial for
yourself. I suppose you could raise an AttributeError, but it is
questionable if that's good.
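On the array-like side, an implementation compatible with both old and
new callers could accept the keyword defensively. A sketch under the
same hypothetical sentinel semantics:

import numpy as np

class DuckArray:
    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=True):
        # Old NumPy calls __array__() or __array__(dtype); a NumPy with
        # the behaviour sketched above would also pass copy=.
        arr = np.asarray(self._data, dtype=dtype)  # a view when possible
        if copy is True and np.shares_memory(arr, self._data):
            arr = arr.copy()
        return arr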
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net  Fri Apr 24 14:29:59 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 24 Apr 2020 13:29:59 -0500
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: 
References: 
Message-ID: <1e1edc6b7e204fb152b0ab539793e665df635eac.camel@sipsolutions.net>

On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
> On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
> > But, Stephan pointed out that it might be confusing to users for
> > objects to only exist at typing time, so we came around to the
> > question of whether people are open to the idea of including the type
> > aliases in NumPy itself. Ralf's concrete proposal was to make a module
> > numpy.types (or maybe numpy.typing) to hold the aliases so that they
> > don't pollute the top-level namespace. The module would initially
> > contain the types
>
> That sounds very sensible. Having types available with NumPy should
> also encourage their use, especially if we can add some documentation
> around it.

I agree. I might have a small tendency for `numpy.types`; if we ever
find any usage other than direct typing, that may be the better name?
Out of curiosity, I guess `ArrayLike` would be an ABC that a
downstream project can register with?

- Sebastian

> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From oberoi011115tina at gmail.com  Sat Apr 25 01:00:38 2020
From: oberoi011115tina at gmail.com (Tina Oberoi)
Date: Sat, 25 Apr 2020 10:30:38 +0530
Subject: [Numpy-discussion] beginner introduction to group
Message-ID: 

Hi Everyone,
I am new to contributing to numpy. I have read the contributors guide and
am done with the set-up. Hope to make some good contributions and also to
connect with all you great people in the numpy community.
Any suggestions and tips are always welcome.

Thanks and Regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Sat Apr 25 01:59:57 2020
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 25 Apr 2020 01:59:57 -0400
Subject: [Numpy-discussion] beginner introduction to group
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 25, 2020 at 1:02 AM Tina Oberoi wrote:

> Hi Everyone,
> I am new to contributing to numpy. I have read the contributors guide
> and am done with the set-up. Hope to make some good contributions and
> also to connect with all you great people in the numpy community.
> Any suggestions and tips are always welcome.

Welcome! Do you have an idea what you would like to work on?

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com  Sat Apr 25 02:40:20 2020
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 24 Apr 2020 23:40:20 -0700
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: <1e1edc6b7e204fb152b0ab539793e665df635eac.camel@sipsolutions.net>
References: <1e1edc6b7e204fb152b0ab539793e665df635eac.camel@sipsolutions.net>
Message-ID: 

On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg wrote:

> I agree. I might have a small tendency for `numpy.types`; if we ever
> find any usage other than direct typing, that may be the better name?

Unless we anticipate adding a long list of type aliases (more than the
three suggested so far), I would lean towards adding ArrayLike to the top
level NumPy namespace as np.ArrayLike.

Type annotations are becoming an increasingly core part of modern Python
code. We should make it easy to appropriately type check functions that
act on NumPy arrays, and a top level np.ArrayLike is definitely more
convenient than np.types.ArrayLike.

> Out of curiosity, I guess `ArrayLike` would be an ABC that a
> downstream project can register with?
ArrayLike will be a typing Protocol, automatically recognizing attributes
like __array__ to indicate that something can be cast to an array.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevin.k.sheppard at gmail.com  Sat Apr 25 02:49:58 2020
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Sat, 25 Apr 2020 07:49:58 +0100
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: 
References: <1e1edc6b7e204fb152b0ab539793e665df635eac.camel@sipsolutions.net>
Message-ID: 

Typing is for library developers more than end users. I would also worry
that putting it into the top level might discourage other typing classes
since it is more difficult to add to the top level than to a lower level
module. np.typing seems very clear to me.
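To make "typing Protocol" concrete, the alias could look roughly like the
following sketch (typing.Protocol requires Python 3.8+, or
typing_extensions on older versions; the real numpy-stubs definition is
more elaborate):

from typing import Any, Protocol, Sequence, Union

class SupportsArray(Protocol):
    def __array__(self) -> Any: ...

# Anything np.asarray can coerce: scalars, (nested) sequences, or
# objects implementing __array__.
ArrayLike = Union[bool, int, float, complex, str, bytes,
                  Sequence[Any], SupportsArray]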
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From albuscode at gmail.com  Sat Apr 25 03:26:32 2020
From: albuscode at gmail.com (Inessa Pawson)
Date: Sat, 25 Apr 2020 17:26:32 +1000
Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 163, Issue 23
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 25, 2020 at 4:50 PM wrote:

> Today's Topics:
>
>    1. Re: Feelings about type aliases in NumPy (Sebastian Berg)
>    2. beginner introduction to group (Tina Oberoi)
>    3. Re: beginner introduction to group (Robert Kern)
>    4. Re: Feelings about type aliases in NumPy (Stephan Hoyer)
>    5. Re: Feelings about type aliases in NumPy (Kevin Sheppard)
>
> ---------- Forwarded message ----------
> From: Tina Oberoi
> Date: Sat, 25 Apr 2020 10:30:38 +0530
> Subject: [Numpy-discussion] beginner introduction to group
>
> Hi Everyone,
> I am new to contributing to numpy.
> I have read the contributors guide and
> am done with the set-up. Hope to make some good contributions and also
> to connect with all you great people in the numpy community.
> Any suggestions and tips are always welcome.
>
> Thanks and Regards

Welcome, Tina!

-- 
Inessa Pawson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com  Sat Apr 25 13:39:08 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 25 Apr 2020 19:39:08 +0200
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: 

On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser wrote:

> Perhaps worth mentioning that we've discussed this sort of API before,
> in https://github.com/numpy/numpy/pull/11897.
>
> Under that proposal, the api would be something like:
>
> * `copy=True` - always copy, like it is today
> * `copy=False` - copy if needed, like it is today
> * `copy=np.never_copy` - never copy, throw an exception if not possible

There are a couple of issues I see with using `copy` for __array__:

- copy is already weird (False doesn't mean no), and a [bool,
  some_obj_or_str] keyword isn't making that better
- the behavior we're talking about can do more than copying, e.g. for
  PyTorch it would modify the autograd graph by adding detach(), and for
  sparse it's not just "make a copy" (which implies doubling memory use)
  but it densifies, which can massively blow up the memory.
- I'm -1 on adding things to the main namespace (never_copy) for
  something that can be handled differently (like a string, or a new
  keyword)

tl;dr a new `force` keyword would be better

Cheers,
Ralf
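Read as caller-facing semantics, the suggestion amounts to something like
this (`force=` is hypothetical, not an existing NumPy keyword):

import numpy as np

x = np.arange(3)
assert np.asarray(x) is x  # today: no copy when none is needed

# Hypothetical:
# np.asarray(duck)              -> duck array may refuse via __array__
# np.asarray(duck, force=True)  -> always an ndarray, even if that means
#                                  detaching, densifying, or copying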
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com  Sat Apr 25 13:52:28 2020
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 25 Apr 2020 10:52:28 -0700
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers wrote:

> tl;dr a new `force` keyword would be better

I agree, "copy" is not a good description of this desired coercion
behavior. A new keyword argument like "force" would be much clearer.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oberoi011115tina at gmail.com  Sat Apr 25 17:24:43 2020
From: oberoi011115tina at gmail.com (Tina Oberoi)
Date: Sun, 26 Apr 2020 02:54:43 +0530
Subject: [Numpy-discussion] beginner introduction to group
Message-ID: 

On Sat, 25 Apr 2020 01:59:57 Robert Kern wrote:
Welcome! Do you have an idea what you would like to work on?

Hi Robert,
Nothing specific for now, but I am at present trying to work on Issue
#15961, titled "Einsum indexing very fragile, because it tests for int
(and int64 is not int)".

Tina Oberoi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
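The behaviour behind that issue is easy to reproduce; on CPython 3, NumPy
integer scalars do not subclass the builtin int:

import numpy as np

print(isinstance(np.int64(3), int))         # False: np.int64 is not a Python int
print(isinstance(np.int64(3), np.integer))  # True: the robust check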
From ilhanpolat at gmail.com  Sun Apr 26 09:24:52 2020
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Sun, 26 Apr 2020 15:24:52 +0200
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: 
References: <1e1edc6b7e204fb152b0ab539793e665df635eac.camel@sipsolutions.net>
Message-ID: 

I agree that parking all these in a secondary namespace sounds a better
option; I can't say that I feel for the word "typing" though. There are
already too many: type, dtype, ctypeslib, etc. Maybe we can go for a bit
more distant name like "numpy.annotations" or whatever.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josh.craig.wilson at gmail.com  Sun Apr 26 17:17:21 2020
From: josh.craig.wilson at gmail.com (Joshua Wilson)
Date: Sun, 26 Apr 2020 14:17:21 -0700
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: 
References: <1e1edc6b7e204fb152b0ab539793e665df635eac.camel@sipsolutions.net>
Message-ID: 

To try and add some more data points to the conversation:

> Maybe we can go for a bit more distant name like "numpy.annotations"
> or whatever.

Interestingly, this was proposed independently here:

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619131274

Related to that, Ralf was opposed to numpy.typing because it would
shadow a stdlib module name:

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619123629

But, types is _also_ a stdlib module name. Maybe the above points give
some extra weight to "numpy.annotations"?

> Unless we anticipate adding a long list of type aliases (more than the
> three suggested so far)

While working on some types in SciPy here:

https://github.com/scipy/scipy/pull/11936#discussion_r415280894

we ran into the issue of typing things that are "integer types" or
"floating types". For the time being we just inlined a definition like
Union[float, np.floating], but conceivably we would want to unify those
definitions somewhere instead of redefining them in every project.
(Note that existing types like SupportsInt etc. were not what we
wanted.) This perhaps suggests that the ultimate number of type aliases
might be larger than we initially thought.
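The SciPy inlining mentioned above amounts to aliases along these lines
(the Union spellings come from the thread; the alias names are made up
here):

from typing import Union

import numpy as np

FloatLike = Union[float, np.floating]  # "floating types"
IntLike = Union[int, np.integer]       # "integer types"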
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net  Sun Apr 26 18:09:39 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 26 Apr 2020 17:09:39 -0500
Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
In-Reply-To: 
References: 
Message-ID: 

On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> > tl;dr a new `force` keyword would be better
>
> I agree, "copy" is not a good description of this desired coercion
> behavior. A new keyword argument like "force" would be much clearer.

That seems fine and practical. But, in the end it seems to me that the
`force=` keyword just means that some projects want to teach their users
that:

1. `np.asarray()` can be expensive (and may always copy)
2. `np.asarray()` always loses type properties

while others do not choose to teach about it. There seems very little or
even no "promise" attached to either `force=True` or `force=False`.

In the end, the question is whether sparse will actually want to
implement `force=True` if the main reason we add it is for library use.
There is no difference between a visualization library and numpy: in both
cases the user's memory will blow up just the same.

As for PyTorch, is `.detach()` even a good reason? Maybe I am missing
things, but:

>>> torch.ones(10, requires_grad=True) + np.arange(10)
# RuntimeError: Can't call numpy() on Variable that requires grad. Use
# var.detach().numpy() instead.

So arguably, there is no type-safety concern due to `.detach()`. There is
an (obvious) general loss of type information that always occurs with an
`np.asarray` call. But I do not see that creating any openings for bugs
here, due to the wisdom of not allowing the above operation.

In fact, it actually seems much worse for xarray, or pandas. They do
support the above operation and will potentially mess up if the arange
was previously an xarray with a matching index, but different order.

I am very much in favor of adding such things, but I still lack a bit of
clarity as to whom we would be helping. If end-users will actually use
`np.asarray(..., force=True)` over special methods, then great! But I am
currently not sure the type-safety argument is all that big of a point.
And the performance or memory-blowup argument remains true even for
visualization libraries (where the array is purely input and never
output as such).

But yes, "never copy" is a somewhat different extension to `__array__`
and `np.asarray`. It guarantees high speed and in-place behaviour which
is useful for different settings.

- Sebastian
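For reference, the "copy if necessary" default that the `force=` and
never-copy discussion keeps circling can be checked directly:

import numpy as np

x = np.arange(5)
assert np.asarray(x) is x            # asarray: no copy when none is needed
y = np.asarray(x, dtype=np.float64)  # a dtype change forces a copy
assert not np.shares_memory(x, y)
z = np.array(x)                      # np.array copies by default
assert not np.shares_memory(x, z)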
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From melissawm at gmail.com  Mon Apr 27 06:50:18 2020
From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=)
Date: Mon, 27 Apr 2020 07:50:18 -0300
Subject: [Numpy-discussion] Documentation Team Meeting - Monday April 27
In-Reply-To: 
References: 
Message-ID: 

Hi all!

Sorry for the late reminder, but today (April 27) we have another
documentation team meeting at 3PM UTC**. If you wish to join on Zoom, you
need to use this link

https://zoom.us/j/420005230

Here's the permanent hackmd document with the meeting notes:

https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around!

** You can click this link to get the correct time at your timezone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20200406T15&p1=1440&ah=1

- Melissa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com  Mon Apr 27 10:05:29 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 27 Apr 2020 16:05:29 +0200
Subject: [Numpy-discussion] Google Season of Docs Ideas
In-Reply-To: 
References: 
Message-ID: 

On Mon, Apr 20, 2020 at 7:16 PM Melissa Mendonça wrote:

> Hello all,
>
> As some of you may know, we are aiming to participate in the Google
> Season of Docs program [1] again this year. The deadline for
> applications from open source organizations is May 4, so I've started
> on a proposal (heavily based on last year's version) here:
>
> https://github.com/numpy/numpy/wiki/Google-Season-of-Docs-2020-Project-Ideas
>
> If you have suggestions, especially for concrete project ideas, that
> would be great. Also, keep in mind this project is aimed at technical
> writers and not developers, so the focus should be mainly on
> writing/reviewing/organizing documentation content.

I have added a second topic idea specifically on finding, curating, and
adapting tutorials and other educational materials from outside NumPy.

Cheers,
Ralf

> Cheers,
>
> Melissa
>
> [1] https://developers.google.com/season-of-docs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ilhanpolat at gmail.com  Mon Apr 27 13:49:59 2020
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Mon, 27 Apr 2020 19:49:59 +0200
Subject: [Numpy-discussion] Feelings about type aliases in NumPy
In-Reply-To: 
References: 
Message-ID: 

> Interestingly this was proposed independently here:

Wow, apologies for missing the entire thread about it and the noise.
On Sun, Apr 26, 2020 at 11:19 PM Joshua Wilson wrote: > To try and add some more data points to the conversation: > > > Maybe we can go for a bit more distant name like "numpy.annotations" or > whatever. > > Interestingly this was proposed independently here: > > https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619131274 > > Related to that, Ralf was opposed to numpy.typing because it would > shadow a stdlib module name: > > https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619123629 > > But, types is _also_ a stdlib module name. Maybe the above points give > some extra weight to "numpy.annotations"? > > > Unless we anticipate adding a long list of type aliases (more than the > three suggested so far) > > While working on some types in SciPy here: > > https://github.com/scipy/scipy/pull/11936#discussion_r415280894 > > we ran into the issue of typing things that are "integer types" or > "floating types". For the time being we just inlined a definition like > Union[float, np.floating], but conceivably we would want to unify > those definitions somewhere instead of redefining them in every > project. (Note that existing types like SupportsInt etc. were not what > we wanted.) This perhaps suggests that the ultimate number of type > aliases might be larger than we initially thought. > > On Sun, Apr 26, 2020 at 6:25 AM Ilhan Polat wrote: > > > > I agree that parking all these in a secondary namespace sounds a better > option, can't say that I feel for the word "typing" though. There are > already too many type, dtype, ctypeslib etc. Maybe we can go for a bit more > distant name like "numpy.annotations" or whatever. > > > > On Sat, Apr 25, 2020 at 8:51 AM Kevin Sheppard < > kevin.k.sheppard at gmail.com> wrote: > >> > >> Typing is for library developers more than end users. I would also > worry that putting it into the top level might discourage other typing > classes since it is more difficult to add to the top level than to a lower > level module. np.typing seems very clear to me. > >> > >> On Sat, Apr 25, 2020, 07:41 Stephan Hoyer wrote: > >>> > >>> > >>> > >>> On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >>>> > >>>> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote: > >>>> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote: > >>>> > > But, Stephan pointed out that it might be confusing to users for > >>>> > > objects to only exist at typing time, so we came around to the > >>>> > > question of whether people are open to the idea of including the > >>>> > > type > >>>> > > aliases in NumPy itself. Ralf's concrete proposal was to make a > >>>> > > module > >>>> > > numpy.types (or maybe numpy.typing) to hold the aliases so that > >>>> > > they > >>>> > > don't pollute the top-level namespace. The module would initially > >>>> > > contain the types > >>>> > > >>>> > That sounds very sensible. Having types available with NumPy should > >>>> > also encourage their use, especially if we can add some > documentation > >>>> > around it. > >>>> > >>>> I agree, I might have a small tendency for `numpy.types` if we ever > >>>> find any usage other than direct typing that may be the better name? > >>> > >>> > >>> Unless we anticipate adding a long list of type aliases (more than the > three suggested so far), I would lean towards adding ArrayLike to the top > level NumPy namespace as np.ArrayLike. > >>> > >>> Type annotations are becoming an increasingly core part of modern > Python code. 
We should make it easy to appropriately type check functions > that act on NumPy arrays, and a top level np.ArrayLike is definitely more > convenient than np.types.ArrayLike. > >>>> Out of curiosity, I guess `ArrayLike` would be an ABC that a > >>>> downstream project can register with? > >>> > >>> ArrayLike will be a typing Protocol, automatically recognizing > attributes like __array__ to indicate that something can be cast to an > array. > >>>> > >>>> - Sebastian > >>>> > >>>> > > >>>> > Stéfan > >>>> > _______________________________________________ > >>>> > NumPy-Discussion mailing list > >>>> > NumPy-Discussion at python.org > >>>> > https://mail.python.org/mailman/listinfo/numpy-discussion > >>>> > >>>> _______________________________________________ > >>>> NumPy-Discussion mailing list > >>>> NumPy-Discussion at python.org > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at python.org > >>> https://mail.python.org/mailman/listinfo/numpy-discussion > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at python.org > >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
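[To make the aliases discussed in this thread concrete, a minimal sketch of the kind of module under discussion. The module name and every alias below are placeholders following the proposals in the thread (`ArrayLike` as a typing Protocol recognizing `__array__`, a `Union[float, np.floating]`-style alias), not a released NumPy API. Requires Python 3.8+ for typing.Protocol.]

    # hypothetical contents of a numpy.typing-style module
    from typing import Any, Protocol, Sequence, Union, runtime_checkable

    import numpy as np

    @runtime_checkable
    class SupportsArray(Protocol):
        # anything that can be cast to an ndarray via __array__
        def __array__(self) -> np.ndarray: ...

    ArrayLike = Union[bool, int, float, complex,
                      Sequence[Any], SupportsArray]
    FloatLike = Union[float, np.floating]   # the "floating types" case

    # a downstream function annotated with the aliases
    def center(data: ArrayLike) -> np.ndarray:
        arr = np.asarray(data)
        return arr - arr.mean()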
From numpy_gsod at bigriver.xyz Mon Apr 27 15:32:09 2020 From: numpy_gsod at bigriver.xyz (Ben Nathanson) Date: Mon, 27 Apr 2020 15:32:09 -0400 Subject: [Numpy-discussion] Google Season of Docs Ideas Message-ID: > > NumPy serves many kinds of users....The challenge: provide ways to guide > those users to the parts of the documentation most relevant to them. > I have a thought on how to approach this. We know many of the communities NumPy serves; let's next identify (for ourselves, not the proposal) what each of them needs. It could be as simple as: *Educator* - knows... - needs to know... *Researcher* - knows... - needs to know... A table like that would be useful for self-assessment and planning. It helps answer questions like: - Which communities are we most shortchanging right now? - Which communities do we feel most strongly about (our largest base, most disadvantaged, etc.)? - If doc D is our next doc, does it help those communities? Or maybe we want to go round-robin through communities with each new doc. - What assumptions can a writer make about audience background? We're also then equipped to bring user categories out to a web page and meet the big-tent challenge head-on, with links like: - If you're an educator... - If you're a researcher... each one taking the user to an Educator, Researcher,..., page containing links to the information they're most likely to want. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bennet at umich.edu Mon Apr 27 16:11:31 2020 From: bennet at umich.edu (Bennet Fauber) Date: Mon, 27 Apr 2020 16:11:31 -0400 Subject: [Numpy-discussion] Google Season of Docs Ideas In-Reply-To: References: Message-ID: I think I would add to this categorization which science 'domain' or area. Researchers' needs in agriculture and sociology may differ much more from each other than educators' and researchers' needs within those fields differ from each other. So, if there is an up-and-coming area of study that is just starting to make its presence felt in the NumPy community, they might be a good target audience, and they might well themselves be working on material such as what we are looking for? On Mon, Apr 27, 2020 at 3:33 PM Ben Nathanson wrote: >> >> NumPy serves many kinds of users....The challenge: provide ways to guide those users to the parts of the documentation most relevant to them. > > > I have a thought on how to approach this. We know many of the communities NumPy serves; let's next identify (for ourselves, not the proposal) what each of them needs. It could be as simple as: > > Educator > > knows... > needs to know... > > Researcher > > knows... > needs to know... > > > A table like that would be useful for self-assessment and planning. It helps answer questions like: > > Which communities are we most shortchanging right now? > Which communities do we feel most strongly about (our largest base, most disadvantaged, etc.)? > If doc D is our next doc, does it help those communities? Or maybe we want to go round-robin through communities with each new doc. > What assumptions can a writer make about audience background? > > We're also then equipped to bring user categories out to a web page and meet the big-tent challenge head-on, with links like: > > If you're an educator... > If you're a researcher... > > each one taking the user to an Educator, Researcher,..., page containing links to the information they're most likely to want. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Tue Apr 28 05:51:13 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 28 Apr 2020 11:51:13 +0200 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: References: Message-ID: On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg wrote: > On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote: > > On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers > > wrote: > > > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser < > > > wieser.eric+numpy at gmail.com> wrote: > > > > > > > Perhaps worth mentioning that we've discussed this sort of API > > > > before, in https://github.com/numpy/numpy/pull/11897. > > > > > > > > Under that proposal, the api would be something like: > > > > > > > > * `copy=True` - always copy, like it is today > > > > * `copy=False` - copy if needed, like it is today > > > > * `copy=np.never_copy` - never copy, throw an exception if not > > > > possible > > > > > > There's a couple of issues I see with using `copy` for __array__: > > > - copy is already weird (False doesn't mean no), and a [bool, > > > some_obj_or_str] keyword isn't making that better > > > - the behavior we're talking about can do more than copying, e.g. for > > > PyTorch it would modify the autograd graph by adding detach(), and for > > > sparse it's not just "make a copy" (which implies doubling memory use) but > > > it densifies, which can massively blow up the memory. > > > - I'm -1 on adding things to the main namespace (never_copy) for something > > > that can be handled differently (like a string, or a new keyword) > > > > > > tl;dr a new `force` keyword would be better > > > > I agree, 'copy' is not a good description of this desired coercion > > behavior. > > > > A new keyword argument like 'force' would be much clearer. > > That seems fine and practical. But, in the end it seems to me that the > `force=` keyword just means that some projects want to teach their > users that: > > 1. `np.asarray()` can be expensive (and may always copy) > 2. `np.asarray()` always loses type properties > > while others do not choose to teach about it. There seems very little > or even no "promise" attached to either `force=True` or `force=False`. > > In the end, the question is whether sparse will actually want to > implement `force=True` if the main reason we add it is library use. That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can answer for the former here. For SciPy that should be decided on the scipy-dev list, but my opinion would be: yes to adding __array__ that raises TypeError by default, and converts with `force=True`.
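[A short sketch of what that opt-in could look like for a sparse container. `SparseThing` and the `force` argument to `__array__` are illustrative assumptions, not existing scipy.sparse or NumPy behaviour:]

    import numpy as np

    class SparseThing:
        def __init__(self, shape, items):
            self.shape = shape
            self._items = dict(items)      # (row, col) -> value

        def todense(self):
            out = np.zeros(self.shape)
            for (i, j), v in self._items.items():
                out[i, j] = v
            return out

        def __array__(self, dtype=None, force=False):
            if not force:
                # Default: refuse the silent densification.
                raise TypeError(
                    "densifying is expensive; use "
                    "np.asarray(x, force=True) to opt in")
            out = self.todense()
            return out if dtype is None else out.astype(dtype)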
> There is no difference between a visualization library and numpy. In > both cases the user's memory will blow up just the same. > > As for PyTorch, is `.detach()` even a good reason? Maybe I am missing > things, but: > > >>> torch.ones(10, requires_grad=True) + np.arange(10) > # RuntimeError: Can't call numpy() on Variable that requires grad. Use > var.detach().numpy() instead. > > So arguably, there is no type-safety concern due to `.detach()`. I'm not sure what the question is here; no one mentioned type-safety. The PyTorch maintainers have already said they're fine with adding a force keyword. > There is an (obvious) general loss of type information that always occurs > with an `np.asarray` call. > But I do not see that creating any openings for bugs here, due to the > wisdom of not allowing the above operation. > In fact, it actually seems much worse for xarray, or pandas. They > do support the above operation and will potentially mess up if the > arange was previously an xarray with a matching index, but different > order. > > I am very much in favor of adding such things, but I still lack a bit > of clarity as to whom we would be helping? See Juan's first email. I personally am ambivalent on this proposal, but if Juan and the Napari devs really want it, that's good enough for me. Cheers, Ralf > If end-users will actually use `np.asarray(..., force=True)` over > special methods, then great! But I am currently not sure the type-safety > argument is all that big of a point. And the performance or > memory-blowup argument remains true even for visualization libraries > (where the array is purely input and never output as such). > > But yes, "never copy" is a somewhat different extension to `__array__` > and `np.asarray`. It guarantees high speed and in-place behaviour which > is useful for different settings. > > - Sebastian > > > > Cheers, > > > Ralf > > > > > > > I think the discussion stalled on the precise spelling of the third > > > > option. > > > > > > > > `__array__` was not discussed there, but it seems like adding the > > > > `copy` argument to `__array__` would be a perfectly reasonable > > > > extension. > > > > > > > > Eric > > > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > > jni at fastmail.com> wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > One bit of expressivity we would miss is 'copy if necessary, but > > > > > > otherwise don't bother', but there are workarounds to this. > > > > > > > > > > After a side discussion with Stéfan van der Walt, we came up with > > > > > `allow_copy=True`, which would express to the downstream library that we > > > > > don't mind waiting, but that zero-copy would also be ok. > > > > > > > > > > This sounds like the sort of thing that is use case driven. If enough > > > > > projects want to use it, then I have no objections to adding the keyword. > > > > > OTOH, we need to be careful about adding too many interoperability tricks > > > > > as they complicate the code and make it hard for folks to determine the > > > > > best solution. Interoperability is a hot topic and we need to be careful > > > > > not to leave behind too many experiments in the NumPy code. Do you > > > > > have any other ideas of how to achieve the same effect? > > > > > > > > > > Personally, I don't have any other ideas, but would be happy to hear > > > > > some! > > > > > > > > > > My view regarding API/experiment creep is that `__array__` is the oldest > > > > > and most basic of all the interop tricks and that this can be safely > > > > > maintained for future generations. Currently it only takes `dtype=` as a > > > > > keyword argument, so it is a very lean API. I think this particular use > > > > > case is very natural and I've encountered the reluctance to implicitly copy > > > > > twice, so I expect it is reasonably common. > > > > > > > > > > Regarding difficulty in determining the best solution, I would be happy > > > > > to contribute to the dispatch basics guide together with the new kwarg. I > > > > > agree that the protocols are getting quite numerous and I couldn't find a > > > > > single place that gathers all the best practices together. But, to > > > > > reiterate my point: `__array__` is the simplest of these and I think this > > > > > keyword is pretty safe to add. > > > > > > > > > > For ease of discussion, here are the API options discussed so far, as > > > > > well as a few extra that I don't like but might trigger other ideas: > > > > > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or None > > > > > -> leave it to the duck array to decide > > > > > np.asarray(my_duck_array, copy=True) # always copies, but, if supported > > > > > by the duck array, defers to it for the copy > > > > > np.asarray(my_duck_array, copy='allow') # could take values 'allow', > > > > > 'force', 'no', True(='force'), False(='no') > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # separate > > > > > concepts, but unclear what force_copy=True, allow_copy=False means! > > > > > np.asarray(my_duck_array, force=True) > > > > > > > > > > Juan. > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Tue Apr 28 06:00:40 2020 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 28 Apr 2020 10:00:40 +0000 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: References: , Message-ID: Hi! Yes, I would advocate for a `force=` kwarg; personally I don't think it's explicit enough, but it's probably as explicit as can be given NumPy's API. Personally, I'd also raise a warning within PyData/Sparse, and I hope it's in big bold letters in the NumPy docs to be careful with this. Get Outlook for iOS ________________________________ From: NumPy-Discussion on behalf of Ralf Gommers Sent: Tuesday, April 28, 2020 11:51:13 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg > wrote: On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote: > On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers > wrote: > > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser < > > wieser.eric+numpy at gmail.com> wrote: > > > > > Perhaps worth mentioning that we've discussed this sort of API > > > before, in https://github.com/numpy/numpy/pull/11897. > > > > > > Under that proposal, the api would be something like: > > > > > > * `copy=True` - always copy, like it is today > > > * `copy=False` - copy if needed, like it is today > > > * `copy=np.never_copy` - never copy, throw an exception if not > > > possible > > > > There's a couple of issues I see with using `copy` for __array__: > > - copy is already weird (False doesn't mean no), and a [bool, > > some_obj_or_str] keyword isn't making that better > > - the behavior we're talking about can do more than copying, e.g. for > > PyTorch it would modify the autograd graph by adding detach(), and for > > sparse it's not just "make a copy" (which implies doubling memory use) but > > it densifies, which can massively blow up the memory. > > - I'm -1 on adding things to the main namespace (never_copy) for something > > that can be handled differently (like a string, or a new keyword) > > > > tl;dr a new `force` keyword would be better > > I agree, 'copy' is not a good description of this desired coercion > behavior. > > A new keyword argument like 'force' would be much clearer. > That seems fine and practical. But, in the end it seems to me that the `force=` keyword just means that some projects want to teach their users that: 1. `np.asarray()` can be expensive (and may always copy) 2. `np.asarray()` always loses type properties while others do not choose to teach about it. There seems very little or even no "promise" attached to either `force=True` or `force=False`. In the end, the question is whether sparse will actually want to implement `force=True` if the main reason we add it is library use. That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can answer for the former here. For SciPy that should be decided on the scipy-dev list, but my opinion would be: yes to adding __array__ that raises TypeError by default, and converts with `force=True`. There is no difference between a visualization library and numpy. In both cases the user's memory will blow up just the same. As for PyTorch, is `.detach()` even a good reason? Maybe I am missing things, but: >>> torch.ones(10, requires_grad=True) + np.arange(10) # RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead. So arguably, there is no type-safety concern due to `.detach()`. I'm not sure what the question is here; no one mentioned type-safety. The PyTorch maintainers have already said they're fine with adding a force keyword. There is an (obvious) general loss of type information that always occurs with an `np.asarray` call. But I do not see that creating any openings for bugs here, due to the wisdom of not allowing the above operation. In fact, it actually seems much worse for xarray, or pandas. They do support the above operation and will potentially mess up if the arange was previously an xarray with a matching index, but different order. I am very much in favor of adding such things, but I still lack a bit of clarity as to whom we would be helping? See Juan's first email. I personally am ambivalent on this proposal, but if Juan and the Napari devs really want it, that's good enough for me. Cheers, Ralf If end-users will actually use `np.asarray(..., force=True)` over special methods, then great! But I am currently not sure the type-safety argument is all that big of a point. And the performance or memory-blowup argument remains true even for visualization libraries (where the array is purely input and never output as such). But yes, "never copy" is a somewhat different extension to `__array__` and `np.asarray`. It guarantees high speed and in-place behaviour which is useful for different settings. - Sebastian > > Cheers, > > Ralf > > > > I think the discussion stalled on the precise spelling of the third > > > option. > > > > > > `__array__` was not discussed there, but it seems like adding the > > > `copy` argument to `__array__` would be a perfectly reasonable > > > extension. > > > > > > Eric > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > jni at fastmail.com> wrote: > > > > > > > Hi everyone, > > > > > > > > One bit of expressivity we would miss is 'copy if necessary, but > > > > > otherwise don't bother', but there are workarounds to this. > > > > > > > > After a side discussion with Stéfan van der Walt, we came up with > > > > `allow_copy=True`, which would express to the downstream library that we > > > > don't mind waiting, but that zero-copy would also be ok. > > > > > > > > This sounds like the sort of thing that is use case driven. If enough > > > > projects want to use it, then I have no objections to adding the keyword. > > > > OTOH, we need to be careful about adding too many interoperability tricks > > > > as they complicate the code and make it hard for folks to determine the > > > > best solution. Interoperability is a hot topic and we need to be careful > > > > not to leave behind too many experiments in the NumPy code. Do you > > > > have any other ideas of how to achieve the same effect? > > > > > > > > Personally, I don't have any other ideas, but would be happy to hear > > > > some! > > > > > > > > My view regarding API/experiment creep is that `__array__` is the oldest > > > > and most basic of all the interop tricks and that this can be safely > > > > maintained for future generations. Currently it only takes `dtype=` as a > > > > keyword argument, so it is a very lean API. I think this particular use > > > > case is very natural and I've encountered the reluctance to implicitly copy > > > > twice, so I expect it is reasonably common. > > > > > > > > Regarding difficulty in determining the best solution, I would be happy > > > > to contribute to the dispatch basics guide together with the new kwarg. I > > > > agree that the protocols are getting quite numerous and I couldn't find a > > > > single place that gathers all the best practices together. But, to > > > > reiterate my point: `__array__` is the simplest of these and I think this > > > > keyword is pretty safe to add. > > > > > > > > For ease of discussion, here are the API options discussed so far, as > > > > well as a few extra that I don't like but might trigger other ideas: > > > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or None > > > > -> leave it to the duck array to decide > > > > np.asarray(my_duck_array, copy=True) # always copies, but, if supported > > > > by the duck array, defers to it for the copy > > > > np.asarray(my_duck_array, copy='allow') # could take values 'allow', > > > > 'force', 'no', True(='force'), False(='no') > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # separate > > > > concepts, but unclear what force_copy=True, allow_copy=False means! > > > > np.asarray(my_duck_array, force=True) > > > > > > > > Juan. > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL:
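[On the consumer side, a sketch of how a visualization function could use the keyword, under the assumption that `force=` lands in `np.asarray`; the except-clause fallback is what such code would do against current NumPy:]

    import numpy as np

    def show_image(arr):
        # Visualization is an "end station" for the data, so paying for a
        # copy / densification / device transfer is acceptable here.
        try:
            plain = np.asarray(arr, force=True)   # proposed API
        except TypeError:
            plain = np.asarray(arr)               # current NumPy
        return plain.clip(0.0, 1.0)               # ...then hand to drawing code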
From sebastian at sipsolutions.net Tue Apr 28 10:58:01 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 28 Apr 2020 09:58:01 -0500 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: References: Message-ID: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote: > > So arguably, there is no type-safety concern due to `.detach()`. > > I'm not sure what the question is here; no one mentioned type-safety. The > PyTorch maintainers have already said they're fine with adding a force > keyword. But type-safety is the reason to distinguish between: * np.asarray(tensor) * np.asarray(tensor, force=True) Similar to: * operator.index(obj) * int(obj) # converts; less type-safe (strings, floats)!
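[The analogy is easy to demonstrate with plain Python; this is standard-library behaviour, not new API:]

    >>> import operator
    >>> int("3"), int(3.7)     # permissive: parses strings, truncates floats
    (3, 3)
    >>> operator.index(3.7)    # type-safe: only true integers pass
    Traceback (most recent call last):
      ...
    TypeError: 'float' object cannot be interpreted as an integer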
I actually mentioned 3 reasons in my email: 1. Teach and inform users (about the next two mainly) 2. Type-safety 3. Expensive conversion And only type-safety is related to `.detach()`, mentioning that there may not be a clear story about the usage in that case. (continued below) > > I am very much in favor of adding such things, but I still lack a bit > > of clarity as to whom we would be helping? > > See Juan's first email. I personally am ambivalent on this proposal, > but if Juan and the Napari devs really want it, that's good enough for me. Of course I read it, twice, but it is only good enough for me if we actually *solve the issue*, and for that I want to know which issue we are solving :), it seems obvious, but I am not so sure... That brings us to the other two reasons: Teaching and informing users: If Napari uses `force=True` indiscriminately, it is not very clear to the user whether or not the operation is expensive. I.e. the user can learn it is when using `np.asarray(sparse_arr)` with other libraries. But they are not notified that `napari.vis_func(sparse_arr)` might kill their computer. So the "teaching" part can still partially work, but it does not inform the user well anymore on whether or not a function will blow up memory. Expensive conversion: If the main reason is expensive conversions, however, then, as a library I would probably just use it for half my API, since copying from GPU to CPU will still be much faster than my own function. Generally: I want to help Napari, but it seems like there may be more to this, and it may be good to finish these thoughts before making a call. E.g. Napari wants to use it, but do the array-providers want Napari to use it? For sparse, Hameer just mentioned that he still would want big warnings both during the operation and in the `np.asarray` documentation. If we put such big warnings there, we should have an idea of who we want to ignore that warning? (Napari yes, sklearn sometimes, ...?) -> Is "whatever the library feels is right" good enough? And if the conversion still gives warnings for some array-objects, have we actually gained much? -> Maybe we do, end-users may be happy to ignore those warnings... The one clear use-case for `force=True` is the end-user. Just like no library uses `int(obj)`, but end-users can use it very nicely. I am happy to help the end-user in this case, but if that is the target audience we may want to _discourage_ Napari from using `force=True` and encourage sparse not to put any RuntimeWarnings on it! - Sebastian > Cheers, > Ralf > > If end-users will actually use `np.asarray(..., force=True)` over > > special methods, then great! But I am currently not sure the type-safety > > argument is all that big of a point. And the performance or memory-blowup > > argument remains true even for visualization libraries (where the array is > > purely input and never output as such). > > > > But yes, "never copy" is a somewhat different extension to `__array__` > > and `np.asarray`. It guarantees high speed and in-place behaviour which > > is useful for different settings. > > > > - Sebastian > > > > > Cheers, > > > Ralf > > > > > > > I think the discussion stalled on the precise spelling of the third > > > > option. > > > > > > > > `__array__` was not discussed there, but it seems like adding the > > > > `copy` argument to `__array__` would be a perfectly reasonable > > > > extension. > > > > > > > > Eric > > > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > > jni at fastmail.com> wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > One bit of expressivity we would miss is 'copy if necessary, but > > > > > > otherwise don't bother', but there are workarounds to this. > > > > > > > > > > After a side discussion with Stéfan van der Walt, we came up with > > > > > `allow_copy=True`, which would express to the downstream library that we > > > > > don't mind waiting, but that zero-copy would also be ok. > > > > > > > > > > This sounds like the sort of thing that is use case driven. If enough > > > > > projects want to use it, then I have no objections to adding the keyword. > > > > > OTOH, we need to be careful about adding too many interoperability tricks > > > > > as they complicate the code and make it hard for folks to determine the > > > > > best solution. Interoperability is a hot topic and we need to be careful > > > > > not to leave behind too many experiments in the NumPy code. Do you > > > > > have any other ideas of how to achieve the same effect? > > > > > > > > > > Personally, I don't have any other ideas, but would be happy to hear > > > > > some! > > > > > > > > > > My view regarding API/experiment creep is that `__array__` is the oldest > > > > > and most basic of all the interop tricks and that this can be safely > > > > > maintained for future generations. Currently it only takes `dtype=` as a > > > > > keyword argument, so it is a very lean API. I think this particular use > > > > > case is very natural and I've encountered the reluctance to implicitly copy > > > > > twice, so I expect it is reasonably common. > > > > > > > > > > Regarding difficulty in determining the best solution, I would be happy > > > > > to contribute to the dispatch basics guide together with the new kwarg. I > > > > > agree that the protocols are getting quite numerous and I couldn't find a > > > > > single place that gathers all the best practices together. But, to > > > > > reiterate my point: `__array__` is the simplest of these and I think this > > > > > keyword is pretty safe to add. > > > > > > > > > > For ease of discussion, here are the API options discussed so far, as > > > > > well as a few extra that I don't like but might trigger other ideas: > > > > > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or None > > > > > -> leave it to the duck array to decide > > > > > np.asarray(my_duck_array, copy=True) # always copies, but, if supported > > > > > by the duck array, defers to it for the copy > > > > > np.asarray(my_duck_array, copy='allow') # could take values 'allow', > > > > > 'force', 'no', True(='force'), False(='no') > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # separate > > > > > concepts, but unclear what force_copy=True, allow_copy=False means! > > > > > np.asarray(my_duck_array, force=True) > > > > > > > > > > Juan. > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue Apr 28 12:49:35 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 28 Apr 2020 11:49:35 -0500 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> References: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> Message-ID: On Tue, 2020-04-28 at 09:58 -0500, Sebastian Berg wrote: > On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote: > > > > So arguably, there is no type-safety concern due to `.detach()`. > > > > I'm not sure what the question is here; no one mentioned type-safety. The > > PyTorch maintainers have already said they're fine with adding a force > > keyword. > > But type-safety is the reason to distinguish between: > > * np.asarray(tensor) > * np.asarray(tensor, force=True) > > Similar to: > > * operator.index(obj) > * int(obj) # converts; less type-safe (strings, floats)! > > I actually mentioned 3 reasons in my email: > > 1. Teach and inform users (about the next two mainly) > 2. Type-safety > 3. Expensive conversion > > And only type-safety is related to `.detach()`, mentioning that there > may not be a clear story about the usage in that case. > (Sorry, something got broken here.) The question is what PyTorch's reasons are to feel `np.asarray(tensor)` should not work generally. I for one thought it was type-safety with regard to `.detach()`. And then I was surprised to realize that type-safety might not be a great reason to reject an implicit `.detach()` within `np.asarray(tensor)`. In any case, all the long talk is simply that I first want to be clear on what the concerns are that make libraries reject `np.asarray(tensor)`. And then, I want to be clear that adding `force=True` will actually solve those concerns. And I was surprised myself that this became very much unclear to me. Again, one reason for it being not clear to me is that half the ecosystem could potentially just always use `force=True`. So there must be some "good usage" and some "bad usage" and I would like to know what that is. - Sebastian > (continued below) > > > > I am very much in favor of adding such things, but I still lack a bit > > > of clarity as to whom we would be helping? > > > > See Juan's first email. I personally am ambivalent on this proposal, > > but if Juan and the Napari devs really want it, that's good enough for me. > > Of course I read it, twice, but it is only good enough for me if we > actually *solve the issue*, and for that I want to know which issue we > are solving :), it seems obvious, but I am not so sure... > > That brings us to the other two reasons: > > Teaching and informing users: > > If Napari uses `force=True` indiscriminately, it is not very clear to > the user whether or not the operation is expensive. I.e. the user can > learn it is when using `np.asarray(sparse_arr)` with other libraries. > But they are not notified that `napari.vis_func(sparse_arr)` might kill > their computer. > > So the "teaching" part can still partially work, but it does not inform > the user well anymore on whether or not a function will blow up memory. > > Expensive conversion: > > If the main reason is expensive conversions, however, then, as a > library I would probably just use it for half my API, since copying > from GPU to CPU will still be much faster than my own function. > > Generally: > > I want to help Napari, but it seems like there may be more to this, and > it may be good to finish these thoughts before making a call. > > E.g. Napari wants to use it, but do the array-providers want Napari to > use it? > > For sparse, Hameer just mentioned that he still would want big warnings > both during the operation and in the `np.asarray` documentation. > If we put such big warnings there, we should have an idea of who we > want to ignore that warning? (Napari yes, sklearn sometimes, ...?) > > -> Is "whatever the library feels is right" good enough? > > And if the conversion still gives warnings for some array-objects, have > we actually gained much? > > -> Maybe we do, end-users may be happy to ignore those warnings... > > The one clear use-case for `force=True` is the end-user. Just like no > library uses `int(obj)`, but end-users can use it very nicely. > I am happy to help the end-user in this case, but if that is the target > audience we may want to _discourage_ Napari from using `force=True` and > encourage sparse not to put any RuntimeWarnings on it! > > - Sebastian > > > Cheers, > > Ralf > > > > > If end-users will actually use `np.asarray(..., force=True)` over > > > special methods, then great! But I am currently not sure the type-safety > > > argument is all that big of a point. And the performance or memory-blowup > > > argument remains true even for visualization libraries (where the array is > > > purely input and never output as such). > > > > > > But yes, "never copy" is a somewhat different extension to `__array__` > > > and `np.asarray`. It guarantees high speed and in-place behaviour which > > > is useful for different settings. > > > > > > - Sebastian > > > > > > > Cheers, > > > > Ralf > > > > > > > > > I think the discussion stalled on the precise spelling of the third > > > > > option. > > > > > > > > > > `__array__` was not discussed there, but it seems like adding the > > > > > `copy` argument to `__array__` would be a perfectly reasonable > > > > > extension. > > > > > > > > > > Eric > > > > > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > > > jni at fastmail.com> wrote: > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > One bit of expressivity we would miss is 'copy if necessary, but > > > > > > > otherwise don't bother', but there are workarounds to this. > > > > > > > > > > > > After a side discussion with Stéfan van der Walt, we came up with > > > > > > `allow_copy=True`, which would express to the downstream library that we > > > > > > don't mind waiting, but that zero-copy would also be ok. > > > > > > > > > > > > This sounds like the sort of thing that is use case driven. If enough > > > > > > projects want to use it, then I have no objections to adding the keyword. > > > > > > OTOH, we need to be careful about adding too many interoperability tricks > > > > > > as they complicate the code and make it hard for folks to determine the > > > > > > best solution. Interoperability is a hot topic and we need to be careful > > > > > > not to leave behind too many experiments in the NumPy code. Do you > > > > > > have any other ideas of how to achieve the same effect? > > > > > > > > > > > > Personally, I don't have any other ideas, but would be happy to hear > > > > > > some! > > > > > > > > > > > > My view regarding API/experiment creep is that `__array__` is the oldest > > > > > > and most basic of all the interop tricks and that this can be safely > > > > > > maintained for future generations. Currently it only takes `dtype=` as a > > > > > > keyword argument, so it is a very lean API. I think this particular use > > > > > > case is very natural and I've encountered the reluctance to implicitly copy > > > > > > twice, so I expect it is reasonably common. > > > > > > > > > > > > Regarding difficulty in determining the best solution, I would be happy > > > > > > to contribute to the dispatch basics guide together with the new kwarg. I > > > > > > agree that the protocols are getting quite numerous and I couldn't find a > > > > > > single place that gathers all the best practices together. But, to > > > > > > reiterate my point: `__array__` is the simplest of these and I think this > > > > > > keyword is pretty safe to add. > > > > > > > > > > > > For ease of discussion, here are the API options discussed so far, as > > > > > > well as a few extra that I don't like but might trigger other ideas: > > > > > > > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or None > > > > > > -> leave it to the duck array to decide > > > > > > np.asarray(my_duck_array, copy=True) # always copies, but, if supported > > > > > > by the duck array, defers to it for the copy > > > > > > np.asarray(my_duck_array, copy='allow') # could take values 'allow', > > > > > > 'force', 'no', True(='force'), False(='no') > > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # separate > > > > > > concepts, but unclear what force_copy=True, allow_copy=False means! > > > > > > np.asarray(my_duck_array, force=True) > > > > > > > > > > > > Juan. > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue Apr 28 12:55:20 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 28 Apr 2020 11:55:20 -0500 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday Message-ID: <1bc4e0d7160015e069ce5ec0d856561c8340db60.camel@sipsolutions.net> Hi all, There will be a NumPy Community meeting Wednesday April 29th at 1pm Pacific Time (20:00 UTC).
Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian From ralf.gommers at gmail.com Wed Apr 29 05:11:13 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 29 Apr 2020 11:11:13 +0200 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> References: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> Message-ID: On Tue, Apr 28, 2020 at 5:03 PM Sebastian Berg wrote: > On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote: > > > > So arguably, there is no type-safety concern due to `.detach()`. > > > > I'm not sure what the question is here; no one mentioned type-safety. > > The > > PyTorch maintainers have already said they're fine with adding a > > force > > keyword. > > But type-safety is the reason to distinguish between: > > * np.asarrau(tensor) > * np.asarray(tensor, force=True) > No it's not, the rationale given by library authors is expensive conversion / memory copies / side effects. `np.asarray(x)` is used all over the place, and can/will continue to be used by library authors. `force=True` is for cases where things like expensive conversion don't matter, like visualization - if you need a picture of an array then it helps, while the downside of writing inefficient/unreliable numerical code isn't present. > Similar to: > > * operator.index(obj) > * int(obj) # convert less type-safe (strings, floats)! > > I actually mentioned 3 reasons in my email: > > 1. Teach and Inform users (about the next two mainly) > 2. Type-safety > 3. Expensive conversion > > And only type-safety is related to `.detach()` mentioning that there > may not be clear story about the usage in that case. > > (continued below) > > > > > > > > > > > > > I am very much in favor of adding such things, but I still lack a > > > bit > > > of clarity as to whom we would be helping? > > > > > > > See Juan's first email. I personally am ambivalent on this proposal, > > but if > > Juan and the Napari devs really want it, that's good enough for me. > > Of course I read it, twice, but it is only good enough for me if we > actually *solve the issue*, and for that I want to know which issue we > are solving :), it seems obvious, but I am not so sure... > > That brings us to the other two reasons: > > Teaching and Informing users: > > If Napari uses `force=True` indiscriminately, it is not very clear to > the user about whether or not the operation is expensive. I.e. the > user can learn it is when using `np.asarray(sparse_arr)` with other > libraries. But they are not notified that `napari.vis_func(sparse_arr)` > might kill their computer. > > So the "Teaching" part can still partially work, but it does not inform > the user well anymore on whether or not a function will blow-up memory. > > Expensive Conversion: > > If the main reason is expensive conversions, however, than, as a > library I would probably just use it for half my API, since copying > from GPU to CPU will still be much faster than my own function. > > > Generally: > > I want to help Napari, but it seems like there may be more to this, and > it may be good to finish these thoughts before making a call. > > E.g. Napari wants to use it, but do the array-providers want Napari to > use it? 
> > For sparse Hameer just mentioned that he still would want big warnings > both during the operation and in the `np.asarray` documentation. > If we put such big warnings there, we should have an idea of who we > want to ignore that warning? (Napari yes, sklearn sometimes, ...?) > There clearly should not be warnings. And sklearn is irrelevant, it cannot use `force=True`. Ralf > -> Is "whatever the library feels right" good enough? > > And if the conversion still gives warnings for some array-objects, have > we actually gained much? > > -> Maybe we do, end-users may be happy to ignore those warnings... > > > The one clear use-case for `force=True` is the end-user. Just like no > library uses `int(obj)`, but end-users can use it very nicely. > I am happy to help the end-user in this case, but if that is the target > audience we may want to _discourage_ Napari from using `force=True` and > encourage sparse not to put any RuntimeWarnings on it! > > - Sebastian > > > > Cheers, > > Ralf > > > > > > > > > If end-users will actually use `np.asarray(..., force=True)` over > > > special methods, then great! But I am currently not sure the type- > > > safety argument is all that big of a point. And the performance or > > > memory-blowup argument remains true even for visualization > > > libraries > > > (where the array is purely input and never output as such). > > > > > > > > > But yes, "never copy" is a somewhat different extension to > > > `__array__` > > > and `np.asarray`. It guarantees high speed and in-place behaviour > > > which > > > is useful for different settings. > > > > > > - Sebastian > > > > > > > > > > > Cheers, > > > > > Ralf > > > > > > > > > > > > > > > > I think the discussion stalled on the precise spelling of the > > > > > > third > > > > > > option. > > > > > > > > > > > > `__array__` was not discussed there, but it seems like adding > > > > > > the > > > > > > `copy` > > > > > > argument to `__array__` would be a perfectly reasonable > > > > > > extension. > > > > > > > > > > > > Eric > > > > > > > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > > > > jni at fastmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > One bit of expressivity we would miss is ?copy if > > > > > > > necessary, > > > > > > > but > > > > > > > > otherwise don?t bother?, but there are workarounds to > > > > > > > > this. > > > > > > > > > > > > > > > > > > > > > > After a side discussion with St?fan van der Walt, we came > > > > > > > up > > > > > > > with > > > > > > > `allow_copy=True`, which would express to the downstream > > > > > > > library that we > > > > > > > don?t mind waiting, but that zero-copy would also be ok. > > > > > > > > > > > > > > This sounds like the sort of thing that is use case driven. > > > > > > > If > > > > > > > enough > > > > > > > projects want to use it, then I have no objections to > > > > > > > adding > > > > > > > the keyword. > > > > > > > OTOH, we need to be careful about adding too many > > > > > > > interoperability tricks > > > > > > > as they complicate the code and makes it hard for folks to > > > > > > > determine the > > > > > > > best solution. Interoperability is a hot topic and we need > > > > > > > to > > > > > > > be careful > > > > > > > not put too leave behind too many experiments in the NumPy > > > > > > > code. Do you > > > > > > > have any other ideas of how to achieve the same effect? 
> > > > > > > > > > > > > > > > > > > > > Personally, I don?t have any other ideas, but would be > > > > > > > happy to > > > > > > > hear > > > > > > > some! > > > > > > > > > > > > > > My view regarding API/experiment creep is that `__array__` > > > > > > > is > > > > > > > the oldest > > > > > > > and most basic of all the interop tricks and that this can > > > > > > > be > > > > > > > safely > > > > > > > maintained for future generations. Currently it only takes > > > > > > > `dtype=` as a > > > > > > > keyword argument, so it is a very lean API. I think this > > > > > > > particular use > > > > > > > case is very natural and I?ve encountered the reluctance to > > > > > > > implicitly copy > > > > > > > twice, so I expect it is reasonably common. > > > > > > > > > > > > > > Regarding difficulty in determining the best solution, I > > > > > > > would > > > > > > > be happy > > > > > > > to contribute to the dispatch basics guide together with > > > > > > > the > > > > > > > new kwarg. I > > > > > > > agree that the protocols are getting quite numerous and I > > > > > > > couldn?t find a > > > > > > > single place that gathers all the best practices together. > > > > > > > But, > > > > > > > to > > > > > > > reiterate my point: `__array__` is the simplest of these > > > > > > > and I > > > > > > > think this > > > > > > > keyword is pretty safe to add. > > > > > > > > > > > > > > For ease of discussion, here are the API options discussed > > > > > > > so > > > > > > > far, as > > > > > > > well as a few extra that I don?t like but might trigger > > > > > > > other > > > > > > > ideas: > > > > > > > > > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is > > > > > > > False, > > > > > > > or None > > > > > > > -> leave it to the duck array to decide > > > > > > > np.asarray(my_duck_array, copy=True) # always copies, but, > > > > > > > if > > > > > > > supported > > > > > > > by the duck array, defers to it for the copy > > > > > > > np.asarray(my_duck_array, copy=?allow?) # could take > > > > > > > values > > > > > > > ?allow?, > > > > > > > ?force?, ?no?, True(=?force?), False(=?no?) > > > > > > > np.asarray(my_duck_array, force_copy=False, > > > > > > > allow_copy=True) # > > > > > > > separate > > > > > > > concepts, but unclear what force_copy=True, > > > > > > > allow_copy=False > > > > > > > means! > > > > > > > np.asarray(my_duck_array, force=True) > > > > > > > > > > > > > > Juan. 
> > > > > > > _______________________________________________ > > > > > > > NumPy-Discussion mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni at fastmail.com Wed Apr 29 06:26:28 2020 From: jni at fastmail.com (Juan Nunez-Iglesias) Date: Wed, 29 Apr 2020 05:26:28 -0500 Subject: [Numpy-discussion] =?utf-8?q?Proposal=3A_add_=60force=3D=60_or_?= =?utf-8?b?YGNvcHk9YCBrd2FyZyB0byBgX19hcnJheV9fYCBpbnRlcmZhY2U=?= In-Reply-To: References: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> Message-ID: <90677c5f-a5f2-4531-9c9e-227035863e6f@www.fastmail.com> Hi everyone, and thank you Ralf for carrying the flag in my absence. =D Sebastian, the *primary* motivation behind avoiding detach() in PyTorch is listed in the original post of the PyTorch issue: > People not very familiar with `requires_grad` and cpu/gpu Tensors might go back and forth with numpy. For example doing pytorch -> numpy -> pytorch and backward on the last Tensor. This will backward without issue but not all the way to the first part of the code and won't raise any error. The PyTorch team are concerned that they will be overwhelmed with help requests if np.array() silently succeeds on a tensor with gradients. I definitely get that. Avoiding .cpu() is more straightforwardly about avoiding implicit expensive computation. > while others do not choose to teach about it. There seems very little > or even no "promise" attached to either `force=True` or `force=False`. NumPy can set a precedent through policy. The *only* reason client libraries would implement `__array__` is to play well with NumPy, so if NumPy documents that `force=True` should *always* succeed, we can expect client libraries to follow suit. At least the PyTorch devs have indicated that they would be open to this. > E.g. Napari wants to use it, but do the array-providers want Napari to use it? As Ralf pointed out, the PyTorch devs have already agreed to it. From the napari perspective, we'd be ok with leaving the decision on warnings to client libraries. We may or may not suppress them depending on user requests.
;) But the point is to have a way of saying "give me a NumPy array DAMMIT" without having to know about all the possible array libraries. Which are numerous and getting numerouser. Ralf, you said you don't want warnings -- even for sparse arrays? That was an area of concern for you on the PyTorch discussion side. > And if the conversion still gives warnings for some array-objects, have we actually gained much? Yes. Hameer, > I would advocate for a `force=` kwarg but personally I don't think it's explicit enough, but probably as explicit as can be given NumPy's API. Yeah, I agree that force is kind of vague, which is why I was looking for things like `allow_copy`. But it is hard to be general enough here: sparse requires an expensive instantiation, cupy requires copying from gpu to cpu, dask requires arbitrary computation, xarray requires information loss... I'm inclined to agree with Ralf that force= is the only generic-enough term, but I'm happy to entertain other options! Juan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Apr 29 07:07:09 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 29 Apr 2020 13:07:09 +0200 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: <90677c5f-a5f2-4531-9c9e-227035863e6f@www.fastmail.com> References: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> <90677c5f-a5f2-4531-9c9e-227035863e6f@www.fastmail.com> Message-ID: On Wed, Apr 29, 2020 at 12:27 PM Juan Nunez-Iglesias wrote: > > > Ralf, you said you don't want warnings -- even for sparse arrays? That was > an area of concern for you on the PyTorch discussion side. > Providing a boolean keyword argument and then raising a warning every time anyone uses that keyword makes very little sense. This should simply be handled by clear documentation: use only in code like visualization where the array arrives at its "end station", never in functions that other code builds on top of. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Apr 29 11:42:18 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 29 Apr 2020 09:42:18 -0600 Subject: [Numpy-discussion] 1.19.x branch Message-ID: Hi All, I'm thinking of making the 1.19.x branch in about two weeks. If there are PRs that you feel need to be in that release please let me know. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Apr 29 14:17:03 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 29 Apr 2020 13:17:03 -0500 Subject: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface In-Reply-To: <90677c5f-a5f2-4531-9c9e-227035863e6f@www.fastmail.com> References: <384066402d534967cc6451d3d4337a4d67250600.camel@sipsolutions.net> <90677c5f-a5f2-4531-9c9e-227035863e6f@www.fastmail.com> Message-ID: <563da090aeac69001af6ed2d8e8b980603538e19.camel@sipsolutions.net> On Wed, 2020-04-29 at 05:26 -0500, Juan Nunez-Iglesias wrote: > Hi everyone, and thank you Ralf for carrying the flag in my absence. > =D > > Sebastian, the *primary* motivation behind avoiding detach() in > PyTorch is listed in the original post of the PyTorch issue: > > > People not very familiar with `requires_grad` and cpu/gpu Tensors > > might go back and forth with numpy. For example doing pytorch -> > > numpy -> pytorch and backward on the last Tensor.
This will > > backward without issue but not all the way to the first part of the > > code and won't raise any error. > > The PyTorch team are concerned that they will be overwhelmed with > help requests if np.array() silently succeeds on a tensor with > gradients. I definitely get that. Sorry for playing advocatus diaboli... I guess it is simply that before the end, it would be nice to have a short list with projects: * Napari, matplotlib on the "user" side * PyTorch, ...? on the "provider" side And maybe what their expectations on `force=True` are, to make sure they roughly align. The best definition for when to use `force=True` at this time seems to be "end-point" users (such as visualization or maybe writing to disk?). I still think performance can be just as valid an issue there. For example it may be better to convert to a numpy array earlier in the computation. Or someone could be surprised that saving their gpu array to an hdf5 file is by far the slowest part of the computation. I have the feeling the definition we actually want is: There is definitely no way to do this computation faster or better than by converting it to a NumPy array. Since currently the main reason to reject it seems to be: Wait, are you sure there is not a much better way than using NumPy arrays, be careful! And while that distinction is clear for PyTorch + visualization, I am not quite sure yet that it is clear for various combinations of `force=True` and array-like users. Maybe CuPy does not want h5py to use `force=True`, because cupy has its own super fast "stream-to-file" functionality... But it wants to do it for napari. - Sebastian > > Avoiding .cpu() is more straightforwardly about avoiding implicit > expensive computation. > > > while others do not choose to teach about it. There seems very > > little > > or even no "promise" attached to either `force=True` or > > `force=False`. > > NumPy can set a precedent through policy. The *only* reason client > libraries would implement `__array__` is to play well with NumPy, so > if NumPy documents that `force=True` should *always* succeed, we can > expect client libraries to follow suit. At least the PyTorch devs > have indicated that they would be open to this. > > > E.g. Napari wants to use it, but do the array-providers want Napari > > to use it? > > As Ralf pointed out, the PyTorch devs have already agreed to it. > > From the napari perspective, we'd be ok with leaving the decision on > warnings to client libraries. We may or may not suppress them > depending on user requests. ;) But the point is to have a way of > saying "give me a NumPy array DAMMIT" without having to know about > all the possible array libraries. Which are numerous and getting > numerouser. > > Ralf, you said you don't want warnings -- even for sparse arrays? That > was an area of concern for you on the PyTorch discussion side. > > > And if the conversion still gives warnings for some array-objects, > > have we actually gained much? > > Yes. > > Hameer, > > > I would advocate for a `force=` kwarg but personally I don't think > > it's explicit enough, but probably as explicit as can be given > > NumPy's API. > > Yeah, I agree that force is kind of vague, which is why I was looking > for things like `allow_copy`. But it is hard to be general enough > here: sparse requires an expensive instantiation, cupy requires > copying from gpu to cpu, dask requires arbitrary computation, xarray > requires information loss...
I'm inclined to agree with Ralf that > force= is the only generic-enough term, but I'm happy to entertain > other options! > > Juan. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Thu Apr 30 13:31:45 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 30 Apr 2020 12:31:45 -0500 Subject: [Numpy-discussion] Deprecate Promotion of numbers to strings? Message-ID: <8795015162c71a0a5d574ffc8d7dd91f057115b5.camel@sipsolutions.net> Hi all, in https://github.com/numpy/numpy/pull/15925 I propose to deprecate promotion of strings and numbers. I have to double check whether this has a large effect on pandas, but it currently seems to me that it will be reasonable. This means that `np.promote_types("S", "int8")`, etc. will lead to an error instead of returning `"S4"`. For the user, I believe the two main visible changes are that: np.array(["string", 0]) will stop creating a string array and return either an `object` array or give an error (object array would be the default currently). Another larger visible change will be code such as: np.concatenate([np.array(["string"]), np.array([2])]) will result in an error instead of returning a string array. (Users will have to cast manually here.) The alternative is to return an object array also for the concatenate example. I somewhat dislike that because `object` is not homogeneously typed and we thus lose type information. This also affects functions that wish to cast inputs to a common type (ufuncs also do this sometimes). A further example of this and discussion is at the end of the mail [1]. So the first question is whether we can form an agreement that an error is the better choice for `concatenate` and `np.promote_types()`. I.e. there is no one dtype that can faithfully represent both strings and integers. (This is currently the case e.g. for datetime64 and float64.) The second question is what to do for: np.array(["string", 0]) which currently always returns strings. Arguably, it must also either return an `object` array, or raise an error (requiring the user to pick string or object using `dtype=object`). The default would be to create a FutureWarning that an `object` array will be returned for `np.asarray(["string", 0])` in the future. But if we know already that we prefer an error, it would be better to give a DeprecationWarning right away. (It just does not seem nice to change the same thing twice even if the workaround is identical.) Cheers, Sebastian [1] A second more in-depth point is that code such as: common_dtype = np.result_type(arr1, arr2) # or promote_types arr1 = arr1.astype(common_dtype, copy=False) arr2 = arr2.astype(common_dtype, copy=False) will currently use `string` in this case while it would error in the future. This already fails with other type combinations such as `datetime64` and `float64` at the moment. The main alternative to this proposal is to return `object` for the common dtype, since an object array is not homogeneously typed, it arguably can represent both inputs. I do not quite like this choice personally because in the above example, it may be that the next line is something like: return arr1 * arr2 in which case, the preferred return may be `str` and not `object`. We currently never promote to `object` unless one of the arrays is already an `object` array, and that seems like the right choice to me. 
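
To make this concrete, here is a minimal sketch of the two user-visible changes written out (an illustration only: the exact error type -- a TypeError is assumed below -- and the warning strategy are exactly the open questions above, so do not read this as final API):

    import numpy as np

    # Current behaviour: numbers silently promote to strings.
    np.promote_types("S", "int8")    # returns dtype('S4') today
    np.array(["string", 0])          # returns a string array today
    np.concatenate([np.array(["string"]), np.array([2])])
    # also returns a string array today

    # Proposed behaviour: each of the lines above raises an error
    # (assumed TypeError here, possibly after a deprecation period),
    # and the user picks the result type explicitly instead:
    np.array(["string", 0], dtype=object)   # explicit object array
    np.concatenate([np.array(["string"]),
                    np.array([2]).astype(str)])  # explicit manual cast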
From wieser.eric+numpy at gmail.com Thu Apr 30 13:47:43 2020 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 30 Apr 2020 18:47:43 +0100 Subject: [Numpy-discussion] Deprecate Promotion of numbers to strings? In-Reply-To: <8795015162c71a0a5d574ffc8d7dd91f057115b5.camel@sipsolutions.net> References: <8795015162c71a0a5d574ffc8d7dd91f057115b5.camel@sipsolutions.net> Message-ID: > Another larger visible change will be code such as: > > np.concatenate([np.array(["string"]), np.array([2])]) > > will result in an error instead of returning a string array. (Users > will have to cast manually here.) I wonder if we can lessen the blow by allowing `np.concatenate([np.array(["string"]), np.array([2])], casting='unsafe', dtype=str)` or similar in its place. It seems a little unfortunate that with this change, we lose the ability to concatenate numbers to strings without making intermediate copies. Eric On Thu, 30 Apr 2020 at 18:32, Sebastian Berg wrote: > Hi all, > > in https://github.com/numpy/numpy/pull/15925 I propose to deprecate > promotion of strings and numbers. I have to double check whether this > has a large effect on pandas, but it currently seems to me that it will > be reasonable. > > This means that `np.promote_types("S", "int8")`, etc. will lead to an > error instead of returning `"S4"`. For the user, I believe the two > main visible changes are that: > > np.array(["string", 0]) > > will stop creating a string array and return either an `object` array > or give an error (object array would be the default currently). > > Another larger visible change will be code such as: > > np.concatenate([np.array(["string"]), np.array([2])]) > > will result in an error instead of returning a string array. (Users > will have to cast manually here.) > > The alternative is to return an object array also for the concatenate > example. I somewhat dislike that because `object` is not homogeneously > typed and we thus lose type information. This also affects functions > that wish to cast inputs to a common type (ufuncs also do this > sometimes). > A further example of this and discussion is at the end of the mail [1]. > > > So the first question is whether we can form an agreement that an error > is the better choice for `concatenate` and `np.promote_types()`. > I.e. there is no one dtype that can faithfully represent both strings > and integers. (This is currently the case e.g. for datetime64 and > float64.) > > > The second question is what to do for: > > np.array(["string", 0]) > > which currently always returns strings. Arguably, it must also either > return an `object` array, or raise an error (requiring the user to pick > string or object using `dtype=object`). > > The default would be to create a FutureWarning that an `object` array > will be returned for `np.asarray(["string", 0])` in the future. > But if we know already that we prefer an error, it would be better to > give a DeprecationWarning right away. (It just does not seem nice to > change the same thing twice even if the workaround is identical.) > > Cheers, > > Sebastian > > > [1] > > A second more in-depth point is that code such as: > > common_dtype = np.result_type(arr1, arr2) # or promote_types > arr1 = arr1.astype(common_dtype, copy=False) > arr2 = arr2.astype(common_dtype, copy=False) > > will currently use `string` in this case while it would error in the > future. This already fails with other type combinations such as > `datetime64` and `float64` at the moment. 
> > The main alternative to this proposal is to return `object` for the > common dtype, since an object array is not homogeneously typed, it > arguably can represent both inputs. I do not quite like this choice > personally because in the above example, it may be that the next line > is something like: > > return arr1 * arr2 > > in which case, the preferred return may be `str` and not `object`. > We currently never promote to `object` unless one of the arrays is > already an `object` array, and that seems like the right choice to me. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Apr 30 13:52:12 2020 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 30 Apr 2020 10:52:12 -0700 Subject: [Numpy-discussion] Deprecate Promotion of numbers to strings? In-Reply-To: <8795015162c71a0a5d574ffc8d7dd91f057115b5.camel@sipsolutions.net> References: <8795015162c71a0a5d574ffc8d7dd91f057115b5.camel@sipsolutions.net> Message-ID: On Thu, Apr 30, 2020 at 10:32 AM Sebastian Berg wrote: > Hi all, > > in https://github.com/numpy/numpy/pull/15925 I propose to deprecate > promotion of strings and numbers. I have to double check whether this > has a large effect on pandas, but it currently seems to me that it will > be reasonable. > Sebastian -- thanks for driving this forward! Pandas and Xarray already override these casting rules, so I think this is generally a good idea: https://github.com/pydata/xarray/blob/3820fb77256682d909c1e41d962e29bec0edd62d/xarray/core/dtypes.py#L34-L42 Note that Xarray also overrides np.promote_types(np.bytes_, np.unicode_) to object. This means that `np.promote_types("S", "int8")`, etc. will lead to an > error instead of returning `"S4"`. For the user, I believe the two > main visible changes are that: > > np.array(["string", 0]) > > will stop creating a string array and return either an `object` array > or give an error (object array would be the default currently). > In the long term, I guess this would error as part of the plan to require explicitly writing dtype=object to get object arrays? > Another larger visible change will be code such as: > > np.concatenate([np.array(["string"]), np.array([2])]) > > will result in an error instead of returning a string array. (Users > will have to cast manually here.) > I agree, it is better to raise an error than inadvertently cast to object dtype. This can make errors appear later in strange ways. We would need to make this change slowly over several releases, e.g., by issuing a warning first. > The alternative is to return an object array also for the concatenate > example. I somewhat dislike that because `object` is not homogeneously > typed and we thus lose type information. This also affects functions > that wish to cast inputs to a common type (ufuncs also do this > sometimes). > A further example of this and discussion is at the end of the mail [1]. > > > So the first question is whether we can form an agreement that an error > is the better choice for `concatenate` and `np.promote_types()`. > I.e. there is no one dtype that can faithfully represent both strings > and integers. (This is currently the case e.g. for datetime64 and > float64.) > > > The second question is what to do for: > > np.array(["string", 0]) > > which currently always returns strings. 
Arguably, it must also either > return an `object` array, or raise an error (requiring the user to pick > string or object using `dtype=object`). > > The default would be to create a FutureWarning that an `object` array > will be returned for `np.asarray(["string", 0])` in the future. > But if we know already that we prefer an error, it would be better to > give a DeprecationWarning right away. (It just does not seem nice to > change the same thing twice even if the workaround is identical.) > > Cheers, > > Sebastian > > > [1] > > A second more in-depth point is that code such as: > > common_dtype = np.result_type(arr1, arr2) # or promote_types > arr1 = arr1.astype(common_dtype, copy=False) > arr2 = arr2.astype(common_dtype, copy=False) > > will currently use `string` in this case while it would error in the > future. This already fails with other type combinations such as > `datetime64` and `float64` at the moment. > > The main alternative to this proposal is to return `object` for the > common dtype, since an object array is not homogeneously typed, it > arguably can represent both inputs. I do not quite like this choice > personally because in the above example, it may be that the next line > is something like: > > return arr1 * arr2 > > in which case, the preferred return may be `str` and not `object`. > We currently never promote to `object` unless one of the arrays is > already an `object` array, and that seems like the right choice to me. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Apr 30 14:20:31 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 30 Apr 2020 13:20:31 -0500 Subject: [Numpy-discussion] Deprecate Promotion of numbers to strings? In-Reply-To: References: <8795015162c71a0a5d574ffc8d7dd91f057115b5.camel@sipsolutions.net> Message-ID: <2fac4b7f781bd682380e042f97228818168098b6.camel@sipsolutions.net> On Thu, 2020-04-30 at 18:47 +0100, Eric Wieser wrote: > > Another larger visible change will be code such as: > > > > np.concatenate([np.array(["string"]), np.array([2])]) > > > > will result in an error instead of returning a string array. (Users > > will have to cast manually here.) > > I wonder if we can lessen the blow by allowing > `np.concatenate([np.array(["string"]), np.array([2])], > casting='unsafe', > dtype=str)` or similar in its place. > It seems a little unfortunate that with this change, we lose the > ability to > concatenate numbers to strings without making intermediate copies. > I agree we can do that for concatenate and am happy to just add it. Adding the dtype argument (maybe for now only force-casting is fine?) to `np.concatenate` seems like a reasonable extension of concatenate even without the loss of this potential use-case. - Sebastian > Eric > > > > On Thu, 30 Apr 2020 at 18:32, Sebastian Berg < sebastian at sipsolutions.net> > wrote: > > > Hi all, > > > > in https://github.com/numpy/numpy/pull/15925 I propose to deprecate > > promotion of strings and numbers. I have to double check whether > > this > > has a large effect on pandas, but it currently seems to me that it > > will > > be reasonable. > > > > This means that `np.promote_types("S", "int8")`, etc. will lead to > > an > > error instead of returning `"S4"`.
For the user, I believe the two > > main visible changes are that: > > > > np.array(["string", 0]) > > > > will stop creating a string array and return either an `object` > > array > > or give an error (object array would be the default currently). > > > > Another larger visible change will be code such as: > > > > np.concatenate([np.array(["string"]), np.array([2])]) > > > > will result in an error instead of returning a string array. (Users > > will have to cast manually here.) > > > > The alternative is to return an object array also for the > > concatenate > > example. I somewhat dislike that because `object` is not > > homogeneously > > typed and we thus lose type information. This also affects > > functions > > that wish to cast inputs to a common type (ufuncs also do this > > sometimes). > > A further example of this and discussion is at the end of the mail > > [1]. > > > > > > So the first question is whether we can form an agreement that an > > error > > is the better choice for `concatenate` and `np.promote_types()`. > > I.e. there is no one dtype that can faithfully represent both > > strings > > and integers. (This is currently the case e.g. for datetime64 and > > float64.) > > > > > > The second question is what to do for: > > > > np.array(["string", 0]) > > > > which currently always returns strings. Arguably, it must also > > either > > return an `object` array, or raise an error (requiring the user to > > pick > > string or object using `dtype=object`). > > > > The default would be to create a FutureWarning that an `object` > > array > > will be returned for `np.asarray(["string", 0])` in the future. > > But if we know already that we prefer an error, it would be better > > to > > give a DeprecationWarning right away. (It just does not seem nice > > to > > change the same thing twice even if the workaround is identical.) > > > > Cheers, > > > > Sebastian > > > > > > [1] > > > > A second more in-depth point is that code such as: > > > > common_dtype = np.result_type(arr1, arr2) # or promote_types > > arr1 = arr1.astype(common_dtype, copy=False) > > arr2 = arr2.astype(common_dtype, copy=False) > > > > will currently use `string` in this case while it would error in > > the > > future. This already fails with other type combinations such as > > `datetime64` and `float64` at the moment. > > > > The main alternative to this proposal is to return `object` for the > > common dtype, since an object array is not homogeneously typed, it > > arguably can represent both inputs. I do not quite like this > > choice > > personally because in the above example, it may be that the next > > line > > is something like: > > > > return arr1 * arr2 > > > > in which case, the preferred return may be `str` and not `object`. > > We currently never promote to `object` unless one of the arrays is > > already an `object` array, and that seems like the right choice to > > me. 
> > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From numpy_gsod at bigriver.xyz Thu Apr 30 14:24:34 2020 From: numpy_gsod at bigriver.xyz (Ben Nathanson) Date: Thu, 30 Apr 2020 14:24:34 -0400 Subject: [Numpy-discussion] Season of Docs technical writer Message-ID: I look forward to participating in this year's Season of Docs. Though it's early, I'm eager to start a conversation; I've posted the webpage https://bennathanson.com/numpy2020 to share my thoughts on contributing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Thu Apr 30 15:21:01 2020 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Thu, 30 Apr 2020 16:21:01 -0300 Subject: [Numpy-discussion] Season of Docs technical writer In-Reply-To: References: Message-ID: Hi Ben, That is great news. Thanks for that! Let's keep our fingers crossed and see if we can participate in the program this year. Cheers! - Melissa On Thu, Apr 30, 2020 at 3:26 PM Ben Nathanson wrote: > I look forward to participating in this year's Season of Docs. Though it's > early, I'm eager to start a conversation; I've posted the webpage > https://bennathanson.com/numpy2020 to share my thoughts on contributing. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: