NEP 50: Promotion rules for Python scalars
Hi all,
I would like to share the first formal draft of NEP 50: Promotion rules for Python scalars with everyone. The full text can be found here:
https://numpy.org/neps/nep-0050-scalar-promotion.html
NEP 50 is an attempt to remove value-based casting/promotion. We wish to replace it with clearer rules for the resulting dtype when mixing NumPy arrays and Python scalars. As a brief example, the proposal allows the following (unchanged):
    >>> np.array([1, 2, 3], dtype=np.int8) + 100
    array([101, 102, 103], dtype=int8)
At the same time, it clears up confusion caused by the value-inspecting behavior that we sometimes see, such as:
    >>> np.array([1, 2, 3], dtype=np.int8) + 300
    array([301, 302, 303], dtype=int16)  # note the int16
Here 300 is too large to fit an ``int8``. The proposal also removes the special behavior of 0-D arrays and NumPy scalars:
    >>> res = np.array(1, dtype=np.int8) + 100
    >>> res.dtype
    dtype('int64')
This is the continuation of a long discussion (see the "Discussion" section), including the poll I once posted:
https://discuss.scientific-python.org/t/poll-future-numpy-behavior-when-mixi...
I would be happy for any feedback, be it just editorial or fundamental discussion. There are many alternatives which I have tried to capture in the NEP. So let's discuss here, or on discuss:
https://discuss.scientific-python.org/t/nep-50-promotion-rules-for-python-sc...
For smaller edits, don't hesitate to open a NumPy PR, or propose edits on my branch (you can use the edit button to create a PR):
https://github.com/seberg/numpy/blob/nep50/doc/neps/nep-0050-scalar-promotio...
An important part of moving forward will be assessing the real world impact. To start that process, I have created a branch as a draft PR (at this time):
https://github.com/numpy/numpy/pull/21626
It is missing some parts, but should allow preliminary testing. The main missing part is that the integer warnings and errors are less strict than proposed in the NEP. It would be invaluable to get a better idea of the extent to which existing code, especially end-user code, is affected by the proposed changes.
Thanks in advance for any input! This is a big, complicated proposal, but finding a way forward will hopefully clear up a source of confusion and inconsistencies that makes life harder for both maintainers and users.
Cheers,
Sebastian
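The integer examples in the announcement can be mimicked with a small pure-Python model. This is only a sketch under stated assumptions: `fits`, `value_based_dtype` and `weak_dtype` are hypothetical helpers rather than NumPy functions, only signed integer dtypes are modeled, the 0-D array and NumPy scalar case is left out, and `OverflowError` is an assumed stand-in for the integer error that the NEP proposes for values that do not fit.

```python
# Toy model only: these helpers are hypothetical and do not call NumPy.
# Only signed integer dtypes are modeled.
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}
WIDENING_ORDER = ["int8", "int16", "int32", "int64"]


def fits(value, dtype):
    """Return True if the Python int `value` is representable in `dtype`."""
    low, high = INT_RANGES[dtype]
    return low <= value <= high


def value_based_dtype(array_dtype, py_int):
    """Legacy-style rule: inspect the value and widen until it fits."""
    start = WIDENING_ORDER.index(array_dtype)
    for dtype in WIDENING_ORDER[start:]:
        if fits(py_int, dtype):
            return dtype
    raise OverflowError(f"{py_int} does not fit any modeled dtype")


def weak_dtype(array_dtype, py_int):
    """Proposed rule: the array dtype wins. The value never selects the
    dtype; it only has to be representable (assumed to raise otherwise)."""
    if not fits(py_int, array_dtype):
        raise OverflowError(f"{py_int} is out of range for {array_dtype}")
    return array_dtype


for value in (100, 300):
    print(value, "legacy:", value_based_dtype("int8", value))
    try:
        print(value, "weak:  ", weak_dtype("int8", value))
    except OverflowError as exc:
        print(value, "weak:   error:", exc)
```

In this model the result dtype under the proposed rule depends only on the array dtype, which is the property the NEP is after; the legacy rule returns a different dtype for 100 and for 300.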
On Wed, Jun 1, 2022 at 5:51 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
An important part of moving forward will be assessing the real world impact. To start that process, I have created a branch as a draft PR (at this time):
https://github.com/numpy/numpy/pull/21626
It is missing some parts, but should allow preliminary testing. The main missing part is that the integer warnings and errors are less strict than proposed in the NEP. It would be invaluable to get a better idea to what extent existing code, especially end-user code, is affected by the proposed changes.
Thanks Sebastian! For testing, did you already try with some of the usual suspects, or would it be helpful to use this branch on SciPy, Pandas, etc.? Also, do you expect it's useful to do platform-specific testing? I can imagine there's some Windows-specific behavior; adapting a SciPy CI job to work from your branch is easy to do if that would be helpful.
Cheers,
Ralf
On Wed, 2022-06-01 at 20:23 +0200, Ralf Gommers wrote:
Thanks Sebastian! For testing, did you already try with some of the usual suspects, or would it be helpful to use this branch on SciPy, Pandas, etc.? Also, do you expect it's useful to do platform-specific testing? I can imagine there's some Windows-specific behavior; adapting a SciPy CI job to work from your branch is easy to do if that would be helpful.
Yes, I have for SciPy. As noted in the PR, the failures look "mostly harmless" at first sight (not that it won't mean quite a bit of work, but I think it is manageable work). I would be more scared if there is a need to systematically vet all places where behavior (may have) changed.
For example, in NumPy:
    np.median(np.float32([1, 2, 3, 4]))
did return a float64 before and will now return a float32. I assume because somewhere we write: `(np.float64(3) + np.float32(2)) / 2`.
There are a few places that I suspect just need updated tests or a bit of thought. And at least one or two that need to use the correct integer types (IIRC `scipy.io.idl` seems to be using some low precision or unsigned integer type internally and that leads to failures).
I thought pandas would fail much harder, but it seems it only had 150-200 failures (many probably clustered). One larger annoyance there is that one parametrized test runs into an infinite recursion, which makes it run excruciatingly slowly.
In any case, I believe that it would be far more helpful if those more familiar with the libraries have a look at the failures. Not only do they know better how much impact they have; it also helps to get a feel for how painful the transition will be.
One problem I see is that I still expect that libraries are not the main issue. Using a SciPy integrator may end up with a float32 rather than a float64 result. In the SciPy test suite, that probably just means tweaking the test a bit. But that same change will also break someone's script out there, somewhere. So the people really affected (who may occasionally get less precise/breaking results) are likely the end-users rather than the libraries.
Cheers,
Sebastian
Cheers, Ralf _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: sebastian@sipsolutions.net
For example, in NumPy:
np.median(np.float32([1, 2, 3, 4]))
did return a float64 before and will now return a float32. I assume because somewhere we write: `(np.float64(3) + np.float32(2)) / 2`.
Sorry, I missed this part of the discussion. I know the discussion centered around Python literals being weak, but for NumPy dtypes, I thought the larger dtype would always win? Indeed, reading the NEP I see:
    Expression: array([1.], float32) + array(1., float64)
    Old result: array([2.], float32)
    New result: array([2.], float64)
which seems to contradict your statement above?
On Wed, 2022-06-01 at 18:37 -0500, Juan Nunez-Iglesias wrote:
For example, in NumPy:
np.median(np.float32([1, 2, 3, 4]))
did return a float64 before and will now return a float32. I assume because somewhere we write: `(np.float64(3) + np.float32(2)) / 2`.
Sorry, I missed this part of the discussion — I know the discussion centered around Python literals being weak, but for NumPy dtypes, I thought the larger dtype would always win?
Good job reading carefully enough to notice :)! Sorry... my bad, the float64 is a typo. That should have read:
    (float32(3) + float32(2)) / 2
which does show the change in behavior as described/discussed. If there was a float64 involved, of course the result would be/remain float64.
- Sebastian
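To see which behavior a given NumPy installation has, the corrected expression and the NEP table row quoted earlier in the thread can be evaluated directly. This is a minimal sketch that assumes NumPy is importable as `np`; the variable names are illustrative, and the printed dtypes depend on the installed version and promotion state, so no output is shown.

```python
import numpy as np

# The corrected expression: both operands are float32, the divisor is a
# Python int.
total = np.float32(3) + np.float32(2)
result = total / 2
print("float32 + float32         ->", total.dtype)
print("(float32 sum) / 2         ->", result.dtype)

# The NEP table row quoted in the thread: a float32 array plus a 0-D
# float64 array.
mixed = np.array([1.0], dtype=np.float32) + np.array(1.0, dtype=np.float64)
print("float32 arr + 0-D float64 ->", mixed.dtype)

# Per the thread, float32 for `result` indicates the proposed behavior
# and float64 the previous one.
is_weak = result.dtype == np.float32
print("Python scalars are weak:", is_weak)
```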
On Wed, 2022-06-01 at 08:49 -0700, Sebastian Berg wrote:
Hi all,
I would like to share the first formal draft of
NEP 50: Promotion rules for Python scalars
with everyone. The full text can be found here:
As a brief update on this, as noted in the NEP (Note at the end of the abstract), our nightly wheels can now be used to try out the first changes here. To use them, you have to install NumPy from the nightlies:
    pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy --upgrade
(I added an --upgrade in case you have a numpy version already). And then run Python e.g. with:
    NPY_PROMOTION_STATE=weak ipython
Please see the NEP note for more information. As of now, the error for integers that are too large is the main piece still missing (this is added in an open PR).
It would be very interesting to hear whether your use case (e.g. scripts) is affected by the changes! Right now mainly scipy and sklearn were tried, but for this change I believe the impact on end-users is much more important than that on libraries.
Cheers,
Sebastian
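For scripts rather than interactive sessions, the same environment variable can be set for a child Python process. The sketch below is one possible way to do that: `PROBE` and `probe_dtype` are hypothetical names, the probe expression is the float32 example from earlier in the thread, and whether the installed NumPy honors `NPY_PROMOTION_STATE` at all depends on the build, so the printed dtypes are not guaranteed.

```python
import os
import subprocess
import sys

# One-line probe executed in a child interpreter (hypothetical helper code).
PROBE = (
    "import numpy as np; "
    "print(((np.float32(3) + np.float32(2)) / 2).dtype)"
)


def probe_dtype(promotion_state=None):
    """Run PROBE in a fresh Python process and return the printed dtype.

    If `promotion_state` is given, it is exported as NPY_PROMOTION_STATE
    for the child process only; otherwise any inherited value is removed.
    """
    env = dict(os.environ)
    env.pop("NPY_PROMOTION_STATE", None)
    if promotion_state is not None:
        env["NPY_PROMOTION_STATE"] = promotion_state
    completed = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


print("default:", probe_dtype())
print("weak:   ", probe_dtype("weak"))
```

Replacing `PROBE` with a command that runs your own script or test suite gives a quick comparison of results with and without the new promotion state.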
participants (3)
- Juan Nunez-Iglesias
- Ralf Gommers
- Sebastian Berg