Mailman 3 June 2022 - NumPy-Discussion

mixed mode arithmetic
by Neal Becker 11 Jul '23

11 Jul '23

I've been browsing the NumPy source, and I'm wondering about mixed-mode arithmetic on arrays. I believe NumPy never does mixed arithmetic, but instead converts the arrays to a common type. Arguably, that might be efficient for a mix of, say, double and float. Maybe not. But for a mix of complex and a scalar type (say, CDouble * Double), it's clearly suboptimal in efficiency. So, do I understand this correctly? If so, is that something we should improve?
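As a rough illustration of the promotion described above, and of one possible way around it: `np.result_type` reports the common dtype both operands are converted to, and `mul_complex_by_real` (a hypothetical helper made up for this sketch, not a NumPy function) writes the real and imaginary parts of the result separately, so the real operand is never converted to complex. It assumes the first operand is complex. Whether it is actually faster than plain `z * x` depends on array sizes and memory layout, since it makes two strided passes over the output.

```python
import numpy as np

# NumPy's promotion rules pick one common dtype for both operands;
# for complex128 and float64 that is complex128.
print(np.result_type(np.complex128, np.float64))  # complex128

def mul_complex_by_real(z, x):
    """Hypothetical helper: z * x for complex z and real x, without
    converting x to a complex array first."""
    z = np.asarray(z)
    x = np.asarray(x)
    if not np.iscomplexobj(z):
        raise TypeError("sketch assumes z is complex")
    out = np.empty(np.broadcast_shapes(z.shape, x.shape), dtype=z.dtype)
    # out.real and out.imag are writable views into the complex output.
    np.multiply(z.real, x, out=out.real)
    np.multiply(z.imag, x, out=out.imag)
    return out

z = np.array([1 + 2j, 3 - 1j])
x = np.array([2.0, 0.5])
print(mul_complex_by_real(z, x))  # same values as z * x
```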

4 6

Invalid value encountered: how to prevent numpy.where from doing this?
by Eric Emsellem 18 Feb '23

18 Feb '23

Dear all,

I have code that uses "numpy.where" a lot to make constrained calculations, as in:

    data = np.arange(10)
    result = np.where(data == 0, 0., 1./data)
    # or
    data1 = np.arange(10)
    data2 = np.arange(10) + 1.0
    result = np.where(data1 > data2, np.sqrt(data1 - data2), np.sqrt(data2 - data1))

which then produces warnings like:

    /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt

or, for the first example:

    /usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide

How do I prevent these messages from appearing? I know that in principle I could use numpy.seterr. However, I do NOT want to silence these warnings for other potential divide/multiply/sqrt etc. errors, only when I am using a "where", precisely to avoid such warnings! Note that the warnings only appear once, but since I am going to release this code, I would like to spare users such messages, which are irrelevant here (because with the where I am testing when NOT to divide by zero or take the sqrt of a negative number).

Thanks!
Eric
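Two common ways to keep these warnings local, sketched with the arrays from the question. `np.errstate` is a context manager that changes the floating-point error handling only inside the `with` block and restores the previous settings on exit, so other code keeps its warnings. Alternatively, the ufunc `where=` argument skips the masked-out elements entirely, so the division by zero is never evaluated; elements where the mask is False keep whatever was in `out`, here zeros.

```python
import numpy as np

data = np.arange(10)
data1 = np.arange(10)
data2 = np.arange(10) + 1.0

# Option 1: silence divide/invalid warnings only inside this block;
# the global error settings are restored when the block exits.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.where(data == 0, 0.0, 1.0 / data)
    root = np.where(data1 > data2, np.sqrt(data1 - data2), np.sqrt(data2 - data1))

# Option 2: never compute the problematic elements at all.
# Where the mask is False, the preallocated zeros in `out` are kept.
result2 = np.divide(1.0, data, out=np.zeros(data.shape), where=data != 0)
```

Note that `np.where` evaluates both branches in full before selecting, which is why the warnings appear in the first place; option 2 avoids that evaluation, while option 1 only hides its warnings.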

5 4

Documentation Team meeting - Monday June 8th
by Melissa Mendonça 04 Dec '22

04 Dec '22

Hi all!

A reminder that on Monday, June 8, we have another documentation team meeting at 3PM UTC**. If you wish to join on Zoom, use this link: https://zoom.us/j/420005230

Here's the permanent hackmd document with the meeting notes: https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around (especially if you want to introduce yourself or discuss ideas for Google Season of Docs).

** You can click this link to get the correct time in your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentati…

- Melissa

6 74

next NumPy Newcomers' Hour
by Inessa Pawson 02 Oct '22

02 Oct '22

Our next Newcomers' Hour will be held this Thursday, June 16th, at 4 pm UTC. Stop by to ask questions or just to say hi. To add the topics you’d like to discuss to the meeting agenda, follow the link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both

Join the meeting via Zoom: https://us02web.zoom.us/j/87192457898

Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
Twitter: @inessapawson

1 3

An extension of the .npy file format
by Michael Siebert 25 Aug '22

25 Aug '22

Dear all,

originally, I had planned to make an extension of the .npy file format a dedicated follow-up pull request, but I have upgraded my current request instead, since it was not as difficult to implement as I initially thought and is probably a more straightforward solution: https://github.com/numpy/numpy/pull/20321/

What is this pull request about? It is about appending to NumPy .npy files. Why? I see two main use cases:

1. creating .npy files larger than main memory, which can be loaded as memory maps once finished
2. creating binary log files, which can be processed very efficiently without parsing

Aren't there other good file formats for this? Theoretically yes, but in practice they can be pretty complex, and with very little tweaking .npy could do efficient appending too.

Use case 1 is already covered by the pip/Conda package npy-append-array that I created, and getting this functionality directly into NumPy was the original goal of the pull request. This would have been possible without introducing a new file format version, just by adding some spare space in the header. During the pull request discussion it turned out that rewriting the header after each append would be desirable, to minimize data loss in case the writing program crashes.

Use case 2, however, would benefit greatly from a new file format version, as it would make rewriting the header unnecessary. Since efficient appending can only take place along one axis, setting shape[-1] = -1 for Fortran order, or shape[0] = -1 otherwise (the default), in the .npy header on file creation could indicate that the array size is determined by the file size: when np.load (typically with memory mapping on) is called, it constructs the ndarray with the actual shape by replacing the -1 in the constructor call. Otherwise, the header is never modified again, neither on append nor when writing finishes.

Concurrent appends to a single file would not be advisable and should be channeled through a single AppendArray instance. Concurrent reads while writes take place, however, should work relatively smoothly: every time np.load (ideally with mmap) is called, the ndarray would provide access to all data written up to that point.

Currently, my pull request provides:

1. A definition of .npy version 4.0 that supports -1 in the shape
2. Implementations for Fortran order and non-Fortran order (default), including test cases
3. An updated np.load
4. The AppendArray class that does the actual appending

Although introducing a new .npy version involves a certain hassle, the changes themselves are very small. I could also implement a fallback mode for older NumPy installations, if someone is interested.

What do you think about such a feature, would it make sense? Is anyone available for some more code review?

Best from Berlin,
Michael

PS: Thank you so far; I could improve my npy-append-array module as well, and from what I have seen so far, the readability of the NumPy code exceeded my already high expectations.
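The core of the -1 idea, deriving the row count from the file size, can be tried out on today's format. The following is a minimal sketch under stated assumptions: a C-order file in format version 1.0 or 2.0, rows appended as raw bytes after the data, and a header whose shape[0] is simply ignored rather than set to -1. `load_growable` is a hypothetical helper, not part of NumPy, npy-append-array, or the pull request; it only uses the existing `numpy.lib.format` header readers and `np.memmap`.

```python
import os
import tempfile

import numpy as np

def load_growable(path):
    """Hypothetical helper: memory-map a C-order .npy file, taking the
    number of rows from the file size instead of the header's shape[0]."""
    with open(path, "rb") as fp:
        version = np.lib.format.read_magic(fp)
        if version == (1, 0):
            shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)
        elif version == (2, 0):
            shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"sketch does not handle format version {version}")
        offset = fp.tell()  # the data starts right after the padded header
    if fortran_order or len(shape) == 0:
        raise ValueError("sketch only handles C-order arrays with ndim >= 1")
    row_shape = tuple(shape[1:])
    row_bytes = dtype.itemsize * int(np.prod(row_shape, dtype=np.int64))
    n_rows = (os.path.getsize(path) - offset) // row_bytes
    return np.memmap(path, dtype=dtype, mode="r", offset=offset,
                     shape=(n_rows,) + row_shape)

# Usage: write 2 rows with np.save, then append 2 rows as raw bytes
# without touching the header (which still says shape (2, 3)).
path = os.path.join(tempfile.mkdtemp(), "log.npy")
np.save(path, np.zeros((2, 3)))
with open(path, "ab") as fp:
    fp.write(np.ones((2, 3)).tobytes())

arr = load_growable(path)
print(arr.shape)  # (4, 3)
```

A Fortran-order file would need the same computation on the last axis instead, matching the shape[-1] case described above.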

5 7

NumPy community meeting
by Inessa Pawson 31 Jul '22

31 Jul '22

The next NumPy community meeting will be held this Wednesday, May 25th at 18:00 (6 pm) UTC. Join us via Zoom: https://berkeley.zoom.us/j/762261535

Everyone is welcome and encouraged to attend. To add the topics you’d like to discuss to the meeting agenda, follow the link: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy

1 5

copy="never" discussion and no deprecation cycle?
by Sebastian Berg 21 Jul '22

21 Jul '22

Hi all,

(sorry for the length, details/discussion below)

On the triage call, there seemed to be a preference for skipping the deprecation and introducing `copy="never"`, `copy="if_needed"`, and `copy="always"` (i.e. string options for the `copy` keyword argument) right away. Strictly speaking, this is against the typical policy (one year of warnings/errors). But nobody could think of a realistic chance that anyone actually uses it. (For me, "policy" alone would be enough of an argument to take it slow.)

BUT: if nobody has *any* concerns at all, I think we may just end up introducing the change right away.

The PR is: https://github.com/numpy/numpy/pull/19173

## The Feature

There is the idea to add `copy="never"` (or similar). This would modify the existing `copy` argument to make it a 3-way decision:

* `copy="always"` or `copy=True` to force a copy
* `copy="if_needed"` or `copy=False` to prefer no-copy behavior
* `copy="never"` to error when no-copy behavior is not possible (this ensures that a view is returned)

This would affect the functions:

* np.array(object, copy=...)
* arr.astype(new_dtype, copy=...)
* np.reshape(arr, new_shape, copy=...), and the method arr.reshape()
* np.meshgrid and possibly

`reshape` currently does not have the option and would benefit from allowing `arr.reshape(-1, copy="never")`, which would guarantee a view.

## The Options

We have three options that are currently being discussed:

1. We introduce a new `np.CopyMode` or `np.<something>.Copy` Enum with values `np.CopyMode.NEVER`, `np.CopyMode.IF_NEEDED`, and `np.CopyMode.ALWAYS`
   * Plus: No compatibility concerns
   * Downside(?): This would be a first in NumPy, and is therefore an untypical API.
2. We introduce `copy="never"`, `copy="if_needed"` and `copy="always"` as strings (all other strings will be a `TypeError`):
   * Problem: `copy="never"` currently means `copy=True` (the opposite). This means new code has to take care when it may run on older NumPy versions, and in theory it could make old code return the wrong thing.
   * Plus: Strings are the typical way to pass options in NumPy currently.
3. Same as 2, but we take it very slow: make strings an error right now and only introduce the new options after two releases, as per the typical deprecation policy.

## Discussion

We discussed it briefly today in the triage call and were leaning towards strings. I was honestly expecting to converge on option 3 to avoid compatibility issues (mainly surprises with `copy="never"` on older versions). But considering how weird it is to currently pass `copy="never"`, the question was whether we should just change it with a release note. The probability of someone currently passing exactly one of those three (and no other) strings seems exceedingly small.

Personally, I don't have much of an opinion. But if *nobody* voices any concern about just changing the meaning of the string inputs, I think the current default may be to just do it.

Cheers,
Sebastian
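To make the three-way semantics and the compatibility problem of option 2 concrete, here is a sketch of how the options could be parsed and applied to a reshape. `Copy`, `parse_copy` and `reshape_with_mode` are hypothetical names for illustration only, not the API in the PR or in NumPy. The first print shows why `copy="never"` currently behaves like `copy=True`: any non-empty string is truthy.

```python
import enum

import numpy as np

class Copy(enum.Enum):
    """Hypothetical three-way copy mode (option 1 in the mail); not a NumPy API."""
    ALWAYS = "always"
    IF_NEEDED = "if_needed"
    NEVER = "never"

def parse_copy(copy):
    """Map the existing booleans and the proposed strings onto Copy."""
    if copy is True:
        return Copy.ALWAYS
    if copy is False:
        return Copy.IF_NEEDED
    if isinstance(copy, Copy):
        return copy
    if isinstance(copy, str):
        try:
            return Copy(copy)
        except ValueError:
            pass
    # As in option 2: every other string (or other type) is rejected.
    raise TypeError(f"invalid copy option: {copy!r}")

def reshape_with_mode(arr, newshape, copy=False):
    """Hypothetical reshape honoring the three modes (semantics only)."""
    mode = parse_copy(copy)
    out = np.reshape(arr, newshape)
    if mode is Copy.ALWAYS:
        return out.copy()
    if mode is Copy.NEVER and arr.size > 0 and not np.shares_memory(out, arr):
        raise ValueError("cannot reshape without copying the data")
    return out

# Today's pitfall: a non-empty string is truthy, so copy="never" acts like True.
print(bool("never"))  # True

a = np.arange(6).reshape(2, 3)
view = reshape_with_mode(a, -1, copy="never")  # C-contiguous input: a view works
# reshape_with_mode(a.T, -1, copy="never") raises ValueError: it needs a copy
```

This sketch only illustrates the semantics: it lets `np.reshape` make the copy and detects it afterwards with `np.shares_memory`, whereas a real implementation would decide from the strides before copying anything.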

12 28

NumPy Newcomers' Hour – Thursday, March 24th
by Inessa Pawson 10 Jul '22

10 Jul '22

Our next Newcomers' Hour will be held tomorrow, March 24th, at 4 pm UTC. We have no agenda this time. Stop by to ask questions or just to say hi. Join the meeting via Zoom: https://us02web.zoom.us/j/87192457898 Cheers, Inessa Inessa Pawson NumPy Contributor Experience Lead

1 4

NEP 50: Promotion rules for Python scalars
by Sebastian Berg 07 Jul '22

07 Jul '22

Hi all,

I would like to share the first formal draft of NEP 50: Promotion rules for Python scalars with everyone. The full text can be found here: https://numpy.org/neps/nep-0050-scalar-promotion.html

NEP 50 is an attempt to remove value-based casting/promotion. We wish to replace it with clearer rules for the resulting dtype when mixing NumPy arrays and Python scalars. As a brief example, the proposal allows the following (unchanged):

    >>> np.array([1, 2, 3], dtype=np.int8) + 100
    np.array([101, 102, 103], dtype=np.int8)

while clearing up the confusion sometimes caused by the value-inspecting behavior, such as:

    >>> np.array([1, 2, 3], dtype=np.int8) + 300
    np.array([301, 302, 303], dtype=np.int16)  # note the int16

where 300 is too large to fit in an ``int8``. It also removes the special behavior of 0-D arrays or NumPy scalars:

    >>> res = np.array(1, dtype=np.int8) + 100
    >>> res.dtype
    dtype('int64')

This is the continuation of a long discussion (see the "Discussion" section), including the poll I once posted: https://discuss.scientific-python.org/t/poll-future-numpy-behavior-when-mix…

I would be happy to get any feedback, be it editorial or fundamental. There are many alternatives, which I have tried to capture in the NEP. So let's discuss here, or on discuss: https://discuss.scientific-python.org/t/nep-50-promotion-rules-for-python-s…

For smaller edits, don't hesitate to open a NumPy PR, or propose edits on my branch (you can use the edit button to create a PR): https://github.com/seberg/numpy/blob/nep50/doc/neps/nep-0050-scalar-promoti…

An important part of moving forward will be assessing the real-world impact. To start that process, I have created a branch as a draft PR (at this time): https://github.com/numpy/numpy/pull/21626

It is missing some parts, but should allow preliminary testing. The main missing part is that the integer warnings and errors are less strict than proposed in the NEP. It would be invaluable to get a better idea of the extent to which existing code, especially end-user code, is affected by the proposed changes.

Thanks in advance for any input! This is a big, complicated proposal, but finding a way forward will hopefully clear up a source of confusion and inconsistencies that make life harder for both maintainers and users.

Cheers,
Sebastian
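A toy model of the rule for the integer case only, under the assumption that a Python int simply adopts the integer array's dtype and that an out-of-range value is reported instead of triggering an upcast. The sketch raises `OverflowError`; the mail itself only says the draft proposes integer warnings and errors. `weak_int_result_dtype` is a hypothetical name, and NumPy's real promotion machinery (floats, complex, NumPy scalars) is not modeled. The last line uses `np.min_scalar_type` to show the kind of value inspection the NEP wants to stop relying on.

```python
import numpy as np

def weak_int_result_dtype(arr_dtype, value):
    """Toy model (not NumPy's implementation): an integer array combined
    with a Python int keeps the array's dtype; values that do not fit
    are an error instead of a reason to upcast."""
    arr_dtype = np.dtype(arr_dtype)
    if arr_dtype.kind not in "iu" or type(value) is not int:
        raise NotImplementedError("sketch covers integer arrays and Python ints only")
    info = np.iinfo(arr_dtype)
    if not info.min <= value <= info.max:
        raise OverflowError(f"Python int {value} does not fit in {arr_dtype}")
    return arr_dtype

print(weak_int_result_dtype(np.int8, 100))  # int8, as in the unchanged example
# weak_int_result_dtype(np.int8, 300) raises OverflowError instead of giving int16

# Value inspection: the smallest dtype that can hold 300.
print(np.min_scalar_type(300))  # uint16
```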

3 5

next NumPy Newcomers Hour
by Inessa Pawson 05 Jul '22

05 Jul '22

The next NumPy Newcomers Hour will be held this Thursday, June 30th at 4 pm UTC. Ryan C. Cooper, an assistant professor-in-residence at the University of Connecticut (Mansfield, Connecticut, USA), will share how he uses NumPy in his Engineering classes, from individual student research to semester-long courses. We will talk about lessons learned and key strategies to motivate and engage new Python users. Join us via Zoom: https://us02web.zoom.us/j/87192457898 Cheers, Inessa Inessa Pawson Contributor Experience Lead | NumPy https://numpy.org/ Twitter: @inessapawson

1 1