Mailman 3 July 2021 - NumPy-Discussion

mixed mode arithmetic
by Neal Becker 11 Jul '23


I've been browsing the numpy source. I'm wondering about mixed-mode arithmetic on arrays. I believe the way numpy handles this is that it never does mixed arithmetic, but instead converts arrays to a common type. Arguably, that might be efficient for a mix of say, double and float. Maybe not. But for a mix of complex and a scalar type (say, CDouble * Double), it's clearly suboptimal in efficiency. So, do I understand this correctly? If so, is that something we should improve?
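The type promotion described here can be observed from Python. This is a minimal sketch that uses only `np.result_type` and ordinary array multiplication; the array names are illustrative, and it shows the dtype of the result rather than how the inner loop is implemented.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 1j], dtype=np.complex128)  # the "CDouble" operand
x = np.array([2.0, 0.5], dtype=np.float64)           # the "Double" operand

# The common type NumPy selects for this pair of operands.
common = np.result_type(z.dtype, x.dtype)

# The product is returned in that common type.
product = z * x

# A float32/float64 mix promotes to float64.
mixed_float = np.result_type(np.float32, np.float64)

print(common, product.dtype, mixed_float)
```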


Invalid value encountered: how to prevent numpy.where from doing this?
by Eric Emsellem 18 Feb '23


Dear all,

I have code that uses lots of "numpy.where" calls to make some constrained calculations, as in:

    data = np.arange(10)
    result = np.where(data == 0, 0., 1./data)

    # or

    data1 = np.arange(10)
    data2 = np.arange(10)+1.0
    result = np.where(data1 > data2, np.sqrt(data1-data2), np.sqrt(data2-data1))

which then produces warnings like:

    /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt

or for the first example:

    /usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide

How do I prevent these messages from appearing? I know that I could in principle use numpy.seterr. However, I do NOT want to remove these warnings for other potential divide/multiply/sqrt etc. errors, only when I am using a "where" precisely to avoid such cases!

Note that the warnings only happen once, but since I am going to release that code, I would like to spare the user messages that are irrelevant here (because I am testing, with the where, when NOT to divide by zero or take the sqrt of a negative number).

thanks!

Eric
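Two common ways to keep the warnings local to such expressions are sketched below. Both branches of `np.where` are evaluated before the selection happens, which is why the warnings appear in the first place. The sketch assumes the `np.errstate` context manager and the `out=`/`where=` arguments of ufuncs; the variable names are illustrative.

```python
import numpy as np

data = np.arange(10)

# Option 1: silence the floating-point warnings only inside this block.
# Warnings raised elsewhere in the program are unaffected.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.where(data == 0, 0.0, 1.0 / data)

# Option 2: never evaluate the invalid elements. The ufunc writes only
# where the mask is True; the other entries keep the value already in `out`.
safe = np.zeros(data.shape, dtype=float)
np.divide(1.0, data, out=safe, where=(data != 0))

# For the square-root case, both branches reduce to sqrt(|data1 - data2|),
# so the negative argument never has to be computed.
data1 = np.arange(10)
data2 = np.arange(10) + 1.0
root = np.sqrt(np.abs(data1 - data2))
```

Option 1 keeps the original expression unchanged; option 2 avoids the wasted computation but requires preallocating the output.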


Documentation Team meeting - Monday June 8th
by Melissa Mendonça 04 Dec '22


Hi all!

A reminder that on Monday, June 8, we have another documentation team meeting at 3PM UTC**.

If you wish to join on Zoom, you need to use this link: https://zoom.us/j/420005230

Here's the permanent hackmd document with the meeting notes: https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around (especially if you want to introduce yourself or discuss ideas for Google Season of Docs).

** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentati…

- Melissa


copy="never" discussion and no deprecation cycle?
by Sebastian Berg 21 Jul '22


Hi all,

(sorry for the length, details/discussion below)

On the triage call, there seemed a preference to just try to skip the deprecation and introduce `copy="never"`, `copy="if_needed"`, and `copy="always"` (i.e. string options for the `copy` keyword argument).

Strictly speaking, this is against the typical policy (one year of warning/errors). But nobody could think of a reasonable chance that anyone actually uses it. (For me just "policy" will be enough of an argument to just take it slow.)

BUT: If nobody has *any* concerns at all, I think we may just end up introducing the change right away.

The PR is: https://github.com/numpy/numpy/pull/19173

## The Feature

There is the idea to add `copy=never` (or similar). This would modify the existing `copy` argument to make it a 3-way decision:

* `copy=always` or `copy=True` to force a copy
* `copy=if_needed` or `copy=False` to prefer no-copy behavior
* `copy=never` to error when no-copy behavior is not possible (this ensures that a view is returned)

This would affect the functions:

* np.array(object, copy=...)
* arr.astype(new_dtype, copy=...)
* np.reshape(arr, new_shape, copy=...), and the method arr.reshape()
* np.meshgrid and possibly

Where `reshape` currently does not have the option and would benefit by allowing for `arr.reshape(-1, copy=never)`, which would guarantee a view.

## The Options

We have three options that are currently being discussed:

1. We introduce a new `np.CopyMode` or `np.<something>.Copy` Enum with values `np.CopyMode.NEVER`, `np.CopyMode.IF_NEEDED`, and `np.CopyMode.ALWAYS`
   * Plus: No compatibility concerns
   * Downside(?): This would be a first in NumPy, and is untypical API due to that.

2. We introduce `copy="never"`, `copy="if_needed"` and `copy="always"` as strings (all other strings will be a `TypeError`):
   * Problem: `copy="never"` currently means `copy=True` (the opposite). This means new code has to take care when it may run on older NumPy versions, and in theory it could make old code return the wrong thing.
   * Plus: Strings are typical for options in NumPy currently.

3. Same as 2. But we take it very slow: Make strings an error right now and only introduce the new options after two releases as per typical deprecation policy.

## Discussion

We discussed it briefly today in the triage call and we were leaning towards strings.

I was honestly expecting to converge to option 3 to avoid compatibility issues (mainly surprises with `copy="never"` on older versions). But considering how weird it is to currently pass `copy="never"`, the question was whether we should not change it with a release note.

The probability of someone currently passing exactly one of those three (and no other) strings seems exceedingly small.

Personally, I don't have much of an opinion. But if *nobody* voices any concern about just changing the meaning of the string inputs, I think the current default may be to just do it.

Cheers,

Sebastian
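The three-way decision in option 2 can be sketched in user code as follows. `astype_with_copy_mode` is a hypothetical helper written for illustration, not a NumPy function, and it only calls `ndarray.astype` with `copy=True`. The last line shows the compatibility concern: a non-empty string such as `"never"` is truthy, so code that treats `copy` as a boolean reads it as `True`.

```python
import numpy as np

def astype_with_copy_mode(arr, dtype, copy="if_needed"):
    """Hypothetical helper illustrating the proposed 3-way `copy` option."""
    # Map the existing boolean spellings onto the proposed strings.
    if copy is True:
        copy = "always"
    elif copy is False:
        copy = "if_needed"
    if copy not in ("always", "if_needed", "never"):
        raise TypeError(f"invalid copy mode: {copy!r}")

    dtype = np.dtype(dtype)
    if copy == "always":
        return arr.astype(dtype, copy=True)
    if arr.dtype == dtype:
        # No conversion required, so the input can be returned without copying.
        return arr
    if copy == "never":
        raise ValueError("a copy is required to convert the dtype")
    return arr.astype(dtype, copy=True)

a = np.arange(3, dtype=np.float64)
same = astype_with_copy_mode(a, np.float64, copy="never")   # no copy made
fresh = astype_with_copy_mode(a, np.float64, copy="always")  # forced copy

# Why the string is risky where `copy` is interpreted as a boolean:
never_is_truthy = bool("never")
```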


[Feature Request] Add alias of np.concatenate as np.concat
by Iordanis Fostiropoulos 10 May '22


In regard to the feature request https://github.com/numpy/numpy/issues/16469, it was suggested that I send this to the mailing list. I think I can make a strong case for why supporting this naming convention would make sense: it would follow other frameworks that often work alongside numpy, such as tensorflow. For backward compatibility, it can simply be an alias to np.concatenate.

I often convert portions of code from tf to np; it is as simple as changing the base module from tf to np, e.g. np.expand_dims -> tf.expand_dims. This is done either in debugging (e.g. converting tf to np without eager execution to debug a portion of the code), or during prototyping, e.g. develop in numpy and convert to tf. On more than one occasion I have run into errors because of this particular function, np.concatenate. It is unnecessarily long. I imagine there are more people who run into the same problem. Pandas uses concat (torch, on the other extreme, uses simply cat, which I don't think is as descriptive).
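Until such an alias can be relied on, code that must run across NumPy versions can define one locally. This is a minimal sketch; `concat` here is a local name for illustration, and whether `np.concat` exists depends on the installed NumPy version.

```python
import numpy as np

# Use np.concat when the installed NumPy provides it; otherwise fall back
# to np.concatenate, which takes the same arguments.
concat = getattr(np, "concat", np.concatenate)

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
stacked = concat([a, b], axis=0)
```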


request to remove the numpy-aarch64 package from PyPI
by Ralf Gommers 14 Mar '22


Hi all, FYI, I noticed this package that claimed to be maintained by us: https://pypi.org/project/numpy-aarch64/. That's not ours, so I tried to contact the author (no email provided, but guessed the same username on GitHub) and asked to remove it: https://github.com/tomasriv/DNA_Sequence/issues/1. There are a very large number of packages with "numpy" in the name on PyPI, and there's no way we can audit/police that effectively, but if it's a rebuild that pretends like it's official then I think it's worth doing something about. It could contain malicious code for all we know. Cheers, Ralf


Fwd: ndarray should offer __format__ that can adjust precision
by Ivan Gonzalez 03 Dec '21


It would be nice to be able to use the Python syntax we already use to format the precision of floating numbers in numpy:

    >>> a = np.array([-np.pi, np.pi])
    >>> print(f"{a:+.2f}")
    [-3.14 +3.14]

This is particularly useful when you have large arrays. The problem is that if you want to do it today, it is not implemented:

    >>> print(f"{a:+.2f}")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported format string passed to numpy.ndarray.__format__

In this PR (https://github.com/numpy/numpy/pull/19550) I propose a very basic formatting implementation for numeric types that uses `array2string`, just like `str` currently does.

At first, since we are only considering formatting the numeric types, floating-point numbers specifically, we are only interested in being able to change the precision, the sign, and possibly the rounding or truncation. Since the `array2string` function already does everything we need, we only need to implement the `__format__` method of the `ndarray` class, which parses a predefined format (similar to the one already used by Python for built-in data types) to indicate the parameters mentioned above.

I propose a mini format specification inspired by the [Format Specification Mini-Language](https://docs.python.org/3/library/string.html#formatspec).

```
format_spec ::= [sign][.precision][type]
sign        ::= "+" | "-" | " "
precision   ::= [0-9]+
type        ::= "f" | "e"
```

We are going to consider only 3 arguments of the `array2string` function: `precision`, `suppress_small`, `sign`. In particular, the `type` token sets the `suppress_small` argument to True when the type is `f` and False when it is `e`. This is in order to mimic Python's behavior in truncating decimals when using the fixed-point notation.

As @brandon-rhodes said in gh-5543, the behavior when you try to format an array containing Python objects should be the same as Python implements by default in the `object` class: `format(a, "")` should be equivalent to `str(a)` and `format(a, "not empty")` should raise an exception.

What remains to be defined is the behavior when trying to format an array with a non-numeric data type (`np.numeric`) other than `np.object_`. Should we raise an exception? In my opinion yes, so that if formatting is extended in the future -- for example, for dates -- people are aware that it was not implemented before.

I'm open to suggestions.

- Ivan
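The proposed mini-language can be tried out without touching `ndarray` by writing it as a free function. This is a rough sketch under stated assumptions: `format_array` and `_SPEC` are hypothetical names, raising `ValueError` for an unrecognized spec is a choice made here, and the code in the PR may differ. It relies only on `np.array2string` and its `precision`, `suppress_small` and `sign` arguments.

```python
import re
import numpy as np

# Grammar from the proposal: [sign][.precision][type]
_SPEC = re.compile(
    r"(?P<sign>[+\- ])?(?:\.(?P<precision>[0-9]+))?(?P<type>[fe])?"
)

def format_array(a, format_spec=""):
    """Hypothetical stand-in for the proposed ndarray.__format__."""
    if format_spec == "":
        # An empty spec behaves like str(a), as for built-in objects.
        return str(a)
    match = _SPEC.fullmatch(format_spec)
    if match is None:
        raise ValueError(f"unsupported format string: {format_spec!r}")

    # Pass along only the options that the spec actually sets.
    options = {}
    if match["sign"] is not None:
        options["sign"] = match["sign"]
    if match["precision"] is not None:
        options["precision"] = int(match["precision"])
    if match["type"] is not None:
        # "f" suppresses small values, "e" does not, as in the proposal.
        options["suppress_small"] = match["type"] == "f"
    return np.array2string(a, **options)

a = np.array([-np.pi, np.pi])
formatted = format_array(a, "+.2f")
print(formatted)
```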


Newcomer's meeting - later today (4pm UTC)!
by Melissa Mendonça 01 Dec '21


Hi all!

Sorry for the late notice - our next Newcomer's Meeting is today, *July 15, at 4pm UTC.*

This is an informal meeting with no agenda to ask questions, get to know other people and (hopefully) figure out ways to contribute to NumPy. Feel free to join if you are lurking around but found it hard to start contributing - we'll do our best to support you.

If you wish to join on Zoom, use this link: https://zoom.us/j/6345425936

Hope to see you around!

** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Newcomer%27…

*** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20

- Melissa


Floating point precision expectations in NumPy
by Sebastian Berg 19 Aug '21


Hi all,

there is a proposal to add some Intel-specific fast math routines to NumPy: https://github.com/numpy/numpy/pull/19478

Part of numerical algorithms is that there is always a speed vs. precision trade-off: giving a more precise result is slower.

So there is a question of what the general precision expectation should be in NumPy, and how much it is acceptable to diverge in the precision/speed trade-off depending on CPU/system.

I doubt we can formulate very clear rules here, but any input on what precision you would expect or what trade-offs seem acceptable would be appreciated!

Some more details
-----------------

This is mainly interesting e.g. for functions like logarithms, trigonometric functions, or cubic roots.

Some basic functions (multiplication, addition) are correct as per IEEE standard and give the best possible result, but these are typically only correct within very small numerical errors.

This is typically measured as "ULP": https://en.wikipedia.org/wiki/Unit_in_the_last_place where 0.5 ULP would be the best possible result.

Merging the PR may mean relaxing the current precision slightly in some places. In general Intel advertises 4 ULP of precision (although the actual precision for most functions seems better).

Here are two tables, one from glibc and one for the Intel functions:

https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions… (Mainly the LA column)
https://software.intel.com/content/www/us/en/develop/documentation/onemkl-v…

Different implementations give different accuracy, but formulating some guidelines/expectations (or referencing them) would be useful guidance.

For basic
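To make the ULP measure concrete, the error of a float32 result can be estimated against a float64 reference using `np.spacing`, which gives the gap between a value and the next representable one. This is a minimal sketch; `ulp_error` is a hypothetical helper, and it assumes the float64 result is accurate enough to stand in for the exact value.

```python
import numpy as np

def ulp_error(computed, reference):
    """Error of `computed` against `reference`, in ULPs of computed's dtype.

    Hypothetical helper for illustration; `reference` is assumed to be a
    float64 result accurate enough to stand in for the exact value.
    """
    computed = np.asarray(computed)
    reference = np.asarray(reference, dtype=np.float64)
    # Size of one ULP at the magnitude of the reference, in the computed dtype.
    one_ulp = np.spacing(np.abs(reference).astype(computed.dtype))
    difference = np.abs(computed.astype(np.float64) - reference)
    return difference / one_ulp.astype(np.float64)

# Addition is correctly rounded under IEEE 754, so it stays within 0.5 ULP.
x = np.float32(0.1)
y = np.float32(0.2)
add_error = ulp_error(x + y, np.float64(x) + np.float64(y))

# For a function such as sin, the error depends on the implementation in use.
angles = np.linspace(0.1, 3.0, 1000, dtype=np.float32)
sin_errors = ulp_error(np.sin(angles), np.sin(angles.astype(np.float64)))
print("max float32 sin error in ULP:", sin_errors.max())
```

The printed maximum will vary with the CPU and the math library NumPy was built against, which is the variation under discussion.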


reducing effort spent on wheel builds?
by Ralf Gommers 09 Aug '21


Hey all,

This whole thread is quite interesting: https://twitter.com/zooba/status/1415440484181417998. Given how much effort we are spending on really niche wheel builds, I’m wondering if we should just draw a line somewhere:

- we do what we do now for the main platforms: Windows, Linux (x86, aarch64), macOS, *but*:
- no wheels for ppc64le
- no wheels for Alpine Linux
- no wheels for PyPy
- no wheels for Raspberry Pi, AIX or whatever other niche thing comes next.
- drop 32-bit Linux in case it is becoming an annoyance.

This is not an actual proposal (yet) and I should sleep on this some more, but I've seen Chuck and Matti burn a lot of time on the numpy-wheels repo again recently, and I've done the same for SciPy. The situation is not very sustainable and needs a rethink.

The current recipe is "someone who cares about a platform writes a PEP, then pip/wheel add a platform tag for it (very little work), and then the maintainers of each Python package are now responsible for wheel builds (a ton of work)". Most of these platforms have package managers, which are all more capable than pip et al., and if they don't then wheels can be hosted elsewhere (example: https://www.piwheels.org/). And then there's Conda, Nix, Spack, etc. too of course.

Drawing a line somewhere distributes the workload, where packagers who care about some platform and have better tools at hand can do the packaging, and maintainers can go do something with more impact like write new code or review PRs.

<end of brainwave>

Cheers,

Ralf
