Mailman 3 July 2021 - NumPy-Discussion

Proposal to accept NEP 49: Data allocation strategies
by Matti Picus July 18, 2021

Here is the current rendering of the NEP: https://numpy.org/neps/nep-0049.html

The mailing list discussion, started on April 20, did not bring up any objections to the proposal, nor were there objections in the discussion around the text of the NEP. There were questions around details of the implementation; thank you to the reviewers for carefully looking at them and suggesting improvements.

If there are no substantive objections within 7 days from this email, then the NEP will be accepted; see NEP 0 for more details.

Matti

sinpi/cospi trigonometric functions
by Tom Programming July 14, 2021

Hi all,

(I am very new to this mailing list, so please cut me some slack.)

Trigonometric functions like sin(x) are usually implemented as:

1. Some very complicated function that does bit twiddling and basically computes the remainder of x divided by pi/2. An example is http://www.netlib.org/fdlibm/e_rem_pio2.c (which calls http://www.netlib.org/fdlibm/k_rem_pio2.c ), i.e. ~500 lines of branching C code. The complexity arises in part because for big values of x the subtraction becomes more and more ill-defined: x is represented in binary, an irrational number has to be subtracted from it, and consecutive floating point values are farther and farther apart for higher absolute values.
2. A Taylor series for the small values of x.
3. Some manipulation to get the correct branch, deal with subnormal numbers, deal with -0, etc.

If we used a function like sinpi(x) = sin(pi*x), part (1) can be greatly simplified: the remainder of pi*x divided by pi/2 is just pi times the remainder of x divided by 1/2, and that remainder can be computed exactly. There are gains both in accuracy and in performance. A lot of code has a pi inside the argument of sin anyway, since it is common to compute something like sin(2*pi*f*t).

So I wonder if it is feasible to implement those functions in numpy.

To strengthen my argument I'll note that the IEEE standard, too, defines ( https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf ) the functions sinPi, cosPi, tanPi, atanPi, atan2Pi.

There are also existing implementations, for example in Julia ( https://github.com/JuliaLang/julia/blob/6aaedecc447e3d8226d5027fb13d0c3cbfb… ) and the Boost C++ Math library ( https://www.boost.org/doc/libs/1_54_0/boost/math/special_functions/sin_pi.h… ).

Issues caused by apparently inexact calculations have been raised in the past in various forums ( https://stackoverflow.com/questions/20903384/numpy-sinpi-returns-negative-v… https://stackoverflow.com/questions/51425732/how-to-make-sinpi-and-cospi-2-… https://www.reddit.com/r/Python/comments/2g99wa/why_does_python_not_make_si… ... ).

PS: to be nitpicky, I see that most implementations implement sinpi as sin(pi*x) for small values of x, i.e. they multiply x by pi and then use the same Taylor series coefficients as the canonical sin. A multiply instruction could be spared, in my opinion, by storing different Taylor expansion coefficients tailored to the sinpi function. It is not clear to me whether this is not done because the performance gain is small, because I am wrong about something, or because those 6 constants of the Taylor expansion have a "sacred aura" about them and nobody wants to enter deeply into this.

PPS: I am aware that it could be seen as rude to request a feature from an open source project, but I am asking whether there is a point in providing these functions in the first place. I could try to provide implementations for them in some time if it is indeed a worthwhile effort.

Yours,

Tom.
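
To make the simplified argument reduction concrete, here is a minimal NumPy sketch. It is not an existing NumPy function: the name `sinpi` and the whole structure are assumptions for illustration. It relies on `np.fmod(x, 2.0)` and the subtraction of a multiple of 1/2 both being exact in binary floating point, and it still calls the ordinary `np.sin`/`np.cos` on the small reduced argument rather than using dedicated polynomial coefficients. Non-finite inputs are not handled.

```python
import numpy as np


def sinpi(x):
    """Sketch of sin(pi * x) with exact argument reduction (illustrative only)."""
    x = np.asarray(x, dtype=np.float64)
    # sin(pi * x) has period 2 in x, and fmod by 2 is exact for binary floats.
    r = np.fmod(x, 2.0)
    # Split r into the nearest multiple of 1/2 plus a small remainder t.
    n = np.rint(2.0 * r)      # integer-valued, in [-4, 4]
    t = r - 0.5 * n           # exact, with |t| <= 1/4
    q = n.astype(np.int64) % 4  # quadrant index in {0, 1, 2, 3}
    s = np.sin(np.pi * t)
    c = np.cos(np.pi * t)
    # sin(q * pi/2 + y) is sin(y), cos(y), -sin(y), -cos(y) for q = 0, 1, 2, 3.
    return np.where(q == 0, s, np.where(q == 1, c, np.where(q == 2, -s, -c)))
```

The reduction step needs no table of the bits of pi, because the period is a power of two in x. By contrast, `np.sin(np.pi * x)` rounds the product `pi * x` before any reduction happens, which is where the error for large x comes from.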

NumPy Development Meeting Wednesday
by Sebastian Berg July 14, 2021

Hi all,

Our bi-weekly triage-focused NumPy development meeting is Wednesday, July 14th at 9 am Pacific Time (16:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed.

Best regards

Sebastian

NumPy's BLAS library on macOS?
by Jerry Morrison July 9, 2021

Would someone please answer installation questions about NumPy's BLAS on macOS? I'm not finding the answers in the release notes <https://github.com/numpy/numpy/releases>, the PR <https://github.com/numpy/numpy/pull/18874> source, the docs <https://numpy.org/doc/1.21/>, or Stack Overflow <https://stackoverflow.com/search?tab=newest&q=%5bnumpy%5d%20openblas>.

Q1. The NumPy 1.21.0 release note <https://github.com/numpy/numpy/releases/tag/v1.21.0> says "This change enables the Accelerate Framework as an option on macOS." How do I turn that option on or off?

Q2. How can I determine whether NumPy uses Accelerate or its internal copy of OpenBLAS? After installing a wheel, `numpy.show_config()` shows the openblas_info library_dirs et al. as '/usr/local/lib'. Neither '/usr/local/lib/' nor 'site-packages/numpy/' contains a *blas*.so library (for Python 3.8.* on macOS 10.14.6), but the doc <https://numpy.org/install/> says "The OpenBLAS libraries are included in the wheel."

Q3. How do I pip install NumPy 1.21.0 in a way that ensures it uses its embedded OpenBLAS on macOS, as on Linux? I'm aiming for results that are as portable as possible. Or should we link NumPy to an external OpenBLAS via `pip install --no-binary numpy numpy==1.21.0` with `~/.numpy-site.cfg`? (Ditto for SciPy.)

Q4. Can the new NPY_* environment variables select specific BLAS & LAPACK libraries through pip install, and perhaps install faster than building NumPy, SciPy, etc. from source? How do I do that?

Q5. Is NumPy's embedded OpenBLAS compiled by gcc or clang? Is that controllable via `pip install`?

Thank you!
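
As a rough aid for the kind of inspection Q2 asks about, the sketch below prints the build configuration and then searches the installed package for files whose names mention OpenBLAS. The helper name `find_bundled_openblas` is hypothetical, and the directories it searches (hidden folders inside the package such as `.dylibs` or `.libs`, plus a sibling `numpy.libs`) are assumptions about where wheel-repair tools usually place bundled libraries, not something stated in the thread. An empty result does not prove that Accelerate is in use.

```python
from pathlib import Path

import numpy as np


def find_bundled_openblas(package_dir=None):
    """List files under the NumPy install whose name mentions OpenBLAS."""
    root = Path(package_dir) if package_dir is not None else Path(np.__file__).parent
    # Assumed locations: inside the package itself (including hidden
    # directories) or in a sibling "numpy.libs" directory.
    search_roots = [root, root.parent / "numpy.libs"]
    hits = []
    for base in search_roots:
        if base.is_dir():
            hits.extend(
                p for p in base.rglob("*")
                if p.is_file() and "openblas" in p.name.lower()
            )
    return sorted(hits)


if __name__ == "__main__":
    np.show_config()  # build-time BLAS/LAPACK information
    for path in find_bundled_openblas():
        print(path)
```

Note that `numpy.show_config()` reports what was recorded when the wheel was built, which may explain a path such as '/usr/local/lib' that does not exist on the installing machine.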

Floating point warnings/errors for comparisons, etc.?
by Sebastian Berg July 7, 2021

Hi all,

I am trying to clean up our floating point warning handling: https://github.com/numpy/numpy/pull/19316 There is also an upcoming PR to remove most floating point error clearing.

There are some things I am unsure about, though. Part of why it got so confusing is that GCC seems to have fixed/changed its behaviour for comparisons with NaN. In GCC 7 it did not give the warning, but GCC 8 does (correctly).

Comparison with NaN
-------------------

IEEE says that the default comparisons should give warnings for comparison with NaN (except == and !=), and notes that an alternative should be provided (C99 does this with `isless`, etc.).

We currently break this by suppressing invalid value warnings for all comparisons (e.g. also `NaN > 0.`).

We can easily do either version (aside from possible compiler issues). Making it give warnings had one test case fail for `quantile`, which uses the pattern:

    if not (np.all(q >= 0) and np.all(q <= 1)):
        raise ValueError("bad q")

This would additionally (and first) give an "invalid value" warning and require `np.errstate(invalid="ignore")` to suppress it.

I dislike diverging from IEEE, but Python also does not warn for [1]:

    float("nan") >= 0

and presumably the user either explicitly created the NaN or has seen a warning earlier during computation when the NaN was first created.

(IEEE does not distinguish creating a new NaN with `0./0.` from a comparison with `NaN > 0.` [2]. So we can't easily make this settable via `np.errstate` or so.)

So, should we ignore the warning here?

Compiler Issues
---------------

Some compilers may get flags wrong. How much effort do we want to spend on details few users will notice?

My current problem is `1 % 0` and `divmod(1, 0)`. The macOS/clang CI does not set the correct "invalid value" warning flag. (The remainder is NaN, so a new NaN is created and that should be indicated, but the C99 `fmod` does not set it.)

Signalling NaNs
---------------

I propose dropping any special concern for signalling NaNs, which means they raise almost always. Rarely, though, we might suppress the warning if we do it manually for normal NaNs [0].

We have two tests which check for behaviour on signalling NaNs. I could not find any logic to them besides someone being surprised at signalling NaN behaviour at the time; they are not based on use-cases.

Even functions like `isnan` give a warning for signalling NaNs!

The "fix" for anyone having sNaNs is to convert them to qNaNs as early as possible, which e.g. `np.positive(arr, out=arr)` should probably do. If this becomes an issue, maybe we could have an explicit ufunc.

Cheers,

Sebastian

[0] Mainly it seems SSE2 does not provide some non-error comparisons. So trying to avoid manually clearing errors might make some SSE code considerably slower (e.g. `isfinite`, `np.min`).

[1] Probably Python just does not check the CPU warning flags.

[2] https://www.gnu.org/software/libc/manual/html_node/FP-Exceptions.html
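
The `quantile` pattern and the `np.errstate` workaround discussed above can be sketched as follows. The function names `check_q` and `make_nan` are hypothetical and only serve the illustration. The sketch shows the validation written so that it behaves the same whether or not NaN comparisons set the "invalid" flag, and, separately, that creating a new NaN with `0./0.` does raise when `invalid="raise"` is active.

```python
import numpy as np


def check_q(q):
    """Validate quantile-like input, ignoring any 'invalid' flag from NaN comparisons."""
    q = np.asarray(q, dtype=float)
    # The innermost errstate wins, so this stays quiet even if the caller
    # has asked for invalid="raise".
    with np.errstate(invalid="ignore"):
        ok = np.all(q >= 0) and np.all(q <= 1)
    if not ok:
        raise ValueError("bad q")
    return q


def make_nan():
    """Create a new NaN via 0./0. while the 'invalid' flag is set to raise."""
    with np.errstate(invalid="raise"):
        return np.array([0.0]) / np.array([0.0])
```

With NaN input, `check_q` still rejects the value through the ordinary `ValueError`, since both comparisons evaluate to false.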

NumPy Community Meeting Wednesday
by Sebastian Berg July 7, 2021

Hi all,

There will be a NumPy Community meeting Wednesday July 7th at 20:00 UTC. Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

[ANN] Software job opportunity in clean energy
by Dr. Mark Alexander Mikofski PhD July 5, 2021

Dear Pythonistas,

DNV Energy USA is looking for an experienced software engineer to help accelerate the renewable energy transition. Do you know any software engineers interested in clean energy? Would you mind sharing the following link with your network? https://www.linkedin.com/jobs/view/2574048777

Thank you!

Mark A. Mikofski

is_triangular, is_diagonal, is_symmetric et al. in NumPy or SciPy linalg
by Ilhan Polat July 2, 2021

Dear all,

I'm writing some helper Cython functions for scipy.linalg which are kinda performant and usable. And there is still quite some wiggle room for more.

In many linalg routines there is a lot of performance benefit if the structure can be discovered in a cheap and reliable way at the outset. For example, if the array is symmetric then eig can delegate to eigh, or if it is triangular then triangular solvers can be used in linalg.solve and lstsq, and so forth.

Here is the Cythonized version, ready to paste into a Jupyter notebook, that discovers the lower/upper bandwidth of a square array A. It competes well with A != 0, which I use just as a low-level function to compare against (note that the latter returns an array, hence more cost is involved). There is a higher-level supervisor function that checks C-contiguousness and otherwise specializes to different versions of it.

Initial cell

    %load_ext Cython
    %load_ext line_profiler
    import cython
    import line_profiler

Then another cell

    %%cython
    # cython: language_level=3
    # cython: linetrace=True
    # cython: binding = True
    # distutils: define_macros=CYTHON_TRACE=1
    # distutils: define_macros=CYTHON_TRACE_NOGIL=1

    cimport cython
    cimport numpy as cnp
    import numpy as np
    import line_profiler

    ctypedef fused np_numeric_t:
        cnp.int8_t
        cnp.int16_t
        cnp.int32_t
        cnp.int64_t
        cnp.uint8_t
        cnp.uint16_t
        cnp.uint32_t
        cnp.uint64_t
        cnp.float32_t
        cnp.float64_t
        cnp.complex64_t
        cnp.complex128_t
        cnp.int_t
        cnp.long_t
        cnp.longlong_t
        cnp.uint_t
        cnp.ulong_t
        cnp.ulonglong_t
        cnp.intp_t
        cnp.uintp_t
        cnp.float_t
        cnp.double_t
        cnp.longdouble_t

    @cython.linetrace(True)
    @cython.initializedcheck(False)
    @cython.boundscheck(False)
    @cython.wraparound(False)
    cpdef inline (int, int) band_check_internal(np_numeric_t[:, ::1] A):
        cdef Py_ssize_t n = A.shape[0], lower_band = 0, upper_band = 0, r, c
        cdef np_numeric_t zero = 0

        for r in range(n):
            # Only bother if outside the existing band:
            # scan the part of row r left of the current lower band
            for c in range(r - lower_band):
                if A[r, c] != zero:
                    lower_band = r - c
                    break

            # scan from the right edge down to the current upper band
            for c in range(n - 1, r + upper_band, -1):
                if A[r, c] != zero:
                    upper_band = c - r
                    break

        return lower_band, upper_band

Final cell for use-case
---------------

    # Make arbitrary lower-banded array
    n = 50  # array size
    k = 3   # k'th subdiagonal
    R = np.zeros([n, n], dtype=np.float32)
    R[[x for x in range(n)], [x for x in range(n)]] = 1
    R[[x for x in range(n-1)], [x for x in range(1,n)]] = 1
    R[[x for x in range(1,n)], [x for x in range(n-1)]] = 1
    R[[x for x in range(k,n)], [x for x in range(n-k)]] = 2

Some very haphazardly put together metrics

    %timeit band_check_internal(R)
    2.59 µs ± 84.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    %timeit np.linalg.solve(R, zzz)
    824 µs ± 6.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit R != 0.
    1.65 µs ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

So the worst-case cost is negligible in general (note that the given code is slower because it uses the fused type; if I go with tempita, the standalone version is faster).

Two questions:

1) This is missing np.half/float16 functionality, since any arithmetic with float16 might not be reliable, including the nonzero check. Is it safe to view it as np.uint16 and use that specialization? I'm not sure about the sign bit, hence the question. I can leave this out, since almost all of the linalg suite rejects this datatype due to the well-known lack of support.

2) Should this be in NumPy or SciPy linalg? It is quite relevant to SciPy, but then again this stuff is purely about array structures. If the opinion is for NumPy, then I would need a volunteer, because the NumPy codebase flies way above my head.

All feedback welcome

Best

ilhan
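
For checking the Cython routine against something simple, and for looking at the sign-bit concern in question 1, here is a small pure-NumPy sketch. Both function names are hypothetical and this is not the proposed implementation: `bandwidth_reference` computes the same (lower, upper) pair from `np.nonzero`, and `float16_nonzero` tests float16 data on its raw bits. The second function reflects the fact that float16 `-0.0` has the bit pattern 0x8000, so a plain `uint16` view would count it as nonzero unless the sign bit is masked out.

```python
import numpy as np


def bandwidth_reference(A):
    """Return (lower, upper) bandwidth of a 2-D array using np.nonzero."""
    rows, cols = np.nonzero(A)
    if rows.size == 0:
        return 0, 0
    diff = rows - cols  # positive below the diagonal, negative above it
    return int(max(diff.max(), 0)), int(max(-diff.min(), 0))


def float16_nonzero(A):
    """Nonzero test for float16 data performed on the raw 16-bit patterns."""
    bits = np.asarray(A, dtype=np.float16).view(np.uint16)
    # Mask out the sign bit so that -0.0 (0x8000) is treated as zero.
    return (bits & 0x7FFF) != 0
```

The reference version allocates index arrays and always scans the whole matrix, so it is only meant as a correctness check, not as a competitor to the early-exit loop above.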

`keepdims=True` for argmin/argmax and C-API `PyArray_ArgMaxWithKeepdims`
by Sebastian Berg July 2, 2021

Hi all,

The PR https://github.com/numpy/numpy/pull/19211 proposes to extend argmin and argmax with a `keepdims=False` keyword-only argument. This is a standard argument in NumPy, so it is a small API addition.

The PR also proposes to add:

* `PyArray_ArgMinWithKeepdims`
* `PyArray_ArgMaxWithKeepdims`

in the C-API. We have barely extended the C-API in a very long time, so if anyone has concerns, we could pull that out again [1]. Otherwise, this should go in soon, and we will have `keepdims` for both of those functions in the next release :).

Cheers,

Sebastian

[1] I do not see this as much of a maintenance concern, since the original function is just a one-line wrapper of the new one. The API is fairly large and it probably is not used much, so it doesn't feel important to me to add it. Overall, I just don't have a preference.
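
To illustrate what `keepdims` buys for argmax, the sketch below builds the kept-dimension index by hand with `np.expand_dims` and uses it with `np.take_along_axis`. The last part assumes a NumPy version in which the proposed `keepdims` argument is available and falls back to the manual result otherwise; the array and variable names are made up for the example.

```python
import numpy as np

a = np.array([[1, 9, 3],
              [7, 2, 8]])

# Manual equivalent: reinsert the reduced axis so the index broadcasts against a.
idx = np.expand_dims(np.argmax(a, axis=1), axis=1)  # shape (2, 1)
best = np.take_along_axis(a, idx, axis=1)           # row-wise maxima, shape (2, 1)

# Assumption: keepdims is only accepted once the change from the PR is released.
try:
    idx_new = np.argmax(a, axis=1, keepdims=True)
except TypeError:
    idx_new = idx
```

Keeping the reduced axis means the index array can be passed straight to `np.take_along_axis` without the extra reshaping step.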
