Mailman 3 May 2021 - NumPy-Discussion

mixed mode arithmetic
by Neal Becker 11 Jul '23

I've been browsing the numpy source. I'm wondering about mixed-mode arithmetic on arrays. I believe the way numpy handles this is that it never does mixed arithmetic, but instead converts arrays to a common type. Arguably, that might be efficient for a mix of say, double and float. Maybe not. But for a mix of complex and a scalar type (say, CDouble * Double), it's clearly suboptimal in efficiency. So, do I understand this correctly? If so, is that something we should improve?
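A minimal sketch of the behaviour described in this post, as seen from Python. It shows only the result dtype, not which inner loop NumPy runs internally; the array values are made up for illustration.

```python
import numpy as np

# Illustrative inputs: one complex array and one real array.
z = np.array([1 + 2j, 3 - 1j], dtype=np.complex128)
x = np.array([2.0, 0.5], dtype=np.float64)

# The common type NumPy selects for this pair of dtypes.
common = np.result_type(z.dtype, x.dtype)

# Mixed-type product, and the same product after an explicit conversion
# of the real operand to the common type.
product = z * x
explicit = z * x.astype(common)

print(common, product.dtype)
```

If the two products agree, that is consistent with the real operand being treated as complex before the multiplication, which is the cost the post asks about.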

Invalid value encountered : how to prevent numpy.where to do this?
by Eric Emsellem 18 Feb '23

Dear all,

I have code that uses lots of "numpy.where" calls to make some constrained calculations, as in:

    data = np.arange(10)
    result = np.where(data == 0, 0., 1./data)
    # or
    data1 = np.arange(10)
    data2 = np.arange(10)+1.0
    result = np.where(data1 > data2, np.sqrt(data1-data2), np.sqrt(data2-data1))

which then produces warnings like:

    /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt

or for the first example:

    /usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide

How do I prevent these messages from appearing? I know that I could in principle use numpy.seterr. However, I do NOT want to remove these warnings for other potential divide/multiply/sqrt etc. errors, only when I am using a "where" whose purpose is precisely to avoid such warnings!

Note that the warnings only happen once, but since I am going to release that code, I would like to spare the user these messages, which are irrelevant here (because I am testing, with the where, when NOT to divide by zero or take the sqrt of a negative number).

thanks!
Eric
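Two possible approaches, sketched under the assumption that the goal is to silence the warning only around the `where` call. The warnings appear because both branch arguments of `np.where` are evaluated in full before the selection happens. `np.errstate` limits the change in error handling to a single block, and the `where=`/`out=` arguments of a ufunc skip the invalid elements entirely.

```python
import numpy as np

data = np.arange(10)

# Option 1: ignore divide/invalid warnings only inside this block.
# The global error settings are restored when the block exits.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.where(data == 0, 0.0, 1.0 / data)

# Option 2: never evaluate the division where it is invalid.
# `out` supplies the values for the skipped elements (0.0 here).
masked = np.zeros(data.shape, dtype=float)
np.divide(1.0, data, out=masked, where=(data != 0))

print(result)
print(masked)
```

Option 2 avoids computing the unwanted values at all, while Option 1 keeps the original `np.where` expression unchanged.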

Documentation Team meeting - Monday June 8th
by Melissa Mendonça 04 Dec '22

Hi all!

A reminder that on Monday, June 8, we have another documentation team meeting at 3PM UTC**. If you wish to join on Zoom, you need to use this link: https://zoom.us/j/420005230

Here's the permanent hackmd document with the meeting notes: https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around (especially if you want to introduce yourself or discuss ideas for Google Season of Docs).

** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentati…

- Melissa

[Feature Request] Add alias of np.concatenate as np.concat
by Iordanis Fostiropoulos 10 May '22

In regard to the feature request https://github.com/numpy/numpy/issues/16469, it was suggested that I send this to the mailing list.

I think I can make a strong case for supporting this naming convention: it would follow other frameworks that often work alongside numpy, such as tensorflow. For backward compatibility, it can simply be an alias of np.concatenate.

I often convert portions of code from tf to np; it is as simple as changing the base module from tf to np, e.g. np.expand_dims -> tf.expand_dims. This is done either in debugging (e.g. converting tf to np without eager execution to debug a portion of the code) or during prototyping (e.g. develop in numpy and convert to tf). On more than one occasion I have found myself getting syntax errors because of this particular function, np.concatenate. It is unnecessarily long. I imagine other people run into the same problem. Pandas uses concat (torch, at the other extreme, uses simply cat, which I don't think is as descriptive).
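A small sketch of what the alias would look like from user code. The local name `concat` is defined here for illustration. The `getattr` fallback is an assumption, made so the snippet runs both on NumPy versions that already expose `np.concat` and on those that do not.

```python
import numpy as np

# Use np.concat when this NumPy version has it, else fall back to the
# long-standing np.concatenate (same call signature for this use).
concat = getattr(np, "concat", np.concatenate)

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

joined = concat([a, b], axis=0)
print(joined.shape)
```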

Re: [Numpy-discussion] example reading binary Fortran file
by Neil Martinsen-Burrell 22 Jul '21

David Froger <david.froger.info <at> gmail.com> writes:

> Hy,My question is about reading Fortran binary file (oh no this question
> again...)

I've posted this before, but I finally got it cleaned up for the Cookbook. For this purpose I use a subclass of file that has methods for reading unformatted Fortran data. See http://www.scipy.org/Cookbook/FortranIO/FortranFile. I'd gladly see this in numpy or scipy somewhere, but I'm not sure where it belongs.

> program makeArray
> implicit none
> integer,parameter:: nx=10,ny=20
> real(4),dimension(nx,ny):: ux,uy,p
> integer :: i,j
> open(11,file='uxuyp.bin',form='unformatted')
> do i = 1,nx
> do j = 1,ny
> ux(i,j) = real(i*j)
> uy(i,j) = real(i)/real(j)
> p (i,j) = real(i) + real(j)
> enddo
> enddo
> write(11) ux,uy
> write(11) p
> close(11)
> end program makeArray

When I run the above program compiled with gfortran on my Intel Mac, I can read it back with::

    >>> import numpy as np
    >>> from fortranfile import FortranFile
    >>> f=FortranFile('uxuyp.bin', endian='<')
    >>> uxuy = f.readReals(prec='f') # 'f' for default reals
    >>> len(uxuy)
    400
    >>> ux = np.array(uxuy[:200]).reshape((20,10)).T
    >>> uy = np.array(uxuy[200:]).reshape((20,10)).T
    >>> p = f.readReals('f').reshape((20,10)).T
    >>> ux
    array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.],
           [ 2., 4., 6., 8., 10., 12., 14., 16., 18., 20., 22., 24., 26., 28., 30., 32., 34., 36., 38., 40.],
           [ 3., 6., 9., 12., 15., 18., 21., 24., 27., 30., 33., 36., 39., 42., 45., 48., 51., 54., 57., 60.],
           [ 4., 8., 12., 16., 20., 24., 28., 32., 36., 40., 44., 48., 52., 56., 60., 64., 68., 72., 76., 80.],
           [ 5., 10., 15., 20., 25., 30., 35., 40., 45., 50., 55., 60., 65., 70., 75., 80., 85., 90., 95., 100.],
           [ 6., 12., 18., 24., 30., 36., 42., 48., 54., 60., 66., 72., 78., 84., 90., 96., 102., 108., 114., 120.],
           [ 7., 14., 21., 28., 35., 42., 49., 56., 63., 70., 77., 84., 91., 98., 105., 112., 119., 126., 133., 140.],
           [ 8., 16., 24., 32., 40., 48., 56., 64., 72., 80., 88., 96., 104., 112., 120., 128., 136., 144., 152., 160.],
           [ 9., 18., 27., 36., 45., 54., 63., 72., 81., 90., 99., 108., 117., 126., 135., 144., 153., 162., 171., 180.],
           [ 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., 110., 120., 130., 140., 150., 160., 170., 180., 190., 200.]])
    >>> uy
    array([[ 1., 0.5, 0.33333334, 0.25, 0.2, 0.16666667, 0.14285715, 0.125, 0.11111111, 0.1, 0.09090909, 0.08333334, 0.07692308, 0.07142857, 0.06666667, 0.0625, 0.05882353, 0.05555556, 0.05263158, 0.05],
           [ 2., 1., 0.66666669, 0.5, 0.40000001, 0.33333334, 0.2857143, 0.25, 0.22222222, 0.2, 0.18181819, 0.16666667, 0.15384616, 0.14285715, 0.13333334, 0.125, 0.11764706, 0.11111111, 0.10526316, 0.1],
           [ 3., 1.5, 1., 0.75, 0.60000002, 0.5, 0.42857143, 0.375, 0.33333334, 0.30000001, 0.27272728, 0.25, 0.23076923, 0.21428572, 0.2, 0.1875, 0.17647059, 0.16666667, 0.15789473, 0.15000001],
           [ 4., 2., 1.33333337, 1., 0.80000001, 0.66666669, 0.5714286, 0.5, 0.44444445, 0.40000001, 0.36363637, 0.33333334, 0.30769232, 0.2857143, 0.26666668, 0.25, 0.23529412, 0.22222222, 0.21052632, 0.2],
           [ 5., 2.5, 1.66666663, 1.25, 1., 0.83333331, 0.71428573, 0.625, 0.55555558, 0.5, 0.45454547, 0.41666666, 0.38461539, 0.35714287, 0.33333334, 0.3125, 0.29411766, 0.27777779, 0.2631579, 0.25],
           [ 6., 3., 2., 1.5, 1.20000005, 1., 0.85714287, 0.75, 0.66666669, 0.60000002, 0.54545456, 0.5, 0.46153846, 0.42857143, 0.40000001, 0.375, 0.35294119, 0.33333334, 0.31578946, 0.30000001],
           [ 7., 3.5, 2.33333325, 1.75, 1.39999998, 1.16666663, 1., 0.875, 0.77777779, 0.69999999, 0.63636363, 0.58333331, 0.53846157, 0.5, 0.46666667, 0.4375, 0.41176471, 0.3888889, 0.36842105, 0.34999999],
           [ 8., 4., 2.66666675, 2., 1.60000002, 1.33333337, 1.14285719, 1., 0.8888889, 0.80000001, 0.72727275, 0.66666669, 0.61538464, 0.5714286, 0.53333336, 0.5, 0.47058824, 0.44444445, 0.42105263, 0.40000001],
           [ 9., 4.5, 3., 2.25, 1.79999995, 1.5, 1.28571427, 1.125, 1., 0.89999998, 0.81818181, 0.75, 0.69230771, 0.64285713, 0.60000002, 0.5625, 0.52941179, 0.5, 0.47368422, 0.44999999],
           [ 10., 5., 3.33333325, 2.5, 2., 1.66666663, 1.42857146, 1.25, 1.11111116, 1., 0.90909094, 0.83333331, 0.76923078, 0.71428573, 0.66666669, 0.625, 0.58823532, 0.55555558, 0.52631581, 0.5]])
    >>> p
    array([[ 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21.],
           [ 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.],
           [ 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.],
           [ 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24.],
           [ 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.],
           [ 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.],
           [ 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27.],
           [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28.],
           [ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
           [ 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.]])

Note that you have to provide the shape information for ux and uy because fortran writes them together as a stream of 400 numbers.

-Neil
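For readers without the Cookbook class, here is a rough NumPy-only sketch of the same idea. It assumes the record layout commonly produced by gfortran for unformatted sequential files: each record is framed by a 4-byte little-endian byte count before and after the payload. The helpers `write_record` and `read_record` are hypothetical names written for this example, not part of NumPy, and the file is generated in Python rather than by the Fortran program so the snippet is self-contained.

```python
import os
import tempfile

import numpy as np


def write_record(fh, values, marker="<i4"):
    """Write one record framed by leading and trailing byte-count markers."""
    payload = np.ascontiguousarray(values).tobytes()
    count = np.array([len(payload)], dtype=marker).tobytes()
    fh.write(count + payload + count)


def read_record(fh, dtype="<f4", marker="<i4"):
    """Read one framed record and return its payload as a 1-D array."""
    size = np.dtype(marker).itemsize
    head = int(np.frombuffer(fh.read(size), dtype=marker)[0])
    data = np.frombuffer(fh.read(head), dtype=dtype)
    tail = int(np.frombuffer(fh.read(size), dtype=marker)[0])
    if head != tail:
        raise ValueError("record markers disagree; wrong endianness or marker size?")
    return data


# Same arrays as the Fortran program: ux = i*j, uy = i/j, p = i+j.
nx, ny = 10, 20
i = np.arange(1, nx + 1, dtype="<f4")[:, None]
j = np.arange(1, ny + 1, dtype="<f4")[None, :]
ux, uy, p = i * j, i / j, i + j

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "uxuyp.bin")
    with open(path, "wb") as fh:
        # Record 1 holds ux then uy in column-major order; record 2 holds p.
        write_record(fh, np.concatenate([ux.ravel(order="F"), uy.ravel(order="F")]))
        write_record(fh, p.ravel(order="F"))
    with open(path, "rb") as fh:
        uxuy = read_record(fh)
        p2 = read_record(fh).reshape((nx, ny), order="F")

# The first record is a flat stream of 400 numbers, so the shapes are supplied here.
ux2 = uxuy[: nx * ny].reshape((nx, ny), order="F")
uy2 = uxuy[nx * ny :].reshape((nx, ny), order="F")
print(len(uxuy), ux2.shape)
```

Reshaping with `order="F"` plays the same role as `reshape((20,10)).T` in the session above. Other compilers or compiler flags may use a different marker size or byte order, in which case `marker` and `dtype` would need adjusting.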

Add smallest_normal and smallest_subnormal attributes to finfo
by Stephannie Jiménez Gacha 19 Jul '21

Good afternoon,

During the discussions in the Data API consortium, when looking into how the attributes of `finfo` are used in the wild, we found that `tiny` is used regularly, but in a good number of cases not for its intended purpose but rather as "just give me a small number". Following this, we are proposing the addition of `smallest_normal` and `smallest_subnormal` attributes. Personally, I think that the `tiny` name is a little bit odd and misleading, so it would be great to leave it as an alias but have a clear name in this class.

Right now the PR https://github.com/numpy/numpy/pull/18536 has all the changes, and all the values added were checked against the IEEE-754 standard.

One of the main concerns is the support of subnormal numbers on certain architectures, where the values can't be calculated accurately. Given the state of the discussion, we don't know if the best alternative is to not add the `smallest_subnormal` attribute and just add the `smallest_normal` attribute as an alias to `tiny`.

We open this to discussion to see which way we can go in order to get this PR merged.

Stephannie Jimenez Gacha
Software developer
Quansight | Your Data Experts
w: www.quansight.com e: sgacha(a)quansight.com
<https://www.linkedin.com/company/quansight> <https://twitter.com/quansightai>
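To make the distinction concrete, here is a short sketch for float64. It uses only `finfo.tiny` and `np.nextafter`, so it does not depend on the proposed attributes being available. The variable names mirror the proposal but are ordinary local variables here.

```python
import numpy as np

info = np.finfo(np.float64)

# `tiny` is the smallest positive *normal* number, 2**-1022 for float64.
smallest_normal = info.tiny

# The next representable value above zero. On hardware with subnormal
# support this is the smallest positive subnormal number.
smallest_subnormal = np.nextafter(np.float64(0.0), np.float64(1.0))

# Subnormals fill the gap between zero and the smallest normal number.
is_below_tiny = 0.0 < smallest_subnormal < smallest_normal

print(smallest_normal, smallest_subnormal, is_below_tiny)
```

On platforms that flush subnormals to zero the second value may behave differently, which is the architecture concern raised in the post.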

Proposal to accept NEP 49: Data allocation strategies
by Matti Picus 18 Jul '21

Here is the current rendering of the NEP: https://numpy.org/neps/nep-0049.html

The mailing list discussion, started on April 20, did not bring up any objections to the proposal, nor were there objections in the discussion around the text of the NEP. There were questions around details of the implementation; thank you, reviewers, for carefully looking at them and suggesting improvements.

If there are no substantive objections within 7 days from this email, then the NEP will be accepted; see NEP 0 for more details.

Matti

ENH: Discussions about 'add numpy.topk'
by kangkai＠mail.ustc.edu.cn 31 May '21

Hi all,

Finding the top-k elements is widely used in several fields, but is missing from NumPy. I have implemented this functionality, named numpy.topk, using core numpy functions and opened a PR: https://github.com/numpy/numpy/pull/19117

Any discussion is welcome.

Best wishes,
Kang Kai
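For context, one common way to get the k largest elements with existing NumPy functions is `np.argpartition` followed by a sort of the selected slice. The function below is an illustrative sketch for 1-D input only, with a hypothetical name and signature. It is not the implementation from the PR.

```python
import numpy as np


def topk_1d(values, k):
    """Return the k largest entries of a 1-D array and their indices, largest first."""
    values = np.asarray(values)
    if not 1 <= k <= values.size:
        raise ValueError("k must be between 1 and the number of elements")
    # argpartition places the k largest entries in the last k slots, unordered.
    candidates = np.argpartition(values, -k)[-k:]
    # Sort only those k candidates, in descending order of value.
    order = np.argsort(values[candidates])[::-1]
    indices = candidates[order]
    return values[indices], indices


top_values, top_indices = topk_1d([1, 5, 3, 9, 7], k=3)
print(top_values, top_indices)
```

Partitioning first avoids sorting the whole array when k is much smaller than its length. The order of tied values is not specified by this sketch.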

NumPy Community Meeting Wednesday
by Sebastian Berg 25 May '21

Hi all,

There will be a NumPy Community meeting Wednesday May 26th at 20:00 UTC. Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes
Sebastian

The status of DType Refactor
by Sebastian Berg 24 May '21

Hi all,

I thought I would give a brief update on where we are with new DTypes. Partially for Matti, who is braving the brunt of the review, but also for anyone else interested. Please don't hesitate to ask for clarifications, ask any questions, or schedule a meeting to discuss!

Recap

The past year has seen most of the "big picture" changes merged into NumPy, a good chunk already part of 1.20:

* dtype instances are now instances of np.dtype subclasses. I usually write DType for those. But DTypeType is also a good name :).
* Array coercion using np.array(...) was completely rewritten, which was necessary to allow new user DTypes.
* Introduced the ArrayMethod concept to unify casting and ufuncs as much as possible (NEP 42/43):
  * Casting was first fixed up to support error returns.
  * "can-cast" logic was rewritten in terms of ArrayMethod (i.e. casting safety checks are integrated into ArrayMethod).
  * Casting was largely reorganized around the ArrayMethod concept, including the casting safety. (Also this)
* Promotion was implemented and later integrated everywhere, e.g. for np.result_type(...).
* A larger refactor of UFuncs and a few smaller PRs set the stage for the ufunc refactor (see "Currently in Progress").

With the exception of universal functions, the above list covers all major areas of NumPy that are required to change. It also implements many of the things that new user DTypes will need and currently cannot do. Previously, these were either unavailable or limited in various ways, especially when it comes to parametric DTypes such as units or strings.

Currently in Progress

The main remaining points are the universal functions. A majority of NumPy features are organized as universal functions, and universal functions inherently did not support parametric user-defined DTypes, so these need a major change. This change is proposed in NEP 43 (although that will need some smaller updates).

The work on implementing it is mostly settling in the following PR and the following branch (I hope these will move in very soon):

* PR 18905: Implements new promotion, dispatching and use for most ufuncs.
* My development branch extends this to the reductions.

In parallel, the new DType API is only useful for users once it is exposed. I have a branch here to experiment with that:

* The experimental DType API exposure branch.
* And a repository with (currently cython) examples using it. This currently includes a very simplistic Units DType and ufuncs for strings (previously difficult or not really possible). The exact way to write a new DType probably needs some alternative. But note that this should largely be limited to the boilerplate code.

Future

The main step still remaining is figuring out how exactly to expose the DType API best (ABI compatibility is the major concern) and finishing NEP 43 (or most of it) as the closing step.

After that there are still some things that need to be done (although this list is unlikely to be exhaustive):

* The way users should define new DTypes has to be decided (this seems tricky, unfortunately).
* Some functionality is defined in the "old style" API that should be removed/discouraged. This includes things like sorting functions. (The old way could be allowed for a transition period.) To be specific, these are the ((PyArray_Descr *)descr)->f->funcs.
* Some small parts of the new API are missing right now. E.g. ensure_nbo() in current NumPy code has to use the ensure_canonical() as defined by NEP 42. Similarly, some parts will need tweaking.
* Part of the API should be public, but it would also be nice to clean it up before doing so. An example of this is the get_loop() for/of ufuncs. For most use-cases, this is probably not too important, but the API is a bit awkward currently. (It would be possible to accept the awkward API and replace it in the future with a new get_loop(), deprecating the old one slowly.)
* There should be some new API for "reference counting" (more generally, any item with memory management), cleaning up the split between the current transfer to NULL and PyArray_XDECREF. That is, we should unify it as much as possible (probably by using the transfer to NULL path), and then expose that also to custom DTypes.
* Some utility functionality is missing at this time. For example, a way for a Unit DType to fall back to the normal math implemented by NumPy (after figuring out the unit part).
* A Python API is not on my explicit roadmap right now (although probably not hard).

But most importantly, whatever comes up when potential users start exploring the API, hopefully soon!

Otherwise, there are a couple of related improvements that I think would make sense, such as considering storing the actual power-of-two alignment in the array flags (they are getting a bit cramped if we assume int can be 16 bits, though).

Also, the discussion about removing value-based casting/promotion is one that would help with DTypes, and pushing it forward probably makes sense as soon as the items that are "currently in progress" are largely settled and the next NumPy version is released.

Cheers,
Sebastian
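A few of the user-visible pieces mentioned in the recap can be inspected from Python. This is a small sketch assuming NumPy 1.20 or newer (where each dtype instance has its own class). It touches only public functions and does not show the C-level ArrayMethod machinery.

```python
import numpy as np

f64 = np.dtype("float64")

# The class of a dtype instance (the "DType") is a subclass of np.dtype.
dtype_class = type(f64)
is_dtype_subclass = issubclass(dtype_class, np.dtype) and dtype_class is not np.dtype

# Promotion: the common dtype for a pair of input dtypes.
promoted = np.result_type(np.int8, np.float32)

# Casting safety checks ("can-cast" logic).
safe = np.can_cast(np.float64, np.float32, casting="safe")
same_kind = np.can_cast(np.float64, np.float32, casting="same_kind")

print(dtype_class, promoted, safe, same_kind)
```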
