Mailman 3 January 2017 - NumPy-Discussion

mixed mode arithmetic
by Neal Becker 11 Jul '23

I've been browsing the numpy source. I'm wondering about mixed-mode arithmetic on arrays. I believe the way numpy handles this is that it never does mixed arithmetic, but instead converts arrays to a common type. Arguably, that might be efficient for a mix of say, double and float. Maybe not. But for a mix of complex and a scalar type (say, CDouble * Double), it's clearly suboptimal in efficiency. So, do I understand this correctly? If so, is that something we should improve?
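The conversion to a common type described above can be observed from Python. The following is only a small sketch (the array names and sizes are made up for illustration); it shows which dtype NumPy picks for mixed operands, and says nothing about how the inner loops are implemented.

```python
import numpy as np

# Hypothetical operands: one complex double array, one real double array.
z = np.ones(4, dtype=np.complex128)
x = np.full(4, 2.0, dtype=np.float64)

# result_type reports the common dtype NumPy will compute in.
common = np.result_type(z, x)
product = z * x

# The same promotion applies to float32 mixed with float64.
f32 = np.ones(4, dtype=np.float32)
mixed_float = (f32 * x).dtype

print(common, product.dtype, mixed_float)
```

In both cases the result dtype is the wider of the two operand types, which is consistent with the "convert to a common type" reading of the source.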

4 6

Invalid value encountered: how to prevent numpy.where from doing this?
by Eric Emsellem 18 Feb '23

Dear all,

I have code that uses lots of "numpy.where" calls to make constrained calculations, as in:

    data = arange(10)
    result = np.where(data == 0, 0., 1./data)

    # or
    data1 = arange(10)
    data2 = arange(10)+1.0
    result = np.where(data1 > data2, np.sqrt(data1-data2), np.sqrt(data2-data2))

which then produces warnings like:

    /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt

or, for the first example:

    /usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide

How do I prevent these messages from appearing? I know that I could in principle use numpy.seterr. However, I do NOT want to suppress these warnings for other potential divide/multiply/sqrt etc. errors, only where I am using a "where" precisely to avoid such warnings!

Note that the warnings only appear once, but since I am going to release this code, I would like to spare the user such messages, which are irrelevant here (because I am using the where to test when NOT to divide by zero or take the sqrt of a negative number).

Thanks!
Eric
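The warnings arise because both value arguments of np.where are evaluated in full before the selection happens. Two possible ways to keep the silencing local are sketched below, assuming the first example from the message: the np.errstate context manager, which changes error handling only inside the with block, and the where= argument of the ufunc itself, which skips the masked elements entirely. The variable names are illustrative.

```python
import numpy as np

data = np.arange(10)

# Option 1: silence the warnings only for this one expression.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.where(data == 0, 0.0, 1.0 / data)

# Option 2: never perform the division where data == 0.
# Elements not selected by `where` keep the value already stored in `out`.
safe = np.zeros(data.shape, dtype=float)
np.divide(1.0, data, out=safe, where=(data != 0))
```

Outside the with block the global error settings are unchanged, so other divisions in the program still warn as before.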

5 4

Re: [Numpy-discussion] example reading binary Fortran file
by Neil Martinsen-Burrell 22 Jul '21

David Froger <david.froger.info <at> gmail.com> writes:

> Hy, My question is about reading Fortran binary file (oh no this question
> again...)

I've posted this before, but I finally got it cleaned up for the Cookbook. For this purpose I use a subclass of file that has methods for reading unformatted Fortran data. See http://www.scipy.org/Cookbook/FortranIO/FortranFile. I'd gladly see this in numpy or scipy somewhere, but I'm not sure where it belongs.

> program makeArray
> implicit none
> integer,parameter:: nx=10,ny=20
> real(4),dimension(nx,ny):: ux,uy,p
> integer :: i,j
> open(11,file='uxuyp.bin',form='unformatted')
> do i = 1,nx
>   do j = 1,ny
>     ux(i,j) = real(i*j)
>     uy(i,j) = real(i)/real(j)
>     p (i,j) = real(i) + real(j)
>   enddo
> enddo
> write(11) ux,uy
> write(11) p
> close(11)
> end program makeArray

When I run the above program compiled with gfortran on my Intel Mac, I can read it back with::

    >>> import numpy as np
    >>> from fortranfile import FortranFile
    >>> f=FortranFile('uxuyp.bin', endian='<')
    >>> uxuy = f.readReals(prec='f') # 'f' for default reals
    >>> len(uxuy)
    400
    >>> ux = np.array(uxuy[:200]).reshape((20,10)).T
    >>> uy = np.array(uxuy[200:]).reshape((20,10)).T
    >>> p = f.readReals('f').reshape((20,10)).T
    >>> ux
    array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.],
           [ 2., 4., 6., 8., 10., 12., 14., 16., 18., 20., 22., 24., 26., 28., 30., 32., 34., 36., 38., 40.],
           [ 3., 6., 9., 12., 15., 18., 21., 24., 27., 30., 33., 36., 39., 42., 45., 48., 51., 54., 57., 60.],
           [ 4., 8., 12., 16., 20., 24., 28., 32., 36., 40., 44., 48., 52., 56., 60., 64., 68., 72., 76., 80.],
           [ 5., 10., 15., 20., 25., 30., 35., 40., 45., 50., 55., 60., 65., 70., 75., 80., 85., 90., 95., 100.],
           [ 6., 12., 18., 24., 30., 36., 42., 48., 54., 60., 66., 72., 78., 84., 90., 96., 102., 108., 114., 120.],
           [ 7., 14., 21., 28., 35., 42., 49., 56., 63., 70., 77., 84., 91., 98., 105., 112., 119., 126., 133., 140.],
           [ 8., 16., 24., 32., 40., 48., 56., 64., 72., 80., 88., 96., 104., 112., 120., 128., 136., 144., 152., 160.],
           [ 9., 18., 27., 36., 45., 54., 63., 72., 81., 90., 99., 108., 117., 126., 135., 144., 153., 162., 171., 180.],
           [ 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., 110., 120., 130., 140., 150., 160., 170., 180., 190., 200.]])
    >>> uy
    array([[ 1., 0.5, 0.33333334, 0.25, 0.2, 0.16666667, 0.14285715, 0.125, 0.11111111, 0.1, 0.09090909, 0.08333334, 0.07692308, 0.07142857, 0.06666667, 0.0625, 0.05882353, 0.05555556, 0.05263158, 0.05],
           [ 2., 1., 0.66666669, 0.5, 0.40000001, 0.33333334, 0.2857143, 0.25, 0.22222222, 0.2, 0.18181819, 0.16666667, 0.15384616, 0.14285715, 0.13333334, 0.125, 0.11764706, 0.11111111, 0.10526316, 0.1],
           [ 3., 1.5, 1., 0.75, 0.60000002, 0.5, 0.42857143, 0.375, 0.33333334, 0.30000001, 0.27272728, 0.25, 0.23076923, 0.21428572, 0.2, 0.1875, 0.17647059, 0.16666667, 0.15789473, 0.15000001],
           [ 4., 2., 1.33333337, 1., 0.80000001, 0.66666669, 0.5714286, 0.5, 0.44444445, 0.40000001, 0.36363637, 0.33333334, 0.30769232, 0.2857143, 0.26666668, 0.25, 0.23529412, 0.22222222, 0.21052632, 0.2],
           [ 5., 2.5, 1.66666663, 1.25, 1., 0.83333331, 0.71428573, 0.625, 0.55555558, 0.5, 0.45454547, 0.41666666, 0.38461539, 0.35714287, 0.33333334, 0.3125, 0.29411766, 0.27777779, 0.2631579, 0.25],
           [ 6., 3., 2., 1.5, 1.20000005, 1., 0.85714287, 0.75, 0.66666669, 0.60000002, 0.54545456, 0.5, 0.46153846, 0.42857143, 0.40000001, 0.375, 0.35294119, 0.33333334, 0.31578946, 0.30000001],
           [ 7., 3.5, 2.33333325, 1.75, 1.39999998, 1.16666663, 1., 0.875, 0.77777779, 0.69999999, 0.63636363, 0.58333331, 0.53846157, 0.5, 0.46666667, 0.4375, 0.41176471, 0.3888889, 0.36842105, 0.34999999],
           [ 8., 4., 2.66666675, 2., 1.60000002, 1.33333337, 1.14285719, 1., 0.8888889, 0.80000001, 0.72727275, 0.66666669, 0.61538464, 0.5714286, 0.53333336, 0.5, 0.47058824, 0.44444445, 0.42105263, 0.40000001],
           [ 9., 4.5, 3., 2.25, 1.79999995, 1.5, 1.28571427, 1.125, 1., 0.89999998, 0.81818181, 0.75, 0.69230771, 0.64285713, 0.60000002, 0.5625, 0.52941179, 0.5, 0.47368422, 0.44999999],
           [ 10., 5., 3.33333325, 2.5, 2., 1.66666663, 1.42857146, 1.25, 1.11111116, 1., 0.90909094, 0.83333331, 0.76923078, 0.71428573, 0.66666669, 0.625, 0.58823532, 0.55555558, 0.52631581, 0.5]])
    >>> p
    array([[ 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21.],
           [ 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.],
           [ 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.],
           [ 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24.],
           [ 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.],
           [ 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.],
           [ 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27.],
           [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28.],
           [ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
           [ 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.]])

Note that you have to provide the shape information for ux and uy because fortran writes them together as a stream of 400 numbers.

-Neil
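For readers without the Cookbook class at hand, the record layout can also be handled directly. The sketch below is not the FortranFile implementation; it assumes the common gfortran layout in which each sequential unformatted record is framed by a 4-byte little-endian length marker before and after the payload. The helper names write_record and read_record are hypothetical, and an in-memory buffer stands in for uxuyp.bin so the example runs without a Fortran compiler.

```python
import io
import struct

import numpy as np


def write_record(fh, *arrays):
    """Emulate one Fortran `write`: length marker, column-major payload, marker."""
    payload = b"".join(
        np.asarray(a, dtype="<f4").tobytes(order="F") for a in arrays
    )
    marker = struct.pack("<i", len(payload))  # assumed 4-byte record marker
    fh.write(marker + payload + marker)


def read_record(fh, dtype="<f4"):
    """Read one record and return its payload as a flat array."""
    (nbytes,) = struct.unpack("<i", fh.read(4))
    data = np.frombuffer(fh.read(nbytes), dtype=dtype)
    (trailer,) = struct.unpack("<i", fh.read(4))
    if trailer != nbytes:
        raise ValueError("record markers disagree; wrong endianness or marker size?")
    return data


nx, ny = 10, 20
i = np.arange(1, nx + 1, dtype="<f4")[:, None]
j = np.arange(1, ny + 1, dtype="<f4")[None, :]

# Stand-in for the file produced by the Fortran program in the message.
buf = io.BytesIO()
write_record(buf, i * j, i / j)  # corresponds to: write(11) ux,uy
write_record(buf, i + j)         # corresponds to: write(11) p
buf.seek(0)

# Fortran stores arrays column-major, so reshape with order="F".
uxuy = read_record(buf)
ux = uxuy[: nx * ny].reshape((nx, ny), order="F")
uy = uxuy[nx * ny:].reshape((nx, ny), order="F")
p = read_record(buf).reshape((nx, ny), order="F")
```

Reshaping with order="F" gives the same result as the reshape((20,10)).T idiom in the session above. For a real file, the marker size and byte order depend on the compiler and its flags, so they should be checked rather than assumed.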

10 10

automatically avoiding temporary arrays
by Julian Taylor 27 Feb '17

hi,

Temporary arrays generated in expressions are expensive, as they imply extra memory bandwidth, which is the bottleneck in most numpy operations. For example:

    r = a + b + c

creates the a + b temporary and then adds c to it. This can be rewritten to be more efficient using inplace operations:

    r = a + b
    r += c

This saves some memory bandwidth and can speed up the operation by 50% for very large arrays, or even more if the inplace operation allows it to be completed entirely in the CPU cache.

The problem is that inplace operations are a lot less readable, so they are often only used in well-optimized code. But due to Python's refcounting semantics we can actually do some inplace conversions transparently. If an operand in Python has a reference count of one, it must be a temporary, so we can use it as the destination array. CPython itself does this optimization for string concatenations.

In numpy we have the issue that we can be called from the C-API directly, where the reference count may be one for other reasons. To solve this we can check the backtrace up to the Python frame evaluation function. If there are only numpy and Python functions between that and our entry point, we should be able to elide the temporary.

This PR implements this:
https://github.com/numpy/numpy/pull/7997

It currently only supports Linux with glibc (which has reliable backtraces via unwinding) and maybe MacOS, depending on how good its backtrace is. On Windows the backtrace APIs are different and I don't know them, but in theory it could also be done there.

A problem is that checking the backtrace is quite expensive, so it should only be enabled when the involved arrays are large enough for it to be worthwhile. In my testing this seems to be around 180-300KiB sized arrays, basically where they start spilling out of the CPU L2 cache.

I made a little crappy benchmark script to test this cutoff in this branch:
https://github.com/juliantaylor/numpy/tree/elide-bench

If you are interested you can run it with:

    python setup.py build_ext -j 4 --inplace
    ipython --profile=null check.ipy

At the end it will plot the ratio between elided and non-elided runtime. It should get larger than one around 180KiB on most CPUs.

If no one points out some flaw in the approach, I'm hoping to get this into the next numpy version.

cheers,
Julian
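The manual rewrite that the proposal aims to make unnecessary can be sketched as follows. This is only an illustration of the hand-written in-place form, not of the elision mechanism in the PR; the array size and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.random(100_000) for _ in range(3))

# Chained form: an intermediate array is allocated for the first addition.
r_chained = a + b + c

# Hand-written in-place form: the second addition reuses the first result.
r_inplace = a + b
r_inplace += c

# Equivalent spelling with explicit output buffers.
r_out = np.empty_like(a)
np.add(a, b, out=r_out)
np.add(r_out, c, out=r_out)
```

All three variants perform the same two additions in the same order, so the results are bit-for-bit identical; only the number of allocated arrays differs.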

11 22

__numpy_ufunc__
by Charles R Harris 22 Feb '17

Hi All, For those interested in continuing the __numpy_ufunc__ saga, there is a pull request enabling it <https://github.com/numpy/numpy/pull/8247>. Likely we will want to make some changes up front before merging that, so some discussion is in order. Chuck

5 8

Building external c modules with mingw64 / numpy
by Schnizer, Pierre 17 Feb '17

Dear all,

I built an external C module (pygsl) using the mingw64-gcc compiler from MSYS2. This build required some changes to numpy.distutils to get “python setup.py config” and “python setup.py build” working. In this process I replaced 2 files in numpy.distutils with versions from the numpy git repository:

- numpy.distutils.misc_util.py, version ec0e046 <https://github.com/numpy/numpy/commit/ec0e04694278ef9ea83537d308b07fc27c1b5…> of 14 Dec 2016
- numpy.distutils.mingw32ccompiler.py, version ec0e046 <https://github.com/numpy/numpy/commit/ec0e04694278ef9ea83537d308b07fc27c1b5…> of 14 Dec 2016

mingw32ccompiler.py had to be modified to get it to work:

- a preprocessor had to be defined, as I am using setup.py config
- the runtime library search path had to be specified to the linker
- the include path of the vcrtruntime had to be added

I attached a patch reflecting the changes I had to make to the file mingw32ccompiler.py.

If this information is useful, I am happy to answer questions.

Sincerely yours
Pierre

PS Version infos:

Python: Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32

Numpy:
>> help(numpy.version)
Help on module numpy.version in numpy:
DATA
    full_version = '1.12.0'
    git_revision = '561f1accf861ad8606ea2dd723d2be2b09a2dffa'
    release = True
    short_version = '1.12.0'
    version = '1.12.0'

gcc.exe (Rev2, Built by MSYS2 project) 6.2.0

________________________________
Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.
Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher
Geschäftsführung: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking
Sitz Berlin, AG Charlottenburg, 89 HRB 5583
Postadresse: Hahn-Meitner-Platz 1 D-14109 Berlin
http://www.helmholtz-berlin.de

2 2

ANN: numexpr 2.6.2 released!
by Francesc Alted 29 Jan '17

=========================
 Announcing Numexpr 2.6.2
=========================

What's new
==========

This is a maintenance release that fixes several issues, with special emphasis on keeping compatibility with newer NumPy versions. Also, initial support for POWER processors is here. Thanks to Oleksandr Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and Antonio Valentino for their nice contributions.

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

What's Numexpr
==============

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

--
Francesc Alted

1 0

Numpy development version wheels for testing
by Matthew Brett 28 Jan '17

Hi,

I've taken advantage of the new travis-ci cron job feature [1] to set up daily builds of numpy manylinux and OSX wheels for the current trunk, uploading to:

https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackc…

The numpy build process already builds Ubuntu Precise numpy wheels for the current trunk, available at [2], but the cron-job manylinux wheels have the following advantages:

* they are built the same way as our usual pypi wheels, using openblas, and so will be closer to the eventual numpy distributed wheel;
* manylinux wheels will install on all the travis-ci containers, not just the Precise container;
* manylinux wheels don't need any extra packages installed by apt, because they are self-contained.

There's an example of use at https://github.com/matthew-brett/nibabel/blob/use-pre/.travis.yml#L23

Cheers,

Matthew

[1] https://docs.travis-ci.com/user/cron-jobs
[2] https://f66d8a5767b134cb96d3-4ffdece11fd3f72855e4665bc61c7445.ssl.cf2.rackc…

2 1

Checking matrix condition number
by Edward Richards 26 Jan '17

What is the best way to make sure that a matrix inversion makes any sense before performing it? I am currently struggling to understand some results from matrix inversions in my work, and I would like to see if I am dealing with an ill-conditioned problem. It is probably user error, but I don't like having the possibility hanging over my head.

I naively put a call to np.linalg.cond into my code; all of my cores went to 100% and a few minutes later I got a number. To be fair, A is 6400 elements square, but this takes ~20x more time than the inversion. This is not really practical for what I am doing; is there a better way?

This is partly in response to Ilhan Polat's post about introducing the A\b operator to numpy. I also couldn't check the Numpy mailing list archives to see if this has been asked before; the numpy-discussion gmane link isn't working for me at all.

Thanks for your time,
Ned
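One likely reason for the cost is that np.linalg.cond, when called without a norm argument, computes the 2-norm condition number through a full SVD. If the inverse is being computed anyway, a cheaper option is the 1-norm condition number, which is simply the product of the 1-norms of the matrix and its inverse. The sketch below uses a small random matrix as a stand-in for the 6400 x 6400 case; whether the 1-norm is an acceptable substitute for the 2-norm in a given application is an assumption to check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # stand-in size; the matrix in the message is 6400 x 6400
A = rng.standard_normal((n, n))

# The inverse that the application needs anyway.
A_inv = np.linalg.inv(A)

# 1-norm condition number, reusing the inverse: no SVD required.
cond_1 = np.linalg.norm(A, 1) * np.linalg.norm(A_inv, 1)

# Default np.linalg.cond: 2-norm via SVD, shown here only for comparison.
cond_2 = np.linalg.cond(A)

print(f"cond_1 = {cond_1:.3e}, cond_2 = {cond_2:.3e}")
```

The two numbers differ, but for an n x n matrix they agree to within a factor of n, which is usually enough to tell a well-conditioned problem from an ill-conditioned one.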

2 1

Question about numpy.random.choice with probabilities
by alebarde＠gmail.com 23 Jan '17

Hi Nadav,

I may be wrong, but I think that the result of the current implementation is actually the expected one. Using your example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4

P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])

Now, P([1]) = 0.2 and P([2]) = 0.4. However:

P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, once normalised, translate into 1/3 and 2/3 respectively)

Therefore P([1,2]) = 0.7/3 = 0.23333

Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333

What am I missing?

Alessandro

2017-01-17 13:00 GMT+01:00 <numpy-discussion-request(a)scipy.org>:

> Hi, I'm looking for a way to find a random sample of C different items out
> of N items, with some desired probability Pi for each item i.
>
> I saw that numpy has a function that supposedly does this,
> numpy.random.choice (with replace=False and a probabilities array), but
> looking at the algorithm actually implemented, I am wondering in what sense
> are the probabilities Pi actually obeyed...
>
> To me, the code doesn't seem to be doing the right thing... Let me explain:
>
> Consider a simple numerical example: We have 3 items, and need to pick 2
> different ones randomly. Let's assume the desired probabilities for item 1,
> 2 and 3 are: 0.2, 0.4 and 0.4.
>
> Working out the equations there is exactly one solution here: The random
> outcome of numpy.random.choice in this case should be [1,2] at probability
> 0.2, [1,3] at probability 0.2, and [2,3] at probability 0.6. That is indeed
> a solution for the desired probabilities because it yields item 1 in
> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] =
> 0.2+0.6 = 0.8 = 2*P2, etc.
>
> However, the algorithm in numpy.random.choice's replace=False generates, if
> I understand correctly, different probabilities for the outcomes: I believe
> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333,
> and [2,3] at probability 0.53333.
>
> My question is how does this result fit the desired probabilities?
>
> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333,
> then the expected number of "1" results we'll get per drawing is 0.23333 +
> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for
> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice
> as common as item 1 as we originally desired (we asked for probabilities
> 0.2, 0.4, 0.4 for the individual items!).
>
>
> --
> Nadav Har'El
> nyh(a)scylladb.com
>
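The figures quoted by both posters can be checked exactly with a few lines of Python. The sketch below does not call numpy.random.choice; it assumes the sequential model described in the thread (draw the first item with the given weights, then renormalise the remaining weights for the second draw) and enumerates all ordered pairs with exact fractions. The function name pair_probabilities is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def pair_probabilities(weights):
    """Exact probability of each unordered pair under sequential
    weighted sampling without replacement (renormalising after draw 1)."""
    total = sum(weights.values())
    probs = {}
    for first, second in permutations(weights, 2):
        p_first = weights[first] / total
        p_second = weights[second] / (total - weights[first])
        key = frozenset((first, second))
        probs[key] = probs.get(key, 0) + p_first * p_second
    return probs


weights = {1: Fraction(1, 5), 2: Fraction(2, 5), 3: Fraction(2, 5)}
pairs = pair_probabilities(weights)

# Probability that each item appears somewhere in the sample of two.
inclusion = {
    item: sum(p for key, p in pairs.items() if item in key)
    for item in weights
}

for key, p in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(key), p, float(p))
print({item: float(p) for item, p in inclusion.items()})
```

Under this model the pair probabilities are 7/30, 7/30 and 8/15, matching the 0.2333, 0.2333 and 0.5333 in the thread, and the inclusion probabilities are 7/15, 23/30 and 23/30. Both posters therefore agree on the arithmetic; the disagreement is whether the weights should describe each sequential draw or the inclusion frequency of each item in the final sample.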

5 18