Is there a C-API function for numpy which implements Python's
multidimensional indexing? Say, I have a 2d-array
PyArrayObject * M;
and an index i: how do I extract the i-th row M[i,:] or the i-th
column M[:,i]?
I am looking for a function that again returns a PyArrayObject * and
that is a view into M (no copied data; the result should be another
PyArrayObject whose data pointer and strides point to the correct memory
portion of M).
I searched the API documentation, Google and mailing lists for quite a
long time but didn't find anything. Can you help me?
I'm trying to do something that at first glance I think should be simple
but I can't quite figure out how to do it. The problem is as follows:
I have a 3D grid Values[Nx, Ny, Nz]
I want to slice Values at a 2D surface in the Z dimension specified by
Z_index[Nx, Ny] and return a 2D slice[Nx, Ny].
It is not as simple as Values[:,:,Z_index].
I tried this (coords here is the Z_index array, with shape (4, 5)):
>>> values.shape
(4, 5, 6)
>>> slice = values[:,:,coords]
>>> slice.shape
(4, 5, 4, 5)
>>> slice = np.take(values, coords, axis=2)
>>> slice.shape
(4, 5, 4, 5)
Obviously I could create an empty 2D slice and fill it point by point
with np.ndenumerate, selecting values[i, j, Z_index[i, j]], but that
just seems too inefficient and not very pythonic.
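For what it's worth, one way that avoids the explicit loop is integer-array (fancy) indexing with broadcastable row/column indices; a sketch with made-up shapes:

import numpy as np

Nx, Ny, Nz = 4, 5, 6
values = np.random.rand(Nx, Ny, Nz)
z_index = np.random.randint(0, Nz, size=(Nx, Ny))

# np.ogrid yields index arrays of shape (Nx, 1) and (1, Ny); broadcast
# against z_index they select values[i, j, z_index[i, j]] for every (i, j).
i, j = np.ogrid[:Nx, :Ny]
surface = values[i, j, z_index]    # result has shape (Nx, Ny)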
On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner <cmkleffner(a)gmail.com> wrote:
> a discussion about OpenBLAS on the octave maintainer list:
I'm getting the impression that OpenBLAS is both a tantalizing
opportunity and a practical thorn in the side for everyone -- Python,
Octave, Julia, R.
How crazy would it be to get together an organized effort to fix this
problem -- "OpenBLAS for everyone"? E.g., by collecting patches to fix
the bits we don't like (like unhelpful build system defaults),
applying more systematic QA, etc. Ideally this could be done upstream,
but if upstream is MIA or disagrees about OpenBLAS's goals, then it
could be maintained as a kind of "OpenBLAS++" that merges regularly
from upstream (compare to [1] for a successful project handled in
this way). If hardware for testing is a problem, then I suspect
NumFOCUS would be overjoyed to throw a few kilodollars at buying one
instance of each widely-distributed microarchitecture released in the
last few years as a test farm...
I think the goal is pretty clear: a modern optionally-multithreaded
BLAS under a BSD-like license with a priority on correctness,
out-of-the-box functionality (like runtime configurability and feature
detection), speed, and portability, in that order.
I unfortunately don't have the skills to actually lead such an effort
(I've never written a line of asm in my life...), but surely our
collective communities have people who do?
[1] http://www.eglibc.org/mission (a "friendly fork" of glibc holding
stuff that Ulrich Drepper got cranky about, which eventually was
merged back into glibc proper)
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
I have been playing with the idea of using Numpy's binary format as a
lightweight alternative to HDF5 (which I believe is the "right" way to go
if one does not have a problem with the dependency).
I am pretty happy with the npy format, but the npz format seems to be
broken as far as performance is concerned (or I am missing something
obvious!). The following ipython session illustrates the issue:
In : import numpy as np
In : x = np.linspace(1, 10, 50000000)
In : %time np.save("x.npy", x)
CPU times: user 40 ms, sys: 230 ms, total: 270 ms
Wall time: 488 ms
In : %time np.savez("x.npz", data = x)
CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
Wall time: 7.7 s
I can inspect the files to verify that they contain the same data, and I
can change the example, but this seems to always hold (I am running Arch
Linux, but I've done the test on other machines too): for bigger arrays,
the npz format seems to add an unbelievable amount of overhead.
Looking at Numpy's code, it looks like the real work is being done by
Python's zipfile module, and I suspect that all the extra time is spent
computing the crc32. Am I correct in my assumption (I am not familiar with
zipfile's internals)? Or perhaps I am doing something really dumb and there
is an easy way to speed things up?
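One way to test that assumption before digging into zipfile itself is to time zlib.crc32 on the raw bytes alone; a rough sketch (numbers will of course vary by machine):

import time
import zlib
import numpy as np

x = np.linspace(1, 10, 50000000)
buf = x.tobytes()    # note: this copies the ~400 MB of array data

t0 = time.time()
zlib.crc32(buf)
print("crc32 alone: %.2f s" % (time.time() - t0))

If the checksum accounts for only part of the 7 s gap, the rest presumably goes to zipfile's own copying and bookkeeping.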
Assuming that I am correct, my next question is: why compute the crc32 at
all? I mean, I know that it is part of what defines a "zip file", but is it
really necessary for an npz file to be a (compliant) zip file? If, for
example, I open the resulting npz file with a hex editor and insert a
bogus crc32, np.load will happily load the file anyway (Gnome's Archive
Manager will do the same). To me this suggests that the fact that npz
files are zip files is not that important.
Perhaps people think that the ability to browse arrays and extract
individual ones as they would with a regular zip file is really
important, but from the little documentation that I found, I got the
impression that npz files are zip files just because this was the easiest
way to have multiple arrays in the same file. But my main point is: it
should be fairly simple to make npz files much more efficient with simple
changes, like not computing checksums (or using a different, cheaper
algorithm).
Let me know what you think about this. I've searched around the internet,
and in places like Stack Overflow the standard answer seems to be: you
are doing it wrong, forget Numpy's format and start using hdf5! Please do
not give that answer. Like I said in the beginning, I am well aware of
hdf5 and I use it in my "production code" (in C++). But I believe that
there should be a lightweight alternative (right now, to use hdf5 I need
to have the C library, the C++ wrappers, and the h5py library installed
to play with the data from Python; that is a bit too heavy for my needs).
I really like Numpy's format (if anything, it makes me feel better
knowing that it is so easy to reverse engineer, while the hdf5 format is
very complicated), but the (apparent) poor performance of npz files is a
deal breaker.
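To make the point concrete, here is the kind of lightweight, checksum-free container I have in mind: just concatenating self-describing .npy blocks into a single file. (save_arrays/load_arrays are hypothetical names, not an existing numpy API.)

import numpy as np

def save_arrays(filename, arrays):
    # np.save appends one self-describing .npy block per call
    with open(filename, "wb") as f:
        for a in arrays:
            np.save(f, a)

def load_arrays(filename, n):
    # np.load on an open file object reads exactly one .npy block
    # and leaves the file position at the start of the next one
    with open(filename, "rb") as f:
        return [np.load(f) for _ in range(n)]

No names and no random access, but also no crc32 and no extra copies.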
I'm trying to compile 32-bit numpy on a 64-bit CentOS 6 system, but it fails with the message:
"Broken toolchain: cannot link a simple C program"
It gets the compile flags right, but not the linker:
C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -m32 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/opt/python/ia32/include/python2.7 -c'
gcc -pthread _configtest.o -o _configtest
_configtest.o: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
I'm building it using a 32-bit Python build that I compiled on the same system. I've tried
OPT="-m32" FOPT="-m32" python setup.py build
as well as
setarch x86_64 -B python setup.py build
but with the same results.
Someone worked around this by altering ccompiler.py, but I'm trying to find a cleaner solution.
Is there a standard way of doing this?
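In case it helps: the error shows that _configtest.o was compiled with -m32 while the link step was not, so one thing worth trying (I have not verified that numpy.distutils honors all of these) is forcing -m32 on the linker as well through the standard distutils environment variables:

CC="gcc -m32" CXX="g++ -m32" LDSHARED="gcc -m32 -shared" LDFLAGS="-m32" python setup.py build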
Announcing HDF5 for Python (h5py) 2.3.0
The h5py team is happy to announce the availability of h5py 2.3.0 (final).
Thanks to everyone who provided beta feedback!
The h5py package is a Pythonic interface to the HDF5 binary data format.
It lets you store huge amounts of numerical data, and easily manipulate
that data from NumPy. For example, you can slice into multi-terabyte
datasets stored on disk, as if they were real NumPy arrays. Thousands of
datasets can be stored in a single file, categorized and tagged however
you want.
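For those new to the package, a minimal usage sketch (the file and dataset names are just examples):

import h5py
import numpy as np

# create a file, store one dataset, then slice it back from disk
with h5py.File("example.hdf5", "w") as f:
    f.create_dataset("mydata", data=np.arange(1000000))

with h5py.File("example.hdf5", "r") as f:
    print(f["mydata"][10:20])    # reads only the requested slice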
This release introduces some important new features, including:
* Support for arbitrary vlen data (see the sketch after this list)
* Improved exception messages
* Improved setuptools support
* Multiple additions to the low-level API
* Improved support for MPI features
* Single-step build for HDF5 on Windows
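As an illustration of the vlen support mentioned above, a short sketch (file and dataset names are just examples):

import h5py
import numpy as np

with h5py.File("vlen_demo.hdf5", "w") as f:
    # a dataset of variable-length int32 sequences (one ragged row each)
    dt = h5py.special_dtype(vlen=np.dtype('int32'))
    dset = f.create_dataset("ragged", (3,), dtype=dt)
    dset[0] = np.array([1, 2, 3], dtype='int32')
    dset[1] = np.array([4, 5], dtype='int32')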
Major fixes since beta:
* LZF compression crash on Win64
* Unhelpful error message relating to chunked storage
* Import error for IPython completer on certain platforms
A complete description of changes is available online:
Where to get it
Downloads, documentation, and more are available at the h5py website:
The h5py package relies on third-party testing and contributions. For the
2.3 release, thanks especially to:
* Martin Teichmann
* Florian Rathgerber
* Pierre de Buyl
* Thomas Caswell
* Andy Salnikov
* Darren Dale
* Robert David Grant
* Toon Verstraelen
* Many others who contributed bug reports and testing
It's been a while since the last datetime and timezones discussion thread was visited (linked below):
It looks like the best approach to follow is the UTC-only approach from the linked thread, with an optional flag to indicate the timezone (to avoid confusing applications that don't expect any timezone info). Since this is slightly more useful than a purely naive datetime64 and would be open to extension if required, it is probably the best way to start improving the datetime64 library.
If we do wish to have full timezone support, it would very likely lead to performance drops (as reasoned in the thread), and we would need a dedicated, maintained tzinfo package, at which point it would make much more sense to just incorporate the pytz library. (I also don't have the expertise to implement this, so I would be unable to help resolve the current logjam.)
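For context, a small illustration of the behavior that motivates this: in current releases, datetime64 stores values as UTC but renders them in the local zone, so the same value prints differently on different machines.

import numpy as np

# parsed as UTC (the trailing Z)...
t = np.datetime64('2014-04-15T10:00Z')

# ...but str/repr apply the machine's local timezone offset, which is
# exactly the kind of surprise a UTC-only design with an optional
# timezone flag is meant to avoid.
print(t)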
I would like to start writing a NEP for this, followed by an implementation; however, I'm not sure what the format etc. is. Could someone direct me to a page where this information is provided?
Please let me know if there are any ideas, comments etc.