Mailman 3 February 2013 - NumPy-Discussion

flatnonzero / nonzero returned indices order
by Gabor Kovacs Feb. 27, 2013

Hello, for 1-D arrays, can one assume that the indices returned by flatnonzero are ordered? It looks so, but there is no such statement anywhere in the docs. (In general, what is the order of the indices for higher dimensions?) Thanks, Gabor
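A quick check along these lines supports the observation. The NumPy documentation for `nonzero` states that results are returned in row-major (C-style) order, which for a 1-D input means ascending indices. The array values below are made up for illustration only.

```python
import numpy as np

# 1-D case: flatnonzero returns the positions of the non-zero entries.
a = np.array([0, 3, 0, 0, 7, 1, 0, 2])
idx = np.flatnonzero(a)

# Strictly increasing differences mean the indices are sorted ascending.
is_sorted_1d = bool(np.all(np.diff(idx) > 0))

# 2-D case: nonzero returns one index array per dimension.
b = np.array([[0, 5, 0],
              [4, 0, 6]])
rows, cols = np.nonzero(b)
pairs = list(zip(rows.tolist(), cols.tolist()))

# Converting the (row, col) pairs back to flat indices shows the
# row-major ordering: it matches flatnonzero on the same array.
flat_from_pairs = np.ravel_multi_index((rows, cols), b.shape)
```

Here `pairs` walks the array row by row, so the flattened positions come out in ascending order as well.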

1 0

1.7.1
by Charles R Harris Feb. 27, 2013

When should we put out 1.7.1? Discuss ;) Chuck

4 7

numpy.scipy.org giving 404 error
by Peter Cock Feb. 26, 2013

Hello all, http://numpy.scipy.org is giving a GitHub 404 error. As this used to be a widely used URL for the project, and likely appears in many printed references, could it be fixed to point to or redirect to the (relatively new) http://www.numpy.org site please? Thanks, Peter

1 0

What should np.ndarray.__contains__ do
by Sebastian Berg Feb. 26, 2013

Hello all, currently the `__contains__` method (the `in` operator) on arrays does not return what the user would expect when, in the operation `a in b`, `a` is not a single element (see "In [3]-[4]" below). The first solution that comes to mind might be checking `all()` over all dimensions given in argument `a` (see line "In [5]" for a simplistic example). This does not play too well with broadcasting, however, but one could maybe simply *not* broadcast at all (i.e. a.shape == b.shape[b.ndim-a.…
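The behaviour in question can be illustrated with a small sketch. This is not a reconstruction of the elided "In [3]-[5]" session; the arrays are made up, and the `all()`-based check is only one possible reading of the proposal, assuming `a` is compared against the rows (last axis) of `b`.

```python
import numpy as np

b = np.array([[1, 2],
              [3, 4]])
a = np.array([1, 5])  # not a row of b, but shares the element 1 in column 0

# The `in` operator on arrays behaves like an element-wise comparison
# (with broadcasting) followed by any().
current = a in b
elementwise_any = bool((b == a).any())

# Assumed alternative: require every element along the last axis to match
# before reducing with any(), i.e. "is `a` one of the rows of `b`?".
row_match = bool((b == a).all(axis=-1).any())
row_match_present = bool((b == np.array([3, 4])).all(axis=-1).any())
```

With these values `current` is true although `[1, 5]` is not a row of `b`, while the row-wise check is false for `[1, 5]` and true for `[3, 4]`.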

3 6

another discussion on numpy correlate (and convolution)
by Pierre Haessig Feb. 26, 2013

Hi everybody, (just coming from a discussion on the performance of Matplotlib's (x)corr function, which uses np.correlate.) There have already been many discussions on how to compute (cross-)correlations of time series in Python (like http://stackoverflow.com/questions/6991471/computing-cross-correlation-func…). The discussion is spread between the various stakeholders (to name the ones I have in mind: scipy, statsmodels, matplotlib, ...). There are basically two implementations: time-domain and frequency-domain (using fft + multiplication). My discussion is only about the time-domain implementation. The key use case which I feel is not well addressed today is computing the (cross-)correlation of two long time series on only a few lag points. For time-domain, one can either write one's own implementation or rely on numpy.correlate. The latter relies on the fast C implementation _pyarray_correlate() in multiarraymodule.c (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multia…). Now, the signature of this function is meant to accept one of three *computation modes* ('valid', 'same', 'full'). Those modes make a lot of sense when using this function to convolve two signals (say an "input array" and an "impulse response array"). In the context of computing the (cross-)correlation of two long time series on only a few lag points, those computation modes are ill-suited and can lead to a huge waste of computation and memory (for example: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes.py…). For some time, I thought the solution was to write a dedicated function in one of the stakeholder modules (but which? statsmodels?), but I have now come to think that numpy is the best place for a change, which would quickly benefit all stakeholders downstream.
Indeed, I looked more carefully at the C _pyarray_correlate() function, and I've come to the conclusion that these three computation modes are an unnecessary restriction, because the actual computation relies on a triple for loop (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multia…) whose bounds are governed by two integers, `n_left` and `n_right`, rather than by the three modes. Maybe I've misunderstood the inner workings of this function, but I have the feeling that the ability to set these two integers directly, instead of just the mode, would open up the possibility of computing the correlation on only a few lag points. I'm fully aware that changing the signature of such a core numpy function is out of the question, but I have the feeling that a reasonably small code refactor might lift the long-standing problem of (cross-)correlation computation. The Python np.correlate would require two additional keywords (`n_left` and `n_right`), which would override the `mode` keyword. Only the C function would need some more care to keep good backward compatibility in case it's used externally. What do other people think? Best, Pierre
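To make the use case concrete, here is a minimal pure-NumPy sketch of a time-domain correlation restricted to a few lags. The helper `correlate_lags` and its `maxlag` parameter are hypothetical names introduced for illustration; they are not the proposed `n_left`/`n_right` keywords and not an existing NumPy API. The sketch assumes real-valued, equal-length 1-D inputs and follows the lag convention of `np.correlate(x, y, mode="full")`.

```python
import numpy as np

def correlate_lags(x, y, maxlag):
    """Cross-correlation for lags -maxlag..maxlag only (hypothetical helper).

    Uses the same convention as np.correlate(x, y, "full") for real input:
    c[k] = sum_n x[n + k] * y[n].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if y.size != n:
        raise ValueError("x and y must have the same length")
    if not 0 <= maxlag < n:
        raise ValueError("maxlag must satisfy 0 <= maxlag < len(x)")

    lags = np.arange(-maxlag, maxlag + 1)
    out = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            # Overlap of x shifted left by k with the start of y.
            out[i] = np.dot(x[k:], y[:n - k])
        else:
            # Negative lag: overlap of the start of x with y shifted left by -k.
            out[i] = np.dot(x[:n + k], y[-k:])
    return lags, out

# Compare against the full correlation on random (made-up) data.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = rng.standard_normal(2000)
lags, c = correlate_lags(x, y, maxlag=5)

# In "full" mode with equal lengths, lag 0 sits at index len(x) - 1.
full = np.correlate(x, y, mode="full")
reference = full[x.size - 1 - 5 : x.size + 5]
```

The helper does work proportional to the number of requested lags times the series length, whereas slicing the `"full"` result first computes all `2 * len(x) - 1` lags and discards most of them.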

4 5

Leaking memory problem
by Jaakko Luttinen Feb. 26, 2013

Hi! I was wondering if anyone could help me find a memory leak problem with NumPy. My project is quite massive and I haven't been able to construct a simple example that would reproduce the problem. I have an iterative algorithm which should not increase the memory usage as the iteration progresses. However, after the first iteration, 1GB of memory is used, and it steadily increases until, at about 100-200 iterations, 8GB is used and the program exits with MemoryError. I have a …

5 5

Calling scipy blas from cython is extremely slow
by Sergio Callegari Feb. 24, 2013

Hi, following the excellent advice of V. Armando Sole, I have finally succeeded in calling the BLAS routines shipped with scipy from Cython. I am doing this to avoid shipping an extra BLAS library for a project of mine that uses scipy but has some things coded in Cython for extra speed. So far I have managed to get things working on Linux. Here is what I do: The following code snippet gives me the dgemv pointer (which is a pointer to a Fortran function, even if it comes from scipy.linalg.…

3 4

Tabular package updated to v0.1
by Elaine Angelino Feb. 23, 2013

Hi there, We are writing to announce that we've updated "Tabular" to v0.1. Tabular is a package of Python modules for working with tabular data. Its main object is the tabarray class, a data structure for holding and manipulating tabular data. By putting data into a tabarray object, you’ll get a representation of the data that is more flexible and powerful than a native Python representation. More specifically, tabarray provides: -- ultra-fast filtering, selection, and numerical analysis …

1 0

Smart way to do this?
by santhu kumar Feb. 23, 2013

Hi all, I don't want to run a loop for this, but it should be possible using numpy "smart" ways. a = np.ones(30) idx = np.array([2,3,2]) # there is a duplicate index of 2 a[idx] += 2 >>> a array([ 1., 1., 3., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) But if we do this: for i in range(idx.shape[0]): a[idx[i]] += 2 >>> a array([ 1., 1., 5., 3., 1., 1., 1., 1.…
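For readers landing on this thread later, two loop-free ways of accumulating over duplicate indices are sketched below. Note the assumption about versions: `np.add.at` was only added in NumPy 1.8, after this thread, so it was not available to the poster at the time; `np.bincount` with `minlength` is the older alternative. The variable names `b` and `c` are introduced here for illustration.

```python
import numpy as np

idx = np.array([2, 3, 2])  # index 2 appears twice

# Fancy-index in-place add is buffered: each distinct index is updated once.
a = np.ones(30)
a[idx] += 2
fancy = a[:4].copy()

# ufunc.at performs unbuffered in-place addition, so duplicates accumulate.
b = np.ones(30)
np.add.at(b, idx, 2)
at_result = b[:4].copy()

# Alternative: count how often each index occurs and scale by the increment.
c = np.ones(30)
c += 2 * np.bincount(idx, minlength=c.size)
```

Both `b` and `c` end up matching the result of the explicit loop from the post, with index 2 incremented twice and index 3 once.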

4 4

Windows, blas, atlas and dlls
by Sergio Callegari Feb. 22, 2013

Hi, I have a project that includes a Cython script which in turn does some direct access to a couple of cblas functions. This is necessary, since some matrix multiplications need to be done inside a tight loop that gets called thousands of times. Speedup wrt calling scipy.linalg.blas.cblas routines is 10x to 20x. Now, all this is very nice on Linux, where the setup script can ensure that the Cython code gets linked with the atlas dynamic library, which is the same library that numpy and scipy …

12 25