http://numpy.scipy.org is giving a GitHub 404 error.
As this used to be a widely used URL for the project,
and likely appears in many printed references, could
it be fixed to point to or redirect to the (relatively new)
http://www.numpy.org site please?
Currently the `__contains__` method (i.e. the `in` operator) on arrays does
not return what the user would expect when, in the operation `a in b`,
`a` is not a single element (see the example session below).
The first solution that comes to mind is checking `all()` over all the
dimensions given by argument `a` (see the session below for a simplistic
example). This does not play too well with broadcasting, however; one
could maybe simply *not* broadcast at all (i.e. require a.shape ==
b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise.
On the other hand, one could say that broadcasting `a` onto `b` should
mean "any" along that dimension (see the a_2d example below). The other
way around should maybe raise an error, though (see the last example
below to understand what I mean).
I think that using the broadcast dimensions in which `a` is repeated over
`b` as the dimensions to apply "any" logic along is the most general way
for numpy to handle this consistently, while the other way around could
be handled with an `all`, but to me makes so little sense that I think it
should be an error. Of course this differs from a list of lists, which
gives False in these cases, but arrays are not lists of lists...
As a side note, since for loops, etc. use "for item in array", I do not
think that vectorizing along `a` as np.in1d does is reasonable; `in`
should return a single boolean.
I have opened an issue for it:
In : a = np.array([0, 2])
In : b = np.arange(10).reshape(5,2)
In : b
In : a in b
In : (b == a).any()
In : (b == a).all(0).any() # the 0 could be multiple axes
In : a_2d = a[None,:]
In : a_2d in b # broadcast dimension means "any" -> True
In : [0, 1] in b[:,:1] # should not work (or be False, not True)
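To make the simplest variant concrete, here is a rough sketch (my own
helper, nothing that exists in numpy today) of "all() over the dimensions
of `a`, any() over the rest", with broadcasting disallowed entirely:

import numpy as np

def contains(a, b):
    # Sketch only: `a` has to match the trailing dimensions of `b` exactly.
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim > b.ndim or a.shape != b.shape[b.ndim - a.ndim:]:
        return False  # or raise an error, as discussed above
    eq = (b == a)
    # all() over the dimensions belonging to `a`, collapsing from the end...
    for _ in range(a.ndim):
        eq = eq.all(axis=-1)
    # ...and any() over the remaining leading dimensions of `b`
    return bool(eq.any())

With the arrays from the session above, contains(a, b) gives False, which
is what I would expect, since [0, 2] is not a row of b.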
(just coming from a discussion on the performance of Matplotlib's
(x)corr function which uses np.correlate)
There have already been many discussions on how to compute
(cross-)correlations of time series in Python (like ...).
The discussion is spread across the various stakeholders (just to name
some I have in mind: scipy, statsmodels, matplotlib, ...).
There are basically two implementations: time-domain and frequency-domain
(using an FFT plus multiplication). My discussion is only about the
time-domain implementation. The key use case which I feel is not well
addressed today is computing the (cross-)correlation of two long time
series at only a few lag points.
For the time domain, one can either write one's own implementation or
rely on numpy.correlate. The latter relies on the fast C implementation
_pyarray_correlate() in multiarraymodule.c.
Now, the signature of this function is meant to accept one of three
*computation modes* ('valid', 'same', 'full'). Those modes make a lot of
sense when using this function to convolve two signals (say an "input
array" and an "impulse response array"). In the context of computing the
(cross-)correlation of two long time series at only a few lag points,
however, those computation modes are ill-suited and potentially lead to a
huge waste of computation and memory
(for example :
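To give a rough illustration of the kind of waste I mean (toy sizes of my
own choosing):

import numpy as np

x = np.random.randn(10000)
y = np.random.randn(10000)

# Even if I only care about, say, 10 lags, the existing modes force the
# computation (and storage) of far more lag values than that:
print(np.correlate(x, y, mode='full').shape)   # (19999,) lag values
print(np.correlate(x, y, mode='same').shape)   # (10000,)
print(np.correlate(x, y, mode='valid').shape)  # (1,)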
For some time, I was thinking the solution was to write a dedicated
function in one of the stakeholder modules (but which one? statsmodels?),
but now I have come to think that numpy is the best place to make a
change, and that this would quickly benefit all stakeholders downstream.
Indeed, I looked more carefully at the C _pyarray_correlate() function,
and I have come to the conclusion that these three computation modes are
an unnecessary restriction, because the actual computation relies on a
triple for loop whose boundaries are governed by two integers, `n_left`
and `n_right`, rather than by the three modes.
Maybe I have misunderstood the inner workings of this function, but I
have the feeling that the ability to set these two integers directly,
instead of just the mode, would open up the possibility of computing the
correlation at only a few lag points.
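As an illustration of what I would like to be able to compute, here is a
naive pure-NumPy sketch (assuming equal-length, real-valued series and
the lag convention c[k] = sum_n x[n+k] * y[n]; the `lags` argument plays
the role that the `n_left`/`n_right` bounds would play inside the C loop):

import numpy as np

def correlate_lags(x, y, lags):
    # Cross-correlation of two equal-length 1-d arrays at a handful of
    # lags only, with c[k] = sum_n x[n+k] * y[n].
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    out = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            out[i] = np.dot(x[k:], y[:n - k])
        else:
            out[i] = np.dot(x[:n + k], y[-k:])
    return out

For, say, 10 lags on series of length N this is roughly 10*N
multiply-adds, instead of the O(N^2) work of a time-domain 'full'
correlation.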
I'm fully aware that changing the signature of such a core numpy function
is out of the question, but I have the feeling that a reasonably small
code refactor might lift this long-standing problem with
(cross-)correlation computation. The Python np.correlate would gain two
additional keywords (`n_left` and `n_right`) which would override the
`mode` keyword. Only the C function would need some more care to keep
good backward compatibility in case it is used externally.
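Under that proposal (to be clear, these keywords do not exist today), a
call might look like:

# hypothetical: n_left/n_right are the proposed keywords, not an existing API
c = np.correlate(x, y, n_left=5, n_right=20)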
What do other people think?
I was wondering if anyone could help me track down a memory leak problem
involving NumPy. My project is quite large and I haven't been able to
construct a simple example which reproduces the problem.
I have an iterative algorithm whose memory usage should not grow as the
iteration progresses. However, after the first iteration 1 GB of memory
is in use, and it steadily increases until, at about 100-200 iterations,
8 GB is used and the program exits with a MemoryError.
I have a collection of objects which contain large arrays. In each
iteration, the objects are updated in turn by re-computing the arrays
they contain. The number of arrays and their sizes are constant (they do
not change during the iteration), so the memory usage should not
increase, and I'm a bit confused: how can the program run out of memory
if it can easily compute at least a few iterations?
I've tried to use Pympler, but as I understand it, it doesn't show the
memory usage of NumPy arrays, does it?
I also tried gc.set_debug(gc.DEBUG_UNCOLLECTABLE) and then printing
gc.garbage at each iteration, but that doesn't show anything.
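For what it's worth, the only extra bookkeeping I can think of adding
myself is something along these lines (a rough sketch; it assumes the
arrays live as plain attributes of my objects, and simply sums their
buffer sizes once per iteration):

import numpy as np

def array_bytes(objects):
    # Total size of the ndarray attributes held by my own objects, to
    # check whether they really stay constant from iteration to iteration.
    total = 0
    for obj in objects:
        for value in vars(obj).values():
            if isinstance(value, np.ndarray):
                total += value.nbytes
    return total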
Does anyone have any ideas how to debug this kind of memory leak bug?
And how to find out whether the bug is in my code, NumPy or elsewhere?
Thanks for any help!
Following the excellent advice of V. Armando Sole, I have finally
succeeded in calling the BLAS routines shipped with scipy from Cython.
I am doing this to avoid shipping an extra BLAS library with a project of
mine that uses scipy but has some parts coded in Cython for extra speed.
So far I have managed to get things working on Linux. Here is what I do:
The following code snippet gives me the dgemv pointer (which is a pointer
to a Fortran function, even though it comes from scipy.linalg.blas.cblas,
which is weird).
from cpython cimport PyCObject_AsVoidPtr
import scipy.linalg.blas

# Fortran dgemv signature (note the int *incx argument after double *x)
ctypedef void (*dgemv_ptr) (char *trans, int *m, int *n,
                            double *alpha, double *a, int *lda,
                            double *x, int *incx,
                            double *beta, double *y, int *incy)

cdef dgemv_ptr dgemv = <dgemv_ptr>PyCObject_AsVoidPtr(
    scipy.linalg.blas.cblas.dgemv._cpointer)
Then, in a tight loop, I can call dgemv by first defining the constants
and then calling dgemv inside the loop:
cimport numpy as np   # for np.PyArray_DATA
np.import_array()

cdef int one = 1
cdef double onedot = 1.0
cdef double zerodot = 0.0
cdef char trans = 'N'   # no transpose

for i in xrange(N):
    # y0 <- onedot * C * c_x0 + zerodot * y0
    dgemv(&trans, &nq, &order,
          &onedot, <double *>np.PyArray_DATA(C), &order,
          <double *>np.PyArray_DATA(c_x0), &one,
          &zerodot, <double *>np.PyArray_DATA(y0), &one)
It works, but it is many times slower than linking against the cblas that
is available on the same system. Specifically, I have about 8 BLAS calls
in my tight loop, 4 of them to dgemv and the others to dcopy. Changing a
single dgemv call from the system cblas to the function returned by
scipy.linalg.blas.cblas.dgemv._cpointer makes the execution time of a
test case jump from about 0.7 s to 1.25 s on my system.
Any clue as to why this is happening?
In the end, on Linux, scipy dynamically links to ATLAS exactly as I do
when I use the cblas functions directly.
We are writing to announce that we've updated "Tabular" to v0.1.
Tabular is a package of Python modules for working with tabular data.
Its main object is the tabarray class, a data structure for holding
and manipulating tabular data. By putting data into a tabarray object,
you’ll get a representation of the data that is more flexible and
powerful than a native Python representation. More specifically, the
tabarray object provides:
-- ultra-fast filtering, selection, and numerical analysis methods,
using convenient Matlab-style matrix operation syntax
-- spreadsheet-style operations, including row & column operations,
'sort', 'replace', 'aggregate', 'pivot', and 'join'
-- flexible load and save methods for a variety of file formats,
including delimited text (CSV), binary, and HTML
-- helpful inference algorithms for determining formatting parameters
and data types of input files
You can download Tabular from PyPI
(http://pypi.python.org/pypi/tabular/) or alternatively clone our git
repository from github (http://github.com/yamins81/tabular). We have also
posted tutorial-style Sphinx documentation.
The tabarray object is based on the record array object from the
Numerical Python package (NumPy), and Tabular is built to interface
well with NumPy in general. Our intended audience is two-fold: (1)
Python users who, though they may not be familiar with NumPy, are in
need of a way to work with tabular data, and (2) NumPy users who would
like to do spreadsheet-style operations on top of their more general
NumPy-based workflows.
We hope that some of you find Tabular useful!
Elaine and Dan
I don't want to run a loop for this, but it should be possible using numpy:
a = np.ones(30)
idx = np.array([2,3,2]) # there is a duplicate index of 2
a[idx] += 2
array([ 1., 1., 3., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1.])
But if we do this (starting again from an array of ones):
a = np.ones(30)
for i in range(idx.shape[0]):
    a[idx[i]] += 2
array([ 1., 1., 5., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1.])
How can I achieve the second result without looping?
I have a project that includes a Cython script which in turn makes direct
calls to a couple of cblas functions. This is necessary, since some
matrix multiplications need to be done inside a tight loop that gets
called thousands of times. The speedup with respect to calling the
scipy.linalg.blas.cblas routines is 10x to 20x.
Now, all this is very nice on Linux, where the setup script can ensure
that the Cython code gets linked against the ATLAS dynamic library, which
is the same library that numpy and scipy link to on that platform.
However, I now have trouble providing easy ways to use my project on
Windows. All the free Windows distributions for scientific Python that I
have looked at (python(x,y) and WinPython) seem to repackage the Windows
builds of numpy/scipy as they are produced by the numpy/scipy development
sites. These appear to statically link ATLAS inside some .pyd files, so I
get no ATLAS to link against, and I have to ship an additional pre-built
ATLAS with my project. All this seems rather inconvenient.
In the end, when my code runs, due to static linking I get three copies
of two slightly different ATLAS libraries in memory: one coming with
_dotblas.pyd in numpy, another with cblas.pyd or fblas.pyd in scipy, and
the last one being the one shipped with my code.
Would it be possible to have a Windows distribution of scipy which
provides pre-built ATLAS DLLs, and to have numpy and scipy link to them
dynamically? This would save memory and also provide a decent BLAS to
link against for things written in Cython. But I believe there must be
some problem, since the scipy site says:
"IMPORTANT: NumPy and SciPy in Windows can currently only make use of
CBLAS and LAPACK as static libraries - DLLs are not supported."
Can someone please explain why or link to an explanation?
Unfortunately, not having a good, pre-built and cheap BLAS implementation
on Windows strikes me as a severe limitation, since you lose the ability
to prototype in Python/scipy and then move the major bottlenecks to C or
Cython to achieve speed.
Many thanks in advance!
Is this a bug in numpy.einsum?
>>> np.einsum(3, [], 2, [], [])
ValueError: If 'op_axes' or 'itershape' is not NULL in the iterator
constructor, 'oa_ndim' must be greater than zero
I think it should return 6 (i.e., 3*2).
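For reference, the same interleaved operand/subscript-list calling
convention works fine when the operands are arrays; for example, this is
equivalent to np.einsum('ij,jk->ik', a, b):

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
c = np.einsum(a, [0, 1], b, [1, 2], [0, 2])  # same result as np.dot(a, b)

So the scalar case with empty subscript lists looks like the odd one out.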
Hi all, and apologies for a little cross-posting:
First, thanks to those of you who have used and contributed to the
PyDSTool math modeling environment. This project has greatly benefited
from the underlying platform of numpy / scipy / matplotlib / ipython.
Going forward I have three goals, for which I would like the urgent input
and support of existing or potential users.
(i) I have ideas for expanding PyDSTool with innovative tools in my
research area, which is essentially the reverse engineering of complex
mechanisms in multi-scale dynamic systems. These tools have already been
prototyped and show promise, but they need a lot of work.
(ii) I want to grow and develop the community of users who will help
drive new ideas, provide feedback, and collaborate on writing and
testing code for both the core and application aspects of PyDSTool.
(iii) The first two goals will help me to expand the scientific /
engineering applications and use cases of PyDSTool as well as further
sustain the project in the long-term.
I am applying for NSF funding to support these software and
application goals over the next few years, but the proposal
deadline is in just four weeks! If you are interested in helping in
any way I would greatly appreciate your replies (off list) to either
of the following queries:
I need to better understand my existing and potential users, many of
whom may not be registered on the sourceforge users list. Please tell
me who you are and what you use PyDSTool for. If you are not using it
yet but you’re interested in this area then please provide feedback
regarding what you would like to see change.
If you are interested in these future goals, even if you are not an
existing user but may be in the future, please write a brief letter of
support on a letterhead document that I will send in with the proposal
as PDFs. I have sample text that I can send you, as well as my draft
proposal’s introduction and specific aims. These letters can make a
great deal of difference during review.
Without funding, collaborators, user demand and community support,
these more ambitious goals for PyDSTool will not happen, although I am
committed to a basic level of maintenance. For instance, based on user
feedback I am about to release an Ubuntu-based Live CD that will
allow users to try PyDSTool on any OS without having to install it.
PyDSTool will also acquire an improved setup procedure and will be
added to the NeuroDebian repository, among others. I am also
finalizing an integrated interface to CUDA GPUs to perform fast
parallel ODE solving.
Thanks for your time,
http://www.ni.gsu.edu/~rclewley/Research/index.html
NSF Software Infrastructure for Sustained Innovation (SI2-SSE)
Robert Clewley, Ph.D.
Neuroscience Institute and
Department of Mathematics and Statistics
Georgia State University
PO Box 5030
Atlanta, GA 30302, USA
tel: 404-413-6420 fax: 404-413-5446