I was just looking at the einsum function.
To me, it's a really elegant and clear way of doing array operations, which
is the core of what numpy is about.
It removes the need to remember a range of functions, some of which I find
tricky (e.g. tile).
Unfortunately the present implementation seems ~4-6x slower than dot or
tensordot for decent-sized arrays.
I suspect this is because the implementation does not use BLAS/LAPACK calls.
cheers, George Nurser.
E.g. (in ipython on Mac OS X 10.6, python 2.7.3, numpy 1.6.2 from macports)
a = np.arange(600000.).reshape(1500,400)
b = np.arange(240000.).reshape(400,600)
c = np.arange(600)
d = np.arange(400)
%timeit np.einsum('ij,jk', a, b)
10 loops, best of 3: 156 ms per loop
%timeit np.dot(a, b)
10 loops, best of 3: 27.4 ms per loop
%timeit np.einsum('i,ij,j', d, b, c)
1000 loops, best of 3: 709 us per loop
%timeit np.dot(d, np.dot(b, c))
10000 loops, best of 3: 121 us per loop
abig = np.arange(4800.).reshape(6,8,100)
bbig = np.arange(1920.).reshape(8,6,40)
%timeit np.einsum('ijk,jil->kl', abig, bbig)
1000 loops, best of 3: 425 us per loop
%timeit np.tensordot(abig,bbig, axes=([1,0],[0,1]))
10000 loops, best of 3: 105 us per loop
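For readers following along: the einsum and tensordot calls above compute the same contraction, so the timings really are apples-to-apples. A quick sanity check (not from the original post):

```python
import numpy as np

abig = np.arange(4800.).reshape(6, 8, 100)
bbig = np.arange(1920.).reshape(8, 6, 40)

# 'ijk,jil->kl' sums over i and j; tensordot contracts the same axis pairs
r1 = np.einsum('ijk,jil->kl', abig, bbig)
r2 = np.tensordot(abig, bbig, axes=([1, 0], [0, 1]))

print(np.allclose(r1, r2))  # the two contractions agree
print(r1.shape)             # (100, 40)
```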
np.linalg.lstsq of a random-uniform A, 50 x 32, with 3 columns all 0
returns x[:3] == 0 as expected,
but 4 columns all 0 => huge x:
lstsq (50, 32) with 4 columns all 0:
[ -3.7e+09 -3.6e+13 -1.9e+13 -2.9e+12 7.3e-01 ...
This may be a roundoff problem, or even a Mac Altivec LAPACK bug --
not worth looking into. linalg.svd is OK though, which is odd.
Summary: if you run linalg.lstsq on big arrays,
either check max |x|
or regularize: lstsq( vstack(( A, weight * eye(dim) )),
hstack(( b, zeros(dim) )) )
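The regularization suggestion above can be sketched as follows; the random data, seed, and ridge weight are illustrative, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen for reproducibility
A = rng.uniform(size=(50, 32))
A[:, :4] = 0.0                          # 4 all-zero columns: rank-deficient
b = rng.uniform(size=50)

dim = A.shape[1]
weight = 1e-6                           # small ridge weight; tune as needed
# augmented least squares: minimizes |Ax - b|^2 + weight^2 |x|^2
x, *_ = np.linalg.lstsq(
    np.vstack((A, weight * np.eye(dim))),
    np.hstack((b, np.zeros(dim))),
    rcond=None,
)
print(np.max(np.abs(x)))  # stays bounded instead of blowing up
```

The ridge term pins the coefficients of the zero columns to 0, which is exactly the behavior the 3-zero-column case happened to give.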
As numpy.fromfile seems to require full file-object functionality
such as seek, I cannot use it with the sys.stdin pipe.
So how could I stream a binary pipe directly into numpy?
I could imagine storing the data in a string and using StringIO, but the
files are 3.6 GB large, just the binary, and would most likely be
much larger as a string object.
Reading binary files on disk is NOT the problem; I would like to avoid
the temporary file if possible.
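One possible approach (a Python 3 sketch, assuming a known dtype such as float64): read the pipe in fixed-size chunks and hand each chunk to np.frombuffer, so the full 3.6 GB never has to live in one string:

```python
import sys
import numpy as np

CHUNK = 8 * 1024 * 1024  # bytes per read; any size works, leftovers are kept

def stream_arrays(stream=sys.stdin.buffer, dtype=np.float64):
    """Yield numpy arrays decoded chunk-by-chunk from a binary stream."""
    itemsize = np.dtype(dtype).itemsize
    leftover = b""
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            break  # EOF; a trailing partial item, if any, is discarded
        data = leftover + chunk
        n = len(data) - len(data) % itemsize  # split off whole items only
        leftover = data[n:]
        if n:
            yield np.frombuffer(data[:n], dtype=dtype)
```

Something like `total = sum(chunk.sum() for chunk in stream_arrays())` then processes the pipe with bounded memory and no temporary file.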
I have an array that is peppered throughout in random spots with 'nan'. I
would like to use 'cumsum', but I want it to reset the accumulation to 0
whenever a 'nan' is encountered. Is there a way to do this, aside from a
loop - which is what I am going to set up here in a moment?
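One loop-free possibility (my sketch, not from the thread): do an ordinary cumsum with NaNs treated as 0, then subtract the running total as it stood at the most recent NaN:

```python
import numpy as np

def cumsum_reset_at_nan(a):
    """Cumulative sum that restarts from 0 after every NaN."""
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    c = np.cumsum(np.where(nan_mask, 0.0, a))
    # index of the most recent NaN at or before each position (-1 if none)
    idx = np.where(nan_mask, np.arange(a.size), -1)
    np.maximum.accumulate(idx, out=idx)
    # subtract the running total as it stood at that NaN
    return c - np.where(idx >= 0, c[idx], 0.0)

print(cumsum_reset_at_nan([1., 2., np.nan, 3., 4.]))
# -> [1. 3. 0. 3. 7.]
```

Because the carried-forward quantity is an index (not a sum), this also behaves correctly with negative values in the array.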
The issue tracking discussion seems to have died. Since github issues looks
to be a viable alternative at this point, I propose to turn it on for the
numpy repository and start directing people in that direction.
I am new to this list and still consider myself a Python newbie even
though I have worked with it for almost two years now. My question may be a
bit academic (because there are numpy.genfromtxt and
matplotlib.mlab.csv2rec), yet it caused me quite a bit of searching before I
found out how to convert a list of lists into a recarray. Here is an example:
----- start of code ----------------------------
import csv
import numpy as np

reader = csv.reader(open('data.csv'))  # 'data.csv' is illustrative
x = list(reader)            # list of lists of strings
h = x.pop(0)                # first line has column headers
# convert data to recarray (must make list of tuples!)
xx = [tuple(elem) for elem in x]
z = np.dtype('S16, f8, f4, f4')
res = np.rec.array(xx, dtype=z)
res.dtype.names = h         # set header names
----- end of code ----------------------------
Doesn't this consume too much memory? This first creates the list of
lists as x, then converts this into a list of tuples xx, and then forms a
recarray res from xx. Why is there the limitation that a recarray must be
given a list of tuples instead of a list of lists (which comes out of the
csv.reader, for example)?
Thanks for any hints to make this more efficient or explanations why these
things are done the way they are done.
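On the list-of-lists restriction: as far as I can tell, np.rec.fromrecords will accept nested lists directly (it falls back to row-by-row field assignment internally), so the tuple conversion step can be skipped. A small illustration with made-up data; the f0/f1 field names are just numpy's defaults:

```python
import numpy as np

rows = [["alpha", 1.0, 2.0, 3.0],   # illustrative data, not the OP's
        ["beta", 4.0, 5.0, 6.0]]
rec = np.rec.fromrecords(rows, dtype="S16, f8, f4, f4")

print(rec.f1)  # the f8 column, converted from strings/floats per field
```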
I just wanted to let everyone know about our new release of Anaconda which
now has Spyder and Matplotlib working for Mac OS X and Windows.
Right now, it's the best way to get the prerequisites for Numba --- though
I recommend getting the latest Numba from github, as Numba is still under
active development.
Anaconda 1.1 Announcement
Continuum Analytics, Inc. is pleased to announce the release of Anaconda
Pro 1.1, which extends Anaconda’s programming capabilities to the desktop.
Anaconda Pro now includes an IDE (Spyder <http://spyder-ide.blogspot.com/>)
and plotting capabilities (Matplotlib <http://matplotlib.org/>), as well as
optimized versions of Numba Pro <https://store.continuum.io/cshop/numbapro>.
With these enhancements, Anaconda Pro is a complete solution for server-side
computation or client-side development. It is equally well-suited for
supercomputers or for training in a classroom.
Available for Windows, Mac OS X, and Linux, Anaconda is a Python
distribution for scientific computing, engineering simulation, and business
intelligence & data management. It includes the most popular numerical and
scientific libraries used by scientists, engineers, and data analysts, with
a single integrated and flexible installer.
Continuum Analytics offers enterprise-level support for Anaconda, covering
both its open-source libraries and the included commercial libraries.
For more information, to download a trial version of Anaconda Pro, or to
download the completely free Anaconda CE, see the Continuum Analytics website.
When comparing rows of a structured masked array I'm getting an
exception. A similar operation on a structured ndarray gives the
expected True/False result. Note that this exception only occurs if
one or more of the mask values are True, since otherwise both row
objects are np.void and the ndarray behavior is obtained. (See PR
#483, which seems like a good idea to me, though it may raise
compatibility issues.)
>>> from numpy import ma
>>> marr = ma.array([(1,2), (3,4)], dtype=[('x', int), ('y', int)])
>>> marr.mask = True
>>> marr == marr
Traceback (most recent call last):
File "<ipython-input-5-fecaee6f2991>", line 1, in <module>
marr == marr
line 3573, in __eq__
check = ndarray.__eq__(self.filled(0), odata).view(type(self))
TypeError: descriptor '__eq__' requires a 'numpy.ndarray' object but
received a 'numpy.void'
>>> arr = np.array([(1,2), (3,4)], dtype=[('x', int), ('y', int)])
>>> arr == arr
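Until that's resolved, one workaround (my suggestion, not from the report) is to compare the filled data explicitly, which goes through the plain-ndarray path that already works:

```python
import numpy as np
from numpy import ma

marr = ma.array([(1, 2), (3, 4)], dtype=[("x", int), ("y", int)])
marr.mask = True

# compare the underlying data; masked entries compare via the fill value,
# so combine with marr.mask afterwards if masked slots should not count
eq = marr.filled(0) == marr.filled(0)
print(eq)
```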
I find it annoying that in casual use, if I print an array, the printed form
can't be directly used as subsequent input (or can it?).
What do others do about this? By casual I mean: I write some long-running
task and at the end print some small array. Now I decide I'd like
to cut/paste that into a new program.
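For what it's worth, repr (rather than print/str) already produces paste-able text for small arrays, since it keeps the commas and the array(...) wrapper:

```python
import numpy as np

a = np.arange(6.).reshape(2, 3)
print(a)         # str form: no commas, not valid Python input
print(repr(a))   # array([[...]]) form: valid input if `array` is in scope

# round-trip check: eval the repr with `array` bound to np.array
b = eval(repr(a), {"array": np.array})
print(np.array_equal(a, b))  # True
```

Caveat: for large arrays numpy summarizes the repr with `...`, so this only round-trips below the print threshold (see np.set_printoptions(threshold=...)).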