Mailman 3 October 2012 - NumPy-Discussion

MaskedArray, view, and fill_value
by Thomas Robitaille 26 Oct '12

Hi everyone,

Could someone explain the following behavior?

    >>> from numpy import ma
    >>> x = ma.array([1,2,3])
    >>> x.fill_value = 1
    >>> x.view(ma.MaskedArray)
    masked_array(data = [1 2 3],
                 mask = False,
           fill_value = 999999)

i.e. the fill_value gets reset. It looks from ma/core.py like it is deliberate:

    # Make sure to reset the _fill_value if needed
    if getattr(output, '_fill_value', None) is not None:
        output._fill_value = None

but this doesn't make any sense to me... Any ideas? I'm thinking maybe it should be:

    # Make sure to reset the _fill_value if needed
    if getattr(output, '_fill_value', None) is not None and dtype is not None:
        output._fill_value = None

because then it would make sense to reset the fill_value if the dtype has changed. Any thoughts?

Thanks,
Tom
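Whatever view() does with the fill value in a given NumPy release, it can be copied over by hand afterwards. The following is a minimal sketch of that workaround, not something proposed in the message itself; it only relies on the public fill_value attribute.

```python
from numpy import ma

x = ma.array([1, 2, 3])
x.fill_value = 1

# Take the view, then explicitly carry the fill value across so the
# result does not depend on whether view() preserved it.
y = x.view(ma.MaskedArray)
y.fill_value = x.fill_value
```

This sidesteps the question of whether the reset in ma/core.py is intended; it does not change that code path.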

einsum slow vs (tensor)dot
by George Nurser 26 Oct '12

Hi,

I was just looking at the einsum function. To me, it's a really elegant and clear way of doing array operations, which is the core of what numpy is about. It removes the need to remember a range of functions, some of which I find tricky (e.g. tile).

Unfortunately the present implementation seems ~4-6x slower than dot or tensordot for decent-size arrays. I suspect it is because the implementation does not use blas/lapack calls.

cheers,
George Nurser.

E.g. (in ipython on Mac OS X 10.6, python 2.7.3, numpy 1.6.2 from macports)

    a = np.arange(600000.).reshape(1500,400)
    b = np.arange(240000.).reshape(400,600)
    c = np.arange(600)
    d = np.arange(400)

    %timeit np.einsum('ij,jk', a, b)
    10 loops, best of 3: 156 ms per loop

    %timeit np.dot(a,b)
    10 loops, best of 3: 27.4 ms per loop

    %timeit np.einsum('i,ij,j',d,b,c)
    1000 loops, best of 3: 709 us per loop

    %timeit np.dot(d,np.dot(b,c))
    10000 loops, best of 3: 121 us per loop

or

    abig = np.arange(4800.).reshape(6,8,100)
    bbig = np.arange(1920.).reshape(8,6,40)

    %timeit np.einsum('ijk,jil->kl', abig, bbig)
    1000 loops, best of 3: 425 us per loop

    %timeit np.tensordot(abig,bbig, axes=([1,0],[0,1]))
    10000 loops, best of 3: 105 us per loop
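Before comparing timings it helps to confirm that each einsum expression and its dot/tensordot counterpart compute the same thing. The sketch below checks the three pairs from the message on much smaller arrays; the shapes are chosen here for illustration and are not the ones that were timed.

```python
import numpy as np

a = np.arange(12.).reshape(3, 4)
b = np.arange(20.).reshape(4, 5)
c = np.arange(5)
d = np.arange(4)

# Matrix product: 'ij,jk' sums over the shared index j.
same_matmul = np.allclose(np.einsum('ij,jk', a, b), np.dot(a, b))

# Vector-matrix-vector product: every index is summed, giving a scalar.
same_chain = np.allclose(np.einsum('i,ij,j', d, b, c),
                         np.dot(d, np.dot(b, c)))

# Contraction over two axes at once, with the axes swapped between operands.
abig = np.arange(48.).reshape(2, 3, 8)
bbig = np.arange(24.).reshape(3, 2, 4)
same_tensordot = np.allclose(
    np.einsum('ijk,jil->kl', abig, bbig),
    np.tensordot(abig, bbig, axes=([1, 0], [0, 1])),
)
```

NumPy releases newer than the 1.6.2 used above give einsum an optimize argument, which in some cases hands the contraction to BLAS-backed routines. Whether that closes the gap for a particular expression is something to measure rather than assume.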

np.linalg.lstsq with several columns all 0 => huge x ?
by denis 25 Oct '12

Folks,

np.linalg.lstsq of a random-uniform A 50 x 32 with 3 columns all 0 returns x[:3] 0 as expected, but 4 columns all 0 => huge x:

    lstsq (50, 32) with 4 columns all 0:
    [ -3.7e+09 -3.6e+13 -1.9e+13 -2.9e+12 7.3e-01 ...

This may be a roundoff problem, or even a Mac Altivec lapack bug, not worth looking into. linalg.svd is ok though, odd.

Summary: if you run linalg.lstsq on big arrays, either check max |x| or regularize, do

    lstsq( vstack(( A, weight * eye(dim) )), hstack(( b, zeros(dim) )) )

cheers
  -- denis
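The regularization in the summary can be written out as a short function. This is a sketch under assumptions: the helper name regularized_lstsq, the weight of 1e-3, and the random test matrix are illustrative choices, not values from the message.

```python
import numpy as np

def regularized_lstsq(A, b, weight=1e-3):
    """Solve min ||A x - b||^2 + weight^2 ||x||^2 by stacking extra rows.

    Hypothetical helper: appends weight * I below A and zeros below b,
    then calls the ordinary least-squares solver on the stacked system.
    """
    dim = A.shape[1]
    A_aug = np.vstack((A, weight * np.eye(dim)))
    b_aug = np.hstack((b, np.zeros(dim)))
    x, residuals, rank, sv = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.uniform(size=(50, 32))
A[:, :4] = 0.0            # four all-zero columns, as in the report
b = rng.uniform(size=50)

x = regularized_lstsq(A, b)
max_abs_x = np.abs(x).max()   # the "check max |x|" half of the advice
```

The extra rows make every column non-zero, so the coefficients of the all-zero columns are pulled to 0 instead of being left undetermined. The weight trades a small bias in the other coefficients for that stability.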

how to pipe into numpy arrays?
by Michael Aye 24 Oct '12

As numpy.fromfile seems to require full file-object functionality such as seek, I cannot use it with the sys.stdin pipe. So how could I stream a binary pipe directly into numpy? I can imagine storing the data in a string and using StringIO, but the files are 3.6 GB, just the binary, and that will most likely be much more as a string object. Reading binary files on disk is NOT the problem; I would like to avoid the temporary file if possible.
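One way to avoid both the temporary file and an intermediate string is to allocate the array first and let the stream fill its memory directly. The sketch below assumes the element count is known in advance and that the stream offers readinto(), as binary streams in Python 3 do. The helper name read_array_from_stream is hypothetical, and io.BytesIO stands in for the real pipe.

```python
import io
import numpy as np

def read_array_from_stream(stream, dtype, count):
    """Fill a new array of `count` items from a non-seekable binary stream."""
    out = np.empty(count, dtype=dtype)
    raw = memoryview(out.view(np.uint8))  # byte-level view of the same memory
    filled = 0
    while filled < len(raw):
        # readinto() may deliver fewer bytes than requested on a pipe,
        # so keep reading until the buffer is full.
        n = stream.readinto(raw[filled:])
        if not n:
            raise EOFError("stream ended after %d of %d bytes"
                           % (filled, len(raw)))
        filled += n
    return out

# Stand-in for a pipe: five float64 values as raw bytes.
payload = np.arange(5, dtype=np.float64).tobytes()
fake_stdin = io.BytesIO(payload)
arr = read_array_from_stream(fake_stdin, np.float64, 5)
```

With a real pipe on Python 3 the stream argument would be sys.stdin.buffer. Only the final array is held in memory, which matters at the 3.6 GB scale mentioned above.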

Is there a way to reset an accumulate function?
by Cera, Tim 24 Oct '12

I have an array that is peppered throughout in random spots with 'nan'. I would like to use 'cumsum', but I want it to reset the accumulation to 0 whenever a 'nan' is encountered. Is there a way to do this, aside from a loop (which is what I am going to set up here in a moment)?

Kindest regards,
Tim
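A loop-free version is possible by taking one ordinary cumsum and subtracting, at each position, the running total as it stood at the most recent nan. The following is a sketch for 1-D input; the function name cumsum_reset_at_nan is made up here, and the choice to output 0 at the nan positions themselves is an assumption about what "reset" should mean.

```python
import numpy as np

def cumsum_reset_at_nan(a):
    """Cumulative sum of a 1-D array that restarts from 0 at every nan."""
    a = np.asarray(a, dtype=float)
    isnan = np.isnan(a)
    # Plain cumulative sum with nans counted as 0.
    c = np.cumsum(np.where(isnan, 0.0, a))
    # For each position, the index of the latest nan seen so far (-1 if none).
    last_nan = np.maximum.accumulate(np.where(isnan, np.arange(a.size), -1))
    # Running total at that nan; this is what has to be subtracted.
    offset = np.where(last_nan >= 0, c[last_nan], 0.0)
    return c - offset

result = cumsum_reset_at_nan([1, 2, np.nan, 3, 4, np.nan, 5])
```

Because each segment is obtained as a difference of two running totals, rounding error can be larger than in a per-segment loop when the totals are much bigger than the segment sums.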

Issue tracking
by Charles R Harris 23 Oct '12

Hi All,

The issue tracking discussion seems to have died. Since github issues looks to be a viable alternative at this point, I propose to turn it on for the numpy repository and start directing people in that direction. Thoughts?

Chuck

recarray from list of lists
by maschu 22 Oct '12

Dear all,

I am new to this list and still consider myself a python newbie even though I have worked with it for almost two years now. My question may be a bit academic (because there is numpy.genfromtxt and matplotlib.mlab.csv2rec), yet it caused me quite a bit of searching before I found out how to convert a list of lists into a recarray. Here is an example code:

----- start of code ----------------------------
import numpy as np
import csv

reader=csv.reader(open("csvfile.csv","rb"),delimiter=';')
x=list(reader)    # array of string arrays
h=x.pop(0)        # first line has column headers
# convert data to recarray (must make list of tuples!)
xx=[tuple(elem) for elem in x]
z=np.dtype(('S16, f8, f4, f4'))
res=np.array(xx,dtype=z)
res.dtype.names=h # set header names
----- end of code ----------------------------

Doesn't this consume too much memory? This first creates the list of lists as x, then converts this into a list of tuples xx, and then forms a recarray res from xx.

Why is there the limitation that a recarray must be given a list of tuples instead of a list of lists (which comes out of the csv.reader, for example)?

Thanks for any hints to make this more efficient or explanations why these things are done the way they are done.

Martin

--
View this message in context: http://old.nabble.com/recarray-from-list-of-lists-tp34590361p34590361.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
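On the tuple question: NumPy reads a list as one more array dimension and a tuple as a single record, which is why each row has to be a tuple when the dtype is structured. The intermediate list of lists can be skipped by turning each row into a tuple as it comes off the reader. The sketch below is written for Python 3 and makes several assumptions: the inline CSV text and its column names are invented for illustration, io.StringIO replaces the file on disk, and 'U16' is used in place of 'S16' so the text column holds str values.

```python
import csv
import io
import numpy as np

# Invented sample data standing in for csvfile.csv.
csv_text = "name;a;b;c\nfoo;1.5;2.0;3.0\nbar;4.5;5.0;6.0\n"

reader = csv.reader(io.StringIO(csv_text), delimiter=';')
header = next(reader)  # first line has the column headers

# Name the fields up front instead of renaming them afterwards.
dtype = np.dtype({'names': header, 'formats': ['U16', 'f8', 'f4', 'f4']})

# One tuple per row, built directly from the reader; numeric columns
# are converted explicitly rather than left to the array constructor.
rows = [(r[0], float(r[1]), float(r[2]), float(r[3])) for r in reader]
res = np.array(rows, dtype=dtype)
```

This still holds one Python list of tuples in memory alongside the final array, so it halves the overhead rather than removing it. For large files, numpy.genfromtxt, which the message already mentions, handles the parsing itself.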

matrix norm
by Jason Grout 22 Oct '12

I'm curious why scipy/numpy defaults to calculating the Frobenius norm for matrices [1], when Matlab, Octave, and Mathematica all default to calculating the induced 2-norm [2]. Is it solely because the Frobenius norm is easier to calculate, or is there some other good mathematical reason for doing things differently?

Thanks,

Jason

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html

[2]
* Matlab (http://www.mathworks.com/help/matlab/ref/norm.html).
* Octave (http://www.network-theory.co.uk/docs/octave3/octave_198.html).
* Mathematica (http://reference.wolfram.com/mathematica/ref/Norm.html)
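The two norms are easy to compare on a small matrix. In this sketch the 2x2 matrix is an arbitrary example; the calls are the documented numpy.linalg.norm interface, where omitting ord gives the Frobenius norm for a 2-D input and ord=2 gives the induced 2-norm.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Default for a 2-D array: Frobenius norm, the square root of the
# sum of squared entries.
fro = np.linalg.norm(A)

# Induced 2-norm: the largest singular value of A.
two = np.linalg.norm(A, 2)

largest_sv = np.linalg.svd(A, compute_uv=False)[0]
```

The Frobenius norm needs only one pass over the entries, while the induced 2-norm requires singular values, which fits the "easier to calculate" suggestion in the question. The Frobenius norm is never smaller than the induced 2-norm.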

Announcing Anaconda version 1.1
by Travis Oliphant 22 Oct '12

I just wanted to let everyone know about our new release of Anaconda which now has Spyder and Matplotlib working for Mac OS X and Windows. Right now, it's the best way to get the pre-requisites for Numba --- though I recommend getting the latest Numba from github as Numba is still under active development.

Anaconda 1.1 Announcement

Continuum Analytics, Inc. is pleased to announce the release of Anaconda Pro 1.1, which extends Anaconda’s programming capabilities to the desktop. Anaconda Pro now includes an IDE (Spyder <http://spyder-ide.blogspot.com/>) and plotting capabilities (Matplotlib <http://matplotlib.org/>), as well as optimized versions of Numba Pro <https://store.continuum.io/cshop/numbapro> and IOPro <https://store.continuum.io/cshop/iopro>.

With these enhancements, Anaconda Pro is a complete solution for server-side computation or client-side development. It is equally well-suited for supercomputers or for training in a classroom.

Available for Windows, Mac OS X, and Linux, Anaconda is a Python distribution for scientific computing, engineering simulation, and business intelligence & data management. It includes the most popular numerical and scientific libraries used by scientists, engineers, and data analysts, with a single integrated and flexible installer.

Continuum Analytics offers Enterprise-level support for Anaconda, covering both its open source libraries as well as the included commercial libraries from Continuum.

For more information, to download a trial version of Anaconda Pro, or download the completely free Anaconda CE, click here <https://store.continuum.io/cshop/anaconda>.

Best regards,

-Travis

Comparing rows in a structured masked array raises exception
by Tom Aldcroft 21 Oct '12

When comparing rows of a structured masked array I'm getting an exception. A similar operation on a structured ndarray gives the expected True/False result. Note that this exception only occurs if one or more of the mask values are True, since otherwise both row objects are np.void and the ndarray behavior is obtained. (See PR #483, which seems like a good idea to me, compatibility issues notwithstanding).

    >>> import numpy as np
    >>> from numpy import ma
    >>> marr = ma.array([(1,2), (3,4)], dtype=[('x', int), ('y', int)])
    >>> marr.mask[0][0] = True
    >>> marr.mask[1][0] = True
    >>> marr[0] == marr[1]
    Traceback (most recent call last):
      File "<ipython-input-5-fecaee6f2991>", line 1, in <module>
        marr[0] == marr[1]
      File "/data/cosmos2/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/numpy/ma/core.py", line 3573, in __eq__
        check = ndarray.__eq__(self.filled(0), odata).view(type(self))
    TypeError: descriptor '__eq__' requires a 'numpy.ndarray' object but received a 'numpy.void'

    >>> arr = np.array([(1,2), (3,4)], dtype=[('x', int), ('y', int)])
    >>> arr[0] == arr[1]
    False

    >>> np.__version__
    '1.6.2'

Thanks,
Tom
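Until the comparison itself behaves, rows can be compared through their underlying data and mask records. The sketch below is a workaround under stated assumptions, not the fix discussed in PR #483: the helper name rows_equal is hypothetical, the mask is passed at construction instead of being set afterwards, and "equal" is taken to mean the same mask pattern plus equal values in every unmasked field.

```python
import numpy as np
from numpy import ma

marr = ma.array([(1, 2), (3, 4)],
                mask=[(True, False), (True, False)],
                dtype=[('x', int), ('y', int)])

def rows_equal(m, i, j):
    """Compare rows i and j of a structured masked array.

    Hypothetical helper: rows with different mask patterns are unequal;
    otherwise only the unmasked fields are compared.
    """
    # .item() turns each record into a plain Python tuple.
    data_i, data_j = m.data[i].item(), m.data[j].item()
    full_mask = ma.getmaskarray(m)
    mask_i, mask_j = full_mask[i].item(), full_mask[j].item()
    if mask_i != mask_j:
        return False
    return all(a == b
               for a, b, masked in zip(data_i, data_j, mask_i)
               if not masked)

differ = rows_equal(marr, 0, 1)   # unmasked field 'y' is 2 vs 4
same = rows_equal(marr, 0, 0)
```

Other definitions are defensible, for example treating any masked field as making the result masked, so the semantics here should be adjusted to the use case.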
