From madsmh at gmail.com Wed Feb 1 06:56:06 2012 From: madsmh at gmail.com (Mads M. Hansen) Date: Wed, 1 Feb 2012 12:56:06 +0100 Subject: [SciPy-User] NumPy and SciPy test failures Message-ID: I have built NumPy 1.6.1 and SciPy 0.10.0 for Python 3.2 on a Fedora 16 system and I used gfortran, but when I run the tests I get the following failures and errors NumPy: ====================================================================== FAIL: test_kind.TestKind.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib64/python3.2/site-packages/numpy/f2py/tests/test_kind.py", line 30, in test_all 'selectedrealkind(%s): expected %r but got %r' % (i, selected_real_kind(i), selectedrealkind(i))) File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 34, in assert_ raise AssertionError(msg) AssertionError: selectedrealkind(19): expected -1 but got 16 ====================================================================== FAIL: test_doctests (test_polynomial.TestDocs) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/numpy/lib/tests/test_polynomial.py", line 84, in test_doctests return rundocs() File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 988, in rundocs raise AssertionError("Some doctests failed:\n%s" % "\n".join(msg)) AssertionError: Some doctests failed: ********************************************************************** File "/usr/lib64/python3.2/site-packages/numpy/lib/tests/test_polynomial.py", line 32, in test_polynomial Failed example: p / q Expected: (poly1d([ 0.33333333]), poly1d([ 1.33333333, 2.66666667])) Got: (poly1d([ 0.333]), poly1d([ 1.333, 2.667])) ********************************************************************** File "/usr/lib64/python3.2/site-packages/numpy/lib/tests/test_polynomial.py", line 54, in test_polynomial Failed example: p.integ() Expected: poly1d([ 0.33333333, 1. , 3. , 0. ]) Got: poly1d([ 0.333, 1. , 3. , 0. ]) ********************************************************************** File "/usr/lib64/python3.2/site-packages/numpy/lib/tests/test_polynomial.py", line 56, in test_polynomial Failed example: p.integ(1) Expected: poly1d([ 0.33333333, 1. , 3. , 0. ]) Got: poly1d([ 0.333, 1. , 3. , 0. ]) ********************************************************************** File "/usr/lib64/python3.2/site-packages/numpy/lib/tests/test_polynomial.py", line 58, in test_polynomial Failed example: p.integ(5) Expected: poly1d([ 0.00039683, 0.00277778, 0.025 , 0. , 0. , 0. , 0. , 0. ]) Got: poly1d([ 0. , 0.003, 0.025, 0. , 0. , 0. , 0. , 0. 
]) ----------------------------------------------------------------------: And SciPy: ====================================================================== ERROR: Failure: ImportError (cannot import name _minimize_neldermead) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/failure.py", line 37, in runTest raise self.exc_class(self.exc_val).with_traceback(self.tb) File "/usr/lib/python3.2/site-packages/nose/loader.py", line 390, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.2/site-packages/scipy/optimize/tests/test_anneal.py", line 10, in from scipy.optimize import anneal, minimize File "/usr/lib64/python3.2/site-packages/scipy/optimize/minimize.py", line 16, in from .optimize import _minimize_neldermead, _minimize_powell, \ ImportError: cannot import name _minimize_neldermead ====================================================================== ERROR: Failure: ImportError (cannot import name cwt) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/failure.py", line 37, in runTest raise self.exc_class(self.exc_val).with_traceback(self.tb) File "/usr/lib/python3.2/site-packages/nose/loader.py", line 390, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.2/site-packages/scipy/signal/tests/test_peak_finding.py", line 7, in from scipy.signal._peak_finding import argrelmax, find_peaks_cwt, _identify_ridge_lines File "/usr/lib64/python3.2/site-packages/scipy/signal/_peak_finding.py", line 7, in from scipy.signal.wavelets import cwt, ricker ImportError: cannot import name cwt ====================================================================== ERROR: test_iv_cephes_vs_amos_mass_test (test_basic.TestBessel) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/scipy/special/tests/test_basic.py", line 1642, in test_iv_cephes_vs_amos_mass_test c1 = special.iv(v, x) RuntimeWarning: divide by zero encountered in iv ====================================================================== ERROR: test_continuous_extra.test_cont_extra(, (0.4141193182605212,), 'loggamma loc, scale test') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/tests/test_continuous_extra.py", line 78, in check_loc_scale m,v = distfn.stats(*arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1632, in stats mu = self._munp(1.0,*goodargs) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 4120, in _munp return self._mom0_sc(n,*args) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1166, in 
_mom0_sc self.b, args=(m,)+args)[0] File "/usr/lib64/python3.2/site-packages/scipy/integrate/quadpack.py", line 247, in quad retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points) File "/usr/lib64/python3.2/site-packages/scipy/integrate/quadpack.py", line 314, in _quad return _quadpack._qagie(func,bound,infbounds,args,full_output,epsabs,epsrel,limit) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1163, in _mom_integ0 return x**m * self.pdf(x,*args) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1263, in pdf place(output,cond,self._pdf(*goodargs) / scale) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 4113, in _pdf return exp(c*x-exp(x)-gamln(c)) RuntimeWarning: overflow encountered in exp ====================================================================== ERROR: test_continuous_extra.test_cont_extra(, (1.8771398388773268,), 'lomax loc, scale test') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/tests/test_continuous_extra.py", line 78, in check_loc_scale m,v = distfn.stats(*arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1618, in stats mu, mu2, g1, g2 = self._stats(*args) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 4644, in _stats mu, mu2, g1, g2 = pareto.stats(c, loc=-1.0, moments='mvsk') File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1616, in stats mu, mu2, g1, g2 = self._stats(*args,**{'moments':moments}) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 4595, in _stats vals = 2*(bt+1.0)*sqrt(b-2.0)/((b-3.0)*sqrt(b)) RuntimeWarning: invalid value encountered in sqrt ====================================================================== ERROR: test_discrete_basic.test_discrete_extra(, (30, 12, 6), 'hypergeom entropy nan test') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/tests/test_discrete_basic.py", line 199, in check_entropy ent = distfn.entropy(*arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 6315, in entropy place(output,cond0,self.vecentropy(*goodargs)) File "/usr/lib64/python3.2/site-packages/numpy/lib/function_base.py", line 1863, in __call__ theout = self.thefunc(*newargs) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 6669, in _entropy lvals = where(vals==0.0,0.0,log(vals)) RuntimeWarning: divide by zero encountered in log ====================================================================== ERROR: test_discrete_basic.test_discrete_extra(, (21, 3, 12), 'hypergeom entropy nan test') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/tests/test_discrete_basic.py", line 199, in check_entropy ent = distfn.entropy(*arg) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 6315, in entropy place(output,cond0,self.vecentropy(*goodargs)) File 
"/usr/lib64/python3.2/site-packages/numpy/lib/function_base.py", line 1863, in __call__ theout = self.thefunc(*newargs) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 6669, in _entropy lvals = where(vals==0.0,0.0,log(vals)) RuntimeWarning: divide by zero encountered in log ====================================================================== ERROR: test_fit (test_distributions.TestFitMethod) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/scipy/stats/tests/test_distributions.py", line 439, in test_fit vals2 = distfunc.fit(res, optimizer='powell') File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1875, in fit vals = optimizer(func,x0,args=(ravel(data),),disp=0) File "/usr/lib64/python3.2/site-packages/scipy/optimize/optimize.py", line 1622, in fmin_powell fval, x, direc1 = _linesearch_powell(func, x, direc1, tol=xtol*100) File "/usr/lib64/python3.2/site-packages/scipy/optimize/optimize.py", line 1492, in _linesearch_powell alpha_min, fret, iter, num = brent(myfunc, full_output=1, tol=tol) File "/usr/lib64/python3.2/site-packages/scipy/optimize/optimize.py", line 1313, in brent brent.optimize() File "/usr/lib64/python3.2/site-packages/scipy/optimize/optimize.py", line 1214, in optimize tmp2 = (x-v)*(fx-fw) RuntimeWarning: invalid value encountered in double_scalars ====================================================================== ERROR: test_fix_fit (test_distributions.TestFitMethod) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/scipy/stats/tests/test_distributions.py", line 460, in test_fix_fit vals2 = distfunc.fit(res,fscale=1) File "/usr/lib64/python3.2/site-packages/scipy/stats/distributions.py", line 1875, in fit vals = optimizer(func,x0,args=(ravel(data),),disp=0) File "/usr/lib64/python3.2/site-packages/scipy/optimize/optimize.py", line 302, in fmin and max(abs(fsim[0]-fsim[1:])) <= ftol): RuntimeWarning: invalid value encountered in subtract ====================================================================== ERROR: Failure: ImportError (cannot import name common_info) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/failure.py", line 37, in runTest raise self.exc_class(self.exc_val).with_traceback(self.tb) File "/usr/lib/python3.2/site-packages/nose/loader.py", line 390, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.2/site-packages/scipy/weave/__init__.py", line 26, in from .inline_tools import inline File "/usr/lib64/python3.2/site-packages/scipy/weave/inline_tools.py", line 5, in from . import ext_tools File "/usr/lib64/python3.2/site-packages/scipy/weave/ext_tools.py", line 7, in from . import converters File "/usr/lib64/python3.2/site-packages/scipy/weave/converters.py", line 4, in from . 
import common_info
ImportError: cannot import name common_info

======================================================================
FAIL: test_mio.test_mat4_3d
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/lib64/python3.2/site-packages/scipy/io/matlab/tests/test_mio.py", line 740, in test_mat4_3d
    stream, {'a': arr}, True, '4')
  File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 1008, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
AssertionError: DeprecationWarning not raised by functools.partial(, oned_as='row')

======================================================================
FAIL: Regression test for #651: better handling of badly conditioned
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.2/site-packages/scipy/signal/tests/test_filter_design.py", line 34, in test_bad_filter
    assert_raises(BadCoefficients, tf2zpk, [1e-15], [1.0, 1.0])
  File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 1008, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
AssertionError: BadCoefficients not raised by tf2zpk

----------------------------------------------------------------------

The NumPy errors seem to be mostly rounding errors, but it seems to
round quite aggressively. How significant are these errors?

.. Mads

From nouiz at nouiz.org Wed Feb 1 07:44:56 2012
From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=)
Date: Wed, 1 Feb 2012 07:44:56 -0500
Subject: [SciPy-User] Accumulation sum using indirect indexes
In-Reply-To: References: Message-ID:

It will be slow, but you can make a python loop.

Fred

On Jan 31, 2012 3:34 PM, "Alexander Kalinin" wrote:

> Hello!
>
> I use SciPy in computer graphics applications. My task is to calculate
> vertex normals by averaging faces normals. In other words I want to
> accumulate vectors with the same ids. For example,
>
> ids = numpy.array([0, 1, 1, 2])
> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1,
> 0.1 0.1] ])
>
> I need result:
> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]])
>
> The most simple code:
> nv[ids] += n
> does not work, I know about this. For 1D arrays I use numpy.bincount(...)
> function. But this function does not work for 2D arrays.
>
> So, my question. What is the best way calculate accumulation sum for 2D
> arrays using indirect indexes?
>
> Sincerely,
> Alexander
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

From charlesr.harris at gmail.com Wed Feb 1 09:01:11 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 1 Feb 2012 07:01:11 -0700
Subject: [SciPy-User] NumPy and SciPy test failures
In-Reply-To: References: Message-ID:

On Wed, Feb 1, 2012 at 4:56 AM, Mads M. Hansen wrote:

> I have built NumPy 1.6.1 and SciPy 0.10.0 for Python 3.2 on a Fedora
> 16 system and I used gfortran, but when I run the tests I get the
> following failures and errors
>
> NumPy:
>
> ======================================================================
> FAIL: test_kind.TestKind.test_all
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "/usr/lib64/python3.2/site-packages/numpy/f2py/tests/test_kind.py", line 30, in test_all
>     'selectedrealkind(%s): expected %r but got %r' % (i,
> selected_real_kind(i), selectedrealkind(i)))
>   File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 34, in assert_
>     raise AssertionError(msg)
> AssertionError: selectedrealkind(19): expected -1 but got 16
>

I think this is a bug in the test that comes from adding the float16 type.

From dlaxalde at gmail.com Wed Feb 1 10:37:54 2012
From: dlaxalde at gmail.com (Denis Laxalde)
Date: Wed, 1 Feb 2012 10:37:54 -0500
Subject: [SciPy-User] NumPy and SciPy test failures
In-Reply-To: References: Message-ID: <20120201103754.63be48ec@mcgill.ca>

Mads M. Hansen wrote:
> I have built NumPy 1.6.1 and SciPy 0.10.0 for Python 3.2 on a Fedora
> 16 system and I used gfortran, but when I run the tests I get the
> following failures and errors

It's probably not scipy 0.10.0. Could you specify the exact versions
you have installed (e.g. from the header displayed by scipy tests)?
> And SciPy: > > ====================================================================== > ERROR: Failure: ImportError (cannot import name _minimize_neldermead) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python3.2/site-packages/nose/failure.py", line 37, in runTest > raise self.exc_class(self.exc_val).with_traceback(self.tb) > File "/usr/lib/python3.2/site-packages/nose/loader.py", line 390, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/python3.2/site-packages/nose/importer.py", line 39, > in importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/python3.2/site-packages/nose/importer.py", line 86, > in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "/usr/lib64/python3.2/site-packages/scipy/optimize/tests/test_anneal.py", > line 10, in > from scipy.optimize import anneal, minimize > File "/usr/lib64/python3.2/site-packages/scipy/optimize/minimize.py", > line 16, in > from .optimize import _minimize_neldermead, _minimize_powell, \ > ImportError: cannot import name _minimize_neldermead I was interested by this one but cannot reproduce it with current master on python 3.2.2. -- Denis From glen at toadhill.net Wed Feb 1 11:16:37 2012 From: glen at toadhill.net (glen at toadhill.net) Date: Wed, 1 Feb 2012 11:16:37 -0500 Subject: [SciPy-User] asarray_chkfinite Message-ID: <9394EDC3-D458-4697-86AC-75115C14AA62@toadhill.net> Hi all, I'm trying to optimize some code that entails a very large number of sparse matrix-vector and vctor-vector multiplies. Upon running the profiler I see that about 25% of my program's cumulative time is spent running asarray_chkfinite. I do not call this routine directly. Can anyone tell me what might be calling it and whether there is anything obvious I can do about it? Glen Glen Henshaw, PhD ? Roboticist ? U.S. Naval Research Laboratory office: 202-767-1196 ? google voice/mobile: 443-295-3050 glen.henshaw at nrl.navy.mil -------------- next part -------------- An HTML attachment was scrubbed... URL: From madsmh at gmail.com Wed Feb 1 11:17:43 2012 From: madsmh at gmail.com (Mads M. Hansen) Date: Wed, 1 Feb 2012 17:17:43 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: <20120201103754.63be48ec@mcgill.ca> References: <20120201103754.63be48ec@mcgill.ca> Message-ID: 2012/2/1 Denis Laxalde : > Mads M. Hansen wrote: >> I have built NumPy 1.6.1 and SciPy 0.10.0 for Python 3.2 on a Fedora >> 16 system and I used gfortran, but when I run the tests I get the >> following failures and errors > > It's probably not scipy 0.10.0. Could you specify the exact versions of > you have installed (e.g. from the header displayed by scipy tests)? > Here is the header >>> scipy.test('full') Running unit tests for scipy NumPy version 1.6.1 NumPy is installed in /usr/lib64/python3.2/site-packages/numpy SciPy version 0.10.0 SciPy is installed in /usr/lib64/python3.2/site-packages/scipy Python version 3.2.1 (default, Jul 11 2011, 18:54:42) [GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] nose version 1.1.2 I checked out the v.0.10.0 tag from the Git repository. .. Mads From lafont.fabien at gmail.com Wed Feb 1 11:21:08 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Wed, 1 Feb 2012 17:21:08 +0100 Subject: [SciPy-User] [scipy-user] How to add a 1D np.array to another np.array? Message-ID: Hello everyone, I try to add an array to another (to build a 2D array). 
I try that a = np.zeros(2) b=np.array([2,4]) a[[0]] = b print a [2,0] And I want a= [[2,4],0] How can I do? Thx Fabien From alec.kalinin at gmail.com Wed Feb 1 11:34:12 2012 From: alec.kalinin at gmail.com (Alexander Kalinin) Date: Wed, 1 Feb 2012 20:34:12 +0400 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: Yes, but for large data sets loops is quite slow. I have tried Pandas groupby.sum() and it works faster. 2012/2/1 Fr?d?ric Bastien > It will be slow, but you can make a python loop. > > Fred > On Jan 31, 2012 3:34 PM, "Alexander Kalinin" > wrote: > >> Hello! >> >> I use SciPy in computer graphics applications. My task is to calculate >> vertex normals by averaging faces normals. In other words I want to >> accumulate vectors with the same ids. For example, >> >> ids = numpy.array([0, 1, 1, 2]) >> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >> [0.1, 0.1 0.1] ]) >> >> I need result: >> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >> >> The most simple code: >> nv[ids] += n >> does not work, I know about this. For 1D arrays I use numpy.bincount(...) >> function. But this function does not work for 2D arrays. >> >> So, my question. What is the best way calculate accumulation sum for 2D >> arrays using indirect indexes? >> >> Sincerely, >> Alexander >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Wed Feb 1 11:35:25 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 1 Feb 2012 10:35:25 -0600 Subject: [SciPy-User] [scipy-user] How to add a 1D np.array to another np.array? In-Reply-To: References: Message-ID: On Wed, Feb 1, 2012 at 10:21 AM, Fabien Lafont wrote: > Hello everyone, > > I try to add an array to another (to build a 2D array). > > I try that > > > > a = np.zeros(2) > b=np.array([2,4]) > a[[0]] = b > > print a > [2,0] > > And I want a= [[2,4],0] > > But that is not a 2D array. Do you want to "stack" b above the zeros? Perhaps something like this: In [7]: a = np.zeros(2) In [8]: b = np.array([2,4]) In [9]: c = np.vstack((b,a)) In [10]: c Out[10]: array([[ 2., 4.], [ 0., 0.]]) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From gustavo.goretkin at gmail.com Wed Feb 1 11:36:38 2012 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Wed, 1 Feb 2012 11:36:38 -0500 Subject: [SciPy-User] masked recarray, recarray with one field of type "ndarray" In-Reply-To: References: Message-ID: Thanks for the help! Now is there any way to mask elements of a recarray? I should explain the application because I think I may be going about this the wrong way: I'll be building a tree and each node will have some attributes (for example, a matrix). I often have to iterate through every node of the tree and do a calculation -- something that I could do in a vectorized way with NumPy if all the attributes were stored in an array. So I thought I could represent the tree as a recarray (that I'd occasionally need to grow). I'd also need to delete nodes from the tree occasionally. I'd accomplish this by masking entries of the recarray. 
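A minimal sketch of that bookkeeping, using a plain structured array with an
explicit boolean "free" field instead of a masked recarray; the field names,
the 3x3 matrix attribute and the doubling growth policy are only illustrative:

import numpy as np

# Hypothetical node layout: a parent index plus a 3x3 matrix per node.
node_dt = np.dtype([('parent', np.int32),
                    ('mat', np.float64, (3, 3)),
                    ('free', np.bool_)])

def make_pool(size):
    nodes = np.zeros(size, dtype=node_dt)
    nodes['free'] = True            # every slot starts out unused
    return nodes

def add_node(nodes, parent, mat):
    free = np.nonzero(nodes['free'])[0]
    if len(free) == 0:              # no reusable slot left: double the pool
        grown = np.zeros(2 * len(nodes), dtype=node_dt)
        grown[:len(nodes)] = nodes
        grown['free'][len(nodes):] = True
        nodes, free = grown, np.array([len(nodes)])
    i = free[0]
    nodes['parent'][i] = parent
    nodes['mat'][i] = mat
    nodes['free'][i] = False        # the slot is now a live node
    return nodes, i

def remove_node(nodes, i):
    nodes['free'][i] = True         # "delete" by marking the slot reusable

Vectorized passes over the whole tree can then restrict themselves to the
live rows, e.g. live = ~nodes['free']; mats = nodes['mat'][live].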
When I needed to add a node to the tree, I'd try to populate a masked entry before going to the end of the array. On Tue, Jan 31, 2012 at 9:33 AM, Warren Weckesser wrote: > > > On Tue, Jan 31, 2012 at 2:36 AM, Gustavo Goretkin > wrote: >> >> Does a recarray support masking? >> >> Can I have a recarray where one of the fields is an M-by-N ndarray >> (not recarray) of some dtype? >> ex: a = np.recarray(shape=(10),formats=['i4','f8','3-by-3 ndarray of >> dtype=float64']) > > > > Here's how it can be done with the dtype argument (in this case, the > "sub-arrays" are 3x5 float32): > > In [21]: dt = np.dtype([('id', int32), ('values', float32, (3,5))]) > > In [22]: a = np.recarray(shape=(3,), dtype=dt) > > In [23]: a.id > Out[23]: array([????? 7, 2345536, 8585218]) > > In [24]: a[0].id > Out[24]: 7 > > In [25]: a[0].values > Out[25]: > array([[? 9.80908925e-45,?? 2.15997513e-37,?? 3.16079124e-39, > ????????? 1.18408375e-38,?? 2.81552923e-38], > ?????? [? 2.13004362e-37,? -7.69011974e-02,?? 9.80908925e-45, > ????????? 9.80908925e-45,?? 3.62636667e-21], > ?????? [? 5.67059093e-24,?? 5.67095065e-24,?? 5.64768872e-24, > ????????? 7.86448908e+11,?? 0.00000000e+00]], dtype=float32) > > In [26]: a[0].values.shape > Out[26]: (3, 5) > > > Warren > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ralf.gommers at googlemail.com Wed Feb 1 12:12:49 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 1 Feb 2012 18:12:49 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: References: <20120201103754.63be48ec@mcgill.ca> Message-ID: On Wed, Feb 1, 2012 at 5:17 PM, Mads M. Hansen wrote: > 2012/2/1 Denis Laxalde : > > Mads M. Hansen wrote: > >> I have built NumPy 1.6.1 and SciPy 0.10.0 for Python 3.2 on a Fedora > >> 16 system and I used gfortran, but when I run the tests I get the > >> following failures and errors > > > > It's probably not scipy 0.10.0. Could you specify the exact versions of > > you have installed (e.g. from the header displayed by scipy tests)? > > > Here is the header > > >>> scipy.test('full') > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in /usr/lib64/python3.2/site-packages/numpy > SciPy version 0.10.0 > SciPy is installed in /usr/lib64/python3.2/site-packages/scipy > Python version 3.2.1 (default, Jul 11 2011, 18:54:42) [GCC 4.6.1 > 20110627 (Red Hat 4.6.1-1)] > nose version 1.1.2 > > I checked out the v.0.10.0 tag from the Git repository. > > The reason you're seeing the _minimize_neldermead failure, and probably some others, is likely that you didn't clean the install dir before installing 0.10.0. That test was only added after 0.10.0 came out. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Feb 1 12:47:52 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 1 Feb 2012 18:47:52 +0100 Subject: [SciPy-User] asarray_chkfinite In-Reply-To: <9394EDC3-D458-4697-86AC-75115C14AA62@toadhill.net> References: <9394EDC3-D458-4697-86AC-75115C14AA62@toadhill.net> Message-ID: On Wed, Feb 1, 2012 at 5:16 PM, glen at toadhill.net wrote: > Hi all, > > I'm trying to optimize some code that entails a very large number of > sparse matrix-vector and vctor-vector multiplies. Upon running the > profiler I see that about 25% of my program's cumulative time is spent > running asarray_chkfinite. I do not call this routine directly. 
Can > anyone tell me what might be calling it and whether there is anything > obvious I can do about it? > It is called by many routines in order to check input arrays for bad data (inf/nan) that can cause crashes. A proposed change to allow disabling these checks is at https://github.com/scipy/scipy/pull/48. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From madsmh at gmail.com Wed Feb 1 14:05:57 2012 From: madsmh at gmail.com (Mads M. Hansen) Date: Wed, 1 Feb 2012 20:05:57 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: References: <20120201103754.63be48ec@mcgill.ca> Message-ID: 2012/2/1 Ralf Gommers : > > > On Wed, Feb 1, 2012 at 5:17 PM, Mads M. Hansen wrote: >> >> 2012/2/1 Denis Laxalde : >> > Mads M. Hansen wrote: >> >> I have built NumPy 1.6.1 and SciPy 0.10.0 for Python 3.2 on a Fedora >> >> 16 system and I used gfortran, but when I run the tests I get the >> >> following failures and errors >> > >> > It's probably not scipy 0.10.0. Could you specify the exact versions of >> > you have installed (e.g. from the header displayed by scipy tests)? >> > >> Here is the header >> >> >>> scipy.test('full') >> Running unit tests for scipy >> NumPy version 1.6.1 >> NumPy is installed in /usr/lib64/python3.2/site-packages/numpy >> SciPy version 0.10.0 >> SciPy is installed in /usr/lib64/python3.2/site-packages/scipy >> Python version 3.2.1 (default, Jul 11 2011, 18:54:42) [GCC 4.6.1 >> 20110627 (Red Hat 4.6.1-1)] >> nose version 1.1.2 >> >> I checked out the v.0.10.0 tag from the Git repository. >> > The reason you're seeing the _minimize_neldermead failure, and probably some > others, is likely that you didn't clean the install dir before installing > 0.10.0. That test was only added after 0.10.0 came out. > > Ralf Hm that's odd, I delete /usr/lib64/python3.2/site-packages/scipy (and numpy/) prior to each reinstall - is scipy installed in other places by defaultr? From pav at iki.fi Wed Feb 1 14:17:08 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 01 Feb 2012 20:17:08 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: References: <20120201103754.63be48ec@mcgill.ca> Message-ID: 01.02.2012 20:05, Mads M. Hansen kirjoitti: > Hm that's odd, I delete /usr/lib64/python3.2/site-packages/scipy (and > numpy/) prior to each reinstall - is scipy installed in > other places by defaultr? You'll also need to delete the build/ directory. -- Pauli Virtanen From madsmh at gmail.com Wed Feb 1 14:41:28 2012 From: madsmh at gmail.com (Mads M. 
Hansen) Date: Wed, 1 Feb 2012 20:41:28 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: References: <20120201103754.63be48ec@mcgill.ca> Message-ID: Thanks, properly cleaning before building brought the errors down to one which follows, ====================================================================== ERROR: Failure: AttributeError ('module' object has no attribute 'FileType') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/failure.py", line 37, in runTest raise self.exc_class(self.exc_val).with_traceback(self.tb) File "/usr/lib/python3.2/site-packages/nose/loader.py", line 390, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.2/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.2/site-packages/scipy/weave/__init__.py", line 22, in from .blitz_tools import blitz File "/usr/lib64/python3.2/site-packages/scipy/weave/blitz_tools.py", line 6, in from . import converters File "/usr/lib64/python3.2/site-packages/scipy/weave/converters.py", line 19, in c_spec.file_converter(), File "/usr/lib64/python3.2/site-packages/scipy/weave/c_spec.py", line 74, in __init__ self.init_info() File "/usr/lib64/python3.2/site-packages/scipy/weave/c_spec.py", line 264, in init_info self.matching_types = [types.FileType] AttributeError: 'module' object has no attribute 'FileType' ---------------------------------------------------------------------- 2012/2/1 Pauli Virtanen : > 01.02.2012 20:05, Mads M. Hansen kirjoitti: >> Hm that's odd, I delete /usr/lib64/python3.2/site-packages/scipy (and >> numpy/) prior to each reinstall - is scipy installed in >> other places by defaultr? > > You'll also need to delete the build/ directory. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at googlemail.com Wed Feb 1 16:29:34 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 1 Feb 2012 22:29:34 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: References: <20120201103754.63be48ec@mcgill.ca> Message-ID: On Wed, Feb 1, 2012 at 8:41 PM, Mads M. 
Hansen wrote: > Thanks, properly cleaning before building brought the errors down to > one which follows, > > ====================================================================== > ERROR: Failure: AttributeError ('module' object has no attribute > 'FileType') > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python3.2/site-packages/nose/failure.py", line 37, in > runTest > raise self.exc_class(self.exc_val).with_traceback(self.tb) > File "/usr/lib/python3.2/site-packages/nose/loader.py", line 390, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/python3.2/site-packages/nose/importer.py", line 39, > in importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/python3.2/site-packages/nose/importer.py", line 86, > in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "/usr/lib64/python3.2/site-packages/scipy/weave/__init__.py", > line 22, in > from .blitz_tools import blitz > File "/usr/lib64/python3.2/site-packages/scipy/weave/blitz_tools.py", > line 6, in > from . import converters > File "/usr/lib64/python3.2/site-packages/scipy/weave/converters.py", > line 19, in > c_spec.file_converter(), > File "/usr/lib64/python3.2/site-packages/scipy/weave/c_spec.py", > line 74, in __init__ > self.init_info() > File "/usr/lib64/python3.2/site-packages/scipy/weave/c_spec.py", > line 264, in init_info > self.matching_types = [types.FileType] > AttributeError: 'module' object has no attribute 'FileType' > > ---------------------------------------------------------------------- > > This failure is caused by the weave module not being py3k compatible. Is known. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From lafont.fabien at gmail.com Thu Feb 2 03:34:42 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Thu, 2 Feb 2012 09:34:42 +0100 Subject: [SciPy-User] [scipy-user] How to add a 1D np.array to another np.array? In-Reply-To: References: Message-ID: In fact I would prefer, replace the first 0 by [2,4] but it still interesting, thx! Fabien 2012/2/1 Warren Weckesser : > > > On Wed, Feb 1, 2012 at 10:21 AM, Fabien Lafont > wrote: >> >> Hello everyone, >> >> I try to add an array to another (to build a 2D array). >> >> I try that >> >> >> >> a = np.zeros(2) >> b=np.array([2,4]) >> a[[0]] = b >> >> print a >> [2,0] >> >> And I want a= [[2,4],0] >> > > But that is not a 2D array.? Do you want to "stack" b above the zeros? > Perhaps something like this: > > In [7]: a = np.zeros(2) > > In [8]: b = np.array([2,4]) > > In [9]: c = np.vstack((b,a)) > > In [10]: c > Out[10]: > array([[ 2.,? 4.], > ?????? [ 0.,? 0.]]) > > > Warren > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From scipy at samueljohn.de Thu Feb 2 04:58:01 2012 From: scipy at samueljohn.de (Samuel John) Date: Thu, 2 Feb 2012 10:58:01 +0100 Subject: [SciPy-User] [scipy-user] How to add a 1D np.array to another np.array? In-Reply-To: References: Message-ID: Hi Fabien! On 02.02.2012, at 09:34, Fabien Lafont wrote: > In fact I would prefer, replace the first 0 by [2,4] but it still > interesting, thx! ...that is only possible with lists of lists. Arrays have to have the same dimension for each entry. 
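A short illustration of the two cases, reusing the names from the question
above (a sketch; rows of equal length stack into a true 2D array, while a row
next to a bare scalar is ragged and only fits in a Python list):

import numpy as np

a = np.zeros(2)
b = np.array([2, 4])

# equal-length rows -> a real 2D array
c = np.vstack((b, a))   # array([[ 2.,  4.],
                        #        [ 0.,  0.]])

# [2, 4] next to a bare 0 is ragged -> keep it in a plain Python list
ragged = [b, 0]         # [array([2, 4]), 0]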
bests, Samuel From lamblinp at iro.umontreal.ca Thu Feb 2 11:52:44 2012 From: lamblinp at iro.umontreal.ca (Pascal Lamblin) Date: Thu, 2 Feb 2012 17:52:44 +0100 Subject: [SciPy-User] Indexing sparse matrices with step Message-ID: <20120202165243.GA1808@bob.blip.be> Hi everybody, I've noticed that if I have a scipy.sparse matrix (csr or csc), and I try to index into it with slices, the "step" component of my slice seems to be silently ignored. Is it an expected behaviour? I would have expected an error saying only a step of None (or not providing a step at all) is supported. Here is a small test case: import numpy, scipy.sparse sm = scipy.sparse.csc_matrix([[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0]]) # True, expected numpy.all(sm[:1,:].toarray() == sm.toarray()[:1,:]) # False, expected numpy.all(sm.toarray()[:1,:] == sm.toarray()[:1:-1,:]) # True, unexpected numpy.all(sm[:1:-1,:].toarray() == sm[:1,:].toarray()) # False, unexpected numpy.all(sm[:1:-1,:].toarray() == sm.toarray()[:1:-1,:]) Thanks in advance, -- Pascal From warren.weckesser at enthought.com Thu Feb 2 13:16:20 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 2 Feb 2012 12:16:20 -0600 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin wrote: > Yes, but for large data sets loops is quite slow. I have tried Pandas > groupby.sum() and it works faster. > > Pandas is probably the correct tool to use for this, but it will be nice when numpy has a native "group-by" capability. For what its worth (had to scratch the itch, so to speak), the attached script provides a "pure numpy" implementation without a python loop. The output of the script is In [53]: run pseudo_group_by.py Label Data 20 [1 2 3] 20 [1 2 4] 10 [3 3 1] 0 [5 0 0] 20 [1 9 0] 10 [2 3 4] 20 [9 9 1] Label Num. Sum 0 1 [5 0 0] 10 2 [5 6 5] 20 4 [12 22 8] A drawback of the method is that it will make a reordered copy of the data. I haven't compared the performance to pandas. Warren > > 2012/2/1 Fr?d?ric Bastien > >> It will be slow, but you can make a python loop. >> >> Fred >> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >> wrote: >> >>> Hello! >>> >>> I use SciPy in computer graphics applications. My task is to calculate >>> vertex normals by averaging faces normals. In other words I want to >>> accumulate vectors with the same ids. For example, >>> >>> ids = numpy.array([0, 1, 1, 2]) >>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>> [0.1, 0.1 0.1] ]) >>> >>> I need result: >>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>> >>> The most simple code: >>> nv[ids] += n >>> does not work, I know about this. For 1D arrays I use >>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>> >>> So, my question. What is the best way calculate accumulation sum for 2D >>> arrays using indirect indexes? 
>>> >>> Sincerely, >>> Alexander >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pseudo_group_by.py Type: application/octet-stream Size: 1455 bytes Desc: not available URL: From josef.pktd at gmail.com Thu Feb 2 14:01:10 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Feb 2012 14:01:10 -0500 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Thu, Feb 2, 2012 at 1:16 PM, Warren Weckesser wrote: > > > On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin > wrote: >> >> Yes, but for large data sets loops is quite slow. I have tried Pandas >> groupby.sum() and it works faster. >> > > > Pandas is probably the correct tool to use for this, but it will be nice > when numpy has a native "group-by" capability. > > For what its worth (had to scratch the itch, so to speak), the attached > script provides a "pure numpy" implementation without a python loop.? The > output of the script is > > In [53]: run pseudo_group_by.py > Label?? Data > ?20??? [1 2 3] > ?20??? [1 2 4] > ?10??? [3 3 1] > ? 0??? [5 0 0] > ?20??? [1 9 0] > ?10??? [2 3 4] > ?20??? [9 9 1] > > Label? Num.?? Sum > ? 0???? 1?? [5 0 0] > ?10???? 2?? [5 6 5] > ?20???? 4?? [12 22? 8] > > > A drawback of the method is that it will make a reordered copy of the data. > I haven't compared the performance to pandas. nice use of reduceat, I found it recently in an example but haven't used it yet. It looks convenient if labels are presorted and numeric. Josef > > Warren > > >> >> >> 2012/2/1 Fr?d?ric Bastien >>> >>> It will be slow, but you can make a python loop. >>> >>> Fred >>> >>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >>> wrote: >>>> >>>> Hello! >>>> >>>> I use SciPy in computer graphics applications. My task is to calculate >>>> vertex normals by averaging faces normals. In other words I want to >>>> accumulate vectors with the same ids. For example, >>>> >>>> ids = numpy.array([0, 1, 1, 2]) >>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>>> [0.1, 0.1 0.1] ]) >>>> >>>> I need result: >>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>>> >>>> The most simple code: >>>> nv[ids] += n >>>> does not work, I know about this. For 1D arrays I use >>>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>>> >>>> So, my question. What is the best way calculate accumulation sum for 2D >>>> arrays using indirect indexes? 
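A minimal sketch of the kind of reduceat-based group sum described above,
assuming integer labels: sort once so that equal labels become contiguous,
then reduce over each run. The helper name group_sum is only illustrative;
the sample data are taken from the printed output quoted above:

import numpy as np

def group_sum(labels, data):
    # sort rows so that equal labels are contiguous
    order = np.argsort(labels, kind='mergesort')
    slabels = labels[order]
    sdata = data[order]
    # start index of each run of equal labels -- the "fence posts" for reduceat
    starts = np.concatenate(([0], np.nonzero(np.diff(slabels))[0] + 1))
    counts = np.diff(np.concatenate((starts, [len(slabels)])))
    sums = np.add.reduceat(sdata, starts, axis=0)
    return slabels[starts], counts, sums

labels = np.array([20, 20, 10, 0, 20, 10, 20])
data = np.array([[1, 2, 3], [1, 2, 4], [3, 3, 1], [5, 0, 0],
                 [1, 9, 0], [2, 3, 4], [9, 9, 1]])
uniq, counts, sums = group_sum(labels, data)
print(uniq)    # [ 0 10 20]
print(counts)  # [1 2 4]
print(sums)    # [[ 5  0  0]
               #  [ 5  6  5]
               #  [12 22  8]]

As noted above, this reorders (copies) the data once; the reduction itself
relies on the contiguous fence posts that reduceat expects.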
>>>> >>>> Sincerely, >>>> Alexander >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From travis at continuum.io Thu Feb 2 14:11:55 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 2 Feb 2012 13:11:55 -0600 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Feb 2, 2012, at 1:01 PM, josef.pktd at gmail.com wrote: > On Thu, Feb 2, 2012 at 1:16 PM, Warren Weckesser > wrote: >> >> >> On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin >> wrote: >>> >>> Yes, but for large data sets loops is quite slow. I have tried Pandas >>> groupby.sum() and it works faster. >>> >> >> >> Pandas is probably the correct tool to use for this, but it will be nice >> when numpy has a native "group-by" capability. >> >> For what its worth (had to scratch the itch, so to speak), the attached >> script provides a "pure numpy" implementation without a python loop. The >> output of the script is >> >> In [53]: run pseudo_group_by.py >> Label Data >> 20 [1 2 3] >> 20 [1 2 4] >> 10 [3 3 1] >> 0 [5 0 0] >> 20 [1 9 0] >> 10 [2 3 4] >> 20 [9 9 1] >> >> Label Num. Sum >> 0 1 [5 0 0] >> 10 2 [5 6 5] >> 20 4 [12 22 8] >> >> >> A drawback of the method is that it will make a reordered copy of the data. >> I haven't compared the performance to pandas. > > nice use of reduceat, I found it recently in an example but haven't used it yet. > It looks convenient if labels are presorted and numeric. Reduceat is pretty convenient, but it's limited right now because you have to have contiguous fence-posts for your reductions. There is a NEP with the group-by nep to make a reduce that takes in arbitrary index-ranges for reductions. -Travis > > Josef > >> >> Warren >> >> >>> >>> >>> 2012/2/1 Fr?d?ric Bastien >>>> >>>> It will be slow, but you can make a python loop. >>>> >>>> Fred >>>> >>>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >>>> wrote: >>>>> >>>>> Hello! >>>>> >>>>> I use SciPy in computer graphics applications. My task is to calculate >>>>> vertex normals by averaging faces normals. In other words I want to >>>>> accumulate vectors with the same ids. For example, >>>>> >>>>> ids = numpy.array([0, 1, 1, 2]) >>>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>>>> [0.1, 0.1 0.1] ]) >>>>> >>>>> I need result: >>>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>>>> >>>>> The most simple code: >>>>> nv[ids] += n >>>>> does not work, I know about this. For 1D arrays I use >>>>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>>>> >>>>> So, my question. What is the best way calculate accumulation sum for 2D >>>>> arrays using indirect indexes? 
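To make the "contiguous fence-posts" remark above concrete: np.add.reduceat sums contiguous runs of an array, given the index where each run starts, so the labels have to be sorted first. A small self-contained sketch, reusing the labels and the first data column from the example output quoted above (the variable names are illustrative, not taken from the scrubbed script):

import numpy as np

labels = np.array([20, 20, 10, 0, 20, 10, 20])
data = np.array([1, 1, 3, 5, 1, 2, 9])

order = np.argsort(labels, kind='mergesort')   # stable sort keeps ties in order
sorted_labels = labels[order]
sorted_data = data[order]

# "fence-posts": the positions where a new label begins in the sorted array
starts = np.flatnonzero(np.concatenate(([True],
                        sorted_labels[1:] != sorted_labels[:-1])))

keys = sorted_labels[starts]                  # -> [ 0 10 20]
sums = np.add.reduceat(sorted_data, starts)   # -> [ 5  5 12]

np.add.reduceat also accepts an axis argument, so the same call can reduce whole rows or columns of a 2-D array at once, which is how the [5 0 0], [5 6 5], [12 22 8] group sums shown above are obtained.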
>>>>> >>>>> Sincerely, >>>>> Alexander >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Thu Feb 2 14:29:59 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Feb 2012 14:29:59 -0500 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Thu, Feb 2, 2012 at 2:11 PM, Travis Oliphant wrote: > > On Feb 2, 2012, at 1:01 PM, josef.pktd at gmail.com wrote: > >> On Thu, Feb 2, 2012 at 1:16 PM, Warren Weckesser >> wrote: >>> >>> >>> On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin >>> wrote: >>>> >>>> Yes, but for large data sets loops is quite slow. I have tried Pandas >>>> groupby.sum() and it works faster. >>>> >>> >>> >>> Pandas is probably the correct tool to use for this, but it will be nice >>> when numpy has a native "group-by" capability. >>> >>> For what its worth (had to scratch the itch, so to speak), the attached >>> script provides a "pure numpy" implementation without a python loop. ?The >>> output of the script is >>> >>> In [53]: run pseudo_group_by.py >>> Label ? Data >>> ?20 ? ?[1 2 3] >>> ?20 ? ?[1 2 4] >>> ?10 ? ?[3 3 1] >>> ? 0 ? ?[5 0 0] >>> ?20 ? ?[1 9 0] >>> ?10 ? ?[2 3 4] >>> ?20 ? ?[9 9 1] >>> >>> Label ?Num. ? Sum >>> ? 0 ? ? 1 ? [5 0 0] >>> ?10 ? ? 2 ? [5 6 5] >>> ?20 ? ? 4 ? [12 22 ?8] >>> >>> >>> A drawback of the method is that it will make a reordered copy of the data. >>> I haven't compared the performance to pandas. >> >> nice use of reduceat, I found it recently in an example but haven't used it yet. >> It looks convenient if labels are presorted and numeric. > > Reduceat is pretty convenient, but it's limited right now because you have to have contiguous fence-posts for your reductions. ? There is a NEP with the group-by nep to make a reduce that takes in arbitrary index-ranges for reductions. I have been looking forward for the group-by for a long time, but I would also be happy with a bincount that takes a 2d or nd weights matrix. Josef > > -Travis > > >> >> Josef >> >>> >>> Warren >>> >>> >>>> >>>> >>>> 2012/2/1 Fr?d?ric Bastien >>>>> >>>>> It will be slow, but you can make a python loop. >>>>> >>>>> Fred >>>>> >>>>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >>>>> wrote: >>>>>> >>>>>> Hello! >>>>>> >>>>>> I use SciPy in computer graphics applications. My task is to calculate >>>>>> vertex normals by averaging faces normals. In other words I want to >>>>>> accumulate vectors with the same ids. 
For example, >>>>>> >>>>>> ids = numpy.array([0, 1, 1, 2]) >>>>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>>>>> [0.1, 0.1 0.1] ]) >>>>>> >>>>>> I need result: >>>>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>>>>> >>>>>> The most simple code: >>>>>> nv[ids] += n >>>>>> does not work, I know about this. For 1D arrays I use >>>>>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>>>>> >>>>>> So, my question. What is the best way calculate accumulation sum for 2D >>>>>> arrays using indirect indexes? >>>>>> >>>>>> Sincerely, >>>>>> Alexander >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From denis.laxalde at mcgill.ca Wed Feb 1 11:34:20 2012 From: denis.laxalde at mcgill.ca (Denis Laxalde) Date: Wed, 1 Feb 2012 11:34:20 -0500 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: References: <20120201103754.63be48ec@mcgill.ca> Message-ID: <20120201113420.62d6d3d4@mcgill.ca> Mads M. Hansen wrote: > I checked out the v.0.10.0 tag from the Git repository. But this file (quoting the first error in scipy's test from your original message): > > File "/usr/lib64/python3.2/site-packages/scipy/optimize/tests/test_anneal.py" is not in 0.10.0. See Maybe your local repository was not clean when you built the package? -- Denis From warren.weckesser at enthought.com Thu Feb 2 21:46:42 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 2 Feb 2012 20:46:42 -0600 Subject: [SciPy-User] Indexing sparse matrices with step In-Reply-To: <20120202165243.GA1808@bob.blip.be> References: <20120202165243.GA1808@bob.blip.be> Message-ID: On Thu, Feb 2, 2012 at 10:52 AM, Pascal Lamblin wrote: > Hi everybody, > > I've noticed that if I have a scipy.sparse matrix (csr or csc), and I > try to index into it with slices, the "step" component of my slice seems > to be silently ignored. > > Is it an expected behaviour? I would have expected an error saying only > a step of None (or not providing a step at all) is supported. > > Here is a small test case: > > import numpy, scipy.sparse > > sm = scipy.sparse.csc_matrix([[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0]]) > > # True, expected > numpy.all(sm[:1,:].toarray() == sm.toarray()[:1,:]) > > # False, expected > numpy.all(sm.toarray()[:1,:] == sm.toarray()[:1:-1,:]) > > # True, unexpected > numpy.all(sm[:1:-1,:].toarray() == sm[:1,:].toarray()) > > # False, unexpected > numpy.all(sm[:1:-1,:].toarray() == sm.toarray()[:1:-1,:]) > > Looks like a bug. 
I've created a ticket: http://projects.scipy.org/scipy/ticket/1592 Thanks for reporting the problem. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From lafont.fabien at gmail.com Fri Feb 3 05:48:50 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Fri, 3 Feb 2012 11:48:50 +0100 Subject: [SciPy-User] [scipy-user] How to apply a condition on some specific values of an array Message-ID: I think my title is not very clear, but I don't know how to formulate it... I'm just starting to use numpy and I have still Python's reflexes so I want to know how can I do the following code using Numpy "style". for i in range(0,len(array)+1): if 10 References: Message-ID: On 3 February 2012 12:48, Fabien Lafont wrote: > I'm just starting to use numpy and I have still Python's reflexes so I > want to know how can I do the following code using Numpy "style". > > for i in range(0,len(array)+1): > ? ? ? if 10 ? ? ? ? ? ?new_array = array[i]*1000 > > In other words is it possible to "scan" the values of an array and > apply a "modification" to it if the condition is true Yes - you can use fancy indexing (see http://docs.scipy.org/doc/numpy/user/basics.indexing.html) In[1]: import numpy as np In[2]: arr = np.arange(10) In[3]: arr Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In[4]: arr[(2 < arr) & (arr < 9)] *= 1000 In[5]: arr Out[5]: array([ 0, 1, 2, 3000, 4000, 5000, 6000, 7000, 8000, 9]) Cheers, Scott From madsmh at gmail.com Fri Feb 3 07:13:28 2012 From: madsmh at gmail.com (Mads M. Hansen) Date: Fri, 3 Feb 2012 13:13:28 +0100 Subject: [SciPy-User] NumPy and SciPy test failures In-Reply-To: <20120201113420.62d6d3d4@mcgill.ca> References: <20120201103754.63be48ec@mcgill.ca> <20120201113420.62d6d3d4@mcgill.ca> Message-ID: 2012/2/1 Denis Laxalde : > Mads M. Hansen wrote: >> I checked out the v.0.10.0 tag from the Git repository. > > But this file (quoting the first error in scipy's test from your > original message): >> > ? File "/usr/lib64/python3.2/site-packages/scipy/optimize/tests/test_anneal.py" > > is not in 0.10.0. See > > > Maybe your local repository was not clean when you built the package? Hi Dennis - that was indeed the case - I forgot to run git clean -xdf in the repo. .. Mads From lafont.fabien at gmail.com Fri Feb 3 08:19:39 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Fri, 3 Feb 2012 14:19:39 +0100 Subject: [SciPy-User] [scipy-user] How to apply a condition on some specific values of an array In-Reply-To: References: Message-ID: thx Scott! 2012/2/3 Scott Sinclair : > On 3 February 2012 12:48, Fabien Lafont wrote: >> I'm just starting to use numpy and I have still Python's reflexes so I >> want to know how can I do the following code using Numpy "style". >> >> for i in range(0,len(array)+1): >> ? ? ? if 10> ? ? ? ? ? ?new_array = array[i]*1000 >> >> In other words is it possible to "scan" the values of an array and >> apply a "modification" to it if the condition is true > > Yes - you can use fancy indexing (see > http://docs.scipy.org/doc/numpy/user/basics.indexing.html) > > In[1]: import numpy as np > > In[2]: arr = np.arange(10) > In[3]: arr > Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In[4]: arr[(2 < arr) & (arr < 9)] *= 1000 > In[5]: arr > Out[5]: array([ ? 0, ? ?1, ? ?2, 3000, 4000, 5000, 6000, 7000, 8000, ? 
?9]) > > Cheers, > Scott > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From dg.gmane at thesamovar.net Sat Feb 4 00:03:04 2012 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Sat, 04 Feb 2012 06:03:04 +0100 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: For the project I'm working on we have quite a specific case of this to handle, where we have (1) generally have few repeats of the ids, (2) arbitrary operations to be applied, not just addition. I've just been working on an optimised numpy-only solution to this and it might be of interest to others. It works particularly well with few repeats, but I think it's no slower than other solutions if there are many repeats, at least until it gets to be mostly repeats at which points doing a simple loop is faster. For the case of just addition (the case below), a method using sorting and reduceat is probably quicker (I didn't do a comparison), but I thought it might be useful for many people to have an efficient solution for the general case. And if anyone knows a better one, I'd be very interested! It's still far from close to ideal, for the typical case it's about 10-20x slower than doing it with C++ (I used weave to test it), but also about 10-20x faster than doing it with a loop. I've attached the code (function apply_batch, the others are for comparison). If anyone's interested I can comment on the code, but it's basically the trick used by unique(), sorting the indices and comparing adjacent ones. Dan On 31/01/2012 21:34, Alexander Kalinin wrote: > Hello! > > I use SciPy in computer graphics applications. My task is to calculate > vertex normals by averaging faces normals. In other words I want to > accumulate vectors with the same ids. For example, > > ids = numpy.array([0, 1, 1, 2]) > n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], > [0.1, 0.1 0.1] ]) > > I need result: > nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) > > The most simple code: > nv[ids] += n > does not work, I know about this. For 1D arrays I use > numpy.bincount(...) function. But this function does not work for 2D arrays. > > So, my question. What is the best way calculate accumulation sum for 2D > arrays using indirect indexes? > > Sincerely, > Alexander > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: batch_apply.py URL: From D.Richards at mmu.ac.uk Sat Feb 4 05:48:22 2012 From: D.Richards at mmu.ac.uk (Dan Richards) Date: Sat, 4 Feb 2012 10:48:22 +0000 Subject: [SciPy-User] scipy.spatial.Delaunay.convex_hull problelm Message-ID: <000101cce32a$83dd1ee0$8b975ca0$@Richards@mmu.ac.uk> Hi All, I have been using scipy to find the Delaunay tetrahedron of a set of points in three-dimensions. However, now I wish to only generate the external faces of the tetrahedron. I assume this can be done using scipy.spatial.Delaunay.convex_hull? 
For my three-dimensional tetrahedron I am using this: import scipy from scipy import spatial Points = ([x1,y1,z1], [x2,y2,z2]...[xn,yn,zn]) Del = scipy.spatial.Delaunay(Points) faces = [] v = x.vertices for i in xrange(x.nsimplex): faces.extend([ (v[i,0],v[i,1],v[i,2]), (v[i,1],v[i,3],v[i,2]), (v[i,0],v[i,3],v[i,1]), (v[i,0],v[i,2],v[i,3]),]) for i in faces: MakeLines(i[0],i[1],i[2]) This allows me to create a three-dimensional tetragedron. I had thought to find the 3D convex hull could simply change either: "v = x.verticies" into "v=x.convex_hull" ; or "Del = scipy.spatial.Delaunay (Points)" into "Del = scipy.spatial.Delaunay.convex_hull(Points)".However, neither of these have worked as planned? If anyone is able to give me some advice or simply point me in the right direction that would be much appreciated. Thanks, Dan "Before acting on this email or opening any attachments you should read the Manchester Metropolitan University email disclaimer available on its website http://www.mmu.ac.uk/emaildisclaimer " -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Feb 4 08:13:52 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 04 Feb 2012 14:13:52 +0100 Subject: [SciPy-User] scipy.spatial.Delaunay.convex_hull problelm In-Reply-To: <36924.2006949664$1328352535@news.gmane.org> References: <36924.2006949664$1328352535@news.gmane.org> Message-ID: 04.02.2012 11:48, Dan Richards kirjoitti: [clip] > This allows me to create a three-dimensional tetragedron. I had thought > to find the 3D convex hull could simply change either: ?v = x.verticies? > into ?v=x.*convex_hull*? ; or ?Del = scipy.spatial.Delaunay (Points)? > into ?Del = scipy.spatial.Delaunay.*convex_hull*(Points)?.However, > neither of these have worked as planned? Elements of the convex hull are triangles, not tetrahedra, so you need to change the code also below. for i1, i2, i3 in Del.convex_hull: faces.extend([(v[i1,0], ............), .... (.... v[i3,3]),]) From pav at iki.fi Sat Feb 4 10:53:05 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 04 Feb 2012 16:53:05 +0100 Subject: [SciPy-User] scipy.spatial.Delaunay.convex_hull problelm In-Reply-To: References: <36924.2006949664$1328352535@news.gmane.org> Message-ID: 04.02.2012 14:13, Pauli Virtanen kirjoitti: > 04.02.2012 11:48, Dan Richards kirjoitti: > [clip] >> This allows me to create a three-dimensional tetragedron. I had thought >> to find the 3D convex hull could simply change either: ?v = x.verticies? >> into ?v=x.*convex_hull*? ; or ?Del = scipy.spatial.Delaunay (Points)? >> into ?Del = scipy.spatial.Delaunay.*convex_hull*(Points)?.However, >> neither of these have worked as planned? > > Elements of the convex hull are triangles, not tetrahedra, so you need > to change the code also below. > > for i1, i2, i3 in Del.convex_hull: > faces.extend([(v[i1,0], ............), .... (.... 
v[i3,3]),]) Like so: import numpy as np from scipy.spatial import Delaunay points = np.random.randn(300, 3) tri = Delaunay(points) # -- Make a list of faces, [(p1, p2, p3), ...]; pj = (xj, yj, zj) faces = [] for ia, ib, ic in tri.convex_hull: faces.append(points[[ia, ib, ic]]) import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from mpl_toolkits.mplot3d.art3d import Poly3DCollection fig = plt.figure() ax = fig.gca(projection='3d') items = Poly3DCollection(faces, facecolors=[(0, 0, 0, 0.1)]) ax.add_collection(items) ax.scatter(points[:,0], points[:,1], points[:,2], 'o') plt.show() From alec.kalinin at gmail.com Sat Feb 4 14:23:57 2012 From: alec.kalinin at gmail.com (Alexander Kalinin) Date: Sat, 4 Feb 2012 22:23:57 +0300 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: I have checked the performance of the "pure numpy" solution with pandas solution on my task. The "pure numpy" solution is about two times slower. The data shape: (1062, 6348) Pandas "group by sum" time: 0.16588 seconds Pure numpy "group by sum" time: 0.38979 seconds But it is interesting, that the main bottleneck in numpy solution is the data copying. I have divided solution on three blocks: # block (a): s = np.argsort(labels) keys, inv = np.unique(labels, return_inverse = True) i = inv[s] groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] # block (b): ordered_data = data[:, s] # block (c): group_sums = np.add.reduceat(ordered_data, groups_at, axis = 1) The timing for the blocks is: block (a): 0.00138 seconds block (b): 0.29285 seconds block (c): 0.08868 seconds The sorting and reduce_at procedures are very fast. But only one line: "ordered_data = data[:, s]" takes the most time. For me it is a bit strange. The reduceat() procedure where summation is executed is about 3 time faster than the only data copying. Alexander On Thu, Feb 2, 2012 at 10:16 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin > wrote: > >> Yes, but for large data sets loops is quite slow. I have tried Pandas >> groupby.sum() and it works faster. >> >> > > Pandas is probably the correct tool to use for this, but it will be nice > when numpy has a native "group-by" capability. > > For what its worth (had to scratch the itch, so to speak), the attached > script provides a "pure numpy" implementation without a python loop. The > output of the script is > > In [53]: run pseudo_group_by.py > Label Data > 20 [1 2 3] > 20 [1 2 4] > 10 [3 3 1] > 0 [5 0 0] > 20 [1 9 0] > 10 [2 3 4] > 20 [9 9 1] > > Label Num. Sum > 0 1 [5 0 0] > 10 2 [5 6 5] > 20 4 [12 22 8] > > > A drawback of the method is that it will make a reordered copy of the > data. I haven't compared the performance to pandas. > > Warren > > > >> >> 2012/2/1 Fr?d?ric Bastien >> >>> It will be slow, but you can make a python loop. >>> >>> Fred >>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >>> wrote: >>> >>>> Hello! >>>> >>>> I use SciPy in computer graphics applications. My task is to calculate >>>> vertex normals by averaging faces normals. In other words I want to >>>> accumulate vectors with the same ids. For example, >>>> >>>> ids = numpy.array([0, 1, 1, 2]) >>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>>> [0.1, 0.1 0.1] ]) >>>> >>>> I need result: >>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>>> >>>> The most simple code: >>>> nv[ids] += n >>>> does not work, I know about this. 
For 1D arrays I use >>>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>>> >>>> So, my question. What is the best way calculate accumulation sum for 2D >>>> arrays using indirect indexes? >>>> >>>> Sincerely, >>>> Alexander >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Sat Feb 4 14:27:18 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 4 Feb 2012 14:27:18 -0500 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin wrote: > I have checked the performance of the "pure numpy" solution with pandas > solution on my task. The "pure numpy" solution is about two times slower. > > The data shape: > ??? (1062, 6348) > Pandas "group by sum" time: > ??? 0.16588 seconds > Pure numpy "group by sum" time: > ??? 0.38979 seconds > > But it is interesting, that the main bottleneck in numpy solution is the > data copying. I have divided solution on three blocks: > > # block (a): > ? ? s = np.argsort(labels) > > keys, inv = np.unique(labels, return_inverse = True) > > i = inv[s] > > groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] > > > # block (b): > ??? ordered_data = data[:, s] > > # block (c): > ??? group_sums = np.add.reduceat(ordered_data, groups_at, axis = 1) > > The timing for the blocks is: > block (a): > ??? 0.00138 seconds > > block (b): > ??? 0.29285 seconds > > block (c): > ??? 0.08868 seconds > > The sorting and reduce_at procedures are very fast. But only one line: > "ordered_data = data[:, s]" takes the most time. > > For me it is a bit strange. The reduceat() procedure where summation is > executed is about 3 time faster than the only data copying. > > Alexander > > > On Thu, Feb 2, 2012 at 10:16 PM, Warren Weckesser > wrote: >> >> >> >> On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin >> wrote: >>> >>> Yes, but for large data sets loops is quite slow. I have tried Pandas >>> groupby.sum() and it works faster. >>> >> >> >> Pandas is probably the correct tool to use for this, but it will be nice >> when numpy has a native "group-by" capability. >> >> For what its worth (had to scratch the itch, so to speak), the attached >> script provides a "pure numpy" implementation without a python loop.? The >> output of the script is >> >> In [53]: run pseudo_group_by.py >> Label?? Data >> ?20??? [1 2 3] >> ?20??? [1 2 4] >> ?10??? [3 3 1] >> ? 0??? [5 0 0] >> ?20??? [1 9 0] >> ?10??? [2 3 4] >> ?20??? [9 9 1] >> >> Label? Num.?? Sum >> ? 0???? 1?? [5 0 0] >> ?10???? 2?? [5 6 5] >> ?20???? 4?? [12 22? 8] >> >> >> A drawback of the method is that it will make a reordered copy of the >> data.? I haven't compared the performance to pandas. 
>> >> Warren >> >> >>> >>> >>> 2012/2/1 Fr?d?ric Bastien >>>> >>>> It will be slow, but you can make a python loop. >>>> >>>> Fred >>>> >>>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >>>> wrote: >>>>> >>>>> Hello! >>>>> >>>>> I use SciPy in computer graphics applications. My task is to calculate >>>>> vertex normals by averaging faces normals. In other words I want to >>>>> accumulate vectors with the same ids. For example, >>>>> >>>>> ids = numpy.array([0, 1, 1, 2]) >>>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>>>> [0.1, 0.1 0.1] ]) >>>>> >>>>> I need result: >>>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>>>> >>>>> The most simple code: >>>>> nv[ids] += n >>>>> does not work, I know about this. For 1D arrays I use >>>>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>>>> >>>>> So, my question. What is the best way calculate accumulation sum for 2D >>>>> arrays using indirect indexes? >>>>> >>>>> Sincerely, >>>>> Alexander >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I should point out that pandas is not very optimized for a large number of columns like this. I just created a github issue about it: https://github.com/wesm/pandas/issues/745 I'll get to it eventually - Wes From vanforeest at gmail.com Sat Feb 4 18:12:27 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 5 Feb 2012 00:12:27 +0100 Subject: [SciPy-User] scipy.stats.poisson, strange output? Message-ID: Hi, I used two types of poisson, and obtained different results. Specifically: In [1]: from scipy.stats import poisson In [2]: import numpy as np In [3]: grid = np.arange(20) In [4]: rv = poisson(10) In [5]: print rv.pmf(grid) [ 4.53999298e-05 4.53999298e-04 2.26999649e-03 7.56665496e-03 1.89166374e-02 3.78332748e-02 6.30554580e-02 9.00792257e-02 1.12599032e-01 1.25110036e-01 1.25110036e-01 1.13736396e-01 9.47803301e-02 7.29079462e-02 5.20771044e-02 3.47180696e-02 2.16987935e-02 1.27639962e-02 7.09110899e-03 3.73216263e-03] In [6]: print poisson.pmf(10., grid) [ nan 1.01377712e-07 3.81898506e-05 8.10151179e-04 5.29247668e-03 1.81327887e-02 4.13030934e-02 7.09832687e-02 9.92615338e-02 1.18580076e-01 1.25110036e-01 1.19378060e-01 1.04837256e-01 8.58701508e-02 6.62818432e-02 4.86107508e-02 3.40976998e-02 2.29995844e-02 1.49851586e-02 9.46624674e-03] In [7]: So, in line [5], rv.pmf(grid)[0] is a number, while in [6], poisson.pmf(10,grid)[0] is nan. Am I doing something wrong, or is this an unintentional inconsistency? Nicky From vanforeest at gmail.com Sat Feb 4 18:32:23 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 5 Feb 2012 00:32:23 +0100 Subject: [SciPy-User] scipy.stats.poisson, strange output? 
In-Reply-To: References: Message-ID: Hi, I have found my mistake. I should have called poisson.pmf(grid, 10.) rather than poisson.pmf(10, grid). Sorry for the spam. Nicky From josef.pktd at gmail.com Sat Feb 4 18:33:08 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 4 Feb 2012 18:33:08 -0500 Subject: [SciPy-User] scipy.stats.poisson, strange output? In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 6:12 PM, nicky van foreest wrote: > Hi, > > I used two types of poisson, and obtained different results. Specifically: > > In [1]: from scipy.stats import poisson > > In [2]: import numpy as np > > In [3]: grid = np.arange(20) > > In [4]: rv = poisson(10) > > In [5]: print rv.pmf(grid) > [ ?4.53999298e-05 ? 4.53999298e-04 ? 2.26999649e-03 ? 7.56665496e-03 > ? 1.89166374e-02 ? 3.78332748e-02 ? 6.30554580e-02 ? 9.00792257e-02 > ? 1.12599032e-01 ? 1.25110036e-01 ? 1.25110036e-01 ? 1.13736396e-01 > ? 9.47803301e-02 ? 7.29079462e-02 ? 5.20771044e-02 ? 3.47180696e-02 > ? 2.16987935e-02 ? 1.27639962e-02 ? 7.09110899e-03 ? 3.73216263e-03] > > In [6]: print poisson.pmf(10., grid) > [ ? ? ? ? ? ? nan ? 1.01377712e-07 ? 3.81898506e-05 ? 8.10151179e-04 > ? 5.29247668e-03 ? 1.81327887e-02 ? 4.13030934e-02 ? 7.09832687e-02 > ? 9.92615338e-02 ? 1.18580076e-01 ? 1.25110036e-01 ? 1.19378060e-01 > ? 1.04837256e-01 ? 8.58701508e-02 ? 6.62818432e-02 ? 4.86107508e-02 > ? 3.40976998e-02 ? 2.29995844e-02 ? 1.49851586e-02 ? 9.46624674e-03] wrong sequence of arguments, the shape (mean) argument should be second and first the values at which pmf is evaluated, i.e. >>> stats.poisson.pmf(grid, 10) array([ 0.0000453999297625, 0.0004539992976248, 0.0022699964881242, 0.0075666549604141, 0.0189166374010354, 0.0378332748020708, 0.0630554580034512, 0.090079225719216 , 0.1125990321490201, 0.1251100357211337, 0.1251100357211337, 0.1137363961101213, 0.094780330091768 , 0.0729079462244373, 0.0520771044460262, 0.0347180696306844, 0.0216987935191777, 0.0127639961877516, 0.0070911089931953, 0.0037321626279975]) in the first case it's a frozen distribution >>> stats.poisson(10).pmf(grid) array([ 0.0000453999297625, 0.0004539992976248, 0.0022699964881242, 0.0075666549604141, 0.0189166374010354, 0.0378332748020708, 0.0630554580034512, 0.090079225719216 , 0.1125990321490201, 0.1251100357211337, 0.1251100357211337, 0.1137363961101213, 0.094780330091768 , 0.0729079462244373, 0.0520771044460262, 0.0347180696306844, 0.0216987935191777, 0.0127639961877516, 0.0070911089931953, 0.0037321626279975]) Josef > > In [7]: > > > So, in line [5], rv.pmf(grid)[0] is a number, while in [6], > poisson.pmf(10,grid)[0] is nan. Am I doing something wrong, or is this > an unintentional inconsistency? > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Sat Feb 4 19:01:39 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 4 Feb 2012 19:01:39 -0500 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney wrote: > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin > wrote: >> I have checked the performance of the "pure numpy" solution with pandas >> solution on my task. The "pure numpy" solution is about two times slower. >> >> The data shape: >> ??? (1062, 6348) >> Pandas "group by sum" time: >> ??? 0.16588 seconds >> Pure numpy "group by sum" time: >> ??? 
0.38979 seconds >> >> But it is interesting, that the main bottleneck in numpy solution is the >> data copying. I have divided solution on three blocks: >> >> # block (a): >> ? ? s = np.argsort(labels) >> >> keys, inv = np.unique(labels, return_inverse = True) >> >> i = inv[s] >> >> groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] >> >> >> # block (b): >> ??? ordered_data = data[:, s] can you try with numpy.take? Keith and Wes were showing that take is much faster than advanced indexing. Josef >> >> # block (c): >> ??? group_sums = np.add.reduceat(ordered_data, groups_at, axis = 1) >> >> The timing for the blocks is: >> block (a): >> ??? 0.00138 seconds >> >> block (b): >> ??? 0.29285 seconds >> >> block (c): >> ??? 0.08868 seconds >> >> The sorting and reduce_at procedures are very fast. But only one line: >> "ordered_data = data[:, s]" takes the most time. >> >> For me it is a bit strange. The reduceat() procedure where summation is >> executed is about 3 time faster than the only data copying. >> >> Alexander >> >> >> On Thu, Feb 2, 2012 at 10:16 PM, Warren Weckesser >> wrote: >>> >>> >>> >>> On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin >>> wrote: >>>> >>>> Yes, but for large data sets loops is quite slow. I have tried Pandas >>>> groupby.sum() and it works faster. >>>> >>> >>> >>> Pandas is probably the correct tool to use for this, but it will be nice >>> when numpy has a native "group-by" capability. >>> >>> For what its worth (had to scratch the itch, so to speak), the attached >>> script provides a "pure numpy" implementation without a python loop.? The >>> output of the script is >>> >>> In [53]: run pseudo_group_by.py >>> Label?? Data >>> ?20??? [1 2 3] >>> ?20??? [1 2 4] >>> ?10??? [3 3 1] >>> ? 0??? [5 0 0] >>> ?20??? [1 9 0] >>> ?10??? [2 3 4] >>> ?20??? [9 9 1] >>> >>> Label? Num.?? Sum >>> ? 0???? 1?? [5 0 0] >>> ?10???? 2?? [5 6 5] >>> ?20???? 4?? [12 22? 8] >>> >>> >>> A drawback of the method is that it will make a reordered copy of the >>> data.? I haven't compared the performance to pandas. >>> >>> Warren >>> >>> >>>> >>>> >>>> 2012/2/1 Fr?d?ric Bastien >>>>> >>>>> It will be slow, but you can make a python loop. >>>>> >>>>> Fred >>>>> >>>>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" >>>>> wrote: >>>>>> >>>>>> Hello! >>>>>> >>>>>> I use SciPy in computer graphics applications. My task is to calculate >>>>>> vertex normals by averaging faces normals. In other words I want to >>>>>> accumulate vectors with the same ids. For example, >>>>>> >>>>>> ids = numpy.array([0, 1, 1, 2]) >>>>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], >>>>>> [0.1, 0.1 0.1] ]) >>>>>> >>>>>> I need result: >>>>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) >>>>>> >>>>>> The most simple code: >>>>>> nv[ids] += n >>>>>> does not work, I know about this. For 1D arrays I use >>>>>> numpy.bincount(...) function. But this function does not work for 2D arrays. >>>>>> >>>>>> So, my question. What is the best way calculate accumulation sum for 2D >>>>>> arrays using indirect indexes? 
>>>>>> >>>>>> Sincerely, >>>>>> Alexander >>>>>> >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > I should point out that pandas is not very optimized for a large > number of columns like this. I just created a github issue about it: > > https://github.com/wesm/pandas/issues/745 > > I'll get to it eventually > > - Wes > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From warren.weckesser at enthought.com Sat Feb 4 19:28:24 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sat, 4 Feb 2012 18:28:24 -0600 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 6:01 PM, wrote: > On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney wrote: > > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin > > wrote: > >> I have checked the performance of the "pure numpy" solution with pandas > >> solution on my task. The "pure numpy" solution is about two times > slower. > >> > >> The data shape: > >> (1062, 6348) > >> Pandas "group by sum" time: > >> 0.16588 seconds > >> Pure numpy "group by sum" time: > >> 0.38979 seconds > >> > >> But it is interesting, that the main bottleneck in numpy solution is the > >> data copying. I have divided solution on three blocks: > >> > >> # block (a): > >> s = np.argsort(labels) > >> > >> keys, inv = np.unique(labels, return_inverse = True) > >> > >> i = inv[s] > >> > >> groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] > >> > >> > >> # block (b): > >> ordered_data = data[:, s] > > can you try with numpy.take? Keith and Wes were showing that take is > much faster than advanced indexing. > Good idea. numpy.take is much faster: In [35]: data.shape Out[35]: (1000000, 3) In [36]: %timeit o = data[s] 10 loops, best of 3: 155 ms per loop In [37]: %timeit o = take(data, s, axis=0) 10 loops, best of 3: 37.1 ms per loop Warren > Josef > > >> > >> # block (c): > >> group_sums = np.add.reduceat(ordered_data, groups_at, axis = 1) > >> > >> The timing for the blocks is: > >> block (a): > >> 0.00138 seconds > >> > >> block (b): > >> 0.29285 seconds > >> > >> block (c): > >> 0.08868 seconds > >> > >> The sorting and reduce_at procedures are very fast. But only one line: > >> "ordered_data = data[:, s]" takes the most time. > >> > >> For me it is a bit strange. The reduceat() procedure where summation is > >> executed is about 3 time faster than the only data copying. 
> >> > >> Alexander > >> > >> > >> On Thu, Feb 2, 2012 at 10:16 PM, Warren Weckesser > >> wrote: > >>> > >>> > >>> > >>> On Wed, Feb 1, 2012 at 10:34 AM, Alexander Kalinin > >>> wrote: > >>>> > >>>> Yes, but for large data sets loops is quite slow. I have tried Pandas > >>>> groupby.sum() and it works faster. > >>>> > >>> > >>> > >>> Pandas is probably the correct tool to use for this, but it will be > nice > >>> when numpy has a native "group-by" capability. > >>> > >>> For what its worth (had to scratch the itch, so to speak), the attached > >>> script provides a "pure numpy" implementation without a python loop. > The > >>> output of the script is > >>> > >>> In [53]: run pseudo_group_by.py > >>> Label Data > >>> 20 [1 2 3] > >>> 20 [1 2 4] > >>> 10 [3 3 1] > >>> 0 [5 0 0] > >>> 20 [1 9 0] > >>> 10 [2 3 4] > >>> 20 [9 9 1] > >>> > >>> Label Num. Sum > >>> 0 1 [5 0 0] > >>> 10 2 [5 6 5] > >>> 20 4 [12 22 8] > >>> > >>> > >>> A drawback of the method is that it will make a reordered copy of the > >>> data. I haven't compared the performance to pandas. > >>> > >>> Warren > >>> > >>> > >>>> > >>>> > >>>> 2012/2/1 Fr?d?ric Bastien > >>>>> > >>>>> It will be slow, but you can make a python loop. > >>>>> > >>>>> Fred > >>>>> > >>>>> On Jan 31, 2012 3:34 PM, "Alexander Kalinin" > > >>>>> wrote: > >>>>>> > >>>>>> Hello! > >>>>>> > >>>>>> I use SciPy in computer graphics applications. My task is to > calculate > >>>>>> vertex normals by averaging faces normals. In other words I want to > >>>>>> accumulate vectors with the same ids. For example, > >>>>>> > >>>>>> ids = numpy.array([0, 1, 1, 2]) > >>>>>> n = numpy.array([ [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], > >>>>>> [0.1, 0.1 0.1] ]) > >>>>>> > >>>>>> I need result: > >>>>>> nv = ([ [0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]) > >>>>>> > >>>>>> The most simple code: > >>>>>> nv[ids] += n > >>>>>> does not work, I know about this. For 1D arrays I use > >>>>>> numpy.bincount(...) function. But this function does not work for > 2D arrays. > >>>>>> > >>>>>> So, my question. What is the best way calculate accumulation sum > for 2D > >>>>>> arrays using indirect indexes? > >>>>>> > >>>>>> Sincerely, > >>>>>> Alexander > >>>>>> > >>>>>> _______________________________________________ > >>>>>> SciPy-User mailing list > >>>>>> SciPy-User at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> SciPy-User mailing list > >>>>> SciPy-User at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>> > >>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > > > I should point out that pandas is not very optimized for a large > > number of columns like this. 
I just created a github issue about it: > > > > https://github.com/wesm/pandas/issues/745 > > > > I'll get to it eventually > > > > - Wes > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alec.kalinin at gmail.com Sun Feb 5 02:17:12 2012 From: alec.kalinin at gmail.com (Alexander Kalinin) Date: Sun, 5 Feb 2012 10:17:12 +0300 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: Yes, the numpy.take() is much faster than "fancy" indexing and now "pure numpy" solution is two time faster than pandas. Below are timing results: The data shape: (1062, 6348) Pandas solution: 0.16610 seconds "Pure numpy" solution: 0.08907 seconds Timing of the "pure numpy" by blocks: block (a) (sorting and obtaining groups): 0.00134 seconds block (b) (copy data to the ordered_data): 0.05517 seconds block (c) (reduceat): 0.02698 Alexander. On Sun, Feb 5, 2012 at 4:01 AM, wrote: > On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney wrote: > > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin > > wrote: > >> I have checked the performance of the "pure numpy" solution with pandas > >> solution on my task. The "pure numpy" solution is about two times > slower. > >> > >> The data shape: > >> (1062, 6348) > >> Pandas "group by sum" time: > >> 0.16588 seconds > >> Pure numpy "group by sum" time: > >> 0.38979 seconds > >> > >> But it is interesting, that the main bottleneck in numpy solution is the > >> data copying. I have divided solution on three blocks: > >> > >> # block (a): > >> s = np.argsort(labels) > >> > >> keys, inv = np.unique(labels, return_inverse = True) > >> > >> i = inv[s] > >> > >> groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] > >> > >> > >> # block (b): > >> ordered_data = data[:, s] > > can you try with numpy.take? Keith and Wes were showing that take is > much faster than advanced indexing. > > Josef > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yosefmel at post.tau.ac.il Sun Feb 5 03:59:21 2012 From: yosefmel at post.tau.ac.il (Yosef Meller) Date: Sun, 05 Feb 2012 10:59:21 +0200 Subject: [SciPy-User] asarray_chkfinite In-Reply-To: <9394EDC3-D458-4697-86AC-75115C14AA62@toadhill.net> References: <9394EDC3-D458-4697-86AC-75115C14AA62@toadhill.net> Message-ID: <11969561.AY5P6bKH8p@yosef-pc> On Wednesday, 1 ?February 2012 11:16:37 glen at toadhill.net wrote: > Hi all, > > I'm trying to optimize some code that entails a very large number of sparse > matrix-vector and vctor-vector multiplies. Upon running the profiler I see > that about 25% of my program's cumulative time is spent running > asarray_chkfinite. I do not call this routine directly. Can anyone tell me > what might be calling it and whether there is anything obvious I can do > about it? In addition to what Ralph said, I recommend using pycallgraph to see who calls what. Yosef. -------------- next part -------------- An HTML attachment was scrubbed... 
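Besides pycallgraph, the standard-library profiler can also report who calls a given function: pstats.Stats.print_callers accepts a name filter. A rough sketch, where run_solver() is only a placeholder for whatever code shows the asarray_chkfinite overhead:

import cProfile
import pstats

def run_solver():
    # placeholder for the sparse matrix-vector code being profiled
    pass

cProfile.run('run_solver()', 'profile.out')
stats = pstats.Stats('profile.out')
stats.print_callers('asarray_chkfinite')   # lists the callers of asarray_chkfinite

That narrows the search down to the wrapper doing the finiteness check without building a full call graph.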
URL: From D.Richards at mmu.ac.uk Sun Feb 5 09:36:28 2012 From: D.Richards at mmu.ac.uk (Daniel Richards) Date: Sun, 5 Feb 2012 14:36:28 +0000 Subject: [SciPy-User] scipy.spatial.Delaunay.convex_hull problelm In-Reply-To: References: <36924.2006949664$1328352535@news.gmane.org> , Message-ID: <69978DA9452B194B9467CF8E34D69723541A4E50@EXMB3.ad.mmu.ac.uk> Hi Pauli, This is great, it works! Thanks for the help. Dan ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] on behalf of Pauli Virtanen [pav at iki.fi] Sent: 04 February 2012 15:53 To: scipy-user at scipy.org Subject: Re: [SciPy-User] scipy.spatial.Delaunay.convex_hull problelm 04.02.2012 14:13, Pauli Virtanen kirjoitti: > 04.02.2012 11:48, Dan Richards kirjoitti: > [clip] >> This allows me to create a three-dimensional tetragedron. I had thought >> to find the 3D convex hull could simply change either: ?v = x.verticies? >> into ?v=x.*convex_hull*? ; or ?Del = scipy.spatial.Delaunay (Points)? >> into ?Del = scipy.spatial.Delaunay.*convex_hull*(Points)?.However, >> neither of these have worked as planned? > > Elements of the convex hull are triangles, not tetrahedra, so you need > to change the code also below. > > for i1, i2, i3 in Del.convex_hull: > faces.extend([(v[i1,0], ............), .... (.... v[i3,3]),]) Like so: import numpy as np from scipy.spatial import Delaunay points = np.random.randn(300, 3) tri = Delaunay(points) # -- Make a list of faces, [(p1, p2, p3), ...]; pj = (xj, yj, zj) faces = [] for ia, ib, ic in tri.convex_hull: faces.append(points[[ia, ib, ic]]) import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from mpl_toolkits.mplot3d.art3d import Poly3DCollection fig = plt.figure() ax = fig.gca(projection='3d') items = Poly3DCollection(faces, facecolors=[(0, 0, 0, 0.1)]) ax.add_collection(items) ax.scatter(points[:,0], points[:,1], points[:,2], 'o') plt.show() _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user "Before acting on this email or opening any attachments you should read the Manchester Metropolitan University email disclaimer available on its website http://www.mmu.ac.uk/emaildisclaimer " From conradlee at gmail.com Sun Feb 5 10:05:39 2012 From: conradlee at gmail.com (Conrad Lee) Date: Sun, 5 Feb 2012 15:05:39 +0000 Subject: [SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix? Message-ID: Say I have a huge numpy matrix *A* taking up tens of gigabytes. It takes a non-negligible amount of time to allocate this memory. Let's say I also have a collection of scipy sparse matrices with the same dimensions as the numpy matrix. Sometimes I want to convert one of these sparse matrices into a dense matrix to perform some vectorized operations that can't be performed on sparse matrices. Can I load one of these sparse matrices into *A* rather than re-allocate space each time I want to convert a sparse matrix into a dense matrix? The .toarray() and .todense() methods which are available on scipy sparse matrices do not seem to take an optional dense array argument, but maybe there is some other way to do this. (I've also started a stackoverflow version of this question here .) Thanks, Conrad lee -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at enthought.com Sun Feb 5 10:21:01 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 5 Feb 2012 09:21:01 -0600 Subject: [SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix? In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 9:05 AM, Conrad Lee wrote: > Say I have a huge numpy matrix *A* taking up tens of gigabytes. It takes > a non-negligible amount of time to allocate this memory. > > Let's say I also have a collection of scipy sparse matrices with the same > dimensions as the numpy matrix. Sometimes I want to convert one of these > sparse matrices into a dense matrix to perform some vectorized operations > that can't be performed on sparse matrices. > > Can I load one of these sparse matrices into *A* rather than re-allocate > space each time I want to convert a sparse matrix into a dense matrix? The > .toarray() and .todense() methods which are available on scipy sparse > matrices do not seem to take an optional dense array argument, but maybe > there is some other way to do this. > > (I've also started a stackoverflow version of this question here > .) > > Thanks, > > Conrad lee > > If your sparse matrix is in coo format, you can use fancy indexing to assign the values to the existing array. For example: In [29]: import scipy.sparse as sp In [30]: import numpy as np In [31]: a = sp.coo_matrix([[0,0,1,0],[0,0,0,0],[2,0,3,0],[0,4,0,0]]) In [32]: d = np.zeros((4,4), dtype=np.int32) In [33]: a.todense() Out[33]: matrix([[0, 0, 1, 0], [0, 0, 0, 0], [2, 0, 3, 0], [0, 4, 0, 0]]) In [34]: d[a.row, a.col] = a.data In [35]: d Out[35]: array([[0, 0, 1, 0], [0, 0, 0, 0], [2, 0, 3, 0], [0, 4, 0, 0]]) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Sun Feb 5 15:41:39 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 5 Feb 2012 21:41:39 +0100 Subject: [SciPy-User] scipy.stats.poisson, strange output? In-Reply-To: References: Message-ID: Hi Josef, > wrong sequence of arguments, the shape (mean) argument should be > second and first the values at which pmf is evaluated, i.e. Thanks. I discovered it just after your reply. I must admit that I find it more natural to first specify the distribution's parameters, such as mu for the Poisson distribution, and then specify the points at which to evaluate the pmf (or cdf, etc.) This explains the error. > >>>> stats.poisson.pmf(grid, 10) > array([ 0.0000453999297625, ?0.0004539992976248, ?0.0022699964881242, > ? ? ? ?0.0075666549604141, ?0.0189166374010354, ?0.0378332748020708, > ? ? ? ?0.0630554580034512, ?0.090079225719216 , ?0.1125990321490201, > ? ? ? ?0.1251100357211337, ?0.1251100357211337, ?0.1137363961101213, > ? ? ? ?0.094780330091768 , ?0.0729079462244373, ?0.0520771044460262, > ? ? ? ?0.0347180696306844, ?0.0216987935191777, ?0.0127639961877516, > ? ? ? ?0.0070911089931953, ?0.0037321626279975]) > > in the first case it's a frozen distribution > >>>> stats.poisson(10).pmf(grid) > array([ 0.0000453999297625, ?0.0004539992976248, ?0.0022699964881242, > ? ? ? ?0.0075666549604141, ?0.0189166374010354, ?0.0378332748020708, > ? ? ? ?0.0630554580034512, ?0.090079225719216 , ?0.1125990321490201, > ? ? ? ?0.1251100357211337, ?0.1251100357211337, ?0.1137363961101213, > ? ? ? ?0.094780330091768 , ?0.0729079462244373, ?0.0520771044460262, > ? ? ? ?0.0347180696306844, ?0.0216987935191777, ?0.0127639961877516, > ? ? ? 
?0.0070911089931953, ?0.0037321626279975]) > > Josef > >> >> In [7]: >> >> >> So, in line [5], rv.pmf(grid)[0] is a number, while in [6], >> poisson.pmf(10,grid)[0] is nan. Am I doing something wrong, or is this >> an unintentional inconsistency? >> >> Nicky >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Sun Feb 5 16:35:27 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 5 Feb 2012 16:35:27 -0500 Subject: [SciPy-User] scipy.stats.poisson, strange output? In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 3:41 PM, nicky van foreest wrote: > Hi Josef, > >> wrong sequence of arguments, the shape (mean) argument should be >> second and first the values at which pmf is evaluated, i.e. > > Thanks. I discovered it just after your reply. I must admit that I > find it more natural to first specify the distribution's parameters, > such as mu for the Poisson distribution, and then specify the points > at which to evaluate the pmf (or cdf, etc.) This explains the error. I saw thatt you replied at the same time. I find the current version easier to follow, reading "given the parameters" pmf(x, theta) = Prob(x | theta) cdf(x, theta) = F(x | theta) pdf(x, theta) = f(x | theta) (and we could even pretend we are Bayesians :) Cheers, Josef > >> >>>>> stats.poisson.pmf(grid, 10) >> array([ 0.0000453999297625, ?0.0004539992976248, ?0.0022699964881242, >> ? ? ? ?0.0075666549604141, ?0.0189166374010354, ?0.0378332748020708, >> ? ? ? ?0.0630554580034512, ?0.090079225719216 , ?0.1125990321490201, >> ? ? ? ?0.1251100357211337, ?0.1251100357211337, ?0.1137363961101213, >> ? ? ? ?0.094780330091768 , ?0.0729079462244373, ?0.0520771044460262, >> ? ? ? ?0.0347180696306844, ?0.0216987935191777, ?0.0127639961877516, >> ? ? ? ?0.0070911089931953, ?0.0037321626279975]) >> >> in the first case it's a frozen distribution >> >>>>> stats.poisson(10).pmf(grid) >> array([ 0.0000453999297625, ?0.0004539992976248, ?0.0022699964881242, >> ? ? ? ?0.0075666549604141, ?0.0189166374010354, ?0.0378332748020708, >> ? ? ? ?0.0630554580034512, ?0.090079225719216 , ?0.1125990321490201, >> ? ? ? ?0.1251100357211337, ?0.1251100357211337, ?0.1137363961101213, >> ? ? ? ?0.094780330091768 , ?0.0729079462244373, ?0.0520771044460262, >> ? ? ? ?0.0347180696306844, ?0.0216987935191777, ?0.0127639961877516, >> ? ? ? ?0.0070911089931953, ?0.0037321626279975]) >> >> Josef >> >>> >>> In [7]: >>> >>> >>> So, in line [5], rv.pmf(grid)[0] is a number, while in [6], >>> poisson.pmf(10,grid)[0] is nan. Am I doing something wrong, or is this >>> an unintentional inconsistency? 
>>> >>> Nicky >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From vanforeest at gmail.com Sun Feb 5 17:00:16 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 5 Feb 2012 23:00:16 +0100 Subject: [SciPy-User] scipy.stats.poisson, strange output? In-Reply-To: References: Message-ID: > I find the current version easier to follow, reading "given the parameters" This is a good hint to memorize the proper sequence. > (and we could even pretend we are Bayesians :) I am some sort of a Bayesian :-) Nicky > > Cheers, > > Josef > >> >>> >>>>>> stats.poisson.pmf(grid, 10) >>> array([ 0.0000453999297625, ?0.0004539992976248, ?0.0022699964881242, >>> ? ? ? ?0.0075666549604141, ?0.0189166374010354, ?0.0378332748020708, >>> ? ? ? ?0.0630554580034512, ?0.090079225719216 , ?0.1125990321490201, >>> ? ? ? ?0.1251100357211337, ?0.1251100357211337, ?0.1137363961101213, >>> ? ? ? ?0.094780330091768 , ?0.0729079462244373, ?0.0520771044460262, >>> ? ? ? ?0.0347180696306844, ?0.0216987935191777, ?0.0127639961877516, >>> ? ? ? ?0.0070911089931953, ?0.0037321626279975]) >>> >>> in the first case it's a frozen distribution >>> >>>>>> stats.poisson(10).pmf(grid) >>> array([ 0.0000453999297625, ?0.0004539992976248, ?0.0022699964881242, >>> ? ? ? ?0.0075666549604141, ?0.0189166374010354, ?0.0378332748020708, >>> ? ? ? ?0.0630554580034512, ?0.090079225719216 , ?0.1125990321490201, >>> ? ? ? ?0.1251100357211337, ?0.1251100357211337, ?0.1137363961101213, >>> ? ? ? ?0.094780330091768 , ?0.0729079462244373, ?0.0520771044460262, >>> ? ? ? ?0.0347180696306844, ?0.0216987935191777, ?0.0127639961877516, >>> ? ? ? ?0.0070911089931953, ?0.0037321626279975]) >>> >>> Josef >>> >>>> >>>> In [7]: >>>> >>>> >>>> So, in line [5], rv.pmf(grid)[0] is a number, while in [6], >>>> poisson.pmf(10,grid)[0] is nan. Am I doing something wrong, or is this >>>> an unintentional inconsistency? >>>> >>>> Nicky >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From lafont.fabien at gmail.com Mon Feb 6 04:41:20 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Mon, 6 Feb 2012 10:41:20 +0100 Subject: [SciPy-User] [scipy-user] How to apply a condition on some specific values of an array In-Reply-To: References: Message-ID: And is it possible to apply a specific operation with a condition (if). I have to apply different operations on the same array depending on the value of the array element. 
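As an aside on this kind of either/or update: numpy.where builds the result in one expression by choosing, element by element, between two precomputed alternatives. A small sketch, using the threshold of 500 from the concrete example that follows below:

import numpy as np

arr = np.array([100, 250, 501, 700])

# entries below 500 are multiplied by 100, the rest by 1000;
# arr itself is left unchanged
new_arr = np.where(arr < 500, arr * 100, arr * 1000)
# new_arr -> [ 10000  25000 501000 700000]

Note that both alternatives are evaluated for every element before the selection, which is fine here but can matter when one branch is expensive or undefined for some entries.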
For example: I have an array like that: [100 250 501 700] and I want to multiply by 100 the value if this value is smalest thant 500 and multiply by 1000 if the value is bigest. Fabien 2012/2/3 Fabien Lafont : > thx Scott! > > 2012/2/3 Scott Sinclair : >> On 3 February 2012 12:48, Fabien Lafont wrote: >>> I'm just starting to use numpy and I have still Python's reflexes so I >>> want to know how can I do the following code using Numpy "style". >>> >>> for i in range(0,len(array)+1): >>> ? ? ? if 10>> ? ? ? ? ? ?new_array = array[i]*1000 >>> >>> In other words is it possible to "scan" the values of an array and >>> apply a "modification" to it if the condition is true >> >> Yes - you can use fancy indexing (see >> http://docs.scipy.org/doc/numpy/user/basics.indexing.html) >> >> In[1]: import numpy as np >> >> In[2]: arr = np.arange(10) >> In[3]: arr >> Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In[4]: arr[(2 < arr) & (arr < 9)] *= 1000 >> In[5]: arr >> Out[5]: array([ ? 0, ? ?1, ? ?2, 3000, 4000, 5000, 6000, 7000, 8000, ? ?9]) >> >> Cheers, >> Scott >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From scott.sinclair.za at gmail.com Mon Feb 6 05:13:32 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 6 Feb 2012 12:13:32 +0200 Subject: [SciPy-User] [scipy-user] How to apply a condition on some specific values of an array In-Reply-To: References: Message-ID: On 6 February 2012 11:41, Fabien Lafont wrote: > And is it possible to apply a specific operation with a condition > (if). I have to apply different operations on the same array depending > on the value of the array element. > > For example: I have an array like that: > > [100 ?250 ?501 ?700] and I want to multiply by 100 the value if this > value is smalest thant 500 and multiply by 1000 if the value is > bigest. Here's one way that should be easy to follow. You'll have to make a copy of your array (as shown here), or generate two index arrays before modifying your original array. In [1]: arr = np.array([100, 250, 501, 700]) In [2]: # make a copy to avoid aliasing ...: new_arr = np.array(arr) In [3]: new_arr[arr < 500] *= 100 In [4]: new_arr[arr > 500] *= 1000 In [5]: arr Out[5]: array([100, 250, 501, 700]) In [6]: new_arr Out[6]: array([ 10000, 25000, 501000, 700000]) Cheers, Scott From lafont.fabien at gmail.com Mon Feb 6 05:27:20 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Mon, 6 Feb 2012 11:27:20 +0100 Subject: [SciPy-User] [scipy-user] How to apply a condition on some specific values of an array In-Reply-To: References: Message-ID: Great, it seems really easy! Is it possible to append the values to new_arr because I have to do it with many "arr" so new_arr will be erase each time if I do: new_arr[arr < 500] *= 100 new_arr[arr2 < 500] *= 100 new_arr[arr3 < 500] *= 100 I've tried np.append but it doesn't work... Nevertheless thanks again! Fab 2012/2/6 Scott Sinclair : > On 6 February 2012 11:41, Fabien Lafont wrote: >> And is it possible to apply a specific operation with a condition >> (if). I have to apply different operations on the same array depending >> on the value of the array element. >> >> For example: I have an array like that: >> >> [100 ?250 ?501 ?700] and I want to multiply by 100 the value if this >> value is smalest thant 500 and multiply by 1000 if the value is >> bigest. > > Here's one way that should be easy to follow. 
You'll have to make a > copy of your array (as shown here), or generate two index arrays > before modifying your original array. > > In [1]: arr = np.array([100, 250, 501, 700]) > > In [2]: # make a copy to avoid aliasing > ? ...: new_arr = np.array(arr) > > In [3]: new_arr[arr < 500] *= 100 > > In [4]: new_arr[arr > 500] *= 1000 > > In [5]: arr > Out[5]: array([100, 250, 501, 700]) > > In [6]: new_arr > Out[6]: array([ 10000, ?25000, 501000, 700000]) > > Cheers, > Scott > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From lafont.fabien at gmail.com Mon Feb 6 06:31:25 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Mon, 6 Feb 2012 12:31:25 +0100 Subject: [SciPy-User] [scipy-user] How to apply a condition on some specific values of an array In-Reply-To: References: Message-ID: Sorry Scott, I managed to append values. I was using it like classical-list append function! Thx again, Fab 2012/2/6 Fabien Lafont : > Great, it seems really easy! > > Is it possible to append the values to new_arr because I have to do it > with many "arr" so new_arr will be erase each time if I do: > > new_arr[arr < 500] *= 100 > new_arr[arr2 < 500] *= 100 > new_arr[arr3 < 500] *= 100 > > I've tried np.append but it doesn't work... > > Nevertheless thanks again! > > Fab > > 2012/2/6 Scott Sinclair : >> On 6 February 2012 11:41, Fabien Lafont wrote: >>> And is it possible to apply a specific operation with a condition >>> (if). I have to apply different operations on the same array depending >>> on the value of the array element. >>> >>> For example: I have an array like that: >>> >>> [100 ?250 ?501 ?700] and I want to multiply by 100 the value if this >>> value is smalest thant 500 and multiply by 1000 if the value is >>> bigest. >> >> Here's one way that should be easy to follow. You'll have to make a >> copy of your array (as shown here), or generate two index arrays >> before modifying your original array. >> >> In [1]: arr = np.array([100, 250, 501, 700]) >> >> In [2]: # make a copy to avoid aliasing >> ? ...: new_arr = np.array(arr) >> >> In [3]: new_arr[arr < 500] *= 100 >> >> In [4]: new_arr[arr > 500] *= 1000 >> >> In [5]: arr >> Out[5]: array([100, 250, 501, 700]) >> >> In [6]: new_arr >> Out[6]: array([ 10000, ?25000, 501000, 700000]) >> >> Cheers, >> Scott >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From guyer at nist.gov Mon Feb 6 09:12:19 2012 From: guyer at nist.gov (Jonathan Guyer) Date: Mon, 6 Feb 2012 09:12:19 -0500 Subject: [SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix? In-Reply-To: References: Message-ID: <2B682631-3D88-4E0A-BA15-760B39F1F1A2@nist.gov> On Feb 5, 2012, at 10:21 AM, Warren Weckesser wrote: > > > On Sun, Feb 5, 2012 at 9:05 AM, Conrad Lee wrote: > Say I have a huge numpy matrix A taking up tens of gigabytes. It takes a non-negligible amount of time to allocate this memory. > > Let's say I also have a collection of scipy sparse matrices with the same dimensions as the numpy matrix. Sometimes I want to convert one of these sparse matrices into a dense matrix to perform some vectorized operations that can't be performed on sparse matrices. > > Can I load one of these sparse matrices into A rather than re-allocate space each time I want to convert a sparse matrix into a dense matrix? 
The .toarray() and .todense() methods which are available on scipy sparse matrices do not seem to take an optional dense array argument, but maybe there is some other way to do this. > > (I've also started a stackoverflow version of this question here.) > > Thanks, > > Conrad lee > > > > If your sparse matrix is in coo format, you can use fancy indexing to assign the values to the existing array. Although, unless your sparsity pattern doesn't change (which it may not), you'll need to zero the entire dense array before reassigning, which will also take "a non-negligible amount of time". From conradlee at gmail.com Mon Feb 6 09:56:22 2012 From: conradlee at gmail.com (Conrad Lee) Date: Mon, 6 Feb 2012 14:56:22 +0000 Subject: [SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix? In-Reply-To: <2B682631-3D88-4E0A-BA15-760B39F1F1A2@nist.gov> References: <2B682631-3D88-4E0A-BA15-760B39F1F1A2@nist.gov> Message-ID: Warren, thanks for the suggestion with the COO matrix. In general I'm storing sparse matrices in the CSR format for quick multiplication, so your approach would mean that I have to convert to a COO matrix every time, but that conversion is pretty quick. Although, unless your sparsity pattern doesn't change (which it may not), > you'll need to zero the entire dense array before reassigning, which will > also take "a non-negligible amount of time". > Zeroing out a matrix seems to happen very quickly, probably because it's a vectorized operation taking advantage of the SIMD instructions on modern processors. As far as I understand it, allocating huge amounts of memory requires slower operations. I did a quick and dirty benchmark, and zeroing takes a small fraction of the time of allocating. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guyer at nist.gov Mon Feb 6 10:11:33 2012 From: guyer at nist.gov (Jonathan Guyer) Date: Mon, 6 Feb 2012 10:11:33 -0500 Subject: [SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix? In-Reply-To: References: <2B682631-3D88-4E0A-BA15-760B39F1F1A2@nist.gov> Message-ID: On Feb 6, 2012, at 9:56 AM, Conrad Lee wrote: > I did a quick and dirty benchmark, and zeroing takes a small fraction of the time of allocating. Good to know. From warren.weckesser at enthought.com Mon Feb 6 11:21:09 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 6 Feb 2012 10:21:09 -0600 Subject: [SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix? In-Reply-To: References: <2B682631-3D88-4E0A-BA15-760B39F1F1A2@nist.gov> Message-ID: On Mon, Feb 6, 2012 at 8:56 AM, Conrad Lee wrote: > Warren, thanks for the suggestion with the COO matrix. In general I'm > storing sparse matrices in the CSR format for quick multiplication, so your > approach would mean that I have to convert to a COO matrix every time, but > that conversion is pretty quick. > Conrad, Here's an example of how you could do the assignment directly with a CSR matrix: import numpy as np from scipy.sparse import csr_matrix # 'c' is a sparse matrix in CSR format. c = csr_matrix([[0,0,1,0,0,0], [0,2,0,3,0,0], [0,0,0,0,0,0], [4,0,0,0,5,0]]) # 'a' is the dense array into which we'll copy the nonzero # elements of 'c' a = np.zeros(c.shape, dtype=c.dtype) # The next line is the key part: it converts c.indptr into # the row indices in the dense array. 
(c.indices already has # the columns.) rows = sum((m*[k] for k, m in enumerate(np.diff(c.indptr))), []) a[rows, c.indices] = c.data print c.todense() print a print np.all(c.todense() == a) This might be more efficient than converting to COO. Warren > Although, unless your sparsity pattern doesn't change (which it may not), >> you'll need to zero the entire dense array before reassigning, which will >> also take "a non-negligible amount of time". >> > > Zeroing out a matrix seems to happen very quickly, probably because it's a > vectorized operation taking advantage of the SIMD instructions on modern > processors. As far as I understand it, allocating huge amounts of memory > requires slower operations. I did a quick and dirty benchmark, and zeroing > takes a small fraction of the time of allocating. > > >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Tue Feb 7 04:18:00 2012 From: vaggi.federico at gmail.com (federico vaggi) Date: Tue, 7 Feb 2012 10:18:00 +0100 Subject: [SciPy-User] From Delaunay edges to spatial points Message-ID: Hello, I am a relative newbie to tessellation, so I might be asking a very naive question. I have an unweighted graph (list of nodes, edge lists) that I would like to plot on the surface of a sphere. Given the edge list, is there a way to come up with a x,y,z position of all the nodes so that they follow a Delaunay tessellation? Most software I've seen starts from positions in space and then tries to obtain the edge list - I'd like to do the inverse, if possible. I am not 100% sure if this is more appropriate for the scipy mailing list or the networkx mailing list, so I think I'll post it in both places as long as that's not frowned upon. Thank you very much, Fede -------------- next part -------------- An HTML attachment was scrubbed... URL: From lafont.fabien at gmail.com Tue Feb 7 05:43:04 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Tue, 7 Feb 2012 11:43:04 +0100 Subject: [SciPy-User] [scipy-user] How to use genfromtext() with np.array? Message-ID: I've saved a np.array in a file using write(). Ihave then a file with my np.array over 8 columns and I can't load it using genfromtext to load at the same time the entire array. It seems genfromtext doesn't "see" the array as a real array but as 8 different columns. Is it possible to load the array easily with genfromtext, or save my array in a different way. It works with a for loop over each indices of the array with + "\n" but it's not very convenient. Thx, Fabien From papuu_k at yahoo.com Tue Feb 7 05:43:34 2012 From: papuu_k at yahoo.com (Pappu Kumar) Date: Tue, 7 Feb 2012 16:13:34 +0530 (IST) Subject: [SciPy-User] Fitting Differential Equations to a Curve Message-ID: <1328611414.88237.YahooMailNeo@web137608.mail.in.yahoo.com> I am trying to fit the differential equation ay' + by''=0 to a curve by varying a and b The following code does not work: from scipy.integrate import odeint from scipy.optimize import curve_fit from numpy import linspace, random, array time = linspace(0.0,10.0,100) def deriv(time,a,b): ??? dy=lambda y,t : array([ y[1], a*y[0]+b*y[1] ]) ??? yinit = array([0.0005,0.2]) # initial values ??? 
Y=odeint(dy,yinit,time) ??? return Y[:,0] z = deriv(time, 2, 0.1) zn = z + 0.1*random.normal(size=len(time)) popt, pcov = curve_fit(deriv, time, zn) print popt # it only outputs the initial values of a, b! -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Tue Feb 7 09:38:01 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Tue, 7 Feb 2012 15:38:01 +0100 Subject: [SciPy-User] [scipy-user] How to use genfromtext() with np.array? In-Reply-To: References: Message-ID: <82CA1654-A13C-44B7-8184-6599E8B91DE6@gmail.com> The easiest way is probably savetxt/loadtxt: In [1]: d = np.linspace(0,1,10) In [2]: np.savetxt('foo', d) In [3]: d2 = np.loadtxt('foo') In [4]: d-d2 Out[4]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) It should work equally well with 2D arrays. Paul On 7. feb. 2012, at 11:43, Fabien Lafont wrote: > I've saved a np.array in a file using write(). Ihave then a file with > my np.array over 8 columns and I can't load it using genfromtext to > load at the same time the entire array. It seems genfromtext doesn't > "see" the array as a real array but as 8 different columns. Is it > possible to load the array easily with genfromtext, or save my array > in a different way. It works with a for loop over each indices of the > array with + "\n" but it's not very convenient. > > Thx, > > Fabien > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From lafont.fabien at gmail.com Tue Feb 7 09:39:41 2012 From: lafont.fabien at gmail.com (Fabien Lafont) Date: Tue, 7 Feb 2012 15:39:41 +0100 Subject: [SciPy-User] [scipy-user] How to use genfromtext() with np.array? In-Reply-To: <82CA1654-A13C-44B7-8184-6599E8B91DE6@gmail.com> References: <82CA1654-A13C-44B7-8184-6599E8B91DE6@gmail.com> Message-ID: Thank you very much, it works perfectly! 2012/2/7 Paul Anton Letnes : > The easiest way is probably savetxt/loadtxt: > In [1]: d = np.linspace(0,1,10) > > In [2]: np.savetxt('foo', d) > > In [3]: d2 = np.loadtxt('foo') > > In [4]: d-d2 > Out[4]: array([ 0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0., ?0.]) > > It should work equally well with 2D arrays. > > Paul > > On 7. feb. 2012, at 11:43, Fabien Lafont wrote: > >> I've saved a np.array in a file using write(). Ihave then a file with >> my np.array over 8 columns and I can't load it using genfromtext to >> load at the same time the entire array. It seems genfromtext doesn't >> "see" the array as a real array but as 8 different columns. Is it >> possible to load the array easily with genfromtext, or save my array >> in a different way. It works with a for loop over each indices of the >> array with + "\n" but it's not very convenient. 
>> >> Thx, >> >> Fabien >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Tue Feb 7 14:04:00 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 07 Feb 2012 20:04:00 +0100 Subject: [SciPy-User] Fitting Differential Equations to a Curve In-Reply-To: <1328611414.88237.YahooMailNeo@web137608.mail.in.yahoo.com> References: <1328611414.88237.YahooMailNeo@web137608.mail.in.yahoo.com> Message-ID: 07.02.2012 11:43, Pappu Kumar kirjoitti: > I am trying to fit the differential equation ay' + by''=0 to a curve by > varying a and b The following code does not work: > > from scipy.integrate import odeint > from scipy.optimize import curve_fit > from numpy import linspace, random, array > > time = linspace(0.0,10.0,100) > def deriv(time,a,b): > dy=lambda y,t : array([ y[1], a*y[0]+b*y[1] ]) > yinit = array([0.0005,0.2]) # initial values > Y=odeint(dy,yinit,time) > return Y[:,0] > > z = deriv(time, 2, 0.1) > zn = z + 0.1*random.normal(size=len(time)) > > popt, pcov = curve_fit(deriv, time, zn) > print popt # it only outputs the initial values of a, b! For me, this prints [ 1.999963 0.10002353] So seems to be working as expected, as the function to fit was made with parameters [2, 0.1] plus some added noise. -- Pauli Virtanen From wesmckinn at gmail.com Tue Feb 7 17:38:43 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 7 Feb 2012 17:38:43 -0500 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 2:17 AM, Alexander Kalinin wrote: > Yes, the numpy.take() is much faster than "fancy" indexing and now "pure > numpy" solution is two time faster than pandas. Below are timing results: > > > The data shape: > ? ?? (1062, 6348) > > Pandas solution: > ??? 0.16610 seconds > > "Pure numpy" solution: > ??? 0.08907 seconds > > Timing of the "pure numpy" by blocks: > block (a) (sorting and obtaining groups): > ??? 0.00134 seconds > block (b) (copy data to the ordered_data): > ??? 0.05517 seconds > block (c) (reduceat): > ??? 0.02698 > > Alexander. > > > On Sun, Feb 5, 2012 at 4:01 AM, wrote: >> >> On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney wrote: >> > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin >> > wrote: >> >> I have checked the performance of the "pure numpy" solution with pandas >> >> solution on my task. The "pure numpy" solution is about two times >> >> slower. >> >> >> >> The data shape: >> >> ??? (1062, 6348) >> >> Pandas "group by sum" time: >> >> ??? 0.16588 seconds >> >> Pure numpy "group by sum" time: >> >> ??? 0.38979 seconds >> >> >> >> But it is interesting, that the main bottleneck in numpy solution is >> >> the >> >> data copying. I have divided solution on three blocks: >> >> >> >> # block (a): >> >> ? ? s = np.argsort(labels) >> >> >> >> keys, inv = np.unique(labels, return_inverse = True) >> >> >> >> i = inv[s] >> >> >> >> groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] >> >> >> >> >> >> # block (b): >> >> ??? ordered_data = data[:, s] >> >> can you try with numpy.take? Keith and Wes were showing that take is >> much faster than advanced indexing. 
>> >> Josef >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > FWIW I did some refactoring in pandas today and am getting the following timings now in this use case: In [12]: df = DataFrame(randn(1062, 6348)) In [13]: labels = np.random.randint(0, 100, size=1062) In [14]: timeit df.groupby(labels).sum() 10 loops, best of 3: 38.7 ms per loop comparing with def numpy_groupby(data, labels, axis=0): s = np.argsort(labels) keys, inv = np.unique(labels, return_inverse = True) i = inv.take(s) groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] ordered_data = data.take(s, axis=axis) group_sums = np.add.reduceat(ordered_data, groups_at, axis=axis) return group_sums In [15]: timeit numpy_groupby(df.values, labels) 10 loops, best of 3: 95.4 ms per loop according to line_profiler, the runtime is being consumed by the reduceat now In [20]: lprun -f numpy_groupby numpy_groupby(df.values, labels) Timer unit: 1e-06 s File: pandas/core/groupby.py Function: numpy_groupby at line 1511 Total time: 0.108126 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 1511 def numpy_groupby(data, labels): 1512 1 125 125.0 0.1 s = np.argsort(labels) 1513 1 720 720.0 0.7 keys, inv = np.unique(labels, return_inverse = True) 1514 1 13 13.0 0.0 i = inv.take(s) 1515 1 62 62.0 0.1 groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] 1516 1 28684 28684.0 26.5 ordered_data = data.take(s, axis=0) 1517 1 78519 78519.0 72.6 group_sums = np.add.reduceat(ordered_data, groups_at, axis=0) 1518 1519 1 3 3.0 0.0 return group_sums The performance of the pure numpy solution will degrade both with the length of the labels vector and the number of unique elements (because there are two O(N log N) steps there). In this case it matters less because there are so many rows / columns to aggregate The performance of the pure NumPy solution is unsurprisingly much better when the aggregation is across contiguous memory vs. strided memory access: In [41]: labels = np.random.randint(0, 100, size=6348) In [42]: timeit numpy_groupby(df.values, labels, axis=1) 10 loops, best of 3: 47.4 ms per loop pandas is slower in this case because it's not giving any consideration to cache locality: In [50]: timeit df.groupby(labels, axis=1).sum() 10 loops, best of 3: 79.9 ms per loop One can only complain so much about 30 lines of Cython code ;) Good enough for the time being - Wes From wesmckinn at gmail.com Tue Feb 7 18:15:04 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 7 Feb 2012 18:15:04 -0500 Subject: [SciPy-User] Accumulation sum using indirect indexes In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 5:38 PM, Wes McKinney wrote: > On Sun, Feb 5, 2012 at 2:17 AM, Alexander Kalinin > wrote: >> Yes, the numpy.take() is much faster than "fancy" indexing and now "pure >> numpy" solution is two time faster than pandas. Below are timing results: >> >> >> The data shape: >> ? ?? (1062, 6348) >> >> Pandas solution: >> ??? 0.16610 seconds >> >> "Pure numpy" solution: >> ??? 0.08907 seconds >> >> Timing of the "pure numpy" by blocks: >> block (a) (sorting and obtaining groups): >> ??? 0.00134 seconds >> block (b) (copy data to the ordered_data): >> ??? 0.05517 seconds >> block (c) (reduceat): >> ??? 0.02698 >> >> Alexander. 
>> >> >> On Sun, Feb 5, 2012 at 4:01 AM, wrote: >>> >>> On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney wrote: >>> > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin >>> > wrote: >>> >> I have checked the performance of the "pure numpy" solution with pandas >>> >> solution on my task. The "pure numpy" solution is about two times >>> >> slower. >>> >> >>> >> The data shape: >>> >> ??? (1062, 6348) >>> >> Pandas "group by sum" time: >>> >> ??? 0.16588 seconds >>> >> Pure numpy "group by sum" time: >>> >> ??? 0.38979 seconds >>> >> >>> >> But it is interesting, that the main bottleneck in numpy solution is >>> >> the >>> >> data copying. I have divided solution on three blocks: >>> >> >>> >> # block (a): >>> >> ? ? s = np.argsort(labels) >>> >> >>> >> keys, inv = np.unique(labels, return_inverse = True) >>> >> >>> >> i = inv[s] >>> >> >>> >> groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] >>> >> >>> >> >>> >> # block (b): >>> >> ??? ordered_data = data[:, s] >>> >>> can you try with numpy.take? Keith and Wes were showing that take is >>> much faster than advanced indexing. >>> >>> Josef >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > FWIW I did some refactoring in pandas today and am getting the > following timings now in this use case: > > In [12]: df = DataFrame(randn(1062, 6348)) > > In [13]: labels = np.random.randint(0, 100, size=1062) > > In [14]: timeit df.groupby(labels).sum() > 10 loops, best of 3: 38.7 ms per loop > > comparing with > > def numpy_groupby(data, labels, axis=0): > ? ?s = np.argsort(labels) > ? ?keys, inv = np.unique(labels, return_inverse = True) > ? ?i = inv.take(s) > ? ?groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] > ? ?ordered_data = data.take(s, axis=axis) > ? ?group_sums = np.add.reduceat(ordered_data, groups_at, axis=axis) > > ? ?return group_sums > > In [15]: timeit numpy_groupby(df.values, labels) > 10 loops, best of 3: 95.4 ms per loop > > according to line_profiler, the runtime is being consumed by the reduceat now > > In [20]: lprun -f numpy_groupby numpy_groupby(df.values, labels) > Timer unit: 1e-06 s > > File: pandas/core/groupby.py > Function: numpy_groupby at line 1511 > Total time: 0.108126 s > > Line # ? ? ?Hits ? ? ? ? Time ?Per Hit ? % Time ?Line Contents > ============================================================== > ?1511 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? def > numpy_groupby(data, labels): > ?1512 ? ? ? ? 1 ? ? ? ? ?125 ? ?125.0 ? ? ?0.1 ? ? ?s = np.argsort(labels) > ?1513 ? ? ? ? 1 ? ? ? ? ?720 ? ?720.0 ? ? ?0.7 ? ? ?keys, inv = > np.unique(labels, return_inverse = True) > ?1514 ? ? ? ? 1 ? ? ? ? ? 13 ? ? 13.0 ? ? ?0.0 ? ? ?i = inv.take(s) > ?1515 ? ? ? ? 1 ? ? ? ? ? 62 ? ? 62.0 ? ? ?0.1 ? ? ?groups_at = > np.where(i != np.concatenate(([-1], i[:-1])))[0] > ?1516 ? ? ? ? 1 ? ? ? ?28684 ?28684.0 ? ? 26.5 ? ? ?ordered_data = > data.take(s, axis=0) > ?1517 ? ? ? ? 1 ? ? ? ?78519 ?78519.0 ? ? 72.6 ? ? ?group_sums = > np.add.reduceat(ordered_data, groups_at, axis=0) > ?1518 > ?1519 ? ? ? ? 1 ? ? ? ? ? ?3 ? ? ?3.0 ? ? ?0.0 ? ? ?return group_sums > > The performance of the pure numpy solution will degrade both with the > length of the labels vector and the number of unique elements (because > there are two O(N log N) steps there). 
In this case it matters less > because there are so many rows / columns to aggregate > > The performance of the pure NumPy solution is unsurprisingly much > better when the aggregation is across contiguous memory vs. strided > memory access: > > > In [41]: labels = np.random.randint(0, 100, size=6348) > > In [42]: timeit numpy_groupby(df.values, labels, axis=1) > 10 loops, best of 3: 47.4 ms per loop > > pandas is slower in this case because it's not giving any > consideration to cache locality: > > In [50]: timeit df.groupby(labels, axis=1).sum() > 10 loops, best of 3: 79.9 ms per loop > > One can only complain so much about 30 lines of Cython code ;) Good > enough for the time being > > - Wes More on this for those interested. These methods start really becoming different when you aggregate very large 1D arrays. Consider a million float64s each with a label chosen from 1000 unique labels. You can see where we start running into problems: In [9]: data = np.random.randn(1000000, 1) In [10]: labels = np.random.randint(0, 1000, size=1000000) In [11]: lprun -f gp.numpy_groupby gp.numpy_groupby(data, labels) Timer unit: 1e-06 s File: pandas/core/groupby.py Function: numpy_groupby at line 1512 Total time: 0.413775 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 1512 def numpy_groupby(data, labels, axis=0): 1513 1 98867 98867.0 23.9 s = np.argsort(labels) 1514 1 286792 286792.0 69.3 keys, inv = np.unique(labels, return_inverse = True) 1515 1 9617 9617.0 2.3 i = inv.take(s) 1516 1 8059 8059.0 1.9 groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0] 1517 1 9365 9365.0 2.3 ordered_data = data.take(s, axis=axis) 1518 1 1073 1073.0 0.3 group_sums = np.add.reduceat(ordered_data, groups_at, axis=axis) 1519 1520 1 2 2.0 0.0 return group_sums In [12]: timeit gp.numpy_groupby(data, labels) 1 loops, best of 3: 410 ms per loop whereas the hash-table based approach (a la pandas) looks like: In [13]: df = DataFrame(data) In [14]: timeit df.groupby(labels).sum() 10 loops, best of 3: 71.5 ms per loop with In [17]: %prun -s cumulative -l 15 for _ in xrange(10): df.groupby(labels).sum() 3002 function calls in 0.771 seconds Ordered by: cumulative time List reduced from 109 to 15 due to restriction <15> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.025 0.025 0.771 0.771 :1() 10 0.000 0.000 0.744 0.074 groupby.py:327(sum) 10 0.001 0.000 0.744 0.074 groupby.py:940(_cython_agg_general) 10 0.000 0.000 0.692 0.069 groupby.py:384(_group_info) 10 0.000 0.000 0.408 0.041 groupby.py:620(labels) 10 0.000 0.000 0.408 0.041 groupby.py:641(_make_labels) 10 0.015 0.002 0.408 0.041 algorithms.py:72(factorize) 10 0.314 0.031 0.314 0.031 {method 'get_labels' of 'pandas._tseries.Int64HashTable' objects} 10 0.012 0.001 0.284 0.028 groupby.py:1470(_compress_group_index) 10 0.178 0.018 0.178 0.018 {method 'get_labels_groupby' of 'pandas._tseries.Int64HashTable' objects} 90 0.132 0.001 0.132 0.001 {method 'take' of 'numpy.ndarray' objects} 10 0.000 0.000 0.045 0.004 groupby.py:1432(cython_aggregate) 10 0.044 0.004 0.044 0.004 {pandas._tseries.group_add} 30 0.019 0.001 0.019 0.001 {numpy.core.multiarray.putmask} 20 0.016 0.001 0.016 0.001 {method 'astype' of 'numpy.ndarray' objects} I'm working on getting rid of the "compress" step which will save another 30% in this single-key groupby case, unfortunately with the way I have the code factored it's not completely trivial (this is all very bleeding-edge pandas, forthcoming in 0.7.0 final) - Wes 
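For reference, the same per-label sums can also be written with np.bincount, which avoids the argsort altogether. A minimal sketch (it assumes the labels are non-negative integers; the helper name is only illustrative and is not part of pandas or of the code above):

import numpy as np

def bincount_groupby(data, labels):
    # one bincount per column: entry k of the result is the sum of
    # data[labels == k, j], accumulated without sorting the rows
    nlabels = labels.max() + 1
    out = np.empty((nlabels, data.shape[1]))
    for j in range(data.shape[1]):
        out[:, j] = np.bincount(labels, weights=data[:, j], minlength=nlabels)
    return out

Whether this beats the reduceat or the hash-based grouping depends on the number of columns and distinct labels, so it is offered only as a point of comparison.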
From k-assem84 at hotmail.com Tue Feb 7 09:39:18 2012 From: k-assem84 at hotmail.com (suzana8447) Date: Tue, 7 Feb 2012 06:39:18 -0800 (PST) Subject: [SciPy-User] [SciPy-user] How to use Least square fit to fit three functions Message-ID: <33278985.post@talk.nabble.com> Hello, I would really appreciate it if someone could suggest how to use least square fit (scipy) to fit three functions at the same time, because I looked at the website of scipy's least square: http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html and I realized that leastsq takes only one function as an argument. Is there any way to let leastsq fit three functions for example? Thanks in advance. -- View this message in context: http://old.nabble.com/How-to-use-Least-square-fit-to-fit-three-functions-tp33278985p33278985.html Sent from the Scipy-User mailing list archive at Nabble.com. From tyler104 at gmail.com Tue Feb 7 18:58:40 2012 From: tyler104 at gmail.com (Tyler) Date: Tue, 7 Feb 2012 18:58:40 -0500 Subject: [SciPy-User] Repeated measures scipy.stats support? Message-ID: I have the following question, and am wondering if it's possible to solve using existing scipy.stats, or anything else that anyone knows about =) If I have n subjects that I have an Active-Control measurement for, for instance "(Minutes it takes to complete 1st lap) - (Minutes it takes to complete second lap)", and each subject runs 1-4 times, how do I properly calculate a p value for "Is one lap faster than the other?" Example data:

Subject#   1stLap-2ndLap Minutes
1           .3
1          -.1
2           .2
2           .4
2          -.3
3           .6
4           .2
4          -.2
4           .1
4           .6
5           .5
5          -.4
6           .2
6           .1

-Thanks a million -Tyler From josef.pktd at gmail.com Wed Feb 8 09:21:38 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Feb 2012 09:21:38 -0500 Subject: [SciPy-User] [SciPy-user] How to use Least square fit to fit three functions In-Reply-To: <33278985.post@talk.nabble.com> References: <33278985.post@talk.nabble.com> Message-ID: On Tue, Feb 7, 2012 at 9:39 AM, suzana8447 wrote: > > Hello, > > I would really appreciate it if someone could suggest how to use least square fit > (scipy) to fit three functions at the same time, because I looked at the > website of scipy's least square: > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html > > and I realized that leastsq takes only one function as an argument. Is there > any way to let leastsq fit three functions for example? Stack them into one array: have the residual function return the residuals of all three functions concatenated into a single 1-D array, and leastsq will minimize their combined sum of squares. If you need different weights for the 3 functions, then the easiest would be to use optimize.curve_fit. Josef > > Thanks in advance. > -- > View this message in context: http://old.nabble.com/How-to-use-Least-square-fit-to-fit-three-functions-tp33278985p33278985.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From lists at hilboll.de Thu Feb 9 07:52:35 2012 From: lists at hilboll.de (Andreas H.)
Date: Thu, 9 Feb 2012 13:52:35 +0100 Subject: [SciPy-User] ask.scipy.org registration not working Message-ID: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> Hi, I'm running Chromium 16.0.912.77 (Developer Build 118311 Linux) Ubuntu 10.04. When registering for ask.scipy.org, I always get the error message "You entered an invalid captcha". However, I don't see any captcha where I could enter anything. The ReCaptcha widget is just not shown. Trying the same with Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.52, I do see the ReCaptcha widget. However, when I enter a correct captcha and click "Register", I'm running into some sort of timeout after about a minute or two. The error message is Proxy Error The proxy server received an invalid response from an upstream server. The proxy server could not handle the request POST /login. Reason: Error reading from remote server Apache/2.2.3 (CentOS) Server at ask.scipy.org Port 80 Cheers, Andreas. From hturesson at gmail.com Thu Feb 9 09:01:07 2012 From: hturesson at gmail.com (Hjalmar Turesson) Date: Thu, 9 Feb 2012 09:01:07 -0500 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: I have same problem (invalid captcha) using both Chromium and Firefox 10. After a half year of trying I still have not been able to log in. Best, Hjalmar On Thu, Feb 9, 2012 at 7:52 AM, Andreas H. wrote: > Hi, > > I'm running Chromium 16.0.912.77 (Developer Build 118311 Linux) Ubuntu > 10.04. When registering for ask.scipy.org, I always get the error message > "You entered an invalid captcha". However, I don't see any captcha where I > could enter anything. The ReCaptcha widget is just not shown. > > Trying the same with Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 > Version/11.52, I do see the ReCaptcha widget. However, when I enter a > correct captcha and click "Register", I'm running into some sort of > timeout after about a minute or two. The error message is > > Proxy Error > The proxy server received an invalid response from an upstream server. > The proxy server could not handle the request POST /login. > Reason: Error reading from remote server > Apache/2.2.3 (CentOS) Server at ask.scipy.org Port 80 > > Cheers, > Andreas. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwatford at gmail.com Thu Feb 9 09:35:30 2012 From: kwatford at gmail.com (Ken Watford) Date: Thu, 9 Feb 2012 09:35:30 -0500 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: Look at the current list of questions. Nobody has asked a question there since 2010. I would recommend asking your questions on one of the StackExchange sites (probably StackOverflow). If there's demand for a separate "math and science with python" site, then one could be proposed at http://area51.stackexchange.com/ The lack of activity at ask.scipy could be due to problems with the site (can't sign up, hard to sign in, no search feature), but there might just not be enough interest. 
The Area51 proposal process could help determine that, assuming the proposal was advertised in appropriate places. On Thu, Feb 9, 2012 at 9:01 AM, Hjalmar Turesson wrote: > I have same problem (invalid captcha) using both Chromium and Firefox 10. > After a half year of trying I still have not been able to log in. > > Best, > Hjalmar > > > On Thu, Feb 9, 2012 at 7:52 AM, Andreas H. wrote: >> >> Hi, >> >> I'm running Chromium 16.0.912.77 (Developer Build 118311 Linux) Ubuntu >> 10.04. When registering for ask.scipy.org, I always get the error message >> "You entered an invalid captcha". However, I don't see any captcha where I >> could enter anything. The ReCaptcha widget is just not shown. >> >> Trying the same with Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 >> Version/11.52, I do see the ReCaptcha widget. However, when I enter a >> correct captcha and click "Register", I'm running into some sort of >> timeout after about a minute or two. The error message is >> >> ? Proxy Error >> ? The proxy server received an invalid response from an upstream server. >> ? The proxy server could not handle the request POST /login. >> ? Reason: Error reading from remote server >> ? Apache/2.2.3 (CentOS) Server at ask.scipy.org Port 80 >> >> Cheers, >> Andreas. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Thu Feb 9 10:09:45 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Feb 2012 15:09:45 +0000 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Thu, Feb 9, 2012 at 12:52, Andreas H. wrote: > Hi, > > I'm running Chromium 16.0.912.77 (Developer Build 118311 Linux) Ubuntu > 10.04. When registering for ask.scipy.org, I always get the error message > "You entered an invalid captcha". However, I don't see any captcha where I > could enter anything. The ReCaptcha widget is just not shown. > > Trying the same with Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 > Version/11.52, I do see the ReCaptcha widget. However, when I enter a > correct captcha and click "Register", I'm running into some sort of > timeout after about a minute or two. The error message is > > ? Proxy Error > ? The proxy server received an invalid response from an upstream server. > ? The proxy server could not handle the request POST /login. > ? Reason: Error reading from remote server > ? Apache/2.2.3 (CentOS) Server at ask.scipy.org Port 80 Honestly, I'm not sure how to log into the machine anymore to check the configuration. The sysadmin who set the box up has since departed. The site never garnered much interest, and I haven't been able to solve its login problems. I hereby declare ask.scipy.org dead. Please ask your questions on this mailing list or StackOverflow. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From jason-sage at creativetrax.com Thu Feb 9 10:42:42 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Thu, 09 Feb 2012 09:42:42 -0600 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: <4F33E972.9010301@creativetrax.com> On 2/9/12 8:35 AM, Ken Watford wrote: > Look at the current list of questions. Nobody has asked a question > there since 2010. > > I would recommend asking your questions on one of the StackExchange > sites (probably StackOverflow). > > If there's demand for a separate "math and science with python" site, > then one could be proposed at http://area51.stackexchange.com/ > > The lack of activity at ask.scipy could be due to problems with the > site (can't sign up, hard to sign in, no search feature), but there > might just not be enough interest. The Area51 proposal process could > help determine that, assuming the proposal was advertised in > appropriate places. +1 to a numpy/scipy/matplotlib stackexchange-type site. We've had a lot of success with ask.sagemath.org (running on the askbot [1] software, which is nice since it is open-source and you can host it and modify it if you want). It looks like a python stackexchange site was just proposed: http://area51.stackexchange.com/proposals/38281/python-programmers-hub Thanks, Jason [1] http://askbot.com/ From robert.kern at gmail.com Thu Feb 9 11:08:55 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Feb 2012 16:08:55 +0000 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: <4F33E972.9010301@creativetrax.com> References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> <4F33E972.9010301@creativetrax.com> Message-ID: On Thu, Feb 9, 2012 at 15:42, Jason Grout wrote: > It looks like a python stackexchange site was just proposed: > http://area51.stackexchange.com/proposals/38281/python-programmers-hub And immediately shot down. StackOverflow wants all programming questions on StackOverflow. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From jason-sage at creativetrax.com Thu Feb 9 11:38:37 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Thu, 09 Feb 2012 10:38:37 -0600 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> <4F33E972.9010301@creativetrax.com> Message-ID: <4F33F68D.9000304@creativetrax.com> On 2/9/12 10:08 AM, Robert Kern wrote: > On Thu, Feb 9, 2012 at 15:42, Jason Grout wrote: > >> It looks like a python stackexchange site was just proposed: >> http://area51.stackexchange.com/proposals/38281/python-programmers-hub > > And immediately shot down. StackOverflow wants all programming > questions on StackOverflow. > However, there is a mathematica stackexchange proposal: http://area51.stackexchange.com/proposals/37304/mathematica So it seems that they might be open to sites that are dedicated to using specific software. Or is it just as good for people to tag posts with numpy, scipy, or matplotlib on the stackoverflow site? That seems reasonable too. Already there are 1313 matplotlib questions (7 today), 2062 numpy questions (5 asked today), 703 scipy questions (10 asked this week), and 275 ipython questions. 
It is easy to "follow" these questions too---just search for the tag in the tags area, click on the tag, and click "subscribe". At least, I think that's how it works. It looks like lots of people here might already be following those areas. Thanks, Jason From eirik.gjerlow at astro.uio.no Thu Feb 9 04:54:05 2012 From: eirik.gjerlow at astro.uio.no (=?ISO-8859-1?Q?Eirik_Gjerl=F8w?=) Date: Thu, 09 Feb 2012 10:54:05 +0100 Subject: [SciPy-User] Numpy array slicing Message-ID: <4F3397BD.1000206@uio.no> Hello, this is (I think) a rather basic question about numpy slicing. I have the following code: In [29]: a.shape Out[29]: (3, 4, 12288, 2) In [30]: mask.shape Out[30]: (3, 12288) In [31]: mask.dtype Out[31]: dtype('bool') In [32]: sum(mask[0]) Out[32]: 12285 In [33]: a[[0] + [slice(None)] + [mask[0]] + [slice(None)]].shape Out[33]: (12285, 4, 2) My question is: Why is not the final shape (4, 12285, 2) instead of (12285, 4, 2)? Eirik Gjerl?w From kwatford at gmail.com Thu Feb 9 11:55:43 2012 From: kwatford at gmail.com (Ken Watford) Date: Thu, 9 Feb 2012 11:55:43 -0500 Subject: [SciPy-User] ask.scipy.org registration not working In-Reply-To: References: <63b77ef8929b228e5c125438645264c4.squirrel@srv2.s4y.tournesol-consulting.eu> <4F33E972.9010301@creativetrax.com> Message-ID: On Thu, Feb 9, 2012 at 11:08 AM, Robert Kern wrote: > On Thu, Feb 9, 2012 at 15:42, Jason Grout wrote: > >> It looks like a python stackexchange site was just proposed: >> http://area51.stackexchange.com/proposals/38281/python-programmers-hub > > And immediately shot down. StackOverflow wants all programming > questions on StackOverflow. To be fair, that proposal was "The same as StackOverflow, but just for Python". I noticed there's a Computational Science stack: http://scicomp.stackexchange.com/ Its "python" tag seems a tad underrepresented, but it's still in the beta phase. It seems to have the same sort of focus as ask.scipy.org, just without the "but just for Python" requirement. Perhaps this one could be useful for scipy questions? From guziy.sasha at gmail.com Thu Feb 9 12:13:19 2012 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Thu, 9 Feb 2012 12:13:19 -0500 Subject: [SciPy-User] Numpy array slicing In-Reply-To: <4F3397BD.1000206@uio.no> References: <4F3397BD.1000206@uio.no> Message-ID: Hi, what will happen if you do: a[[0] + [slice(None)] + [[mask[0]]] + [slice(None)]].shape please, give the code to generate the mask and the array, so we could test our hypotheses. thanks -- Oleksandr Huziy 2012/2/9 Eirik Gjerl?w : > Hello, this is (I think) a rather basic question about numpy slicing. I > have the following code: > > In [29]: a.shape > Out[29]: (3, 4, 12288, 2) > > In [30]: mask.shape > Out[30]: (3, 12288) > > In [31]: mask.dtype > Out[31]: dtype('bool') > > In [32]: sum(mask[0]) > Out[32]: 12285 > > In [33]: a[[0] + [slice(None)] + [mask[0]] + [slice(None)]].shape > Out[33]: (12285, 4, 2) > > My question is: Why is not the final shape (4, 12285, 2) instead of > (12285, 4, 2)? > > Eirik Gjerl?w > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From benjamin.hause at colorado.edu Thu Feb 9 12:37:32 2012 From: benjamin.hause at colorado.edu (Benjamin) Date: Thu, 9 Feb 2012 17:37:32 +0000 (UTC) Subject: [SciPy-User] [F2py] basic help Message-ID: I have a fortran 77 code (which I did not write, but currently runs fine) and I would like to create a 'front end' for it with python. 
As a first step, I would like to simply be able to call this fortran code from python (ideally it would just be one call and I pass in the input as paramaters). Based on the user guide (http://cens.ioc.ee/projects/f2py2e/usersguide/) section 2.1 'the quick way' I have entered the command: f2py -c *.f -m slab but I get errors. The output is really long so I won't post it here, but it looks like it does fine until it tries to compile the code. Some errors/warnings are at the bottom of this post. Note that the only changes I made to the main fortran program was making it a subroutine and passing the input as parameters. I was hoping this would be fairly trivial for me, like it was for this guy: http://moo.nac.uci.edu/~hjm/fd_rrt1d/index.html Any advice is appreciated. Ben 446 if (biz.gt.3.75) go to 148 1 Warning: Label 446 at (1) defined but not used gfortran:f77: slab.f Warning: Nonconforming tab character in column 1 of line 221 slab.h:13.24: Included at slab.f:14: parameter(mm=vm*im*jm) 1 Warning: Possible change of value in conversion from REAL(4) to INTEGER(4) at (1) slab.h:44.38: Included at slab.f:14: common/cc/xg,yg,vg,dx,dy,lx,ly, 1 Error: COMMON attribute conflicts with DUMMY attribute in 'lx' at (1) From pav at iki.fi Thu Feb 9 13:24:54 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 09 Feb 2012 19:24:54 +0100 Subject: [SciPy-User] [F2py] basic help In-Reply-To: References: Message-ID: 09.02.2012 18:37, Benjamin kirjoitti: > I have a fortran 77 code (which I did not write, but currently runs fine) and I > would like to create a 'front end' for it with python. As a first step, I would > like to simply be able to call this fortran code from python (ideally it would > just be one call and I pass in the input as paramaters). > > Based on the user guide (http://cens.ioc.ee/projects/f2py2e/usersguide/) section > 2.1 'the quick way' I have entered the command: > f2py -c *.f -m slab The errors you get seem like the fortran files themselves don't compile. If so, it's not a f2py-related issue. Check that you can compile the Fortran source files separately: gfortran -o slab.o slab.f From wesmckinn at gmail.com Thu Feb 9 18:13:29 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 9 Feb 2012 18:13:29 -0500 Subject: [SciPy-User] ANN: pandas 0.7.0 release Message-ID: Dear all, I'm extremely pleased (and relieved!) to announce the release of pandas 0.7.0! This is the largest single release in the last year, spanning 210 GitHub issues and pull requests with 563 commits from 17 unique authors. It brings with it a wealth of new functionality, performance improvements, bug fixes, and a handful of very minor API changes. I recommend that all users upgrade to the new release as soon as you can. Here are a few highlights of the release: * Completely revamped, high-performance merge/join infrastructure. Full support for all SQL-style joins. Fastest open source implementation I am aware of. 
* New unified concat function for easily concatenating pandas objects * Better pivot table and cross-tabulation functionality * Numerous new Series and DataFrame instance methods * Substantially improved performance of GroupBy operations * Excel 2007 read/write support * Much better unicode handling on both Python 2 and 3 * Improved console DataFrame formatting * More than 70 bug fixes * Numerous other performance and infrastructural improvements This release also coincides with the creation of a new tool, vbench (https://github.com/wesm/vbench), for systematically monitoring the performance of Python code (in this case pandas) over time. There are now 57 vbenchmarks being tracked with more added all the time (http://pandas.pydata.org/pandas-docs/vbench/). This will help ensure that pandas remains a high performance library in addition to being robust and stable for production application development. pandas has a new project front page at http://pandas.pydata.org. The main repository has also been moved to the newly created PyData organization on GitHub (http://github.com/pydata). Windows binaries are available on PyPI, and .deb binaries will be available in Debian sid and NeuroDebian soon thanks to Yaroslav Halchenko! See the "What's New" page and full release notes for now. Thanks to everyone who contributed to the release! Tons more planned for pandas in 2012 on the road toward a 1.0 release. Looking forward to working with the community to make the library even better! Best, Wes What is it ========== pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with ?relational? or ?labeled? data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Links ===== Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst Documentation: http://pandas.pydata.org Installers: http://pypi.python.org/pypi/pandas Code Repository: http://github.com/pydata/pandas Mailing List: http://groups.google.com/group/pystatsmodels Blog: http://blog.wesmckinney.com From alec.kalinin at gmail.com Fri Feb 10 04:27:32 2012 From: alec.kalinin at gmail.com (Alexander Kalinin) Date: Fri, 10 Feb 2012 12:27:32 +0300 Subject: [SciPy-User] Element-wise multiplication performance: why operations on sliced matrices are faster? Message-ID: I found an interesting fact about element-wise matrix multiplication performance. I expected that full vectorized NumPy operations should always be faster then the loops. But it is not always true. Look at the code: import numpy as np import time def calc(P): for i in range(30): P2 = P * P # full matrix N = 4000 P = np.random.rand(N, N) t0 = time.time() calc(P) t1 = time.time() print " full matrix {:.5f} seconds".format(t1 - t0) # sliced matrix N = 2000 P = np.random.rand(N, N) t0 = time.time() for i in range(4): calc(P) t1 = time.time() print " sliced matrix {:.5f} seconds".format(t1 - t0) The results are: full matrix 2.60245 seconds sliced matrix 1.49381 seconds I continue study of this case and found that the performance depends on matrix size. Look at the attached plot. The x-axis is the dimension of matrices, the y-axis is the execution time. Red line are the full matrix executions times, blue line are the sliced matrix execution times. The plot shows that the 2000 is the critical dimension that cause performance degradation step. Could you, please, explain me this fact? 
My configuration: OS: Ubuntu 11.10 (oneiric) CPU: Intel(R) Core(TM) i5 CPU M 480 @ 2.67GHz CPUs: 4 Memory: 3746 MiB L2 cache: 3072 KB -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fig1.png Type: image/png Size: 35984 bytes Desc: not available URL: From matthieu.brucher at gmail.com Fri Feb 10 04:45:04 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 10 Feb 2012 10:45:04 +0100 Subject: [SciPy-User] Element-wise multiplication performance: why operations on sliced matrices are faster? In-Reply-To: References: Message-ID: Hi, It is not vectrorizd per se. It is a C loop, which is faster than Python ones. The full matrix has a jump at half the size of the sliced matrix. My guess is that it is a simple cache effect: with the full matrix, everything fits in cache at 2000, but not at 4000, but for a "sliced" matrix (it is not a real sliced matrix), 4000 means that the "slices" are the same size as the full matrix at 2000. What you see is just the blue line being twice as slow as the red one + the Python for loop overhead. Matthieu 2012/2/10 Alexander Kalinin > I found an interesting fact about element-wise matrix multiplication > performance. I expected that full vectorized NumPy operations should always > be faster then the loops. But it is not always true. Look at the code: > > import numpy as np > > import time > > > def calc(P): > > for i in range(30): > > P2 = P * P > > # full matrix > > N = 4000 > > P = np.random.rand(N, N) > > t0 = time.time() > > calc(P) > > t1 = time.time() > > print " full matrix {:.5f} seconds".format(t1 - t0) > > > # sliced matrix > > N = 2000 > > P = np.random.rand(N, N) > > t0 = time.time() > > for i in range(4): > > calc(P) > > t1 = time.time() > > print " sliced matrix {:.5f} seconds".format(t1 - t0) > > The results are: > full matrix 2.60245 seconds > sliced matrix 1.49381 seconds > > > I continue study of this case and found that the performance depends on > matrix size. Look at the attached plot. The x-axis is the dimension of > matrices, the y-axis is the execution time. Red line are the full matrix > executions times, blue line are the sliced matrix execution times. The plot > shows that the 2000 is the critical dimension that cause performance > degradation step. Could you, please, explain me this fact? > > My configuration: > OS: Ubuntu 11.10 (oneiric) > CPU: Intel(R) Core(TM) i5 CPU M 480 @ 2.67GHz > CPUs: 4 > Memory: 3746 MiB > L2 cache: 3072 KB > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Fri Feb 10 10:10:46 2012 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 10 Feb 2012 17:10:46 +0200 Subject: [SciPy-User] [ANN] new solver for multiobjective optimization problems Message-ID: <87834.1328886646.5803578716629106688@ffe16.ukr.net> hi, I'm glad to inform you about new Python solver for multiobjective optimization (MOP). Some changes committed to solver interalg made it capable of handling global nonlinear constrained multiobjective problem (MOP), see the page for more details. 
> > Using interalg you can be 100% sure your result covers whole Pareto front according to the required tolerances on objective functions. > > Available features include real-time or final graphical output, possibility of involving parallel calculations, handling both continuous and discrete variables, export result to xls files. > > Regards, D. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirik.gjerlow at astro.uio.no Thu Feb 9 12:19:52 2012 From: eirik.gjerlow at astro.uio.no (=?UTF-8?B?RWlyaWsgR2plcmzDuHc=?=) Date: Thu, 09 Feb 2012 18:19:52 +0100 Subject: [SciPy-User] Numpy array slicing In-Reply-To: References: <4F3397BD.1000206@uio.no> Message-ID: <4F340038.5010406@uio.no> Hey, I asked this at numpy-discussion as well, and it seems the issue is that I am mixing advanced indexing and basic slicing, and the way numpy handles this: http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060232.html Anyway, there should be enough information to reproduce the problem. a can be any numpy array with the given shape, so a = np.zeros((3, 4, 12288, 2)) mask = np.zeros((3, 12288), dtype='bool') mask[:] = True mask[0, [3, 7, 9, 15, 20]] = False for instance. Thanks for replying! Eirik On 09. feb. 2012 18:13, Oleksandr Huziy wrote: > Hi, what will happen if you do: > > a[[0] + [slice(None)] + [[mask[0]]] + [slice(None)]].shape > > please, give the code to generate the mask and the array, so we could > test our hypotheses. > thanks > > -- > Oleksandr Huziy > > 2012/2/9 Eirik Gjerl?w: >> Hello, this is (I think) a rather basic question about numpy slicing. I >> have the following code: >> >> In [29]: a.shape >> Out[29]: (3, 4, 12288, 2) >> >> In [30]: mask.shape >> Out[30]: (3, 12288) >> >> In [31]: mask.dtype >> Out[31]: dtype('bool') >> >> In [32]: sum(mask[0]) >> Out[32]: 12285 >> >> In [33]: a[[0] + [slice(None)] + [mask[0]] + [slice(None)]].shape >> Out[33]: (12285, 4, 2) >> >> My question is: Why is not the final shape (4, 12285, 2) instead of >> (12285, 4, 2)? >> >> Eirik Gjerl?w >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From andrew_giessel at hms.harvard.edu Fri Feb 10 22:11:46 2012 From: andrew_giessel at hms.harvard.edu (Andrew Giessel) Date: Fri, 10 Feb 2012 22:11:46 -0500 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? Message-ID: Hello all, I'm looking to down-sample an image by averaging. I quickly hacked up the following code, which does exactly what I want, but the double loop is slow (the images I'm working with are ~2000x2000 pixels). Is there a nice way to vectorize this? A quick profile showed that most of the time is spend averaging- perhaps there is a way to utilize np.sum or np.cumsum, divide the whole array, and then take every so many pixels? This method of down-sampling (spatial averaging) makes sense for the type of data I'm using and yields good results, but I'm also open to alternatives. Thanks in advance! 
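An aside on the Numpy array slicing question above: the boolean mask is an advanced index, and combining it with the other indices in a single indexing operation triggers NumPy's rule that moves the advanced-index dimensions to the front of the result (the numpy-discussion post linked in that thread walks through the details). A minimal sketch, using the shapes from that thread, of indexing in two steps so the masked axis stays where it was:

import numpy as np

a = np.zeros((3, 4, 12288, 2))
mask = np.ones((3, 12288), dtype=bool)
mask[0, [3, 7, 9, 15, 20]] = False     # 12283 True entries remain in mask[0]

# Apply the integer index first, then the boolean mask on its own:
b = a[0][:, mask[0], :]
print(b.shape)                         # (4, 12283, 2), masked axis stays in place

# Equivalent, using compress along the masked axis:
c = np.compress(mask[0], a[0], axis=1)
print(c.shape)                         # (4, 12283, 2)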
Andrew ###################### import numpy as np def downsample(array, reduction): """example call for 2fold size reduction: newImage = downsample(image, 2)""" newArray = np.empty(array.shape[0]/reduction, array.shape[1]/reduction) for x in range(newArray.shape[0]): for y in range(newArray.shape[1]): newArray[x,y] = np.mean(array[x*reduction:((x+1)*reduction)-1, y*reduction:((y+1)*reduction)-1]) return newArray ###################### -- Andrew Giessel, PhD Department of Neurobiology, Harvard Medical School 220 Longwood Ave Boston, MA 02115 ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.giessel at gmail.com Fri Feb 10 22:19:01 2012 From: andrew.giessel at gmail.com (andrew giessel) Date: Fri, 10 Feb 2012 22:19:01 -0500 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: Hello all, I'm looking to down-sample an image by averaging. I quickly hacked up the following code, which does exactly what I want, but the double loop is slow (the images I'm working with are ~2000x2000 pixels). Is there a nice way to vectorize this? A quick profile showed that most of the time is spend averaging- perhaps there is a way to utilize np.sum or np.cumsum, divide the whole array, and then take every so many pixels? This method of down-sampling (spatial averaging) makes sense for the type of data I'm using and yields good results, but I'm also open to alternatives. Thanks in advance! Andrew ###################### import numpy as np def downsample(array, reduction): """example call for 2fold size reduction: newImage = downsample(image, 2)""" newArray = np.empty(array.shape[0]/reduction, array.shape[1]/reduction) for x in range(newArray.shape[0]): for y in range(newArray.shape[1]): newArray[x,y] = np.mean(array[x*reduction:((x+1)*reduction)-1, y*reduction:((y+1)*reduction)-1]) return newArray ###################### -- Andrew Giessel, PhD Department of Neurobiology, Harvard Medical School 220 Longwood Ave Boston, MA 02115 ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkhilmer at chemistry.montana.edu Sat Feb 11 01:08:08 2012 From: jkhilmer at chemistry.montana.edu (jkhilmer at chemistry.montana.edu) Date: Fri, 10 Feb 2012 23:08:08 -0700 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: Andrew, This is a very naive response, since I don't have nearly the experience as many of the other contributors here. Since you're working with images, and you might want more complicated variants of this process in the future, why not convolve your array and slice the output with a stride/step. For the sizes you're talking about, memory shouldn't be a concern. That should give you a very flexible procedure that is inherently "vectorized". Jonathan On Fri, Feb 10, 2012 at 8:19 PM, andrew giessel wrote: > Hello all, > > I'm looking to down-sample an image by averaging. ?I quickly hacked up the > following code, which does exactly what I want, but the double loop is slow > (the images I'm working with are ~2000x2000 pixels). ?Is there a nice way to > vectorize this? ?A quick profile showed that most of the time is spend > averaging- perhaps there is a way to utilize np.sum or np.cumsum, divide the > whole array, and then take every so many pixels? 
> > This method of down-sampling (spatial averaging) makes sense for the type of > data I'm using and yields good results, but I'm also open to alternatives. > ?Thanks in advance! > > Andrew > > ###################### > import numpy as np > > def downsample(array, reduction): > ? ? """example call for 2fold size reduction: ?newImage = downsample(image, > 2)""" > > ? ? newArray = np.empty(array.shape[0]/reduction,?array.shape[1]/reduction) > > ? ? for x in range(newArray.shape[0]): > ? ? ? ? for y in range(newArray.shape[1]): > ? ? ? ? ? ? newArray[x,y] = > np.mean(array[x*reduction:((x+1)*reduction)-1,?y*reduction:((y+1)*reduction)-1]) > > ? ? return newArray > ###################### > > -- > Andrew Giessel, PhD > > Department of Neurobiology, Harvard Medical School > 220 Longwood Ave Boston, MA 02115 > ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From Jerome.Kieffer at esrf.fr Sat Feb 11 03:53:46 2012 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Sat, 11 Feb 2012 09:53:46 +0100 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: <20120211095346.1c178e5d.Jerome.Kieffer@esrf.fr> Hi, Your needs are very close to "binning" (just devide the result by the number of input pixel in each output): def binning(inputArray, binsize): """ @param inputArray: input ndarray @param binsize: int or 2-tuple representing the size of the binning @return: binned input ndarray """ inputSize = inputArray.shape outputSize = [] assert(len(inputSize) == 2) if isinstance(binsize, int): binsize = (binsize, binsize) for i, j in zip(inputSize, binsize): assert(i % j == 0) outputSize.append(i // j) if numpy.array(binsize).prod() < 50: out = numpy.zeros(tuple(outputSize)) for i in xrange(binsize[0]): for j in xrange(binsize[1]): out += inputArray[i::binsize[0], j::binsize[1]] else: temp = inputArray.copy() temp.shape = (outputSize[0], binsize[0], outputSize[1], binsize[1]) out = temp.sum(axis=3).sum(axis=1) return out This function implements 2 methods: - one faster for small binning based on a loop and a sum of all elements - one for larger binning (8x8 and +) based on a reshape and two sum on two different axis HTH -- J?r?me Kieffer Data analysis unit - ESRF From ralf.gommers at googlemail.com Sat Feb 11 09:11:06 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 11 Feb 2012 15:11:06 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 Message-ID: Hi all, I am pleased to announce the availability of the first release candidate of SciPy 0.10.1. Please try out this release and report any problems on the scipy-dev mailing list. If no problems are found, the final release will be available in one week. Sources and binaries can be found at http://sourceforge.net/projects/scipy/files/scipy/0.10.1rc1/, release notes are copied below. Cheers, Ralf ========================== SciPy 0.10.1 Release Notes ========================== .. contents:: SciPy 0.10.1 is a bug-fix release with no new features compared to 0.10.0. Main changes ------------ The most important changes are:: 1. The single precision routines of ``eigs`` and ``eigsh`` in ``scipy.sparse.linalg`` have been disabled (they internally use double precision now). 2. 
A compatibility issue related to changes in NumPy macros has been fixed, in order to make scipy 0.10.1 compile with the upcoming numpy 1.7.0 release. Other issues fixed ------------------ - #835: stats: nan propagation in stats.distributions - #1202: io: netcdf segfault - #1531: optimize: make curve_fit work with method as callable. - #1560: linalg: fixed mistake in eig_banded documentation. - #1565: ndimage: bug in ndimage.variance - #1457: ndimage: standard_deviation does not work with sequence of indexes - #1562: cluster: segfault in linkage function - #1568: stats: One-sided fisher_exact() returns `p` < 1 for 0 successful attempts - #1575: stats: zscore and zmap handle the axis keyword incorrectly -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Sat Feb 11 10:40:40 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Sat, 11 Feb 2012 10:40:40 -0500 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: On Fri, Feb 10, 2012 at 10:11 PM, Andrew Giessel < andrew_giessel at hms.harvard.edu> wrote: > Hello all, > > I'm looking to down-sample an image by averaging. I quickly hacked up the > following code, which does exactly what I want, but the double loop is slow > (the images I'm working with are ~2000x2000 pixels). Is there a nice way > to vectorize this? A quick profile showed that most of the time is spend > averaging- perhaps there is a way to utilize np.sum or np.cumsum, divide > the whole array, and then take every so many pixels? > > This method of down-sampling (spatial averaging) makes sense for the type > of data I'm using and yields good results, but I'm also open to > alternatives. Thanks in advance! > > Andrew > > ###################### > import numpy as np > > def downsample(array, reduction): > """example call for 2fold size reduction: newImage = > downsample(image, 2)""" > > newArray = np.empty(array.shape[0]/reduction, array.shape[1]/reduction) > > for x in range(newArray.shape[0]): > for y in range(newArray.shape[1]): > newArray[x,y] = > np.mean(array[x*reduction:((x+1)*reduction)-1, y*reduction:((y+1)*reduction)-1]) > > return newArray > ###################### > > I think `scipy.ndimage.zoom` does what you want. Or actually, it does the opposite: your 2fold size reduction example would be >>> from scipy import ndimage >>> small_image = ndimage.zoom(image, 0.5) -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From emmanuelle.gouillart at normalesup.org Sat Feb 11 10:30:02 2012 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Sat, 11 Feb 2012 16:30:02 +0100 Subject: [SciPy-User] Euroscipy 2012 - Brussels - August 23-37 - call for abstracts Message-ID: <20120211153002.GA3217@phare.normalesup.org> ------------------------------------------------------------- Euroscipy 2012, the 5th European meeting on Python in Science ------------------------------------------------------------- It is our pleasure to announce the conference Euroscipy 2012, that will be held in **Brussels**, **August 23-27**, at the Universit? Libre de Bruxelles (ULB, Solbosch Campus). The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research and industry. This event strives to bring together both users and developers of scientific tools, as well as academic research and state of the art industry. 
Website ======= http://www.euroscipy.org/conference/euroscipy2012 Main topics =========== - Presentations of scientific tools and libraries using the Python language, including but not limited to: - vector and array manipulation - parallel computing - scientific visualization - scientific data flow and persistence - algorithms implemented or exposed in Python - web applications and portals for science and engineering. - Reports on the use of Python in scientific achievements or ongoing projects. - General-purpose Python tools that can be of special interest to the scientific community. Tutorials ========= There will be two tutorial tracks at the conference, an introductory one, to bring up to speed with the Python language as a scientific tool, and an advanced track, during which experts of the field will lecture on specific advanced topics such as advanced use of numpy, paralllel computing, advanced testing... Keynote Speaker: David Beazley ============================== This year, we are very happy to welcome David Beazley (http://www.dabeaz.com) as our keynote speaker. David is the original author of SWIG, a software development tool that connects programs written in C and C++ with a variety of high-level programming languages such as Python. He has also authored the acclaimed Python Essential Reference. Important dates =============== Talk submission deadline: Mon Apr 30, 2012 Program announced: end of May Tutorials tracks: Thursday August 23 - Friday August 24, 2012 Conference track: Saturday August 25 - Sunday August 26, 2012 Satellites: Monday August 27 Satellite meetings are yet to be announced. Call for talks and posters ========================== We are soliciting talks and posters that discuss topics related to scientific computing using Python. These include applications, teaching, future development directions, and research. We welcome contributions from the industry as well as the academic world. Indeed, industrial research and development as well academic research face the challenge of mastering IT tools for exploration, modeling and analysis. We look forward to hearing your recent breakthroughs using Python! Submission guidelines ===================== - We solicit proposals in the form of a **one-page long abstract**. - Submissions whose main purpose is to promote a commercial product or service will be refused. - All accepted proposals must be presented at the EuroSciPy conference by at least one author. Abstracts should be detailed enough for the reviewers to appreciate the interest of the work for a wide audience. Examples of abstracts can be found on last year's webpage www.euroscipy.org/track/3992 (talks tab). The one-page long abstracts are for conference planning and selection purposes only. How to submit an abstract ========================= To submit a talk to the EuroScipy conference follow the instructions here: http://www.euroscipy.org/card/euroscipy2012_call_for_contributions Organizers ========== Chairs: - Pierre de Buyl - Didrik Pinte Local organizing committee - Kael Hanson - Nicolas Pettiaux Program committee - Tiziano Zito (Chair) - Ga?l Varoquaux - Stefan Van Der Walt - Konrad Hinsen - Emmanuelle Gouillart - Mike M?ller - Hans Petter Langtangen - Pierre de Buyl - Kael Hanson Tutorials chair: Valentin Haenel General organizing committee - Communication: Emmanuelle Gouillart - Sponsoring: Mike M?ller. - Web site: Nicolas Chauvat. Still have questions? 
===================== send an e-mail to org-team at lists.euroscipy.org 94,1 -- Emmanuelle, for the organizing team From deshpande.jaidev at gmail.com Sat Feb 11 11:15:28 2012 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Sat, 11 Feb 2012 21:45:28 +0530 Subject: [SciPy-User] Euroscipy 2012 - Brussels - August 23-37 - call for abstracts In-Reply-To: <20120211153002.GA3217@phare.normalesup.org> References: <20120211153002.GA3217@phare.normalesup.org> Message-ID: Hi, Glad to see this. Any chance of travel grants? Thanks From andrew_giessel at hms.harvard.edu Sat Feb 11 14:23:32 2012 From: andrew_giessel at hms.harvard.edu (Andrew Giessel) Date: Sat, 11 Feb 2012 14:23:32 -0500 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: I'd like to thank everyone for their responses- they were really helpful in thinking about the problem. All the solutions people posted were faster than my original brutish algorithm, but all had subtle differences as well. None of them are 'vectorized' persay but all are more clever or effeicent ways of getting at the same problem. I thought I'd write a couple of quick comments. Convolution with a kernel is a good idea (one that I should have thought of since I've been working with various types of filtering recently) and works quite quickly. Depending on the type and size of the kernel it yields slightly different local averages at each point which can be sub-sampled to yield a smaller array. I didn't play with too many types of kernels but I'd say it's important to consider the nature of the kernel and the relation of the subsampling frequency to the size of the kernel in order to get the best results. The binning followed by scalar division is wicked fast and yields results that are very close to my original algorithm. The reshaping seems very clever and I am going to read it more carefully to learn some lessons there, I think. The ndimage.zoom approach is a very general approach (and roughly as quick as the others). As far as I can tell, that function uses spline interpolation for zoom factors > 1, and I'm unsure how it deals with zoom factors < 1. It might do nearest neighbor or something like that, I wasn't able to quickly determine from glancing at the source. If anyone knows, it would be cool to hear. I think I'll probably go with binning for now- We'll be dealing with hundreds of wide-field microscopy images (of this rough size) in every experiment and speed is a factor. Thanks again everyone! Best, Andrew On Sat, Feb 11, 2012 at 10:40, Tony Yu wrote: > > > On Fri, Feb 10, 2012 at 10:11 PM, Andrew Giessel < > andrew_giessel at hms.harvard.edu> wrote: > >> Hello all, >> >> I'm looking to down-sample an image by averaging. I quickly hacked up >> the following code, which does exactly what I want, but the double loop is >> slow (the images I'm working with are ~2000x2000 pixels). Is there a nice >> way to vectorize this? A quick profile showed that most of the time is >> spend averaging- perhaps there is a way to utilize np.sum or np.cumsum, >> divide the whole array, and then take every so many pixels? >> >> This method of down-sampling (spatial averaging) makes sense for the type >> of data I'm using and yields good results, but I'm also open to >> alternatives. Thanks in advance! 
>> >> Andrew >> ###################### >> import numpy as np >> >> def downsample(array, reduction): >> """example call for 2fold size reduction: newImage = >> downsample(image, 2)""" >> >> newArray = >> np.empty(array.shape[0]/reduction, array.shape[1]/reduction) >> >> for x in range(newArray.shape[0]): >> for y in range(newArray.shape[1]): >> newArray[x,y] = >> np.mean(array[x*reduction:((x+1)*reduction)-1, y*reduction:((y+1)*reduction)-1]) >> >> return newArray >> ###################### >> >> > I think `scipy.ndimage.zoom` does what you want. Or actually, it does the > opposite: your 2fold size reduction example would be > > >>> from scipy import ndimage > >>> small_image = ndimage.zoom(image, 0.5) > > -Tony > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Andrew Giessel, PhD Department of Neurobiology, Harvard Medical School 220 Longwood Ave Boston, MA 02115 ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Sat Feb 11 14:56:54 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 11 Feb 2012 14:56:54 -0500 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: Hi Andrew, > None of them are 'vectorized' persay but all are more clever or effeicent ways of getting at the same problem. I thought I'd write a couple of quick comments. It depends what you mean by "vectorized" -- none are using SIMD instructions on the chip, but from the Matlab/numpy perspective I think people often mean "vectorized" as "multiple data elements acted on by a single command" such as C = A + B, where A and B are matrices. In any case, the reshaping approach is "vectorized" according to the latter definition, which obviously really just means "the loops are pushed down into C"... > The binning followed by scalar division is wicked fast and yields results that are very close to my original algorithm. The reshaping seems very clever and I am going to read it more carefully to learn some lessons there, I think. For more information on reshaping to do decimation, see the "avoiding loops when downsampling arrays" thread on the numpy-discussion list from last week: https://groups.google.com/forum/#!topic/numpy/qyDKJTj5jx4 There's a bit more discussion about how this works, and some memory-layout caveats to be aware of. Also, instead of doing the reshaping, you could see if hard-coding the averaging is faster. Here's how to do it for the 2x2 case: B = (A[::2,::2] + A[1::2,::2] + A[::2,1::2] + A[1::2,1::2])/4.0 > The ndimage.zoom approach is a very general approach (and roughly as quick as the others). As far as I can tell, that function uses spline interpolation for zoom factors > 1, and I'm unsure how it deals with zoom factors < 1. It might do nearest neighbor or something like that, I wasn't able to quickly determine from glancing at the source. If anyone knows, it would be cool to hear. I'm pretty certain that the zoom function doesn't do anything smart for image decimation/minification -- it just uses the requested interpolation order to take a point-sample of the image at the calculated coordinates. Lack of good decimation is a limitation of ndimage. I know that there are decimation routines in scipy.signal, but I'm not sure if they're just 1D.
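For reference, a minimal sketch of the reshape-based decimation mentioned above (the function name is illustrative; it assumes both dimensions are exact multiples of the factor):

import numpy as np

def block_mean(a, fact):
    # Average non-overlapping fact x fact blocks of a 2-D array by reshaping
    # to (ny/fact, fact, nx/fact, fact) and averaging over the two block axes.
    ny, nx = a.shape
    return a.reshape(ny // fact, fact, nx // fact, fact).mean(axis=3).mean(axis=1)

A = np.random.rand(2000, 2000)
small = block_mean(A, 2)    # shape (1000, 1000); each entry is a 2x2 block mean

Unlike the hard-coded 2x2 sum, this works for any integer factor without writing out the offsets.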
In general, for integer-factor downsampling, I either do it with slicing like the above example, or use convolution (like ndimage.gaussian_filter with the appropriate bandwidth, which is quite fast) to prefilter followed by taking a view such as A[::3,::3] to downsample by a factor of three. Zach From andrew_giessel at hms.harvard.edu Sat Feb 11 15:14:54 2012 From: andrew_giessel at hms.harvard.edu (Andrew Giessel) Date: Sat, 11 Feb 2012 15:14:54 -0500 Subject: [SciPy-User] down-sampling an array by averaging - vectorized form? In-Reply-To: References: Message-ID: Heya Zach- On Sat, Feb 11, 2012 at 14:56, Zachary Pincus wrote: > Hi Andrew, > > > None of them are 'vectorized' persay but all are more clever or > effeicent ways of getting at the same problem. I thought I'd write a > couple of quick comments. > > It depends what you mean by "vectorized" -- none are using SIMD > instructions on the chip, but from the Matlab/numpy perspective I think > people often mean "vectorized" as "multiple data elements acted on by a > single command" such as C = A + B, where A and B are matrices. > > In any case, the reshaping approach is "vectorized" according to the > latter definition, which obviously really just means "the loops are pushed > down into C"... Yes, I guess that's more precisely what I mean- I would call the reshaping approach vectorized as well. > > The binning followed by scalar division is wicked fast and yields > results that are very close to my original algorithm. The reshaping seems > very clever and I am going to read it more carefully to learn some lessons > there, I think. > > For more information on reshaping to do decimation, see the "avoiding > loops when downsampling arrays" thread on the numpy-discussion list from > last week: > https://groups.google.com/forum/#!topic/numpy/qyDKJTj5jx4 > > There's a bit more discussion about how this works, and some memory-layout > caveats to be aware of. > > Also, instead of doing the reshaping, you could see if hard-coding the > averaging is faster. Here's how to do it for the 2x2 case: > B = (A[::2,::2] + A[1::2,::2] + A[::2,1::2] + A[1::2,::2])/4.0 Wow, last week? Guess that's what I get for not searching archives. I just joined both numpy-discussion and scipy-user this week. I will check that thread. > The ndimage.zoom approach is a very general approach (and roughly as > quick as the others). As far as I can tell, that function uses spline > interpolation for zoom factors > 1, and I'm unsure how it deals with zoom > factors < 1. It might do nearest neighbor or something like that, I wasn't > able to quickly determine from glancing at the source. If anyone knows, it > would be cool to hear. > > I'm pretty certain that the zoom function doesn't do anything smart for > image decimation/minification -- it just uses the requested interpolation > order to take a point-sample of the image at the calculated coordinates. > Lack of good decimation is a limitation of ndimage. I know that there are > decimation routines in scipy.signal, but I'm not sure if they're just 1D. > > In general, for integer-factor downsampling, I either do it with slicing > like the above example, or use convolution (like ndimage.gaussian_filter > with the appropriate bandwidth, which is quite fast) to prefilter followed > by taking a view such as A[::3,::3] to downsample by a factor of three. > Cheers, it's great to know what other people do. 
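And a minimal sketch of the prefilter-then-stride recipe from the previous paragraph (a sigma of about half the decimation factor is a rough bandwidth choice here, not a prescription):

import numpy as np
from scipy import ndimage

A = np.random.rand(2000, 2000)
factor = 3

# Low-pass filter first to limit aliasing, then keep every factor-th pixel.
smoothed = ndimage.gaussian_filter(A, sigma=factor / 2.0)
small = smoothed[::factor, ::factor]   # strided view, shape (667, 667)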
ag -- Andrew Giessel, PhD Department of Neurobiology, Harvard Medical School 220 Longwood Ave Boston, MA 02115 ph: 617.432.7971 email: andrew_giessel at hms.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From k-assem84 at hotmail.com Fri Feb 10 11:11:29 2012 From: k-assem84 at hotmail.com (suzana8447) Date: Fri, 10 Feb 2012 08:11:29 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Covariance matrix Message-ID: <33301423.post@talk.nabble.com> Hello every body, I am using least square fit to fit some function to a given data. The fit is perfect with leastsq. Now, I need to calculate the covariance matrix whereby the diagonal terms represent the variances for the parameters. I need to know, if possible, how to extract the covariance matrix from leastsq. If there is no way to extract it, Are there any good methods that can be used to calculate the covariance matrix with high precision? Thanks in advance. -- View this message in context: http://old.nabble.com/Covariance-matrix-tp33301423p33301423.html Sent from the Scipy-User mailing list archive at Nabble.com. From charlesr.harris at gmail.com Sat Feb 11 21:47:26 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Feb 2012 19:47:26 -0700 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: <33301423.post@talk.nabble.com> References: <33301423.post@talk.nabble.com> Message-ID: On Fri, Feb 10, 2012 at 9:11 AM, suzana8447 wrote: > > Hello every body, > I am using least square fit to fit some function to a given data. The fit > is > perfect with leastsq. Now, I need to calculate the covariance matrix > whereby > the diagonal terms represent the variances for the parameters. > > I need to know, if possible, how to extract the covariance matrix from > leastsq. If there is no way to extract it, Are there any good methods that > can be used to calculate the covariance matrix with high precision? > If you pass the optional argument full_output=1 when calling leastsq the (scaled) covariance matrix will be returned in the slot after the solution. It needs to be multiplied by an estimated measurement variance determined from the residuals or by some other method. The documentation isn't quite right on that score, it says standard deviation. The computation of the covariance probably isn't the best numerically as its triangular factors are multiplied before inversion, rather than vice-verse. Patches welcome ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From slasley at space.umd.edu Sun Feb 12 16:02:57 2012 From: slasley at space.umd.edu (Scott Lasley) Date: Sun, 12 Feb 2012 16:02:57 -0500 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 Message-ID: I downloaded scipy-0.10.1rc1.tar.gz from sourceforge and installed it on a MacPro running OS X 10.7.3, XCode 4.2.1, gfortran from the R Project, numpy 1.6.1 from sourceforge, and python 2.7.2 from python.org with these commands export MACOSX_DEPLOYMENT_TARGET=10.7 export CC=/usr/bin/gcc python setupegg.py build --fcompiler=gfortran python setupegg.py bdist_egg unset MACOSX_DEPLOYMENT_TARGET unset CC easy_install -U dist/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg scipy crashes python and python-32 in TestDoubleIFFT. I get similar crashes with scipy 2.0.0.dev-8d1b91e and numpy 0.11.0.dev-912768f from github. $ python Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. 
build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test(verbose=10) Running unit tests for scipy NumPy version 1.6.1 NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy SciPy version 0.10.1rc1 SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] nose version 1.1.2 nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] Tests cophenet(Z) on tdist data set. ... ok ... Check that updating stored values with exact ones worked. ... ok test_constants.test_fahrenheit_to_celcius ... ok test_constants.test_celcius_to_kelvin ... ok test_constants.test_kelvin_to_celcius ... ok test_constants.test_fahrenheit_to_kelvin ... ok test_constants.test_kelvin_to_fahrenheit ... ok test_constants.test_celcius_to_fahrenheit ... ok test_constants.test_lambda_to_nu ... ok test_constants.test_nu_to_lambda ... ok test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... FAIL test_definition_real (test_basic.TestDoubleIFFT) ... ok test_djbfft (test_basic.TestDoubleIFFT) ... FAIL test_random_complex (test_basic.TestDoubleIFFT) ... FAIL Python(68720) malloc: *** error for object 0x105312530: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug Abort trap: 6 $ $ python-32 Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test(verbose=10) Running unit tests for scipy NumPy version 1.6.1 NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy SciPy version 0.10.1rc1 SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] nose version 1.1.2 nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] ... test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** error for object 0x3a33604: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug FAIL test_definition_real (test_basic.TestDoubleIFFT) ... ok test_djbfft (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** error for object 0x3a2e2c4: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug FAIL test_random_complex (test_basic.TestDoubleIFFT) ... FAIL test_random_real (test_basic.TestDoubleIFFT) ... 
Python(68743) malloc: *** error for object 0x354c604: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug FAIL test_size_accuracy (test_basic.TestDoubleIFFT) ... Bus error: 10 $ From ralf.gommers at googlemail.com Mon Feb 13 01:02:20 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Feb 2012 07:02:20 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 In-Reply-To: References: Message-ID: On Sun, Feb 12, 2012 at 10:02 PM, Scott Lasley wrote: > I downloaded scipy-0.10.1rc1.tar.gz from sourceforge and installed it on a > MacPro running OS X 10.7.3, XCode 4.2.1, gfortran from the R Project, numpy > 1.6.1 from sourceforge, and python 2.7.2 from python.org with these > commands > > export MACOSX_DEPLOYMENT_TARGET=10.7 > export CC=/usr/bin/gcc > python setupegg.py build --fcompiler=gfortran > python setupegg.py bdist_egg > unset MACOSX_DEPLOYMENT_TARGET > unset CC > easy_install -U dist/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg > > scipy crashes python and python-32 in TestDoubleIFFT. I get similar > crashes with scipy 2.0.0.dev-8d1b91e and numpy 0.11.0.dev-912768f from > github. > This looks like http://projects.scipy.org/scipy/ticket/1496, using 0.10.0 with the above build recipe should crash in the same way. Can you comment on that ticket, double check that you're using non-LLVM gcc (looks unrelated, but you forgot to export CXX) and the correct fortran compiler (there are two from R-project, the one linked on the front page is the wrong one last time I checked), and try the recipe linked to there? OS X Lion build is still very painful, if someone wants to have a go at making numpy/scipy work with llvm-gcc that would be very helpful. Thanks, Ralf > $ python > Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) > [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import scipy > >>> scipy.test(verbose=10) > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy > SciPy version 0.10.1rc1 > SciPy is installed in > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy > Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC > 4.2.1 (Apple Inc. build 5666) (dot 3)] > nose version 1.1.2 > nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] > nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', > 'gen_ext', 'pyrex_ext', 'swig_ext'] > Tests cophenet(Z) on tdist data set. ... ok > ... > Check that updating stored values with exact ones worked. ... ok > test_constants.test_fahrenheit_to_celcius ... ok > test_constants.test_celcius_to_kelvin ... ok > test_constants.test_kelvin_to_celcius ... ok > test_constants.test_fahrenheit_to_kelvin ... ok > test_constants.test_kelvin_to_fahrenheit ... ok > test_constants.test_celcius_to_fahrenheit ... ok > test_constants.test_lambda_to_nu ... ok > test_constants.test_nu_to_lambda ... ok > test_definition (test_basic.TestDoubleFFT) ... ok > test_djbfft (test_basic.TestDoubleFFT) ... ok > test_n_argument_real (test_basic.TestDoubleFFT) ... ok > test_definition (test_basic.TestDoubleIFFT) ... FAIL > test_definition_real (test_basic.TestDoubleIFFT) ... ok > test_djbfft (test_basic.TestDoubleIFFT) ... 
FAIL > test_random_complex (test_basic.TestDoubleIFFT) ... FAIL > Python(68720) malloc: *** error for object 0x105312530: pointer being > freed was not allocated > *** set a breakpoint in malloc_error_break to debug > Abort trap: 6 > $ > > $ python-32 > Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) > [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import scipy > >>> scipy.test(verbose=10) > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy > SciPy version 0.10.1rc1 > SciPy is installed in > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy > Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC > 4.2.1 (Apple Inc. build 5666) (dot 3)] > nose version 1.1.2 > nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] > nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', > 'gen_ext', 'pyrex_ext', 'swig_ext'] > ... > test_definition (test_basic.TestDoubleFFT) ... ok > test_djbfft (test_basic.TestDoubleFFT) ... ok > test_n_argument_real (test_basic.TestDoubleFFT) ... ok > test_definition (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** > error for object 0x3a33604: incorrect checksum for freed object - object > was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > FAIL > test_definition_real (test_basic.TestDoubleIFFT) ... ok > test_djbfft (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** > error for object 0x3a2e2c4: incorrect checksum for freed object - object > was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > FAIL > test_random_complex (test_basic.TestDoubleIFFT) ... FAIL > test_random_real (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** > error for object 0x354c604: incorrect checksum for freed object - object > was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > FAIL > test_size_accuracy (test_basic.TestDoubleIFFT) ... Bus error: 10 > $ > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eckjoh2 at web.de Mon Feb 13 03:59:20 2012 From: eckjoh2 at web.de (Johannes Eckstein) Date: Mon, 13 Feb 2012 09:59:20 +0100 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> Message-ID: <4F38D0E8.4000005@web.de> Hi All, I have an additional question to that below. how can I do a matrix multiplication of a matrix X with shape: (240001, 4) M = X * X.H when I do this I get the following: return N.dot(self, asmatrix(other)) ValueError: array is too big. What is the best way to avoid this error? Thanks in advance, Johannes > > > On Fri, Feb 10, 2012 at 9:11 AM, suzana8447 > wrote: > > > Hello every body, > I am using least square fit to fit some function to a given data. > The fit is > perfect with leastsq. Now, I need to calculate the covariance > matrix whereby > the diagonal terms represent the variances for the parameters. > > I need to know, if possible, how to extract the covariance matrix from > leastsq. 
If there is no way to extract it, Are there any good > methods that > can be used to calculate the covariance matrix with high precision? > > > If you pass the optional argument full_output=1 when calling leastsq > the (scaled) covariance matrix will be returned in the slot after the > solution. It needs to be multiplied by an estimated measurement > variance determined from the residuals or by some other method. The > documentation isn't quite right on that score, it says standard > deviation. The computation of the covariance probably isn't the best > numerically as its triangular factors are multiplied before inversion, > rather than vice-verse. Patches welcome ;) > > Chuck > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgarcia at olfac.univ-lyon1.fr Mon Feb 13 04:09:11 2012 From: sgarcia at olfac.univ-lyon1.fr (Samuel Garcia) Date: Mon, 13 Feb 2012 10:09:11 +0100 Subject: [SciPy-User] [ANN] release of Neo 0.2.0 Message-ID: <4F38D337.7000703@olfac.univ-lyon1.fr> Dear scipy list, We are proud to announce the 0.2.0 release ofNeo , a Python library for working with electrophysiology data, whether from biological experiments or from simulations. Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats, including Spike2, NeuroExplorer, AlphaOmega, Axon, Blackrock, Plexon, Tdt, and support for writing to a subset of these formats plus non-proprietary formats including HDF5. The goal of Neo is to improve interoperability between Python tools for analyzing, visualizing and generating electrophysiology data by providing a common, shared object model. In order to be as lightweight a dependency as possible, Neo is deliberately limited to represention of data, with no functions for data analysis or visualization. Neo implements a hierarchical data model well adapted to intracellular and extracellular electrophysiology and EEG data with support for multi-electrodes (for example tetrodes). Neo's data objects build on thequantities package, which in turn builds on NumPy by adding support for physical dimensions. Thus Neo objects behave just like normal NumPy arrays, but with additional metadata, checks for dimensional consistency and automatic unit conversion. Documentation: http://packages.python.org/neo/ Licence: Modified BSD Download: fromPyPI or from theINCF Software Center Samuel Garcia -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Lyon Neuroscience CNRS - UMR5292 - INSERM U1028 - Universite Claude Bernard LYON 1 Equipe R et D 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE T?l : 04 37 28 74 24 Fax : 04 37 28 76 01 http://olfac.univ-lyon1.fr/unite/equipe-07/ http://neuralensemble.org/trac/OpenElectrophy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -------------- next part -------------- An HTML attachment was scrubbed... 
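Back on the covariance-matrix thread, a minimal sketch of the recipe described above: call leastsq with full_output=1, take the covariance from the slot after the solution, and scale it by the estimated residual variance. The exponential model and the data here are made up purely for illustration:

import numpy as np
from scipy.optimize import leastsq

x = np.linspace(0.0, 10.0, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * np.random.randn(x.size)

def residuals(p, x, y):
    a, b = p
    return y - a * np.exp(-b * x)

p0 = (1.0, 1.0)
popt, cov_x, infodict, mesg, ier = leastsq(residuals, p0, args=(x, y),
                                           full_output=1)

# Scale the (relative) covariance by the residual variance:
# s^2 = sum(r_i^2) / (N - n_params)
s_sq = (infodict['fvec'] ** 2).sum() / (len(x) - len(p0))
pcov = cov_x * s_sq
perr = np.sqrt(np.diag(pcov))   # one-sigma parameter uncertainties

Note that scipy.optimize.curve_fit applies essentially this scaling internally, so it is a convenient alternative when the model has the simple y = f(x, params) form.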
URL: From pav at iki.fi Mon Feb 13 06:33:21 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 13 Feb 2012 12:33:21 +0100 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: <4F38D0E8.4000005@web.de> References: <33301423.post@talk.nabble.com> <4F38D0E8.4000005@web.de> Message-ID: 13.02.2012 09:59, Johannes Eckstein kirjoitti: [clip] > how can I do a matrix multiplication of a matrix X with shape: (240001, 4) > > M = X * X.H > > when I do this I get the following: > return N.dot(self, asmatrix(other)) > ValueError: array is too big. > > What is the best way to avoid this error? The result would be a 240001 x 240001 matrix that consumes 430 GB of memory. Do you really have that much available? -- Pauli Virtanen From Dieter.Werthmuller at ed.ac.uk Mon Feb 13 08:34:46 2012 From: Dieter.Werthmuller at ed.ac.uk (=?ISO-8859-1?Q?Dieter_Werthm=FCller?=) Date: Mon, 13 Feb 2012 13:34:46 +0000 Subject: [SciPy-User] [SciPy-user] adaptive simulated annealing (ASA) Message-ID: <4F391176.6080906@ed.ac.uk> Hi there, The message 'adaptive simulated annealing (ASA)' is over four years old now, asking for a python implementation of ASA (http://www.ingber.com/#ASA). I was wondering if such an implementation is around today? Kind regards, Dieter -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From slasley at space.umd.edu Mon Feb 13 10:26:38 2012 From: slasley at space.umd.edu (Scott Lasley) Date: Mon, 13 Feb 2012 10:26:38 -0500 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 In-Reply-To: References: Message-ID: <3889085B-59E8-4324-8021-1E4992DB571E@space.umd.edu> Thanks for the help. /usr/bin/gcc -> llvm-gcc-4.2 so it appears to be the llvm-gcc issue from ticket 1496. I installed http://r.research.att.com/tools/gcc-42-5666.3-darwin11.pkg to be sure I had the correct gcc-4.2 and gfortran for XCode 4.2.1. I was not able to make a universal version of scipy with gcc-4.2, but I could build scipy-0.10.1rc1 after specifying the -arch CFLAG. scipy failed 7 tests from test_arpack.test_symmetric_modes export MACOSX_DEPLOYMENT_TARGET=10.7 export CC=/usr/bin/gcc-4.2 export CXX=/usr/bin/g++-4.2 export CFLAGS="-arch x86_64" export FFLAGS=-ff2c python setupegg.py build --fcompiler=gfortran python setupegg.py bdist_egg $ export CFLAGS="-arch x86_64" $ python >>> import scipy >>> scipy.test() Running unit tests for scipy NumPy version 1.6.1 NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy SciPy version 0.10.1rc1 SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] nose version 1.1.2 ............................................................................................................................................................................................................................K............................................................................................................/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/interpolate/fitpack2.py:674: UserWarning: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=7). 
If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) ....../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/interpolate/fitpack2.py:605: UserWarning: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots. warnings.warn(message) ........................K..K....../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/core/numeric.py:1920: RuntimeWarning: invalid value encountered in absolute return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) ............................................................................................................................................................................................................................................................................................................................................................................................................................................./Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/io/wavfile.py:31: WavFileWarning: Unfamiliar format bytes warnings.warn("Unfamiliar format bytes", WavFileWarning) /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/io/wavfile.py:121: WavFileWarning: chunk not understood warnings.warn("chunk not understood", WavFileWarning) ...............................................................................................................................................................................................................................SSSSSS......SSSSSS......SSSS...............................................................................S............................................................................................................................................................................................................................................................K......................................................................................................................................................................................................SSSSS............S..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................SSSSSSSSSSS.........../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/arpack.py:63: UserWarning: Single-precision types in `eigs` and 
`eighs` are not supported currently. Double precision routines are used instead. warnings.warn("Single-precision types in `eigs` and `eighs` " ....F.F.....................F...........F.F..............................................................................................F........................F.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K...............................................................K...........................................................................................................................................................KK.............................................................................................................................................................................................................................................................................................................................................................................................................................................K.K.............................................................................................................................................................................................................................................................................................................................................................................................K........K..............SSSSSSS..........................................................................................................................................................S.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. 
====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'LM', None, 0.5, , None, 'normal') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:standard, typ=f, which=LM, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=normal (mismatch 100.0%) x: array([[ 0.23815642, 0.1763755 ], [-0.10785346, -0.32103487], [ 0.12468303, -0.11230416],... y: array([[ 0.23815642, 0.24814051], [-0.10785347, -0.15634772], [ 0.12468302, 0.05671416],... ====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'LM', None, 0.5, , None, 'cayley') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:standard, typ=f, which=LM, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=cayley (mismatch 100.0%) x: array([[ 0.23815693, -0.33630507], [-0.10785286, 0.02168 ], [ 0.12468344, -0.11036437],... y: array([[ 0.23815643, -0.2405392 ], [-0.10785349, 0.14390968], [ 0.12468311, -0.04574991],... 
====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'LA', None, 0.5, , None, 'normal') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:standard, typ=f, which=LA, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=normal (mismatch 100.0%) x: array([[ 28.80129188, -0.6379945 ], [ 34.79312355, 0.27066791], [-270.23255444, 0.4851834 ],... y: array([[ 3.93467650e+03, -6.37994494e-01], [ 3.90913859e+03, 2.70667916e-01], [ -3.62176382e+04, 4.85183382e-01],... ====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'SA', None, 0.5, , None, 'normal') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:standard, typ=f, which=SA, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=normal (mismatch 100.0%) x: array([[ 0.26260981, 0.23815559], [-0.09760907, -0.10785484], [ 0.06149647, 0.12468203],... y: array([[ 0.23744165, 0.2381564 ], [-0.13633069, -0.10785359], [ 0.03132561, 0.12468301],... 
====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'SA', None, 0.5, , None, 'cayley') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:standard, typ=f, which=SA, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=cayley (mismatch 100.0%) x: array([[ 0.29524244, -0.2381569 ], [-0.08169955, 0.10785299], [ 0.06645597, -0.12468332],... y: array([[ 0.24180251, -0.23815646], [-0.14191195, 0.10785349], [ 0.03568392, -0.12468307],... ====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'SM', None, 0.5, , None, 'buckling') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:general, typ=f, which=SM, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=buckling (mismatch 100.0%) x: array([[-0.10940548, 0.01676016], [-0.07154097, 0.4628113 ], [ 0.06895222, 0.49206394],... y: array([[-0.10940547, 0.05459438], [-0.07154103, 0.31407543], [ 0.06895217, 0.37578294],... 
====================================================================== FAIL: test_arpack.test_symmetric_modes(True, , 'f', 2, 'SA', None, 0.5, , None, 'cayley') ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 235, in eval_evec assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.6-intel.egg/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.00178814, atol=0.000357628 error for eigsh:general, typ=f, which=SA, sigma=0.5, mattype=aslinearoperator, OPpart=None, mode=cayley (mismatch 100.0%) x: array([[-0.4404992 , -0.01935683], [-0.25650678, -0.11053132], [-0.36893024, -0.13223556],... y: array([[-0.44017013, -0.0193569 ], [-0.25525379, -0.11053158], [-0.36818443, -0.13223571],... ---------------------------------------------------------------------- Ran 5101 tests in 84.089s FAILED (KNOWNFAIL=12, SKIP=42, failures=7) On Feb 13, 2012, at 1:02 AM, Ralf Gommers wrote: > > > On Sun, Feb 12, 2012 at 10:02 PM, Scott Lasley wrote: > I downloaded scipy-0.10.1rc1.tar.gz from sourceforge and installed it on a MacPro running OS X 10.7.3, XCode 4.2.1, gfortran from the R Project, numpy 1.6.1 from sourceforge, and python 2.7.2 from python.org with these commands > > export MACOSX_DEPLOYMENT_TARGET=10.7 > export CC=/usr/bin/gcc > python setupegg.py build --fcompiler=gfortran > python setupegg.py bdist_egg > unset MACOSX_DEPLOYMENT_TARGET > unset CC > easy_install -U dist/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg > > scipy crashes python and python-32 in TestDoubleIFFT. I get similar crashes with scipy 2.0.0.dev-8d1b91e and numpy 0.11.0.dev-912768f from github. > > This looks like http://projects.scipy.org/scipy/ticket/1496, using 0.10.0 with the above build recipe should crash in the same way. Can you comment on that ticket, double check that you're using non-LLVM gcc (looks unrelated, but you forgot to export CXX) and the correct fortran compiler (there are two from R-project, the one linked on the front page is the wrong one last time I checked), and try the recipe linked to there? > > OS X Lion build is still very painful, if someone wants to have a go at making numpy/scipy work with llvm-gcc that would be very helpful. > > Thanks, > Ralf > > > $ python > Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) > [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import scipy > >>> scipy.test(verbose=10) > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy > SciPy version 0.10.1rc1 > SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy > Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] > nose version 1.1.2 > nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] > nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] > Tests cophenet(Z) on tdist data set. ... ok > ... > Check that updating stored values with exact ones worked. ... ok > test_constants.test_fahrenheit_to_celcius ... ok > test_constants.test_celcius_to_kelvin ... ok > test_constants.test_kelvin_to_celcius ... ok > test_constants.test_fahrenheit_to_kelvin ... ok > test_constants.test_kelvin_to_fahrenheit ... ok > test_constants.test_celcius_to_fahrenheit ... ok > test_constants.test_lambda_to_nu ... ok > test_constants.test_nu_to_lambda ... ok > test_definition (test_basic.TestDoubleFFT) ... ok > test_djbfft (test_basic.TestDoubleFFT) ... ok > test_n_argument_real (test_basic.TestDoubleFFT) ... ok > test_definition (test_basic.TestDoubleIFFT) ... FAIL > test_definition_real (test_basic.TestDoubleIFFT) ... ok > test_djbfft (test_basic.TestDoubleIFFT) ... FAIL > test_random_complex (test_basic.TestDoubleIFFT) ... FAIL > Python(68720) malloc: *** error for object 0x105312530: pointer being freed was not allocated > *** set a breakpoint in malloc_error_break to debug > Abort trap: 6 > $ > > $ python-32 > Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) > [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import scipy > >>> scipy.test(verbose=10) > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy > SciPy version 0.10.1rc1 > SciPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy-0.10.1rc1-py2.7-macosx-10.6-intel.egg/scipy > Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] > nose version 1.1.2 > nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] > nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] > ... > test_definition (test_basic.TestDoubleFFT) ... ok > test_djbfft (test_basic.TestDoubleFFT) ... ok > test_n_argument_real (test_basic.TestDoubleFFT) ... ok > test_definition (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** error for object 0x3a33604: incorrect checksum for freed object - object was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > FAIL > test_definition_real (test_basic.TestDoubleIFFT) ... ok > test_djbfft (test_basic.TestDoubleIFFT) ... Python(68743) malloc: *** error for object 0x3a2e2c4: incorrect checksum for freed object - object was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > FAIL > test_random_complex (test_basic.TestDoubleIFFT) ... FAIL > test_random_real (test_basic.TestDoubleIFFT) ... 
Python(68743) malloc: *** error for object 0x354c604: incorrect checksum for freed object - object was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > FAIL > test_size_accuracy (test_basic.TestDoubleIFFT) ... Bus error: 10 > $ From eckjoh2 at web.de Mon Feb 13 10:30:43 2012 From: eckjoh2 at web.de (Johannes Eckstein) Date: Mon, 13 Feb 2012 16:30:43 +0100 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> <4F38D0E8.4000005@web.de> Message-ID: <4F392CA3.4010500@web.de> Haha, Thanks Pauli maybe good, that I don't have that much memory. I just realized that I was confused by the indices... I had it the way round, before... another question: M = X.H * X with X being a de-trended process does give the (unscaled) covariance matrix? cheers, Johannes > 13.02.2012 09:59, Johannes Eckstein kirjoitti: > [clip] >> how can I do a matrix multiplication of a matrix X with shape: (240001, 4) >> >> M = X * X.H >> >> when I do this I get the following: >> return N.dot(self, asmatrix(other)) >> ValueError: array is too big. >> >> What is the best way to avoid this error? > The result would be a 240001 x 240001 matrix that consumes 430 GB of > memory. Do you really have that much available? > From josef.pktd at gmail.com Mon Feb 13 10:40:12 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Feb 2012 10:40:12 -0500 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: <4F392CA3.4010500@web.de> References: <33301423.post@talk.nabble.com> <4F38D0E8.4000005@web.de> <4F392CA3.4010500@web.de> Message-ID: On Mon, Feb 13, 2012 at 10:30 AM, Johannes Eckstein wrote: > Haha, Thanks Pauli > > maybe good, that I don't have that much memory. > I just realized that I was confused by the indices... > I had it the way round, before... another question: > > M = X.H * X > > with X being a de-trended process does give the (unscaled) covariance > matrix? for linear least squares the unscaled covariance matrix is the inverse of dot(X.T, X) for nonlinear least squares X is replaced by the Jacobian. Josef > > cheers, Johannes >> 13.02.2012 09:59, Johannes Eckstein kirjoitti: >> [clip] >>> how can I do a matrix multiplication of a matrix X with shape: (240001, 4) >>> >>> M = X * X.H >>> >>> when I do this I get the following: >>> ? ? ?return N.dot(self, asmatrix(other)) >>> ValueError: array is too big. >>> >>> What is the best way to avoid this error? >> The result would be a 240001 x 240001 matrix that consumes 430 GB of >> memory. Do you really have that much available? >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Mon Feb 13 10:41:10 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Feb 2012 10:41:10 -0500 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> <4F38D0E8.4000005@web.de> <4F392CA3.4010500@web.de> Message-ID: On Mon, Feb 13, 2012 at 10:40 AM, wrote: > On Mon, Feb 13, 2012 at 10:30 AM, Johannes Eckstein wrote: >> Haha, Thanks Pauli >> >> maybe good, that I don't have that much memory. >> I just realized that I was confused by the indices... >> I had it the way round, before... another question: >> >> M = X.H * X >> >> with X being a de-trended process does give the (unscaled) covariance >> matrix? 
> > for linear least squares the unscaled covariance matrix is the inverse > of dot(X.T, X) unscaled covariance of the parameter estimate, to be explicit Josef > > for nonlinear least squares X is replaced by the Jacobian. > > Josef > >> >> cheers, Johannes >>> 13.02.2012 09:59, Johannes Eckstein kirjoitti: >>> [clip] >>>> how can I do a matrix multiplication of a matrix X with shape: (240001, 4) >>>> >>>> M = X * X.H >>>> >>>> when I do this I get the following: >>>> ? ? ?return N.dot(self, asmatrix(other)) >>>> ValueError: array is too big. >>>> >>>> What is the best way to avoid this error? >>> The result would be a 240001 x 240001 matrix that consumes 430 GB of >>> memory. Do you really have that much available? >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From kevin.gullikson at gmail.com Sat Feb 11 19:39:20 2012 From: kevin.gullikson at gmail.com (Kevin Gullikson) Date: Sat, 11 Feb 2012 18:39:20 -0600 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: <33301423.post@talk.nabble.com> References: <33301423.post@talk.nabble.com> Message-ID: Use full_output=True when you call leastq, and you will get a matrix (among other things). If you multiply that matrix by the standard deviation of the residuals, it will be the covariance matrix. Kevin Gullikson On Fri, Feb 10, 2012 at 10:11 AM, suzana8447 wrote: > > Hello every body, > I am using least square fit to fit some function to a given data. The fit > is > perfect with leastsq. Now, I need to calculate the covariance matrix > whereby > the diagonal terms represent the variances for the parameters. > > I need to know, if possible, how to extract the covariance matrix from > leastsq. If there is no way to extract it, Are there any good methods that > can be used to calculate the covariance matrix with high precision? > > Thanks in advance. > -- > View this message in context: > http://old.nabble.com/Covariance-matrix-tp33301423p33301423.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elfnor at gmail.com Sun Feb 12 18:11:35 2012 From: elfnor at gmail.com (Elfnor) Date: Sun, 12 Feb 2012 15:11:35 -0800 (PST) Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix Message-ID: <33312093.post@talk.nabble.com> I'm having trouble using covariance matrices with odrpack RealData I expected the following adaptation of the test in test_odr.py to work but get the error File "C:\Python27\lib\site-packages\scipy\odr\odrpack.py", line 1080, in run self.output = Output(apply(odr, args, kwds)) ValueError: could not convert we to a suitable array _________________________ import numpy as np from scipy.odr.odrpack import * p_x = np.array([0.,.9,1.8,2.6,3.3,4.4,5.2,6.1,6.5,7.4]) p_y = np.array([5.9,5.4,4.4,4.6,3.5,3.7,2.8,2.8,2.4,1.5]) p_sx = np.array([.03,.03,.04,.035,.07,.11,.13,.22,.74,1.]) p_sy = np.array([1.,.74,.5,.35,.22,.22,.12,.12,.1,.04]) def pearson_fcn(B, x): return B[0] + B[1]*x covp_x = np.diag(p_sx**2) covp_y = np.diag(p_sy**2) p_dat = RealData(p_x, p_y, covx=covp_x, covy=covp_y) p_mod = Model(pearson_fcn, meta=dict(name='Uni-linear Fit')) p_odr = ODR(p_dat, p_mod, beta0=[1.,1.]) out = p_odr.run() out.pprint() _________________________________ Does anyone know the correct matrix form for this? thanks Eleanor -- View this message in context: http://old.nabble.com/scipy.odr---Correct-form-of-covx%2C-covy-matrix-tp33312093p33312093.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Mon Feb 13 10:56:06 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Feb 2012 10:56:06 -0500 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> Message-ID: On Sat, Feb 11, 2012 at 7:39 PM, Kevin Gullikson wrote: > Use full_output=True when you call leastq, and you will get a matrix (among > other things). If you multiply that matrix by the standard deviation of the > residuals, it will be the covariance matrix. As Charles pointed out, multiply by the error variance not the standard deviation. Docstring is wrong in this. Josef > > Kevin Gullikson > > > > > On Fri, Feb 10, 2012 at 10:11 AM, suzana8447 wrote: >> >> >> Hello every body, >> I am using least square fit to fit some function to a given data. The fit >> is >> perfect with leastsq. Now, I need to calculate the covariance matrix >> whereby >> the diagonal terms represent the variances for the parameters. >> >> I need to know, if possible, how to extract the covariance matrix from >> leastsq. If there is no way to extract it, Are there any good methods that >> can be used to calculate the covariance matrix with high precision? >> >> Thanks in advance. >> -- >> View this message in context: >> http://old.nabble.com/Covariance-matrix-tp33301423p33301423.html >> Sent from the Scipy-User mailing list archive at Nabble.com. 
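To make that scaling concrete, a minimal sketch (the model, the data and the variable names below are invented for illustration and are not taken from any of the posts; it assumes the fit converged and that cov_x is not None):

import numpy as np
from scipy.optimize import leastsq

def residuals(p, x, y):
    # toy model: y = a * exp(-b * x)
    a, b = p
    return y - a * np.exp(-b * x)

x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * np.random.randn(x.size)

popt, cov_x, infodict, mesg, ier = leastsq(residuals, [1.0, 1.0],
                                           args=(x, y), full_output=True)

# cov_x is the unscaled covariance based on the Jacobian alone; scale it by
# the estimated error variance s^2 = SSR / (n - p), not by the standard deviation
resid = residuals(popt, x, y)
s_sq = np.dot(resid, resid) / (len(x) - len(popt))
pcov = cov_x * s_sq                 # covariance of the parameter estimates
perr = np.sqrt(np.diag(pcov))       # standard errors of a and b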
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Mon Feb 13 10:59:35 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 13 Feb 2012 15:59:35 +0000 Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: <33312093.post@talk.nabble.com> References: <33312093.post@talk.nabble.com> Message-ID: On Sun, Feb 12, 2012 at 23:11, Elfnor wrote: > > I'm having trouble using covariance matrices with odrpack RealData > > I expected the following adaptation of the test in test_odr.py to work but > get the error > > ?File "C:\Python27\lib\site-packages\scipy\odr\odrpack.py", line 1080, in > run > ? ?self.output = Output(apply(odr, args, kwds)) > ValueError: could not convert we to a suitable array > _________________________ > import numpy as np > from ?scipy.odr.odrpack import * > > p_x = np.array([0.,.9,1.8,2.6,3.3,4.4,5.2,6.1,6.5,7.4]) > p_y = np.array([5.9,5.4,4.4,4.6,3.5,3.7,2.8,2.8,2.4,1.5]) > p_sx = np.array([.03,.03,.04,.035,.07,.11,.13,.22,.74,1.]) > p_sy = np.array([1.,.74,.5,.35,.22,.22,.12,.12,.1,.04]) > > def pearson_fcn(B, x): > ? ? ? ?return B[0] + B[1]*x > > covp_x = np.diag(p_sx**2) > covp_y = np.diag(p_sy**2) > > p_dat = RealData(p_x, p_y, covx=covp_x, covy=covp_y) covx and covy aren't covariance matrices across observations, i.e. not the 10x10 matrices that you have here. Rather, what you need to provide is a length-10 array of covariance matrices, one for each observation. For the case where each of your observations are 1-dimensional, you don't construct covariance matrices at all, just provide the standard deviations: p_dat = RealData(p_x, p_y, sx=p_sx, sy=p_sy) https://github.com/scipy/scipy/blob/master/scipy/odr/odrpack.py#L27 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From josef.pktd at gmail.com Mon Feb 13 11:04:23 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Feb 2012 11:04:23 -0500 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> Message-ID: On Mon, Feb 13, 2012 at 10:56 AM, wrote: > On Sat, Feb 11, 2012 at 7:39 PM, Kevin Gullikson > wrote: >> Use full_output=True when you call leastq, and you will get a matrix (among >> other things). If you multiply that matrix by the standard deviation of the >> residuals, it will be the covariance matrix. > > As Charles pointed out, multiply by the error variance not the > standard deviation. Docstring is wrong in this. I finally fixed this http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/diff/6184/7387/ Josef > > Josef > >> >> Kevin Gullikson >> >> >> >> >> On Fri, Feb 10, 2012 at 10:11 AM, suzana8447 wrote: >>> >>> >>> Hello every body, >>> I am using least square fit to fit some function to a given data. The fit >>> is >>> perfect with leastsq. Now, I need to calculate the covariance matrix >>> whereby >>> the diagonal terms represent the variances for the parameters. >>> >>> I need to know, if possible, how to extract the covariance matrix from >>> leastsq. 
If there is no way to extract it, Are there any good methods that >>> can be used to calculate the covariance matrix with high precision? >>> >>> Thanks in advance. >>> -- >>> View this message in context: >>> http://old.nabble.com/Covariance-matrix-tp33301423p33301423.html >>> Sent from the Scipy-User mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From charlesr.harris at gmail.com Mon Feb 13 11:18:48 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2012 09:18:48 -0700 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> Message-ID: On Mon, Feb 13, 2012 at 8:56 AM, wrote: > On Sat, Feb 11, 2012 at 7:39 PM, Kevin Gullikson > wrote: > > Use full_output=True when you call leastq, and you will get a matrix > (among > > other things). If you multiply that matrix by the standard deviation of > the > > residuals, it will be the covariance matrix. > > As Charles pointed out, multiply by the error variance not the > standard deviation. Docstring is wrong in this. > > Even more precisely, multiply by ||err||^2/(n - dof), since it is possible that the error has an offset unless the model can perfectly fit a constant. If this actually makes a difference, the model is inadequate, but the variance estimate might be useful if you are using something like the Akaike information criterion to choose the number of parameters. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From surfcast23 at gmail.com Mon Feb 13 23:29:01 2012 From: surfcast23 at gmail.com (Khary Richardson) Date: Mon, 13 Feb 2012 23:29:01 -0500 Subject: [SciPy-User] ImportError: DLL load failed: The specified module could not be found Message-ID: Hi, I downloaded and ran the numpy-1.6.1-win32-superpack-python3.2.exe, and I installed scipy-0.10.0.win32-py3.2 from the scipy sourceforge site. When I try to run a code that uses scipy.special I get the following error Traceback (most recent call last): File "C:\Documents and Settings\Khary\My Documents\PHYSICS\ Physics\Bessel.py", line 5, in from scipy.special import jv, jvp File "C:\Python32\lib\site-packages\scipy\special\__init__.py", line 525, in from ._cephes import * ImportError: DLL load failed: The specified module could not be found. Any help would be great. -- StriperCoast SurfCasters Club -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Mon Feb 13 16:55:45 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 13 Feb 2012 13:55:45 -0800 Subject: [SciPy-User] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 Message-ID: Hi folks, [ I'm broadcasting this widely for maximum reach, but I'd appreciate it if replies can be kept to the *numpy* list, which is sort of the 'base' list for scientific/numerical work. It will make it much easier to organize a coherent set of notes later on. Apology if you're subscribed to all and get it 10 times.
] As part of the PyData workshop (http://pydataworkshop.eventbrite.com) to be held March 2 and 3 at the Mountain View Google offices, we have scheduled a session for an open discussion with Guido van Rossum and hopefully as many core python-dev members who can make it. We wanted to seize the combined opportunity of the PyData workshop bringing a number of 'scipy people' to Google with the timeline for Python 3.3, the first release after the Python language moratorium, being within sight: http://www.python.org/dev/peps/pep-0398. While a number of scientific Python packages are already available for Python 3 (either in released form or in their master git branches), it's fair to say that there hasn't been a major transition of the scientific community to Python3. Since there is no more development being done on the Python2 series, eventually we will all want to find ways to make this transition, and we think that this is an excellent time to engage the core python development team and consider ideas that would make Python3 generally a more appealing language for scientific work. Guido has made it clear that he doesn't speak for the day-to-day development of Python anymore, so we all should be aware that any ideas that come out of this panel will still need to be discussed with python-dev itself via standard mechanisms before anything is implemented. Nonetheless, the opportunity for a solid face-to-face dialog for brainstorming was too good to pass up. The purpose of this email is then to solicit, from all of our community, ideas for this discussion. In a week or so we'll need to summarize the main points brought up here and make a more concrete agenda out of it; I will also post a summary of the meeting afterwards here. Anything is a valid topic, some points just to get the conversation started: - Extra operators/PEP 225. Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. - Improved syntax/support for rationals or decimal literals? While Python now has both decimals (http://docs.python.org/library/decimal.html) and rationals (http://docs.python.org/library/fractions.html), they're quite clunky to use because they require full constructor calls. Guido has mentioned in previous discussions toying with ideas about support for different kinds of numeric literals... - Using the numpy docstring standard python-wide, and thus having python improve the pathetic state of the stdlib's docstrings? This is an area where our community is light years ahead of the standard library, but we'd all benefit from Python itself improving on this front. I'm toying with the idea of giving a lighting talk at PyConn about this, comparing the great, robust culture and tools of good docstrings across the Scipy ecosystem with the sad, sad state of docstrings in the stdlib. It might spur some movement on that front from the stdlib authors, esp. if the core python-dev team realizes the value and benefit it can bring (at relatively low cost, given how most of the information does exist, it's just in the wrong places). 
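For readers who have not seen it, a schematic and deliberately minimal example of what a numpy-standard docstring looks like (the function itself is made up):

import numpy as np

def clip_lower(arr, threshold):
    """Clip values of an array from below.

    Parameters
    ----------
    arr : ndarray
        Input array.
    threshold : float
        Values smaller than `threshold` are replaced by it.

    Returns
    -------
    clipped : ndarray
        Array of the same shape as `arr`.
    """
    return np.maximum(arr, threshold)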
But more importantly for us, if there was truly a universal standard for high-quality docstrings across Python projects, building good documentation/help machinery would be a lot easier, as we'd know what to expect and search for (such as rendering them nicely in the ipython notebook, providing high-quality cross-project help search, etc). - Literal syntax for arrays? Sage has been floating a discussion about a literal matrix syntax (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). For something like this to go into python in any meaningful way there would have to be core multidimensional arrays in the language, but perhaps it's time to think about a piece of the numpy array itself into Python? This is one of the more 'out there' ideas, but after all, that's the point of a discussion like this, especially considering we'll have both Travis and Guido in one room. - Other syntactic sugar? Sage has "a..b" <=> range(a, b+1), which I actually think is both nice and useful... There's also the question of allowing "a:b:c" notation outside of [], which has come up a few times in conversation over the last few years. Others? - The packaging quagmire? This continues to be a problem, though python3 does have new improvements to distutils. I'm not really up to speed on the situation, to be frank. If we want to bring this up, someone will have to provide a solid reference or volunteer to do it in person. - etc... I'm putting the above just to *start* the discussion, but the real point is for the rest of the community to contribute ideas, so don't be shy. Final note: while I am here commiting to organizing and presenting this at the discussion with Guido (as well as contacting python-dev), I would greatly appreciate help with the task of summarizing this prior to the meeting as I'm pretty badly swamped in the run-in to pydata/pycon. So if anyone is willing to help draft the summary as the date draws closer (we can put it up on a github wiki, gist, whatever), I will be very grateful. I'm sure it will be better than what I'll otherwise do the last night at 2am :) Cheers, f ps - to the obvious question about webcasting the discussion live for remote participation: yes, we looked into it already; no, unfortunately it appears it won't be possible. We'll try to at least have the audio recorded (and possibly video) for posting later on. pps- if you are close to Mountain View and are interested in attending this panel in person, drop me a line at fernando.perez at berkeley.edu. We have a few spots available *for this discussion only* on top of the pydata regular attendance (which is long closed, I'm afraid). But we'll need to provide Google with a list of those attendees in advance. Please indicate if you are a core python committer in your email, as we'll give priority for this overflow pool to core python developers (but will otherwise accommodate as many people as Google lets us). From elfnor at gmail.com Mon Feb 13 21:38:03 2012 From: elfnor at gmail.com (Elfnor) Date: Mon, 13 Feb 2012 18:38:03 -0800 (PST) Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: References: <33312093.post@talk.nabble.com> Message-ID: <33319485.post@talk.nabble.com> Robert Kern-2 wrote: > > > covx and covy aren't covariance matrices across observations, i.e. not > the 10x10 matrices that you have here. Rather, what you need to > provide is a length-10 array of covariance matrices, one for each > observation. 
For the case where each of your observations are > 1-dimensional, you don't construct covariance matrices at all, just > provide the standard deviations: > > p_dat = RealData(p_x, p_y, sx=p_sx, sy=p_sy) > > https://github.com/scipy/scipy/blob/master/scipy/odr/odrpack.py#L27 > > -- > OK. I gave an example with diagonal covariance matrices to make the problem easy to explain. covy in ODR I see would be 10 2 by 2 matrices for the pearson's example, each giving the covaraiance between the parameters at that observation. I am trying to solve a generalised least squares problem of the form A*x = b where A is the problem design matrix and b is a vector of correlated observations with covariance V. V is not diagonal. That is find x that minimizes (b - A*x)'*inv(V)*(b - A*x) similar to the matlab function lscov. I'm now looking at the GLS fit from scikits.statsmodels. Thanks Eleanor -- View this message in context: http://old.nabble.com/scipy.odr---Correct-form-of-covx%2C-covy-matrix-tp33312093p33319485.html Sent from the Scipy-User mailing list archive at Nabble.com. From robert.kern at gmail.com Tue Feb 14 09:29:17 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Feb 2012 14:29:17 +0000 Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: <33319485.post@talk.nabble.com> References: <33312093.post@talk.nabble.com> <33319485.post@talk.nabble.com> Message-ID: On Tue, Feb 14, 2012 at 02:38, Elfnor wrote: > I am trying to solve a generalised least squares problem of the form A*x = b > where A is the problem design matrix and b is a vector of correlated > observations with covariance V. V is not diagonal. > > That is find x that minimizes (b - A*x)'*inv(V)*(b - A*x) similar to the > matlab function lscov. I think you can distribute the inv(V) by doing a Cholesky factorization, then doing a Cholesky-solve on both A and b, then doing a linear solve on the transformed A and b. from scipy import linalg cho = linalg.cho_factor(V) Ac = linalg.cho_solve(cho, A) bc = linalg.cho_solve(cho, b) x = linalg.solve(Ac, bc) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From josef.pktd at gmail.com Tue Feb 14 09:55:47 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Feb 2012 09:55:47 -0500 Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: References: <33312093.post@talk.nabble.com> <33319485.post@talk.nabble.com> Message-ID: On Tue, Feb 14, 2012 at 9:29 AM, Robert Kern wrote: > On Tue, Feb 14, 2012 at 02:38, Elfnor wrote: > >> I am trying to solve a generalised least squares problem of the form A*x = b >> where A is the problem design matrix and b is a vector of correlated >> observations with covariance V. V is not diagonal. to be precise V is the correlation of the error b-A*x, I assume that's what you mean. >> >> That is find x that minimizes (b - A*x)'*inv(V)*(b - A*x) similar to the >> matlab function lscov. > > I think you can distribute the inv(V) by doing a Cholesky > factorization, then doing a Cholesky-solve on both A and b, then doing > a linear solve on the transformed A and b. > > ?from scipy import linalg > > ?cho = linalg.cho_factor(V) > ?Ac = linalg.cho_solve(cho, A) > ?bc = linalg.cho_solve(cho, b) > ?x = linalg.solve(Ac, bc) That's pretty much what statsmodels GLS does. 
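Spelled out, that whitening approach looks roughly like the following sketch (the function name, array shapes and variable names are assumptions for illustration, not code taken from statsmodels or from the posts above):

import numpy as np
from scipy import linalg

def gls(A, b, V):
    # A: (n, p) design matrix, b: length-n observations,
    # V: (n, n) symmetric positive definite covariance of the errors.
    # With V = C C.T, (b - A x)' inv(V) (b - A x) = || inv(C) (b - A x) ||^2,
    # so whitening by inv(C) turns GLS into an ordinary least-squares problem.
    C = linalg.cholesky(V, lower=True)
    wA = linalg.solve_triangular(C, A, lower=True)   # whitened design
    wb = linalg.solve_triangular(C, b, lower=True)   # whitened observations
    x, res, rank, sv = linalg.lstsq(wA, wb)
    cov_x = linalg.inv(np.dot(wA.T, wA))   # unscaled covariance, inv(A' inv(V) A)
    return x, cov_x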
The transformed variables have a w prefix, wendog, wexog, wresid I don't think we have many examples with the full sized (nobs, nobs) V matrix. Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Tue Feb 14 10:20:00 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 14 Feb 2012 16:20:00 +0100 Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: References: <33312093.post@talk.nabble.com> <33319485.post@talk.nabble.com> Message-ID: <4F3A7BA0.9010900@molden.no> On 14.02.2012 15:55, josef.pktd at gmail.com wrote: >> from scipy import linalg >> >> cho = linalg.cho_factor(V) >> Ac = linalg.cho_solve(cho, A) >> bc = linalg.cho_solve(cho, b) >> x = linalg.solve(Ac, bc) > > That's pretty much what statsmodels GLS does. The transformed > variables have a w prefix, wendog, wexog, wresid > > I don't think we have many examples with the full sized (nobs, nobs) V matrix. Why not call lapack DGGGLM? It does the same, except with QR. It could e.g. be exposed as sp.linalg.glstsq. Sturla From LEisenman at wustl.edu Tue Feb 14 10:15:11 2012 From: LEisenman at wustl.edu (Larry Eisenman) Date: Tue, 14 Feb 2012 15:15:11 +0000 (UTC) Subject: [SciPy-User] [ANN] release of Neo 0.2.0 References: <4F38D337.7000703@olfac.univ-lyon1.fr> Message-ID: Samuel Garcia olfac.univ-lyon1.fr> writes: > > > Dear scipy list, > We are proud to announce the 0.2.0 release of?Neo, a Python library for working with > electrophysiology data, whether from biological experiments or from > simulations. .... How does this compare/relate to the Nitime module of the Nipy project (http://nipy.sourceforge.net/nitime/)? Larry From josef.pktd at gmail.com Tue Feb 14 11:12:40 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Feb 2012 11:12:40 -0500 Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: <4F3A7BA0.9010900@molden.no> References: <33312093.post@talk.nabble.com> <33319485.post@talk.nabble.com> <4F3A7BA0.9010900@molden.no> Message-ID: On Tue, Feb 14, 2012 at 10:20 AM, Sturla Molden wrote: > On 14.02.2012 15:55, josef.pktd at gmail.com wrote: > >>> ? from scipy import linalg >>> >>> ? cho = linalg.cho_factor(V) >>> ? Ac = linalg.cho_solve(cho, A) >>> ? bc = linalg.cho_solve(cho, b) >>> ? x = linalg.solve(Ac, bc) >> >> That's pretty much what statsmodels GLS does. The transformed >> variables have a w prefix, wendog, wexog, wresid >> >> I don't think we have many examples with the full sized (nobs, nobs) V matrix. > > Why not call lapack DGGGLM? It does the same, except with QR. I didn't know about it and it's not yet available > > It could e.g. be exposed as sp.linalg.glstsq. volunteers? One problem we have in statsmodels is that we still need additional results not just the parameter estimates, for example inv(x'x) (or pinv) or inv(X' inv(V) X) (IIRC) for the covariance matrix and the determinant for the log likelihood. I tried to work on the linear algebra a few times to avoid some repeated and some numerically doubtful calculations, but it's too much work (when I'd rather work on something else). 
Now we are just using pinv or QR for the least squares solution and cholesky for the transformation. Josef > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Tue Feb 14 12:51:01 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 14 Feb 2012 18:51:01 +0100 Subject: [SciPy-User] [SciPy-user] scipy.odr - Correct form of covx, covy matrix In-Reply-To: References: <33312093.post@talk.nabble.com> <33319485.post@talk.nabble.com> <4F3A7BA0.9010900@molden.no> Message-ID: <4F3A9F05.4080905@molden.no> On 14.02.2012 17:12, josef.pktd at gmail.com wrote: > One problem we have in statsmodels is that we still need additional > results not just the parameter estimates, for example inv(x'x) (or > pinv) or inv(X' inv(V) X) (IIRC) for the covariance matrix and the > determinant for the log likelihood. After fitting with dggglm the y output variable contain the decorrelated residuals, so that should give you the log likelihood. But I am not sure about the covariance matrix. Sturla From sturla at molden.no Tue Feb 14 14:40:26 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 14 Feb 2012 20:40:26 +0100 Subject: [SciPy-User] [OT] Bayesian vs. frequentist Message-ID: <4F3AB8AA.2040402@molden.no> After having worked with applied statistics for ~15 years, I have reached this conclusion... ;-) Sturla's 20 propositions on Bayesian vs. classical statistics: ============================================================== 1. For simple data, a figure is sufficient, nobody really cares. 2. For dummy problems with known facit, Bayesian methods tend to be the more accurate. 3. Bayesian methods include prior knowlege. A horse of 400 g is a priori less likely than a horse of 400 kg. Frequentists say this is too subjective. 4. Bayesian methods are easier to interpret. Few understand a frequenctist confidence interval, albeit everybody they think they do. 5. Hypothesis testing: Bayesians answer the question we ask. Freuentists don't. 6. Economists investing their own money are bayesians. 7. Economists investing your money are frequentists. 8. For basic medical research, nobody cares. 9. Drug trials: For getting an FDA application approved, frequentists often yield a more 'significant result'. 10. Drug trials: For in-hose liability estimates, Bayesian methods are the safer. 11. Frequentists can always get more significant results by "sampling more data". 12. Frequentists don't care about stopping rules, even though they should. 13. Bayesians don't care about stopping rules bacause they don't have to. 14. "Significant" does not mean "important". Any tiny difference can be made statistically significant. 15. For interpreting clinical lab tests, Bayesian methods prevail, e.g. predictive values. 16. Engineers who know their mathematics use Bayesian methods. 17. Social scientists who don't know their mathematics are frequentists. 18. SPSS, Excel, Minitab, and SAS make it easy to be an ignorant frequentist. 19. No tool make it easy to be an ignorant bayesian. 20. Competent analysts use R, Fortran, Matlab or Python. 
From scipy at samueljohn.de Tue Feb 14 14:49:40 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 14 Feb 2012 20:49:40 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 In-Reply-To: References: Message-ID: <20494732-5882-4B0B-8A99-280D5527A74B@samueljohn.de> On 13.02.2012, at 07:02, Ralf Gommers wrote: > OS X Lion build is still very painful, if someone wants to have a go at making numpy/scipy work with llvm-gcc that would be very helpful. I just tested clang with great success! With llvm-gcc I got segfaults in scipy. But not with clang. Both, scipy 0.10.0 and numpy 1.6.1 compile with on OS X 10.7.3 with latest XCode: ```` export CC=clang export CXX=clang++ export FFLAGS='-ff2c' python setup.py build --fcompiler=gfortran python setup.py install ```` Just the known arpack related scipy failuers and umath_complex related failures numpy failures. The only numpy failure which I am not sure about is: FAIL: Test basic arithmetic function errors AssertionError: Type did not raise fpe error ''. Does someone want to comment on clang or test it on Mac OS X (Lion) ? cheers, Samuel From josef.pktd at gmail.com Tue Feb 14 15:24:57 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Feb 2012 15:24:57 -0500 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AB8AA.2040402@molden.no> References: <4F3AB8AA.2040402@molden.no> Message-ID: On Tue, Feb 14, 2012 at 2:40 PM, Sturla Molden wrote: > > After having worked with applied statistics for ~15 years, I have > reached this conclusion... ;-) Do you expect an argument? sounds a bit like http://andrewgelman.com/ Even 20 years ago when I was a Bayesian, I didn't understand what most of the excitement in the religious differences between Bayesian and Frequentist fundamentalists was all about. ;) Josef If you have a hammer, everything looks like a nail. If you have a screw driver, everything looks like a screw. (What's difference in differences, and what's generalized method of moments. Two highlights from Gelman.) > > > Sturla's 20 propositions on Bayesian vs. classical statistics: > ============================================================== > > 1. For simple data, a figure is sufficient, nobody really cares. > > 2. For dummy problems with known facit, Bayesian methods tend to be the > more accurate. > > 3. Bayesian methods include prior knowlege. A horse of 400 g is a priori > less likely than a horse of 400 kg. Frequentists say this is too subjective. > > 4. Bayesian methods are easier to interpret. Few understand a > frequenctist confidence interval, albeit everybody they think they do. > > 5. Hypothesis testing: Bayesians answer the question we ask. Freuentists > don't. > > 6. Economists investing their own money are bayesians. > > 7. Economists investing your money are frequentists. > > 8. For basic medical research, nobody cares. > > 9. Drug trials: For getting an FDA application approved, frequentists > often yield a more 'significant result'. > > 10. Drug trials: For in-hose liability estimates, Bayesian methods are > the safer. > > 11. Frequentists can always get more significant results by "sampling > more data". > > 12. Frequentists don't care about stopping rules, even though they should. > > 13. Bayesians don't care about stopping rules bacause they don't have to. > > 14. "Significant" does not mean "important". Any tiny difference can be > made statistically significant. > > 15. For interpreting clinical lab tests, Bayesian methods prevail, e.g. > predictive values. > > 16. 
Engineers who know their mathematics use Bayesian methods. > > 17. Social scientists who don't know their mathematics are frequentists. > > 18. SPSS, Excel, Minitab, and SAS make it easy to be an ignorant > frequentist. > > 19. No tool make it easy to be an ignorant bayesian. > > 20. Competent analysts use R, Fortran, Matlab or Python. > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Tue Feb 14 15:21:55 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 14 Feb 2012 21:21:55 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 In-Reply-To: <20494732-5882-4B0B-8A99-280D5527A74B@samueljohn.de> References: <20494732-5882-4B0B-8A99-280D5527A74B@samueljohn.de> Message-ID: 14.02.2012 20:49, Samuel John kirjoitti: [clip] > Just the known arpack related scipy failuers and umath_complex related failures numpy failures. The bad news is that we though the ARPACK issues were fixed :/ Could you paste the relevant pieces from the test report? The error is probably with test tolerances when doing iterative single-precision inverses inside iterative eigenvalue calculation, but I'd like to double check. Thanks, -- Pauli Virtanen From sturla at molden.no Tue Feb 14 15:39:48 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 14 Feb 2012 21:39:48 +0100 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: References: <4F3AB8AA.2040402@molden.no> Message-ID: <4F3AC694.8080205@molden.no> On 14.02.2012 21:24, josef.pktd at gmail.com wrote: > Do you expect an argument? sounds a bit like http://andrewgelman.com/ No I don't. I am just tired of explaining that "significant p-value" does not imply "very important effect". Sorry for spamming the list with my rant. Sturla From scipy at samueljohn.de Tue Feb 14 15:45:00 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 14 Feb 2012 21:45:00 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 In-Reply-To: References: <20494732-5882-4B0B-8A99-280D5527A74B@samueljohn.de> Message-ID: <6C9EBB86-59C3-450B-B1E7-1388E18A3099@samueljohn.de> On 14.02.2012, at 21:21, Pauli Virtanen wrote: > 14.02.2012 20:49, Samuel John kirjoitti: > [clip] >> Just the known arpack related scipy failuers and umath_complex related failures numpy failures. > > The bad news is that we though the ARPACK issues were fixed :/ > > Could you paste the relevant pieces from the test report? The error is > probably with test tolerances when doing iterative single-precision > inverses inside iterative eigenvalue calculation, but I'd like to double > check. Sorry, my fault. I used http://sourceforge.net/projects/scipy/files/scipy/0.10.0/scipy-0.10.0.tar.gz and not later, so the arpack fixes were probably not included. I'll just repeat the build process with the latest head rev. From paustin at eos.ubc.ca Tue Feb 14 16:22:20 2012 From: paustin at eos.ubc.ca (Phil Austin) Date: Tue, 14 Feb 2012 13:22:20 -0800 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AC694.8080205@molden.no> References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> Message-ID: <4F3AD08C.50906@eos.ubc.ca> On 12-02-14 12:39 PM, Sturla Molden wrote: > On 14.02.2012 21:24, josef.pktd at gmail.com wrote: > > > Do you expect an argument? 
sounds a bit like http://andrewgelman.com/ > Coincidentally, this discussion: http://andrewgelman.com/2012/02/adding-an-error-model-to-a-deterministic-model/ started when a civil engineering PhD posted a request for help. My reading of the ensuing discussion of both posts is that there is still a lot of work to do in bridging statistics (bayesian or frequentist) and deterministic modeling of complex systems. -- Phil From scipy at samueljohn.de Tue Feb 14 16:54:00 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 14 Feb 2012 22:54:00 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 1 In-Reply-To: <6C9EBB86-59C3-450B-B1E7-1388E18A3099@samueljohn.de> References: <20494732-5882-4B0B-8A99-280D5527A74B@samueljohn.de> <6C9EBB86-59C3-450B-B1E7-1388E18A3099@samueljohn.de> Message-ID: [sorry posting to scipy-users and scipy-dev. I feel this is not a good idea, but this thread is already spanning both lists. Replay on scipy-dev pls] The good news is that using clang and clang++ and gfortran from http://r.research.att.com/tools/ (4.2.4-5666.3) numpy and scipy build and test() fine! Yeeha \o/ Anyone with deeper understanding of the scipy internals want to comment on clang usage? Has anyone experienced problems with scipy and clang? However, building numpy 1.6.1 and scipy 0.10.1rc1 (or 0.10.0 or head) on OS X 10.7.3 with Xcode 4.2.1 (build 4D502) with llvm-gcc (which is the default, since non-llvm gcc is not available any more with XCode 4.2) leads to a segfault or to a malloc trap: > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in /usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy > SciPy version 0.10.1rc1 > SciPy is installed in /usr/local/Cellar/python/2.7.2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy > Python version 2.7.2 (default, Feb 14 2012, 22:09:10) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)] > nose version 1.1.2 > ...................................................................................................................................................................................F.FFFPython(13536,0x7fff7c026960) malloc: *** error for object 0x10740b368: incorrect checksum for freed object - object was probably modified after being freed. > *** set a breakpoint in malloc_error_break to debug > Abort trap: 6 As pip install will also use the default llvm-gcc this might be a severe issue! This is already the case right now but does not often show up in daily usage. Perhaps it's possible to set the shell vars CC and CXX during build? Concerning arpack: I'm afraid that the arpack issues prevail in 0.10.1rc1. I beg you to fix these (I am not able to). scipy.test() log at https://gist.github.com/1830780 Samuel From travis at continuum.io Tue Feb 14 16:56:22 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 15:56:22 -0600 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AB8AA.2040402@molden.no> References: <4F3AB8AA.2040402@molden.no> Message-ID: can I frame this :-) +10 On Feb 14, 2012, at 1:40 PM, Sturla Molden wrote: > > After having worked with applied statistics for ~15 years, I have > reached this conclusion... ;-) > > > Sturla's 20 propositions on Bayesian vs. classical statistics: > ============================================================== > > 1. For simple data, a figure is sufficient, nobody really cares. > > 2. 
For dummy problems with known facit, Bayesian methods tend to be the > more accurate. > > 3. Bayesian methods include prior knowlege. A horse of 400 g is a priori > less likely than a horse of 400 kg. Frequentists say this is too subjective. > > 4. Bayesian methods are easier to interpret. Few understand a > frequenctist confidence interval, albeit everybody they think they do. > > 5. Hypothesis testing: Bayesians answer the question we ask. Freuentists > don't. > > 6. Economists investing their own money are bayesians. > > 7. Economists investing your money are frequentists. > > 8. For basic medical research, nobody cares. > > 9. Drug trials: For getting an FDA application approved, frequentists > often yield a more 'significant result'. > > 10. Drug trials: For in-hose liability estimates, Bayesian methods are > the safer. > > 11. Frequentists can always get more significant results by "sampling > more data". > > 12. Frequentists don't care about stopping rules, even though they should. > > 13. Bayesians don't care about stopping rules bacause they don't have to. > > 14. "Significant" does not mean "important". Any tiny difference can be > made statistically significant. > > 15. For interpreting clinical lab tests, Bayesian methods prevail, e.g. > predictive values. > > 16. Engineers who know their mathematics use Bayesian methods. > > 17. Social scientists who don't know their mathematics are frequentists. > > 18. SPSS, Excel, Minitab, and SAS make it easy to be an ignorant > frequentist. > > 19. No tool make it easy to be an ignorant bayesian. > > 20. Competent analysts use R, Fortran, Matlab or Python. > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Tue Feb 14 20:25:19 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Feb 2012 20:25:19 -0500 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AD08C.50906@eos.ubc.ca> References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> <4F3AD08C.50906@eos.ubc.ca> Message-ID: On Tue, Feb 14, 2012 at 4:22 PM, Phil Austin wrote: > On 12-02-14 12:39 PM, Sturla Molden wrote: >> ?On 14.02.2012 21:24, josef.pktd at gmail.com wrote: >> >> > Do you expect an argument? sounds a bit like http://andrewgelman.com/ >> > > Coincidentally, this discussion: > http://andrewgelman.com/2012/02/adding-an-error-model-to-a-deterministic-model/ > started when a civil engineering PhD posted a request for help. ?My reading > of the ensuing discussion of both posts is that there is still a lot of > work to > do in bridging statistics (bayesian or frequentist) and deterministic > modeling > of complex systems. I don't quite see why there should be anything deterministic (in the sense of correctly described by a mathematical model) about the growth of bacteria and the response of living tissue, (as there is nothing deterministic in the behavior of the macro economy). In economics we just add a noise variable (unexplained environmental or behavioral shocks) everywhere. I thought these were exactly the kind of dynamic problems that Kalman Filter (or it's nonlinear successors) were invented for. 
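As a concrete toy version of that idea, a scalar state-space sketch (all names and numbers are invented; it only illustrates the "deterministic model plus noise" setup, not any model from the linked discussion):

import numpy as np

# deterministic growth x_t = a * x_{t-1}, plus process and observation noise
a, q, r = 1.05, 0.01, 0.25      # growth rate, process variance, observation variance
rng = np.random.RandomState(0)
T = 100

x = np.empty(T)                 # latent states
x[0] = 1.0
for t in range(1, T):
    x[t] = a * x[t - 1] + np.sqrt(q) * rng.randn()
y = x + np.sqrt(r) * rng.randn(T)   # noisy observations

# standard scalar Kalman filter recursions
xf = np.empty(T)                # filtered state estimates
xf[0] = y[0]
P = r                           # initial state variance (rough choice)
for t in range(1, T):
    x_pred = a * xf[t - 1]              # predict
    P_pred = a * P * a + q
    K = P_pred / (P_pred + r)           # Kalman gain (observation matrix H = 1)
    xf[t] = x_pred + K * (y[t] - x_pred)
    P = (1.0 - K) * P_pred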
My main impression of the two articles and discussion is that being a Bayesian is a lot of work if you need to have a fully specified prior and likelihood, instead of just working with some semi-parametric estimation method (like least squares) that still produces results even if you don't have a fully specified likelihood. (It might not be efficient compared to the case when you have full information, but your results are less wrong than if your full specification is wrong.) Josef Mommy! I found a statistically significant penny. I'm rich. :) > > -- Phil > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From charlesr.harris at gmail.com Tue Feb 14 21:10:39 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Feb 2012 19:10:39 -0700 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> <4F3AD08C.50906@eos.ubc.ca> Message-ID: On Tue, Feb 14, 2012 at 6:25 PM, wrote: > On Tue, Feb 14, 2012 at 4:22 PM, Phil Austin wrote: > > On 12-02-14 12:39 PM, Sturla Molden wrote: > >> On 14.02.2012 21:24, josef.pktd at gmail.com wrote: > >> > >> > Do you expect an argument? sounds a bit like http://andrewgelman.com/ > >> > > > > Coincidentally, this discussion: > > > http://andrewgelman.com/2012/02/adding-an-error-model-to-a-deterministic-model/ > > started when a civil engineering PhD posted a request for help. My > reading > > of the ensuing discussion of both posts is that there is still a lot of > > work to > > do in bridging statistics (bayesian or frequentist) and deterministic > > modeling > > of complex systems. > > I don't quite see why there should be anything deterministic (in the > sense of correctly described by a mathematical model) about the growth > of bacteria and the response of living tissue, (as there is nothing > deterministic in the behavior of the macro economy). In economics we > just add a noise variable (unexplained environmental or behavioral > shocks) everywhere. > > I thought these were exactly the kind of dynamic problems that Kalman > Filter (or it's nonlinear successors) were invented for. > > My main impression of the two articles and discussion is that being a > Bayesian is a lot of work if you need to have a fully specified prior > and likelihood, instead of just working with some semi-parametric > estimation method (like least squares) that still produces results > even if you don't have a fully specified likelihood. (It might not be > efficient compared to the case when you have full information, but > your results are less wrong than if your full specification is wrong.) > > Well, invented priors can be used to bias parametric results for political purposes. Thar's gold in them priors. So there is that ;) I read E. T. Jaynes early papers and his book and enjoyed them, but I think treating physical entropy by Bayesian methods was a bit much. I don't think think the thermodynamic properties of a system depend on the observers knowlege. I would say both methods have their place, just use the right one for the problem at hand. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 22:49:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 21:49:30 -0600 Subject: [SciPy-User] [OT] Bayesian vs. 
frequentist In-Reply-To: References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> <4F3AD08C.50906@eos.ubc.ca> Message-ID: <2B129FA1-0A36-46D4-AA2B-0FF76702671A@continuum.io> > > Coincidentally, this discussion: > > http://andrewgelman.com/2012/02/adding-an-error-model-to-a-deterministic-model/ > > started when a civil engineering PhD posted a request for help. My reading > > of the ensuing discussion of both posts is that there is still a lot of > > work to > > do in bridging statistics (bayesian or frequentist) and deterministic > > modeling > > of complex systems. > > I don't quite see why there should be anything deterministic (in the > sense of correctly described by a mathematical model) about the growth > of bacteria and the response of living tissue, (as there is nothing > deterministic in the behavior of the macro economy). In economics we > just add a noise variable (unexplained environmental or behavioral > shocks) everywhere. > > I thought these were exactly the kind of dynamic problems that Kalman > Filter (or it's nonlinear successors) were invented for. > > My main impression of the two articles and discussion is that being a > Bayesian is a lot of work if you need to have a fully specified prior > and likelihood, instead of just working with some semi-parametric > estimation method (like least squares) that still produces results > even if you don't have a fully specified likelihood. (It might not be > efficient compared to the case when you have full information, but > your results are less wrong than if your full specification is wrong.) > > > Well, invented priors can be used to bias parametric results for political purposes. Thar's gold in them priors. So there is that ;) > > I read E. T. Jaynes early papers and his book and enjoyed them, but I think treating physical entropy by Bayesian methods was a bit much. I don't think think the thermodynamic properties of a system depend on the observers knowlege. I would say both methods have their place, just use the right one for the problem at hand. > It sounds like we will have to revisit your views there over drinks sometime. I think the whole point is that there is really no such thing as physical entropy. It's all just a property that you have to assign to a system if you want maximum reproducibility without constraining everything. That's the way I prefer to think about it at this point anyway ;-) -Travis > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Feb 14 23:15:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Feb 2012 21:15:04 -0700 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <2B129FA1-0A36-46D4-AA2B-0FF76702671A@continuum.io> References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> <4F3AD08C.50906@eos.ubc.ca> <2B129FA1-0A36-46D4-AA2B-0FF76702671A@continuum.io> Message-ID: On Tue, Feb 14, 2012 at 8:49 PM, Travis Oliphant wrote: > > > Coincidentally, this discussion: >> > >> http://andrewgelman.com/2012/02/adding-an-error-model-to-a-deterministic-model/ >> > started when a civil engineering PhD posted a request for help. 
My >> reading >> > of the ensuing discussion of both posts is that there is still a lot of >> > work to >> > do in bridging statistics (bayesian or frequentist) and deterministic >> > modeling >> > of complex systems. >> >> I don't quite see why there should be anything deterministic (in the >> sense of correctly described by a mathematical model) about the growth >> of bacteria and the response of living tissue, (as there is nothing >> deterministic in the behavior of the macro economy). In economics we >> just add a noise variable (unexplained environmental or behavioral >> shocks) everywhere. >> >> I thought these were exactly the kind of dynamic problems that Kalman >> Filter (or it's nonlinear successors) were invented for. >> >> My main impression of the two articles and discussion is that being a >> Bayesian is a lot of work if you need to have a fully specified prior >> and likelihood, instead of just working with some semi-parametric >> estimation method (like least squares) that still produces results >> even if you don't have a fully specified likelihood. (It might not be >> efficient compared to the case when you have full information, but >> your results are less wrong than if your full specification is wrong.) >> >> > Well, invented priors can be used to bias parametric results for political > purposes. Thar's gold in them priors. So there is that ;) > > I read E. T. Jaynes early papers and his book and enjoyed them, but I > think treating physical entropy by Bayesian methods was a bit much. I don't > think think the thermodynamic properties of a system depend on the > observers knowlege. I would say both methods have their place, just use the > right one for the problem at hand. > > > It sounds like we will have to revisit your views there over drinks > sometime. I think the whole point is that there is really no such thing > as physical entropy. It's all just a property that you have to assign to > a system if you want maximum reproducibility without constraining > everything. That's the way I prefer to think about it at this point > anyway ;-) > > Classically, it's an assumption about the behavior of the dynamical systems. It doesn't even need to be exactly so. But in any case, it is a dynamical problem, not a knowledge problem, and has physical affects that have nothing to do with the observer. Heat flows from hot to cold whatever you care to think. A watched pot on the stove *does* boil. Where things get interesting is if you start manipulating the system, start measuring things or put in a Maxwell's demon. Then there is an interplay. Deeper down, one might start asking questions about the physical representation of the knowledge in the observer. Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Wed Feb 15 03:21:10 2012 From: daniele at grinta.net (Daniele Nicolodi) Date: Wed, 15 Feb 2012 09:21:10 +0100 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AB8AA.2040402@molden.no> References: <4F3AB8AA.2040402@molden.no> Message-ID: <4F3B6AF6.70009@grinta.net> Hello, I'll hijack this thread to ask for advice. I'm a physicist and, as you may expect, my education in statistics is mostly in Frequentists methods. However, I always had an interest in Bayesian methods, as those seems to solve in much more natural ways the problems that arise in complex data analysis. I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. 
Silva (currently reading chapter 4, unfortunately real work is always interfering) and I really like the approach and the straight forward manner in which the theory builds up. However, I feel that the Bayesian approach, is much more difficult to translate to practical methods I can implement, but I may be biased by the long term exposition to the "recipe based" Frequentist approach. Can someone suggest me some resources (documentation or code) where some practical approaches to Bayesian analysis are taught? Thank you. Cheers, -- Daniele From johann.cohentanugi at gmail.com Wed Feb 15 03:33:30 2012 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Wed, 15 Feb 2012 09:33:30 +0100 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3B6AF6.70009@grinta.net> References: <4F3AB8AA.2040402@molden.no> <4F3B6AF6.70009@grinta.net> Message-ID: <4F3B6DDA.7090905@gmail.com> Hi Daniele, what domain of physics? Do you know of the work by Guido D'Agostini? He is a particle physics by trade I believe. If you are more into astrophysics, check Tom Loredo. Johann On 02/15/2012 09:21 AM, Daniele Nicolodi wrote: > Hello, I'll hijack this thread to ask for advice. > > I'm a physicist and, as you may expect, my education in statistics is > mostly in Frequentists methods. However, I always had an interest in > Bayesian methods, as those seems to solve in much more natural ways the > problems that arise in complex data analysis. > > I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. > Silva (currently reading chapter 4, unfortunately real work is always > interfering) and I really like the approach and the straight forward > manner in which the theory builds up. > > However, I feel that the Bayesian approach, is much more difficult to > translate to practical methods I can implement, but I may be biased by > the long term exposition to the "recipe based" Frequentist approach. > > Can someone suggest me some resources (documentation or code) where some > practical approaches to Bayesian analysis are taught? > > Thank you. Cheers, From dgorman at berkeley.edu Wed Feb 15 03:44:13 2012 From: dgorman at berkeley.edu (Dylan Gorman) Date: Wed, 15 Feb 2012 00:44:13 -0800 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3B6AF6.70009@grinta.net> References: <4F3AB8AA.2040402@molden.no> <4F3B6AF6.70009@grinta.net> Message-ID: Well there's always the classic by Jaynes, of Jaynes-Cummings model fame. (At least, famous to me since I do quantum optics.) "Probability Theory: The Logic Of Science." On Feb 15, 2012, at 12:21 AM, Daniele Nicolodi wrote: > Hello, I'll hijack this thread to ask for advice. > > I'm a physicist and, as you may expect, my education in statistics is > mostly in Frequentists methods. However, I always had an interest in > Bayesian methods, as those seems to solve in much more natural ways the > problems that arise in complex data analysis. > > I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. > Silva (currently reading chapter 4, unfortunately real work is always > interfering) and I really like the approach and the straight forward > manner in which the theory builds up. > > However, I feel that the Bayesian approach, is much more difficult to > translate to practical methods I can implement, but I may be biased by > the long term exposition to the "recipe based" Frequentist approach. 
> > Can someone suggest me some resources (documentation or code) where some > practical approaches to Bayesian analysis are taught? > > Thank you. Cheers, > -- > Daniele > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sgarcia at olfac.univ-lyon1.fr Wed Feb 15 03:58:33 2012 From: sgarcia at olfac.univ-lyon1.fr (Samuel Garcia) Date: Wed, 15 Feb 2012 09:58:33 +0100 Subject: [SciPy-User] [ANN] release of Neo 0.2.0 In-Reply-To: References: <4F38D337.7000703@olfac.univ-lyon1.fr> Message-ID: <4F3B73B9.8050408@olfac.univ-lyon1.fr> Nitime is oriented for time series of neuro imaging. Neo offer a complete data model center around electrophysiology (in vivo or simulations) with more objects than nitime (for this fields of course): RecodingChannelGroup +RecodingChannel + Segment + Block +Unit+... Samuel Le 14/02/2012 16:15, Larry Eisenman a ?crit : > Samuel Garcia olfac.univ-lyon1.fr> writes: > >> >> Dear scipy list, >> We are proud to announce the 0.2.0 release of Neo, a Python library for > working with >> electrophysiology data, whether from biological experiments or from >> simulations. > .... > > How does this compare/relate to the Nitime module of the Nipy project > (http://nipy.sourceforge.net/nitime/)? > > Larry > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Lyon Neuroscience CNRS - UMR5292 - INSERM U1028 - Universite Claude Bernard LYON 1 Equipe R et D 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE T?l : 04 37 28 74 24 Fax : 04 37 28 76 01 http://olfac.univ-lyon1.fr/unite/equipe-07/ http://neuralensemble.org/trac/OpenElectrophy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From emanuele at relativita.com Wed Feb 15 04:00:12 2012 From: emanuele at relativita.com (Emanuele Olivetti) Date: Wed, 15 Feb 2012 10:00:12 +0100 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3B6AF6.70009@grinta.net> References: <4F3AB8AA.2040402@molden.no> <4F3B6AF6.70009@grinta.net> Message-ID: <4F3B741C.2030901@relativita.com> On 02/15/2012 09:21 AM, Daniele Nicolodi wrote: > I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. > Silva (currently reading chapter 4, unfortunately real work is always > interfering) and I really like the approach and the straight forward > manner in which the theory builds up. > Sivia&Skilling's book is very good. A similar and good one is Gregory's "Bayesian Logical Data Analysis for the Physical Sciences". > However, I feel that the Bayesian approach, is much more difficult to > translate to practical methods I can implement, but I may be biased by > the long term exposition to the "recipe based" Frequentist approach. > My opinion is that the Bayesian approach is so much about modelling the specific system under analysis that is I don't believe there is a shortcut or a book of recipes that fits every needs. In other words every practical application usually has its own peculiarities that frequently leads to a custom solution. Having said that, in my experience I frequently relied upon hierarchical/multilevel modelling that can be approached with Monte Carlo techniques. In that case the math can be simple (with caveats) and the hard part is done by the Monte Carlo sampler. 
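To make the last point concrete ("the hard part is done by the Monte Carlo sampler"), here is a deliberately tiny random-walk Metropolis sampler for the posterior of a normal mean with a normal prior, in plain NumPy. It is only an illustrative sketch (names and numbers are made up, not from this thread); a package such as PyMC, mentioned just below, does this and much more for you:

import numpy as np

def log_posterior(mu, data, prior_mu=0.0, prior_sd=10.0, noise_sd=1.0):
    # log prior (normal) + log likelihood (normal with known noise_sd), up to a constant
    log_prior = -0.5 * ((mu - prior_mu) / prior_sd) ** 2
    log_like = -0.5 * np.sum(((data - mu) / noise_sd) ** 2)
    return log_prior + log_like

def metropolis(data, n_samples=5000, step=0.5, seed=0):
    rng = np.random.RandomState(seed)
    draws = np.empty(n_samples)
    mu = 0.0
    logp = log_posterior(mu, data)
    for i in range(n_samples):
        proposal = mu + step * rng.randn()           # symmetric random-walk proposal
        logp_prop = log_posterior(proposal, data)
        if np.log(rng.rand()) < logp_prop - logp:    # Metropolis accept/reject
            mu, logp = proposal, logp_prop
        draws[i] = mu
    return draws

# posterior_draws = metropolis(data)[1000:]   # discard burn-in before summarizing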
> Can someone suggest me some resources (documentation or code) where some > practical approaches to Bayesian analysis are taught? > If you come from a frequentist mindset (t-test, ANOVA, etc.) you might find some quick and interesting thing in this book: http://www.ejwagenmakers.com/BayesCourse/BayesBook.html "A Practical Course in Bayesian Graphical Modeling", by Lee and Wagenmakers. In this book you will find a quick approach to the hierarchical/multilevel modelling mentioned above. The book illustrates examples using winBUGS - a sampler that I find puzzling and that I don't like. Luckily you can do the same with the very good PyMC http://code.google.com/p/pymc/ or by writing you own sampler. Best, Emanuele From njs at pobox.com Wed Feb 15 04:57:48 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 15 Feb 2012 09:57:48 +0000 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AB8AA.2040402@molden.no> References: <4F3AB8AA.2040402@molden.no> Message-ID: I don't understand why this is always framed as a "versus" debate. "Bayesian methods" are the math for figuring out what to believe; "frequentist methods" are the math for figuring if you're fooling yourself. It makes perfect sense for engineers and in house estimates to use the former and the FDA and scientists the latter. Different methods answer different questions. I share everyone's frustration with ignorant people misinterpreting frequentist results, but contra point 18, I have begun to meet people doing the same to Bayesian methods, and I the tools are getting more accessible all the time. On Feb 14, 2012 7:40 PM, "Sturla Molden" wrote: > > After having worked with applied statistics for ~15 years, I have > reached this conclusion... ;-) > > > Sturla's 20 propositions on Bayesian vs. classical statistics: > ============================================================== > > 1. For simple data, a figure is sufficient, nobody really cares. > > 2. For dummy problems with known facit, Bayesian methods tend to be the > more accurate. > > 3. Bayesian methods include prior knowlege. A horse of 400 g is a priori > less likely than a horse of 400 kg. Frequentists say this is too > subjective. > > 4. Bayesian methods are easier to interpret. Few understand a > frequenctist confidence interval, albeit everybody they think they do. > > 5. Hypothesis testing: Bayesians answer the question we ask. Freuentists > don't. > > 6. Economists investing their own money are bayesians. > > 7. Economists investing your money are frequentists. > > 8. For basic medical research, nobody cares. > > 9. Drug trials: For getting an FDA application approved, frequentists > often yield a more 'significant result'. > > 10. Drug trials: For in-hose liability estimates, Bayesian methods are > the safer. > > 11. Frequentists can always get more significant results by "sampling > more data". > > 12. Frequentists don't care about stopping rules, even though they should. > > 13. Bayesians don't care about stopping rules bacause they don't have to. > > 14. "Significant" does not mean "important". Any tiny difference can be > made statistically significant. > > 15. For interpreting clinical lab tests, Bayesian methods prevail, e.g. > predictive values. > > 16. Engineers who know their mathematics use Bayesian methods. > > 17. Social scientists who don't know their mathematics are frequentists. > > 18. SPSS, Excel, Minitab, and SAS make it easy to be an ignorant > frequentist. > > 19. No tool make it easy to be an ignorant bayesian. > > 20. 
Competent analysts use R, Fortran, Matlab or Python. > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lou_boog2000 at yahoo.com Wed Feb 15 09:37:41 2012 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Wed, 15 Feb 2012 06:37:41 -0800 (PST) Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3B6AF6.70009@grinta.net> References: <4F3AB8AA.2040402@molden.no> <4F3B6AF6.70009@grinta.net> Message-ID: <1329316661.30949.YahooMailNeo@web34404.mail.mud.yahoo.com> From: Daniele Nicolodi To: scipy-user at scipy.org Sent: Wednesday, February 15, 2012 3:21 AM Subject: Re: [SciPy-User] [OT] Bayesian vs. frequentist Hello, I'll hijack this thread to ask for advice. I'm a physicist and, as you may expect, my education in statistics is mostly in Frequentists methods. However, I always had an interest in Bayesian methods, as those seems to solve in much more natural ways the problems that arise in complex data analysis. I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. Silva (currently reading chapter 4, unfortunately real work is always interfering) and I really like the approach and the straight forward manner in which the theory builds up. However, I feel that the Bayesian approach, is much more difficult to translate to practical methods I can implement, but I may be biased by the long term exposition to the "recipe based" Frequentist approach. Can someone suggest me some resources (documentation or code) where some practical approaches to Bayesian analysis are taught? Thank you. Cheers, -- Daniele _______________________________________________ SciPy-User mailing list I'm also a physicist and just getting into all this. ?Silva's book is good. ?Here are two others I found that look good and readable. ?I have not read either all the way, but they are worth examining. ?You should also (after digesting some standard Bayesian statistics) examine the newer latent Dirichlet methods which look pretty powerful and seem to have a better way to handle and generate priors. ?Again, I'm a novice here, but these look like good avenues for a scientist trying to learn Bayesian statistics. (1) Udo von Toussaint, "Bayesian inference in physics", REVIEWS OF MODERN PHYSICS, VOLUME 83, JULY?SEPTEMBER 2011 (2)?Daniela Calvetti and Erkki Somersalo, Introduction to Bayesian scientific computing (Springer, 2007) It's a good topic even if it's OT -- provided everyone remains civil. ?:-)? ? -- Lou Pecora, my views are my own. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From afraser at lanl.gov Wed Feb 15 10:01:06 2012 From: afraser at lanl.gov (Fraser, Andrew McLeod) Date: Wed, 15 Feb 2012 15:01:06 +0000 Subject: [SciPy-User] Bayesian Data Assimilation in Python Message-ID: Lou, My book "Hidden Markov Models and Dynamical Systems" published by SIAM is about data assimilation in the simplest contexts. The approach is Bayesian and all of the examples are in python. (Now that numpy and scipy exist, I plan to update/improve the code to use them.) Andrew M. Fraser -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pecora at anvil.nrl.navy.mil Wed Feb 15 10:42:04 2012 From: pecora at anvil.nrl.navy.mil (Lou Pecora) Date: Wed, 15 Feb 2012 10:42:04 -0500 Subject: [SciPy-User] Bayesian Data Assimilation in Python In-Reply-To: References: Message-ID: <4F3BD24C.1040107@anvil.nrl.navy.mil> On 2/15/12 10:01 AM, Fraser, Andrew McLeod wrote: > Lou, > > My book "Hidden Markov Models and Dynamical Systems" published by SIAM > is about data assimilation in the simplest contexts. The approach is > Bayesian and all of the examples are in python. (Now that numpy and > scipy exist, I plan to update/improve the code to use them.) > > Andrew M. Fraser > Thanks, Andy. Good to know. I am piling up Bayesian books. I hope I can get to read them. :-) -- Lou Pecora Code 6362 Naval Research Laboratory Washington, DC 20375, US ph: +202-767-6002 FAX: 202-767-1697 email: pecora at anvil.nrl.navy.mil -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.ratcliff at gmail.com Wed Feb 15 11:11:37 2012 From: william.ratcliff at gmail.com (william ratcliff) Date: Wed, 15 Feb 2012 09:11:37 -0700 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <1329316661.30949.YahooMailNeo@web34404.mail.mud.yahoo.com> References: <4F3AB8AA.2040402@molden.no> <4F3B6AF6.70009@grinta.net> <1329316661.30949.YahooMailNeo@web34404.mail.mud.yahoo.com> Message-ID: I'm another physicist and find Silva's book to be good. One of the things that I've used is maximum entropy in trying to reconstruct magnetization densities from neutron scattering data, rather than Fourier transforms (sad problems with termination effects...). I'd also like to use it more in model selection--for example, say you have a data set that you can fit to 4 gaussians, or 2--even if you get a "better" (lower chi^2), is it significant? BIQ can be useful... William On Wed, Feb 15, 2012 at 7:37 AM, Lou Pecora wrote: > *From:* Daniele Nicolodi > *To:* scipy-user at scipy.org > *Sent:* Wednesday, February 15, 2012 3:21 AM > *Subject:* Re: [SciPy-User] [OT] Bayesian vs. frequentist > > Hello, I'll hijack this thread to ask for advice. > > I'm a physicist and, as you may expect, my education in statistics is > mostly in Frequentists methods. However, I always had an interest in > Bayesian methods, as those seems to solve in much more natural ways the > problems that arise in complex data analysis. > > I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. > Silva (currently reading chapter 4, unfortunately real work is always > interfering) and I really like the approach and the straight forward > manner in which the theory builds up. > > However, I feel that the Bayesian approach, is much more difficult to > translate to practical methods I can implement, but I may be biased by > the long term exposition to the "recipe based" Frequentist approach. > > Can someone suggest me some resources (documentation or code) where some > practical approaches to Bayesian analysis are taught? > > Thank you. Cheers, > -- > Daniele > _______________________________________________ > SciPy-User mailing list > > > I'm also a physicist and just getting into all this. Silva's book is > good. Here are two others I found that look good and readable. I have not > read either all the way, but they are worth examining. You should also > (after digesting some standard Bayesian statistics) examine the newer > latent Dirichlet methods which look pretty powerful and seem to have a > better way to handle and generate priors. 
Again, I'm a novice here, but > these look like good avenues for a scientist trying to learn Bayesian > statistics. > > (1) Udo von Toussaint, "Bayesian inference in physics", REVIEWS OF MODERN > PHYSICS, VOLUME 83, JULY?SEPTEMBER 2011 > > (2) Daniela Calvetti and Erkki Somersalo, Introduction to Bayesian > scientific computing (Springer, 2007) > > > > It's a good topic even if it's OT -- provided everyone remains civil. :-) > > > > -- Lou Pecora, my views are my own. > ------------------------------ > > ** > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From k-assem84 at hotmail.com Tue Feb 14 10:25:43 2012 From: k-assem84 at hotmail.com (suzana8447) Date: Tue, 14 Feb 2012 07:25:43 -0800 (PST) Subject: [SciPy-User] [SciPy-user] scipy: integration Message-ID: <33322670.post@talk.nabble.com> Hello every body, I am a new user of scipy and I am facing some troubles of performing numerical integration. Suppose that we have a function called f(x,a,b) that we are going to integrate it from 0 to c. Note that a, b and c are some arrays. For example a=array([1., 0.5,3.]) , b=array([0.,1.,4.)] c= array([..,..,...]) By this way, I would expect that the program returns an array whereby each value of the array represents the integral at each term of the arrays a,b and c. Thanks in advance. Best regards. -- View this message in context: http://old.nabble.com/scipy%3A-integration-tp33322670p33322670.html Sent from the Scipy-User mailing list archive at Nabble.com. From mkpaustin at gmail.com Tue Feb 14 16:17:47 2012 From: mkpaustin at gmail.com (Phil Austin) Date: Tue, 14 Feb 2012 13:17:47 -0800 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AC694.8080205@molden.no> References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> Message-ID: <4F3ACF7B.2080204@gmail.com> On 12-02-14 12:39 PM, Sturla Molden wrote: > On 14.02.2012 21:24, josef.pktd at gmail.com wrote: > > > Do you expect an argument? sounds a bit like http://andrewgelman.com/ > > No I don't. I am just tired of explaining that "significant p-value" > does not imply "very important effect". Sorry for spamming the list with > my rant. Coincidentally, this discussion: http://andrewgelman.com/2012/02/adding-an-error-model-to-a-deterministic-model/ started when a civil engineering PhD posted a request for help. My reading of the ensuing discussion of both posts is that there is still a lot of work to do in bridging statistics (bayesian or frequentist) and deterministic modeling of complex systems. -- Phil From elofgren at email.unc.edu Wed Feb 15 03:28:15 2012 From: elofgren at email.unc.edu (Lofgren, Eric) Date: Wed, 15 Feb 2012 08:28:15 +0000 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: References: Message-ID: <5913DB1B-B807-4F19-877C-9B9742F4C5B9@unc.edu> Daniele, You're not wrong. I'm in a field that has (partially) embraced Bayesian methods, and one of the challenges in getting it adopted and used in general practice is that it is *much* harder to implement at times. It's not just a matter of the code itself - a tremendous amount of work has to go into obtaining the priors in order for them to be at all meaningful. You might want to consider popping over to Stack Overflow's sister site, Cross Validated (http://stats.stackexchange.com). 
I've found it to be an extremely helpful resource for asking questions both on a theory level and a practical applications/coding level. Eric On Feb 15, 2012, at 3:17 AM, > > Message: 5 > Date: Wed, 15 Feb 2012 09:21:10 +0100 > From: Daniele Nicolodi > Subject: Re: [SciPy-User] [OT] Bayesian vs. frequentist > To: scipy-user at scipy.org > Message-ID: <4F3B6AF6.70009 at grinta.net> > Content-Type: text/plain; charset=ISO-8859-1 > > Hello, I'll hijack this thread to ask for advice. > > I'm a physicist and, as you may expect, my education in statistics is > mostly in Frequentists methods. However, I always had an interest in > Bayesian methods, as those seems to solve in much more natural ways the > problems that arise in complex data analysis. > > I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. > Silva (currently reading chapter 4, unfortunately real work is always > interfering) and I really like the approach and the straight forward > manner in which the theory builds up. > > However, I feel that the Bayesian approach, is much more difficult to > translate to practical methods I can implement, but I may be biased by > the long term exposition to the "recipe based" Frequentist approach. > > Can someone suggest me some resources (documentation or code) where some > practical approaches to Bayesian analysis are taught? > > Thank you. Cheers, > -- > Daniele From k-assem84 at hotmail.com Wed Feb 15 08:36:59 2012 From: k-assem84 at hotmail.com (suzana8447) Date: Wed, 15 Feb 2012 05:36:59 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: References: <33301423.post@talk.nabble.com> Message-ID: <33328834.post@talk.nabble.com> Thanks all for your help. What I have understood is that I get what so called the cov_x from least square root fit and then multipy this matrix by the error variance. I have two more questions. 1) What is meant by the error variance? How one can extract it? 2) Do you mean by ||err||= func-data? Thanks in advance. Charles R Harris wrote: > > On Mon, Feb 13, 2012 at 8:56 AM, wrote: > >> On Sat, Feb 11, 2012 at 7:39 PM, Kevin Gullikson >> wrote: >> > Use full_output=True when you call leastq, and you will get a matrix >> (among >> > other things). If you multiply that matrix by the standard deviation of >> the >> > residuals, it will be the covariance matrix. >> >> As Charles pointed out, multiply by the error variance not the >> standard deviation. Docstring is wrong in this. >> >> > Even more precisely, multiply by ||err||^2/(n - dof), since it is possible > that the error has an offset unless the model can perfectly fit a > constant. > If this actually makes a difference, the model is inadequate, but the > variance estimate might be useful if you are using something like the > Akaike information criterion to choose the number of parameters. > > > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Covariance-matrix-tp33301423p33328834.html Sent from the Scipy-User mailing list archive at Nabble.com. From WILLIAM.GRIFFIN at asu.edu Wed Feb 15 11:35:09 2012 From: WILLIAM.GRIFFIN at asu.edu (William Griffin) Date: Wed, 15 Feb 2012 09:35:09 -0700 Subject: [SciPy-User] Bayesian Data Assimilation in Python In-Reply-To: References: Message-ID: Excellent book, by the way. -- Regards, Bill William A. Griffin, Ph.D. 
Center for Social Dynamics And Complexity President, CSSSA http://computationalsocialscience.org Arizona State University Tempe, AZ 85287-4804 william.griffin at asu.edu http://www.asu.edu/clas/csdc/bios/wgriffin.html On Wed, Feb 15, 2012 at 8:01 AM, Fraser, Andrew McLeod wrote: > Lou, > > My book "Hidden Markov Models and Dynamical Systems" published by SIAM is > about data assimilation in the simplest contexts. The approach is Bayesian > and all of the examples are in python. (Now that numpy and scipy exist, I > plan to update/improve the code to use them.) > > Andrew M. Fraser > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 15 11:42:30 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 11:42:30 -0500 Subject: [SciPy-User] [SciPy-user] Covariance matrix In-Reply-To: <33328834.post@talk.nabble.com> References: <33301423.post@talk.nabble.com> <33328834.post@talk.nabble.com> Message-ID: On Wed, Feb 15, 2012 at 8:36 AM, suzana8447 wrote: > > Thanks all for your ?help. > > What I have understood is that I get what so called the cov_x from least > square root fit and then multipy this matrix by the error variance. > > I have two more questions. > > 1) What is meant by the ?error variance? How one can extract it? > > 2) Do you mean by ||err||= func-data? (func-data).sum() / (n-k) squared sum of it an divided by number of observations minus number of parameters, as Chuck mentioned source of curve_fit is useful https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L441 IIRC, infodict 'fvec' returns the squared error sum Josef > > Thanks in advance. > > Charles R Harris wrote: >> >> On Mon, Feb 13, 2012 at 8:56 AM, wrote: >> >>> On Sat, Feb 11, 2012 at 7:39 PM, Kevin Gullikson >>> wrote: >>> > Use full_output=True when you call leastq, and you will get a matrix >>> (among >>> > other things). If you multiply that matrix by the standard deviation of >>> the >>> > residuals, it will be the covariance matrix. >>> >>> As Charles pointed out, multiply by the error variance not the >>> standard deviation. Docstring is wrong in this. >>> >>> >> Even more precisely, multiply by ||err||^2/(n - dof), since it is possible >> that the error has an offset unless the model can perfectly fit a >> constant. >> If this actually makes a difference, the model is inadequate, but the >> variance estimate might be useful if you are using something like the >> Akaike information criterion to choose the number of parameters. >> >> >> >> Chuck >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/Covariance-matrix-tp33301423p33328834.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From johnl at cs.wisc.edu Wed Feb 15 11:47:53 2012 From: johnl at cs.wisc.edu (J. David Lee) Date: Wed, 15 Feb 2012 10:47:53 -0600 Subject: [SciPy-User] [OT] Bayesian vs. 
frequentist In-Reply-To: <1329316661.30949.YahooMailNeo@web34404.mail.mud.yahoo.com> References: <4F3AB8AA.2040402@molden.no> <4F3B6AF6.70009@grinta.net> <1329316661.30949.YahooMailNeo@web34404.mail.mud.yahoo.com> Message-ID: <4F3BE1B9.2060601@cs.wisc.edu> On 02/15/2012 08:37 AM, Lou Pecora wrote: > *From:* Daniele Nicolodi > *To:* scipy-user at scipy.org > *Sent:* Wednesday, February 15, 2012 3:21 AM > *Subject:* Re: [SciPy-User] [OT] Bayesian vs. frequentist > > Hello, I'll hijack this thread to ask for advice. > > I'm a physicist and, as you may expect, my education in statistics is > mostly in Frequentists methods. However, I always had an interest in > Bayesian methods, as those seems to solve in much more natural ways the > problems that arise in complex data analysis. > > I recently started to read "Data Analysis, A Bayesian Tutorial" by D.S. > Silva (currently reading chapter 4, unfortunately real work is always > interfering) and I really like the approach and the straight forward > manner in which the theory builds up. > > However, I feel that the Bayesian approach, is much more difficult to > translate to practical methods I can implement, but I may be biased by > the long term exposition to the "recipe based" Frequentist approach. > > Can someone suggest me some resources (documentation or code) where some > practical approaches to Bayesian analysis are taught? > > Thank you. Cheers, > -- > Daniele > _______________________________________________ > SciPy-User mailing list > > > I'm also a physicist and just getting into all this. Silva's book is > good. Here are two others I found that look good and readable. I > have not read either all the way, but they are worth examining. You > should also (after digesting some standard Bayesian statistics) > examine the newer latent Dirichlet methods which look pretty powerful > and seem to have a better way to handle and generate priors. Again, > I'm a novice here, but these look like good avenues for a scientist > trying to learn Bayesian statistics. > > (1) Udo von Toussaint, "Bayesian inference in physics", REVIEWS OF > MODERN PHYSICS, VOLUME 83, JULY?SEPTEMBER 2011 > > (2) Daniela Calvetti and Erkki Somersalo, Introduction to Bayesian > scientific computing (Springer, 2007) > > > > It's a good topic even if it's OT -- provided everyone remains civil. > :-) > > > -- Lou Pecora, my views are my own. I would also recommend the Toussaint paper. It contains several case studies from various areas of physics that you might find interesting. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 15 11:52:13 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 11:52:13 -0500 Subject: [SciPy-User] [SciPy-user] scipy: integration In-Reply-To: <33322670.post@talk.nabble.com> References: <33322670.post@talk.nabble.com> Message-ID: On Tue, Feb 14, 2012 at 10:25 AM, suzana8447 wrote: > > Hello every body, > > I am a new user of scipy and I am facing some troubles of performing > numerical integration. > > Suppose that we have a function called f(x,a,b) that we are going to > integrate it from 0 to c. > > Note that a, b and c are some arrays. > > For example a=array([1., 0.5,3.]) , b=array([0.,1.,4.)] ?c= > array([..,..,...]) > > By this way, I would expect that the program returns an array whereby each > value of the array represents the integral at each term of the arrays a,b > and c. AFAIK None of the integrators for functions are vectorized. 
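For instance, a sketch of the element-by-element loop this implies, using scipy.integrate.quad with the poster's f(x, a, b) and arrays a, b, c (the helper name below is made up for illustration):

import numpy as np
from scipy.integrate import quad

def integrate_elementwise(f, a, b, c):
    # one quad call per (a[i], b[i], c[i]); quad itself only accepts scalar limits
    out = np.empty(len(c))
    for i in range(len(c)):
        val, abserr = quad(f, 0.0, c[i], args=(a[i], b[i]))
        out[i] = val
    return out

# e.g. with f = lambda x, a, b: a * x + b
# integrate_elementwise(f, np.array([1.0, 0.5, 3.0]), np.array([0.0, 1.0, 4.0]), np.array([1.0, 2.0, 3.0]))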
So this would always need a loop. For lower precision, the integrators using samples have an axis argument if you build an array for different values of the parameters a and b. Only cumtrapz (or using ode) can integrate for different limits simultaneously. http://docs.scipy.org/doc/scipy/reference/integrate.html#integrating-functions-given-fixed-samples Which version to use depends on your requirements. Josef > > Thanks in advance. > > Best regards. > > -- > View this message in context: http://old.nabble.com/scipy%3A-integration-tp33322670p33322670.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From fccoelho at gmail.com Wed Feb 15 13:52:28 2012 From: fccoelho at gmail.com (Flavio Coelho) Date: Wed, 15 Feb 2012 16:52:28 -0200 Subject: [SciPy-User] Bayesian Data Assimilation in Python In-Reply-To: References: Message-ID: What dou you mean by "Now that numpy and scipy exist"? They surely already existed when your book was published in 2009.... [?] Congrats, looks like an interesting book! On Wed, Feb 15, 2012 at 13:01, Fraser, Andrew McLeod wrote: > Lou, > > My book "Hidden Markov Models and Dynamical Systems" published by SIAM is > about data assimilation in the simplest contexts. The approach is Bayesian > and all of the examples are in python. (Now that numpy and scipy exist, I > plan to update/improve the code to use them.) > > Andrew M. Fraser > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Fl?vio Code?o Coelho ================ +55(21) 3799-5567 Professor Escola de Matem?tica Aplicada Funda??o Get?lio Vargas Rio de Janeiro - RJ Brasil -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 33E.png Type: image/png Size: 626 bytes Desc: not available URL: From afraser at lanl.gov Wed Feb 15 15:41:47 2012 From: afraser at lanl.gov (Fraser, Andrew McLeod) Date: Wed, 15 Feb 2012 20:41:47 +0000 Subject: [SciPy-User] Bayesian Data Assimilation in Python In-Reply-To: References: , Message-ID: When I started the book in 2002, I chose python. I shifted from C and Octave. I submitted my final draft to SIAM in 2007 and they published it in 2008. In the intervening years I used NUMERIC, Numarray, SWIG, gnuplot ... The tools and hardware get better continuously. As I was working on the book, some drafts took more than a day to build. When I get some time, I will rewrite the code to use scipy sparse matrices, cython, and matplotlib. It should be prettier and faster. ________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] on behalf of Flavio Coelho [fccoelho at gmail.com] Sent: Wednesday, February 15, 2012 11:52 AM To: SciPy Users List Subject: Re: [SciPy-User] Bayesian Data Assimilation in Python What dou you mean by "Now that numpy and scipy exist"? They surely already existed when your book was published in 2009.... [cid:gtalk.33E at goomoji.gmail] Congrats, looks like an interesting book! On Wed, Feb 15, 2012 at 13:01, Fraser, Andrew McLeod > wrote: Lou, My book "Hidden Markov Models and Dynamical Systems" published by SIAM is about data assimilation in the simplest contexts. The approach is Bayesian and all of the examples are in python. 
(Now that numpy and scipy exist, I plan to update/improve the code to use them.) Andrew M. Fraser _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -- Fl?vio Code?o Coelho ================ +55(21) 3799-5567 Professor Escola de Matem?tica Aplicada Funda??o Get?lio Vargas Rio de Janeiro - RJ Brasil -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 33E.png Type: image/png Size: 626 bytes Desc: 33E.png URL: From lanceboyle at qwest.net Thu Feb 16 01:56:34 2012 From: lanceboyle at qwest.net (Jerry) Date: Wed, 15 Feb 2012 23:56:34 -0700 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AB8AA.2040402@molden.no> References: <4F3AB8AA.2040402@molden.no> Message-ID: Well, this is an interesting discussion. On a lighter but probably no less interesting note, there is a new book by Sharon Bertsch McGrayne, "The Theory That Would Not Die -- How Bayes' rule cracked the enigma code, hunted down Russian submarines, & emerged triumphant from two centuries of controversy." It is an amazingly detailed history of Bayes' Rule (did you know that it really should be called Laplace's Rule?) often vis-?-vis frequentism. Jerry From cournape at gmail.com Thu Feb 16 03:16:43 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 16 Feb 2012 08:16:43 +0000 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: References: <4F3AB8AA.2040402@molden.no> Message-ID: Le 15 f?vr. 2012 09:57, "Nathaniel Smith" a ?crit : > > I don't understand why this is always framed as a "versus" debate. "Bayesian methods" are the math for figuring out what to believe; "frequentist methods" are the math for figuring if you're fooling yourself. It makes perfect sense for engineers and in house estimates to use the former and the FDA and scientists the latter. Different methods answer different questions. I believe the vs is at least partially justified, in the sense that they are fundamentally incompatible. This is the only mathematical field I am aware of where you have to "competing" frameworks. But a same person can certainly use either for a particular analysis. To be contrarian, there is a somehow famous paper from Judea Pearl (the father of Bayesian networks), "Bayesianism and causality, or, why I am only half-bayesian". The conversations between Gelman and Wasserman are also helpful to show strenghts/weaknesses of each approach. My own conclusion is that I still don't understand much about statistics. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From fccoelho at gmail.com Thu Feb 16 06:07:03 2012 From: fccoelho at gmail.com (Flavio Coelho) Date: Thu, 16 Feb 2012 09:07:03 -0200 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: References: <4F3AB8AA.2040402@molden.no> Message-ID: Thanks for the tip about the book, I just bought it! cheers, Fl?vo On Thu, Feb 16, 2012 at 04:56, Jerry wrote: > Well, this is an interesting discussion. On a lighter but probably no less > interesting note, there is a new book by Sharon Bertsch McGrayne, "The > Theory That Would Not Die -- How Bayes' rule cracked the enigma code, > hunted down Russian submarines, & emerged triumphant from two centuries of > controversy." It is an amazingly detailed history of Bayes' Rule (did you > know that it really should be called Laplace's Rule?) 
often vis-?-vis > frequentism. > > Jerry > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Fl?vio Code?o Coelho ================ +55(21) 3799-5567 Professor Escola de Matem?tica Aplicada Funda??o Get?lio Vargas Rio de Janeiro - RJ Brasil -------------- next part -------------- An HTML attachment was scrubbed... URL: From arokem at gmail.com Wed Feb 15 17:00:56 2012 From: arokem at gmail.com (Ariel Rokem) Date: Wed, 15 Feb 2012 14:00:56 -0800 Subject: [SciPy-User] [ANN] release of Neo 0.2.0 In-Reply-To: <4F3B73B9.8050408@olfac.univ-lyon1.fr> References: <4F38D337.7000703@olfac.univ-lyon1.fr> <4F3B73B9.8050408@olfac.univ-lyon1.fr> Message-ID: Hi Samuel, Congratulations on the release! Concerning the comparison with nitime: At the moment, support for reading data from files in nitime is only implemented for neuroimaging data (using nibabel as an optional dependency, only for this bit), but the core classes in nitime can be used for any kind of time-series data (including data from single-cell recordings: http://nipy.sourceforge.net/nitime/examples/grasshopper.html), assuming the data has already somehow been read into numpy arrays. In fact, it would be great to have io functions in nitime which read data using neo and initialize nitime.TimeSeries objects for further analysis. Cheers, Ariel On Wed, Feb 15, 2012 at 12:58 AM, Samuel Garcia wrote: > Nitime is oriented for time series of neuro imaging. > > Neo offer a complete data model center around electrophysiology (in vivo > or simulations) with more objects than nitime (for this fields of course): > RecodingChannelGroup +RecodingChannel + Segment + Block +Unit+... > > Samuel > > > > > > Le 14/02/2012 16:15, Larry Eisenman a ?crit : > > Samuel Garcia olfac.univ-lyon1.fr> writes: > > > >> > >> Dear scipy list, > >> We are proud to announce the 0.2.0 release of Neo, a Python > library for > > working with > >> electrophysiology data, whether from biological experiments or from > >> simulations. > > .... > > > > How does this compare/relate to the Nitime module of the Nipy project > > (http://nipy.sourceforge.net/nitime/)? > > > > Larry > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Samuel Garcia > Lyon Neuroscience > CNRS - UMR5292 - INSERM U1028 - Universite Claude Bernard LYON 1 > Equipe R et D > 50, avenue Tony Garnier > 69366 LYON Cedex 07 > FRANCE > T?l : 04 37 28 74 24 > Fax : 04 37 28 76 01 > http://olfac.univ-lyon1.fr/unite/equipe-07/ > http://neuralensemble.org/trac/OpenElectrophy > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Thu Feb 16 10:22:58 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 16 Feb 2012 16:22:58 +0100 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: References: <4F3AB8AA.2040402@molden.no> Message-ID: +1. I added the book to my wish list. I also enjoy(ed) the discussion. Nicky On 16 February 2012 12:07, Flavio Coelho wrote: > Thanks for the tip about the book, I just bought it! 
> > cheers, > > Fl?vo > > On Thu, Feb 16, 2012 at 04:56, Jerry wrote: >> >> Well, this is an interesting discussion. On a lighter but probably no less >> interesting note, there is a new book by Sharon Bertsch McGrayne, "The >> Theory That Would Not Die -- How Bayes' rule cracked the enigma code, hunted >> down Russian submarines, & emerged triumphant from two centuries of >> controversy." It is an amazingly detailed history of Bayes' Rule (did you >> know that it really should be called Laplace's Rule?) often vis-?-vis >> frequentism. >> >> Jerry >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > -- > Fl?vio Code?o Coelho > ================ > +55(21) 3799-5567 > Professor > Escola de Matem?tica Aplicada > Funda??o Get?lio Vargas > Rio de Janeiro - RJ > Brasil > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ljmamoreira at gmail.com Thu Feb 16 12:49:41 2012 From: ljmamoreira at gmail.com (Jose Amoreira) Date: Thu, 16 Feb 2012 17:49:41 +0000 Subject: [SciPy-User] How to use module parameters in dimension() Message-ID: <1595876.WsfY7zjZfd@mu.site> Hello I have a module that defines the dimension of an array as a parameter, and a subroutine in that module that computes and returns that array dimension: 1 module tests 2 implicit none 3 integer, parameter:: n=3 !Dimension of arrays 4 5 contains 6 7 subroutine calc(t,z) 8 real, intent(in):: t 9 real, dimension(n), intent(out):: z 10 z= 2*n + t 11 end subroutine calc 12 end module tests This simple example works fine in fortran. But when I try to turn it into a python module with f2py, the process fails with exit status 1, stating that: "In function ?f2py_rout_tests_tests_calc?: /tmp/tmpZZiuce/src.linux-x86_64-2.7/testsmodule.c:230:14: error: ?n? undeclared (first use in this function)" I find it strange that this failure only occurs if array z is a dummy argument of the calc subroutine. Otherwise, as in the listing below, f2py doesn't complain. 1 module tests 2 implicit none 3 integer, parameter:: n=3 4 5 contains 6 7 subroutine calc(t,x) 8 real,intent(in):: t 9 real,intent(out):: x 10 real,dimension(n):: z 11 z=0. 12 x= 2*n + t 13 end subroutine calc 14 end module tests So, my problem: is there any way to fix this? I mean, is it possible for f2py to compile a fortran module containing subroutines with parametrized dimension dummy arguments? Or am I missing some trivial tweak here? Thanks, Jose Amoreira -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgarcia at olfac.univ-lyon1.fr Thu Feb 16 12:57:56 2012 From: sgarcia at olfac.univ-lyon1.fr (Samuel Garcia) Date: Thu, 16 Feb 2012 18:57:56 +0100 Subject: [SciPy-User] [ANN] release of Neo 0.2.0 In-Reply-To: References: <4F38D337.7000703@olfac.univ-lyon1.fr> <4F3B73B9.8050408@olfac.univ-lyon1.fr> Message-ID: <4F3D43A4.3030903@olfac.univ-lyon1.fr> Of course. A global convergence would great! (neo team already plane to propose to merge with nitime). But at the moment it is not so easy: nitime = object model + analysis nibabel = IO collections for NI neo = object model + IO collection for electrophysiology The main work is that nitime's object model and neo's object model differ. nitime objects are neutral. 
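Returning to the f2py question a little further up: one possible workaround (only a sketch, not a verified fix) is to avoid using the module parameter in the dummy-argument declaration and instead pass the size as an ordinary argument, which f2py wraps without trouble:

module tests
  implicit none
  integer, parameter :: n = 3
contains
  ! Workaround sketch: dimension the intent(out) argument with an explicit
  ! integer argument m instead of the module parameter n.
  subroutine calc(t, z, m)
    real, intent(in) :: t
    integer, intent(in) :: m
    real, dimension(m), intent(out) :: z
    z = 2*n + t
  end subroutine calc
end module tests

From Python the generated wrapper then takes the size as an input, roughly z = tests.calc(t, 3); the exact signature depends on the f2py version, so it is worth checking print(tests.calc.__doc__) after building.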
Neo object are colored for the use.For example, neo also add containers objects ("one to many" and "many to many" relationship between object) But doing a conversion script could be very easy (if loosing the relationship between objects): neo.AnalogSignal >>> nittime.TimeSerie neo.SpikeTrain >>> nittime.TimeArrays neo.EpochArray >>> list of nitime.Epoch neo.EventArray >>> list of nitime.Event neo.Epoch >>> nitime.Epoch neo.Event >>> nitime.Event Le 15/02/2012 23:00, Ariel Rokem a ?crit : > Hi Samuel, > > Congratulations on the release! > > Concerning the comparison with nitime: At the moment, support for > reading data from files in nitime is only implemented for neuroimaging > data (using nibabel as an optional dependency, only for this bit), but > the core classes in nitime can be used for any kind of time-series > data (including data from single-cell recordings: > http://nipy.sourceforge.net/nitime/examples/grasshopper.html), > assuming the data has already somehow been read into numpy arrays. In > fact, it would be great to have io functions in nitime which read data > using neo and initialize nitime.TimeSeries objects for further analysis. > > Cheers, > > Ariel > > > > On Wed, Feb 15, 2012 at 12:58 AM, Samuel Garcia > > wrote: > > Nitime is oriented for time series of neuro imaging. > > Neo offer a complete data model center around electrophysiology > (in vivo > or simulations) with more objects than nitime (for this fields of > course): > RecodingChannelGroup +RecodingChannel + Segment + Block +Unit+... > > Samuel > > > > > > Le 14/02/2012 16:15, Larry Eisenman a ?crit : > > Samuel Garcia olfac.univ-lyon1.fr > > writes: > > > >> > >> Dear scipy list, > >> We are proud to announce the 0.2.0 release of Neo, a > Python library for > > working with > >> electrophysiology data, whether from biological > experiments or from > >> simulations. > > .... > > > > How does this compare/relate to the Nitime module of the Nipy > project > > (http://nipy.sourceforge.net/nitime/)? > > > > Larry > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Samuel Garcia > Lyon Neuroscience > CNRS - UMR5292 - INSERM U1028 - Universite Claude Bernard LYON 1 > Equipe R et D > 50, avenue Tony Garnier > 69366 LYON Cedex 07 > FRANCE > T?l : 04 37 28 74 24 > Fax : 04 37 28 76 01 > http://olfac.univ-lyon1.fr/unite/equipe-07/ > http://neuralensemble.org/trac/OpenElectrophy > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Lyon Neuroscience CNRS - UMR5292 - INSERM U1028 - Universite Claude Bernard LYON 1 Equipe R et D 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE T?l : 04 37 28 74 24 Fax : 04 37 28 76 01 http://olfac.univ-lyon1.fr/unite/equipe-07/ http://neuralensemble.org/trac/OpenElectrophy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaakko.luttinen at iki.fi Thu Feb 16 13:09:52 2012 From: jaakko.luttinen at iki.fi (Jaakko Luttinen) Date: Thu, 16 Feb 2012 20:09:52 +0200 Subject: [SciPy-User] "Zero"-shape sparse matrices Message-ID: <4F3D4670.9040906@iki.fi> Hi! To make a long story short, Scipy doesn't seem to allow sparse matrices that have length zero on any of the axes. For instance: C = numpy.ones((0,0)) K = scipy.sparse.csc_matrix(C) ValueError: invalid shape It is possible to create a "zero"-shape dense matrix but not sparse. Why? To me, this seems like a bug.. Is it so? Thanks for any help! Best regards, Jaakko From jaakko.luttinen at iki.fi Thu Feb 16 13:09:53 2012 From: jaakko.luttinen at iki.fi (Jaakko Luttinen) Date: Thu, 16 Feb 2012 20:09:53 +0200 Subject: [SciPy-User] Elementwise products of dense and sparse matrices Message-ID: <4F3D4671.20207@iki.fi> Hi! I am trying to compute elementwise products of dense and sparse matrices. The products work for two dense matrices or for two sparse matrices but the product of a dense and a sparse matrix does not work. See the code below: >>> import numpy as np >>> import scipy as sp >>> A = sp.sparse.csc_matrix(np.identity(5)) >>> B = np.asmatrix(np.ones((5,5))) >>> np.multiply(A,A) <5x5 sparse matrix of type '' with 5 stored elements in Compressed Sparse Column format> >>> np.multiply(B,B) matrix([[ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1.]]) >>> np.multiply(B,A) NotImplemented >>> np.multiply(A,B) matrix([[ (0, 0) 1.0 (1, 1) 1.0 (2, 2) 1.0 (3, 3) 1.0 (4, 4) 1.0, (0, 0) 1.0 (1, 1) 1.0 (2, 2) 1.0 (3, 3) 1.0 (4, 4) 1.0, ....... (0, 0) 1.0 (1, 1) 1.0 (2, 2) 1.0 (3, 3) 1.0 (4, 4) 1.0]], dtype=object) So elementwise B*A is not implemented and A*B gives some horrible matrix with dtype=object. The A*B or B*A product can be computed as A.multiply(B) but it would be more convenient if np.multiply could handle these situations. Can this be considered as a bug or missing feature or is there some rationale behind this? Thanks for your help! Best, Jaakko From ralf.gommers at googlemail.com Thu Feb 16 13:44:29 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 16 Feb 2012 19:44:29 +0100 Subject: [SciPy-User] ImportError: DLL load failed: The specified module could not be found In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 5:29 AM, Khary Richardson wrote: > 0 down vote favorite > share [g+] share [fb] share [tw] > > Hi I downloaded and ran the numpy-1.6.1-win32-superpack-python3.2.exe, > and I installed scipy-0.10.0.win32-py3.2 from the scipy sourceforge site. > When I try to run a code that uses scipy.special I get the following error > > Traceback (most recent call last): > File "C:\Documents and Settings\Khary\My Documents\PHYSICS\ > Physics\Bessel.py", line 5, in > from scipy.special import jv, jvp > File "C:\Python32\lib\site-packages\scipy\special\__init__.py", line > 525, in > from ._cephes import * > ImportError: DLL load failed: The specified module could not be found. > > any help woulb be great. > > Some more information would be helpful: - are you on XP, Vista, Win7? - how did you install Python (which binary)? - does the file "C:\Python32\lib\site-packages\scipy\special\_cephes.pyd" exist? - can you import and use other scipy packages? - do you get any errors or failures for "import numpy; numpy.test('full')"? Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
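A small script that walks through this checklist in one go -- the _cephes.pyd path is taken from the traceback in the report above (adjust as needed), and numpy.test('full') needs nose installed:

import os
import platform

print(platform.platform(), platform.architecture())

pyd = r"C:\Python32\lib\site-packages\scipy\special\_cephes.pyd"
print(pyd, "exists:", os.path.exists(pyd))

try:
    from scipy import linalg, ndimage   # a couple of other scipy subpackages
    print("scipy.linalg and scipy.ndimage import fine")
except ImportError as err:
    print("import failed:", err)

import numpy
numpy.test('full')   # report any errors or failures here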
URL: From surfcast23 at gmail.com Thu Feb 16 22:35:47 2012 From: surfcast23 at gmail.com (Khary Richardson) Date: Thu, 16 Feb 2012 22:35:47 -0500 Subject: [SciPy-User] ImportError: DLL load failed: The specified module could not be found In-Reply-To: References: Message-ID: Hi Ralf, I downloaded numpy and scipy from the source forge site linked through the scipy site. I am not getting any other error messages. I am not at home at the moment and am not sure if cephes.py exists or not. I am running xp sp3. Thanks! Khary On Feb 16, 2012 1:44 PM, "Ralf Gommers" wrote: > > > On Tue, Feb 14, 2012 at 5:29 AM, Khary Richardson wrote: > >> 0 down vote favorite >> share [g+] share [fb] share [tw] >> >> Hi I downloaded and ran the numpy-1.6.1-win32-superpack-python3.2.exe, >> and I installed scipy-0.10.0.win32-py3.2 from the scipy sourceforge site. >> When I try to run a code that uses scipy.special I get the following error >> >> Traceback (most recent call last): >> File "C:\Documents and Settings\Khary\My Documents\PHYSICS\ >> Physics\Bessel.py", line 5, in >> from scipy.special import jv, jvp >> File "C:\Python32\lib\site-packages\scipy\special\__init__.py", line >> 525, in >> from ._cephes import * >> ImportError: DLL load failed: The specified module could not be found. >> >> any help woulb be great. >> >> Some more information would be helpful: > - are you on XP, Vista, Win7? > - how did you install Python (which binary)? > - does the file "C:\Python32\lib\site-packages\scipy\special\_cephes.pyd" > exist? > - can you import and use other scipy packages? > - do you get any errors or failures for "import numpy; numpy.test('full')"? > > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Fri Feb 17 05:46:27 2012 From: denis-bz-gg at t-online.de (denis) Date: Fri, 17 Feb 2012 02:46:27 -0800 (PST) Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AB8AA.2040402@molden.no> References: <4F3AB8AA.2040402@molden.no> Message-ID: <069d65bf-98e1-4053-95ac-005bce93a76b@f30g2000yqh.googlegroups.com> Sturla, that's funny. In the same vein, there are two kinds of customers: - smart, but no money - money, but no smarts. On a related split see Breiman, "Statistical Modeling: The Two Cultures" http://projecteuclid.org/euclid.ss/1009213726 2001 17p + 16 debate related because many splits are cultural, e.g. practitioners vs academics. I can't begin to summarize Breiman -- read it, it's really good. See also http://stats.stackexchange.com/questions/6/the-two-cultures-statistics-vs-machine-learning cheers -- denis On Feb 14, 8:40?pm, Sturla Molden wrote: > After having worked with applied statistics for ~15 years, I have > reached this conclusion... ;-) > > Sturla's 20 propositions on Bayesian vs. classical statistics: From jaakko.luttinen at iki.fi Fri Feb 17 06:04:00 2012 From: jaakko.luttinen at iki.fi (Jaakko Luttinen) Date: Fri, 17 Feb 2012 13:04:00 +0200 Subject: [SciPy-User] Dot product of a matrix and an array Message-ID: <4F3E3420.9030109@iki.fi> Hi! The dot product of a matrix and an array seems to return a matrix. However, this resulting matrix seems to have inconsistent shape. For simplicity, let I be an identity matrix (matrix object) and x a vector (1-d array object). Then np.dot gives wrong dimensions for I*x which causes that one can not compute I*(I*x). 
See the code below: >>> >>> import numpy as np >>> >>> x = np.arange(5) >>> >>> I = np.asmatrix(np.identity(5)) >>> >>> np.dot(I,x) matrix([[ 0., 1., 2., 3., 4.]]) >>> >>> np.dot(I, np.dot(I,x)) Traceback (most recent call last): File "", line 1, in ValueError: matrices are not aligned I think np.dot(I,x) should return either 1-d vector (array object) or 2-d column vector (array or matrix object), but NOT 2-d row vector because that, in my opinion, is incorrect interpretation of the dot product. Also, I think numpy.dot should return an array object when given an array and a matrix, because the given array might have more than two dimensions (which is okay by the definition of numpy.dot) so the resulting object should be able to handle that. Now numpy.dot seems to give errors in such cases. Best regards, Jaakko From pav at iki.fi Fri Feb 17 06:16:01 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 17 Feb 2012 12:16:01 +0100 Subject: [SciPy-User] Dot product of a matrix and an array In-Reply-To: <4F3E3420.9030109@iki.fi> References: <4F3E3420.9030109@iki.fi> Message-ID: Hi, 17.02.2012 12:04, Jaakko Luttinen kirjoitti: [clip] > >>> import numpy as np > >>> x = np.arange(5) > >>> I = np.asmatrix(np.identity(5)) > >>> np.dot(I,x) > matrix([[ 0., 1., 2., 3., 4.]]) > >>> np.dot(I, np.dot(I,x)) > Traceback (most recent call last): > File "", line 1, in > ValueError: matrices are not aligned > > I think np.dot(I,x) should return either 1-d vector (array object) or > 2-d column vector (array or matrix object), but NOT 2-d row vector > because that, in my opinion, is incorrect interpretation of the dot product. Yep, that's inconsistent behavior. What probably happens is that np.dot(I, x) -> np.asmatrix(np.dot(np.asarray(I), x)) As you can maybe deduce, the matrix class is not as heavily used as the arrays... http://projects.scipy.org/numpy/ticket/2057 -- Pauli Virtanen From pav at iki.fi Fri Feb 17 06:44:20 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 17 Feb 2012 12:44:20 +0100 Subject: [SciPy-User] Elementwise products of dense and sparse matrices In-Reply-To: <4F3D4671.20207@iki.fi> References: <4F3D4671.20207@iki.fi> Message-ID: Hi, 16.02.2012 19:09, Jaakko Luttinen kirjoitti: [clip] > Can this be considered as a bug or missing feature or is there some > rationale behind this? It's a bug and a missing feature. The integration between sparse matrices and the dense-matrix functions (multiply/add/sin/tanh/etc...) in Numpy is not as good as it could be at the moment. http://projects.scipy.org/scipy/ticket/1598 -- Pauli Virtanen From alan.isaac at gmail.com Fri Feb 17 08:55:01 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 17 Feb 2012 08:55:01 -0500 Subject: [SciPy-User] Dot product of a matrix and an array In-Reply-To: <4F3E3420.9030109@iki.fi> References: <4F3E3420.9030109@iki.fi> Message-ID: <4F3E5C35.5090107@gmail.com> On 2/17/2012 6:04 AM, Jaakko Luttinen wrote: > The dot product of a matrix and an array seems to return a matrix. > However, this resulting matrix seems to have inconsistent shape. For > simplicity, let I be an identity matrix (matrix object) and x a vector > (1-d array object). Then np.dot gives wrong dimensions for I*x which > causes that one can not compute I*(I*x). It is unclear what a consistent shape means in this context. (For me it would be to return a 1d array.) For this reason, the suggestion resurfaces from time to time to raise an error for multiplication between a matrix and an array. 
In any case, computations mixing matrices and arrays are a "bad idea". fwiw, Alan Isaac From nwagner at iam.uni-stuttgart.de Fri Feb 17 09:02:19 2012 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 17 Feb 2012 15:02:19 +0100 Subject: [SciPy-User] Dot product of a matrix and an array In-Reply-To: <4F3E5C35.5090107@gmail.com> References: <4F3E3420.9030109@iki.fi> <4F3E5C35.5090107@gmail.com> Message-ID: On Fri, 17 Feb 2012 08:55:01 -0500 Alan G Isaac wrote: > On 2/17/2012 6:04 AM, Jaakko Luttinen wrote: >> The dot product of a matrix and an array seems to return >>a matrix. >> However, this resulting matrix seems to have >>inconsistent shape. For >> simplicity, let I be an identity matrix (matrix object) >>and x a vector >> (1-d array object). Then np.dot gives wrong dimensions >>for I*x which >> causes that one can not compute I*(I*x). > > > It is unclear what a consistent shape means in this >context. > (For me it would be to return a 1d array.) >For this reason, the suggestion resurfaces from time to >time > to raise an error for multiplication between a matrix >and an array. > In any case, computations mixing matrices and arrays are >a "bad idea". > > fwiw, > Alan Isaac This reminds me of an old ticket http://projects.scipy.org/scipy/ticket/585 Nils From jaakko.luttinen at aalto.fi Thu Feb 16 10:44:40 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 16 Feb 2012 17:44:40 +0200 Subject: [SciPy-User] "Zero"-shape sparse matrices Message-ID: <4F3D2468.4060700@aalto.fi> Hi! To make a long story short, Scipy doesn't seem to allow sparse matrices that have length zero on any of the axes. For instance: C = numpy.ones((0,0)) K = scipy.sparse.csc_matrix(C) ValueError: invalid shape It is possible to create a "zero"-shape dense matrix but not sparse. Why? To me, this seems like a bug.. Is it so? Thanks for any help! Best regards, Jaakko From jaakko.luttinen at aalto.fi Fri Feb 17 06:00:18 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Fri, 17 Feb 2012 13:00:18 +0200 Subject: [SciPy-User] Dot product of a matrix and an array Message-ID: <4F3E3342.6090808@aalto.fi> Hi! The dot product of a matrix and an array seems to return a matrix. However, this resulting matrix seems to have inconsistent shape. For simplicity, let I be an identity matrix (matrix object) and x a vector (1-d array object). Then np.dot gives wrong dimensions for I*x which causes that one can not compute I*(I*x). See the code below: >>> import numpy as np >>> x = np.arange(5) >>> I = np.asmatrix(np.identity(5)) >>> np.dot(I,x) matrix([[ 0., 1., 2., 3., 4.]]) >>> np.dot(I, np.dot(I,x)) Traceback (most recent call last): File "", line 1, in ValueError: matrices are not aligned I think np.dot(I,x) should return either 1-d vector (array object) or 2-d column vector (array or matrix object), but NOT 2-d row vector because that, in my opinion, is incorrect interpretation of the dot product. Also, I think numpy.dot should return an array object when given an array and a matrix, because the given array might have more than two dimensions (which is okay by the definition of numpy.dot) so the resulting object should be able to handle that. 
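A minimal workaround sketch for the shape problem above, until the tickets mentioned in this thread are resolved: keep the operands as plain ndarrays (convert the matrix first), so dot() returns a 1-d result that can be fed straight back into dot():

import numpy as np

x = np.arange(5)
I = np.asmatrix(np.identity(5))
A = np.asarray(I)             # drop the matrix subclass

y = np.dot(A, x)              # shape (5,), a 1-d array
z = np.dot(A, np.dot(A, x))   # works, unlike np.dot(I, np.dot(I, x))
print(y.shape, z.shape)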
Best regards, Jaakko From e.antero.tammi at gmail.com Fri Feb 17 10:32:23 2012 From: e.antero.tammi at gmail.com (eat) Date: Fri, 17 Feb 2012 17:32:23 +0200 Subject: [SciPy-User] "Zero"-shape sparse matrices In-Reply-To: <4F3D2468.4060700@aalto.fi> References: <4F3D2468.4060700@aalto.fi> Message-ID: Hi, Perhaps a slightly OT (and I'm not really answering to your question), but On Thu, Feb 16, 2012 at 5:44 PM, Jaakko Luttinen wrote: > Hi! > > To make a long story short, Scipy doesn't seem to allow sparse matrices > that have length zero on any of the axes. For instance: > C = numpy.ones((0,0)) > K = scipy.sparse.csc_matrix(C) > ValueError: invalid shape > > It is possible to create a "zero"-shape dense matrix but not sparse. > Why? To me, this seems like a bug.. Is it so? > what would you expect a "zero"-shape sparse (or dense) matrix actually represent? Regards, -eat > > Thanks for any help! > Best regards, > Jaakko > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalomba at austin.rr.com Fri Feb 17 11:35:57 2012 From: apalomba at austin.rr.com (Anthony Palomba) Date: Fri, 17 Feb 2012 10:35:57 -0600 Subject: [SciPy-User] Trouble with numpy on OSX... Message-ID: I am trying to get my scipy environment running on my mac. I have a MBP running OSX 10.7 with python2.7 (python.org) installed. I installed scipy-0.10.0-py2.7-python.org-macosx10.3 and numpy-1.6.1-py2.7-python.org-macosx10.3. When I try to import multiarray, i get the following error... ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so: no matching architecture in universal wrapper Is there something I am missing? Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmutel at gmail.com Fri Feb 17 11:48:58 2012 From: cmutel at gmail.com (Christopher Mutel) Date: Fri, 17 Feb 2012 17:48:58 +0100 Subject: [SciPy-User] "Zero"-shape sparse matrices In-Reply-To: References: <4F3D2468.4060700@aalto.fi> Message-ID: On Fri, Feb 17, 2012 at 4:32 PM, eat wrote: > On Thu, Feb 16, 2012 at 5:44 PM, Jaakko Luttinen > wrote: >> To make a long story short, Scipy doesn't seem to allow sparse matrices >> that have length zero on any of the axes. For instance: >> C = numpy.ones((0,0)) >> K = scipy.sparse.csc_matrix(C) >> ValueError: invalid shape >> >> It is possible to create a "zero"-shape dense matrix but not sparse. >> Why? To me, this seems like a bug.. Is it so? I am not an expert, but it is my understanding that the sparse matrix implementations in SciPy assume precisely two dimensions. One dimension having a size of 0 would break all the assumptions of this code. The NumPy array class is a much more generic container, and was designed from the beginning to allow a number of slicing and dimensionality tricks (see the documentation on numpy striding). You can search through the mailing list from a few years ago to find a discussion about three dimensional sparse matrices, and the conclusion was the same: SciPy supports 2-d (in the sense of two real dimensions) sparse matrices only. 
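A small fallback sketch for the zero-size case discussed in this thread, given that scipy.sparse (0.10) rejects shapes with a zero dimension: keep such arrays dense and only build a csc_matrix otherwise (the helper name is just for illustration):

import numpy as np
import scipy.sparse as sp

def as_csc(dense):
    dense = np.asarray(dense)
    if 0 in dense.shape:
        # nothing to store sparsely anyway; keep the dense zero-size array
        return dense
    return sp.csc_matrix(dense)

print(as_csc(np.ones((0, 0))).shape)   # (0, 0), plain ndarray
print(as_csc(np.eye(3)).shape)         # (3, 3), sparse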
-Chris From ralf.gommers at googlemail.com Fri Feb 17 13:04:23 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 17 Feb 2012 19:04:23 +0100 Subject: [SciPy-User] Trouble with numpy on OSX... In-Reply-To: References: Message-ID: On Fri, Feb 17, 2012 at 5:35 PM, Anthony Palomba wrote: > I am trying to get my scipy environment running on my mac. > I have a MBP running OSX 10.7 with python2.7 (python.org) > installed. > > I installed scipy-0.10.0-py2.7-python.org-macosx10.3 > and numpy-1.6.1-py2.7-python.org-macosx10.3. > > When I try to import multiarray, i get the following error... > > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so, > 2): no suitable image found. Did find: > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so: > no matching architecture in universal wrapper > > Is there something I am missing? > > Did you notice that there are two installers for Python 2.7 on OS X? One ends in "macosx10.3" and one in "macosx10.6". You probably picked the Python installer on python.org ending in 10.6, resulting in a 64-bit Python and 32-bit numpy/scipy. This would explain the error you're seeing. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From harpend at gmail.com Fri Feb 17 13:21:59 2012 From: harpend at gmail.com (Henry Harpending) Date: Fri, 17 Feb 2012 11:21:59 -0700 Subject: [SciPy-User] Trouble with bumpy on OSX Message-ID: Anthony Palomba wrote: > I am trying to get my scipy environment running on my mac. > I have a MBP running OSX 10.7 with python2.7 (python.org) > installed. > > I installed scipy-0.10.0-py2.7-python.org-macosx10.3 > and numpy-1.6.1-py2.7-python.org-macosx10.3. > > When I try to import multiarray, i get the following error... > > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so, > 2): no suitable image found. Did find: > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so: > no matching architecture in universal wrapper > > Is there something I am missing? I have no trouble using the Enthought Python Distribution on OS X: http://enthought.com/products/epd_free.php and they have a fuller pay version (free to academics) also. I am happy with just scipy and pylab in the free version. Henry Harpending -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Feb 17 22:21:06 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Feb 2012 22:21:06 -0500 Subject: [SciPy-User] [OT] Bayesian vs. frequentist In-Reply-To: <4F3AC694.8080205@molden.no> References: <4F3AB8AA.2040402@molden.no> <4F3AC694.8080205@molden.no> Message-ID: On Tue, Feb 14, 2012 at 3:39 PM, Sturla Molden wrote: > On 14.02.2012 21:24, josef.pktd at gmail.com wrote: > >> Do you expect an argument? sounds a bit like http://andrewgelman.com/ > > No I don't. I am just tired of explaining that "significant p-value" > does not imply "very important effect". Sorry for spamming the list with > my rant. Since I just came across a paper on the topic http://www.anesthesia-analgesia.org/content/112/3/678.short while looking for something else. 
I have seen now several papers (mainly the abstracts) that test "equivalence" instead of "equality", so we can specify in advance what difference is "important" and what would be "equivalent". (not Bayesian) Josef > > Sturla > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From vanforeest at gmail.com Sat Feb 18 14:08:46 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 18 Feb 2012 20:08:46 +0100 Subject: [SciPy-User] Fwd: In-Reply-To: References: Message-ID: Hi, Is there a general routine in numpy/scipy to find the left most root of a 1-d function? As an example: import numpy as np import pylab as pl grid = np.arange(0, 120., 0.1) f = 3 - 0.1*grid + np.sin(grid) #pl.plot(grid, f) #pl.show() # simplistic method to find the left root: # since I know that f(0) > 0: i = 0 while(f[i]>=0): ? ?i+= 1 print i, grid[i], f[i] This algorithm works, provided a suitable initial condition, but it not particularly efficient. I know that there are fast root solvers in scipy.optimize, but they require to provide an a and b such that f(a) < 0 < f(b) (or the other way around). In my general case, these bounds are not easy to obtain (and I can easily create more taxing problems for which I would like to have a general, robust and efficient method to find the left-most root). Thanks Nicky From gustavo.goretkin at gmail.com Sun Feb 19 11:52:25 2012 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Sun, 19 Feb 2012 11:52:25 -0500 Subject: [SciPy-User] masked recarray, recarray with one field of type "ndarray" In-Reply-To: References: Message-ID: In short: do recarrays support masking? On Wed, Feb 1, 2012 at 11:36 AM, Gustavo Goretkin wrote: > Thanks for the help! Now is there any way to mask elements of a > recarray? I should explain the application because I think I may be > going about this the wrong way: > I'll be building a tree and each node will have some attributes (for > example, a matrix). I often have to iterate through every node of the > tree and do a calculation -- something that I could do in a vectorized > way with NumPy if all the attributes were stored in an array. So I > thought I could represent the tree as a recarray (that I'd > occasionally need to grow). > > I'd also need to delete nodes from the tree occasionally. I'd > accomplish this by masking entries of the recarray. When I needed to > add a node to the tree, I'd try to populate a masked entry before > going to the end of the array. > > On Tue, Jan 31, 2012 at 9:33 AM, Warren Weckesser > wrote: >> >> >> On Tue, Jan 31, 2012 at 2:36 AM, Gustavo Goretkin >> wrote: >>> >>> Does a recarray support masking? >>> >>> Can I have a recarray where one of the fields is an M-by-N ndarray >>> (not recarray) of some dtype? >>> ex: a = np.recarray(shape=(10),formats=['i4','f8','3-by-3 ndarray of >>> dtype=float64']) >> >> >> >> Here's how it can be done with the dtype argument (in this case, the >> "sub-arrays" are 3x5 float32): >> >> In [21]: dt = np.dtype([('id', int32), ('values', float32, (3,5))]) >> >> In [22]: a = np.recarray(shape=(3,), dtype=dt) >> >> In [23]: a.id >> Out[23]: array([????? 7, 2345536, 8585218]) >> >> In [24]: a[0].id >> Out[24]: 7 >> >> In [25]: a[0].values >> Out[25]: >> array([[? 9.80908925e-45,?? 2.15997513e-37,?? 3.16079124e-39, >> ????????? 1.18408375e-38,?? 2.81552923e-38], >> ?????? [? 2.13004362e-37,? -7.69011974e-02,?? 9.80908925e-45, >> ????????? 9.80908925e-45,?? 
3.62636667e-21], >> ?????? [? 5.67059093e-24,?? 5.67095065e-24,?? 5.64768872e-24, >> ????????? 7.86448908e+11,?? 0.00000000e+00]], dtype=float32) >> >> In [26]: a[0].values.shape >> Out[26]: (3, 5) >> >> >> Warren >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From ralf.gommers at googlemail.com Mon Feb 20 02:51:50 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 20 Feb 2012 08:51:50 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 release candidate 2 Message-ID: Hi all, I am pleased to announce the availability of the second release candidate of SciPy 0.10.1. Please try out this release and report any problems on the scipy-dev mailing list. If no new problems are found, the final release will be available in one week. Sources and binaries can be found at http://sourceforge.net/projects/scipy/files/scipy/0.10.1rc2/, release notes are copied below. Things that were fixed between RC1 and RC2: - a compile error with MSVC9 against NumPy master - include missing Bento files in source release - some Python 3.x test warnings - an issue with misc.imread when PIL is not installed - Arpack single-precision test failures - cleaned up DeprecationWarnings when Umfpack is not installed Cheers, Ralf ========================== SciPy 0.10.1 Release Notes ========================== .. contents:: SciPy 0.10.1 is a bug-fix release with no new features compared to 0.10.0. Main changes ------------ The most important changes are:: 1. The single precision routines of ``eigs`` and ``eigsh`` in ``scipy.sparse.linalg`` have been disabled (they internally use double precision now). 2. A compatibility issue related to changes in NumPy macros has been fixed, in order to make scipy 0.10.1 compile with the upcoming numpy 1.7.0 release. Other issues fixed ------------------ - #835: stats: nan propagation in stats.distributions - #1202: io: netcdf segfault - #1531: optimize: make curve_fit work with method as callable. - #1560: linalg: fixed mistake in eig_banded documentation. - #1565: ndimage: bug in ndimage.variance - #1457: ndimage: standard_deviation does not work with sequence of indexes - #1562: cluster: segfault in linkage function - #1568: stats: One-sided fisher_exact() returns `p` < 1 for 0 successful attempts - #1575: stats: zscore and zmap handle the axis keyword incorrectly Checksums ========= b119828c64a68794c9562f8228dd7cf9 release/installers/scipy-0.10.1rc2-py2.7-python.org-macosx10.6.dmg 605a30b8a33ff6763261ffde59a38bb9 release/installers/scipy-0.10.1rc2-win32-superpack-python2.5.exe 5a056ed6dbb9abd10bd824a64c47c159 release/installers/scipy-0.10.1rc2-win32-superpack-python2.6.exe 0498bb3f48d0cb251cb9e527b454a50b release/installers/scipy-0.10.1rc2-win32-superpack-python2.7.exe f078ebc55d1b7d832474c8379852062c release/installers/scipy-0.10.1rc2-win32-superpack-python3.1.exe 3ee82aebb2c1d0425fb85c72c6eea80e release/installers/scipy-0.10.1rc2-win32-superpack-python3.2.exe 540ec78cb451bfebbd8f84193fe76581 release/installers/scipy-0.10.1rc2.tar.gz 4813d52623ae63ed492e28bdcecee4c0 release/installers/scipy-0.10.1rc2.zip -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From paul.anton.letnes at gmail.com Mon Feb 20 04:02:10 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 20 Feb 2012 10:02:10 +0100 Subject: [SciPy-User] How to use module parameters in dimension() In-Reply-To: <1595876.WsfY7zjZfd@mu.site> References: <1595876.WsfY7zjZfd@mu.site> Message-ID: Hi, I was just thinking that you could probably avoid the whole issue by using assumed-shape arrays. This actually gives you more checks and balances when using e.g. bounds checking with your fortran compiler (gfortran: -fbounds-check). Explicit size does not, afaik. Paul module tests implicit none contains subroutine calc(t,z) implicit none real, intent(in):: t real, dimension(:), intent(out):: z integer :: n n = size(z) z = 2*n + t end subroutine calc end module tests On 16. feb. 2012, at 18:49, Jose Amoreira wrote: > Hello > I have a module that defines the dimension of an array as a parameter, and a subroutine in that module that computes and returns that array dimension: > > 1 module tests > 2 implicit none > 3 integer, parameter:: n=3 !Dimension of arrays > 4 > 5 contains > 6 > 7 subroutine calc(t,z) > 8 real, intent(in):: t > 9 real, dimension(n), intent(out):: z > 10 z= 2*n + t > 11 end subroutine calc > 12 end module tests > > This simple example works fine in fortran. But when I try to turn it into a python module with f2py, the process fails with exit status 1, stating that: > "In function ?f2py_rout_tests_tests_calc?: > /tmp/tmpZZiuce/src.linux-x86_64-2.7/testsmodule.c:230:14: error: ?n? undeclared (first use in this function)" > > I find it strange that this failure only occurs if array z is a dummy argument of the calc subroutine. Otherwise, as in the listing below, f2py doesn't complain. > > 1 module tests > 2 implicit none > 3 integer, parameter:: n=3 > 4 > 5 contains > 6 > 7 subroutine calc(t,x) > 8 real,intent(in):: t > 9 real,intent(out):: x > 10 real,dimension(n):: z > 11 z=0. > 12 x= 2*n + t > 13 end subroutine calc > 14 end module tests > > So, my problem: is there any way to fix this? I mean, is it possible for f2py to compile a fortran module containing subroutines with parametrized dimension dummy arguments? Or am I missing some trivial tweak here? > > Thanks, > Jose Amoreira > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ljmamoreira at gmail.com Mon Feb 20 05:09:26 2012 From: ljmamoreira at gmail.com (Jose Amoreira) Date: Mon, 20 Feb 2012 10:09:26 +0000 Subject: [SciPy-User] How to use module parameters in dimension() In-Reply-To: References: <1595876.WsfY7zjZfd@mu.site> Message-ID: <5196073.PrtyTcbHl1@mu.site> Thank you very much, Paul. You were a big help. I started rewriting my subroutines using arrays bounds explicitly specified in the argument lists. Your suggestion is a lot better. (Still, it's troubling that array size parameters of subroutine dummy arguments can not be shared across a module, as other data can.) Thanks again Jose Amoreira On Monday, February 20, 2012 10:02:10 AM Paul Anton Letnes wrote: > Hi, > > I was just thinking that you could probably avoid the whole issue by using > assumed-shape arrays. This actually gives you more checks and balances when > using e.g. bounds checking with your fortran compiler (gfortran: > -fbounds-check). Explicit size does not, afaik. 
> > Paul > > module tests > implicit none > > contains > > subroutine calc(t,z) > implicit none > real, intent(in):: t > real, dimension(:), intent(out):: z > integer :: n > n = size(z) > z = 2*n + t > end subroutine calc > end module tests > > On 16. feb. 2012, at 18:49, Jose Amoreira wrote: > > Hello > > > > I have a module that defines the dimension of an array as a parameter, and a subroutine in that module that computes and returns that array dimension: > > 1 module tests > > 2 implicit none > > 3 integer, parameter:: n=3 !Dimension of arrays > > 4 > > 5 contains > > 6 > > 7 subroutine calc(t,z) > > 8 real, intent(in):: t > > 9 real, dimension(n), intent(out):: z > > > > 10 z= 2*n + t > > 11 end subroutine calc > > 12 end module tests > > > > This simple example works fine in fortran. But when I try to turn it > > into a python module with f2py, the process fails with exit status 1, > > stating that: "In function ?f2py_rout_tests_tests_calc?: > > /tmp/tmpZZiuce/src.linux-x86_64-2.7/testsmodule.c:230:14: error: ?n? > > undeclared (first use in this function)" > > > > I find it strange that this failure only occurs if array z is a dummy > > argument of the calc subroutine. Otherwise, as in the listing below, > > f2py doesn't complain.> > > 1 module tests > > 2 implicit none > > 3 integer, parameter:: n=3 > > 4 > > 5 contains > > 6 > > 7 subroutine calc(t,x) > > 8 real,intent(in):: t > > 9 real,intent(out):: x > > > > 10 real,dimension(n):: z > > 11 z=0. > > 12 x= 2*n + t > > 13 end subroutine calc > > 14 end module tests > > > > So, my problem: is there any way to fix this? I mean, is it possible for > > f2py to compile a fortran module containing subroutines with > > parametrized dimension dummy arguments? Or am I missing some trivial > > tweak here? > > > > Thanks, > > Jose Amoreira > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishi.ou at gmail.com Mon Feb 20 19:00:57 2012 From: rishi.ou at gmail.com (RDX) Date: Mon, 20 Feb 2012 16:00:57 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Error fmin Message-ID: <33359168.post@talk.nabble.com> I am using fmin and it returns initial guess as the final solution. def Traug (v): err=0 Traug_RHOB = v[0] + v[1]*(z_den_s/3125.0)**v[2] count =arange(size(den_s)) for i in count: err += abs(den_s[i] - Traug_RHOB [i]) return err vO=[1.6, 0.5, 0.35] t=fmin(Traug,vO ) a=t[0]; b=t[1]; c=t[2]; Code gives me a=1.6, b=0.5, c=0.35, which is my initial guess. What is wrong? -- View this message in context: http://old.nabble.com/Error-fmin-tp33359168p33359168.html Sent from the Scipy-User mailing list archive at Nabble.com. From servant.mathieu at gmail.com Tue Feb 21 04:38:45 2012 From: servant.mathieu at gmail.com (servant mathieu) Date: Tue, 21 Feb 2012 10:38:45 +0100 Subject: [SciPy-User] simplex algorithm and curve fitting Message-ID: Dear Scipy users, I've got some troubles with the scipy.optimize.curve_fit function. This function is based on the Levenburg-Maquardt algorithm, which is extremely rapid but usually finds a local minimum, not a global one (and thus often returns anormal parameter values). In my research field, we usually use Nelder Mead's simplex routines to avoid this problem. However, I don't know if it is possible to perform curve fitting in scipy using simplex; the fmin function doesn't seem to perform adjustments to data. 
Here is my code for fitting a three parameters hyperbolic cotangent function using curve_fit: from scipy.optimize import curve_fit import numpy as np def func (x, A,k, r ): return r + (A /(k*x)) * np.tanh (A*k*x) xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, 309.9906375,309.9251162, 307.3955800]) dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, 327.7868238, 329.4642550, 328.0667050]) poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = 10000) poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = 10000) How could I proceed to perform the fitting using simplex? Best, Mat -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 21 05:05:58 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 21 Feb 2012 05:05:58 -0500 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu wrote: > Dear Scipy users, > > I've got some troubles with the?scipy.optimize.curve_fit function. This > function is based on the Levenburg-Maquardt algorithm, which is extremely > rapid but usually finds a local minimum, not a global one (and thus often > returns anormal parameter values). In my research field, we usually use > Nelder Mead's simplex routines to avoid this problem. However, I don't know > if it is possible to perform curve fitting in scipy using simplex; the fmin > function? doesn't seem to perform adjustments to data. > > Here is my code for fitting a three parameters hyperbolic cotangent function > using curve_fit: > > from scipy.optimize import curve_fit > > import numpy as np > > > > def func (x, A,k, r ): > > return r + (A /(k*x)) * np.tanh (A*k*x) > > > > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) > > > > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, > 309.9906375,309.9251162, 307.3955800]) > > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, 327.7868238, > 329.4642550, 328.0667050]) > > > > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = 10000) > > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = 10000) > > > > > > How could I?proceed to perform the fitting using simplex? You need to define your own loss function for the optimizers like fmin, or in future version minimize http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin something like this def loss(params, args) A, k, r = params y, x = args return ((y - func (x, A,k, r ))**2).sum() and use loss in the call to fmin Josef > > > > Best, > > Mat > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From servant.mathieu at gmail.com Tue Feb 21 05:19:49 2012 From: servant.mathieu at gmail.com (servant mathieu) Date: Tue, 21 Feb 2012 11:19:49 +0100 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: Thanks a lot Joseph. I've got a last question which concerns initials guess for parameters. In fact, in many papers, i read " we fitted both functions using standard simplex optimization routines. This was repeated 10000 times with randomized initial values to avoid local minima." The problemis the following: which range of values should we use for this randomization? 
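A self-contained sketch that puts the loss-plus-fmin suggestion above together with randomized restarts; the parameter ranges below are only rough guesses based on the scale of the data, not recommended values:

import numpy as np
from scipy.optimize import fmin

def func(x, A, k, r):
    return r + (A / (k * x)) * np.tanh(A * k * x)

def loss(params, x, y):
    A, k, r = params
    return ((y - func(x, A, k, r)) ** 2).sum()

xdata = np.array([0.15, 0.25, 0.35, 0.45, 0.55, 0.75])
datacomp = np.array([344.3276300, 324.0051063, 314.2693475,
                     309.9906375, 309.9251162, 307.3955800])

rng = np.random.RandomState(0)
best_val, best_params = np.inf, None
for _ in range(1000):
    p0 = [rng.uniform(0.1, 20.0),      # A -- guessed range
          rng.uniform(0.01, 5.0),      # k -- guessed range
          rng.uniform(250.0, 350.0)]   # r -- roughly the scale of the data
    popt = fmin(loss, p0, args=(xdata, datacomp), disp=False)
    val = loss(popt, xdata, datacomp)
    if val < best_val:
        best_val, best_params = val, popt

print(best_params, best_val)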
Best, Mat 2012/2/21 > On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu > wrote: > > Dear Scipy users, > > > > I've got some troubles with the scipy.optimize.curve_fit function. This > > function is based on the Levenburg-Maquardt algorithm, which is extremely > > rapid but usually finds a local minimum, not a global one (and thus often > > returns anormal parameter values). In my research field, we usually use > > Nelder Mead's simplex routines to avoid this problem. However, I don't > know > > if it is possible to perform curve fitting in scipy using simplex; the > fmin > > function doesn't seem to perform adjustments to data. > > > > Here is my code for fitting a three parameters hyperbolic cotangent > function > > using curve_fit: > > > > from scipy.optimize import curve_fit > > > > import numpy as np > > > > > > > > def func (x, A,k, r ): > > > > return r + (A /(k*x)) * np.tanh (A*k*x) > > > > > > > > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) > > > > > > > > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, > > 309.9906375,309.9251162, 307.3955800]) > > > > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, > 327.7868238, > > 329.4642550, 328.0667050]) > > > > > > > > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = 10000) > > > > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = > 10000) > > > > > > > > > > > > How could I proceed to perform the fitting using simplex? > > You need to define your own loss function for the optimizers like > fmin, or in future version minimize > > http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin > > something like this > > def loss(params, args) > A, k, r = params > y, x = args > return ((y - func (x, A,k, r ))**2).sum() > > and use loss in the call to fmin > > Josef > > > > > > > > > > Best, > > > > Mat > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 21 05:47:49 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 21 Feb 2012 05:47:49 -0500 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: On Tue, Feb 21, 2012 at 5:19 AM, servant mathieu wrote: > Thanks a lot Joseph. I've got a last question which concerns initials guess > for parameters. In fact, in many papers, i read " we fitted both functions > using standard simplex optimization routines. This was repeated 10000 times > with randomized initial values to avoid local minima."? The problemis the > following: which range of values should we use for this randomization? I wanted to mention this before, all fmin are local optimizers, anneal is the only global optimizer in scipy, but a bit tricky to use. There are other global optimizers written in python that might work better, but I never tried any of these packages. Choosing (random) starting values depends completely on the function and there is no function independent recipe, since the parameterization of a function is pretty arbitrary. So, you need an "educated" guess over the possible range given the specific function and problem. 
For specific classes of functions in not too high dimension it would be possible to find (and code) starting values, or ranges for a random search. I haven't tried out what your function looks like, but I would guess that there are at least some sign restrictions. I usually try to see if I can guess starting values, and ranges for randomization, based on min, max and mean of the observations. Josef > > Best, > Mat > > > 2012/2/21 > >> On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu >> wrote: >> > Dear Scipy users, >> > >> > I've got some troubles with the?scipy.optimize.curve_fit function. This >> > function is based on the Levenburg-Maquardt algorithm, which is >> > extremely >> > rapid but usually finds a local minimum, not a global one (and thus >> > often >> > returns anormal parameter values). In my research field, we usually use >> > Nelder Mead's simplex routines to avoid this problem. However, I don't >> > know >> > if it is possible to perform curve fitting in scipy using simplex; the >> > fmin >> > function? doesn't seem to perform adjustments to data. >> > >> > Here is my code for fitting a three parameters hyperbolic cotangent >> > function >> > using curve_fit: >> > >> > from scipy.optimize import curve_fit >> > >> > import numpy as np >> > >> > >> > >> > def func (x, A,k, r ): >> > >> > return r + (A /(k*x)) * np.tanh (A*k*x) >> > >> > >> > >> > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) >> > >> > >> > >> > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, >> > 309.9906375,309.9251162, 307.3955800]) >> > >> > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, >> > 327.7868238, >> > 329.4642550, 328.0667050]) >> > >> > >> > >> > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = 10000) >> > >> > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = >> > 10000) >> > >> > >> > >> > >> > >> > How could I?proceed to perform the fitting using simplex? >> >> You need to define your own loss function for the optimizers like >> fmin, or in future version minimize >> >> http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin >> >> something like this >> >> def loss(params, args) >> ? ? A, k, r = params >> ? ? y, x = args >> ? ? return ((y - func (x, A,k, r ))**2).sum() >> >> and use loss in the call to fmin >> >> Josef >> > > > > > >> >> > >> > >> > >> > Best, >> > >> > Mat >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Tue Feb 21 06:29:04 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 21 Feb 2012 06:29:04 -0500 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: On Tue, Feb 21, 2012 at 5:47 AM, wrote: > On Tue, Feb 21, 2012 at 5:19 AM, servant mathieu > wrote: >> Thanks a lot Joseph. I've got a last question which concerns initials guess >> for parameters. In fact, in many papers, i read " we fitted both functions >> using standard simplex optimization routines. 
This was repeated 10000 times >> with randomized initial values to avoid local minima."? The problemis the >> following: which range of values should we use for this randomization? > > I wanted to mention this before, all fmin are local optimizers, anneal > is the only global optimizer in scipy, but a bit tricky to use. There > are other global optimizers written in python that might work better, > but I never tried any of these packages. > > Choosing (random) starting values depends completely on the function > and there is no function independent recipe, since the > parameterization of a function is pretty arbitrary. So, you need an > "educated" guess over the possible range given the specific function > and problem. > > For specific classes of functions in not too high dimension it would > be possible to find (and code) starting values, or ranges for a random > search. > > I haven't tried out what your function looks like, but I would guess > that there are at least some sign restrictions. I usually try to see > if I can guess starting values, and ranges for randomization, based on > min, max and mean of the observations. (can you please bottom post in this mailing list, it's difficult to find the thread) tanh looks bad for optimization, it's essentially flat, -1 or 1 outside of (-4,4) or so. playing a bit, fmin and curve_fit just find solutions with the tanh part equal to 1 or -1, fmin finds -1, curve_fit picks +1 (with the same starting values) If you want any action from the tanh part, then, it looks to me, A*k*x would need to be restricted to be mostly in the (-4,4) range, maybe a reparameterization (with b = A*k) would help. Josef > > Josef > >> >> Best, >> Mat >> >> >> 2012/2/21 >> >>> On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu >>> wrote: >>> > Dear Scipy users, >>> > >>> > I've got some troubles with the?scipy.optimize.curve_fit function. This >>> > function is based on the Levenburg-Maquardt algorithm, which is >>> > extremely >>> > rapid but usually finds a local minimum, not a global one (and thus >>> > often >>> > returns anormal parameter values). In my research field, we usually use >>> > Nelder Mead's simplex routines to avoid this problem. However, I don't >>> > know >>> > if it is possible to perform curve fitting in scipy using simplex; the >>> > fmin >>> > function? doesn't seem to perform adjustments to data. >>> > >>> > Here is my code for fitting a three parameters hyperbolic cotangent >>> > function >>> > using curve_fit: >>> > >>> > from scipy.optimize import curve_fit >>> > >>> > import numpy as np >>> > >>> > >>> > >>> > def func (x, A,k, r ): >>> > >>> > return r + (A /(k*x)) * np.tanh (A*k*x) >>> > >>> > >>> > >>> > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) >>> > >>> > >>> > >>> > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, >>> > 309.9906375,309.9251162, 307.3955800]) >>> > >>> > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, >>> > 327.7868238, >>> > 329.4642550, 328.0667050]) >>> > >>> > >>> > >>> > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = 10000) >>> > >>> > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = >>> > 10000) >>> > >>> > >>> > >>> > >>> > >>> > How could I?proceed to perform the fitting using simplex? 
>>> >>> You need to define your own loss function for the optimizers like >>> fmin, or in future version minimize >>> >>> http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin >>> >>> something like this >>> >>> def loss(params, args) >>> ? ? A, k, r = params >>> ? ? y, x = args >>> ? ? return ((y - func (x, A,k, r ))**2).sum() >>> >>> and use loss in the call to fmin >>> >>> Josef >>> >> >> >> >> >> >>> >>> > >>> > >>> > >>> > Best, >>> > >>> > Mat >>> > >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From servant.mathieu at gmail.com Tue Feb 21 06:30:03 2012 From: servant.mathieu at gmail.com (servant mathieu) Date: Tue, 21 Feb 2012 12:30:03 +0100 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: Ok, so as far as I understand, curve fitting is not possible in scipy.. Many researchers use the fminsearch function in matlab based on simplex standard routines. Is it a global optimizer? Mat 2012/2/21 > On Tue, Feb 21, 2012 at 5:19 AM, servant mathieu > wrote: > > Thanks a lot Joseph. I've got a last question which concerns initials > guess > > for parameters. In fact, in many papers, i read " we fitted both > functions > > using standard simplex optimization routines. This was repeated 10000 > times > > with randomized initial values to avoid local minima." The problemis the > > following: which range of values should we use for this randomization? > > I wanted to mention this before, all fmin are local optimizers, anneal > is the only global optimizer in scipy, but a bit tricky to use. There > are other global optimizers written in python that might work better, > but I never tried any of these packages. > > Choosing (random) starting values depends completely on the function > and there is no function independent recipe, since the > parameterization of a function is pretty arbitrary. So, you need an > "educated" guess over the possible range given the specific function > and problem. > > For specific classes of functions in not too high dimension it would > be possible to find (and code) starting values, or ranges for a random > search. > > I haven't tried out what your function looks like, but I would guess > that there are at least some sign restrictions. I usually try to see > if I can guess starting values, and ranges for randomization, based on > min, max and mean of the observations. > > Josef > > > > > Best, > > Mat > > > > > > 2012/2/21 > > > >> On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu > >> wrote: > >> > Dear Scipy users, > >> > > >> > I've got some troubles with the scipy.optimize.curve_fit function. > This > >> > function is based on the Levenburg-Maquardt algorithm, which is > >> > extremely > >> > rapid but usually finds a local minimum, not a global one (and thus > >> > often > >> > returns anormal parameter values). In my research field, we usually > use > >> > Nelder Mead's simplex routines to avoid this problem. 
However, I don't > >> > know > >> > if it is possible to perform curve fitting in scipy using simplex; the > >> > fmin > >> > function doesn't seem to perform adjustments to data. > >> > > >> > Here is my code for fitting a three parameters hyperbolic cotangent > >> > function > >> > using curve_fit: > >> > > >> > from scipy.optimize import curve_fit > >> > > >> > import numpy as np > >> > > >> > > >> > > >> > def func (x, A,k, r ): > >> > > >> > return r + (A /(k*x)) * np.tanh (A*k*x) > >> > > >> > > >> > > >> > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) > >> > > >> > > >> > > >> > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, > >> > 309.9906375,309.9251162, 307.3955800]) > >> > > >> > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, > >> > 327.7868238, > >> > 329.4642550, 328.0667050]) > >> > > >> > > >> > > >> > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = 10000) > >> > > >> > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = > >> > 10000) > >> > > >> > > >> > > >> > > >> > > >> > How could I proceed to perform the fitting using simplex? > >> > >> You need to define your own loss function for the optimizers like > >> fmin, or in future version minimize > >> > >> > http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin > >> > >> something like this > >> > >> def loss(params, args) > >> A, k, r = params > >> y, x = args > >> return ((y - func (x, A,k, r ))**2).sum() > >> > >> and use loss in the call to fmin > >> > >> Josef > >> > > > > > > > > > > > >> > >> > > >> > > >> > > >> > Best, > >> > > >> > Mat > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 21 07:07:33 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 21 Feb 2012 07:07:33 -0500 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: On Tue, Feb 21, 2012 at 6:30 AM, servant mathieu wrote: > Ok, so?as far as I understand, curve fitting is not possible in scipy..?Many > researchers?use the fminsearch function in matlab based on simplex standard > routines.? Is it a global optimizer? fminsearch in matlab uses 'Nelder-Mead simplex direct search' which is the same algorithm as scipy.optimize.fmin My impression is that your function does not have a solution with np.abs(tanh(A*k*x)) not equal to one, I didn't find one that would come close to the solution when the tanh part is one. curve_fitting is possible in scipy and works very well in many cases, and it finds a solution based on the A/(k*x) part, but it cannot do something impossible, but I didn't try 10000 starting values. 
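Along the reparameterization idea mentioned earlier (b = A*k), a sketch with curve_fit and an equivalent form of the same model, writing c = A**2 so that A/(k*x) = c/(b*x); the starting values are guesses only:

import numpy as np
from scipy.optimize import curve_fit

def func2(x, c, b, r):
    # same model as before, r + (A/(k*x)) * tanh(A*k*x), with c = A**2 and b = A*k
    return r + (c / (b * x)) * np.tanh(b * x)

xdata = np.array([0.15, 0.25, 0.35, 0.45, 0.55, 0.75])
datacomp = np.array([344.3276300, 324.0051063, 314.2693475,
                     309.9906375, 309.9251162, 307.3955800])

popt, pcov = curve_fit(func2, xdata, datacomp, p0=[10.0, 1.0, 300.0], maxfev=10000)
print(popt)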
In general, if it is a quadratic problem, then leastsq works very well, better than fmin or other fmin_xxx for unconstrained problems. Josef > Mat > > 2012/2/21 >> >> On Tue, Feb 21, 2012 at 5:19 AM, servant mathieu >> wrote: >> > Thanks a lot Joseph. I've got a last question which concerns initials >> > guess >> > for parameters. In fact, in many papers, i read " we fitted both >> > functions >> > using standard simplex optimization routines. This was repeated 10000 >> > times >> > with randomized initial values to avoid local minima."? The problemis >> > the >> > following: which range of values should we use for this randomization? >> >> I wanted to mention this before, all fmin are local optimizers, anneal >> is the only global optimizer in scipy, but a bit tricky to use. There >> are other global optimizers written in python that might work better, >> but I never tried any of these packages. >> >> Choosing (random) starting values depends completely on the function >> and there is no function independent recipe, since the >> parameterization of a function is pretty arbitrary. So, you need an >> "educated" guess over the possible range given the specific function >> and problem. >> >> For specific classes of functions in not too high dimension it would >> be possible to find (and code) starting values, or ranges for a random >> search. >> >> I haven't tried out what your function looks like, but I would guess >> that there are at least some sign restrictions. I usually try to see >> if I can guess starting values, and ranges for randomization, based on >> min, max and mean of the observations. >> >> Josef >> >> > >> > Best, >> > Mat >> > >> > >> > 2012/2/21 >> > >> >> On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu >> >> wrote: >> >> > Dear Scipy users, >> >> > >> >> > I've got some troubles with the?scipy.optimize.curve_fit function. >> >> > This >> >> > function is based on the Levenburg-Maquardt algorithm, which is >> >> > extremely >> >> > rapid but usually finds a local minimum, not a global one (and thus >> >> > often >> >> > returns anormal parameter values). In my research field, we usually >> >> > use >> >> > Nelder Mead's simplex routines to avoid this problem. However, I >> >> > don't >> >> > know >> >> > if it is possible to perform curve fitting in scipy using simplex; >> >> > the >> >> > fmin >> >> > function? doesn't seem to perform adjustments to data. >> >> > >> >> > Here is my code for fitting a three parameters hyperbolic cotangent >> >> > function >> >> > using curve_fit: >> >> > >> >> > from scipy.optimize import curve_fit >> >> > >> >> > import numpy as np >> >> > >> >> > >> >> > >> >> > def func (x, A,k, r ): >> >> > >> >> > return r + (A /(k*x)) * np.tanh (A*k*x) >> >> > >> >> > >> >> > >> >> > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) >> >> > >> >> > >> >> > >> >> > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, >> >> > 309.9906375,309.9251162, 307.3955800]) >> >> > >> >> > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, >> >> > 327.7868238, >> >> > 329.4642550, 328.0667050]) >> >> > >> >> > >> >> > >> >> > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = >> >> > 10000) >> >> > >> >> > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, maxfev = >> >> > 10000) >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > How could I?proceed to perform the fitting using simplex? 
>> >> >> >> You need to define your own loss function for the optimizers like >> >> fmin, or in future version minimize >> >> >> >> >> >> http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin >> >> >> >> something like this >> >> >> >> def loss(params, args) >> >> ? ? A, k, r = params >> >> ? ? y, x = args >> >> ? ? return ((y - func (x, A,k, r ))**2).sum() >> >> >> >> and use loss in the call to fmin >> >> >> >> Josef >> >> >> > >> > >> > >> > >> > >> >> >> >> > >> >> > >> >> > >> >> > Best, >> >> > >> >> > Mat >> >> > >> >> > >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From servant.mathieu at gmail.com Tue Feb 21 07:22:58 2012 From: servant.mathieu at gmail.com (servant mathieu) Date: Tue, 21 Feb 2012 13:22:58 +0100 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: Thanks for all Joseph, I perfectly understand the problem. Please correct me if I'm wrong, but if it finally appears that curve_fit is an excellent optimization solver, provided that you put relevant initial values for parameters... Best, Mat 2012/2/21 > On Tue, Feb 21, 2012 at 6:30 AM, servant mathieu > wrote: > > Ok, so as far as I understand, curve fitting is not possible in > scipy.. Many > > researchers use the fminsearch function in matlab based on simplex > standard > > routines. Is it a global optimizer? > > fminsearch in matlab uses 'Nelder-Mead simplex direct search' which is > the same algorithm as scipy.optimize.fmin > > My impression is that your function does not have a solution with > np.abs(tanh(A*k*x)) not equal to one, I didn't find one that would > come close to the solution when the tanh part is one. > > curve_fitting is possible in scipy and works very well in many cases, > and it finds a solution based on the A/(k*x) part, but it cannot do > something impossible, but I didn't try 10000 starting values. > > In general, if it is a quadratic problem, then leastsq works very > well, better than fmin or other fmin_xxx for unconstrained problems. > > Josef > > > Mat > > > > 2012/2/21 > >> > >> On Tue, Feb 21, 2012 at 5:19 AM, servant mathieu > >> wrote: > >> > Thanks a lot Joseph. I've got a last question which concerns initials > >> > guess > >> > for parameters. In fact, in many papers, i read " we fitted both > >> > functions > >> > using standard simplex optimization routines. This was repeated 10000 > >> > times > >> > with randomized initial values to avoid local minima." The problemis > >> > the > >> > following: which range of values should we use for this randomization? 
> >> > >> I wanted to mention this before, all fmin are local optimizers, anneal > >> is the only global optimizer in scipy, but a bit tricky to use. There > >> are other global optimizers written in python that might work better, > >> but I never tried any of these packages. > >> > >> Choosing (random) starting values depends completely on the function > >> and there is no function independent recipe, since the > >> parameterization of a function is pretty arbitrary. So, you need an > >> "educated" guess over the possible range given the specific function > >> and problem. > >> > >> For specific classes of functions in not too high dimension it would > >> be possible to find (and code) starting values, or ranges for a random > >> search. > >> > >> I haven't tried out what your function looks like, but I would guess > >> that there are at least some sign restrictions. I usually try to see > >> if I can guess starting values, and ranges for randomization, based on > >> min, max and mean of the observations. > >> > >> Josef > >> > >> > > >> > Best, > >> > Mat > >> > > >> > > >> > 2012/2/21 > >> > > >> >> On Tue, Feb 21, 2012 at 4:38 AM, servant mathieu > >> >> wrote: > >> >> > Dear Scipy users, > >> >> > > >> >> > I've got some troubles with the scipy.optimize.curve_fit function. > >> >> > This > >> >> > function is based on the Levenburg-Maquardt algorithm, which is > >> >> > extremely > >> >> > rapid but usually finds a local minimum, not a global one (and thus > >> >> > often > >> >> > returns anormal parameter values). In my research field, we usually > >> >> > use > >> >> > Nelder Mead's simplex routines to avoid this problem. However, I > >> >> > don't > >> >> > know > >> >> > if it is possible to perform curve fitting in scipy using simplex; > >> >> > the > >> >> > fmin > >> >> > function doesn't seem to perform adjustments to data. > >> >> > > >> >> > Here is my code for fitting a three parameters hyperbolic cotangent > >> >> > function > >> >> > using curve_fit: > >> >> > > >> >> > from scipy.optimize import curve_fit > >> >> > > >> >> > import numpy as np > >> >> > > >> >> > > >> >> > > >> >> > def func (x, A,k, r ): > >> >> > > >> >> > return r + (A /(k*x)) * np.tanh (A*k*x) > >> >> > > >> >> > > >> >> > > >> >> > xdata = np.array ([0.15,0.25,0.35,0.45, 0.55, 0.75]) > >> >> > > >> >> > > >> >> > > >> >> > datacomp = np.array ([344.3276300, 324.0051063, 314.2693475, > >> >> > 309.9906375,309.9251162, 307.3955800]) > >> >> > > >> >> > dataincomp = np.array ([363.3839888, 343.5735787, 334.6013375, > >> >> > 327.7868238, > >> >> > 329.4642550, 328.0667050]) > >> >> > > >> >> > > >> >> > > >> >> > poptcomp, pcovcomp = curve_fit (func, xdata, datacomp, maxfev = > >> >> > 10000) > >> >> > > >> >> > poptincomp, pcovincomp = curve_fit (func, xdata, dataincomp, > maxfev = > >> >> > 10000) > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > How could I proceed to perform the fitting using simplex? 
> >> >> > >> >> You need to define your own loss function for the optimizers like > >> >> fmin, or in future version minimize > >> >> > >> >> > >> >> > http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html#nelder-mead-simplex-algorithm-fmin > >> >> > >> >> something like this > >> >> > >> >> def loss(params, args) > >> >> A, k, r = params > >> >> y, x = args > >> >> return ((y - func (x, A,k, r ))**2).sum() > >> >> > >> >> and use loss in the call to fmin > >> >> > >> >> Josef > >> >> > >> > > >> > > >> > > >> > > >> > > >> >> > >> >> > > >> >> > > >> >> > > >> >> > Best, > >> >> > > >> >> > Mat > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > > >> > > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Feb 21 07:55:33 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 21 Feb 2012 13:55:33 +0100 Subject: [SciPy-User] simplex algorithm and curve fitting In-Reply-To: References: Message-ID: 21.02.2012 13:22, servant mathieu kirjoitti: > Thanks for all Joseph, I perfectly understand the problem. Please > correct me if I'm wrong, but if it finally appears that curve_fit is an > excellent optimization solver, provided that you put relevant initial > values for parameters... This applies to the majority of optimization algorithms. The situation is no different e.g. for Matlab/fminsearch (or its identical Scipy counterpart). -- Pauli Virtanen From jaakko.luttinen at aalto.fi Tue Feb 21 08:15:06 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Tue, 21 Feb 2012 15:15:06 +0200 Subject: [SciPy-User] "Zero"-shape sparse matrices In-Reply-To: References: <4F3D2468.4060700@aalto.fi> Message-ID: <4F4398DA.3070208@aalto.fi> On 02/17/2012 05:32 PM, eat wrote: > Hi, > > Perhaps a slightly OT (and I'm not really answering to your question), but > > On Thu, Feb 16, 2012 at 5:44 PM, Jaakko Luttinen > > wrote: > > Hi! > > To make a long story short, Scipy doesn't seem to allow sparse matrices > that have length zero on any of the axes. For instance: > C = numpy.ones((0,0)) > K = scipy.sparse.csc_matrix(C) > ValueError: invalid shape > > It is possible to create a "zero"-shape dense matrix but not sparse. > Why? To me, this seems like a bug.. Is it so? > > what would you expect a "zero"-shape sparse (or dense) > matrix actually represent? "Zero"-shape matrix would represent an empty matrix which has a correct shape for some operations. 
This is very convenient for generic code because I don't need to check if some dimension has zero length. For instance: - horizontal concatenation of (10,3), (10,2) and (10,0) shaped matrices would work and produce a (10,5) shaped matrix. - forming a block matrix from four matrices having shapes (0,0), (0,20), (10,0) and (10,20) would produce a (10,20) matrix using numpy.bmat. Regards, Jaakko From jaakko.luttinen at aalto.fi Tue Feb 21 08:23:28 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Tue, 21 Feb 2012 15:23:28 +0200 Subject: [SciPy-User] "Zero"-shape sparse matrices In-Reply-To: References: <4F3D2468.4060700@aalto.fi> Message-ID: <4F439AD0.1040808@aalto.fi> On 02/17/2012 06:48 PM, Christopher Mutel wrote: > On Fri, Feb 17, 2012 at 4:32 PM, eat wrote: >> On Thu, Feb 16, 2012 at 5:44 PM, Jaakko Luttinen >> wrote: >>> To make a long story short, Scipy doesn't seem to allow sparse matrices >>> that have length zero on any of the axes. For instance: >>> C = numpy.ones((0,0)) >>> K = scipy.sparse.csc_matrix(C) >>> ValueError: invalid shape >>> >>> It is possible to create a "zero"-shape dense matrix but not sparse. >>> Why? To me, this seems like a bug.. Is it so? > > I am not an expert, but it is my understanding that the sparse matrix > implementations in SciPy assume precisely two dimensions. One > dimension having a size of 0 would break all the assumptions of this > code. The NumPy array class is a much more generic container, and was > designed from the beginning to allow a number of slicing and > dimensionality tricks (see the documentation on numpy striding). You > can search through the mailing list from a few years ago to find a > discussion about three dimensional sparse matrices, and the conclusion > was the same: SciPy supports 2-d (in the sense of two real dimensions) > sparse matrices only. Thanks for your answer! I think a matrix with shape (10,0) would be as "2-d" as a (10,1) shaped matrix. Both have two dimensions, but neither one has both axes longer than 1. I don't mean to consider 0-d, 1-d, 3-d or N-d matrices, but empty 2-d matrices. I just don't see why there is this limitation for sparse matrices that zero is not a valid length for an axis. Regards, Jaakko From jaakko.luttinen at aalto.fi Tue Feb 21 08:42:18 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Tue, 21 Feb 2012 15:42:18 +0200 Subject: [SciPy-User] einsum for sparse matrices? Message-ID: <4F439F3A.2010301@aalto.fi> Hi! The function numpy.einsum is an awesome function. Is it being implemented for sparse matrices too? Would it be easy/possible? Just curious, because I think that it could solve some of the problems related to the integration between sparse and dense matrices because several functions are just special cases of einsum. Regards, Jaakko From rishi.ou at gmail.com Mon Feb 20 15:08:38 2012 From: rishi.ou at gmail.com (RDX) Date: Mon, 20 Feb 2012 12:08:38 -0800 (PST) Subject: [SciPy-User] [SciPy-user] error fmin Message-ID: <33359168.post@talk.nabble.com> Below is I am using fmin and it returns initial guess as the final solution. def Traug (v): err=0 Traug_RHOB = v[0] + v[1]*(z_den_s/3125.0)**v[2] count =arange(size(den_s)) for i in count: err += abs(den_s[i] - Traug_RHOB [i]) return err vO=[1.6, 0.5, 0.35] t=fmin(Traug,vO ) a=t[0]; b=t[1]; c=t[2]; Code gives me a=1.6, b=0.5, c=0.35, which is my initial guess. What is wrong? 
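A vectorized, squared-error variant of this objective, which is one of the suggestions in the replies further down, would look something like the sketch below; den_s and z_den_s are only stubbed with made-up arrays here, since the real data are not shown:

import numpy as np
from scipy.optimize import fmin

# stand-ins for the poster's arrays; replace with the real data
z_den_s = np.linspace(500.0, 3000.0, 50)
den_s = 1.7 + 0.4 * (z_den_s / 3125.0) ** 0.3

def traug(v):
    model = v[0] + v[1] * (z_den_s / 3125.0) ** v[2]
    # squared error instead of the summed absolute error
    return ((den_s - model) ** 2).sum()

v0 = [1.6, 0.5, 0.35]
t = fmin(traug, v0, xtol=1e-10, ftol=1e-10, maxiter=5000, maxfun=5000)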
-- View this message in context: http://old.nabble.com/error-fmin-tp33359168p33359168.html Sent from the Scipy-User mailing list archive at Nabble.com. From rishi.ou at gmail.com Mon Feb 20 15:11:51 2012 From: rishi.ou at gmail.com (RDX) Date: Mon, 20 Feb 2012 12:11:51 -0800 (PST) Subject: [SciPy-User] [SciPy-user] error fmin Message-ID: <33359168.post@talk.nabble.com> I am using fmin and it returns initial guess as the final solution. def Traug (v): err=0 Traug_RHOB = v[0] + v[1]*(z_den_s/3125.0)**v[2] count =arange(size(den_s)) for i in count: err += abs(den_s[i] - Traug_RHOB [i]) return err vO=[1.6, 0.5, 0.35] t=fmin(Traug,vO ) a=t[0]; b=t[1]; c=t[2]; Code gives me a=1.6, b=0.5, c=0.35, which is my initial guess. What is wrong? -- View this message in context: http://old.nabble.com/error-fmin-tp33359168p33359168.html Sent from the Scipy-User mailing list archive at Nabble.com. From bjorn.nyberg.10 at aberdeen.ac.uk Mon Feb 20 17:14:29 2012 From: bjorn.nyberg.10 at aberdeen.ac.uk (Nyberg, Bjorn Johan) Date: Mon, 20 Feb 2012 22:14:29 +0000 Subject: [SciPy-User] Focal Majority Message-ID: Hi Everyone, I have an interesting problem and I was hoping I could get some ideas here. I want to apply a focal majority within a moving 3 x 3 window whereby if there is a majority (any type of majority i.e. more than 5 cells having the same value), assign the center cell a value of 1 otherwise assign a value of 0. Now I realize in scipy and numpy there are options with convolutions methods but I am not entirely certain of how to apply a condition statement that I would require into the window calculations. Thanks Nyberg -------------- next part -------------- An HTML attachment was scrubbed... URL: From sneher at gwdg.de Tue Feb 21 06:45:44 2012 From: sneher at gwdg.de (SiggiN) Date: Tue, 21 Feb 2012 03:45:44 -0800 (PST) Subject: [SciPy-User] [SciPy-user] find_objects() even slice dimentions.. Message-ID: <33363336.post@talk.nabble.com> Hallo, I'm using the find_objects() function for detecting objects in a 2d-array. Is there a way to let the find_objects() function only return slices with even dimensions in x and y. Or is there a easy way to manipulate the slices? The only thing I was able to do was simple string manipulation. I need even dimensions for performing symmetry tests. Thank you Siggi -- View this message in context: http://old.nabble.com/find_objects%28%29-even-slice-dimentions..-tp33363336p33363336.html Sent from the Scipy-User mailing list archive at Nabble.com. From sneher at gwdg.de Tue Feb 21 06:47:23 2012 From: sneher at gwdg.de (SiggiN) Date: Tue, 21 Feb 2012 03:47:23 -0800 (PST) Subject: [SciPy-User] [SciPy-user] find_objects() even slice dimensions Message-ID: <33363336.post@talk.nabble.com> Hallo, I'm using the find_objects() function for detecting objects in a 2d-array. Is there a way to let the find_objects() function only return slices with even dimensions in x and y. Or is there a easy way to manipulate the slices? The only thing I was able to do was simple string manipulation. I need even dimensions for performing symmetry tests. Thank you Siggi -- View this message in context: http://old.nabble.com/find_objects%28%29-even-slice-dimensions-tp33363336p33363336.html Sent from the Scipy-User mailing list archive at Nabble.com. 
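find_objects() returns plain tuples of Python slice objects, so one way to get even window lengths is to post-process those slices, for example by growing any odd-length slice by one cell. A rough sketch (it assumes the object can be extended by one cell without running past the array edge):

import numpy as np
from scipy import ndimage

def make_even(obj_slices, shape):
    # grow each odd-length slice by one cell
    fixed = []
    for sl, n in zip(obj_slices, shape):
        start, stop = sl.start, sl.stop
        if (stop - start) % 2 == 1:
            if stop < n:
                stop += 1
            else:
                start -= 1
        fixed.append(slice(start, stop))
    return tuple(fixed)

data = (np.random.rand(20, 20) > 0.7).astype(int)
labels, num = ndimage.label(data)
objects = ndimage.find_objects(labels)
even_objects = [make_even(obj, labels.shape) for obj in objects]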
From sneher at gwdg.de Tue Feb 21 08:11:56 2012 From: sneher at gwdg.de (SiggiN) Date: Tue, 21 Feb 2012 05:11:56 -0800 (PST) Subject: [SciPy-User] [SciPy-user] find_objects() even slice dimensions Message-ID: <33363336.post@talk.nabble.com> Hallo, I'm using the find_objects() function for detecting objects in a 2d-array. Is there a way to let the find_objects() function only return slices with even dimensions in x and y. Or is there a easy way to manipulate the slices? The only thing I was able to do was simple string manipulation. I need even dimensions for performing symmetry tests. Thank you Siggi -- View this message in context: http://old.nabble.com/find_objects%28%29-even-slice-dimensions-tp33363336p33363336.html Sent from the Scipy-User mailing list archive at Nabble.com. From sneher at gwdg.de Tue Feb 21 08:44:57 2012 From: sneher at gwdg.de (SiggiN) Date: Tue, 21 Feb 2012 05:44:57 -0800 (PST) Subject: [SciPy-User] [SciPy-user] find_objects() even slice dimensions Message-ID: <33363336.post@talk.nabble.com> Hallo, I'm using the find_objects() function for detecting objects in a 2d-array. Is there a way to let the find_objects() function only return slices with even dimensions in x and y. Or is there a easy way to manipulate the slices? The only thing I was able to do was simple string manipulation. I need even dimensions for performing symmetry tests. Thank you Siggi -- View this message in context: http://old.nabble.com/find_objects%28%29-even-slice-dimensions-tp33363336p33363336.html Sent from the Scipy-User mailing list archive at Nabble.com. From barbara.padova at gmail.com Tue Feb 21 09:12:49 2012 From: barbara.padova at gmail.com (barbara padova) Date: Tue, 21 Feb 2012 15:12:49 +0100 Subject: [SciPy-User] Delaunay Message-ID: I am noticing an unexplained behaviour when use scipy's (0.9.0) Delaunay triangulation routine. My points are UTM coordinates stored in numpy.array([[easting, northing], [easting, northing], [easting, northing]]). If I use the true coordinates Scipy's edges are missing some of my points. If I calculate the mean_easting and the mean_northing values and change my array in numpy.array([[easting-mean_easting , northing-mean_northing ], [easting-mean_easting , northing-mean_northing ], [easting-mean_easting , northing-mean_northing ]]), the result it's OK. I know that there are rounding errors and I also tried using the data type float64, but whit my original date in UTM I have the same problems. There is a solution to use my original data with scipy's Delaunay triangulation routine? Thanks for any help! Best regards, Barbara -------------- next part -------------- An HTML attachment was scrubbed... URL: From barbara.padova at gmail.com Tue Feb 21 09:26:11 2012 From: barbara.padova at gmail.com (barbara.padova) Date: Tue, 21 Feb 2012 06:26:11 -0800 (PST) Subject: [SciPy-User] Scipy's Delaunay triangulation Message-ID: I am noticing an unexplained behaviour when use scipy's (0.9.0) Delaunay triangulation routine. My points are UTM coordinates stored in numpy.array([[easting, northing], [easting, northing], [easting, northing]]). If I use the true coordinates Scipy's edges are missing some of my points. If I calculate the mean_easting and the mean_northing values and change my array in numpy.array([[easting-mean_easting , northing- mean_northing ], [easting-mean_easting , northing-mean_northing ], [easting-mean_easting , northing-mean_northing ]]), the result it's OK. 
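In code, this workaround is just a translation of the coordinates before triangulating, for example (the coordinates below are invented placeholders for real UTM values):

import numpy as np
from scipy.spatial import Delaunay

points = np.array([[500000.0, 4649776.0],
                   [500120.0, 4649990.0],
                   [500310.0, 4649800.0],
                   [500200.0, 4650150.0],
                   [500050.0, 4649900.0]])

# subtracting the centroid removes the large common offset, so less
# floating point precision is wasted on it
shifted = points - points.mean(axis=0)
tri = Delaunay(shifted)
# the resulting simplices index rows of the original `points` as well,
# since only a translation was applied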
I know that there are rounding errors and I also tried using the data type float64, but whit my original date in UTM I have the same problems. There is a solution to use my original data with scipy's Delaunay triangulation routine? Thanks for any help! Best regards, Barbara Padova From gael.varoquaux at normalesup.org Tue Feb 21 09:37:26 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 21 Feb 2012 15:37:26 +0100 Subject: [SciPy-User] [SciPy-user] find_objects() even slice dimensions In-Reply-To: <33363336.post@talk.nabble.com> References: <33363336.post@talk.nabble.com> Message-ID: <20120221143726.GA29335@phare.normalesup.org> On Tue, Feb 21, 2012 at 05:44:57AM -0800, SiggiN wrote: > Or is there a easy way to manipulate the slices? The only thing I was able > to do was simple string manipulation. Slices are standard Python object. They can be instanciatated using 'slice(start, stop, step)' and start, stop, step can be accessed as attributes of these objects. http://docs.python.org/library/functions.html#slice HTH, Gael From warren.weckesser at enthought.com Tue Feb 21 10:00:44 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 21 Feb 2012 09:00:44 -0600 Subject: [SciPy-User] "Zero"-shape sparse matrices In-Reply-To: <4F439AD0.1040808@aalto.fi> References: <4F3D2468.4060700@aalto.fi> <4F439AD0.1040808@aalto.fi> Message-ID: On Tue, Feb 21, 2012 at 7:23 AM, Jaakko Luttinen wrote: > On 02/17/2012 06:48 PM, Christopher Mutel wrote: > > On Fri, Feb 17, 2012 at 4:32 PM, eat wrote: > >> On Thu, Feb 16, 2012 at 5:44 PM, Jaakko Luttinen < > jaakko.luttinen at aalto.fi> > >> wrote: > >>> To make a long story short, Scipy doesn't seem to allow sparse matrices > >>> that have length zero on any of the axes. For instance: > >>> C = numpy.ones((0,0)) > >>> K = scipy.sparse.csc_matrix(C) > >>> ValueError: invalid shape > >>> > >>> It is possible to create a "zero"-shape dense matrix but not sparse. > >>> Why? To me, this seems like a bug.. Is it so? > > > > I am not an expert, but it is my understanding that the sparse matrix > > implementations in SciPy assume precisely two dimensions. One > > dimension having a size of 0 would break all the assumptions of this > > code. The NumPy array class is a much more generic container, and was > > designed from the beginning to allow a number of slicing and > > dimensionality tricks (see the documentation on numpy striding). You > > can search through the mailing list from a few years ago to find a > > discussion about three dimensional sparse matrices, and the conclusion > > was the same: SciPy supports 2-d (in the sense of two real dimensions) > > sparse matrices only. > > Thanks for your answer! > > I think a matrix with shape (10,0) would be as "2-d" as a (10,1) shaped > matrix. Both have two dimensions, but neither one has both axes longer > than 1. I don't mean to consider 0-d, 1-d, 3-d or N-d matrices, but > empty 2-d matrices. I just don't see why there is this limitation for > sparse matrices that zero is not a valid length for an axis. > > In principle, sparse matrices should behave as much like dense matrices as possible. Since numpy ndarrays and matrices allow a dimension to have length 0, it is a reasonable expectation for sparse matrices to allow this also. Could you file a ticket for this? Click on the "bug reports" link here: http://projects.scipy.org/scipy Thanks, Warren -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaakko.luttinen at aalto.fi Tue Feb 21 10:58:37 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Tue, 21 Feb 2012 17:58:37 +0200 Subject: [SciPy-User] "Zero"-shape sparse matrices In-Reply-To: References: <4F3D2468.4060700@aalto.fi> <4F439AD0.1040808@aalto.fi> Message-ID: <4F43BF2D.1070403@aalto.fi> On 02/21/2012 05:00 PM, Warren Weckesser wrote: > > > On Tue, Feb 21, 2012 at 7:23 AM, Jaakko Luttinen > > wrote: > > On 02/17/2012 06:48 PM, Christopher Mutel wrote: > > On Fri, Feb 17, 2012 at 4:32 PM, eat > wrote: > >> On Thu, Feb 16, 2012 at 5:44 PM, Jaakko Luttinen > > > >> wrote: > >>> To make a long story short, Scipy doesn't seem to allow sparse > matrices > >>> that have length zero on any of the axes. For instance: > >>> C = numpy.ones((0,0)) > >>> K = scipy.sparse.csc_matrix(C) > >>> ValueError: invalid shape > >>> > >>> It is possible to create a "zero"-shape dense matrix but not sparse. > >>> Why? To me, this seems like a bug.. Is it so? > > > > I am not an expert, but it is my understanding that the sparse matrix > > implementations in SciPy assume precisely two dimensions. One > > dimension having a size of 0 would break all the assumptions of this > > code. The NumPy array class is a much more generic container, and was > > designed from the beginning to allow a number of slicing and > > dimensionality tricks (see the documentation on numpy striding). You > > can search through the mailing list from a few years ago to find a > > discussion about three dimensional sparse matrices, and the conclusion > > was the same: SciPy supports 2-d (in the sense of two real dimensions) > > sparse matrices only. > > Thanks for your answer! > > I think a matrix with shape (10,0) would be as "2-d" as a (10,1) shaped > matrix. Both have two dimensions, but neither one has both axes longer > than 1. I don't mean to consider 0-d, 1-d, 3-d or N-d matrices, but > empty 2-d matrices. I just don't see why there is this limitation for > sparse matrices that zero is not a valid length for an axis. > > > > In principle, sparse matrices should behave as much like dense matrices > as possible. Since numpy ndarrays and matrices allow a dimension to > have length 0, it is a reasonable expectation for sparse matrices to > allow this also. Could you file a ticket for this? Click on the "bug > reports" link here: http://projects.scipy.org/scipy Ok, thanks! http://projects.scipy.org/scipy/ticket/1602 -Jaakko From Dieter.Werthmuller at ed.ac.uk Tue Feb 21 13:32:21 2012 From: Dieter.Werthmuller at ed.ac.uk (=?ISO-8859-1?Q?Dieter_Werthm=FCller?=) Date: Tue, 21 Feb 2012 18:32:21 +0000 Subject: [SciPy-User] [SciPy-user] adaptive simulated annealing (ASA) In-Reply-To: <4F391176.6080906@ed.ac.uk> References: <4F391176.6080906@ed.ac.uk> Message-ID: <4F43E335.7060809@ed.ac.uk> Dear reader, I give it another try with more explanation. If it remains unanswered I will assume that such an implementation does not exist. Adaptive Simulated Annealing (ASA) is a freely available code for simulated annealing, see http://www.ingber.com/#ASA. In 2007, Georg Holzmann was asking this list if there is an implementation of that powerful code within SciPy, see http://mail.scipy.org/pipermail/scipy-user/2007-November/014549.html . The responses back then were negative, suggesting Georg to wrap it himself and referencing to swig.org. I was wondering if the state of that issue, python wrapper for the ASA code, changed in the last 4.5 years. 
And if not, is swig.org still the best place to look for starting a wrapper myself? I am aware that there is a Matlab wrapper for ASA, with is +/- up-to-date (Feb 2011), see http://ssakata.sdf.org/software/ . I don't know if this would help to create a wrapper for Python. Many thanks, Dieter On 13/02/12 13:34, Dieter Werthm?ller wrote: > Hi there, > > The message 'adaptive simulated annealing (ASA)' is over four years old > now, asking for a python implementation of ASA > (http://www.ingber.com/#ASA). > > I was wondering if such an implementation is around today? > > Kind regards, > Dieter -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From aronne.merrelli at gmail.com Tue Feb 21 13:58:56 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Tue, 21 Feb 2012 12:58:56 -0600 Subject: [SciPy-User] Focal Majority In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 4:14 PM, Nyberg, Bjorn Johan < bjorn.nyberg.10 at aberdeen.ac.uk> wrote: > Hi Everyone, > I have an interesting problem and I was hoping I could get some ideas > here. I want to apply a focal majority within a moving 3 x 3 window whereby > if there is a majority (any type of majority i.e. more than 5 cells having > the same value), assign the center cell a value of 1 otherwise assign a > value of 0. Now I realize in scipy and numpy there are options with > convolutions methods but I am not entirely certain of how to apply a > condition statement that I would require into the window calculations. > > Thanks > *Nyberg* > > I don't think a convolution would work. A convolution is really just a weighted sum, so I can't see a way to mimic a sort or conditional that way. But, I think you can do this with scipy.ndimage.rank_filter. If you want 5 cells with the same value, it should be equivalent to checking if the first and fifth ranked elements are the same (or second and sixth, etc...). So a loop through the window size, combining rank_filter calls, should do this. Definitely double check me on this - I'm not 100% sure it is doing the the correct thing, and it probably isn't doing what you want at the edges. If this is not fast enough, then I would consider writing a "brute force" loop in Cython to make it fast. In [90]: z Out[90]: array([[1, 1, 0, 8], [8, 1, 3, 1], [3, 1, 1, 2], [3, 1, 4, 5]]) In [91]: zmask = np.zeros(z.shape, bool) In [92]: for n in range(4): zmask = np.logical_or(zmask, rank_filter(z,n,3)==rank_filter(z,(n-5),3)) In [93]: zmask Out[93]: array([[ True, True, False, False], [ True, True, True, False], [False, False, True, False], [ True, False, False, False]], dtype=bool) HTH, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 21 14:15:03 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 21 Feb 2012 14:15:03 -0500 Subject: [SciPy-User] [SciPy-user] adaptive simulated annealing (ASA) In-Reply-To: <4F43E335.7060809@ed.ac.uk> References: <4F391176.6080906@ed.ac.uk> <4F43E335.7060809@ed.ac.uk> Message-ID: 2012/2/21 Dieter Werthm?ller : > Dear reader, > > I give it another try with more explanation. If it remains unanswered I > will assume that such an implementation does not exist. No answer and google search doesn't show anything, it looks like "such an implementation does not exist" my impression is global optimization is not very common in this neighborhood. 
Josef > > Adaptive Simulated Annealing (ASA) is a freely available code for > simulated annealing, see http://www.ingber.com/#ASA. > > In 2007, Georg Holzmann was asking this list if there is an > implementation of that powerful code within SciPy, see > http://mail.scipy.org/pipermail/scipy-user/2007-November/014549.html . > > The responses back then were negative, suggesting Georg to wrap it > himself and referencing to swig.org. > > I was wondering if the state of that issue, python wrapper for the ASA > code, changed in the last 4.5 years. And if not, is swig.org still the > best place to look for starting a wrapper myself? > > I am aware that there is a Matlab wrapper for ASA, with is +/- > up-to-date (Feb 2011), see http://ssakata.sdf.org/software/ . I don't > know if this would help to create a wrapper for Python. > > Many thanks, > Dieter > > On 13/02/12 13:34, Dieter Werthm?ller wrote: >> Hi there, >> >> The message 'adaptive simulated annealing (ASA)' is over four years old >> now, asking for a python implementation of ASA >> (http://www.ingber.com/#ASA). >> >> I was wondering if such an implementation is around today? >> >> Kind regards, >> Dieter > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From njs at pobox.com Tue Feb 21 15:00:44 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 21 Feb 2012 20:00:44 +0000 Subject: [SciPy-User] [SciPy-user] adaptive simulated annealing (ASA) In-Reply-To: References: <4F391176.6080906@ed.ac.uk> <4F43E335.7060809@ed.ac.uk> Message-ID: On Tue, Feb 21, 2012 at 7:15 PM, wrote: > 2012/2/21 Dieter Werthm?ller : >> Dear reader, >> >> I give it another try with more explanation. If it remains unanswered I >> will assume that such an implementation does not exist. > > No answer and google search doesn't show anything, it looks like "such > an implementation does not exist" > > my impression is global optimization is not very common in this neighborhood. Which is to say, probably no-one has gotten around to it (unless Georg Holzmann did and neglected to tell anyone), but I'm sure people would be interested if you were to write and make such code available. SWIG remains a fine way to implement Python wrappers, but I prefer Cython myself. If you go with Cython, the Demos/callback directory in the source tree has a good example of passing a Python function (like a cost function) to a C library: http://cython.org/release/Cython-0.15/Demos/callback/ Another option for is 'ctypes' (which is included with the standard Python distribution, and lets you avoid dealing with C compilers entirely). -- N From pav at iki.fi Tue Feb 21 17:37:48 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 21 Feb 2012 23:37:48 +0100 Subject: [SciPy-User] Scipy's Delaunay triangulation In-Reply-To: References: Message-ID: Hi, 21.02.2012 15:26, barbara.padova kirjoitti: > I am noticing an unexplained behaviour when use scipy's (0.9.0) > Delaunay triangulation routine. My points are UTM coordinates stored > in numpy.array([[easting, northing], [easting, northing], [easting, > northing]]). > If I use the true coordinates Scipy's edges are missing some of my > points. [clip] Delaunay triangulations are not well-defined for all input data sets, and are often even less well-defined numerically. 
The behavior of Scipy's triangulation routine is inherited from the computational geometry library it uses, Qhull (http://qhull.org/). In some cases, Qhull can exclude points that are ambiguous due to numerical precision from the triangulation, and I think this is what you observe. Scipy runs qhull with options "d Qz Qbb Qt" plus the defaults; you can look up the precise meanings here: http://qhull.org/html/qh-quick.htm One could argue that Scipy should raise an error in ambiguous cases, and that we should not pass "Qbb" and "Qz" by default, but rather let the user customize how they want to deal with the triangulation failure. These "numerically bad" cases are not uncommon, though, and I'm not 100% sure it is possible to tell Qhull to always include all points (see [1]). > There is a solution to use my original data with scipy's Delaunay > triangulation routine? It's a limitation of the algorithm. One option is to deduce which points are missing, and live with the fact (e.g. considering them "merged" them with the nearest points in the triangulation in your own code, or just ignoring them). Qhull also has an option "QJ" which adds numerical noise to the least significant digits of the input data, so that results become numerically better defined. Unfortunately, at the moment the Scipy interface does not allow passing this option down to Qhull. You should however be able to achieve a similar effect by adding noise to the input data also yourself, e.g. numpy.random.seed(1234) joggled = points * ( 1 + 1e-8*(numpy.random.rand(points.shape) - 0.5)) tri = Delaunay(joggled) it's not exactly the same as the "QJ" option, but should behave in the same way. -- Pauli Virtanen [1] Try feeding the following to qdelaunay: 3 10 -0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 0.5 -0.5 -0.5 0.5 0.5 0.5 -0.5 -0.5 0.5 -0.5 0.5 0.5 0.5 -0.5 0.5 0.5 0.5000000000000003 0.5 -0.5 0.500000000000001 0.5 0.5 0.5 From rishi.ou at gmail.com Wed Feb 22 08:41:39 2012 From: rishi.ou at gmail.com (RDX) Date: Wed, 22 Feb 2012 05:41:39 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Error using fmin Message-ID: <33370870.post@talk.nabble.com> I am using fmin and it returns initial guess as the final solution. def Traug (v): err=0 Traug_RHOB = v[0] + v[1]*(z_den_s/3125.0)**v[2] count =arange(size(den_s)) for i in count: err += abs(den_s[i] - Traug_RHOB [i]) return err vO=[1.6, 0.5, 0.35] t=fmin(Traug,vO ) a=t[0]; b=t[1]; c=t[2]; Code gives me a=1.6, b=0.5, c=0.35, which is my initial guess. What is wrong? Many thanks in advance. Regards, Rishi -- View this message in context: http://old.nabble.com/Error-using-fmin-tp33370870p33370870.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Wed Feb 22 08:53:26 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 22 Feb 2012 08:53:26 -0500 Subject: [SciPy-User] [SciPy-user] Error using fmin In-Reply-To: <33370870.post@talk.nabble.com> References: <33370870.post@talk.nabble.com> Message-ID: On Wed, Feb 22, 2012 at 8:41 AM, RDX wrote: > > I am using fmin and it returns initial guess as the final solution. > > def Traug (v): > ? ?err=0 > ? ?Traug_RHOB = v[0] + v[1]*(z_den_s/3125.0)**v[2] > ? ?count =arange(size(den_s)) > ? ?for i in count: > ? ? ? ?err += abs(den_s[i] - Traug_RHOB [i]) > ? 
?return err my first guess is that fmin doesn't like the abs, you could try with a quadratic loss instead to see whether it works and whether there are other problems If den_s[i] is an array, then you could vectorize the loop Josef > > vO=[1.6, 0.5, 0.35] > t=fmin(Traug,vO ) > a=t[0]; b=t[1]; c=t[2]; > > Code gives me a=1.6, b=0.5, c=0.35, which is my initial guess. > > What is wrong? Many thanks in advance. > > Regards, > Rishi > > > -- > View this message in context: http://old.nabble.com/Error-using-fmin-tp33370870p33370870.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From barbara.padova at gmail.com Wed Feb 22 03:40:11 2012 From: barbara.padova at gmail.com (barbara.padova) Date: Wed, 22 Feb 2012 00:40:11 -0800 (PST) Subject: [SciPy-User] Scipy' Delaunay cource code Message-ID: <5ca7e438-00ec-4bc6-ba00-2324762bb1ca@eb6g2000vbb.googlegroups.com> I'm searching in ...\Python27\Lib\site-packages\scipy\spatial the scipy's Dealunay source code, but I'm unable to find. Does anyone know where I can find it? Thanks, Barbara Padova From hjalti at vatnaskil.is Wed Feb 22 08:49:34 2012 From: hjalti at vatnaskil.is (=?iso-8859-1?Q?Hjalti_Sigurj=F3nsson?=) Date: Wed, 22 Feb 2012 13:49:34 -0000 Subject: [SciPy-User] [SciPy-user] Error using fmin In-Reply-To: <33370870.post@talk.nabble.com> References: <33370870.post@talk.nabble.com> Message-ID: Seems that z_den_s is not defined In [20]: Traug(vO) --------------------------------------------------------------------------- NameError Traceback (most recent call last) /home/hjalti/ in () ----> 1 Traug(vO) /home/hjalti/ in Traug(v) NameError: global name 'z_den_s' is not defined -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of RDX Sent: 22. febr?ar 2012 13:42 To: scipy-user at scipy.org Subject: [SciPy-User] [SciPy-user] Error using fmin I am using fmin and it returns initial guess as the final solution. def Traug (v): err=0 Traug_RHOB = v[0] + v[1]*(z_den_s/3125.0)**v[2] count =arange(size(den_s)) for i in count: err += abs(den_s[i] - Traug_RHOB [i]) return err vO=[1.6, 0.5, 0.35] t=fmin(Traug,vO ) a=t[0]; b=t[1]; c=t[2]; Code gives me a=1.6, b=0.5, c=0.35, which is my initial guess. What is wrong? Many thanks in advance. Regards, Rishi -- View this message in context: http://old.nabble.com/Error-using-fmin-tp33370870p33370870.html Sent from the Scipy-User mailing list archive at Nabble.com. _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From robert.kern at gmail.com Wed Feb 22 09:18:16 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Feb 2012 14:18:16 +0000 Subject: [SciPy-User] Scipy' Delaunay cource code In-Reply-To: <5ca7e438-00ec-4bc6-ba00-2324762bb1ca@eb6g2000vbb.googlegroups.com> References: <5ca7e438-00ec-4bc6-ba00-2324762bb1ca@eb6g2000vbb.googlegroups.com> Message-ID: On Wed, Feb 22, 2012 at 08:40, barbara.padova wrote: > I'm searching in ...\Python27\Lib\site-packages\scipy\spatial the > scipy's Dealunay source code, but I'm unable to find. > Does anyone know where I can find it? The installed package does not have any C sources, and the Delaunay triangulation code is in C. 
You will need to look in the scipy/spatial/ directory from the source distribution or a git checkout of the development sources. http://pypi.python.org/packages/source/s/scipy/scipy-0.10.0.zip https://github.com/scipy/scipy/tree/master/scipy/spatial -- Robert Kern From bjorn.burr.nyberg at gmail.com Wed Feb 22 11:55:55 2012 From: bjorn.burr.nyberg at gmail.com (Bjorn Nyberg) Date: Wed, 22 Feb 2012 17:55:55 +0100 Subject: [SciPy-User] Focal Majority In-Reply-To: References: Message-ID: <93C6338C-311B-49C3-BB21-97AC762A515D@gmail.com> Thanks Aronne, I only had half an hour or so to play around with it but it certainty looks promising. Im going to spend some more time on that over the weekend when im free... especially to understand how the ranking is being calculated. I have to ask though, as I am using this for GIS purposes is there an easy way to convert the Bool format into a 0,1 integer - i.e. needed to convert the array to raster format (arcpy.ArrayToRaster....). Regards, Nyberg On Feb 21, 2012, at 19:58 PM, Aronne Merrelli wrote: > > > > I don't think a convolution would work. A convolution is really just a weighted sum, so I can't see a way to mimic a sort or conditional that way. > > But, I think you can do this with scipy.ndimage.rank_filter. If you want 5 cells with the same value, it should be equivalent to checking if the first and fifth ranked elements are the same (or second and sixth, etc...). So a loop through the window size, combining rank_filter calls, should do this. Definitely double check me on this - I'm not 100% sure it is doing the the correct thing, and it probably isn't doing what you want at the edges. If this is not fast enough, then I would consider writing a "brute force" loop in Cython to make it fast. > > In [90]: z > Out[90]: > array([[1, 1, 0, 8], > [8, 1, 3, 1], > [3, 1, 1, 2], > [3, 1, 4, 5]]) > > In [91]: zmask = np.zeros(z.shape, bool) > > In [92]: for n in range(4): > zmask = np.logical_or(zmask, rank_filter(z,n,3)==rank_filter(z,(n-5),3)) > > In [93]: zmask > Out[93]: > array([[ True, True, False, False], > [ True, True, True, False], > [False, False, True, False], > [ True, False, False, False]], dtype=bool) > > > HTH, > Aronne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From aronne.merrelli at gmail.com Wed Feb 22 13:06:04 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 22 Feb 2012 12:06:04 -0600 Subject: [SciPy-User] Focal Majority In-Reply-To: <93C6338C-311B-49C3-BB21-97AC762A515D@gmail.com> References: <93C6338C-311B-49C3-BB21-97AC762A515D@gmail.com> Message-ID: On Wed, Feb 22, 2012 at 10:55 AM, Bjorn Nyberg wrote: > Thanks Aronne, > > I only had half an hour or so to play around with it but it certainty > looks promising. Im going to spend some more time on that over the weekend > when im free... especially to understand how the ranking is being > calculated. > It looks like it just sorts the elements within the window, and then pulls out the requested item, by the rank you specify. So, this is not terribly efficient since you will be sorting each window 10 times, while if you wrote it yourself you could sort each window once. But I'm guessing this would be faster than writing a pure NumPy/python loop, and the code is much simpler. 
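If the repeated sorting ever becomes an issue, the same majority test can also be written directly with scipy.ndimage.generic_filter, which visits each window once at the cost of a Python callback per window. A sketch, with the edge handling (the mode argument) left as a choice:

import numpy as np
from scipy import ndimage

def has_majority(window):
    # window is the flattened 3x3 neighbourhood; after sorting, at least
    # five equal values exist exactly when some element equals the one
    # four positions further along
    w = np.sort(window)
    return 1 if np.any(w[:-4] == w[4:]) else 0

z = np.array([[1, 1, 0, 8],
              [8, 1, 3, 1],
              [3, 1, 1, 2],
              [3, 1, 4, 5]])

zmask = ndimage.generic_filter(z, has_majority, size=3, mode='nearest')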
BTW I think there is a typo in what I sent before, it should be: for n in range(5): zmask = np.logical_or(zmask, rank_filter(z,n,3)==rank_filter(z,(n-5),3)) Because you need to check 5 pairs of the sorted elements - [0, 4], [1, 5], ... [4, 8], not 4 pairs as I wrote before. It might be clearer to write for n in range(5): zmask = np.logical_or(zmask, rank_filter(z,n,3)==rank_filter(z,n+4,3)) You'd probably want to derive those constants (5,n+4, etc) in terms of the window size and the number of elements that need to be equal. I'm sure if I did, the result I would be off by one, as I was before, so I will let you figure that part out =) > I have to ask though, as I am using this for GIS purposes is there an easy > way to convert the Bool format into a 0,1 integer - i.e. needed to convert > the array to raster format (arcpy.ArrayToRaster....). > > > zmask.astype() would change it to whatever you need: In [47]: array([True, False, True]) Out[47]: array([ True, False, True], dtype=bool) In [48]: array([True, False, True]).astype(int) Out[48]: array([1, 0, 1]) Obviously that will make the memory footprint a lot bigger - I guess try out whichever smallest integer works with arcpy? (I have never used package) Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.friedland at gmail.com Wed Feb 22 15:26:12 2012 From: greg.friedland at gmail.com (Greg Friedland) Date: Wed, 22 Feb 2012 12:26:12 -0800 Subject: [SciPy-User] Confidence interval for bounded minimization Message-ID: Hi, Is it possible to calculate asymptotic confidence intervals for any of the bounded minimization algorithms? As far as I can tell they don't return the Hessian; that's including the new 'minimize' function which seemed like it might. Thanks. From josef.pktd at gmail.com Wed Feb 22 15:48:37 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 22 Feb 2012 15:48:37 -0500 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland wrote: > Hi, > > Is it possible to calculate asymptotic confidence intervals for any of > the bounded minimization algorithms? As far as I can tell they don't > return the Hessian; that's including the new 'minimize' function which > seemed like it might. If the parameter ends up at the bounds, then the standard statistics doesn't apply. The Hessian is based on a local quadratic approximation, which doesn't work if part of the local neigborhood is out of bounds. There is some special statistics for this, but so far I have seen only the description how GAUSS handles it. In statsmodels we use in some cases the bounds, or a transformation, just to keep the optimizer in the required range, and we assume we get an interior solution. In this case, it is possible to use the standard calculations, the easiest is to use the local minimum that the constraint or transformed optimizer found and use it as starting value for an unconstrained optimization where we can get the Hessian (or just calculate the Hessian based on the original objective function). Josef > > Thanks. 
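A sketch of that recipe, using a made-up normal log-likelihood as a placeholder: fit with bounds, then take a central finite-difference Hessian of the negative log-likelihood at the (interior) optimum and invert it to get the asymptotic covariance:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def negloglike(params, y):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((y - mu) / sigma) ** 2) + y.size * np.log(sigma)

y = 5.0 + 2.0 * np.random.randn(200)
xopt, fval, info = fmin_l_bfgs_b(negloglike, [0.0, 0.0], args=(y,),
                                 approx_grad=True,
                                 bounds=[(None, None), (-10.0, 10.0)])

def fd_hessian(f, x, args=(), eps=1e-5):
    # central finite-difference Hessian; only meaningful at an interior optimum
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            xs = [np.array(x, float) for _ in range(4)]
            xs[0][i] += eps; xs[0][j] += eps
            xs[1][i] += eps; xs[1][j] -= eps
            xs[2][i] -= eps; xs[2][j] += eps
            xs[3][i] -= eps; xs[3][j] -= eps
            H[i, j] = (f(xs[0], *args) - f(xs[1], *args)
                       - f(xs[2], *args) + f(xs[3], *args)) / (4.0 * eps ** 2)
    return H

H = fd_hessian(negloglike, xopt, args=(y,))
cov = np.linalg.inv(H)
se = np.sqrt(np.diag(cov))
conf_int = np.column_stack([xopt - 1.96 * se, xopt + 1.96 * se])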
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From njs at pobox.com Wed Feb 22 17:02:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 22 Feb 2012 22:02:42 +0000 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: On Wed, Feb 22, 2012 at 8:48 PM, wrote: > On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland > wrote: >> Hi, >> >> Is it possible to calculate asymptotic confidence intervals for any of >> the bounded minimization algorithms? As far as I can tell they don't >> return the Hessian; that's including the new 'minimize' function which >> seemed like it might. > > If the parameter ends up at the bounds, then the standard statistics > doesn't apply. The Hessian is based on a local quadratic > approximation, which doesn't work if part of the local neigborhood is > out of bounds. > There is some special statistics for this, but so far I have seen only > the description how GAUSS handles it. > > In statsmodels we use in some cases the bounds, or a transformation, > just to keep the optimizer in the required range, and we assume we get > an interior solution. In this case, it is possible to use the standard > calculations, the easiest is to use the local minimum that the > constraint or transformed optimizer found and use it as starting value > for an unconstrained optimization where we can get the Hessian (or > just calculate the Hessian based on the original objective function). Some optimizers compute the Hessian internally. In those cases, it would be nice to have a way to ask them to somehow return that value instead of throwing it away. I haven't used Matlab in a while, but I remember running into this as a standard feature at some point, and it was quite nice. Especially when working with a problem where each computation of the Hessian requires an hour or so of computing time. -- Nathaniel From josef.pktd at gmail.com Wed Feb 22 21:06:45 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 22 Feb 2012 21:06:45 -0500 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: On Wed, Feb 22, 2012 at 5:02 PM, Nathaniel Smith wrote: > On Wed, Feb 22, 2012 at 8:48 PM, ? wrote: >> On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland >> wrote: >>> Hi, >>> >>> Is it possible to calculate asymptotic confidence intervals for any of >>> the bounded minimization algorithms? As far as I can tell they don't >>> return the Hessian; that's including the new 'minimize' function which >>> seemed like it might. >> >> If the parameter ends up at the bounds, then the standard statistics >> doesn't apply. The Hessian is based on a local quadratic >> approximation, which doesn't work if part of the local neigborhood is >> out of bounds. >> There is some special statistics for this, but so far I have seen only >> the description how GAUSS handles it. >> >> In statsmodels we use in some cases the bounds, or a transformation, >> just to keep the optimizer in the required range, and we assume we get >> an interior solution. In this case, it is possible to use the standard >> calculations, the easiest is to use the local minimum that the >> constraint or transformed optimizer found and use it as starting value >> for an unconstrained optimization where we can get the Hessian (or >> just calculate the Hessian based on the original objective function). 
> > Some optimizers compute the Hessian internally. In those cases, it > would be nice to have a way to ask them to somehow return that value > instead of throwing it away. I haven't used Matlab in a while, but I > remember running into this as a standard feature at some point, and it > was quite nice. Especially when working with a problem where each > computation of the Hessian requires an hour or so of computing time. If it takes an hour to compute the Hessian, then don't compute it :) My guess, without checking, is that very few optimizers calculate the full Hessian. But for those that do calculate an approximation of the Hessian, it might be useful to get them, There was the discussion once to get the Lagrange Multipliers out of some optimizers, but the person (?) who asked for it found out that the numbers are so bad that they are not usable. I think Skipper uses in statsmodels almost only analytical derivatives or our own finite difference Hessian/derivatives, where it would be possible to store the last results, but we don't do it (yet). Josef > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cjordan1 at uw.edu Thu Feb 23 00:09:57 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 22 Feb 2012 21:09:57 -0800 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: On Wed, Feb 22, 2012 at 2:02 PM, Nathaniel Smith wrote: > On Wed, Feb 22, 2012 at 8:48 PM, ? wrote: >> On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland >> wrote: >>> Hi, >>> >>> Is it possible to calculate asymptotic confidence intervals for any of >>> the bounded minimization algorithms? As far as I can tell they don't >>> return the Hessian; that's including the new 'minimize' function which >>> seemed like it might. >> >> If the parameter ends up at the bounds, then the standard statistics >> doesn't apply. The Hessian is based on a local quadratic >> approximation, which doesn't work if part of the local neigborhood is >> out of bounds. >> There is some special statistics for this, but so far I have seen only >> the description how GAUSS handles it. >> >> In statsmodels we use in some cases the bounds, or a transformation, >> just to keep the optimizer in the required range, and we assume we get >> an interior solution. In this case, it is possible to use the standard >> calculations, the easiest is to use the local minimum that the >> constraint or transformed optimizer found and use it as starting value >> for an unconstrained optimization where we can get the Hessian (or >> just calculate the Hessian based on the original objective function). > > Some optimizers compute the Hessian internally. In those cases, it > would be nice to have a way to ask them to somehow return that value > instead of throwing it away. I haven't used Matlab in a while, but I > remember running into this as a standard feature at some point, and it > was quite nice. Especially when working with a problem where each > computation of the Hessian requires an hour or so of computing time. > Are you talking about analytic or finite-difference gradients and hessians? I'd assumed that anything derived from finite difference estimations wouldn't give particularly good confidence intervals, but I've never needed them so I've never looked into it in detail. 
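For first derivatives there is also the complex-step trick (the "complex derivatives" mentioned in this thread), which is essentially exact whenever the objective accepts complex input, because there is no subtractive cancellation. A small sketch on an arbitrary test function:

import numpy as np

def forward_diff(f, x, i, h=1e-6):
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x)) / h

def complex_step(f, x, i, h=1e-20):
    xc = np.asarray(x, dtype=complex).copy()
    xc[i] += 1j * h
    return f(xc).imag / h

f = lambda p: np.exp(p[0]) * np.sin(p[1]) + p[0] ** 2
x = np.array([0.3, 1.2])
print(forward_diff(f, x, 0), complex_step(f, x, 0))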
-Chris > -- Nathaniel > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Thu Feb 23 01:10:02 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 23 Feb 2012 01:10:02 -0500 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 12:09 AM, Christopher Jordan-Squire wrote: > On Wed, Feb 22, 2012 at 2:02 PM, Nathaniel Smith wrote: >> On Wed, Feb 22, 2012 at 8:48 PM, ? wrote: >>> On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland >>> wrote: >>>> Hi, >>>> >>>> Is it possible to calculate asymptotic confidence intervals for any of >>>> the bounded minimization algorithms? As far as I can tell they don't >>>> return the Hessian; that's including the new 'minimize' function which >>>> seemed like it might. >>> >>> If the parameter ends up at the bounds, then the standard statistics >>> doesn't apply. The Hessian is based on a local quadratic >>> approximation, which doesn't work if part of the local neigborhood is >>> out of bounds. >>> There is some special statistics for this, but so far I have seen only >>> the description how GAUSS handles it. >>> >>> In statsmodels we use in some cases the bounds, or a transformation, >>> just to keep the optimizer in the required range, and we assume we get >>> an interior solution. In this case, it is possible to use the standard >>> calculations, the easiest is to use the local minimum that the >>> constraint or transformed optimizer found and use it as starting value >>> for an unconstrained optimization where we can get the Hessian (or >>> just calculate the Hessian based on the original objective function). >> >> Some optimizers compute the Hessian internally. In those cases, it >> would be nice to have a way to ask them to somehow return that value >> instead of throwing it away. I haven't used Matlab in a while, but I >> remember running into this as a standard feature at some point, and it >> was quite nice. Especially when working with a problem where each >> computation of the Hessian requires an hour or so of computing time. >> > > Are you talking about analytic or finite-difference gradients and > hessians? I'd assumed that anything derived from finite difference > estimations wouldn't give particularly good confidence intervals, but > I've never needed them so I've never looked into it in detail. statsmodels has both, all discrete models for example have analytical gradients and hessians. But for models with a complicated log-likelihood function, there isn't much choice, second derivatives with centered finite differences are ok, scipy.optimize.leastsq is not very good. statsmodels also has complex derivatives which are numerically pretty good but they cannot always be used. I think in most cases numerical derivatives will have a precision of a few decimals, which is more precise than all the other statistical assumptions, normality, law of large numbers, local definition of covariance matrix to calculate "large" confidence intervals, and so on. One problem is that choosing the step size depends on the data and model. numdifftools has adaptive calculations for the derivatives, but we are not using it anymore. Also, if the model is not well specified, then the lower precision of finite difference derivatives can hurt. 
For example, in ARMA models I had problems when there are too many lags specified, so that some roots should almost cancel. Skipper's implementation works better because he used a reparameterization that forces some nicer behavior. The only case in the econometrics literature that I know is that early GARCH models were criticized for using numerical derivatives even though analytical derivatives were available, some parameters were not well estimated, although different estimates produced essentially the same predictions (parameters are barely identified) Last defense: everyone else does it, maybe a few models more or less, and if the same statistical method is used, then the results usually agree pretty well. (But if different methods are used, for example initial conditions are treated differently in time series analysis, then the differences are usually much larger. Something like: I don't worry about numerical problems at the 5th or 6th decimal if I cannot figure out what these guys are doing with their first and second decimal.) (maybe more than anyone wants to know.) Josef . > > -Chris > > >> -- Nathaniel >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Thu Feb 23 01:23:44 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 23 Feb 2012 01:23:44 -0500 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 1:10 AM, wrote: > On Thu, Feb 23, 2012 at 12:09 AM, Christopher Jordan-Squire > wrote: >> On Wed, Feb 22, 2012 at 2:02 PM, Nathaniel Smith wrote: >>> On Wed, Feb 22, 2012 at 8:48 PM, ? wrote: >>>> On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland >>>> wrote: >>>>> Hi, >>>>> >>>>> Is it possible to calculate asymptotic confidence intervals for any of >>>>> the bounded minimization algorithms? As far as I can tell they don't >>>>> return the Hessian; that's including the new 'minimize' function which >>>>> seemed like it might. >>>> >>>> If the parameter ends up at the bounds, then the standard statistics >>>> doesn't apply. The Hessian is based on a local quadratic >>>> approximation, which doesn't work if part of the local neigborhood is >>>> out of bounds. >>>> There is some special statistics for this, but so far I have seen only >>>> the description how GAUSS handles it. >>>> >>>> In statsmodels we use in some cases the bounds, or a transformation, >>>> just to keep the optimizer in the required range, and we assume we get >>>> an interior solution. In this case, it is possible to use the standard >>>> calculations, the easiest is to use the local minimum that the >>>> constraint or transformed optimizer found and use it as starting value >>>> for an unconstrained optimization where we can get the Hessian (or >>>> just calculate the Hessian based on the original objective function). >>> >>> Some optimizers compute the Hessian internally. In those cases, it >>> would be nice to have a way to ask them to somehow return that value >>> instead of throwing it away. I haven't used Matlab in a while, but I >>> remember running into this as a standard feature at some point, and it >>> was quite nice. 
Especially when working with a problem where each >>> computation of the Hessian requires an hour or so of computing time. >>> >> >> Are you talking about analytic or finite-difference gradients and >> hessians? I'd assumed that anything derived from finite difference >> estimations wouldn't give particularly good confidence intervals, but >> I've never needed them so I've never looked into it in detail. > > statsmodels has both, all discrete models for example have analytical > gradients and hessians. > > But for models with a complicated log-likelihood function, there isn't > much choice, second derivatives with centered finite differences are > ok, scipy.optimize.leastsq is not very good. statsmodels also has > complex derivatives which are numerically pretty good but they cannot > always be used. > > I think in most cases numerical derivatives will have a precision of a > few decimals, which is more precise than all the other statistical > assumptions, normality, law of large numbers, local definition of > covariance matrix to calculate "large" confidence intervals, and so > on. > > One problem is that choosing the step size depends on the data and > model. numdifftools has adaptive calculations for the derivatives, but > we are not using it anymore. > > Also, if the model is not well specified, then the lower precision of > finite difference derivatives can hurt. For example, in ARMA models I > had problems when there are too many lags specified, so that some > roots should almost cancel. Skipper's implementation works better > because he used a reparameterization that forces some nicer behavior. > > The only case in the econometrics literature that I know is that early > GARCH models were criticized for using numerical derivatives even > though analytical derivatives were available, some parameters were not > well estimated, although different estimates produced essentially the > same predictions (parameters are barely identified) > > Last defense: everyone else does it, maybe a few models more or less, > and if the same statistical method is used, then the results usually > agree pretty well. > (But if different methods are used, for example initial conditions are > treated differently in time series analysis, then the differences are > usually much larger. Something like: I don't worry about numerical > problems at the 5th or 6th decimal if I cannot figure out what these > guys are doing with their first and second decimal.) > > (maybe more than anyone wants to know.) In case it wasn't clear: analytical derivatives are of course much better, and I would be glad if the scipy.stats.distributions or sympy had the formulas for the derivatives of the log-likelihood functions for the main distributions. (but it's work) Josef > > Josef > . > >> >> -Chris >> >> >>> -- Nathaniel >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From smokefloat at gmail.com Thu Feb 23 06:04:21 2012 From: smokefloat at gmail.com (David Hutto) Date: Thu, 23 Feb 2012 06:04:21 -0500 Subject: [SciPy-User] parsing a wave file Message-ID: Hi, I'm using scypy 0.10.1RC2 amd 64 with python 2.7. 
I'm attempting to parse a wav file to access the data chunks to show the values for use in an oscilloscope(to know the intended usage, and maybe a better way to go about the solution). The following code: . #################### f = Sndfile(r'c:\Users\david\test01.wav', 'r') fs = f.samplerate nc = f.channels enc = f.encoding data = f.read_frames(1000) frame_amount = 1000 data_float = f.read_frames(frame_amount, dtype=np.float32) for i in range(0,frame_amount,1): print data_float[i] ############## returns data_float[i] in the form: [-1,0.990988] [.08545,-0.009988] etc. My question is, is this the portion of data I'm parsing for(I'm almost positive it's not), or is there another data chunk? The graphing of the given data displays nothing close to the supplied current being recorded in the wav file. Any suggestions as to what I'm parsing in the wrong way, or better solutions than the above? Thanks, David From warren.weckesser at enthought.com Thu Feb 23 08:26:14 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 23 Feb 2012 07:26:14 -0600 Subject: [SciPy-User] parsing a wave file In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 5:04 AM, David Hutto wrote: > Hi, > I'm using scypy 0.10.1RC2 amd 64 with python 2.7. > I'm attempting to parse a wav file to access the data chunks to show > the values for use > in an oscilloscope(to know the intended usage, and maybe a better way > to go about the solution). > > The following code: > . > > #################### > f = Sndfile(r'c:\Users\david\test01.wav', 'r') > fs = f.samplerate > nc = f.channels > enc = f.encoding > data = f.read_frames(1000) > frame_amount = 1000 > data_float = f.read_frames(frame_amount, dtype=np.float32) > for i in range(0,frame_amount,1): > print data_float[i] > ############## > returns data_float[i] in the form: > [-1,0.990988] > [.08545,-0.009988] > etc. > > My question is, is this the portion of data I'm parsing for(I'm almost > positive it's not), or is there another data chunk? The graphing of > the given data displays nothing close to the supplied current being > recorded in the wav file. > > Any suggestions as to what I'm parsing in the wrong way, or better > solutions than the above? > > > Can you use the wavfile module in scipy.io? E.g. >>> from scipy.io import wavfile >>> fs, data = wavfile.read(r'c:\Users\david\test01.wav') Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Thu Feb 23 09:08:04 2012 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Thu, 23 Feb 2012 15:08:04 +0100 Subject: [SciPy-User] parsing a wave file In-Reply-To: References: Message-ID: Am 23.02.2012 um 12:04 schrieb David Hutto : > #################### > f = Sndfile(r'c:\Users\david\test01.wav', 'r') Sorry for the probably dumb question, but where does "Sndfile" originate from? A quick search in the Python 2.6.5 docs and in the online scipy docs yields nothing. > fs = f.samplerate > nc = f.channels > enc = f.encoding > data = f.read_frames(1000) > frame_amount = 1000 > data_float = f.read_frames(frame_amount, dtype=np.float32) > for i in range(0,frame_amount,1): > print data_float[i] > ############## > returns data_float[i] in the form: > [-1,0.990988] > [.08545,-0.009988] It looks to me as if the "Sndfile" instance already decomposes into channels. 
In the wave file, the channels are intermingled, one frame per channel for each time instance (I'm not fully sure if "frame" denotes the full chunk for one time instance or only one datum for one channel, part of the chunk). > Any suggestions as to what I'm parsing in the wrong way, or better > solutions than the above? I could imagine that the Sndfile class you're using takes the sampling width (i.e., the number of bytes per sample) from the dtype you're handing over, interpreting the originally possible integer-valued frames as float32's. This could also alter the alignment, so that full garbage would result, possibly explaining the apparently highly variable output (as far as I can judge from the two chunks you provided). But notice, this is only guessing, speculation so to speak. I've made positive experience with using the wave module from the standard library. I wrote a module (part of a larger sound analysis suite) to read arbitrary wave files as long as they are integer-valued. The module reads the file into a numpy array with sufficient speed (a few seconds for a 3' piece), and separates the channels. I guess it could be easily adapted to your use case (currently the channels are averaged in the end; you would just have to leave that step out). I would give it to you as a public domain module. For the scipy.io.wavfile module, that should work too, although I don't know if it already separates the channels or not?I never used it so far. Friedrich From kevin.gullikson at gmail.com Wed Feb 22 12:07:21 2012 From: kevin.gullikson at gmail.com (Kevin Gullikson) Date: Wed, 22 Feb 2012 11:07:21 -0600 Subject: [SciPy-User] Focal Majority In-Reply-To: <93C6338C-311B-49C3-BB21-97AC762A515D@gmail.com> References: <93C6338C-311B-49C3-BB21-97AC762A515D@gmail.com> Message-ID: I'm not positive it is the best way, but you can just cast your array as an integer: >>> int(True) 1 >>> int(False) 0 Kevin Gullikson On Wed, Feb 22, 2012 at 10:55 AM, Bjorn Nyberg wrote: > Thanks Aronne, > > I only had half an hour or so to play around with it but it certainty > looks promising. Im going to spend some more time on that over the weekend > when im free... especially to understand how the ranking is being > calculated. > > I have to ask though, as I am using this for GIS purposes is there an easy > way to convert the Bool format into a 0,1 integer - i.e. needed to convert > the array to raster format (arcpy.ArrayToRaster....). > > Regards, > Nyberg > > On Feb 21, 2012, at 19:58 PM, Aronne Merrelli wrote: > > > > > > > > > I don't think a convolution would work. A convolution is really just a > weighted sum, so I can't see a way to mimic a sort or conditional that way. > > > > But, I think you can do this with scipy.ndimage.rank_filter. If you want > 5 cells with the same value, it should be equivalent to checking if the > first and fifth ranked elements are the same (or second and sixth, etc...). > So a loop through the window size, combining rank_filter calls, should do > this. Definitely double check me on this - I'm not 100% sure it is doing > the the correct thing, and it probably isn't doing what you want at the > edges. If this is not fast enough, then I would consider writing a "brute > force" loop in Cython to make it fast. 
> > > > In [90]: z > > Out[90]: > > array([[1, 1, 0, 8], > > [8, 1, 3, 1], > > [3, 1, 1, 2], > > [3, 1, 4, 5]]) > > > > In [91]: zmask = np.zeros(z.shape, bool) > > > > In [92]: for n in range(4): > > zmask = np.logical_or(zmask, > rank_filter(z,n,3)==rank_filter(z,(n-5),3)) > > > > In [93]: zmask > > Out[93]: > > array([[ True, True, False, False], > > [ True, True, True, False], > > [False, False, True, False], > > [ True, False, False, False]], dtype=bool) > > > > > > HTH, > > Aronne > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaakko.luttinen at aalto.fi Thu Feb 23 09:49:01 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 23 Feb 2012 16:49:01 +0200 Subject: [SciPy-User] Mixing arrays, matrices and sparse matrices Message-ID: <4F4651DD.6040402@aalto.fi> Hi! I am trying to work without the matrix class in order to avoid problems, but it is difficult because mixing arrays and sparse matrices results in matrices. I believe that NumPy/SciPy works incorrectly when using arrays, matrices and sparse matrices mixed. I thought that results would be arrays rather than matrices because arrays are more general and the result may require more than two dimensions. For instance, I think all the following operations should return arrays: array.dot(array) = array matrix.dot(array) = matrix sparse.dot(array) = array array + matrix = matrix array + sparse = matrix On the other hand, sparse.multiply(array) should return sparse (because the element-wise product makes the result sparse), but there are no N-dimensional sparse arrays.. At the moment, sparse.multiply(array) returns a dense matrix, which is not good in my opinion - either return an array or a sparse matrix. I think it would be easiest if dense results were always given as arrays when there are arrays involved. Can anyone help? Should I create a ticket? Regards, Jaakko From lamblinp at iro.umontreal.ca Thu Feb 23 17:41:13 2012 From: lamblinp at iro.umontreal.ca (Pascal Lamblin) Date: Thu, 23 Feb 2012 23:41:13 +0100 Subject: [SciPy-User] Announcing Theano 0.5 Message-ID: <20120223224113.GA26872@bob.blip.be> =========================== Announcing Theano 0.5 =========================== This is a major version, with lots of new features, bug fixes, and some interface changes (deprecated or potentially misleading features were removed). Upgrading to Theano 0.5 is recommended for everyone, but you should first make sure that your code does not raise deprecation warnings with Theano 0.4.1. Otherwise, in one case the results can change. In other cases, the warnings are turned into errors (see below for details). For those using the bleeding edge version in the git repository, we encourage you to update to the `0.5` tag. If you have updated to 0.5rc1 or 0.5rc2, you are highly encouraged to update to 0.5, as some bugs introduced in those versions have now been fixed, see items marked with '#' in the lists below. 
What's New ---------- Highlight: * Moved to github: http://github.com/Theano/Theano/ * Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets * Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people) * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm and dot(vector, vector). (James, Fr?d?ric, Pascal) * C implementation of Alloc. (James, Pascal) * theano.grad() now also work with sparse variable. (Arnaud) * Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan) * See the Interface changes. Interface Behavior Changes: * The current default value of the parameter axis of theano.{max,min,argmax,argmin,max_and_argmax} is now the same as numpy: None. i.e. operate on all dimensions of the tensor. (Fr?d?ric Bastien, Olivier Delalleau) (was deprecated and generated a warning since Theano 0.3 released Nov. 23rd, 2010) * The current output dtype of sum with input dtype [u]int* is now always [u]int64. You can specify the output dtype with a new dtype parameter to sum. The output dtype is the one using for the summation. There is no warning in previous Theano version about this. The consequence is that the sum is done in a dtype with more precision than before. So the sum could be slower, but will be more resistent to overflow. This new behavior is the same as numpy. (Olivier, Pascal) # When using a GPU, detect faulty nvidia drivers. This was detected when running Theano tests. Now this is always tested. Faulty drivers results in in wrong results for reduce operations. (Frederic B.) Interface Features Removed (most were deprecated): * The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They were accepted only by theano.function(). Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead. * tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt (list/tuple/TensorVariable). (Ian Goodfellow, Olivier) * A few tag.shape and Join.vec_length left have been removed. (Frederic) * The .value attribute of shared variables is removed, use shared.set_value() or shared.get_value() instead. (Frederic) * Theano config option "home" is not used anymore as it was redundant with "base_compiledir". If you use it, Theano will now raise an error. (Olivier D.) * scan interface changes: (Razvan Pascanu) * The use of `return_steps` for specifying how many entries of the output to return has been removed. Instead, apply a subtensor to the output returned by scan to select a certain slice. * The inner function (that scan receives) should return its outputs and updates following this order: [outputs], [updates], [condition]. One can skip any of the three if not used, but the order has to stay unchanged. Interface bug fix: * Rop in some case should have returned a list of one Theano variable, but returned the variable itself. (Razvan) New deprecation (will be removed in Theano 0.6, warning generated if you use them): * tensor.shared() renamed to tensor._shared(). You probably want to call theano.shared() instead! (Olivier D.) Bug fixes (incorrect results): * On CPU, if the convolution had received explicit shape information, they where not checked at runtime. This caused wrong result if the input shape was not the one expected. (Frederic, reported by Sander Dieleman) * Theoretical bug: in some case we could have GPUSum return bad value. 
We were not able to reproduce this problem * patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim): 01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic) * div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James) * theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value. The grad is now disabled and returns an error. (Frederic) * An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)" and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your code was affected by this bug. (Olivier, reported by Sander Dieleman) * When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]), an optimization replacing it with a direct indexing (x[d]) used an incorrect formula, leading to incorrect results. (Pascal, reported by Razvan) * The tile() function is now stricter in what it accepts to allow for better error-checking/avoiding nonsensical situations. The gradient has been disabled for the time being as it only implemented (incorrectly) one special case. The `reps` argument must be a constant (not a tensor variable), and must have the same length as the number of dimensions in the `x` argument; this is now checked. (David) # Fix a bug with Gemv and Ger on CPU, when used on vectors with negative strides. Data was read from incorrect (and possibly uninitialized) memory space. This bug was probably introduced in 0.5rc1. (Pascal L.) # The Theano flag "nvcc.flags" is now included in the hard part of the key. This mean that now we recompile all modules for each value of "nvcc.flags". A change in "nvcc.flags" used to be ignored for module that were already compiled. (Frederic B.) Scan fixes: * computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan) before : most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug) now : do the right thing. * gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan) before : it used to return wrong values now : do the right thing. Note: The reported case of this bug was happening in conjunction with the save optimization of scan that give run time errors. So if you didn't manually disable the same memory optimization (number in the list4), you are fine if you didn't manually request multiple taps. * Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan) before : compilation error when computing R-op now : do the right thing. * save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan) before : for certain corner cases used to result in a runtime shape error now : do the right thing. * Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes) * Scan.infer_shape now works correctly when working with a condition for the number of loops. In the past, it returned n_steps as the length, which is not always true. (Razvan) * Scan.infer_shape crash fix. 
(Razvan) New features: * AdvancedIncSubtensor grad defined and tested (Justin Bayer) * Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra) * tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic) * Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian) * theano-cache list: list the content of the theano cache (Frederic) * theano-cache unlock: remove the Theano lock (Olivier) * tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic) * MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic) * used by tensor.{max,min,max_and_argmax} * tensor.{all,any} (Razvan) * tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley) * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) * IfElse now allows to have a list/tuple as the result of the if/else branches. * They must have the same length and corresponding type (Razvan) * Argmax output dtype is now int64 instead of int32. (Olivier) * Added the element-wise operation arccos. (Ian) * Added sparse dot with dense grad output. (Yann Dauphin) * Optimized to Usmm and UsmmCscDense in some case (Yann) * Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs. The new theano.sparse.dot() has a dense gradient for all inputs. * GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic) * TensorVariable.zeros_like() and SparseVariable.zeros_like() * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic) * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic) * Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder) * We also support the "theano_version" substitution. * IntDiv c code (faster and allow this elemwise to be fused with other elemwise) (Pascal) * Internal filter_variable mechanism in Type. (Pascal, Ian) * Ifelse works on sparse. * It makes use of gpu shared variable more transparent with theano.function updates and givens parameter. * Added a_tensor.transpose(axes) axes is optional (James) * theano.tensor.transpose(a_tensor, kwargs) We where ignoring kwargs, now it is used as the axes. * a_CudaNdarray_object[*] = int, now works (Frederic) * tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier) * sparse_variable.size (as scipy) computes the number of stored values. (Olivier) * sparse_variable[N, N] now works (Li Yao, Frederic) * sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal) M, N, O, and P can be Python int or scalar tensor variables, None, or omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work). * tensor.tensordot can now be moved to GPU (Sander Dieleman, Pascal, based on code from Tijmen Tieleman's gnumpy, http://www.cs.toronto.edu/~tijmen/gnumpy.html) # Many infer_shape implemented on sparse matrices op. (David W.F.) # Added theano.sparse.verify_grad_sparse to easily allow testing grad of sparse op. It support testing the full and structured gradient. # The keys in our cache now store the hash of constants and not the constant values themselves. This is significantly more efficient for big constant arrays. (Frederic B.) # 'theano-cache list' lists key files bigger than 1M (Frederic B.) # 'theano-cache list' prints an histogram of the number of keys per compiled module (Frederic B.) 
# 'theano-cache list' prints the number of compiled modules per op class (Frederic B.) # The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file. # Add the header_dirs to the hard part of the compilation key. This is currently used only by cuda, but if we use library that are only headers, this can be useful. (Frederic B.) # Alloc, GpuAlloc are not always pre-computed (constant_folding optimization) at compile time if all their inputs are constant. (Frederic B., Pascal L., reported by Sander Dieleman) # New Op tensor.sort(), wrapping numpy.sort (Hani Almousli) New optimizations: * AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic) * dot22, dot22scalar work with complex. (Frederic) * Generate Gemv/Gemm more often. (James) * Remove scan when all computations can be moved outside the loop. (Razvan) * scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan) * exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier) * Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume) * Made the optimization process faster. (James) * Allow fusion of elemwise when the scalar op needs support code. (James) * Better opt that lifts transpose around dot. (James) Crashes fixed: * T.mean crash at graph building time. (Ian) * "Interactive debugger" crash fix. (Ian, Frederic) * Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin) * Optimization crash with gemm and complex. (Frederic) * GPU crash with elemwise. (Frederic, some reported by Chris Currivan) * Compilation crash with amdlibm and the GPU. (Frederic) * IfElse crash. (Frederic) * Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal) * GPU compilation crash on MacOS X. (Olivier) * Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier) * When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic) * Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle) * Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic) * Fix dot22scalar cast of integer scalars (Justin Bayer, Fr?d?ric, Olivier) * Fix runtime crash in gemm, dot22. FB * Fix on 32bits computer: make sure all shape are int64.(Olivier) * Fix to deque on python 2.4 (Olivier) * Fix crash when not using c code (or using DebugMode) (not used by default) with numpy 1.6*. Numpy has a bug in the reduction code that made it crash. (Pascal) * Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU) when matrices had non-unit stride in both dimensions (CPU and GPU), or when matrices had negative strides (GPU only). In those cases, we are now making copies. (Pascal) # More cases supported in AdvancedIncSubtensor1. (Olivier D.) # Fix crash when a broadcasted constant was used as input of an elemwise Op and needed to be upcasted to match the op's output. (Reported by John Salvatier, fixed by Pascal L.) # Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.) Known bugs: * CAReduce with nan in inputs don't return the good output (`Ticket `_). * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements. Sandbox: * cvm interface more consistent with current linker. (James) * Now all tests pass with the linker=cvm flags. * vm linker has a callback parameter. 
(James) * review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier) * review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume) * review/finish/doc: MatrixInverse, matrix_inverse. (Razvan) * review/finish/doc: matrix_dot. (Razvan) * review/finish/doc: det (determinent) op. (Philippe Hamel) * review/finish/doc: Cholesky determinent op. (David) * review/finish/doc: ensure_sorted_indices. (Li Yao) * review/finish/doc: spectral_radius_boud. (Xavier Glorot) * review/finish/doc: sparse sum. (Valentin Bisson) * review/finish/doc: Remove0 (Valentin) * review/finish/doc: SquareDiagonal (Eric) Sandbox New features (not enabled by default): * CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James) * New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan) Documentation: * Many updates. (Many people) * Updates to install doc on MacOS. (Olivier) * Updates to install doc on Windows. (David, Olivier) * Doc on the Rop function (Ian) * Added how to use scan to loop with a condition as the number of iteration. (Razvan) * Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic) * Refactored GPU installation of Theano. (Olivier) Others: * Better error messages in many places. (Many people) * PEP8 fixes. (Many people) * Add a warning about numpy bug when using advanced indexing on a tensor with more than 2**32 elements (the resulting array is not correctly filled and ends with zeros). (Pascal, reported by David WF) * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan) * New min_informative_str() function to print graph. (Ian) * Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier) * Better support for utf string. (David) * Fix pydotprint with a function compiled with a ProfileMode (Frederic) * Was broken with change to the profiler. * Warning when people have old cache entries. (Olivier) * More tests for join on the GPU and CPU. (Frederic) * Do not request to load the GPU module by default in scan module. (Razvan) * Fixed some import problems. (Frederic and others) * Filtering update. (James) * On Windows, the default compiledir changed to be local to the computer/user and not transferred with roaming profile. (Sebastian Urban) * New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior): it prints a warning when an error occurs when inferring the shape of some apply node. The other accepted value is "raise" to raise an error when this happens. (Frederic) * The buidbot now raises optimization/shape errors instead of just printing a warning. (Frederic) * better pycuda tests (Frederic) * check_blas.py now accept the shape and the number of iteration as parameter (Frederic) * Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic) * More internal verification on what each op.infer_shape return. (Frederic, James) * Argmax dtype to int64 (Olivier) * Improved docstring and basic tests for the Tile Op (David). Reviewers (alphabetical order): * David, Frederic, Ian, James, Olivier, Razvan Download and Install -------------------- You can download Theano from http://pypi.python.org/pypi/Theano Installation instructions are available at http://deeplearning.net/software/theano/install.html Description ----------- Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. 
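A minimal sketch of the basic workflow (define a symbolic expression, compile it, evaluate it and its gradient), assuming a plain CPU installation; the expression here is just an arbitrary example:

import theano
import theano.tensor as T

x = T.dvector('x')                      # symbolic input vector
y = T.sum(1.0 / (1.0 + T.exp(-x)))      # symbolic expression: sum of sigmoids
gy = T.grad(y, x)                       # symbolic gradient of y w.r.t. x

f = theano.function([x], [y, gy])       # compile to optimized C (or GPU) code

value, grad = f([0.0, 1.0, -2.0])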
Theano features: * tight integration with NumPy: a similar interface to NumPy's. numpy.ndarrays are also used internally in Theano-compiled functions. * transparent use of a GPU: perform data-intensive computations up to 140x faster than on a CPU (support for float32 only). * efficient symbolic differentiation: Theano can compute derivatives for functions of one or many inputs. * speed and stability optimizations: avoid nasty bugs when computing expressions such as log(1+ exp(x)) for large values of x. * dynamic C code generation: evaluate expressions faster. * extensive unit-testing and self-verification: includes tools for detecting and diagnosing bugs and/or potential problems. Theano has been powering large-scale computationally intensive scientific research since 2007, but it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). Resources --------- About Theano: http://deeplearning.net/software/theano/ Theano-related projects: http://github.com/Theano/Theano/wiki/Related-projects About NumPy: http://numpy.scipy.org/ About SciPy: http://www.scipy.org/ Machine Learning Tutorial with Theano on Deep Architectures: http://deeplearning.net/tutorial/ Acknowledgments --------------- I would like to thank all contributors of Theano. For this particular release, many people have helped, notably (in alphabetical order): Hani Almousli, Fr?d?ric Bastien, Justin Bayer, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Yann Dauphin, Olivier Delalleau, Guillaume Desjardins, Sander Dieleman, Xavier Glorot, Ian Goodfellow, Philippe Hamel, Pascal Lamblin, Eric Laufer, Gr?goire Mesnil, Razvan Pascanu, Matthew Rocklin, Graham Taylor, Sebastian Urban, David Warde-Farley, and Yao Li. I would also like to thank users who submitted bug reports, notably: Nicolas Boulanger-Lewandowski, Olivier Chapelle, Michael Forbes, Timothy Lillicrap, and John Salvatier. Also, thank you to all NumPy and Scipy developers as Theano builds on their strengths. -- Pascal From smokefloat at gmail.com Fri Feb 24 06:26:02 2012 From: smokefloat at gmail.com (David Hutto) Date: Fri, 24 Feb 2012 06:26:02 -0500 Subject: [SciPy-User] parsing a wave file In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 9:08 AM, Friedrich Romstedt wrote: > Am 23.02.2012 um 12:04 schrieb David Hutto : >> #################### >> ? ? ? ? ? ?f = Sndfile(r'c:\Users\david\test01.wav', 'r') > > Sorry for the probably dumb question, but where does "Sndfile" originate from? An online example. ?A quick search in the Python 2.6.5 docs I'm using 2.7.2 and in the online scipy docs yields nothing. It was a .dll to access functions from I believe. > >> ? ? ? ?fs = f.samplerate >> ? ? ? ?nc = f.channels >> ? ? ? ?enc = f.encoding >> ? ? ? ?data = f.read_frames(1000) >> ? ? ? ?frame_amount = 1000 >> ? ? ? ?data_float = f.read_frames(frame_amount, dtype=np.float32) >> ? ? ? ?for i in range(0,frame_amount,1): >> ? ? ? ? ? ?print data_float[i] >> ? ?############## >> returns data_float[i] in the form: >> [-1,0.990988] >> [.08545,-0.009988] > > It looks to me as if the "Sndfile" instance already decomposes into channels. ?In the wave file, the channels are intermingled, one frame per channel for each time instance (I'm not fully sure if "frame" denotes the full chunk for one time instance or only one datum for one channel, part of the chunk). > >> Any suggestions as to what I'm parsing in the wrong way, or better >> solutions than the above? 
> > I could imagine that the Sndfile class you're using takes the sampling width (i.e., the number of bytes per sample) from the dtype you're handing over, interpreting the originally possible integer-valued frames as float32's. This could also alter the alignment, so that full garbage would result, possibly explaining the apparently highly variable output (as far as I can judge from the two chunks you provided). ?But notice, this is only guessing, speculation so to speak. > > I've made positive experience with using the wave module from the standard library. I wrote a module (part of a larger sound analysis suite) to I'd like to see that, or any quicker ways there might be of having the values of the device immediately, which I'm getting around to now, instead of having to wait for the wav file. read arbitrary wave files as long as they are integer-valued. The module reads the file into a numpy array with sufficient speed (a few seconds for a 3' piece), and separates the channels. I guess it could be easily adapted to your use case (currently the channels are averaged in the end; you would just have to leave that step out). I would give it to you as a public domain module. > > For the scipy.io.wavfile module, that should work too, although I don't know if it already separates the channels or not?I never used it so far. I changed it to import scipy.io.wavfile as wv for array_val in wv.read(r'c:\Users\david\test01.wav')[1]: print array_val[0],array_val[1] > > Friedrich > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From smokefloat at gmail.com Fri Feb 24 06:28:13 2012 From: smokefloat at gmail.com (David Hutto) Date: Fri, 24 Feb 2012 06:28:13 -0500 Subject: [SciPy-User] parsing a wave file In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 8:26 AM, Warren Weckesser wrote: > > > On Thu, Feb 23, 2012 at 5:04 AM, David Hutto wrote: >> >> Hi, >> I'm using scypy 0.10.1RC2 amd 64 with python 2.7. >> I'm attempting to parse a wav file to access the data chunks to show >> the values for use >> in an oscilloscope(to know the intended usage, and maybe ?a better way >> to go about the solution). >> >> The following code: >> . >> >> #################### >> ? ? ? ? ? ? ? ?f = Sndfile(r'c:\Users\david\test01.wav', 'r') >> ? ? ? ? ? ? ? ?fs = f.samplerate >> ? ? ? ? ? ? ? ?nc = f.channels >> ? ? ? ? ? ? ? ?enc = f.encoding >> ? ? ? ? ? ? ? ?data = f.read_frames(1000) >> ? ? ? ? ? ? ? ?frame_amount = 1000 >> ? ? ? ? ? ? ? ?data_float = f.read_frames(frame_amount, dtype=np.float32) >> ? ? ? ? ? ? ? ?for i in range(0,frame_amount,1): >> ? ? ? ? ? ? ? ? ? ? ? ?print data_float[i] >> ? ? ? ?############## >> returns data_float[i] in the form: >> [-1,0.990988] >> [.08545,-0.009988] >> etc. >> >> My question is, is this the portion of data I'm parsing for(I'm almost >> positive it's not), or is there another data chunk? The graphing of >> the given data displays nothing close to the supplied current being >> recorded in the wav file. >> >> Any suggestions as to what I'm parsing in the wrong way, or better >> solutions than the above? >> >> > > > Can you use the? wavfile module in scipy.io? Yes, thanks for pointing that out. E.g. 
> >>>> from scipy.io import wavfile >>>> fs, data = wavfile.read(r'c:\Users\david\test01.wav') > > > Warren > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.witherick at ucl.ac.uk Fri Feb 24 06:48:46 2012 From: d.witherick at ucl.ac.uk (Dugan Witherick) Date: Fri, 24 Feb 2012 11:48:46 +0000 Subject: [SciPy-User] Scipy test failure when building on Scientific Linux 6.0 Message-ID: I'm trying to build numpy (1.6.1) and scipy (0.10.1rc2) on Scientific Linux 6.0. I've successfully managed to build both packages from source using python setup.py config_fc --fcompiler=gnu95 install but while numpy passes its tests, scipy doesn't: >>> scipy.test() Running unit tests for scipy NumPy version 1.6.1 NumPy is installed in /usr/lib64/python2.6/site-packages/numpy SciPy version 0.10.1rc2 SciPy is installed in /usr/lib64/python2.6/site-packages/scipy Python version 2.6.6 (r266:84292, May 20 2011, 16:42:11) [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] nose version 0.10.4 ---SKIPPED---- ====================================================================== ERROR: test_qhull.TestTriangulation.test_pathological ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/lib64/python2.6/site-packages/scipy/spatial/tests/test_qhull.py", line 216, in test_pathological assert_equal(tri.points[tri.vertices].max(), ValueError: zero-size array to maximum.reduce without identity ====================================================================== FAIL: test_interpnd.TestCloughTocher2DInterpolator.test_dense ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", line 183, in test_dense err_msg="Function %d" % j) File "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", line 132, in _check_accuracy assert_allclose(a, b, **kw) File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 1168, in assert_allclose verbose=verbose, header=header) File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 605, in assert_array_compare chk_same_position(x_id, y_id, hasval='nan') File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 588, in chk_same_position raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0.01, atol=0.005 Function 0 x and y nan location mismatch: x: array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,... y: array([ 3.66796999e-02, 1.91605573e-01, 6.08362261e-01, 7.64324844e-02, 9.18031021e-01, 1.28033199e-01, 4.67121584e-01, 1.37085621e-01, 2.53092671e-01,... ---SKIPPED several other fails--- ---------------------------------------------------------------------- Ran 5102 tests in 80.529s FAILED (KNOWNFAIL=13, SKIP=35, errors=1, failures=19) numpy/scipy are being built against lapack (3.2.1), blas (3.2.1) and atlas (3.8.3) from the standard Scientific Linux repository. I would appreciate any advice/suggestions on where I might be going wrong. 
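In case it helps, this is roughly how I am checking which BLAS/LAPACK the builds actually picked up and re-running only the sub-packages that fail (just a sketch, nothing Scientific-Linux specific):

import numpy, scipy
import scipy.spatial, scipy.interpolate

numpy.show_config()      # BLAS/LAPACK/ATLAS found by the numpy build
scipy.show_config()      # same information for the scipy build

# re-run only the failing sub-packages with more verbose output
scipy.spatial.test(verbose=2)
scipy.interpolate.test(verbose=2)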
Thanks, Dugan -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Fri Feb 24 08:12:39 2012 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Fri, 24 Feb 2012 14:12:39 +0100 Subject: [SciPy-User] parsing a wave file In-Reply-To: References: Message-ID: Am 24. Februar 2012 12:26 schrieb David Hutto : > On Thu, Feb 23, 2012 at 9:08 AM, Friedrich Romstedt >> Sorry for the probably dumb question, but where does "Sndfile" originate from? > > An online example. OK >> A quick search in the Python 2.6.5 docs > > I'm using 2.7.2 > >> and in the online scipy docs yields nothing. > > It was a .dll to access functions from I believe. ok, so the question wasn't that dumb it seems :-) >> I've made positive experience with using the wave module from the standard library. I wrote a module (part of a larger sound analysis suite) to > > I'd like to see that, or any quicker ways there might be of having the > values of the device immediately, which I'm getting around to now, > instead of having to wait for the wav file. Alright, I'll send the file attached. Unchanged, as I don't know what you specifically will need. The inportant function is the WavfileModel.read() method. Adapt it to whatever you need. It is called a model because it is part of a model?view?controller framework. >> For the scipy.io.wavfile module, that should work too, although I don't know if it already separates the channels or not?I never used it so far. > > I changed it to > > import scipy.io.wavfile as wv > ? ? ? ?for array_val in wv.read(r'c:\Users\david\test01.wav')[1]: > ? ? ? ? ? ? ? ?print array_val[0],array_val[1] FWIW, was there any important outcome of that alteration? :-) I realise the docs on that file I'm sending attached are not quite up to date in some detail questions. You will see yourself. As I said, treat the file as public domain and don't let us bother ourselves with licensing questions. The file is of course by me. For immediate recording, I can name: pygame ? a Library for game programming, which can read many file formats, but is not easy to compile, and most notably you have access to the frames AFAIK only for a subset of the files (e.g. ogg vorbis IIRC). I would expect that it can also record directly, but I'm not sure. Check yourself: http://www.pygame.org/ pyaudio ? some "alpha" software for recording and playing back directly from and to the sound device, I used it once to playback some sounds. http://people.csail.mit.edu/hubert/pyaudio/ One warning: If you're dealing with Tkinter, make sure you don't call back into Tkinter methods from the playback or recording threaad, assumed you're using multithreading. At least on OS X I found the application become really unstable by this, meaning it might hang unexpected and unreproducibly. It's really nasty. Use some messanging system to some Tkinter polling thread. I can give more details if you need them. The "Tkinter" thread would be that one that imported Tkinter. Have fun! Friedrich -------------- next part -------------- A non-text attachment was scrubbed... 
Name: wavfile.py Type: application/octet-stream Size: 5114 bytes Desc: not available URL: From cournape at gmail.com Fri Feb 24 10:15:59 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 24 Feb 2012 10:15:59 -0500 Subject: [SciPy-User] parsing a wave file In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 9:08 AM, Friedrich Romstedt wrote: > Am 23.02.2012 um 12:04 schrieb David Hutto : >> #################### >> ? ? ? ? ? ?f = Sndfile(r'c:\Users\david\test01.wav', 'r') > > Sorry for the probably dumb question, but where does "Sndfile" originate from? ?A quick search in the Python 2.6.5 docs and in the online scipy docs yields nothing. Most likely coming from the scikits.audiolab. >> ? ? ? ?fs = f.samplerate >> ? ? ? ?nc = f.channels >> ? ? ? ?enc = f.encoding >> ? ? ? ?data = f.read_frames(1000) >> ? ? ? ?frame_amount = 1000 >> ? ? ? ?data_float = f.read_frames(frame_amount, dtype=np.float32) >> ? ? ? ?for i in range(0,frame_amount,1): >> ? ? ? ? ? ?print data_float[i] >> ? ?############## >> returns data_float[i] in the form: >> [-1,0.990988] >> [.08545,-0.009988] > > It looks to me as if the "Sndfile" instance already decomposes into channels. ?In the wave file, the channels are intermingled, one frame per channel for each time instance (I'm not fully sure if "frame" denotes the full chunk for one time instance or only one datum for one channel, part of the chunk). A frame contains one time point, and each time point contains up to M values, where M is the number of channels. > >> Any suggestions as to what I'm parsing in the wrong way, or better >> solutions than the above? > > I could imagine that the Sndfile class you're using takes the sampling width (i.e., the number of bytes per sample) from the dtype you're handing over, interpreting the originally possible integer-valued frames as float32's. This could also alter the alignment, so that full garbage would result, possibly explaining the apparently highly variable output (as far as I can judge from the two chunks you provided). ?But notice, this is only guessing, speculation so to speak. That's not how it works. The dtype in read_frames only affect the dtype of the output, but will work independently of the type used in the wavfile. It goes through libsndfile to read the wav file, and libsndfile is known to be extremely reliable (used in many profesional audio softwares, open source and proprietary). > > I've made positive experience with using the wave module from the standard library. I wrote a module (part of a larger sound analysis suite) to read arbitrary wave files as long as they are integer-valued. The module reads the file into a numpy array with sufficient speed (a few seconds for a 3' piece), and separates the channels. I guess it could be easily adapted to your use case (currently the channels are averaged in the end; you would just have to leave that step out). I would give it to you as a public domain module. While scipy.io/wave modules are fine, using scikits.audiolab will give access to many different formats, even broken ones generated by some softwares. I would say the only siginficant issue with scikits.audiolab is the dependency on libsndfile, and some people may not like the LGPL licensing for both libsndfile and scikits.audiolab. 
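To make the frame layout concrete (one frame per time point, one value per channel), here is a rough sketch using the scipy.io.wavfile reader Warren mentioned; the file name is only a placeholder:

import numpy as np
from scipy.io import wavfile

fs, data = wavfile.read('test01.wav')   # for stereo, data.shape == (n_frames, 2)
if data.ndim == 1:                      # mono file: a single channel
    left = right = data
else:
    left, right = data[:, 0], data[:, 1]

t = np.arange(len(left)) / float(fs)    # time axis in seconds, e.g. for an oscilloscope plot

The same split works on the array returned by scikits.audiolab's read_frames.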
David From jeffalstott at gmail.com Thu Feb 23 21:15:33 2012 From: jeffalstott at gmail.com (Jeff Alstott) Date: Thu, 23 Feb 2012 21:15:33 -0500 Subject: [SciPy-User] fft not giving Hermitian output for real input Message-ID: I have a particular data set, d0, which is real: In [72]: d0 Out[72]: array([ 0.00907105, 0.0372916 , 0.01402867, ..., -0.04779497, -0.07054817, -0.0436582 ]) It doesn't produce Hermitian output from numpy.fft.fft. In [87]: y0 = fft.fft(d0) In [88]: y0 Out[88]: array([ 7.77156117e-14 +0.00000000e+00j, 6.89226454e-13 -1.56319402e-13j, 1.72140080e-13 +0.00000000e+00j, ..., -7.95807864e-13 -1.13686838e-13j, -3.41060513e-13 +1.13686838e-13j, 1.25055521e-12 -3.41060513e-13j]) In [89]: y0[1].conj()==y0[-1] Out[89]: False In [90]: y0[1].conj()==y0[-2] Out[90]: False Just to be double sure, I force d0 to be real. Same result. In [91]: y0 = fft.fft(real(d0)) In [92]: y0 Out[92]: array([ 7.77156117e-14 +0.00000000e+00j, 6.89226454e-13 -1.56319402e-13j, 1.72140080e-13 +0.00000000e+00j, ..., -7.95807864e-13 -1.13686838e-13j, -3.41060513e-13 +1.13686838e-13j, 1.25055521e-12 -3.41060513e-13j]) In [93]: y0[1].conj()==y0[-1] Out[93]: False In [94]: y0[1].conj()==y0[-2] Out[94]: False This behavior doesn't occur if I make a simple signal and fft it. In [63]: x = arange(10000) In [64]: z = sin(x) In [65]: q = fft.fft(z) In [66]: q Out[66]: array([ 1.93950541+0.j , 1.93950618+0.00020886j, 1.93950848+0.00041772j, ..., 1.93951232-0.00062658j, 1.93950848-0.00041772j, 1.93950618-0.00020886j]) Thoughts? -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at u.washington.edu Fri Feb 24 15:11:47 2012 From: skylar2 at u.washington.edu (Skylar Thompson) Date: Fri, 24 Feb 2012 12:11:47 -0800 Subject: [SciPy-User] RHEL6 build issues Message-ID: <4F47EF03.9090905@u.washington.edu> Hi, I'm trying to build scipy for x86_64 RHEL 6.2. I'm running into problems with linking at the end. I keep getting errors like this: /net/gs/vol3/software/modules-sw-test/gcc/4.6.2/Linux/RHEL6/x86_64/bin/gfortran -Wall -L/net/gs/vol3/software/modules-sw-test/python/1.5.1/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/ATLAS/3.9.63/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/gcc/4.6.2/Linux/RHEL6/x86_64/lib64/ -L/net/gs/vol3/software/modules-sw-test/gcc/4.6.2/Linux/RHEL6/x86_64/lib/ -L/net/gs/vol3/software/modules-sw-test/gmp/5.0.2/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/mpfr/3.1.0/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/mpc/0.8.2/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/python/2.7.2/Linux/RHEL6/x86_64//lib/ build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/scipy/fftpack/_fftpackmodule.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/zfft.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/drfft.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/zrfft.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/zfftnd.o build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/scipy/fftpack/src/dct.o build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/fortranobject.o -L. 
-Lbuild/temp.linux-x86_64-2.7 -ldfftpack -lfftpack -lpython2.7 -lgfortran -o build/lib.linux-x86_64-2.7/scipy/fftpack/_fftpack.so /usr/lib/../lib64/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: ld returned 1 exit status /usr/lib/../lib64/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: ld returned 1 exit status error: Command "/net/gs/vol3/software/modules-sw-test/gcc/4.6.2/Linux/RHEL6/x86_64/bin/gfortran -Wall -L/net/gs/vol3/software/modules-sw-test/python/1.5.1/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/ATLAS/3.9.63/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/gcc/4.6.2/Linux/RHEL6/x86_64/lib64/ -L/net/gs/vol3/software/modules-sw-test/gcc/4.6.2/Linux/RHEL6/x86_64/lib/ -L/net/gs/vol3/software/modules-sw-test/gmp/5.0.2/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/mpfr/3.1.0/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/mpc/0.8.2/Linux/RHEL6/x86_64//lib/ -L/net/gs/vol3/software/modules-sw-test/python/2.7.2/Linux/RHEL6/x86_64//lib/ build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/scipy/fftpack/_fftpackmodule.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/zfft.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/drfft.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/zrfft.o build/temp.linux-x86_64-2.7/scipy/fftpack/src/zfftnd.o build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/scipy/fftpack/src/dct.o build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/fortranobject.o -L. -Lbuild/temp.linux-x86_64-2.7 -ldfftpack -lfftpack -lpython2.7 -lgfortran -o build/lib.linux-x86_64-2.7/scipy/fftpack/_fftpack.so" failed with exit status 1 I've built ATLAS, LAPACK, BLAS, and numpy with a custom-built gfortran 4.6.2, and I've made sure every build uses that particular gfortran. Unfortunately, I can't use the RHEL-provided gcc/gfortran because it segfaults when trying to build LAPACK. I've also tried using the Oracle Studio compiler suite, with similar errors. I've tried various combinations of versions for each component involved, without success.[1] Has anyone else seen this problem and solved it? Thanks in advance for any help! [1] ATLAS (3.8.4, 3.9.35, and 3.9.63), numpy (1.5.1 and 1.6.1), LAPACK (3.3.0 and 3.4.0), and scipy (0.7.2, 0.8.0, 0.9.0, 0.10.0, 0.10.1rc1 and rc2). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From meier.benno at googlemail.com Sat Feb 25 09:53:15 2012 From: meier.benno at googlemail.com (Benno Meier) Date: Sat, 25 Feb 2012 15:53:15 +0100 Subject: [SciPy-User] Problem with complex bandpass filter Message-ID: <9CBF75F1-22E2-4E30-83CD-B7A4E7354B14@gmail.com> Dear all, I'm trying to implement a complex bandpass for IQ data in python using scipy.signal I first create a lowpass using irdesign (giving b and a) and then I transform the taps (b to b2). However, the filtered spectrum looks exactly the same as before. import scipy.signal.filter_design as ssfd from scipy.signal import lfilter [b, a] = ssfd.iirdesign(wp,ws,1,120) b2 = b*np.array(1j*2*np.pi*0.4*np.arange(len(b))) Can anyone help or point me to a routine that does the job? 
Thanks, Benno From aronne.merrelli at gmail.com Sat Feb 25 12:24:11 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Sat, 25 Feb 2012 11:24:11 -0600 Subject: [SciPy-User] fft not giving Hermitian output for real input In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 8:15 PM, Jeff Alstott wrote: > I have a particular data set, d0, which is real: > > In [72]: d0 > Out[72]: > array([ 0.00907105, 0.0372916 , 0.01402867, ..., -0.04779497, > -0.07054817, -0.0436582 ]) > > It doesn't produce Hermitian output from numpy.fft.fft. > > > In [87]: y0 = fft.fft(d0) > In [88]: y0 > Out[88]: > array([ 7.77156117e-14 +0.00000000e+00j, > 6.89226454e-13 -1.56319402e-13j, > 1.72140080e-13 +0.00000000e+00j, ..., > -7.95807864e-13 -1.13686838e-13j, > -3.41060513e-13 +1.13686838e-13j, 1.25055521e-12 > -3.41060513e-13j]) > In [89]: y0[1].conj()==y0[-1] > Out[89]: False > In [90]: y0[1].conj()==y0[-2] > Out[90]: False > > I might be missing something, but this seems like floating point rounding error to me. I don't know what your d0 really looks like, but with some random numbers, In [1]: d0 = np.random.normal(0,1,1024) In [2]: y0 = np.fft.fft(d0) In [3]: np.allclose( y0[1:].conj(), y0[-1:0:-1] ) Out[3]: True In [4]: np.max( (y0[1:].conj() - y0[-1:0:-1]).imag ) Out[4]: 4.3520742565306136e-13 In [5]: np.max( (y0[1:].conj() - y0[-1:0:-1]).real ) Out[5]: 2.957634137601417e-13 I tried this in MATLAB and the two halves are exactly equal. I would assume this means that the MATLAB FFT implementation does something extra to eliminate the rounding error for the special case of real input, whereas the NumPy FFT does not take that step. I'm pretty sure the NumPy FFT is just calling FFTPACK, so you'd need to check the implementation details there. Cheers, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From eptune at gmail.com Sat Feb 25 14:17:13 2012 From: eptune at gmail.com (Erik Petigura) Date: Sat, 25 Feb 2012 11:17:13 -0800 Subject: [SciPy-User] Alternatives to scipy.optimize Message-ID: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> Dear Scipy, Up until now, I've found the optimize module very useful. Now, I'm finding that I need finer control. I am fitting a model to data that is of the following from: model = func1(p1) + func2(p2) func1 is nonlinear in its parameters and func2 is linear in its parameters. There are two things I am struggling with: 1. I'd like to find the best fit parameters for func1 using an iterative approach (e.g. simplex algorithm that changes p1.). At each iteration, I want to compute the optimum p2 by linear least squares in the interest of speed and robustness. 2. I'd also like the ability to hold certain parameters fixed in the optimization with out redefining my objective function each time. Is there another module you would recommend? I've found openopt, but I wanted to get some guidance before I dive in to that. Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Feb 25 14:34:00 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 25 Feb 2012 14:34:00 -0500 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> References: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> Message-ID: On Sat, Feb 25, 2012 at 2:17 PM, Erik Petigura wrote: > Dear Scipy, > > Up until now, I've found the optimize module very useful. ?Now, I'm finding > that I need finer control. 
?I am fitting a model to data that is of the > following from: > > model = func1(p1) + func2(p2) > > func1 is nonlinear in its parameters and func2 is linear in its parameters. > > > There are two things I am struggling with: > > 1. I'd like to find the best fit parameters for func1 using an iterative > approach (e.g. simplex algorithm that changes p1.). ?At each iteration, I > want to compute the optimum p2 by linear least squares in the interest of > speed and robustness. you can still do this with any regular optimizer like optimize.fmin, just calculate the linear solution inside the outer function that is optimized by fmin. I haven't seen any python package yet, that would estimate partial linear models. If you find a solution, then I would be interested in it for statsmodels. Josef > > 2. I'd also like the ability to hold certain parameters fixed in the > optimization with out redefining my objective function each time. > > > Is there another module you would recommend? ?I've found openopt, but I > wanted to get some guidance before I dive in to that. > > Erik > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cjordan1 at uw.edu Sat Feb 25 15:07:45 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Sat, 25 Feb 2012 12:07:45 -0800 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> References: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> Message-ID: On Sat, Feb 25, 2012 at 11:17 AM, Erik Petigura wrote: > Dear Scipy, > > Up until now, I've found the optimize module very useful. ?Now, I'm finding > that I need finer control. ?I am fitting a model to data that is of the > following from: > > model = func1(p1) + func2(p2) > > func1 is nonlinear in its parameters and func2 is linear in its parameters. > > > There are two things I am struggling with: > > 1. I'd like to find the best fit parameters for func1 using an iterative > approach (e.g. simplex algorithm that changes p1.). ?At each iteration, I > want to compute the optimum p2 by linear least squares in the interest of > speed and robustness. > Are p1 and p2 coupled somehow? They must be, or computing p2 at each iteration wouldn't be relevant. This seems like a modification that could be made to the source code without too much trouble, if iirc. Alternatively, you could possibly use the callback option in fmin. > 2. I'd also like the ability to hold certain parameters fixed in the > optimization with out redefining my objective function each time. > > Could you pass in a lambda function with those parameters fixed? -Chris JS > Is there another module you would recommend? ?I've found openopt, but I > wanted to get some guidance before I dive in to that. > > Erik > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Sat Feb 25 18:33:29 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 25 Feb 2012 18:33:29 -0500 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> Message-ID: On Sat, Feb 25, 2012 at 3:07 PM, Christopher Jordan-Squire wrote: > On Sat, Feb 25, 2012 at 11:17 AM, Erik Petigura wrote: >> Dear Scipy, >> >> Up until now, I've found the optimize module very useful. ?Now, I'm finding >> that I need finer control. 
?I am fitting a model to data that is of the >> following from: >> >> model = func1(p1) + func2(p2) >> >> func1 is nonlinear in its parameters and func2 is linear in its parameters. >> >> >> There are two things I am struggling with: >> >> 1. I'd like to find the best fit parameters for func1 using an iterative >> approach (e.g. simplex algorithm that changes p1.). ?At each iteration, I >> want to compute the optimum p2 by linear least squares in the interest of >> speed and robustness. >> > > Are p1 and p2 coupled somehow? They must be, or computing p2 at each > iteration wouldn't be relevant. assuming he meant y = f(x1,p1) + x2*p2 + error and minimizing for example sum of squared error then the estimation of p1 and p2 cannot be separated. a quickly written draft of how I would do it, (which might be added to statsmodels after cleanup, adding results statistics and testing). It should be possible to subclass and overwrite the nonlinear function/method predict_nonlin https://gist.github.com/1911544 no fixed parameters yet Josef > > This seems like a modification that could be made to the source code > without too much trouble, if iirc. Alternatively, you could possibly > use the callback option in fmin. > >> 2. I'd also like the ability to hold certain parameters fixed in the >> optimization with out redefining my objective function each time. >> >> > > Could you pass in a lambda function with those parameters fixed? > > -Chris JS > >> Is there another module you would recommend? ?I've found openopt, but I >> wanted to get some guidance before I dive in to that. >> >> Erik >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From eptune at gmail.com Sat Feb 25 21:21:43 2012 From: eptune at gmail.com (Erik Petigura) Date: Sat, 25 Feb 2012 18:21:43 -0800 Subject: [SciPy-User] Alternatives to scipy.optimize Message-ID: Thanks for getting back to me! I'd like to minimize p1 and p2 together. Let me try to describe my problem a little better: I'm trying to fit an exoplanet transit light curve. My model is a box + a polynomial trend. https://gist.github.com/1912265 The polynomial coefficients and the depth of the box are linear parameters, so I want to fit them using linear least squares. The center and width of the transit are non-linear so I need to fit them with an iterative approach like optimize.fmin. Here's how I implemented it. https://gist.github.com/1912281 There is a lot unpacking and repacking the parameter array as it gets passed around between functions. One option that might work would be to define functions based on a "parameter object". This parameter object could have attributes like float/fix, linear/non-linear. I found a more object oriented optimization module here: http://newville.github.com/lmfit-py/ However, it doesn't allow for linear fitting. Erik -------------- next part -------------- An HTML attachment was scrubbed... 
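On the fix/float bookkeeping mentioned above, a minimal mask-based sketch of how a reduced vector of free parameters can be expanded back to the full parameter array inside a wrapper, so the objective is written only once; the parameter values and the placeholder objective are illustrative and not taken from the gists:

import numpy as np
from scipy.optimize import fmin

p_full = np.array([0.0, 0.1, 1.0, 0.0, 0.0])       # full parameter vector
free = np.array([True, True, False, True, True])   # hold the third entry fixed

def objective_full(p):
    # stand-in for the real misfit, written in terms of the full vector
    return np.sum((p - np.arange(len(p))) ** 2)

def objective_free(p_free):
    p = p_full.copy()
    p[free] = p_free               # expand the reduced vector to the full one
    return objective_full(p)

p_free_best = fmin(objective_free, p_full[free])
p_best = p_full.copy()
p_best[free] = p_free_best         # the fixed entry keeps its original value

This is essentially the mask pattern josef describes later in the thread for frozen distribution parameters.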
URL: From charlesr.harris at gmail.com Sat Feb 25 23:30:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 25 Feb 2012 21:30:24 -0700 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> Message-ID: On Sat, Feb 25, 2012 at 12:34 PM, wrote: > On Sat, Feb 25, 2012 at 2:17 PM, Erik Petigura wrote: > > Dear Scipy, > > > > Up until now, I've found the optimize module very useful. Now, I'm > finding > > that I need finer control. I am fitting a model to data that is of the > > following from: > > > > model = func1(p1) + func2(p2) > > > > func1 is nonlinear in its parameters and func2 is linear in its > parameters. > > > > > > There are two things I am struggling with: > > > > 1. I'd like to find the best fit parameters for func1 using an iterative > > approach (e.g. simplex algorithm that changes p1.). At each iteration, I > > want to compute the optimum p2 by linear least squares in the interest of > > speed and robustness. > > you can still do this with any regular optimizer like optimize.fmin, > just calculate the linear solution inside the outer function that is > optimized by fmin. > That's what I do: use leastsq and let it vary the p1 parameters which are passed to a function that uses linear least squares to compute the residuals of the linear least squares problem func2(p2) = data - func1(p1). The residuals are the the values returned to leastsq. The func2 doesn't even have to be linear if the solution can be easily computed for subsets of the data. I've used this for fits involving hundreds of quaternions as the p2 and a far smaller number of p1. > > 2. I'd also like the ability to hold certain parameters fixed in the > > optimization with out redefining my objective function each time. > This is trickier, but leastsq will generally work if you just ignore some of the p1 parameters. Better would be to adjust the number of parameters passed to the inner function, but that is more complicated. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Feb 26 00:21:05 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 26 Feb 2012 00:21:05 -0500 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: Message-ID: On Sat, Feb 25, 2012 at 9:21 PM, Erik Petigura wrote: > Thanks for getting back to me! > > I'd like to minimize p1 and p2 together. ?Let me try to describe my problem > a little better: > > I'm trying to fit an exoplanet transit light curve. ?My model is a box + a > polynomial trend. > > https://gist.github.com/1912265 > > The polynomial coefficients and the depth of the box are linear parameters, > so I want to fit them using linear least squares. ?The center and width of > the transit are non-linear so I need to fit them with an iterative approach > like optimize.fmin. ?Here's how I implemented it. > > https://gist.github.com/1912281 Took me a while to work my way through it, especially that you have a linear coefficient in front of the nonlinear part. The idea of calculating the nonlinear part by setting the linear coefficients to "neutral" values is nice. Is (((p-pNL0)/dpNL0)**2).sum() a penalization term? Since your objective function is quadratic optimize.leastsq might work better than fmin. The next numpy version will have a vander function to build Legendre polynomials. 
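A minimal sketch of the pattern suggested above, an outer nonlinear optimizer that varies only the transit center and width while the depth and trend coefficients are solved by linear least squares inside the residual function; the function names, polynomial degree and starting values are illustrative and not taken from the gists:

import numpy as np
from scipy.optimize import leastsq

def box(t, center, width):
    # unit-depth transit profile: 1 inside the transit, 0 outside
    return ((t > center - width / 2.0) & (t < center + width / 2.0)).astype(float)

def residuals(p_nonlin, t, y, poly_deg=2):
    center, width = p_nonlin
    # design matrix: box profile plus polynomial trend columns
    A = np.column_stack([box(t, center, width)] +
                        [t ** k for k in range(poly_deg + 1)])
    p_lin = np.linalg.lstsq(A, y)[0]   # optimal linear coefficients for this (center, width)
    return y - A.dot(p_lin)

# t, y = observed times and fluxes
# p_best, ier = leastsq(residuals, (0.0, 0.1), args=(t, y))

leastsq then only searches over the two nonlinear parameters, and the linear part is re-solved exactly at every iteration.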
(Or maybe it already has, I'm on 1.5) The next thing will be to get the covariance matrix for the parameter estimates :) > > There is a lot unpacking and repacking the parameter array as it gets passed > around between functions. ?One option that might work would be to define > functions based on a "parameter object". ?This parameter object could have > attributes like float/fix, linear/non-linear. ?I found a more object > oriented optimization module here: > > http://newville.github.com/lmfit-py/ The easiest is to just write some helper functions to stack or unstack the parameters, or set some to fixed. In statsmodels we use this in some cases (as methods since our models are classes), also to transform parameters. Since often this affects groups of parameters, I don't know if the lmfit approach would helps in this case. (Personally, I like numpy arrays with masks or fancy indexing, which is easy to understand. Ast manipulation scares me.) Josef > > However, it doesn't allow for linear fitting. > > Erik > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From markbak at gmail.com Sun Feb 26 05:49:45 2012 From: markbak at gmail.com (Mark Bakker) Date: Sun, 26 Feb 2012 11:49:45 +0100 Subject: [SciPy-User] Does scipy binary install libgfortran.dylib? Message-ID: Hello List, Does Scipy install the correct version of libgfortran.dylib? Does it simply put it in /usr/local/lib/ ? I am trying to distribute my own package which includes FORTRAN extensions and when installing on a brand new machine it complains that libgfortran.3.dylib cannot be found. I was wondering how Scipy handles this (and thanks to Ralf Gommers for helping me so far, but I haven't been able to solve this). Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Sun Feb 26 06:15:52 2012 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Sun, 26 Feb 2012 12:15:52 +0100 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: Message-ID: Am 26.2.2012 um 03:21 schrieb Erik Petigura: > Thanks for getting back to me! > > I'd like to minimize p1 and p2 together. Let me try to describe my problem a little better: > > I'm trying to fit an exoplanet transit light curve. My model is a box + a polynomial trend. > > https://gist.github.com/1912265 > > The polynomial coefficients and the depth of the box are linear parameters, so I want to fit them using linear least squares. The center and width of the transit are non-linear so I need to fit them with an iterative approach like optimize.fmin. Here's how I implemented it. > > https://gist.github.com/1912281 > I didn't look in detail at your code, but it seems to me the approach described e.g. in Separable NonLinear Least Squares would be a good choice for you, especially since you are able to analytically calculate the derivatives. The method is similar to the approach you chose, it first solves a linear least squares problem to determine estimates for the linear parameters. This information is then used to calculate the derivatives (Jacobian) with respect to the nonlinear parameters for an iterative minimization (Levenberq-Marquardt). I have a python implementation, if you are interested, I can share the code - but it's poorly documented. Gregor -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From epetigura at berkeley.edu Sat Feb 25 14:16:28 2012 From: epetigura at berkeley.edu (Erik Petigura) Date: Sat, 25 Feb 2012 11:16:28 -0800 Subject: [SciPy-User] Alternatives to scipy.optimize Message-ID: Dear Scipy, Up until now, I've found the optimize module very useful. Now, I'm finding that I need finer control. I am fitting a model to data that is of the following from: model = func1(p1) + func2(p2) func1 is nonlinear in its parameters and func2 is linear in its parameters. There are two things I am struggling with: 1. I'd like to find the best fit parameters for func1 using an iterative approach (e.g. simplex algorithm that changes p1.). At each iteration, I want to compute the optimum p2 by linear least squares in the interest of speed and robustness. 2. I'd also like the ability to hold certain parameters fixed in the optimization with out redefining my objective function each time. Is there another module you would recommend? I've found openopt, but I wanted to get some guidance before I dive in to that. Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.gullikson at gmail.com Sat Feb 25 14:31:55 2012 From: kevin.gullikson at gmail.com (Kevin Gullikson) Date: Sat, 25 Feb 2012 13:31:55 -0600 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> References: <26CD320A-00E5-4BBA-B4BB-110E18E99D7B@gmail.com> Message-ID: Erik, You can do the least-squares fit to func2 within the function that you pass to scipy.optimize.leastsq (or similar). For fixed parameters, I use a second array called const_pars, and pass it to leastsq as one of the arguments (e.g. leastsq(ErrFunc, pars, args=(const_pars)) ) For example: def func1(p1): //some non-linear function of p1 def func2(p2): //some linear function of p2 def ErrFunc(p1,p2): //Do linear fit to func2 and optimize parameters p1 return func1(p1) + func2(p2) //Run leastsq: pars, success = scipy.optimize.leastsq(ErrFunc, p1, args=(p2)) Hope that helps! Kevin Gullikson On Sat, Feb 25, 2012 at 1:17 PM, Erik Petigura wrote: > Dear Scipy, > > Up until now, I've found the optimize module very useful. Now, I'm > finding that I need finer control. I am fitting a model to data that is of > the following from: > > model = func1(p1) + func2(p2) > > func1 is nonlinear in its parameters and func2 is linear in its > parameters. > > There are two things I am struggling with: > > 1. I'd like to find the best fit parameters for func1 using an iterative > approach (e.g. simplex algorithm that changes p1.). At each iteration, I > want to compute the optimum p2 by linear least squares in the interest of > speed and robustness. > > 2. I'd also like the ability to hold certain parameters fixed in the > optimization with out redefining my objective function each time. > > > Is there another module you would recommend? I've found openopt, but I > wanted to get some guidance before I dive in to that. > > Erik > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mobilebackup77 at gmail.com Sat Feb 25 16:35:46 2012 From: mobilebackup77 at gmail.com (Me Myself) Date: Sat, 25 Feb 2012 16:35:46 -0500 Subject: [SciPy-User] ndimage.sobel multiple passes? Message-ID: In my code base, currently I need to compute sobel on a 3d dataset. 
I do this using: # ndimage.sobel dx = sobel(vdata, 0) dy = sobel(vdata, 1) dz = sobel(vdata, 2) any ideas how this can be done using one pass instead of doing this in 3 passes. My dataset is large and it would be nice to speed this up. Thanks, --R -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Sun Feb 26 10:38:56 2012 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Sun, 26 Feb 2012 10:38:56 -0500 Subject: [SciPy-User] ndimage.sobel multiple passes? In-Reply-To: References: Message-ID: Hi, you could try multiprocessing, if you have more than one core You could speed it up 3 times at least. -- Oleksandr Huziy 2012/2/25 Me Myself : > > In my code base, currently I need to compute sobel on a 3d dataset. I do > this using: > > # ndimage.sobel > ??? dx = sobel(vdata, 0) > ??? dy = sobel(vdata, 1) > ??? dz = sobel(vdata, 2) > > any ideas how this can be done using one pass instead of doing this in 3 > passes. My dataset is large and it would be nice to speed this up. > > Thanks, > --R > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From guziy.sasha at gmail.com Sun Feb 26 10:41:50 2012 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Sun, 26 Feb 2012 10:41:50 -0500 Subject: [SciPy-User] ndimage.sobel multiple passes? In-Reply-To: References: Message-ID: Rughly I would do the following def worker(arg): index, data = arg return sobel(vdata, index) p = Pool() data_list = 3 * [data] result = p.map(worker, zip(xrange(3), data_list)) Cheers -- Oleksandr Huziy 2012/2/26 Oleksandr Huziy : > Hi, > > you could try multiprocessing, if you have more than one core > > You could speed it up 3 times at least. > -- > Oleksandr Huziy > > 2012/2/25 Me Myself : >> >> In my code base, currently I need to compute sobel on a 3d dataset. I do >> this using: >> >> # ndimage.sobel >> ??? dx = sobel(vdata, 0) >> ??? dy = sobel(vdata, 1) >> ??? dz = sobel(vdata, 2) >> >> any ideas how this can be done using one pass instead of doing this in 3 >> passes. My dataset is large and it would be nice to speed this up. >> >> Thanks, >> --R >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From ralf.gommers at googlemail.com Sun Feb 26 11:38:19 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 26 Feb 2012 17:38:19 +0100 Subject: [SciPy-User] Does scipy binary install libgfortran.dylib? In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 11:49 AM, Mark Bakker wrote: > Hello List, > > Does Scipy install the correct version of libgfortran.dylib? Does it > simply put it in /usr/local/lib/ ? > > I am trying to distribute my own package which includes FORTRAN extensions > and when installing on a brand new machine it complains that > libgfortran.3.dylib cannot be found. I was wondering how Scipy handles this > (and thanks to Ralf Gommers for helping me so far, but I haven't been able > to solve this). > Sorry for not being clearer before - I only knew that this worked for scipy, not exactly how it worked. After some digging I found this in the 0.7.1 release notes: "Mac OS X binary installer is now a proper universal build, and does not depend on gfortran anymore (libgfortran is statically linked)." Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sun Feb 26 16:43:27 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 26 Feb 2012 22:43:27 +0100 Subject: [SciPy-User] Scipy test failure when building on Scientific Linux 6.0 In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 12:48 PM, Dugan Witherick wrote: > I'm trying to build numpy (1.6.1) and scipy (0.10.1rc2) on Scientific > Linux 6.0. I've successfully managed to build both packages from source > using > > python setup.py config_fc --fcompiler=gnu95 install > > but while numpy passes its tests, scipy doesn't: > > >>> scipy.test() > Running unit tests for scipy > NumPy version 1.6.1 > NumPy is installed in /usr/lib64/python2.6/site-packages/numpy > SciPy version 0.10.1rc2 > SciPy is installed in /usr/lib64/python2.6/site-packages/scipy > Python version 2.6.6 (r266:84292, May 20 2011, 16:42:11) [GCC 4.4.5 > 20110214 (Red Hat 4.4.5-6)] > nose version 0.10.4 > > ---SKIPPED---- > > ====================================================================== > ERROR: test_qhull.TestTriangulation.test_pathological > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in > runTest > self.test(*self.arg) > File > "/usr/lib64/python2.6/site-packages/scipy/spatial/tests/test_qhull.py", > line 216, in test_pathological > assert_equal(tri.points[tri.vertices].max(), > ValueError: zero-size array to maximum.reduce without identity > > ====================================================================== > FAIL: test_interpnd.TestCloughTocher2DInterpolator.test_dense > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in > runTest > self.test(*self.arg) > File > "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", > line 183, in test_dense > err_msg="Function %d" % j) > File > "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", > line 132, in _check_accuracy > assert_allclose(a, b, **kw) > File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line > 1168, in assert_allclose > verbose=verbose, header=header) > File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line > 605, in assert_array_compare > chk_same_position(x_id, y_id, hasval='nan') > File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line > 588, in chk_same_position > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=0.01, atol=0.005 > Function 0 > x and y nan location mismatch: > x: array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, > nan, > nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, > nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,... > y: array([ 3.66796999e-02, 1.91605573e-01, 6.08362261e-01, > 7.64324844e-02, 9.18031021e-01, 1.28033199e-01, > 4.67121584e-01, 1.37085621e-01, 2.53092671e-01,... > > ---SKIPPED several other fails--- > > ---------------------------------------------------------------------- > Ran 5102 tests in 80.529s > > FAILED (KNOWNFAIL=13, SKIP=35, errors=1, failures=19) > > > numpy/scipy are being built against lapack (3.2.1), blas (3.2.1) and atlas > (3.8.3) from the standard Scientific Linux repository. I would appreciate > any advice/suggestions on where I might be going wrong. > Could you post all the test failures and the build log? 
Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From markbak at gmail.com Mon Feb 27 07:28:30 2012 From: markbak at gmail.com (Mark Bakker) Date: Mon, 27 Feb 2012 13:28:30 +0100 Subject: [SciPy-User] Does scipy binary install libgfortran.dylib? Message-ID: Excellent, so it can be done. The next question is: How? I found a suggestion from Brian Toby on the web from this summer, but that didn't work form me. This is what I did: On my Mac Terminal I type: LDFLAGS='-undefined dynamic_lookup -bundle -static-libgfortran -static-libgcc' f2py -c -m besselaes besselaes.f95 This nicely creates the extension, which I can run on the machine I created it on, but if I move it to a machine that doesn't have the libgfortran.3.dylib file, it doesn't run and complains about that, so I conclude that the dynamic link failed (the size of the extension doesn't change when I set the LDFLAGS, which I thought was a bad omen). Any thoughts? Am I doing something wrong? Thanks for any help, Mark Date: Sun, 26 Feb 2012 17:38:19 +0100 > From: Ralf Gommers > Subject: Re: [SciPy-User] Does scipy binary install libgfortran.dylib? > To: SciPy Users List > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > On Sun, Feb 26, 2012 at 11:49 AM, Mark Bakker wrote: > > > Hello List, > > > > Does Scipy install the correct version of libgfortran.dylib? Does it > > simply put it in /usr/local/lib/ ? > > > > I am trying to distribute my own package which includes FORTRAN > extensions > > and when installing on a brand new machine it complains that > > libgfortran.3.dylib cannot be found. I was wondering how Scipy handles > this > > (and thanks to Ralf Gommers for helping me so far, but I haven't been > able > > to solve this). > > > > Sorry for not being clearer before - I only knew that this worked for > scipy, not exactly how it worked. After some digging I found this in the > 0.7.1 release notes: "Mac OS X binary installer is now a proper universal > build, and does not depend on gfortran anymore (libgfortran is statically > linked)." > > Ralf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.newville at gmail.com Sun Feb 26 10:14:37 2012 From: matt.newville at gmail.com (Matthew Newville) Date: Sun, 26 Feb 2012 07:14:37 -0800 (PST) Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: Message-ID: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> Hi Erik, Josef, On Saturday, February 25, 2012 8:21:43 PM UTC-6, Erik Petigura wrote: > > Thanks for getting back to me! > I'd like to minimize p1 and p2 together. Let me try to describe my problem a little better: > > I'm trying to fit an exoplanet transit light curve. My model is a box + a polynomial trend. > > https://gist.github.com/1912265 > > The polynomial coefficients and the depth of the box are linear parameters, so I want to > fit them using linear least squares. The center and width of the transit are non-linear > so I need to fit them with an iterative approach like optimize.fmin. > Here's how I implemented it. > > https://gist.github.com/1912281 I'm not sure I fully follow your model, but if I understand correctly, you're looking to find optimal parameters for something like model = linear_function(p1) + nonlinear_function(p2) for sets of coefficients p1 and p2, each set having a few fitting variables, some of which may be related. Is there an instability that prevents you from just treating this as a single non-linear model? 
Another option might be to have the residual function for scipy.optimize.leastsq (or lmfit) call numpy.linalg.lstsq at each iteration. I would think that more fully explore the parameter space than first fitting nonlinear_function with scipy.optimize.fmin() then passing those best-fit parameters to numpy.linalg.lstsq(), but perhaps I'm not fully understanding the nature of the problem. > There is a lot unpacking and repacking the parameter array as it gets passed around > between functions. One option that might work would be to define functions based on a > "parameter object". This parameter object could have attributes like float/fix, > linear/non-linear. I found a more object oriented optimization module here: > http://newville.github.com/lmfit-py/ > > However, it doesn't allow for linear fitting. Linear fitting could probably be added to lmfit, though I haven't looked into it. For this problem, I would pursue the idea of treating your fitting problem as a single model for non-linear least squares with optimize.leastsq or with lmfit. Perhaps I missing something about your model that makes this approach unusually challenging. Josef P wrote: > The easiest is to just write some helper functions to stack or unstack > the parameters, or set some to fixed. In statsmodels we use this in > some cases (as methods since our models are classes), also to > transform parameters. > Since often this affects groups of parameters, I don't know if the > lmfit approach would helps in this case. If many people who are writing their own model functions find themselves writing similar helper functions to stack and unstack parameters, "the easiest" here might not be "the best", and providing tools to do this stacking and unstacking might be worthwhile. Lmfit tries to do this. > (Personally, I like numpy arrays with masks or fancy indexing, which > is easy to understand. Ast manipulation scares me.) I don't understand how masks or fancy indexing would help here. How would that work? FWIW, lmfit uses python's ast module only for algebraic constraints between parameters. That is, from lmfit import Parameter Parameter(name='a', value=10, vary=True) Parameter(name='b', expr='sqrt(a) + 1') will compile 'sqrt(a)+1' into its AST representation and evaluate that for the value of 'b' when needed. So lmfit doesn't so much manipulate the AST as interpret it. What is manipulated is the namespace, so that 'a' is interpreted as "look up the current value of Parameter 'a'" when the AST is evaluated. Again, this applies only for algebraic constraints on parameters. Having written fitting programs that support user-supplied algebraic constraints between parameters in Fortran77, I find interpreting python's AST to be remarkably simple and robust. I'm scared much more by statistical modeling of economic data ;) Cheers, --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.witherick at ucl.ac.uk Mon Feb 27 09:47:19 2012 From: d.witherick at ucl.ac.uk (Dugan Witherick) Date: Mon, 27 Feb 2012 14:47:19 +0000 Subject: [SciPy-User] Scipy test failure when building on Scientific Linux 6.0 In-Reply-To: References: Message-ID: Dear Ralf, I've attached the build log and test log to this message. I'm guessing that mail board will scrub the attachments and replace them with links but I thought it better to do it this way than fill people's inboxes with long messages. Please say if you think it is better to just inline the logs. 
Dugan On 26 February 2012 21:43, Ralf Gommers wrote: > > > On Fri, Feb 24, 2012 at 12:48 PM, Dugan Witherick wrote: > >> I'm trying to build numpy (1.6.1) and scipy (0.10.1rc2) on Scientific >> Linux 6.0. I've successfully managed to build both packages from source >> using >> >> python setup.py config_fc --fcompiler=gnu95 install >> >> but while numpy passes its tests, scipy doesn't: >> >> >>> scipy.test() >> Running unit tests for scipy >> NumPy version 1.6.1 >> NumPy is installed in /usr/lib64/python2.6/site-packages/numpy >> SciPy version 0.10.1rc2 >> SciPy is installed in /usr/lib64/python2.6/site-packages/scipy >> Python version 2.6.6 (r266:84292, May 20 2011, 16:42:11) [GCC 4.4.5 >> 20110214 (Red Hat 4.4.5-6)] >> nose version 0.10.4 >> >> ---SKIPPED---- >> >> ====================================================================== >> ERROR: test_qhull.TestTriangulation.test_pathological >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in >> runTest >> self.test(*self.arg) >> File >> "/usr/lib64/python2.6/site-packages/scipy/spatial/tests/test_qhull.py", >> line 216, in test_pathological >> assert_equal(tri.points[tri.vertices].max(), >> ValueError: zero-size array to maximum.reduce without identity >> >> ====================================================================== >> FAIL: test_interpnd.TestCloughTocher2DInterpolator.test_dense >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in >> runTest >> self.test(*self.arg) >> File >> "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", >> line 183, in test_dense >> err_msg="Function %d" % j) >> File >> "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", >> line 132, in _check_accuracy >> assert_allclose(a, b, **kw) >> File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line >> 1168, in assert_allclose >> verbose=verbose, header=header) >> File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line >> 605, in assert_array_compare >> chk_same_position(x_id, y_id, hasval='nan') >> File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line >> 588, in chk_same_position >> raise AssertionError(msg) >> AssertionError: >> Not equal to tolerance rtol=0.01, atol=0.005 >> Function 0 >> x and y nan location mismatch: >> x: array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, >> nan, >> nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, >> nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, >> nan,... >> y: array([ 3.66796999e-02, 1.91605573e-01, 6.08362261e-01, >> 7.64324844e-02, 9.18031021e-01, 1.28033199e-01, >> 4.67121584e-01, 1.37085621e-01, 2.53092671e-01,... >> >> ---SKIPPED several other fails--- >> >> ---------------------------------------------------------------------- >> Ran 5102 tests in 80.529s >> >> FAILED (KNOWNFAIL=13, SKIP=35, errors=1, failures=19) >> >> >> numpy/scipy are being built against lapack (3.2.1), blas (3.2.1) and >> atlas (3.8.3) from the standard Scientific Linux repository. I would >> appreciate any advice/suggestions on where I might be going wrong. >> > > Could you post all the test failures and the build log? 
> > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: build.log Type: application/octet-stream Size: 533543 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.log Type: application/octet-stream Size: 31799 bytes Desc: not available URL: From josef.pktd at gmail.com Mon Feb 27 10:56:30 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 27 Feb 2012 10:56:30 -0500 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> References: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> Message-ID: On Sun, Feb 26, 2012 at 10:14 AM, Matthew Newville wrote: > > Hi Erik, Josef, > > > On Saturday, February 25, 2012 8:21:43 PM UTC-6, Erik Petigura wrote: >> >> Thanks for getting back to me! >> I'd like to minimize p1 and p2 together.? Let me try to describe my >> problem a little better: >> >> I'm trying to fit an exoplanet transit light curve.? My model is a box + a >> polynomial trend. >> >> https://gist.github.com/1912265 >> >> The polynomial coefficients and the depth of the box are linear >> parameters, so I want to >> fit them using linear least squares.? The center and width of the transit >> are non-linear >> so I need to fit them with an iterative approach like optimize.fmin. >> Here's how I implemented it. >> >> https://gist.github.com/1912281 > > I'm not sure I fully follow your model, but if I understand correctly, > you're looking to find optimal parameters for something like > ? model = linear_function(p1) + nonlinear_function(p2) yes, I've read about this mostly in the context of semiparametric versions when the nonlinear function does not have a parametric form. http://en.wikipedia.org/wiki/Semiparametric_regression#Partially_linear_models > > for sets of coefficients p1 and p2, each set having a few fitting variables, > some of which may be related.? Is there an instability that prevents you > from just treating this as a single non-linear model? I think p1 and p2 shouldn't have any cross restrictions or it will get a bit more complicated. The main reason for splitting it up is computational, I think. It's quite common in econometrics to "concentrate out" parameters that have an explicit solution, so we need the nonlinear optimization only for a smaller parameter space. > > Another option might be to have the residual function for > scipy.optimize.leastsq (or lmfit) call numpy.linalg.lstsq at each > iteration.? I would think that more fully explore the parameter space than > first fitting nonlinear_function with scipy.optimize.fmin() then passing > those best-fit parameters to numpy.linalg.lstsq(), but perhaps I'm not fully > understanding the nature of the problem. That's what both of us did, my version https://gist.github.com/1911544 > > >> There is a lot unpacking and repacking the parameter array as it gets >> passed around >> between functions.? One option that might work would be to define >> functions based on a >> "parameter object".? This parameter object could have attributes like >> float/fix, >> linear/non-linear.? 
I found a more object oriented optimization module >> here: >> http://newville.github.com/lmfit-py/ >> >> However, it doesn't allow for linear fitting. > > Linear fitting could probably be added to lmfit, though I haven't looked > into it.?? For this problem, I would pursue the idea of treating your > fitting problem as a single model for non-linear least squares with > optimize.leastsq or with lmfit.?? Perhaps I missing something about your > model that makes this approach unusually challenging. > > > > Josef P wrote: > >> The easiest is to just write some helper functions to stack or unstack >> the parameters, or set some to fixed. In statsmodels we use this in >> some cases (as methods since our models are classes), also to >> transform parameters. >> Since often this affects groups of parameters, I don't know if the >> lmfit approach would helps in this case. > > If many people who are writing their own model functions find themselves > writing similar helper functions to stack and unstack parameters, "the > easiest" here might not be "the best", and providing tools to do this > stacking and unstacking might be worthwhile.?? Lmfit tries to do this. > >> (Personally, I like numpy arrays with masks or fancy indexing, which >> is easy to understand. Ast manipulation scares me.) > > I don't understand how masks or fancy indexing would help here. How would > that work? ---- a few examples how we currently handle parameter restrictions in statsmodels Skippers example: transform parameters in the loglikelihood to force the parameter estimates of an ARMA process to produce a stationary (stable) solution - transforms groups of parameters at once https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L230 https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L436 not a clean example: fitting distributions with some frozen parameters https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/distributions/sppatch.py#L267 select parameters that are not frozen to use with fmin x0 = np.array(x0)[np.isnan(frmask)] expand the parameters again to include the frozen parameters inside the loglikelihood function theta = frmask.copy() theta[np.isnan(frmask)] = thetash It's not as user friendly as the version that got into scipy.stats.distribution, (but it's developer friendly because I don't have to stare at it for hours to spot a bug) structural Vector Autoregression uses a similar mask pattern https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/vector_ar/svar_model.py#L89 In another sandbox model I build a nested dictionary to map the model parameters to the reduced list that can be fed to scipy.optimize.fmin_xxx But we don't have user defined nonlinear constraints yet. --------- > > FWIW, lmfit uses python's ast module only for algebraic constraints between > parameters.? That is, > ??? from lmfit import Parameter > ??? Parameter(name='a', value=10, vary=True) > ??? Parameter(name='b', expr='sqrt(a) + 1') > > will compile 'sqrt(a)+1' into its AST representation and evaluate that for > the value of 'b' when needed.? So lmfit doesn't so much manipulate the AST > as interpret it.? What is? manipulated is the namespace, so that 'a' is > interpreted as "look up the current value of Parameter 'a'" when the AST is > evaluated.?? Again, this applies only for algebraic constraints on > parameters. 
It's a bit similar to a formula framework for specifying a statistical model that is to be estimated (with lengthy discussion on the statsmodels list). I see the advantages but I haven't spent the weeks of time to figure out what's behind the machinery that is required (especially given all the other statistics and econometrics that is missing, and where I only have to worry about how numpy, scipy and statsmodels behaves. And I like Zen of Python #2) > > Having written fitting programs that support user-supplied algebraic > constraints between parameters in Fortran77, I find interpreting python's > AST to be remarkably simple and robust.? I'm scared much more by statistical > modeling of economic data ;) different tastes and applications. I'd rather think about the next 20 statistical tests I want to code, than about the AST or how sympy translates into numpy. Cheers, Josef > > Cheers, > > --Matt Newville > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From njs at pobox.com Mon Feb 27 11:33:00 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 27 Feb 2012 16:33:00 +0000 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> References: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> Message-ID: On Sun, Feb 26, 2012 at 3:14 PM, Matthew Newville wrote: > ??? from lmfit import Parameter > ??? Parameter(name='a', value=10, vary=True) > ??? Parameter(name='b', expr='sqrt(a) + 1') > > will compile 'sqrt(a)+1' into its AST representation and evaluate that for > the value of 'b' when needed.? So lmfit doesn't so much manipulate the AST > as interpret it.? What is? manipulated is the namespace, so that 'a' is > interpreted as "look up the current value of Parameter 'a'" when the AST is > evaluated.?? Again, this applies only for algebraic constraints on > parameters. > > Having written fitting programs that support user-supplied algebraic > constraints between parameters in Fortran77, I find interpreting python's > AST to be remarkably simple and robust.? I'm scared much more by statistical > modeling of economic data ;) So you use the 'ast' module to convert Python source into a syntax tree, and then you wrote an interpreter for that syntax tree? ...Wouldn't it be easier to use Python's interpreter instead of writing your own, i.e., just call eval() on the source code? Or are you just using the AST to figure out which variables are referenced? (I have some code to do just that without going through the ast module, on the theory that it's nice to be compatible with python 2.5, but I'm not sure it's really worth it.) -- Nathaniel From greg.friedland at gmail.com Mon Feb 27 13:15:28 2012 From: greg.friedland at gmail.com (Greg Friedland) Date: Mon, 27 Feb 2012 10:15:28 -0800 Subject: [SciPy-User] Confidence interval for bounded minimization In-Reply-To: References: Message-ID: Thanks all for the detailed responses and discussion regarding my question. It was very helpful in pointing me in the right direction. Greg On Wed, Feb 22, 2012 at 10:23 PM, wrote: > On Thu, Feb 23, 2012 at 1:10 AM, ? wrote: >> On Thu, Feb 23, 2012 at 12:09 AM, Christopher Jordan-Squire >> wrote: >>> On Wed, Feb 22, 2012 at 2:02 PM, Nathaniel Smith wrote: >>>> On Wed, Feb 22, 2012 at 8:48 PM, ? 
wrote: >>>>> On Wed, Feb 22, 2012 at 3:26 PM, Greg Friedland >>>>> wrote: >>>>>> Hi, >>>>>> >>>>>> Is it possible to calculate asymptotic confidence intervals for any of >>>>>> the bounded minimization algorithms? As far as I can tell they don't >>>>>> return the Hessian; that's including the new 'minimize' function which >>>>>> seemed like it might. >>>>> >>>>> If the parameter ends up at the bounds, then the standard statistics >>>>> doesn't apply. The Hessian is based on a local quadratic >>>>> approximation, which doesn't work if part of the local neigborhood is >>>>> out of bounds. >>>>> There is some special statistics for this, but so far I have seen only >>>>> the description how GAUSS handles it. >>>>> >>>>> In statsmodels we use in some cases the bounds, or a transformation, >>>>> just to keep the optimizer in the required range, and we assume we get >>>>> an interior solution. In this case, it is possible to use the standard >>>>> calculations, the easiest is to use the local minimum that the >>>>> constraint or transformed optimizer found and use it as starting value >>>>> for an unconstrained optimization where we can get the Hessian (or >>>>> just calculate the Hessian based on the original objective function). >>>> >>>> Some optimizers compute the Hessian internally. In those cases, it >>>> would be nice to have a way to ask them to somehow return that value >>>> instead of throwing it away. I haven't used Matlab in a while, but I >>>> remember running into this as a standard feature at some point, and it >>>> was quite nice. Especially when working with a problem where each >>>> computation of the Hessian requires an hour or so of computing time. >>>> >>> >>> Are you talking about analytic or finite-difference gradients and >>> hessians? I'd assumed that anything derived from finite difference >>> estimations wouldn't give particularly good confidence intervals, but >>> I've never needed them so I've never looked into it in detail. >> >> statsmodels has both, all discrete models for example have analytical >> gradients and hessians. >> >> But for models with a complicated log-likelihood function, there isn't >> much choice, second derivatives with centered finite differences are >> ok, scipy.optimize.leastsq is not very good. statsmodels also has >> complex derivatives which are numerically pretty good but they cannot >> always be used. >> >> I think in most cases numerical derivatives will have a precision of a >> few decimals, which is more precise than all the other statistical >> assumptions, normality, law of large numbers, local definition of >> covariance matrix to calculate "large" confidence intervals, and so >> on. >> >> One problem is that choosing the step size depends on the data and >> model. numdifftools has adaptive calculations for the derivatives, but >> we are not using it anymore. >> >> Also, if the model is not well specified, then the lower precision of >> finite difference derivatives can hurt. For example, in ARMA models I >> had problems when there are too many lags specified, so that some >> roots should almost cancel. Skipper's implementation works better >> because he used a reparameterization that forces some nicer behavior. 
>> >> The only case in the econometrics literature that I know is that early >> GARCH models were criticized for using numerical derivatives even >> though analytical derivatives were available, some parameters were not >> well estimated, although different estimates produced essentially the >> same predictions (parameters are barely identified) >> >> Last defense: everyone else does it, maybe a few models more or less, >> and if the same statistical method is used, then the results usually >> agree pretty well. >> (But if different methods are used, for example initial conditions are >> treated differently in time series analysis, then the differences are >> usually much larger. Something like: I don't worry about numerical >> problems at the 5th or 6th decimal if I cannot figure out what these >> guys are doing with their first and second decimal.) >> >> (maybe more than anyone wants to know.) > > In case it wasn't clear: analytical derivatives are of course much > better, and I would be glad if the scipy.stats.distributions or sympy > had the formulas for the derivatives of the log-likelihood functions > for the main distributions. (but it's work) > > Josef > > >> >> Josef >> . >> >>> >>> -Chris >>> >>> >>>> -- Nathaniel >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cimrman3 at ntc.zcu.cz Mon Feb 27 13:39:04 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 27 Feb 2012 19:39:04 +0100 Subject: [SciPy-User] ANN: SfePy 2012.1 Message-ID: <4F4BCDC8.6070202@ntc.zcu.cz> I am pleased to announce release 2012.1 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. Home page: http://sfepy.org Downloads, mailing list, wiki: http://code.google.com/p/sfepy/ Git (source) repository, issue tracker: http://github.com/sfepy Highlights of this release -------------------------- - initial version of linearizer of higher order solutions - rewrite variable and evaluate cache history handling - lots of term updates/fixes/simplifications - move web front page to sphinx docs For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Tom Aldcroft, Vladim?r Luke?, Maty?? Nov?k, Andre Smit From newville at cars.uchicago.edu Mon Feb 27 14:12:11 2012 From: newville at cars.uchicago.edu (Matt Newville) Date: Mon, 27 Feb 2012 13:12:11 -0600 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> Message-ID: Hi Nathaniel, On Mon, Feb 27, 2012 at 10:33 AM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 3:14 PM, Matthew Newville > wrote: >> ??? from lmfit import Parameter >> ??? Parameter(name='a', value=10, vary=True) >> ??? 
Parameter(name='b', expr='sqrt(a) + 1') >> >> will compile 'sqrt(a)+1' into its AST representation and evaluate that for >> the value of 'b' when needed.? So lmfit doesn't so much manipulate the AST >> as interpret it.? What is? manipulated is the namespace, so that 'a' is >> interpreted as "look up the current value of Parameter 'a'" when the AST is >> evaluated.?? Again, this applies only for algebraic constraints on >> parameters. >> >> Having written fitting programs that support user-supplied algebraic >> constraints between parameters in Fortran77, I find interpreting python's >> AST to be remarkably simple and robust.? I'm scared much more by statistical >> modeling of economic data ;) > > So you use the 'ast' module to convert Python source into a syntax > tree, and then you wrote an interpreter for that syntax tree? Yes (though just for the expressions for the constraint). I accept that this can be viewed as anywhere on the continuum between highly useful and stark raving mad. It's such a fine line between stupid and, uh, ... clever. > ...Wouldn't it be easier to use Python's interpreter instead of > writing your own, i.e., just call eval() on the source code? Probably, though if evaluating an AST tree scares some people, then I think that eval() would scare the rest even more. So the goal was sort of a safe-ish, mathematically-oriented, eval(). > Or are you just using the AST to figure out which variables are referenced? > (I have some code to do just that without going through the ast > module, on the theory that it's nice to be compatible with python 2.5, > but I'm not sure it's really worth it.) I do figure out which variables are referenced, but do more than that. I run ast.parse(expression) prior to the fit, and then evaluate by walking through the resulting tree at each iteration. When reaching an ast.Name node, I look up the name in pre-reserved dictionary. Many python and numpy symbols (sqrt, pi, etc) are automatically included in this 'namespace'. For each fit, the names of all the Parameters are also included prior to the fit. That gives a restricted language and namespace for writing constraints. For a simple example, from lmfit import Parameter Parameter(name='a', value=10, vary=True) Parameter(name='b', expr='sqrt(a) + 1') Parameter(name='c', expr='a + b') the expression for 'b' is parsed more or less (you can do ast.dump(ast.parse('sqrt(a) + 1')) for a more complete result) to: BinOp(op=Add(), left=Call(func=Name('sqrt'), args=[Name('a')]), right=Num(1) ) that tree is simple to walk, with each node evaluated appropriately. Evaluation of such a tree is so easy that, although it is not highly useful for constraint expressions in a fitting problem, the lmfit.asteval code includes support for while and for loops, if-then-else, and try-except, as well as full slicing and attribute lookups for both evaluation and assignment. The ast module does the parsing and gives a walkable tree -- amazing and beautifl, and standard python (for python 2.6+). As mentioned above, the evaluation of these constraints happens at each iteration of the python function to be minimized by scipy.optimize.leastsq, or others. Dependencies (ie, that 'c' depends on 'a' and 'b' and that 'b' depends on 'a') are recorded so that 'b' above is evaluated once per fitting loop. Admittedly, that's probably a minor optimization, but it is in the lmfit/minimizer.py if anyone is looking. 
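As an illustration of the mechanism described above, and not the actual lmfit.asteval code, a minimal evaluator for one such constraint expression could look like the following; only the node types needed for 'sqrt(a) + 1' are handled, and the namespace dictionary stands in for the parameter lookup:

import ast
import math

symbols = {'sqrt': math.sqrt, 'pi': math.pi}       # plus current parameter values

def eval_node(node, namespace):
    if isinstance(node, ast.Expression):
        return eval_node(node.body, namespace)
    if isinstance(node, ast.Num):                   # numeric literal
        return node.n
    if isinstance(node, ast.Name):                  # parameter or function name
        return namespace[node.id]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return eval_node(node.left, namespace) + eval_node(node.right, namespace)
    if isinstance(node, ast.Call):                  # e.g. sqrt(a)
        func = eval_node(node.func, namespace)
        return func(*[eval_node(arg, namespace) for arg in node.args])
    raise ValueError("unsupported syntax: %s" % node.__class__.__name__)

tree = ast.parse('sqrt(a) + 1', mode='eval')        # parse once, before the fit starts
value = eval_node(tree, dict(symbols, a=10.0))      # about 4.162 for a = 10

Each iteration then only repeats the eval_node walk with updated parameter values; the parse itself is done once.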
This approach gives a user a lot of flexibility in setting up fitting models, and can allow the fitting function to remain relatively static and written in terms of the "physical" model. Of course, it is not going to be as fast as numexpr for array calculations, but it is more general, and faster ufuncs is not the issue at hand. Cheers, --Matt Newville From newville at cars.uchicago.edu Mon Feb 27 14:50:05 2012 From: newville at cars.uchicago.edu (Matt Newville) Date: Mon, 27 Feb 2012 13:50:05 -0600 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> Message-ID: Hi Josef, On Mon, Feb 27, 2012 at 9:56 AM, wrote: > On Sun, Feb 26, 2012 at 10:14 AM, Matthew Newville > wrote: >> >> Hi Erik, Josef, >> >> >> On Saturday, February 25, 2012 8:21:43 PM UTC-6, Erik Petigura wrote: >>> >>> Thanks for getting back to me! >>> I'd like to minimize p1 and p2 together.? Let me try to describe my >>> problem a little better: >>> >>> I'm trying to fit an exoplanet transit light curve.? My model is a box + a >>> polynomial trend. >>> >>> https://gist.github.com/1912265 >>> >>> The polynomial coefficients and the depth of the box are linear >>> parameters, so I want to >>> fit them using linear least squares.? The center and width of the transit >>> are non-linear >>> so I need to fit them with an iterative approach like optimize.fmin. >>> Here's how I implemented it. >>> >>> https://gist.github.com/1912281 >> >> I'm not sure I fully follow your model, but if I understand correctly, >> you're looking to find optimal parameters for something like >> ? model = linear_function(p1) + nonlinear_function(p2) > > yes, I've read about this mostly in the context of semiparametric > versions when the nonlinear function does not have a parametric form. > > http://en.wikipedia.org/wiki/Semiparametric_regression#Partially_linear_models > >> >> for sets of coefficients p1 and p2, each set having a few fitting variables, >> some of which may be related.? Is there an instability that prevents you >> from just treating this as a single non-linear model? > > I think p1 and p2 shouldn't have any cross restrictions or it will get > a bit more complicated. > The main reason for splitting it up is computational, I think. It's > quite common in econometrics to "concentrate out" parameters that have > an explicit solution, so we need the nonlinear optimization only for a > smaller parameter space. > > >> >> Another option might be to have the residual function for >> scipy.optimize.leastsq (or lmfit) call numpy.linalg.lstsq at each >> iteration.? I would think that more fully explore the parameter space than >> first fitting nonlinear_function with scipy.optimize.fmin() then passing >> those best-fit parameters to numpy.linalg.lstsq(), but perhaps I'm not fully >> understanding the nature of the problem. > > That's what both of us did, my version https://gist.github.com/1911544 Right, that does seem to be the preferred solution here, though it wasn't completely clear to me that Erik meant that the parameters in p1 and p2 were always decoupled. I may not have understood the model (and there were a lot of objects named 'p'!). Allowing coupling between some of the elements of p1 and p2 would seem potentially useful to me. >>> There is a lot unpacking and repacking the parameter array as it gets >>> passed around >>> between functions.? One option that might work would be to define >>> functions based on a >>> "parameter object".? 
This parameter object could have attributes like >>> float/fix, >>> linear/non-linear.? I found a more object oriented optimization module >>> here: >>> http://newville.github.com/lmfit-py/ >>> >>> However, it doesn't allow for linear fitting. >> >> Linear fitting could probably be added to lmfit, though I haven't looked >> into it.?? For this problem, I would pursue the idea of treating your >> fitting problem as a single model for non-linear least squares with >> optimize.leastsq or with lmfit.?? Perhaps I missing something about your >> model that makes this approach unusually challenging. >> >> >> >> Josef P wrote: >> >>> The easiest is to just write some helper functions to stack or unstack >>> the parameters, or set some to fixed. In statsmodels we use this in >>> some cases (as methods since our models are classes), also to >>> transform parameters. >>> Since often this affects groups of parameters, I don't know if the >>> lmfit approach would helps in this case. >> >> If many people who are writing their own model functions find themselves >> writing similar helper functions to stack and unstack parameters, "the >> easiest" here might not be "the best", and providing tools to do this >> stacking and unstacking might be worthwhile.?? Lmfit tries to do this. >> >>> (Personally, I like numpy arrays with masks or fancy indexing, which >>> is easy to understand. Ast manipulation scares me.) >> >> I don't understand how masks or fancy indexing would help here. How would >> that work? > > ---- > a few examples how we currently handle parameter restrictions in statsmodels > > Skippers example: transform parameters in the loglikelihood to force > the parameter estimates of an ARMA process to produce a stationary > (stable) solution - transforms groups of parameters at once > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L230 > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L436 > > not a clean example: fitting distributions with some frozen parameters > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/distributions/sppatch.py#L267 > select parameters that are not frozen to use with fmin > ? ? x0 ?= np.array(x0)[np.isnan(frmask)] > expand the parameters again to include the frozen parameters inside > the loglikelihood function > ? ? theta = frmask.copy() > ? ? theta[np.isnan(frmask)] = thetash > > It's not as user friendly as the version that got into > scipy.stats.distribution, (but it's developer friendly because I don't > have to stare at it for hours to spot a bug) > > structural Vector Autoregression uses a similar mask pattern > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/vector_ar/svar_model.py#L89 > > In another sandbox model I build a nested dictionary to map the model > parameters to the reduced list that can be fed to > scipy.optimize.fmin_xxx > > But we don't have user defined nonlinear constraints yet. Ah, thanks -- I think I understand what you're doing, at least in some of the examples. You're using certain ranges of an array of parameter values to be treated differently, possibly with some able to be fixed or bounded. As you no doubt understand, the approach in lmfit is completely different, and much more flexible. Instead of the user writing a function for leastsq() that takes as the first argument an array of parameter values, they write a function that takes as the first argument an array of Parameters. 
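Schematically, for a toy Gaussian model (made-up parameter names, and not meant as the exact lmfit signature), the difference is something like:

    import numpy as np

    # plain leastsq style: unpack a bare array by position
    def residual_plain(pvals, x, data):
        amp, cen, wid = pvals
        return data - amp * np.exp(-(x - cen)**2 / (2 * wid**2))

    # Parameters style: look values up by name; whether 'wid' is fixed,
    # bounded, or tied to another parameter is decided outside this function
    def residual_named(params, x, data):
        amp = params['amp'].value
        cen = params['cen'].value
        wid = params['wid'].value
        return data - amp * np.exp(-(x - cen)**2 / (2 * wid**2))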
At each iteration, the Parameters will have up-to-date value, after apply the bounds and constraint expression as set for each parameter. The point is that someone can write the function once, in terms of named, physical parameters, but then a user change whether any of the parameters are varied/fixed, have bounds, or are constrained to a mathematical expression in terms of other variables without changing the function that calculates the residual. > --------- >> >> FWIW, lmfit uses python's ast module only for algebraic constraints between >> parameters.? That is, >> ??? from lmfit import Parameter >> ??? Parameter(name='a', value=10, vary=True) >> ??? Parameter(name='b', expr='sqrt(a) + 1') >> >> will compile 'sqrt(a)+1' into its AST representation and evaluate that for >> the value of 'b' when needed.? So lmfit doesn't so much manipulate the AST >> as interpret it.? What is? manipulated is the namespace, so that 'a' is >> interpreted as "look up the current value of Parameter 'a'" when the AST is >> evaluated.?? Again, this applies only for algebraic constraints on >> parameters. > > It's a bit similar to a formula framework for specifying a statistical > model that is to be estimated (with lengthy discussion on the > statsmodels list). Sorry, I don't follow that discussion list (a little outside my field), and wasn't aware of a formula framework in statsmodels. How does that compare? > I see the advantages but I haven't spent the weeks of time to figure > out what's behind the machinery that is required (especially given all > the other statistics and econometrics that is missing, and where I > only have to worry about how numpy, scipy and statsmodels behaves. And > I like Zen of Python #2) Fair enough. The code in lmfit/asteval.py and lmfit/astutils.py is < 1000 Lines, is BSD, and imports only from: ast, math, (numpy), os, re, sys, __future__ That is, the import of numpy is tried, and symbols from numpy will be used if available. Python 2.6+ is required, as the ast module changed quite a bit between 2.5 and 2.6. The 'import from __future__' are for division and print_function, for Python3 compatibility. >> Having written fitting programs that support user-supplied algebraic >> constraints between parameters in Fortran77, I find interpreting python's >> AST to be remarkably simple and robust.? I'm scared much more by statistical >> modeling of economic data ;) > > different tastes and applications. > I'd rather think about the next 20 statistical tests I want to code, > than about the AST or how sympy translates into numpy. OK. That's all completely fair. I'm just saying it's not that hard, and also exists if you're interested. Cheers, --Matt Newville From eptune at gmail.com Mon Feb 27 17:31:50 2012 From: eptune at gmail.com (Erik Petigura) Date: Mon, 27 Feb 2012 14:31:50 -0800 Subject: [SciPy-User] Alternatives to scipy.optimize In-Reply-To: References: <11398732.11.1330269277150.JavaMail.geo-discussion-forums@ynca15> Message-ID: <66DE6EB7-19D2-41BF-B2B3-77262940CF49@gmail.com> Thanks for all the suggestions! The discussion was very enlightening. I wound up writing the following wrappers: https://gist.github.com/1927518 I think Matt has the right idea with lmfit: > This approach gives a user a lot of flexibility in setting up fitting > models, and can allow the fitting function to remain relatively static > and written in terms of the "physical" model. 
Of course, it is not > going to be as fast as numexpr for array calculations, but it is more > general, and faster ufuncs is not the issue at hand. In the end, I decided to fit my linear parameters (polynomial trend) separately from my box parameters, by just excluding the region with the box. Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.hause at colorado.edu Mon Feb 27 23:54:58 2012 From: benjamin.hause at colorado.edu (Benjamin Hause) Date: Tue, 28 Feb 2012 04:54:58 +0000 (UTC) Subject: [SciPy-User] F2py - after making module program runs differently Message-ID: Hello, I have a fortran code that I made into a module using F2py. The module was successfully created as far as I can tell, and I had only made minor changes (making main a subroutine, changing name of a variable, etc.) that should not affect the program. Basically, when I run the module from python and the program from fortran the module gives slightly different output (which, expectedly gets worse with run time as the difference compounds). My question is, is it possible that something was corrupted when making the module that changes how the program runs? If so, how should I go about determining the problem and fixing this? Is there any common issues that would change my output slightly (error starts in about the third decimal place, works its way up over time). Thanks, Ben From ralf.gommers at googlemail.com Tue Feb 28 01:15:25 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 07:15:25 +0100 Subject: [SciPy-User] ANN: SciPy 0.10.1 released Message-ID: Hi all, I am pleased to announce the availability of SciPy 0.10.1. This is a maintenance release, with no new features compared to 0.10.0. Sources and binaries can be found at http://sourceforge.net/projects/scipy/files/scipy/0.10.1/, release notes are copied below. Enjoy, The SciPy developers ========================== SciPy 0.10.1 Release Notes ========================== .. contents:: SciPy 0.10.1 is a bug-fix release with no new features compared to 0.10.0. Main changes ------------ The most important changes are:: 1. The single precision routines of ``eigs`` and ``eigsh`` in ``scipy.sparse.linalg`` have been disabled (they internally use double precision now). 2. A compatibility issue related to changes in NumPy macros has been fixed, in order to make scipy 0.10.1 compile with the upcoming numpy 1.7.0 release. Other issues fixed ------------------ - #835: stats: nan propagation in stats.distributions - #1202: io: netcdf segfault - #1531: optimize: make curve_fit work with method as callable. - #1560: linalg: fixed mistake in eig_banded documentation. - #1565: ndimage: bug in ndimage.variance - #1457: ndimage: standard_deviation does not work with sequence of indexes - #1562: cluster: segfault in linkage function - #1568: stats: One-sided fisher_exact() returns `p` < 1 for 0 successful attempts - #1575: stats: zscore and zmap handle the axis keyword incorrectly -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Feb 28 02:28:20 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 08:28:20 +0100 Subject: [SciPy-User] Does scipy binary install libgfortran.dylib? In-Reply-To: References: Message-ID: On Mon, Feb 27, 2012 at 1:28 PM, Mark Bakker wrote: > Excellent, so it can be done. > The next question is: How? 
> For scipy, first the wrappers are generated with f2py, then the extension is built with gfortran with these flags: /usr/local/bin/gfortran -Wall -undefined dynamic_lookup -bundle -arch i386 -arch ppc -Wl,-search_paths_first -Lbuild build/f2pywrapper.o -lgfortran -o build/XXX.so -Wl,-framework -Wl,Accelerate Hope that helps, Ralf > I found a suggestion from Brian Toby on the web from this summer, but that > didn't work form me. This is what I did: > > On my Mac Terminal I type: > > LDFLAGS='-undefined dynamic_lookup -bundle -static-libgfortran -static-libgcc' > > f2py -c -m besselaes besselaes.f95 > > This nicely creates the extension, which I can run on the machine I created it on, but if I move it to a > machine that doesn't have the libgfortran.3.dylib file, it doesn't run and complains about that, so I > > > conclude that the dynamic link failed (the size of the extension doesn't change when I set the LDFLAGS, > which I thought was a bad omen). Any thoughts? Am I doing something wrong? > > Thanks for any help, > > > Mark > > > > > Date: Sun, 26 Feb 2012 17:38:19 +0100 >> From: Ralf Gommers >> Subject: Re: [SciPy-User] Does scipy binary install libgfortran.dylib? >> To: SciPy Users List >> Message-ID: >> > yGca2tuxXmnkSTQXNTeVwTA1C_igwA at mail.gmail.com> >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> On Sun, Feb 26, 2012 at 11:49 AM, Mark Bakker wrote: >> >> > Hello List, >> > >> > Does Scipy install the correct version of libgfortran.dylib? Does it >> > simply put it in /usr/local/lib/ ? >> > >> > I am trying to distribute my own package which includes FORTRAN >> extensions >> > and when installing on a brand new machine it complains that >> > libgfortran.3.dylib cannot be found. I was wondering how Scipy handles >> this >> > (and thanks to Ralf Gommers for helping me so far, but I haven't been >> able >> > to solve this). >> > >> >> Sorry for not being clearer before - I only knew that this worked for >> scipy, not exactly how it worked. After some digging I found this in the >> 0.7.1 release notes: "Mac OS X binary installer is now a proper universal >> build, and does not depend on gfortran anymore (libgfortran is statically >> linked)." >> >> Ralf >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rowen at uw.edu Tue Feb 28 15:14:31 2012 From: rowen at uw.edu (Russell E. Owen) Date: Tue, 28 Feb 2012 12:14:31 -0800 Subject: [SciPy-User] Trouble with numpy on OSX... References: Message-ID: In article , Anthony Palomba wrote: > I am trying to get my scipy environment running on my mac. > I have a MBP running OSX 10.7 with python2.7 (python.org) > installed. > > I installed scipy-0.10.0-py2.7-python.org-macosx10.3 > and numpy-1.6.1-py2.7-python.org-macosx10.3. > > When I try to import multiarray, i get the following error... > > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-pa > ckages/numpy/core/multiarray.so, > 2): no suitable image found. Did find: > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ > numpy/core/multiarray.so: > no matching architecture in universal wrapper > > Is there something I am missing? Those packages require 32-bit python.org python (10.3 and later). My guess is that you are running 64-bit python.org python (10.6 and later). 
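You can check which build you are actually running with a couple of lines of standard library code:

    import platform, sys
    print(platform.architecture()[0])   # '32bit' or '64bit'
    print(sys.maxsize > 2**32)          # True on a 64-bit build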
Either install the 32-bit python available from: The link is titled Mac OS X 32-bit i386/PPC Installer (2.7.2) for Mac OS X 10.3 through 10.6 Or else use the 64-bit binary installers for numpy and scipy (labelled macosx10.6). -- Russell From ralf.gommers at googlemail.com Tue Feb 28 16:29:13 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 22:29:13 +0100 Subject: [SciPy-User] Scipy test failure when building on Scientific Linux 6.0 In-Reply-To: References: Message-ID: On Mon, Feb 27, 2012 at 3:47 PM, Dugan Witherick wrote: > Dear Ralf, > > I've attached the build log and test log to this message. I'm guessing > that mail board will scrub the attachments and replace them with links but > I thought it better to do it this way than fill people's inboxes with long > messages. Please say if you think it is better to just inline the logs. > This worked fine. The build log is certainly too long to put in a mail. All the test failures come from the interpolate module. There are some warnings like warning: "_POSIX_C_SOURCE" redefined which I don't think hurt too much, but indicate something unusual about your setup. There's a bunch more that looks wrong, but I have no idea about the cause: compile options: '-I/usr/lib64/python2.6/site-packages/numpy/core/include -I/usr/include/python2.6 -c' gcc: scipy/interpolate/interpnd.c /usr/lib64/python2.6/site-packages/numpy/core/include/numpy/__multiarray_api.h:1532: warning: ?_import_array? defined but not used /usr/lib64/python2.6/site-packages/numpy/core/include/numpy/__ufunc_api.h:226: warning: ?_import_umath? defined but not used scipy/interpolate/interpnd.c: In function ?__pyx_f_8interpnd__clough_tocher_2d_single_double?: scipy/interpolate/interpnd.c:4383: warning: ?__pyx_v_g1? may be used uninitialized in this function scipy/interpolate/interpnd.c:4384: warning: ?__pyx_v_g2? may be used uninitialized in this function scipy/interpolate/interpnd.c:4385: warning: ?__pyx_v_g3? may be used uninitialized in this function scipy/interpolate/interpnd.c: In function ?__pyx_f_8interpnd__clough_tocher_2d_single_complex?: scipy/interpolate/interpnd.c:4683: warning: ?__pyx_v_g1? may be used uninitialized in this function scipy/interpolate/interpnd.c:4684: warning: ?__pyx_v_g2? may be used uninitialized in this function scipy/interpolate/interpnd.c:4685: warning: ?__pyx_v_g3? may be used uninitialized in this function gcc -pthread -shared build/temp.linux-x86_64-2.6/scipy/interpolate/interpnd.o -L/usr/lib64 -Lbuild/temp.linux-x86_64-2.6 -lpython2.6 -o build/lib.linux-x86_64-2.6/scipy/interpolate/interpnd.so building 'scipy.interpolate._fitpack' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC creating build/temp.linux-x86_64-2.6/scipy/interpolate/src compile options: '-I/usr/lib64/python2.6/site-packages/numpy/core/include -I/usr/include/python2.6 -c' gcc: scipy/interpolate/src/_fitpackmodule.c In file included from scipy/interpolate/src/_fitpackmodule.c:7: scipy/interpolate/src/__fitpack.h: In function ?fitpack_surfit?: scipy/interpolate/src/__fitpack.h:272: warning: passing argument 18 of ?surfit_? from incompatible pointer type scipy/interpolate/src/__fitpack.h:97: note: expected ?int *? 
but argument is of type ?npy_intp *? scipy/interpolate/src/__fitpack.h:272: warning: passing argument 20 of ?surfit_? from incompatible pointer type scipy/interpolate/src/__fitpack.h:97: note: expected ?int *? but argument is of type ?npy_intp *? scipy/interpolate/src/__fitpack.h:282: warning: passing argument 18 of ?surfit_? from incompatible pointer type scipy/interpolate/src/__fitpack.h:97: note: expected ?int *? but argument is of type ?npy_intp *? scipy/interpolate/src/__fitpack.h:282: warning: passing argument 20 of ?surfit_? from incompatible pointer type scipy/interpolate/src/__fitpack.h:97: note: expected ?int *? but argument is of type ?npy_intp *? scipy/interpolate/src/__fitpack.h: In function ?fitpack_parcur?: Ralf > Dugan > > On 26 February 2012 21:43, Ralf Gommers wrote: > >> >> >> On Fri, Feb 24, 2012 at 12:48 PM, Dugan Witherick wrote: >> >>> I'm trying to build numpy (1.6.1) and scipy (0.10.1rc2) on Scientific >>> Linux 6.0. I've successfully managed to build both packages from source >>> using >>> >>> python setup.py config_fc --fcompiler=gnu95 install >>> >>> but while numpy passes its tests, scipy doesn't: >>> >>> >>> scipy.test() >>> Running unit tests for scipy >>> NumPy version 1.6.1 >>> NumPy is installed in /usr/lib64/python2.6/site-packages/numpy >>> SciPy version 0.10.1rc2 >>> SciPy is installed in /usr/lib64/python2.6/site-packages/scipy >>> Python version 2.6.6 (r266:84292, May 20 2011, 16:42:11) [GCC 4.4.5 >>> 20110214 (Red Hat 4.4.5-6)] >>> nose version 0.10.4 >>> >>> ---SKIPPED---- >>> >>> ====================================================================== >>> ERROR: test_qhull.TestTriangulation.test_pathological >>> ---------------------------------------------------------------------- >>> Traceback (most recent call last): >>> File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in >>> runTest >>> self.test(*self.arg) >>> File >>> "/usr/lib64/python2.6/site-packages/scipy/spatial/tests/test_qhull.py", >>> line 216, in test_pathological >>> assert_equal(tri.points[tri.vertices].max(), >>> ValueError: zero-size array to maximum.reduce without identity >>> >>> ====================================================================== >>> FAIL: test_interpnd.TestCloughTocher2DInterpolator.test_dense >>> ---------------------------------------------------------------------- >>> Traceback (most recent call last): >>> File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in >>> runTest >>> self.test(*self.arg) >>> File >>> "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", >>> line 183, in test_dense >>> err_msg="Function %d" % j) >>> File >>> "/usr/lib64/python2.6/site-packages/scipy/interpolate/tests/test_interpnd.py", >>> line 132, in _check_accuracy >>> assert_allclose(a, b, **kw) >>> File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line >>> 1168, in assert_allclose >>> verbose=verbose, header=header) >>> File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line >>> 605, in assert_array_compare >>> chk_same_position(x_id, y_id, hasval='nan') >>> File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line >>> 588, in chk_same_position >>> raise AssertionError(msg) >>> AssertionError: >>> Not equal to tolerance rtol=0.01, atol=0.005 >>> Function 0 >>> x and y nan location mismatch: >>> x: array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, >>> nan, >>> nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, >>> nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, >>> 
nan,... >>> y: array([ 3.66796999e-02, 1.91605573e-01, 6.08362261e-01, >>> 7.64324844e-02, 9.18031021e-01, 1.28033199e-01, >>> 4.67121584e-01, 1.37085621e-01, 2.53092671e-01,... >>> >>> ---SKIPPED several other fails--- >>> >>> ---------------------------------------------------------------------- >>> Ran 5102 tests in 80.529s >>> >>> FAILED (KNOWNFAIL=13, SKIP=35, errors=1, failures=19) >>> >>> >>> numpy/scipy are being built against lapack (3.2.1), blas (3.2.1) and >>> atlas (3.8.3) from the standard Scientific Linux repository. I would >>> appreciate any advice/suggestions on where I might be going wrong. >>> >> >> Could you post all the test failures and the build log? >> >> Ralf >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doutriaux1 at llnl.gov Tue Feb 28 17:23:12 2012 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Tue, 28 Feb 2012 14:23:12 -0800 Subject: [SciPy-User] Fortran on Mac Message-ID: <4F4D53D0.4000806@llnl.gov> Hi all, I'm assuming this is a good place to ask this question. I'm looking for a good/reliable free fortran compiler on mac (10.6 and/or 10.7). I'm currently using gfortran 4.2.3 from http://r.research.att.com/tools/ In the past I used the gfortran from http://hpc.sourceforge.net/ but I've been told on this list (or numpy list I can't remember) to avoid these. The project I'm on requires gfortran 4.3 or greater. Can anybody point me to a good/recent compiler (preferably gfortran) Thanks, C. From johann.cohentanugi at gmail.com Tue Feb 28 17:23:46 2012 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 28 Feb 2012 23:23:46 +0100 Subject: [SciPy-User] masking an array ends up flattening it Message-ID: <4F4D53F2.8060807@gmail.com> Hello, I have the following: In [145]: m Out[145]: array([[ 1.82243247e-23, -5.53103453e-14, 4.32071039e-13, 0.00000000e+00], [ -5.52425949e-14, 6.26697129e-02, -5.12076585e-02, 0.00000000e+00], [ 4.31598429e-13, -5.12102340e-02, 6.27539118e-02, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+10]]) In [146]: mask Out[146]: array([[ True, True, True, False], [ True, True, True, False], [ True, True, True, False], [False, False, False, False]], dtype=bool) Naively, I thought I would end up with a (3,3) shaped array when applying the mask to m, but instead I get : In [147]: m[mask] Out[147]: array([ 1.82243247e-23, -5.53103453e-14, 4.32071039e-13, -5.52425949e-14, 6.26697129e-02, -5.12076585e-02, 4.31598429e-13, -5.12102340e-02, 6.27539118e-02]) In [148]: m[mask].shape Out[148]: (9,) Is there another way to proceed and get directly the (3,3) shaped masked array, or do I need to reshape it by hand? thanks a lot in advance, Johann From guziy.sasha at gmail.com Tue Feb 28 17:32:55 2012 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Tue, 28 Feb 2012 17:32:55 -0500 Subject: [SciPy-User] masking an array ends up flattening it In-Reply-To: <4F4D53F2.8060807@gmail.com> References: <4F4D53F2.8060807@gmail.com> Message-ID: Hi, I don't think it is possible, and also not sure if the order will be conserved when you select and reshape. Why do you need it? 
Think what would you like to get in the case when you have only one element True? Cheers -- Oleksandr Huziy 2012/2/28 Johann Cohen-Tanugi : > Hello, > I have the following: > In [145]: m > Out[145]: > array([[ ?1.82243247e-23, ?-5.53103453e-14, ? 4.32071039e-13, > ? ? ? ? ? 0.00000000e+00], > ? ? ? ?[ -5.52425949e-14, ? 6.26697129e-02, ?-5.12076585e-02, > ? ? ? ? ? 0.00000000e+00], > ? ? ? ?[ ?4.31598429e-13, ?-5.12102340e-02, ? 6.27539118e-02, > ? ? ? ? ? 0.00000000e+00], > ? ? ? ?[ ?0.00000000e+00, ? 0.00000000e+00, ? 0.00000000e+00, > ? ? ? ? ? 1.00000000e+10]]) > > In [146]: mask > Out[146]: > array([[ True, ?True, ?True, False], > ? ? ? ?[ True, ?True, ?True, False], > ? ? ? ?[ True, ?True, ?True, False], > ? ? ? ?[False, False, False, False]], dtype=bool) > > Naively, I thought I would end up with a (3,3) shaped array when > applying the mask to m, but instead I get : > > In [147]: m[mask] > Out[147]: > array([ ?1.82243247e-23, ?-5.53103453e-14, ? 4.32071039e-13, > ? ? ? ? -5.52425949e-14, ? 6.26697129e-02, ?-5.12076585e-02, > ? ? ? ? ?4.31598429e-13, ?-5.12102340e-02, ? 6.27539118e-02]) > > In [148]: m[mask].shape > Out[148]: (9,) > > Is there another way to proceed and get directly the (3,3) shaped masked > array, or do I need to reshape it by hand? > > thanks a lot in advance, > Johann > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From doutriaux1 at llnl.gov Tue Feb 28 17:32:51 2012 From: doutriaux1 at llnl.gov (Charles Doutriaux) Date: Tue, 28 Feb 2012 14:32:51 -0800 Subject: [SciPy-User] Fortran on Mac In-Reply-To: <4F4D53FC.10201@txcorp.com> References: <4F4D53D0.4000806@llnl.gov> <4F4D53FC.10201@txcorp.com> Message-ID: <4F4D5613.2030007@llnl.gov> Thanks Alex, Looks like it's just what I need! C. On 2/28/12 2:23 PM, Alexander Pletzer wrote: > Charles, > > I'm not a mac user but I would try > > http://gcc.gnu.org/wiki/GFortranBinaries#MacOS > > --Alex > > On 02/28/2012 03:23 PM, Charles Doutriaux wrote: >> Hi all, >> >> I'm assuming this is a good place to ask this question. >> >> I'm looking for a good/reliable free fortran compiler on mac (10.6 >> and/or 10.7). >> >> I'm currently using gfortran 4.2.3 from http://r.research.att.com/tools/ >> >> In the past I used the gfortran from http://hpc.sourceforge.net/ but >> I've been told on this list (or numpy list I can't remember) to avoid >> these. >> >> The project I'm on requires gfortran 4.3 or greater. >> >> Can anybody point me to a good/recent compiler (preferably gfortran) >> >> Thanks, >> >> C. From zachary.pincus at yale.edu Tue Feb 28 17:35:11 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 28 Feb 2012 17:35:11 -0500 Subject: [SciPy-User] masking an array ends up flattening it In-Reply-To: <4F4D53F2.8060807@gmail.com> References: <4F4D53F2.8060807@gmail.com> Message-ID: Hi Johann, > In [146]: mask > Out[146]: > array([[ True, True, True, False], > [ True, True, True, False], > [ True, True, True, False], > [False, False, False, False]], dtype=bool) > > Naively, I thought I would end up with a (3,3) shaped array when > applying the mask to m So that would make some sense for the above mask, but obviously doesn't generalize... what shape output would you expect if 'mask' looked like the following? array([[ True, True, True, False], [ True, True, True, False], [ True, True, True, False], [False, False, False, True]], dtype=bool) Flattening turns out to be the most-sensible general-case thing to do. 
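A quick toy example of why (a single extra True element rules out any square result):

    import numpy as np
    m = np.arange(16.).reshape(4, 4)
    mask = np.zeros((4, 4), dtype=bool)
    mask[:3, :3] = True
    mask[3, 3] = True         # now 10 selected elements, no (3, 3) possible
    print(m[mask].shape)      # (10,)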
Fortunately, this is generally not a problem, because often one winds up doing things like: a[mask] = b[mask] where a and b can both be n-dimensional, and the fact that you go through a flattened intermediate is no problem. If, on the other hand, your task requires slicing square regions out of arrays, you could do that directly by other sorts of fancy-indexing or using programatically-generated slice objects, or some such. Can you describe the overall task? Perhaps then someone could suggest the "idiomatic numpy" solution? Zach > , but instead I get : > > In [147]: m[mask] > Out[147]: > array([ 1.82243247e-23, -5.53103453e-14, 4.32071039e-13, > -5.52425949e-14, 6.26697129e-02, -5.12076585e-02, > 4.31598429e-13, -5.12102340e-02, 6.27539118e-02]) > > In [148]: m[mask].shape > Out[148]: (9,) > > Is there another way to proceed and get directly the (3,3) shaped masked > array, or do I need to reshape it by hand? > > thanks a lot in advance, > Johann > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From caraciol at gmail.com Tue Feb 28 21:47:26 2012 From: caraciol at gmail.com (Marcel Caraciolo) Date: Tue, 28 Feb 2012 23:47:26 -0300 Subject: [SciPy-User] Facing a problem with integrations Message-ID: Hi all, My name is Marcel and I am lecturing scientific computing with Python here at Brazil. One of my students came to me with a problem that he is currently solving it with matlab but he decided to change his code to Python (thanks to the course!) The problem is calculate numerically the coefficients aim that are defined by the following integral [1]. It must be calculated using integrals. In the example showed above he wants to use the trapezoid rule adapted for 2-D arrays or if there is any another solutions easily with scipy it would be match perfectly also. Here is the matrix input (U) and the corresponding coefficients (solution). The goal is to calculate the corresponding coefficients by the formula (integral) shown at [1]. Could anyone give some a solution using scipy.integrate ? I tried several proposals but it didn't worked. [1] http://dl.dropbox.com/u/1977573/pic1.png [2] http://dl.dropbox.com/u/1977573/pic2.png Regards, -- Marcel Pinheiro Caraciolo M.S.C. Candidate at CIN/UFPE http://www.mobideia.com http://aimotion.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kasoft1010 at gmail.com Tue Feb 28 23:31:43 2012 From: kasoft1010 at gmail.com (Abdu Adil) Date: Wed, 29 Feb 2012 12:31:43 +0800 Subject: [SciPy-User] Convolution of sinus signal and rectangular pulse Message-ID: I would like to perform the operation of convolution of sinus signal and rectangular pulse, in scipy, I convolved sinus signal with cosinus signal and ploted that on the graph, but I would like to know how to create array with rectangular pulse, something similar to this matlab expression y = rectpulse(x,nsamp), so I can convolve them, i use this to create my sinus and cosinus signal x=r_[0:50] (my array) y01=sin(2*pi*x/49) y02=cos(2*pi*x/49) So i tried to create a nu.zeros(50), and manually changing the zeros from position 15-25 from 0.0. 
to 0.9 so it looks like rectangle but convolution on sinus array and this 'rectangle' array is weird, It is supposed to be zero when there is no intersection but i get sinus signal in return, here is the code http://tinypaste.com/4791061a I apologize in advance, i feel like this is the easiest thing but I could not find any reference on how to create a rectangular pulse, Im doing computer science and I just started this course signals and systems and Im a bit lame i know :D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From johann.cohentanugi at gmail.com Wed Feb 29 02:38:45 2012 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Wed, 29 Feb 2012 08:38:45 +0100 Subject: [SciPy-User] masking an array ends up flattening it In-Reply-To: References: <4F4D53F2.8060807@gmail.com> Message-ID: <4F4DD605.6060008@gmail.com> Hi Zach, thanks a lot. I should know by now that naive expectations that are not met in numpy are generally so for lack of generalization! Your example makes perfect sense. My use case is a covariance matrix that has the dimension of all the parameters available, but some of them are fix in a fit, and I have a bool array that tells me which parameters are fixed. I then would like to "extract" the covariance matrix of the free parameters. I would rather go for masking and then reshaping than fancy indexing, which if too fancy start scaring me Of course if there is a clean solution, I am all ears. thanks again, johann On 02/28/2012 11:35 PM, Zachary Pincus wrote: > Hi Johann, > >> In [146]: mask >> Out[146]: >> array([[ True, True, True, False], >> [ True, True, True, False], >> [ True, True, True, False], >> [False, False, False, False]], dtype=bool) >> >> Naively, I thought I would end up with a (3,3) shaped array when >> applying the mask to m > > So that would make some sense for the above mask, but obviously doesn't generalize... what shape output would you expect if 'mask' looked like the following? > > array([[ True, True, True, False], > [ True, True, True, False], > [ True, True, True, False], > [False, False, False, True]], dtype=bool) > > Flattening turns out to be the most-sensible general-case thing to do. Fortunately, this is generally not a problem, because often one winds up doing things like: > a[mask] = b[mask] > where a and b can both be n-dimensional, and the fact that you go through a flattened intermediate is no problem. > > If, on the other hand, your task requires slicing square regions out of arrays, you could do that directly by other sorts of fancy-indexing or using programatically-generated slice objects, or some such. Can you describe the overall task? Perhaps then someone could suggest the "idiomatic numpy" solution? > > Zach > > > >> , but instead I get : >> >> In [147]: m[mask] >> Out[147]: >> array([ 1.82243247e-23, -5.53103453e-14, 4.32071039e-13, >> -5.52425949e-14, 6.26697129e-02, -5.12076585e-02, >> 4.31598429e-13, -5.12102340e-02, 6.27539118e-02]) >> >> In [148]: m[mask].shape >> Out[148]: (9,) >> >> Is there another way to proceed and get directly the (3,3) shaped masked >> array, or do I need to reshape it by hand? 
>> >> thanks a lot in advance, >> Johann >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From paul.anton.letnes at gmail.com Wed Feb 29 03:26:08 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Wed, 29 Feb 2012 08:26:08 +0000 Subject: [SciPy-User] Fortran on Mac In-Reply-To: <4F4D5613.2030007@llnl.gov> References: <4F4D53D0.4000806@llnl.gov> <4F4D53FC.10201@txcorp.com> <4F4D5613.2030007@llnl.gov> Message-ID: Hi, I built gfortran 4.6.2 from source with the formula in Homebrew-alt. It doesn't work 100% with scipy, but it compiles my own (pure) fortran project decently well. Fortunately I've got different machines for production runs... I don't quite trust this gfortran. The official gcc binaries are probably a better choice, though. Paul On 28. feb. 2012, at 22:32, Charles Doutriaux wrote: > Thanks Alex, > > Looks like it's just what I need! > > C. > > > On 2/28/12 2:23 PM, Alexander Pletzer wrote: >> Charles, >> >> I'm not a mac user but I would try >> >> http://gcc.gnu.org/wiki/GFortranBinaries#MacOS >> >> --Alex >> >> On 02/28/2012 03:23 PM, Charles Doutriaux wrote: >>> Hi all, >>> >>> I'm assuming this is a good place to ask this question. >>> >>> I'm looking for a good/reliable free fortran compiler on mac (10.6 >>> and/or 10.7). >>> >>> I'm currently using gfortran 4.2.3 from http://r.research.att.com/tools/ >>> >>> In the past I used the gfortran from http://hpc.sourceforge.net/ but >>> I've been told on this list (or numpy list I can't remember) to avoid >>> these. >>> >>> The project I'm on requires gfortran 4.3 or greater. >>> >>> Can anybody point me to a good/recent compiler (preferably gfortran) >>> >>> Thanks, >>> >>> C. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From bronger at physik.rwth-aachen.de Wed Feb 29 03:32:45 2012 From: bronger at physik.rwth-aachen.de (Torsten Bronger) Date: Wed, 29 Feb 2012 09:32:45 +0100 Subject: [SciPy-User] umfpack runs out of memory Message-ID: <87r4xejaz6.fsf@physik.rwth-aachen.de> Hall?chen! I currently deal with large matrices. I use the SciPy package of Ubuntu 10.04. The umfpack package is called libumfpack5.4.0. The Python process uses approx. 6 GB of Memory and aborts. The traceback is Traceback (most recent call last): ... File "/usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 88, in spsolve autoTranspose = True ) File "/usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/umfpack/umfpack.py", line 582, in linsolve self.numeric( mtx ) File "/usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/umfpack/umfpack.py", line 431, in numeric umfStatus[status]) RuntimeError: failed with UMFPACK_ERROR_out_of_memory I suspect a 4 GB limit somewhere. Can I do something about it? Tsch?, Torsten. 
-- Torsten Bronger Jabber ID: torsten.bronger at jabber.rwth-aachen.de or http://bronger-jmp.appspot.com From xabart at gmail.com Wed Feb 29 04:49:15 2012 From: xabart at gmail.com (Xavier Barthelemy) Date: Wed, 29 Feb 2012 20:49:15 +1100 Subject: [SciPy-User] umfpack runs out of memory In-Reply-To: <87r4xejaz6.fsf@physik.rwth-aachen.de> References: <87r4xejaz6.fsf@physik.rwth-aachen.de> Message-ID: sometimes an unix or linux is configured with a max limit of resources by user. the standard case is 4GB ram per process try "ulimit" and check what is the answer (if installed). if this is the case a "ulimit unlimited" should answer the problem Xavier 2012/2/29 Torsten Bronger : > Hall?chen! > > I currently deal with large matrices. ?I use the SciPy package of > Ubuntu 10.04. ?The umfpack package is called libumfpack5.4.0. > > The Python process uses approx. 6 GB of Memory and aborts. ?The > traceback is > > Traceback (most recent call last): > ?... > ?File "/usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 88, in spsolve > ? ?autoTranspose = True ) > ?File "/usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/umfpack/umfpack.py", line 582, in linsolve > ? ?self.numeric( mtx ) > ?File "/usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/umfpack/umfpack.py", line 431, in numeric > ? ?umfStatus[status]) > RuntimeError: failed with UMFPACK_ERROR_out_of_memory > > I suspect a 4 GB limit somewhere. ?Can I do something about it? > > Tsch?, > Torsten. > > -- > Torsten Bronger ? ?Jabber ID: torsten.bronger at jabber.rwth-aachen.de > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?or http://bronger-jmp.appspot.com > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- ?? Quand le gouvernement viole les droits du peuple, l'insurrection est, pour le peuple et pour chaque portion du peuple, le plus sacr? des droits et le plus indispensable des devoirs ? D?claration des droits de l'homme et du citoyen, article 35, 1793 From bronger at physik.rwth-aachen.de Wed Feb 29 05:06:36 2012 From: bronger at physik.rwth-aachen.de (Torsten Bronger) Date: Wed, 29 Feb 2012 11:06:36 +0100 Subject: [SciPy-User] umfpack runs out of memory In-Reply-To: References: <87r4xejaz6.fsf@physik.rwth-aachen.de> Message-ID: <87mx82j6mr.fsf@physik.rwth-aachen.de> Hall?chen! Xavier Barthelemy writes: > sometimes an unix or linux is configured with a max limit of > resources by user. the standard case is 4GB ram per process try > "ulimit" and check what is the answer (if installed). if this is > the case a "ulimit unlimited" should answer the problem ulimit already tells me "unlimited". Tsch?, Torsten. -- Torsten Bronger Jabber ID: torsten.bronger at jabber.rwth-aachen.de or http://bronger-jmp.appspot.com From josef.pktd at gmail.com Wed Feb 29 09:43:34 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 29 Feb 2012 09:43:34 -0500 Subject: [SciPy-User] a bit OT: running 2to3 on examples and docs automatically Message-ID: I'm looking for a recipe how to run the 2to3 conversion automatically on extra folders, like scripts in docs folder and the examples folder. How are other packages handling this? Where should I look? I just followed scikits.image for adding 2to3 to setup.py for the source, but I didn't see anything about converting the examples. 
Thanks, Josef From pletzer at txcorp.com Tue Feb 28 17:23:56 2012 From: pletzer at txcorp.com (Alexander Pletzer) Date: Tue, 28 Feb 2012 15:23:56 -0700 Subject: [SciPy-User] Fortran on Mac In-Reply-To: <4F4D53D0.4000806@llnl.gov> References: <4F4D53D0.4000806@llnl.gov> Message-ID: <4F4D53FC.10201@txcorp.com> Charles, I'm not a mac user but I would try http://gcc.gnu.org/wiki/GFortranBinaries#MacOS --Alex On 02/28/2012 03:23 PM, Charles Doutriaux wrote: > Hi all, > > I'm assuming this is a good place to ask this question. > > I'm looking for a good/reliable free fortran compiler on mac (10.6 > and/or 10.7). > > I'm currently using gfortran 4.2.3 from http://r.research.att.com/tools/ > > In the past I used the gfortran from http://hpc.sourceforge.net/ but > I've been told on this list (or numpy list I can't remember) to avoid > these. > > The project I'm on requires gfortran 4.3 or greater. > > Can anybody point me to a good/recent compiler (preferably gfortran) > > Thanks, > > C. From nur-idura.amran at wct.com.my Tue Feb 28 23:16:44 2012 From: nur-idura.amran at wct.com.my (idura) Date: Wed, 29 Feb 2012 12:16:44 +0800 Subject: [SciPy-User] Speeding up Python Again Message-ID: <000001ccf698$f2797610$d76c6230$@wct.com.my> -------------- next part -------------- An HTML attachment was scrubbed... URL: From johann.cohen-tanugi at univ-montp2.fr Wed Feb 29 02:24:04 2012 From: johann.cohen-tanugi at univ-montp2.fr (Johann Cohen-Tanugi) Date: Wed, 29 Feb 2012 08:24:04 +0100 Subject: [SciPy-User] masking an array ends up flattening it In-Reply-To: References: <4F4D53F2.8060807@gmail.com> Message-ID: <4F4DD294.3040107@univ-montp2.fr> Hi Zach, thanks a lot. I should know by now that naive expectations that are not met in numpy are generally so for lack of generalization! Your example makes perfect sense. My use case is a covariance matrix that has the dimension of all the parameters available, but some of them are fix in a fit, and I have a bool array that tells me which parameters are fixed. I then would like to "extract" the covariance matrix of the free parameters. I would rather go for masking and then reshaping than fancy indexing, which if too fancy start scaring me :) Of course if there is a clean solution, I am all ears. thanks again, johann On 02/28/2012 11:35 PM, Zachary Pincus wrote: > Hi Johann, > >> In [146]: mask >> Out[146]: >> array([[ True, True, True, False], >> [ True, True, True, False], >> [ True, True, True, False], >> [False, False, False, False]], dtype=bool) >> >> Naively, I thought I would end up with a (3,3) shaped array when >> applying the mask to m > > So that would make some sense for the above mask, but obviously doesn't generalize... what shape output would you expect if 'mask' looked like the following? > > array([[ True, True, True, False], > [ True, True, True, False], > [ True, True, True, False], > [False, False, False, True]], dtype=bool) > > Flattening turns out to be the most-sensible general-case thing to do. Fortunately, this is generally not a problem, because often one winds up doing things like: > a[mask] = b[mask] > where a and b can both be n-dimensional, and the fact that you go through a flattened intermediate is no problem. > > If, on the other hand, your task requires slicing square regions out of arrays, you could do that directly by other sorts of fancy-indexing or using programatically-generated slice objects, or some such. Can you describe the overall task? Perhaps then someone could suggest the "idiomatic numpy" solution? 
> > Zach > > > >> , but instead I get : >> >> In [147]: m[mask] >> Out[147]: >> array([ 1.82243247e-23, -5.53103453e-14, 4.32071039e-13, >> -5.52425949e-14, 6.26697129e-02, -5.12076585e-02, >> 4.31598429e-13, -5.12102340e-02, 6.27539118e-02]) >> >> In [148]: m[mask].shape >> Out[148]: (9,) >> >> Is there another way to proceed and get directly the (3,3) shaped masked >> array, or do I need to reshape it by hand? >> >> thanks a lot in advance, >> Johann >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Wed Feb 29 11:09:47 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 29 Feb 2012 16:09:47 +0000 Subject: [SciPy-User] Fortran on Mac In-Reply-To: <4F4D53FC.10201@txcorp.com> References: <4F4D53D0.4000806@llnl.gov> <4F4D53FC.10201@txcorp.com> Message-ID: On Tue, Feb 28, 2012 at 22:23, Alexander Pletzer wrote: > Charles, > > I'm not a mac user but I would try > > http://gcc.gnu.org/wiki/GFortranBinaries#MacOS The builds hosted on this page do not support the Apple-specific flags necessary for linking extension modules for most Mac Python distributions (specifically framework builds). -- Robert Kern From zachary.pincus at yale.edu Wed Feb 29 11:50:26 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 29 Feb 2012 11:50:26 -0500 Subject: [SciPy-User] masking an array ends up flattening it In-Reply-To: <4F4DD294.3040107@univ-montp2.fr> References: <4F4D53F2.8060807@gmail.com> <4F4DD294.3040107@univ-montp2.fr> Message-ID: <967723B6-D7D4-45BF-8A8C-3422D0F71782@yale.edu> > Hi Zach, thanks a lot. I should know by now that naive expectations that are not met in numpy are generally so for lack of generalization! Your example makes perfect sense. > My use case is a covariance matrix that has the dimension of all the parameters available, but some of them are fix in a fit, and I have a bool array that tells me which parameters are fixed. I then would like to "extract" the covariance matrix of the free parameters. > > I would rather go for masking and then reshaping than fancy indexing, which if too fancy start scaring me :) > Of course if there is a clean solution, I am all ears. OK, so you have a list of parameter indices that are "good" and you want to get the sub-matrix out corresponding to just the rows and columns at those indices? E.g.: a = numpy.arange(25).reshape((5,5)) print a array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]]) Then, say you want to get the sub-matrix of 'a' corresponding to rows/columns 1 and 3? Is this equivalent to what you need to do? That is, you want the following: array([[ 6, 8], [16, 18]]) For this you might think to do the following: a[[1,3], [1,3]] but this returns 'array([ 6, 18])' -- you have pulled out a flat list of two elements, at indices [1,1] and [3,3]... This sort of fancy indexing is VERY useful in many cases, but not the case you want, which is more like a "cross product" sort of indexing problem. It turns out that what you really want is: a[ [[1,1],[3,3]], [[1,3],[1,3]] ] which yields: array([[ 6, 8], [16, 18]]) This makes sense -- you pass in a two 2D arrays, one containing the x-coords and one the y-coords, and you get out a 2D array of the same shape. 
Perhaps-insanely, the above can be simplified to: a[ [[1],[3]], [[1,3]] ] If you understand numpy broadcasting rules, you may see how: [[1],[3]], [[1,3]] broadcasts to be the same as: [[1,1],[3,3]], [[1,3],[1,3]] Fortunately, all of this mind-bending stuff is can be done behind the scenes with a cross-product indexing helper function: a[ numpy.ix_([1,3], [1,3]) ] takes care of it for you, and gives the desired array([[ 6, 8], [16, 18]]) This is all pretty advanced-sounding stuff... but most of it's laid out in sections 5 and 6 of the tentative tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial You might also want to peruse St?fan's advanced numpy tutorial -- the broadcasting and indexing sections are really useful. http://mentat.za.net/numpy/numpy_advanced_slides/ Zach > thanks again, > johann > > On 02/28/2012 11:35 PM, Zachary Pincus wrote: >> Hi Johann, >> >>> In [146]: mask >>> Out[146]: >>> array([[ True, True, True, False], >>> [ True, True, True, False], >>> [ True, True, True, False], >>> [False, False, False, False]], dtype=bool) >>> >>> Naively, I thought I would end up with a (3,3) shaped array when >>> applying the mask to m >> >> So that would make some sense for the above mask, but obviously doesn't generalize... what shape output would you expect if 'mask' looked like the following? >> >> array([[ True, True, True, False], >> [ True, True, True, False], >> [ True, True, True, False], >> [False, False, False, True]], dtype=bool) >> >> Flattening turns out to be the most-sensible general-case thing to do. Fortunately, this is generally not a problem, because often one winds up doing things like: >> a[mask] = b[mask] >> where a and b can both be n-dimensional, and the fact that you go through a flattened intermediate is no problem. >> >> If, on the other hand, your task requires slicing square regions out of arrays, you could do that directly by other sorts of fancy-indexing or using programatically-generated slice objects, or some such. Can you describe the overall task? Perhaps then someone could suggest the "idiomatic numpy" solution? >> >> Zach >> >> >> >>> , but instead I get : >>> >>> In [147]: m[mask] >>> Out[147]: >>> array([ 1.82243247e-23, -5.53103453e-14, 4.32071039e-13, >>> -5.52425949e-14, 6.26697129e-02, -5.12076585e-02, >>> 4.31598429e-13, -5.12102340e-02, 6.27539118e-02]) >>> >>> In [148]: m[mask].shape >>> Out[148]: (9,) >>> >>> Is there another way to proceed and get directly the (3,3) shaped masked >>> array, or do I need to reshape it by hand? >>> >>> thanks a lot in advance, >>> Johann >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From ralf.gommers at googlemail.com Wed Feb 29 15:28:33 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 29 Feb 2012 21:28:33 +0100 Subject: [SciPy-User] a bit OT: running 2to3 on examples and docs automatically In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 3:43 PM, wrote: > I'm looking for a recipe how to run the 2to3 conversion automatically > on extra folders, like scripts in docs folder and the examples folder. > > How are other packages handling this? Where should I look? > > I just followed scikits.image for adding 2to3 to setup.py for the > source, but I didn't see anything about converting the examples. 
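Untested sketch of what I mean (the directory names are just placeholders):

    import os, subprocess

    def convert_extras(topdirs=('doc', 'examples')):
        # run 2to3 in place; -d handles doctests embedded in text files
        for topdir in topdirs:
            for root, dirs, files in os.walk(topdir):
                for name in files:
                    path = os.path.join(root, name)
                    if name.endswith('.py'):
                        subprocess.check_call(['2to3', '-w', path])
                    elif name.endswith(('.rst', '.txt')):
                        subprocess.check_call(['2to3', '-d', '-w', path])

    if __name__ == '__main__':
        convert_extras()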
> I don't have an example to point you to, but there's probably not more to it then writing a small script that finds all .rst and .txt doc files and runs 2to3 on each file with the -d flag. I'd expect examples ending in .py to be found already, but if not just find them in your examples folder in that same script. A comment in the statsmodels setup.py on how to convert with 2to3 might be useful, I was expecting to find a call there to a numpy style conversion script. I found it in README.txt in the end, but who reads READMEs these days? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 29 15:36:19 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 29 Feb 2012 15:36:19 -0500 Subject: [SciPy-User] a bit OT: running 2to3 on examples and docs automatically In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 3:28 PM, Ralf Gommers wrote: > > > On Wed, Feb 29, 2012 at 3:43 PM, wrote: >> >> I'm looking for a recipe how to run the 2to3 conversion automatically >> on extra folders, like scripts in docs folder and the examples folder. >> >> How are other packages handling this? Where should I look? >> >> I just followed scikits.image for adding 2to3 to setup.py for the >> source, but I didn't see anything about converting the examples. > > > I don't have an example to point you to, but there's probably not more to it > then writing a small script that finds all .rst and .txt doc files and runs > 2to3 on each file with the -d flag. I'd expect examples ending in .py to be > found already, but if not just find them in your examples folder in that > same script. Something like this I thought of doing, but seeing how someone else is doing it would save some time. > > A comment in the statsmodels setup.py on how to convert with 2to3 might be > useful, I was expecting to find a call there to a numpy style conversion > script. I found it in README.txt in the end, but who reads READMEs these > days? oops, I didn't read the README.txt either I just changed the setup.py to run 2to3 automatically, but didn't think of updating any documentation yet. Thanks, Josef > > Ralf > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From wesmckinn at gmail.com Wed Feb 29 19:31:28 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 29 Feb 2012 19:31:28 -0500 Subject: [SciPy-User] ANN: pandas 0.7.1 released Message-ID: hi all, I'm happy to announce the pandas 0.7.1 release. This is primarily a bugfix release from 0.7.0, but includes a couple notable performance enhancements and a handful of new functions and features. Source archives and Windows installers are now available on PyPI. Major work is underway for pandas 0.8.0, likely to be released at the end of April. For example, the time series capabilities are seeing significant work, incorporating the features which have been available in scikits.timeseries but not in pandas. See the issue tracker for a full of list planned new features and performance/infrastructural improvements. If you are interested in becoming more involved with the project, the issue tracker (which is really the TODO list!) is the best place to start. http://github.com/pydata/pandas/issues?milestone=2&state=open Thanks to all who contributed to this release! 
From wesmckinn at gmail.com  Wed Feb 29 19:31:28 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Wed, 29 Feb 2012 19:31:28 -0500
Subject: [SciPy-User] ANN: pandas 0.7.1 released
Message-ID:

hi all,

I'm happy to announce the pandas 0.7.1 release. This is primarily a
bugfix release from 0.7.0, but includes a couple notable performance
enhancements and a handful of new functions and features. Source
archives and Windows installers are now available on PyPI.

Major work is underway for pandas 0.8.0, likely to be released at the
end of April. For example, the time series capabilities are seeing
significant work, incorporating the features which have been available
in scikits.timeseries but not in pandas. See the issue tracker for a
full list of planned new features and performance/infrastructural
improvements.

If you are interested in becoming more involved with the project, the
issue tracker (which is really the TODO list!) is the best place to start.

http://github.com/pydata/pandas/issues?milestone=2&state=open

Thanks to all who contributed to this release!

- Wes

What is it
==========
pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data
both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real-world data analysis in Python.

Links
=====
Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst
Documentation: http://pandas.pydata.org
Installers: http://pypi.python.org/pypi/pandas
Code Repository: http://github.com/pydata/pandas
Mailing List: http://groups.google.com/group/pystatsmodels
Blog: http://blog.wesmckinney.com

From fperez.net at gmail.com  Wed Feb 29 19:35:27 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Wed, 29 Feb 2012 16:35:27 -0800
Subject: [SciPy-User] [pystatsmodels] ANN: pandas 0.7.1 released
In-Reply-To:
References:
Message-ID:

On Wed, Feb 29, 2012 at 4:31 PM, Wes McKinney wrote:
>
> I'm happy to announce the pandas 0.7.1 release. This is primarily a
> bugfix release from 0.7.0, but includes a couple notable performance
> enhancements and a handful of new functions and features. Source
> archives and Windows installers are now available on PyPI.

Congrats! An awesome tool keeps getting better... Thanks a lot for the
relentless improvements.

Cheers,

f

From mark.pundurs at nokia.com  Wed Feb 29 11:01:26 2012
From: mark.pundurs at nokia.com (Pundurs Mark (Nokia-LC/Chicago))
Date: Wed, 29 Feb 2012 10:01:26 -0600
Subject: [SciPy-User] ImportError: *.so: cannot open shared object file: No such file or directory
In-Reply-To:
References:
Message-ID: <8A18D8FA4293104C9A710494FD6C273CB737ACD8@hq-ex-mb03.ad.navteq.com>

Adding to LD_LIBRARY_PATH didn't help. I added /usr/lib (where I can see the
*.so files) but got the same ImportError; then, just in case, I added
/tools/python/2.6.3_3/linux_x86_64/lib and
/tools/python/2.6.3_3/linux_x86_64/lib/python2.6/site-packages (where scipy
lives) but still got the ImportError.

Any other ideas on how to debug or work around this? (I'm trying to dig
through the sys elements discussed in
http://docs.python.org/reference/simple_stmts.html#import, but it's thorny
stuff.)

> Date: Sat, 12 Nov 2011 01:02:19 +0100
> From: Paul Anton Letnes
>
> Assuming bash, type this into your shell to export the variable for as
> long as you keep your shell running. If you want it to stick
> permanently, add the line to ~/.bashrc.
>
> export LD_LIBRARY_PATH=/folder/that/contains/libs:$LD_LIBRARY_PATH
>
> Cheers
> Paul
>
> > Thanks, David! How do I (a Linux newbie) add paths to environment
> > variable LD_LIBRARY_PATH?
>
> > Hi Mark,
> >
> > On Wed, Nov 2, 2011 at 3:38 PM, Pundurs, Mark wrote:
> >> I want to use the function stats.norm.isf, but no matter how I try
> >> to import it I end up with the error "ImportError: .so: cannot
> >> open shared object file: No such file or directory". The .so files
> >> cited do exist in /usr/lib (as symbolic links to other .so files that
> >> also exist in that directory). From what I've read, that's where
> >> they're supposed to be - but I think the Python installation is in a
> >> nonstandard location. Is that the problem? How can I work around it?
> >
> > I believe RHEL 4 uses g77 as its default fortran compiler, so you have
> > a custom gfortran build somewhere, am I right?
> >
> > If so, you need to add the paths where libgfortran.so and liblapack.so
> > are to the environment variable LD_LIBRARY_PATH.
> > Given that scipy has been built (by someone else for you?), you may want
> > to ask them about it for the exact locations of those libraries.
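Beyond the advice quoted in this thread, one rough way to narrow down which
shared library the dynamic linker cannot find is to run ldd over scipy's
compiled extension modules. The sketch below is hypothetical and was not part
of the original exchange; it assumes a Linux system with ldd on the PATH and
that the top-level scipy package itself imports, so its install directory can
be located from the package.

# Hypothetical debugging helper: report every scipy extension module with an
# unresolved shared-library dependency, so you know which directory needs to
# be added to LD_LIBRARY_PATH.
import os
import subprocess
import scipy

scipy_dir = os.path.dirname(scipy.__file__)
for root, dirs, files in os.walk(scipy_dir):
    for name in files:
        if name.endswith('.so'):
            path = os.path.join(root, name)
            output = subprocess.Popen(['ldd', path],
                                      stdout=subprocess.PIPE).communicate()[0]
            if b'not found' in output:
                # This extension links against a library the loader cannot
                # locate; the ldd output names the missing library.
                print(path)
                print(output.decode())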