From ralf.gommers at googlemail.com Mon Nov 1 06:39:40 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 1 Nov 2010 18:39:40 +0800 Subject: [Numpy-discussion] Including fix for #1637 in 1.5.1 ? In-Reply-To: References: Message-ID: On Sun, Oct 31, 2010 at 2:46 PM, David Cournapeau wrote: > Hi, > > I just committed a quick fix for > http://projects.scipy.org/numpy/ticket/1637. I did want to include it > for 1.5.x branch as it is already in RC stage, but it may still be > useful to do so if the release managers think it is appropriate (there > is a test for it): > http://github.com/numpy/numpy/commit/cdcbaa4ce4b47a7bfd8905222124fd22460252a5 > I would prefer not to include it at this stage - this doesn't seem an important enough bug to me to throw in compiled code at the last moment. Backporting after 1.5.1 is released may make sense though, in case there's a 1.5.2. Cheers, Ralf From cournape at gmail.com Mon Nov 1 08:21:28 2010 From: cournape at gmail.com (David Cournapeau) Date: Mon, 1 Nov 2010 21:21:28 +0900 Subject: [Numpy-discussion] Including fix for #1637 in 1.5.1 ? In-Reply-To: References: Message-ID: On Mon, Nov 1, 2010 at 7:39 PM, Ralf Gommers wrote: > On Sun, Oct 31, 2010 at 2:46 PM, David Cournapeau wrote: >> Hi, >> >> I just committed a quick fix for >> http://projects.scipy.org/numpy/ticket/1637. I did want to include it >> for 1.5.x branch as it is already in RC stage, but it may still be >> useful to do so if the release managers think it is appropriate (there >> is a test for it): >> http://github.com/numpy/numpy/commit/cdcbaa4ce4b47a7bfd8905222124fd22460252a5 >> > > I would prefer not to include it at this stage - this doesn't seem an > important enough bug to me to throw in compiled code at the last > moment. Ok, no problem. I will backport it to 1.5.x, if only to help during the 1.5 -> 2.0 transition. cheers, David From ralf.gommers at googlemail.com Mon Nov 1 12:11:13 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 2 Nov 2010 00:11:13 +0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 release candidate 1 In-Reply-To: References: Message-ID: Hi Friedrich, On Wed, Oct 27, 2010 at 10:30 PM, Ralf Gommers wrote: > On Wed, Oct 27, 2010 at 1:23 AM, Friedrich Romstedt > wrote: >> I found some issues on Mac OS X 10.5 ppc in py2.5.4: Can you please check if this takes care of all test failures you reported: http://github.com/rgommers/numpy/commit/2ac0be7171f. If not can you please adapt the patch a bit to make it work (should be straightforward)? Thanks, Ralf From juanjo.gomeznavarro at gmail.com Mon Nov 1 13:35:39 2010 From: juanjo.gomeznavarro at gmail.com (Juanjo Gomez Navarro) Date: Mon, 1 Nov 2010 18:35:39 +0100 Subject: [Numpy-discussion] Path to numpy installation Message-ID: Hi, I have just updated my old version of numpy 1.01 to the version 1.5 *in my Mac*. The problem is that the system does not seem to recognize the new version. When I type print numpy.__path__ I get /System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy But the new version I have just installed is in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy I don't know how to change the path of the installation, or how to say python to search numpy in the new path Any idea??? 
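A quick way to see which interpreter and which numpy are actually being picked up is to ask the running interpreter itself; this is only a minimal diagnostic using the standard library plus numpy, and the paths it prints will be whatever your own system reports:

    import sys
    import numpy

    # The interpreter that is actually running.  A path under
    # /System/Library/... means Apple's bundled Python; a path under
    # /Library/Frameworks/... means the separately installed one.
    print(sys.executable)

    # The numpy that this interpreter imports, and its version.
    print(numpy.__version__)
    print(numpy.__file__)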
2010/10/18 Ralf Gommers > On Mon, Oct 18, 2010 at 9:55 PM, Vincent Davis > wrote: > > On Sun, Oct 17, 2010 at 5:35 AM, Ralf Gommers > > wrote: > >> Hi, > >> > >> I am pleased to announce the availability of the first release > >> candidate of NumPy 1.5.1. This is a bug-fix release with no new > >> features compared to 1.5.0. > >> > >> Binaries, sources and release notes can be found at > >> https://sourceforge.net/projects/numpy/files/. > >> A note on the available binaries for OS X: these are known to not work > >> on Intel-based OS X 10.5. We hope to have that problem fixed within a > >> week. > >> > >> On Windows there are still two known test failures: > >> - #1610, in fromfile (under Python 2.x) > >> - #1633, a failing test for ldexp (under Python 2.5 only) > >> Please report any other issues on the Numpy-discussion mailing list. > > > > Test pass for me. > > osx 10.6 py27 > > > > OK (KNOWNFAIL=4, SKIP=2) > > > > > Glad it works for you. But I just reopened the OS X gfortran issue, > http://projects.scipy.org/numpy/ticket/1399. > > RC2 in one week. Two other issues to be fixed by then: > http://projects.scipy.org/numpy/ticket/1610 (Pauli has suggested a fix > already) > http://projects.scipy.org/numpy/ticket/1633 > > Cheers, > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Juan Jos? G?mez Navarro Departamento de F?sica Centro de Investigacion en ?ptica y Nanof?sica (CIOyN) Universidad de Murcia Campus Espinardo E-30100 Murcia Espa?a Tel : +34 968 398552 Fax : +34 968 39 8568 Email: juanjo.gomeznavarro at gmail.com, jjgomeznavarro at um.es -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Nov 1 13:45:43 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 1 Nov 2010 12:45:43 -0500 Subject: [Numpy-discussion] Path to numpy installation In-Reply-To: References: Message-ID: On Mon, Nov 1, 2010 at 12:35, Juanjo Gomez Navarro wrote: > Hi, I have just updated my old version of numpy 1.01 to the version 1.5 in > my Mac. The problem is that the system does not seem to recognize the new > version. > When I type print numpy.__path__ I get > /System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy > But?the?new?version?I?have?just?installed?is?in > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy > I don't know > how?to?change?the?path?of?the?installation,?or?how?to?say?python?to?search?numpy?in?the?new?path > Any?idea??? When you run "python", apparently you are picking up the system's Python executable in /System/Library, not the other Python interpreter that you installed under /Library. You need to adjust your $PATH environment variable to put /Library/Frameworks/Python.framework/Versions/Current/bin before /usr/bin. Then when you type "python", you will get this installation of Python and all of its libraries. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From bevan07 at gmail.com Mon Nov 1 15:46:16 2010 From: bevan07 at gmail.com (bevan j) Date: Mon, 1 Nov 2010 12:46:16 -0700 (PDT) Subject: [Numpy-discussion] np.ma.masked_invalid and precision In-Reply-To: <061590EF-AF59-4B5F-A899-9E552D523474@gmail.com> References: <061590EF-AF59-4B5F-A899-9E552D523474@gmail.com> Message-ID: <30107931.post@talk.nabble.com> Pierre GM-2 wrote: > > Mmh, probably a bug, I'd say. > Mind opening a ticket ? > Thanks in advance > P. > > Done - Ticket #1657 -- View this message in context: http://old.nabble.com/np.ma.masked_invalid-and-precision-tp30100542p30107931.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From groups.and.lists at gmail.com Mon Nov 1 19:30:33 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 01 Nov 2010 18:30:33 -0500 Subject: [Numpy-discussion] Precision difference between dot and sum Message-ID: Hi, I just found that using dot instead of sum in numpy gives me better results in terms of precision loss. For example, I optimized a function with scipy.optimize.fmin_bfgs. For the return value for the function, I tried the following two things: sum(Xb) - sum(denominator) and dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) Both of them are supposed to yield the same thing. But the first one gave me -589112.30492110562 and the second one gave me -589112.30492110678. In addition, with the routine using sum, the optimizer gave me "Warning: Desired error not necessarily achieved due to precision loss." With the routine with dot, the optimizer gave me "Optimization terminated successfully." I checked the gradient value as well (I provided analytical gradient) and gradient was smaller in the dot case as well. (Of course, the the magnitude was e-5 to e-6, but still) I was wondering if this is well-known fact and I'm supposed to use dot instead of sum whenever possible. It would be great if someone could let me know why this happens. Thank you, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Nov 1 21:21:05 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 1 Nov 2010 19:21:05 -0600 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: References: Message-ID: On Mon, Nov 1, 2010 at 5:30 PM, Joon wrote: > Hi, > > I just found that using dot instead of sum in numpy gives me better results > in terms of precision loss. For example, I optimized a function with > scipy.optimize.fmin_bfgs. For the return value for the function, I tried the > following two things: > > sum(Xb) - sum(denominator) > > and > > dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) > > Both of them are supposed to yield the same thing. But the first one gave > me -589112.30492110562 and the second one gave me -589112.30492110678. > > In addition, with the routine using sum, the optimizer gave me "Warning: > Desired error not necessarily achieved due to precision loss." With the > routine with dot, the optimizer gave me "Optimization terminated > successfully." > I checked the gradient value as well (I provided analytical gradient) and > gradient was smaller in the dot case as well. (Of course, the the magnitude > was e-5 to e-6, but still) > > I was wondering if this is well-known fact and I'm supposed to use dot > instead of sum whenever possible. > > It would be great if someone could let me know why this happens. > Are you running on 32 bits or 64 bits? 
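(One way to answer that from inside the interpreter, using only the standard library plus numpy -- a minimal check, shown here for reference:)

    import sys
    import platform
    import numpy as np

    print(platform.architecture()[0])   # '32bit' or '64bit' for the Python binary
    print(sys.maxsize > 2**32)          # True only on a 64-bit Python (2.6+)
    print(np.dtype(np.intp).itemsize)   # 8 on a 64-bit numpy build, 4 on 32-bit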
I ask because there are different floating point precisions on the 32 bit platform and the results can depend on how the compiler does things. The relative difference between your results is ~1e-15, which isn't that far from the float64 precision of ~2e-16, so little things can make a difference. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Nov 1 21:49:08 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 1 Nov 2010 20:49:08 -0500 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: References: Message-ID: On Mon, Nov 1, 2010 at 20:21, Charles R Harris wrote: > > On Mon, Nov 1, 2010 at 5:30 PM, Joon wrote: >> >> Hi, >> >> I just found that using dot instead of sum in numpy gives me better >> results in terms of precision loss. For example, I optimized a function with >> scipy.optimize.fmin_bfgs. For the return value for the function, I tried the >> following two things: >> >> sum(Xb) - sum(denominator) >> >> and >> >> dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) >> >> Both of them are supposed to yield the same thing. But the first one gave >> me -589112.30492110562 and the second one gave me?-589112.30492110678. >> >> In addition, with the routine using sum, the optimizer gave me "Warning: >> Desired error not necessarily achieved due to precision loss." With the >> routine with dot, the optimizer gave me "Optimization terminated >> successfully." >> >> I checked the gradient value as well (I provided analytical gradient) and >> gradient was smaller in the dot case as well. (Of course, the the magnitude >> was e-5 to e-6, but still) >> >> I was wondering if this is well-known fact and I'm supposed to use dot >> instead of sum whenever possible. >> >> It would be great if someone could let me know why this happens. > > Are you running on 32 bits or 64 bits? I ask because there are different > floating point precisions on the 32 bit platform and the results can depend > on how the compiler does things. Eh, what? Are you talking about the sometimes-differing intermediate precisions? I wasn't aware that was constrained to 32-bit processors. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From david at silveregg.co.jp Mon Nov 1 22:27:07 2010 From: david at silveregg.co.jp (David) Date: Tue, 02 Nov 2010 11:27:07 +0900 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: References: Message-ID: <4CCF76FB.80106@silveregg.co.jp> On 11/02/2010 08:30 AM, Joon wrote: > Hi, > > I just found that using dot instead of sum in numpy gives me better > results in terms of precision loss. For example, I optimized a function > with scipy.optimize.fmin_bfgs. For the return value for the function, I > tried the following two things: > > sum(Xb) - sum(denominator) > > and > > dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) > > Both of them are supposed to yield the same thing. But the first one > gave me -589112.30492110562 and the second one gave me -589112.30492110678. Those are basically the same number: the minimal spacing between two double floats at this amplitude is ~ 1e-10 (given by the function np.spacing(the_number)), which is essentially the order of magnitude of the difference between your two numbers. 
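To put numbers on that, the two results quoted earlier in this thread differ by only a handful of float64 spacings; np.spacing gives the gap between adjacent doubles at a given magnitude:

    import numpy as np

    a = -589112.30492110562    # the sum() result quoted above
    b = -589112.30492110678    # the dot() result quoted above

    print(abs(a - b))            # ~1.2e-09, the absolute discrepancy
    print(np.spacing(abs(a)))    # ~1.2e-10, spacing of adjacent doubles at this magnitude
    print(abs(a - b) / abs(a))   # ~2e-15, i.e. within roughly 10 ulp in relative terms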
> I was wondering if this is well-known fact and I'm supposed to use dot > instead of sum whenever possible. You should use dot instead of sum when application, but for speed reasons, essentially. > > It would be great if someone could let me know why this happens. They don't use the same implementation, so such tiny differences are expected - having exactly the same solution would have been surprising, actually. You may be surprised about the difference for such a trivial operation, but keep in mind that dot is implemented with highly optimized CPU instructions (that is if you use ATLAS or similar library). cheers, David From groups.and.lists at gmail.com Mon Nov 1 22:39:16 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 01 Nov 2010 21:39:16 -0500 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: <4CCF76FB.80106@silveregg.co.jp> References: <4CCF76FB.80106@silveregg.co.jp> Message-ID: Thanks for the replies. I tried several stuff like changing dot into sum in the gradient calculations just to see how they change the results, but it seems that part of the code is the only place where the results get affected by the choice of dot/sum. I am using 64bit machine and EPD python (I think it uses Intel MKL) so that could have affected the calculation. I will use dot whenever possible from now on. :) -Joon On Mon, 01 Nov 2010 21:27:07 -0500, David wrote: > On 11/02/2010 08:30 AM, Joon wrote: >> Hi, >> >> I just found that using dot instead of sum in numpy gives me better >> results in terms of precision loss. For example, I optimized a function >> with scipy.optimize.fmin_bfgs. For the return value for the function, I >> tried the following two things: >> >> sum(Xb) - sum(denominator) >> >> and >> >> dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) >> >> Both of them are supposed to yield the same thing. But the first one >> gave me -589112.30492110562 and the second one gave me >> -589112.30492110678. > > Those are basically the same number: the minimal spacing between two > double floats at this amplitude is ~ 1e-10 (given by the function > np.spacing(the_number)), which is essentially the order of magnitude of > the difference between your two numbers. > >> I was wondering if this is well-known fact and I'm supposed to use dot >> instead of sum whenever possible. > > You should use dot instead of sum when application, but for speed > reasons, essentially. > >> >> It would be great if someone could let me know why this happens. > > They don't use the same implementation, so such tiny differences are > expected - having exactly the same solution would have been surprising, > actually. You may be surprised about the difference for such a trivial > operation, but keep in mind that dot is implemented with highly > optimized CPU instructions (that is if you use ATLAS or similar library). > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Using Opera's revolutionary email client: http://www.opera.com/mail/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Nov 1 22:43:11 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 1 Nov 2010 20:43:11 -0600 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: References: Message-ID: On Mon, Nov 1, 2010 at 7:49 PM, Robert Kern wrote: > On Mon, Nov 1, 2010 at 20:21, Charles R Harris > wrote: > > > > On Mon, Nov 1, 2010 at 5:30 PM, Joon wrote: > >> > >> Hi, > >> > >> I just found that using dot instead of sum in numpy gives me better > >> results in terms of precision loss. For example, I optimized a function > with > >> scipy.optimize.fmin_bfgs. For the return value for the function, I tried > the > >> following two things: > >> > >> sum(Xb) - sum(denominator) > >> > >> and > >> > >> dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) > >> > >> Both of them are supposed to yield the same thing. But the first one > gave > >> me -589112.30492110562 and the second one gave me -589112.30492110678. > >> > >> In addition, with the routine using sum, the optimizer gave me "Warning: > >> Desired error not necessarily achieved due to precision loss." With the > >> routine with dot, the optimizer gave me "Optimization terminated > >> successfully." > >> > >> I checked the gradient value as well (I provided analytical gradient) > and > >> gradient was smaller in the dot case as well. (Of course, the the > magnitude > >> was e-5 to e-6, but still) > >> > >> I was wondering if this is well-known fact and I'm supposed to use dot > >> instead of sum whenever possible. > >> > >> It would be great if someone could let me know why this happens. > > > > Are you running on 32 bits or 64 bits? I ask because there are different > > floating point precisions on the 32 bit platform and the results can > depend > > on how the compiler does things. > > Eh, what? Are you talking about the sometimes-differing intermediate > precisions? I wasn't aware that was constrained to 32-bit processors. > > It seems to be more of a problem on 32 bits, what with the variety of sse*. OTOH, all 64 bit systems have at least sse2 available together with more sse registers and I believe the x87 instruction set is not available when running in 64 bit mode. I may be wrong about that, these nasty details are hard to verify, but that has also been my experience. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Nov 2 00:29:00 2010 From: cournape at gmail.com (David Cournapeau) Date: Tue, 2 Nov 2010 13:29:00 +0900 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: References: Message-ID: On Tue, Nov 2, 2010 at 11:43 AM, Charles R Harris wrote: > > It seems to be more of a problem on 32 bits, what with the variety of sse*. > OTOH, all 64 bit systems have at least sse2 available together with more sse > registers and I believe the x87 instruction set is not available when > running in 64 bit mode. It is available, but used more sporadically than on i386, since it is always available on amd64 architecture (at least up to SSE2). cheers, David From juanjo.gomeznavarro at gmail.com Tue Nov 2 04:31:50 2010 From: juanjo.gomeznavarro at gmail.com (Juanjo Gomez Navarro) Date: Tue, 2 Nov 2010 09:31:50 +0100 Subject: [Numpy-discussion] Path to numpy installation In-Reply-To: References: Message-ID: Ok, so in your opinion I have two independent python installations? That's possible... 
The problem is that I want to use ipython, and this interpreter seems to take the wrong version by default... Do you think it is safe just to delete the folder /System/Library/Frameworks/Python.framework to "uninstall" the wrong version? 2010/11/1 Robert Kern > On Mon, Nov 1, 2010 at 12:35, Juanjo Gomez Navarro > wrote: > > Hi, I have just updated my old version of numpy 1.01 to the version 1.5 > in > > my Mac. The problem is that the system does not seem to recognize the new > > version. > > When I type print numpy.__path__ I get > > > /System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy > > But the new version I have just installed is in > > > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy > > I don't know > > > how to change the path of the installation, or how to say python to search numpy in the new path > > Any idea??? > > When you run "python", apparently you are picking up the system's > Python executable in /System/Library, not the other Python interpreter > that you installed under /Library. You need to adjust your $PATH > environment variable to put > /Library/Frameworks/Python.framework/Versions/Current/bin before > /usr/bin. Then when you type "python", you will get this installation > of Python and all of its libraries. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Juan José Gómez Navarro Departamento de Física Centro de Investigacion en Óptica y Nanofísica (CIOyN) Universidad de Murcia Campus Espinardo E-30100 Murcia España Tel : +34 968 398552 Fax : +34 968 39 8568 Email: juanjo.gomeznavarro at gmail.com, jjgomeznavarro at um.es -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Tue Nov 2 05:05:45 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 2 Nov 2010 10:05:45 +0100 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: <4CCF76FB.80106@silveregg.co.jp> References: <4CCF76FB.80106@silveregg.co.jp> Message-ID: >> It would be great if someone could let me know why this happens. > > They don't use the same implementation, so such tiny differences are > expected - having exactly the same solution would have been surprising, > actually. You may be surprised about the difference for such a trivial > operation, but keep in mind that dot is implemented with highly > optimized CPU instructions (that is if you use ATLAS or similar library). Also, IIRC, 1.0 cannot be represented exactly as a float, so the dot way may be more wrong than the sum way. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From nadavh at visionsense.com Tue Nov 2 05:08:48 2010 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 2 Nov 2010 11:08:48 +0200 Subject: [Numpy-discussion] Precision difference between dot and sum References: <4CCF76FB.80106@silveregg.co.jp> Message-ID: 
Also, IIRC, 1.0 cannot be represented exactly as a float," Not true Nadav -----Original Message----- From: numpy-discussion-bounces at scipy.org on behalf of Matthieu Brucher Sent: Tue 02-Nov-10 11:05 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Precision difference between dot and sum >> It would be great if someone could let me know why this happens. > > They don't use the same implementation, so such tiny differences are > expected - having exactly the same solution would have been surprising, > actually. You may be surprised about the difference for such a trivial > operation, but keep in mind that dot is implemented with highly > optimized CPU instructions (that is if you use ATLAS or similar library). Also, IIRC, 1.0 cannot be represented exactly as a float, so the dot way may be more wrong than the sum way. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3356 bytes Desc: not available URL: From david at silveregg.co.jp Tue Nov 2 06:20:32 2010 From: david at silveregg.co.jp (David) Date: Tue, 02 Nov 2010 19:20:32 +0900 Subject: [Numpy-discussion] Path to numpy installation In-Reply-To: References: Message-ID: <4CCFE5F0.9020004@silveregg.co.jp> On 11/02/2010 05:31 PM, Juanjo Gomez Navarro wrote: > Ok, so in your opinion I have two independent python installations? > That's possible... The problem is that I want to use ipython, and this > interpreter seems to take the wrong version by default... > > Do you think it is safe just to delete the folder > /System/Library/Frameworks/Python.framework to ?uninstall? the wrong > version? Not really - /System is used for system stuff, as its name suggests, and removing it may break unrelated things. David From friedrichromstedt at gmail.com Tue Nov 2 07:10:25 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Tue, 2 Nov 2010 12:10:25 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 release candidate 1 In-Reply-To: References: Message-ID: Hi Ralf, 2010/11/1 Ralf Gommers : > On Wed, Oct 27, 2010 at 10:30 PM, Ralf Gommers > wrote: >> On Wed, Oct 27, 2010 at 1:23 AM, Friedrich Romstedt >> wrote: >>> I found some issues on Mac OS X 10.5 ppc in py2.5.4: > > Can you please check if this takes care of all test failures you > reported: http://github.com/rgommers/numpy/commit/2ac0be7171f. > > If not can you please adapt the patch a bit to make it work (should be > straightforward)? Your patch was fine, except for that you spelled 'powerpc' as 'ppc'. This applied also to numpy/core/tests/test_umath_complex.py, see the commits here: http://github.com/friedrichromstedt/numpy/commits/maintenance%2F1.5.1-ppc-knownfails I'm not sure if they should be marked as knownfailure instead of being skipped. Looks like legacy to me, as if knownfailureif didn't exist at the time the tests were written. I don't know if there are PowerPC platforms out there which return "ppc" from ``platform.processor()``. Could this apply also to other files? The naming of the branches is of course discussable. We can change this before pulling. Does it apply to 1.5.1 only or also to master? 
Friedrich From ralf.gommers at googlemail.com Tue Nov 2 08:36:07 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 2 Nov 2010 20:36:07 +0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 release candidate 1 In-Reply-To: References: Message-ID: On Tue, Nov 2, 2010 at 7:10 PM, Friedrich Romstedt wrote: > Hi Ralf, > > 2010/11/1 Ralf Gommers : >> On Wed, Oct 27, 2010 at 10:30 PM, Ralf Gommers >> wrote: >>> On Wed, Oct 27, 2010 at 1:23 AM, Friedrich Romstedt >>> wrote: >>>> I found some issues on Mac OS X 10.5 ppc in py2.5.4: >> >> Can you please check if this takes care of all test failures you >> reported: http://github.com/rgommers/numpy/commit/2ac0be7171f. >> >> If not can you please adapt the patch a bit to make it work (should be >> straightforward)? > > Your patch was fine, except for that you spelled 'powerpc' as 'ppc'. > This applied > also to numpy/core/tests/test_umath_complex.py, see the commits here: Thanks, good to know. I just guessed the 'ppc' part, since the stdlib docs tell me nothing and I couldn't test it. > > http://github.com/friedrichromstedt/numpy/commits/maintenance%2F1.5.1-ppc-knownfails > > I'm not sure if they should be marked as knownfailure instead of being > skipped. ?Looks like legacy to me, as if knownfailureif didn't exist > at the time the tests were written. No, that code is less than a year old. And using knowfail is the correct thing to do here. > > I don't know if there are PowerPC platforms out there which return > "ppc" from ``platform.processor()``. > > Could this apply also to other files? platform.processor is not used anywhere else, I guess because OS X bugs that don't occur on all machines are mostly 32 vs 64-bit, not i386 vs ppc. > > The naming of the branches is of course discussable. ?We can change > this before pulling. ?Does it apply to 1.5.1 only or also to master? Branch name doesn't matter, it should be a single fast-forward commit. Applies to master, will update fix and send a pull request for that. Cheers, Ralf From ralf.gommers at googlemail.com Tue Nov 2 09:37:14 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 2 Nov 2010 21:37:14 +0800 Subject: [Numpy-discussion] please test/review: Scipy on OS X with Python 2.7 Message-ID: Hi, If you had an issue recently trying to compile scipy on OS X, can you please try to install numpy from http://github.com/rgommers/numpy/commits/farchs and then compile scipy? A quick review from a numpy.distutils expert would also be very welcome. Related (long) discussion at http://projects.scipy.org/numpy/ticket/1399. Thanks, Ralf From friedrichromstedt at gmail.com Tue Nov 2 10:01:33 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Tue, 2 Nov 2010 15:01:33 +0100 Subject: [Numpy-discussion] Knownfails due to platform and numpy.c99 Message-ID: Hello, This is a NEPP (proposal for a numpy enhancement proposal). It's quite near to Depp, which means idiot in german, but I hope that's not the case. Some tests fail currently because of platform issues. I.e., the platform clib or similar supplies some partly broken implementation which we use. The current approach is to mark those tests on those platforms as K fail. On the other hand, there are platforms which *don't* supply the functions, and there a numpy own implementation chimes in. (Afaik.) 1. We want to test the numpy own implementations, and not the OS supplied ones. This can be seen by the fact that we regard a test not as failing (or, as K fail) if it is due to OS issues. 
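(For reference, the per-platform "K fail" marking mentioned here is the decorator pattern from the 1.5.1 thread above; a rough sketch of the idea only, with a made-up test body and message:)

    import platform
    import numpy as np
    from numpy.testing import dec

    # Condition taken from the patch discussed above: platform.processor()
    # reports 'powerpc' (not 'ppc') on the affected Mac OS X 10.5 machines.
    on_powerpc = platform.processor() == 'powerpc'

    @dec.knownfailureif(on_powerpc, "C99 corner cases fail in this platform's libm")
    def test_some_corner_case():
        # Placeholder body, only to illustrate where the decorator goes.
        assert np.isfinite(np.exp(0.0))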
Meaning we only really want to take care about those systems where *our* implementation is used. 2. I propose to export the numpy own implementations *always* under a distinct name, e.g. numpy.c99.exp(). 3. And to test those functions on all platforms, they never have to fail. 4. And to not test the OS supplied versions, since we don't care about them failing anyway. Implementation a) always define the C level functions, but under another name not conflicting with the OS defined ones. E.g.: numpy_cexp(). b) If the OS supplies a function, okay, use it. c) If the OS doesn't supply the function, then define it as a wrapper, which just calls numpy_*(). If a Python user needs corner case safety, he would be advised to use the numpy.c99.*, e.g. numpy.c99.exp(). For C level users, it would be to use numpy_*(), e.g. numpy_cexp() instead of cexp(). Both functions would always be supplied, but do not need to be identical. Using the C99 compliant version might cause speed tradeoff. :: // Include the system header: #include <...> // Define the C99 compliant version: void numpy_cexp(float r, float i, float *or, float *oi){ ...; } // Wrap the C99 compliant version in the usual-named function if necessary ... #ifndef HAVE_CEXP void cexp(...){ wrap(); } #endif ... export numpy_cexp() as ``numpy.c99.exp`` ... ... export cexp() as ``numpy.exp`` -- The following might not be feasible if we *have to* include the system headers in any way. Ralf suggested the very good addition, to support a compilation flag -DC99COMPLIANT or similar, in that case, numpy headers would not incude the system ones (assumed they do), and would always define the functions using the numpy_*() ones. :: // Do not include the system headers if requested so: #ifndef C99COMPLIANT #include <...> #endif // Define the C99 compliant version: void numpy_cexp(float r, float i, float *or, float *oi){ ...; } // Wrap the C99 compliant version in the usual-named function if necessary ... #ifdef C99COMPLIANT void cexp(...){ wrap(); } #else #ifndef HAVE_CEXP void cexp(...){ wrap(); } #endif #endif ... export numpy_cexp() as ``numpy.c99.exp`` ... ... export cexp() as ``numpy.exp`` -- Friedrich From bsouthey at gmail.com Tue Nov 2 10:05:47 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 02 Nov 2010 09:05:47 -0500 Subject: [Numpy-discussion] Precision difference between dot and sum In-Reply-To: References: <4CCF76FB.80106@silveregg.co.jp> Message-ID: <4CD01ABB.7080405@gmail.com> On 11/01/2010 09:39 PM, Joon wrote: > Thanks for the replies. > > I tried several stuff like changing dot into sum in the gradient > calculations just to see how they change the results, but it seems > that part of the code is the only place where the results get affected > by the choice of dot/sum. > > I am using 64bit machine and EPD python (I think it uses Intel MKL) so > that could have affected the calculation. I will use dot whenever > possible from now on. :) > > -Joon > > > > On Mon, 01 Nov 2010 21:27:07 -0500, David wrote: > > > On 11/02/2010 08:30 AM, Joon wrote: > >> Hi, > >> > >> I just found that using dot instead of sum in numpy gives me better > >> results in terms of precision loss. For example, I optimized a function > >> with scipy.optimize.fmin_bfgs. For the return value for the function, I > >> tried the following two things: > >> > >> sum(Xb) - sum(denominator) > >> > >> and > >> > >> dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator) > >> > >> Both of them are supposed to yield the same thing. 
But the first one > >> gave me -589112.30492110562 and the second one gave me > >> -589112.30492110678. > > > > Those are basically the same number: the minimal spacing between two > > double floats at this amplitude is ~ 1e-10 (given by the function > > np.spacing(the_number)), which is essentially the order of magnitude of > > the difference between your two numbers. > > > >> I was wondering if this is well-known fact and I'm supposed to use dot > >> instead of sum whenever possible. > > > > You should use dot instead of sum when application, but for speed > > reasons, essentially. > > > >> > >> It would be great if someone could let me know why this happens. > > > > They don't use the same implementation, so such tiny differences are > > expected - having exactly the same solution would have been surprising, > > actually. You may be surprised about the difference for such a trivial > > operation, but keep in mind that dot is implemented with highly > > optimized CPU instructions (that is if you use ATLAS or similar > library). > > > > cheers, > > > > David > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > -- > Using Opera's revolutionary email client: http://www.opera.com/mail/ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Well, a 64-bit machine is somewhat irrelevant if you are running 32-bit application or the OS is limiting... The overall difference is the combination of numerical 'errors' of two summations and a subtraction. Obviously depending on the numerical accuracy of the algorithm, so, while the individual operations can be expected to be within numerical 'error', the overall result is not. You need to be comparing the two separate summations: sum(Xb) vs dot(ones(Xb.shape), Xb) and sum(denominator) vs dot(ones(denominator.shape), denominator). If these individual summations differ above numerical limits then you should look at numerically sound approaches. The simplest ways are to use the dtype argument in sum function or do your computations in a higher precision if your OS and Python versions allow it. Otherwise you should look for a numerical sound way to do the summation. Somewhat related is that the large value of the result suggest that you should be standardizing your original data especially if the sum(Xb) is large. This should help with convergence and numerical precision. A 'large' value for sum(Xb) can also indicate a poor fit so you should also check convergence, estimates and overall fit (especially outliers). Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 2 10:39:07 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 2 Nov 2010 09:39:07 -0500 Subject: [Numpy-discussion] Path to numpy installation In-Reply-To: References: Message-ID: On Tue, Nov 2, 2010 at 03:31, Juanjo Gomez Navarro wrote: > ?Ok, so in your opinion I have two independent python installations? That's > possible... It's certain. > The problem is that I want to use ipython, and this interpreter > seems to take the wrong version by default... That just means that you used the system Python to install ipython. 
Put the /Library/Frameworks/.../bin at the front of your $PATH like I explained before and reinstall IPython and numpy and all of the other packages you want to use using that python executable. Double-check by executing $ which python /Library/Frameworks/Python.framework/Versions/Current/bin/python > Do you think it is safe just to delete the folder > /System/Library/Frameworks/Python.framework to ?uninstall? the wrong > version? No! Do not touch that! It is used by OS X. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From vincent at vincentdavis.net Tue Nov 2 21:39:10 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 2 Nov 2010 19:39:10 -0600 Subject: [Numpy-discussion] [SciPy-User] please test/review: Scipy on OS X with Python 2.7 In-Reply-To: References: Message-ID: On Tue, Nov 2, 2010 at 7:37 AM, Ralf Gommers wrote: > Hi, > > If you had an issue recently trying to compile scipy on OS X, can you > please try to install numpy from > http://github.com/rgommers/numpy/commits/farchs and then compile scipy? > numpy tests OK (KNOWNFAIL=4, SKIP=1) Scipy build (did not look into this yet and have to say I am not real familiar with the issue) python2.7 setup.py build error: Command "c++ -fno-strict-aliasing -fno-common -dynamic -isysroot /Developer/SDKs/MacOSX10.4u.sdk -arch ppc -arch i386 -g -O2 -DNDEBUG -g -O3 -Iscipy/interpolate/src -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c scipy/interpolate/src/_interpolate.cpp -o build/temp.macosx-10.3-fat-2.7/scipy/interpolate/src/_interpolate.o" failed with exit status 1 Trying with LDFLAGS="-arch x86_64" FFLAGS="-arch x86_64" py27 setupscons.py scons scons: Reading SConscript files ... Mkdir("build/scons/scipy/integrate") Checking if gfortran needs dummy main - Failed ! Exception: Could not find F77 BLAS, needed for integrate package: File "/Volumes/max/Downloads/scipy/scipy/integrate/SConstruct", line 2: GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript') File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numscons/core/numpyenv.py", line 135: build_dir = '$build_dir', src_dir = '$src_dir') File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numscons/scons-local/scons-local-1.2.0/SCons/Script/SConscript.py", line 553: return apply(_SConscript, [self.fs,] + files, subst_kw) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numscons/scons-local/scons-local-1.2.0/SCons/Script/SConscript.py", line 262: exec _file_ in call_stack[-1].globals File "/Volumes/max/Downloads/scipy/build/scons/scipy/integrate/SConscript", line 15: raise Exception("Could not find F77 BLAS, needed for integrate package") error: Error while executing scons command. See above for more information. If you think it is a problem in numscons, you can also try executing the scons command with --log-level option for more detailed output of what numscons is doing, for example --log-level=0; the lowest the level is, the more detailed the output it. > > A quick review from a numpy.distutils expert would also be very > welcome. Related (long) discussion at > http://projects.scipy.org/numpy/ticket/1399. 
> > Thanks, > Ralf > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Thanks Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Tue Nov 2 22:02:17 2010 From: Nikolaus at rath.org (Nikolaus Rath) Date: Tue, 02 Nov 2010 22:02:17 -0400 Subject: [Numpy-discussion] List with numpy semantics In-Reply-To: (Gerrit Holl's message of "Sun, 31 Oct 2010 18:17:42 +0100") References: <878w1e738j.fsf@vostro.rath.org> Message-ID: <8762wfnp06.fsf@vostro.rath.org> Gerrit Holl writes: > On 31 October 2010 17:10, Nikolaus Rath wrote: >> Hello, >> >> I have a couple of numpy arrays which belong together. Unfortunately >> they have different dimensions, so I can't bundle them into a higher >> dimensional array. >> >> My solution was to put them into a Python list instead. But >> unfortunately this makes it impossible to use any ufuncs. >> >> Has someone else encountered a similar problem and found a nice >> solution? Something like a numpy list maybe? > > You could try a record array with a clever dtype, maybe? It seems that this requires more cleverness than I have... Could you give me an example? How do I replace l in the following code with a record array? l = list() l.append(np.arange(3)) l.append(np.arange(42)) l.append(np.arange(9)) for i in range(len(l)): l[i] += 32 Thanks, -Nikolaus -- ?Time flies like an arrow, fruit flies like a Banana.? PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C From josef.pktd at gmail.com Tue Nov 2 22:21:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 2 Nov 2010 22:21:37 -0400 Subject: [Numpy-discussion] List with numpy semantics In-Reply-To: <8762wfnp06.fsf@vostro.rath.org> References: <878w1e738j.fsf@vostro.rath.org> <8762wfnp06.fsf@vostro.rath.org> Message-ID: On Tue, Nov 2, 2010 at 10:02 PM, Nikolaus Rath wrote: > Gerrit Holl writes: >> On 31 October 2010 17:10, Nikolaus Rath wrote: >>> Hello, >>> >>> I have a couple of numpy arrays which belong together. Unfortunately >>> they have different dimensions, so I can't bundle them into a higher >>> dimensional array. >>> >>> My solution was to put them into a Python list instead. But >>> unfortunately this makes it impossible to use any ufuncs. >>> >>> Has someone else encountered a similar problem and found a nice >>> solution? Something like a numpy list maybe? >> >> You could try a record array with a clever dtype, maybe? > > It seems that this requires more cleverness than I have... Could you > give me an example? How do I replace l in the following code with a > record array? > > l = list() > l.append(np.arange(3)) > l.append(np.arange(42)) > l.append(np.arange(9)) > > for i in range(len(l)): > ? l[i] += 32 Depending on how you want to use it, it might be more convenient to use masked arrays or fill with nan (like pandas and larry) to get a rectangular array. it might be more convenient for some things, but if the sizes differ a lot then it might not be more efficient. Josef > > Thanks, > > ? -Nikolaus > > -- > ??Time flies like an arrow, fruit flies like a Banana.? 
> > ?PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 ?02CF A9AD B7F8 AE4E 425C > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Tue Nov 2 22:31:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 2 Nov 2010 22:31:54 -0400 Subject: [Numpy-discussion] List with numpy semantics In-Reply-To: References: <878w1e738j.fsf@vostro.rath.org> <8762wfnp06.fsf@vostro.rath.org> Message-ID: On Tue, Nov 2, 2010 at 10:21 PM, wrote: > On Tue, Nov 2, 2010 at 10:02 PM, Nikolaus Rath wrote: >> Gerrit Holl writes: >>> On 31 October 2010 17:10, Nikolaus Rath wrote: >>>> Hello, >>>> >>>> I have a couple of numpy arrays which belong together. Unfortunately >>>> they have different dimensions, so I can't bundle them into a higher >>>> dimensional array. >>>> >>>> My solution was to put them into a Python list instead. But >>>> unfortunately this makes it impossible to use any ufuncs. >>>> >>>> Has someone else encountered a similar problem and found a nice >>>> solution? Something like a numpy list maybe? >>> >>> You could try a record array with a clever dtype, maybe? >> >> It seems that this requires more cleverness than I have... Could you >> give me an example? How do I replace l in the following code with a >> record array? >> >> l = list() >> l.append(np.arange(3)) >> l.append(np.arange(42)) >> l.append(np.arange(9)) >> >> for i in range(len(l)): >> ? l[i] += 32 > > Depending on how you want to use it, it might be more convenient to > use masked arrays or fill with nan (like pandas and larry) to get a > rectangular array. it might be more convenient for some things, but if > the sizes differ a lot then it might not be more efficient. another option I sometimes use (e.g. for unbalanced panel data), is to just stack them on top of each other into one long 1d array, and keep track which is which, e.g. keeping the (start-end) indices or using an indicator array. For example, with an integer label array np.bincount is very fast to work with it. This is mainly an advantage if there are many short arrays and many operations have to applied to all of them. Josef > Josef > >> >> Thanks, >> >> ? -Nikolaus >> >> -- >> ??Time flies like an arrow, fruit flies like a Banana.? >> >> ?PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 ?02CF A9AD B7F8 AE4E 425C >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From josef.pktd at gmail.com Tue Nov 2 22:40:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 2 Nov 2010 22:40:24 -0400 Subject: [Numpy-discussion] List with numpy semantics In-Reply-To: References: <878w1e738j.fsf@vostro.rath.org> <8762wfnp06.fsf@vostro.rath.org> Message-ID: On Tue, Nov 2, 2010 at 10:31 PM, wrote: > On Tue, Nov 2, 2010 at 10:21 PM, ? wrote: >> On Tue, Nov 2, 2010 at 10:02 PM, Nikolaus Rath wrote: >>> Gerrit Holl writes: >>>> On 31 October 2010 17:10, Nikolaus Rath wrote: >>>>> Hello, >>>>> >>>>> I have a couple of numpy arrays which belong together. Unfortunately >>>>> they have different dimensions, so I can't bundle them into a higher >>>>> dimensional array. >>>>> >>>>> My solution was to put them into a Python list instead. But >>>>> unfortunately this makes it impossible to use any ufuncs. >>>>> >>>>> Has someone else encountered a similar problem and found a nice >>>>> solution? 
Something like a numpy list maybe? >>>> >>>> You could try a record array with a clever dtype, maybe? >>> >>> It seems that this requires more cleverness than I have... Could you >>> give me an example? How do I replace l in the following code with a >>> record array? >>> >>> l = list() >>> l.append(np.arange(3)) >>> l.append(np.arange(42)) >>> l.append(np.arange(9)) >>> >>> for i in range(len(l)): >>> ? l[i] += 32 >> >> Depending on how you want to use it, it might be more convenient to >> use masked arrays or fill with nan (like pandas and larry) to get a >> rectangular array. it might be more convenient for some things, but if >> the sizes differ a lot then it might not be more efficient. > > another option I sometimes use (e.g. for unbalanced panel data), is to > just stack them on top of each other into one long 1d array, and keep > track which is which, e.g. keeping the (start-end) indices or using an > indicator array. For example, with an integer label array np.bincount > is very fast to work with it. > This is mainly an advantage if there are many short arrays and many > operations have to applied to all of them. And the third option, also often used with panel data, is to stack them sparse, like an unbalanced kronecker product, on top but next to each other and fill the empty space with zeros. Josef (typing is faster than thinking) > Josef > >> Josef >> >>> >>> Thanks, >>> >>> ? -Nikolaus >>> >>> -- >>> ??Time flies like an arrow, fruit flies like a Banana.? >>> >>> ?PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 ?02CF A9AD B7F8 AE4E 425C >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > From numpy at mspacek.mm.st Wed Nov 3 04:13:11 2010 From: numpy at mspacek.mm.st (Martin Spacek) Date: Wed, 03 Nov 2010 01:13:11 -0700 Subject: [Numpy-discussion] ~2**32 byte tofile()/fromfile() limit in 64-bit Windows? Message-ID: I just opened a new ticket (http://projects.scipy.org/numpy/ticket/1660), but I thought I'd bring it up here as well. I can't seem to get tofile() or save() to write anything much bigger than a 2**32 byte array to a file in Py 2.6.6 on 64-bit Windows. They both hang with no errors. Also, fromfile() throws an "IOError: could not seek in file" on > 2**32-1 byte files, although load() seems to work fine on any size of file. I'm a bit surprised I haven't stumbled across these problems before. I've tested as far back as 1.4.1. Is this a known issue? (the ticket has much more detail) Cheers, Martin From schut at sarvision.nl Wed Nov 3 04:54:36 2010 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 03 Nov 2010 09:54:36 +0100 Subject: [Numpy-discussion] large float32 array issue Message-ID: Hi, I'm running in this strange issue when using some pretty large float32 arrays. In the following code I create a large array filled with ones, and calculate mean and sum, first with a float64 version, then with a float32 version. Note the difference between the two. 
NB the float64 version is obviously right :-) In [2]: areaGrid = numpy.ones((11334, 16002)) In [3]: print(areaGrid.dtype) float64 In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), areaGrid.mean(), areaGrid.sum()) ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0) In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32) In [6]: print(areaGrid.dtype) float32 In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), areaGrid.mean(), areaGrid.sum()) ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0) Can anybody confirm this? And better: explain it? Am I running into a for me till now hidden ieee float 'feature'? Or is it a bug somewhere? Btw I'd like to use float32 arrays, as precision is not really an issue in this case, but memory usage is... This is using python 2.7, numpy from git (yesterday's checkout), on arch linux 64bit. Best, Vincent. From juanjo.gomeznavarro at gmail.com Wed Nov 3 05:35:57 2010 From: juanjo.gomeznavarro at gmail.com (Juanjo Gomez Navarro) Date: Wed, 3 Nov 2010 10:35:57 +0100 Subject: [Numpy-discussion] Path to numpy installation In-Reply-To: References: Message-ID: Ups, sorry, but I deleted everything just after sending the email xD. I deleted everything related to Python in the Macbook and reinstalled again from scratch (python 2.5, numpy, ipython and matplotlib). In this way I managed to make it work properly, and, at least for the moment, it seems that I have not broken anything... I guess the problem will come when I less expect. Anyway, thanks for the help. For the time being everything works as it should. 2010/11/2 Robert Kern > On Tue, Nov 2, 2010 at 03:31, Juanjo Gomez Navarro > wrote: > > Ok, so in your opinion I have two independent python installations? > That's > > possible... > > It's certain. > > > The problem is that I want to use ipython, and this interpreter > > seems to take the wrong version by default... > > That just means that you used the system Python to install ipython. > Put the /Library/Frameworks/.../bin at the front of your $PATH like I > explained before and reinstall IPython and numpy and all of the other > packages you want to use using that python executable. Double-check by > executing > > $ which python > /Library/Frameworks/Python.framework/Versions/Current/bin/python > > > Do you think it is safe just to delete the folder > > /System/Library/Frameworks/Python.framework to ?uninstall? the wrong > > version? > > No! Do not touch that! It is used by OS X. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Juan Jos? G?mez Navarro Departamento de F?sica Centro de Investigacion en ?ptica y Nanof?sica (CIOyN) Universidad de Murcia Campus Espinardo E-30100 Murcia Espa?a Tel : +34 968 398552 Fax : +34 968 39 8568 Email: juanjo.gomeznavarro at gmail.com, jjgomeznavarro at um.es -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Wed Nov 3 06:55:12 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 3 Nov 2010 18:55:12 +0800 Subject: [Numpy-discussion] [SciPy-User] please test/review: Scipy on OS X with Python 2.7 In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 9:39 AM, Vincent Davis wrote: > > On Tue, Nov 2, 2010 at 7:37 AM, Ralf Gommers > wrote: >> >> Hi, >> >> If you had an issue recently trying to compile scipy on OS X, can you >> please try to install numpy from >> http://github.com/rgommers/numpy/commits/farchs and then compile scipy? > > numpy tests > OK (KNOWNFAIL=4, SKIP=1) > > Scipy build (did not look into this yet and have to say I am not real > familiar with the issue) > python2.7 setup.py build > > error: Command "c++ -fno-strict-aliasing -fno-common -dynamic -isysroot > /Developer/SDKs/MacOSX10.4u.sdk -arch ppc -arch i386 -g -O2 -DNDEBUG -g -O3 > -Iscipy/interpolate/src > -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include > -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c > scipy/interpolate/src/_interpolate.cpp -o > build/temp.macosx-10.3-fat-2.7/scipy/interpolate/src/_interpolate.o" failed > with exit status 1 This is an unrelated issue to the patch to be tested (that's about Fortran code), I'm guessing you're on 10.6 here and c++ is version 4.2 (should be 4.0). Try "$ export CXX=/usr/bin/c++-4.0". If this doesn't work let's discuss offline. > Trying with > LDFLAGS="-arch x86_64" FFLAGS="-arch x86_64" py27 setupscons.py scons >From your arch flags above you have the 10.3 python.org binary of 2.7 active, which does not have x86_64. So this certainly can't work. Cheers, Ralf From warren.weckesser at enthought.com Wed Nov 3 06:59:08 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 3 Nov 2010 05:59:08 -0500 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 3:54 AM, Vincent Schut wrote: > Hi, I'm running in this strange issue when using some pretty large > float32 arrays. In the following code I create a large array filled with > ones, and calculate mean and sum, first with a float64 version, then > with a float32 version. Note the difference between the two. NB the > float64 version is obviously right :-) > > > > In [2]: areaGrid = numpy.ones((11334, 16002)) > In [3]: print(areaGrid.dtype) > float64 > In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > areaGrid.mean(), areaGrid.sum()) > ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0) > > > In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32) > In [6]: print(areaGrid.dtype) > float32 > In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > areaGrid.mean(), areaGrid.sum()) > ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0) > > > Can anybody confirm this? And better: explain it? Am I running into a > for me till now hidden ieee float 'feature'? Or is it a bug somewhere? > > Btw I'd like to use float32 arrays, as precision is not really an issue > in this case, but memory usage is... > > > This is using python 2.7, numpy from git (yesterday's checkout), on arch > linux 64bit. > > The problem kicks in with an array of ones of size 2**24. 
Note that np.float32(2**24) + np.float32(1.0) equals np.float32(2**24): In [41]: b = np.ones(2**24, np.float32) In [42]: b.size, b.sum() Out[42]: (16777216, 16777216.0) In [43]: b = np.ones(2**24+1, np.float32) In [44]: b.size, b.sum() Out[44]: (16777217, 16777216.0) In [45]: np.spacing(np.float32(2**24)) Out[45]: 2.0 In [46]: np.float32(2**24) + np.float32(1) Out[46]: 16777216.0 Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Wed Nov 3 07:31:31 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 3 Nov 2010 06:31:31 -0500 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 5:59 AM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Wed, Nov 3, 2010 at 3:54 AM, Vincent Schut wrote: > >> Hi, I'm running in this strange issue when using some pretty large >> float32 arrays. In the following code I create a large array filled with >> ones, and calculate mean and sum, first with a float64 version, then >> with a float32 version. Note the difference between the two. NB the >> float64 version is obviously right :-) >> >> >> >> In [2]: areaGrid = numpy.ones((11334, 16002)) >> In [3]: print(areaGrid.dtype) >> float64 >> In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), >> areaGrid.mean(), areaGrid.sum()) >> ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0) >> >> >> In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32) >> In [6]: print(areaGrid.dtype) >> float32 >> In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), >> areaGrid.mean(), areaGrid.sum()) >> ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0) >> >> >> Can anybody confirm this? And better: explain it? Am I running into a >> for me till now hidden ieee float 'feature'? Or is it a bug somewhere? >> >> Btw I'd like to use float32 arrays, as precision is not really an issue >> in this case, but memory usage is... >> >> >> This is using python 2.7, numpy from git (yesterday's checkout), on arch >> linux 64bit. >> >> > > The problem kicks in with an array of ones of size 2**24. Note that > np.float32(2**24) + np.float32(1.0) equals np.float32(2**24): > > > In [41]: b = np.ones(2**24, np.float32) > > In [42]: b.size, b.sum() > Out[42]: (16777216, 16777216.0) > > In [43]: b = np.ones(2**24+1, np.float32) > > In [44]: b.size, b.sum() > Out[44]: (16777217, 16777216.0) > > In [45]: np.spacing(np.float32(2**24)) > Out[45]: 2.0 > > In [46]: np.float32(2**24) + np.float32(1) > Out[46]: 16777216.0 > > > By the way, you can override the dtype of the accumulator of the mean() function: In [61]: a = np.ones((11334,16002),np.float32) In [62]: a.mean() # Not correct Out[62]: 0.092504406598019437 In [63]: a.mean(dtype=np.float64) Out[63]: 1.0 Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From schut at sarvision.nl Wed Nov 3 07:39:08 2010 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 03 Nov 2010 12:39:08 +0100 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: On 11/03/2010 12:31 PM, Warren Weckesser wrote: > > > On Wed, Nov 3, 2010 at 5:59 AM, Warren Weckesser > > > wrote: > > > > On Wed, Nov 3, 2010 at 3:54 AM, Vincent Schut > wrote: > > Hi, I'm running in this strange issue when using some pretty large > float32 arrays. 
In the following code I create a large array > filled with > ones, and calculate mean and sum, first with a float64 version, then > with a float32 version. Note the difference between the two. NB the > float64 version is obviously right :-) > > > > In [2]: areaGrid = numpy.ones((11334, 16002)) > In [3]: print(areaGrid.dtype) > float64 > In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > areaGrid.mean(), areaGrid.sum()) > ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0) > > > In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32) > In [6]: print(areaGrid.dtype) > float32 > In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > areaGrid.mean(), areaGrid.sum()) > ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0) > > > Can anybody confirm this? And better: explain it? Am I running > into a > for me till now hidden ieee float 'feature'? Or is it a bug > somewhere? > > Btw I'd like to use float32 arrays, as precision is not really > an issue > in this case, but memory usage is... > > > This is using python 2.7, numpy from git (yesterday's checkout), > on arch > linux 64bit. > > > > The problem kicks in with an array of ones of size 2**24. Note that > np.float32(2**24) + np.float32(1.0) equals np.float32(2**24): > > > In [41]: b = np.ones(2**24, np.float32) > > In [42]: b.size, b.sum() > Out[42]: (16777216, 16777216.0) > > In [43]: b = np.ones(2**24+1, np.float32) > > In [44]: b.size, b.sum() > Out[44]: (16777217, 16777216.0) > > In [45]: np.spacing(np.float32(2**24)) > Out[45]: 2.0 > > In [46]: np.float32(2**24) + np.float32(1) > Out[46]: 16777216.0 > > > > > By the way, you can override the dtype of the accumulator of the mean() > function: > > In [61]: a = np.ones((11334,16002),np.float32) > > In [62]: a.mean() # Not correct > Out[62]: 0.092504406598019437 > > In [63]: a.mean(dtype=np.float64) > Out[63]: 1.0 Thanks for this. That at least gives me a temporary solution (I actually need sum() instead of mean(), but the trick works for sum too). Btw, should I file a bug on this? Vincent. > > > Warren > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Wed Nov 3 07:52:21 2010 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 3 Nov 2010 11:52:21 +0000 (UTC) Subject: [Numpy-discussion] large float32 array issue References: Message-ID: Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote: [clip] > Btw, should I file a bug on this? One can argue that mean() and sum() should use a numerically stabler algorithm, so yes, a bug can be filed if there is not yet one already. -- Pauli Virtanen From warren.weckesser at enthought.com Wed Nov 3 07:52:58 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 3 Nov 2010 06:52:58 -0500 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 6:39 AM, Vincent Schut wrote: > > > On 11/03/2010 12:31 PM, Warren Weckesser wrote: > > > > > > On Wed, Nov 3, 2010 at 5:59 AM, Warren Weckesser > > > > > wrote: > > > > > > > > On Wed, Nov 3, 2010 at 3:54 AM, Vincent Schut > > wrote: > > > > Hi, I'm running in this strange issue when using some pretty > large > > float32 arrays. In the following code I create a large array > > filled with > > ones, and calculate mean and sum, first with a float64 version, > then > > with a float32 version. Note the difference between the two. 
NB > the > > float64 version is obviously right :-) > > > > > > > > In [2]: areaGrid = numpy.ones((11334, 16002)) > > In [3]: print(areaGrid.dtype) > > float64 > > In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > > areaGrid.mean(), areaGrid.sum()) > > ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0) > > > > > > In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32) > > In [6]: print(areaGrid.dtype) > > float32 > > In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > > areaGrid.mean(), areaGrid.sum()) > > ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0) > > > > > > Can anybody confirm this? And better: explain it? Am I running > > into a > > for me till now hidden ieee float 'feature'? Or is it a bug > > somewhere? > > > > Btw I'd like to use float32 arrays, as precision is not really > > an issue > > in this case, but memory usage is... > > > > > > This is using python 2.7, numpy from git (yesterday's checkout), > > on arch > > linux 64bit. > > > > > > > > The problem kicks in with an array of ones of size 2**24. Note that > > np.float32(2**24) + np.float32(1.0) equals np.float32(2**24): > > > > > > In [41]: b = np.ones(2**24, np.float32) > > > > In [42]: b.size, b.sum() > > Out[42]: (16777216, 16777216.0) > > > > In [43]: b = np.ones(2**24+1, np.float32) > > > > In [44]: b.size, b.sum() > > Out[44]: (16777217, 16777216.0) > > > > In [45]: np.spacing(np.float32(2**24)) > > Out[45]: 2.0 > > > > In [46]: np.float32(2**24) + np.float32(1) > > Out[46]: 16777216.0 > > > > > > > > > > By the way, you can override the dtype of the accumulator of the mean() > > function: > > > > In [61]: a = np.ones((11334,16002),np.float32) > > > > In [62]: a.mean() # Not correct > > Out[62]: 0.092504406598019437 > > > > In [63]: a.mean(dtype=np.float64) > > Out[63]: 1.0 > > Thanks for this. That at least gives me a temporary solution (I actually > need sum() instead of mean(), but the trick works for sum too). > sum() also has the dtype argument. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Nov 3 09:36:15 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 3 Nov 2010 07:36:15 -0600 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 5:52 AM, Pauli Virtanen wrote: > Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote: > [clip] > > Btw, should I file a bug on this? > > One can argue that mean() and sum() should use a numerically stabler > algorithm, so yes, a bug can be filed if there is not yet one already. > > There is a ticket for the mean and variance methods, sum could be added to the list. I assigned the ticket to myself but don't see that I'll have much time until the end of Nov. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Wed Nov 3 10:04:18 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 03 Nov 2010 09:04:18 -0500 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: <4CD16BE2.4050408@gmail.com> On 11/03/2010 06:52 AM, Pauli Virtanen wrote: > Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote: > [clip] >> Btw, should I file a bug on this? > One can argue that mean() and sum() should use a numerically stabler > algorithm, so yes, a bug can be filed if there is not yet one already. > This is a 'user bug' not a numpy bug because it is a well known numerical problem. 
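For readers wondering what a "numerically stabler algorithm" for sum() might look like, one commonly suggested approach is pairwise (divide-and-conquer) summation, where partial sums of comparable magnitude are combined instead of feeding every element into a single running total. The sketch below is purely illustrative: the function name pairwise_sum and the block size are invented here, and this is not what numpy.sum() does.

import numpy as np

def pairwise_sum(a, block=65536):
    # Sum small blocks directly, then add the two halves of the array
    # recursively; the rounding error grows roughly like O(log n)
    # instead of O(n) for a naive left-to-right accumulation.
    if a.size <= block:
        return np.add.reduce(a, dtype=a.dtype)
    mid = a.size // 2
    return pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block)

a = np.ones(2**25, np.float32)
print(a.sum())           # 16777216.0 -- the running float32 total stalls at 2**24
print(pairwise_sum(a))   # 33554432.0 -- correct, still using only float32 arithmetic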
I recall that we have had this type of discussion before that has resulted in these functions being left as they are. The numerical problem is mentioned better in the np.mean docstring than the np.sum docstring. My understanding was that any new algorithm has to be better than the current algorithm especially in speed and accuracy across 'typical' numpy problems across the different Python and OS versions not just for numerically challenged cases. For example, I would not want to sacrifice speed if I achieve the same accuracy without losing as much speed as just changing the dtype to float128 (as I use x86_64 Linux). Also in Warren's mean example, this is simply a 32-bit error because it disappears when using 64-bit (numpy's default) - well, until we reach the extreme 64-bit values. >>> np.ones((11334,16002)).mean() 1.0 >>> np.ones((11334,16002),np.float32).mean() 0.092504406598019437 >>> np.ones((11334,16002),np.float32).mean().dtype dtype('float64') Note that there is probably a bug in np.mean because a 64-bit dtype is returned for integers and 32-bit or lower precision floats. So upcast is not apparently being done on the accumulator. Bruce From vincent at vincentdavis.net Wed Nov 3 10:18:31 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 3 Nov 2010 08:18:31 -0600 Subject: [Numpy-discussion] [SciPy-User] please test/review: Scipy on OS X with Python 2.7 In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 4:55 AM, Ralf Gommers wrote: > On Wed, Nov 3, 2010 at 9:39 AM, Vincent Davis > wrote: > > > > On Tue, Nov 2, 2010 at 7:37 AM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> Hi, > >> > >> If you had an issue recently trying to compile scipy on OS X, can you > >> please try to install numpy from > >> http://github.com/rgommers/numpy/commits/farchs and then compile scipy? > > > > numpy tests > > OK (KNOWNFAIL=4, SKIP=1) > > > > Scipy build (did not look into this yet and have to say I am not real > > familiar with the issue) > > python2.7 setup.py build > > > > error: Command "c++ -fno-strict-aliasing -fno-common -dynamic -isysroot > > /Developer/SDKs/MacOSX10.4u.sdk -arch ppc -arch i386 -g -O2 -DNDEBUG -g > -O3 > > -Iscipy/interpolate/src > > > -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include > > -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c > > scipy/interpolate/src/_interpolate.cpp -o > > build/temp.macosx-10.3-fat-2.7/scipy/interpolate/src/_interpolate.o" > failed > > with exit status 1 > > This is an unrelated issue to the patch to be tested (that's about > Fortran code), I'm guessing you're on 10.6 here and c++ is version 4.2 > (should be 4.0). Try "$ export CXX=/usr/bin/c++-4.0". If this doesn't > work let's discuss offline. > Ok I am a little more awake now and not in zombi mode. That is correct osx 10.6 c++4.2, Doing it again and maybe correctly with python-2.7-macosx10.5 and c++4.0. full scipy test results here https://gist.github.com/661125 summary ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/__init__.py", line 10, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/basic.py", line 11, in import _fftpack ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/integrate/__init__.py", line 7, in from quadrature import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/integrate/quadrature.py", line 5, in from scipy.special.orthogonal import p_roots File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/basic.py", line 6, in from _cephes import * ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/__init__.py", line 7, in from interpolate import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/interpolate.py", line 13, in import scipy.special as spec File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/basic.py", line 6, in from _cephes import * ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/fblas.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/fblas.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/__init__.py", line 9, in import fblas ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/fblas.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/blas/fblas.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/lapack/calc_lwork.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/lapack/calc_lwork.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/lapack/__init__.py", line 9, in import calc_lwork ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/lapack/calc_lwork.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/lib/lapack/calc_lwork.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/flapack.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/flapack.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/__init__.py", line 9, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/basic.py", line 16, in from lapack import get_lapack_funcs File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/lapack.py", line 14, in from scipy.linalg import flapack ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/flapack.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/flapack.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/maxentropy/__init__.py", line 2, in from maxentropy import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/maxentropy/maxentropy.py", line 74, in from scipy import optimize File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/__init__.py", line 7, in from optimize import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 28, in from linesearch import \ File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/linesearch.py", line 1, in from scipy.optimize import minpack2 ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/odr/__odrpack.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/odr/__odrpack.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/odr/__init__.py", line 11, in import odrpack File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/odr/odrpack.py", line 103, in from scipy.odr import __odrpack ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/odr/__odrpack.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/odr/__odrpack.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/__init__.py", line 7, in from optimize import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 28, in from linesearch import \ File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/linesearch.py", line 1, in from scipy.optimize import minpack2 ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack2.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/signal/__init__.py", line 9, in from bsplines import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/signal/bsplines.py", line 2, in import scipy.special File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/basic.py", line 6, in from _cephes import * ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/__init__.py", line 5, in from isolve import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/__init__.py", line 4, in from iterative import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 5, in import _iterative ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 33, in from scipy.sparse.linalg import splu File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/__init__.py", line 5, in from isolve import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/__init__.py", line 4, in from iterative import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 5, in import _iterative ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/basic.py", line 6, in from _cephes import * ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/__init__.py", line 7, in from stats import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/stats.py", line 202, in import scipy.special as special File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/basic.py", line 6, in from _cephes import * ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so, 2): no suitable image found. 
Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/special/_cephes.so: mach-o, but wrong architecture ====================================================================== FAIL: test_ndimage.TestNdimage.test_gauss03 gaussian filter 3 ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/ndimage/tests/test_ndimage.py", line 468, in test_gauss03 assert_almost_equal(output.sum(), input.sum()) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 463, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: 49993304.0 DESIRED: 49992896.0 ---------------------------------------------------------------------- Ran 1425 tests in 16.316s FAILED (SKIP=6, errors=14, failures=1) vincent > > > Trying with > > LDFLAGS="-arch x86_64" FFLAGS="-arch x86_64" py27 setupscons.py scons > > >From your arch flags above you have the 10.3 python.org binary of 2.7 > active, which does not have x86_64. So this certainly can't work. > > Cheers, > Ralf > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Thanks Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From braingateway at gmail.com Wed Nov 3 14:38:35 2010 From: braingateway at gmail.com (braingateway) Date: Wed, 03 Nov 2010 19:38:35 +0100 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: References: Message-ID: <4CD1AC2B.5060001@gmail.com> Vincent Schut : > Hi, I'm running in this strange issue when using some pretty large > float32 arrays. In the following code I create a large array filled with > ones, and calculate mean and sum, first with a float64 version, then > with a float32 version. Note the difference between the two. NB the > float64 version is obviously right :-) > > > > In [2]: areaGrid = numpy.ones((11334, 16002)) > In [3]: print(areaGrid.dtype) > float64 > In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > areaGrid.mean(), areaGrid.sum()) > ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0) > > > In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32) > In [6]: print(areaGrid.dtype) > float32 > In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), > areaGrid.mean(), areaGrid.sum()) > ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0) > > Yes I also got the same problem. b=npy.ones((11334,16002),dtype='float32') >>> a.shape[0]*a.shape[1] 181366668L >>> b.sum() 16777216.0 >>> print npy.finfo(b.dtype).max 3.40282e+38 Acumulator size is definitely not the problem. I think the float point accuracy actually kicked in. try following code: npy.float32(16777216)+npy.float32(1) You will see the number will not grow any more it is because eps(npy.float32(16777216)) = 2 >1 That is why u cannot accumulate with 1 or smaller number beyound this value. try: npy.float32(16777215)+npy.float32(0.5) and: npy.float64(1e16)+npy.float64(1) You also cannot get bigger number by accumulation anymore The numpy.sum() is simply clumsy in this aspect. 
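If the data has to stay in float32 end to end, another option alongside the accumulator-dtype trick above is compensated (Kahan) summation over block subtotals. The sketch below is only an illustration -- kahan_sum32 and the block size are names invented here, not NumPy functions -- but it shows how carrying the rounding error of each addition keeps the small contributions from being dropped:

import numpy as np

def kahan_sum32(a, block=65536):
    # Kahan / compensated summation over float32 block subtotals:
    # 'c' carries the rounding error of each addition and feeds it
    # back into the next one, so the total keeps growing past 2**24.
    s = np.float32(0.0)
    c = np.float32(0.0)
    for start in range(0, a.size, block):
        y = np.add.reduce(a[start:start + block], dtype=np.float32) - c
        t = s + y
        c = (t - s) - y
        s = t
    return s

a = np.ones((11334, 16002), np.float32)
print(a.sum())                 # 16777216.0, as reported above
print(kahan_sum32(a.ravel()))  # ~181366668, the true count to within one float32 ulp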
numpy.sum() simply accumulates all the values one after another, which should always be avoided for floating-point values, even with float64 numbers. Think about adding 1e16 values smaller than 0.0001 to 1e12: it will give you 1.0e12 instead of roughly 2e12. Some implementations try to do smarter things, like: 1) put all the small values into one group and all the big values into another group, 2) sum each group separately, 3) add the partial sums together. But I guess that is costly.
> Can anybody confirm this? And better: explain it? Am I running into a > for me till now hidden ieee float 'feature'? Or is it a bug somewhere? > > Btw I'd like to use float32 arrays, as precision is not really an issue > in this case, but memory usage is... > > > This is using python 2.7, numpy from git (yesterday's checkout), on arch > linux 64bit. > > Best, > Vincent. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
From cgohlke at uci.edu Wed Nov 3 17:56:51 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Wed, 03 Nov 2010 14:56:51 -0700 Subject: [Numpy-discussion] ~2**32 byte tofile()/fromfile() limit in 64-bit Windows? In-Reply-To: References: Message-ID: <4CD1DAA3.5080503@uci.edu> On 11/3/2010 1:13 AM, Martin Spacek wrote: > I just opened a new ticket (http://projects.scipy.org/numpy/ticket/1660), but I > thought I'd bring it up here as well. I can't seem to get tofile() or save() to > write anything much bigger than a 2**32 byte array to a file in Py 2.6.6 on > 64-bit Windows. They both hang with no errors. Also, fromfile() throws an > "IOError: could not seek in file" on> 2**32-1 byte files, although load() seems > to work fine on any size of file. I'm a bit surprised I haven't stumbled across > these problems before. I've tested as far back as 1.4.1. Is this a known issue? > > (the ticket has much more detail) > > Cheers, > > Martin > I have attached a patch to ticket #1660. Any chance it could be considered for numpy 1.5.1? Christoph
From braingateway at gmail.com Wed Nov 3 19:04:07 2010 From: braingateway at gmail.com (braingateway) Date: Thu, 04 Nov 2010 00:04:07 +0100 Subject: [Numpy-discussion] strange behavior of ravel() and flatnonzero() on matrix Message-ID: <4CD1EA67.1040603@gmail.com> >>> aa=matrix([[-1, 2, 0],[0, 0, 3]]) >>> aa matrix([[-1, 2, 0], [ 0, 0, 3]]) >>> aa.nonzero() (matrix([[0, 0, 1]], dtype=int64), matrix([[0, 1, 2]], dtype=int64)) *********OK********* >>> npy.nonzero(aa.flat) (array([0, 1, 5], dtype=int64),) *********OK********* >>> flatnonzero(aa) matrix([[0, 0, 0]], dtype=int64) *******This is Wrong********** If I convert aa to an ndarray, it is OK then aaa=asarray(aa) >>> flatnonzero(aaa) array([0, 1, 5], dtype=int64) Then I figured out that it might be induced by the behavior of ravel() >>> aaa.shape (2L, 3L) >>> aaa.ravel().shape (6L,) >>> aa.ravel() matrix([[-1, 2, 0, 0, 0, 3]]) >>> _.shape (1L, 6L) Why not make ravel() behave consistently under both ndarray and matrix contexts? Or make a different flatnonzero() for the matrix context? 
m.ravel().nonzero()[1]# for matrix a.ravel().nonzero()[0]# for ndarray From braingateway at gmail.com Wed Nov 3 19:07:44 2010 From: braingateway at gmail.com (braingateway) Date: Thu, 04 Nov 2010 00:07:44 +0100 Subject: [Numpy-discussion] strange behavior of ravel() and flatnonzero() on matrix In-Reply-To: <4CD1EA67.1040603@gmail.com> References: <4CD1EA67.1040603@gmail.com> Message-ID: <4CD1EB40.7070505@gmail.com> braingateway : >>>> aa=matrix([[-1, 2, 0],[0, 0, 3]]) >>>> aa >>>> > matrix([[-1, 2, 0], > [ 0, 0, 3]]) > >>>> aa.nonzero() >>>> > (matrix([[0, 0, 1]], dtype=int64), matrix([[0, 1, 2]], dtype=int64)) > *********OK********* > >>>> npy.nonzero(aa.flat) >>>> > (array([0, 1, 5], dtype=int64),) > *********OK********* > >>>> flatnonzero(aa) >>>> > matrix([[0, 0, 0]], dtype=int64) > *******This is Wrong********** > If I convert aa to an ndarray, it is OK then > aaa=asarray(aa) > >>>> flatnonzero(aaa) >>>> > array([0, 1, 5], dtype=int64) > > Then I figure it out that it might be induced by the behavior of ravel() > >>>> aaa.shape >>>> > (2L, 3L) > >>>> aaa.ravel().shape >>>> > (6L,) > >>>> aa.ravel() >>>> > matrix([[-1, 2, 0, 0, 0, 3]]) > >>>> _.shape >>>> > (1L, 6L) > Why not make ravel() behaviors consistent under both ndarray and matrix > contexts? > Or make different flatnonzero() for the matrix context? > m.ravel().nonzero()[1]# for matrix > a.ravel().nonzero()[0]# for ndarray > > here is the numpy version: >>> numpy.__version__ '1.5.0' From robert.kern at gmail.com Wed Nov 3 19:30:49 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 3 Nov 2010 18:30:49 -0500 Subject: [Numpy-discussion] strange behavior of ravel() and flatnonzero() on matrix In-Reply-To: <4CD1EA67.1040603@gmail.com> References: <4CD1EA67.1040603@gmail.com> Message-ID: On Wed, Nov 3, 2010 at 18:04, braingateway wrote: >>>> aa=matrix([[-1, 2, 0],[0, 0, 3]]) >>>> aa > matrix([[-1, 2, 0], > [ 0, 0, 3]]) >>>> aa.nonzero() > (matrix([[0, 0, 1]], dtype=int64), matrix([[0, 1, 2]], dtype=int64)) > *********OK********* >>>> npy.nonzero(aa.flat) > (array([0, 1, 5], dtype=int64),) > *********OK********* >>>> flatnonzero(aa) > matrix([[0, 0, 0]], dtype=int64) > *******This is Wrong********** > If I convert aa to an ndarray, it is OK then > aaa=asarray(aa) >>>> flatnonzero(aaa) > array([0, 1, 5], dtype=int64) > > Then I figure it out that it might be induced by the behavior of ravel() >>>> aaa.shape > (2L, 3L) >>>> aaa.ravel().shape > (6L,) >>>> aa.ravel() > matrix([[-1, 2, 0, 0, 0, 3]]) >>>> _.shape > (1L, 6L) > Why not make ravel() behaviors consistent under both ndarray and matrix > contexts? > Or make different flatnonzero() for the matrix context? > m.ravel().nonzero()[1]# for matrix > a.ravel().nonzero()[0]# for ndarray Most of the ndarray methods will make sure that they return the same subclass of ndarray that the original object is. So type(some_matrix.ravel()) is also matrix. Since matrix objects are always 2D, you get the above behavior. One could probably overwrite those methods to return ndarrays of the proper shape instead. If you are doing such shape-manipulating operations, though, I highly recommend just using ndarray objects and never using matrix objects. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From mwwiebe at gmail.com Wed Nov 3 19:45:32 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 3 Nov 2010 16:45:32 -0700 Subject: [Numpy-discussion] float16/half-precision module Message-ID: I wrote a float16/half-precision module here: https://github.com/m-paradox/numpy_half I've tested it with NumPy 1.3.0 and the latest trunk. With it, you can do things like this: >>> import numpy as np, half as h >>> np.array([0,0.1,1.0/3.0], dtype='float16') array([ 0. , 0.09997559, 0.33325195], dtype=float16) >>> a = h.float16(1.5) >>> print a 1.5 >>> a.dtype dtype('float16') Many functions aren't implemented, so things like this don't work: >>> np.arange(10, dtype='float16') Traceback (most recent call last): File "", line 1, in ValueError: no fill-function for data-type. Also, because of bug #809, it looks like there's no way to nicely support 'f2'. >>> np.array([10], dtype='f2') array([ 10.], dtype=float16) >>> np.array([10], dtype='", line 1, in TypeError: data type not understood What would need to be done to build it in as a supported NumPy data type? -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Nov 3 20:18:05 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 4 Nov 2010 08:18:05 +0800 Subject: [Numpy-discussion] ~2**32 byte tofile()/fromfile() limit in 64-bit Windows? In-Reply-To: <4CD1DAA3.5080503@uci.edu> References: <4CD1DAA3.5080503@uci.edu> Message-ID: On Thu, Nov 4, 2010 at 5:56 AM, Christoph Gohlke wrote: > > > On 11/3/2010 1:13 AM, Martin Spacek wrote: >> I just opened a new ticket (http://projects.scipy.org/numpy/ticket/1660), but I >> thought I'd bring it up here as well. I can't seem to get tofile() or save() to >> write anything much bigger than a 2**32 byte array to a file in Py 2.6.6 on >> 64-bit Windows. They both hang with no errors. Also, fromfile() throws an >> "IOError: could not seek in file" on> ?2**32-1 byte files, although load() seems >> to work fine on any size of file. I'm a bit surprised I haven't stumbled across >> these problems before. I've tested as far back as 1.4.1. Is this a known issue? >> >> (the ticket has much more detail) >> >> Cheers, >> >> Martin >> > > I have attached a patch to ticket #1660. Any chance it could be > considered for numpy 1.5.1? To me it seems tofile/save being broken for some use cases is a more serious problem than the other patches proposed. Also the change only affects 64-bit Windows. So I think it can go in if it's merged in time (i.e. within a couple of days). I'd prefer someone more familiar with that particular code to review and merge it. Cheers, Ralf From ralf.gommers at googlemail.com Wed Nov 3 20:25:14 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 4 Nov 2010 08:25:14 +0800 Subject: [Numpy-discussion] [SciPy-User] please test/review: Scipy on OS X with Python 2.7 In-Reply-To: References: Message-ID: Hi Vincent, On Wed, Nov 3, 2010 at 10:18 PM, Vincent Davis wrote: > > > Ok I am a little more awake now and not in zombi mode. That is ?correct osx > 10.6 c++4.2, Doing it again and maybe correctly with?python-2.7-macosx10.5 > and c++4.0. > full scipy test results here > https://gist.github.com/661125 > summary Can you tell us the build command you used, and output of the "file" command on the built .so files? 
Also, can you run the numpy test suite like so: >>> numpy.test('full') Thanks, Ralf > > ====================================================================== > ERROR: Failure: ImportError > (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so, > 2): no suitable image found. Did find: > ????????/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so: > mach-o, but wrong architecture) From david at silveregg.co.jp Wed Nov 3 21:00:05 2010 From: david at silveregg.co.jp (David) Date: Thu, 04 Nov 2010 10:00:05 +0900 Subject: [Numpy-discussion] ~2**32 byte tofile()/fromfile() limit in 64-bit Windows? In-Reply-To: References: <4CD1DAA3.5080503@uci.edu> Message-ID: <4CD20595.3010307@silveregg.co.jp> On 11/04/2010 09:18 AM, Ralf Gommers wrote: > To me it seems tofile/save being broken for some use cases is a more > serious problem than the other patches proposed. Also the change only > affects 64-bit Windows. So I think it can go in if it's merged in time > (i.e. within a couple of days). I'd prefer someone more familiar with > that particular code to review and merge it. I don't think the code is appropriate as is. I can take a look at it this WE, but not before, cheers, David From vincent at vincentdavis.net Wed Nov 3 22:55:15 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 3 Nov 2010 20:55:15 -0600 Subject: [Numpy-discussion] [SciPy-User] please test/review: Scipy on OS X with Python 2.7 In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 6:25 PM, Ralf Gommers wrote: > Hi Vincent, > > On Wed, Nov 3, 2010 at 10:18 PM, Vincent Davis > wrote: > > > > > > Ok I am a little more awake now and not in zombi mode. That is correct > osx > > 10.6 c++4.2, Doing it again and maybe correctly > with python-2.7-macosx10.5 > > and c++4.0. > > full scipy test results here > > https://gist.github.com/661125 > > summary > > Can you tell us the build command you used, and output of the "file" > command on the built .so files? Also, can you run the numpy test suite > like so: > >>> numpy.test('full') > Ok all the info are the results and build commands http://db.tt/McbfXS2 I am traveling and am only able to spend sort bits of time on this so I am not putting much thought into what is going on. I hope to have more time after Friday. The point being, let me know what you would like me to do but don't expect much interpretation of the results :-) > Thanks, > Ralf > > > > > ====================================================================== > > ERROR: Failure: ImportError > > > (dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so, > > 2): no suitable image found. Did find: > > > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/fftpack/_fftpack.so: > > mach-o, but wrong architecture) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Thanks Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From schut at sarvision.nl Thu Nov 4 05:56:23 2010 From: schut at sarvision.nl (Vincent Schut) Date: Thu, 04 Nov 2010 10:56:23 +0100 Subject: [Numpy-discussion] large float32 array issue In-Reply-To: <4CD16BE2.4050408@gmail.com> References: <4CD16BE2.4050408@gmail.com> Message-ID: On 11/03/2010 03:04 PM, Bruce Southey wrote: > On 11/03/2010 06:52 AM, Pauli Virtanen wrote: >> Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote: >> [clip] >>> Btw, should I file a bug on this? >> One can argue that mean() and sum() should use a numerically stabler >> algorithm, so yes, a bug can be filed if there is not yet one already. >> > This is a 'user bug' not a numpy bug because it is a well known > numerical problem. I recall that we have had this type of discussion > before that has resulted in these functions being left as they are. The > numerical problem is mentioned better in the np.mean docstring than the > np.sum docstring. > > My understanding was that any new algorithm has to be better than the > current algorithm especially in speed and accuracy across 'typical' > numpy problems across the different Python and OS versions not just for > numerically challenged cases. For example, I would not want to sacrifice > speed if I achieve the same accuracy without losing as much speed as > just changing the dtype to float128 (as I use x86_64 Linux). > > Also in Warren's mean example, this is simply a 32-bit error because it > disappears when using 64-bit (numpy's default) - well, until we reach > the extreme 64-bit values. > > >>> np.ones((11334,16002)).mean() > 1.0 > >>> np.ones((11334,16002),np.float32).mean() > 0.092504406598019437 > >>> np.ones((11334,16002),np.float32).mean().dtype > dtype('float64') > > Note that there is probably a bug in np.mean because a 64-bit dtype is > returned for integers and 32-bit or lower precision floats. So upcast is > not apparently being done on the accumulator. > > > Bruce Thanks for the info, all. I agree that this is a 'user bug', however, mentioning this as a corner case someplace a user would look when finding errors like these might be an idea, as I have the feeling it will keep turning up once and again at this list otherwise. Maybe start a webpage 'numpy and some often encountered floating point issues'? For now, I've just ordered more memory. The cheapest and simplest solution, if you ask me :-) Vincent. From sippis99 at hotmail.com Thu Nov 4 08:15:00 2010 From: sippis99 at hotmail.com (=?iso-8859-1?B?T2xsaSBTaXBpbOQ=?=) Date: Thu, 4 Nov 2010 12:15:00 +0000 Subject: [Numpy-discussion] problem with a binary file on OS X 10.6 Message-ID: Hello, I have a problem reading a binary file on OS X 10.6. On my desktop mac (OS X 10.4.11, Python 2.5.4, Numpy 1.3.0), the command CELLS, NSIZE, NE = fromfile(fp, int32, 3), where fp is the filename, correctly prints "21 20 300". However, when I try the above on my laptop using Snow Leopard (Python 2.6.6, Numpy 1.5.0), I get the numbers "352321536 335544320 738263040". This results in a failure when trying to read the data that comes afterwards (Numpy fails with "array too big"). I assume this error may have something to do with 10.6 being 64-bit and 10.4 being 32-bit; however, the Python 2.6.6. distribution is also 32-bit. Any thoughts on this? Thanks,Olli Sipil? -------------- next part -------------- An HTML attachment was scrubbed... URL: From whg21 at cam.ac.uk Thu Nov 4 08:17:01 2010 From: whg21 at cam.ac.uk (Henry Gomersall) Date: Thu, 04 Nov 2010 12:17:01 +0000 Subject: [Numpy-discussion] Error in API docs? 
In-Reply-To: <1288355540.15977.9.camel@whg21-laptop> References: <1288355540.15977.9.camel@whg21-laptop> Message-ID: <1288873021.2199.57.camel@whg21-laptop> Does anyone care about this? Is there an alternative channel for such information, perhaps a bug report? Cheers, Henry On Fri, 2010-10-29 at 13:32 +0100, Henry Gomersall wrote: > There is an inconsistency in the documentation for NPY_INOUT_ARRAY. > > cf. > http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#NPY_INOUT_ARRAY > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_INOUT_ARRAY > > The first link includes the flag NPY_UPDATEIFCOPY. Checking the code > seems to confirm that the correct version is that in the first link, and > also that NPY_OUT_ARRAY is wrong in the API docs. > > I haven't checked NPY_INOUT_FARRAY or NPY_OUT_FARRAY. > > Cheers, > > Henry > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From numpy-discussion at maubp.freeserve.co.uk Thu Nov 4 08:32:24 2010 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 12:32:24 +0000 Subject: [Numpy-discussion] problem with a binary file on OS X 10.6 In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 12:15 PM, Olli Sipil? wrote: > Hello, > I have a problem reading a binary file on OS X 10.6. On my desktop mac (OS X > 10.4.11, Python 2.5.4, Numpy 1.3.0), the command > CELLS, NSIZE, NE = fromfile(fp, int32, 3), where fp is the filename, > correctly prints "21 20 300". However, when I try the above on my laptop > using Snow Leopard (Python 2.6.6, Numpy 1.5.0), I get the numbers "352321536 > 335544320 738263040". This results in a failure when trying to read the data > that comes afterwards (Numpy fails with "array too big"). I assume this > error may have something to do with 10.6 being 64-bit and 10.4 being 32-bit; > however, the Python 2.6.6. distribution is also 32-bit. Any thoughts on > this? > Thanks, > Olli Sipil? Snow Leopard (10.6) is Intel only, so your laptop is using an Intel CPU which is little-endian. Is your Tiger (10.4) machine a PowerPC CPU? That would be big-endian, and therefore would write the floats to disk reversed compared to what a little-endian machine would expect. Peter From faltet at pytables.org Thu Nov 4 08:33:48 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 4 Nov 2010 13:33:48 +0100 Subject: [Numpy-discussion] problem with a binary file on OS X 10.6 In-Reply-To: References: Message-ID: <201011041333.48152.faltet@pytables.org> A Thursday 04 November 2010 13:15:00 Olli Sipil? escrigu?: > Hello, > I have a problem reading a binary file on OS X 10.6. On my desktop > mac (OS X 10.4.11, Python 2.5.4, Numpy 1.3.0), the command CELLS, > NSIZE, NE = fromfile(fp, int32, 3), where fp is the filename, > correctly prints "21 20 300". However, when I try the above on my > laptop using Snow Leopard (Python 2.6.6, Numpy 1.5.0), I get the > numbers "352321536 335544320 738263040". This results in a failure > when trying to read the data that comes afterwards (Numpy fails with > "array too big"). I assume this error may have something to do with > 10.6 being 64-bit and 10.4 being 32-bit; however, the Python 2.6.6. > distribution is also 32-bit. Any thoughts on this? Thanks,Olli Yeah. Most probably your Mac has a PowerPC processor, which has a different byte-ordering than Intel. 
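A quick way to check which byte order each machine uses (the values shown are what an Intel box reports; a PowerPC Mac reports 'big' and False):

>>> import sys
>>> import numpy as np
>>> sys.byteorder
'little'
>>> np.little_endian
True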
Look at this: >>> a = np.int32(21) >>> a.byteswap() 352321536 >>> a = np.int32(20) >>> a.byteswap() 335544320 >>> a = np.int32(300) >>> a.byteswap() 738263040 To solve this, just apply byteswap once more: >>> a = np.int32(300) >>> a.byteswap().byteswap() 300 and you are done. -- Francesc Alted From pav at iki.fi Thu Nov 4 08:35:00 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 4 Nov 2010 12:35:00 +0000 (UTC) Subject: [Numpy-discussion] Error in API docs? References: <1288355540.15977.9.camel@whg21-laptop> <1288873021.2199.57.camel@whg21-laptop> Message-ID: Thu, 04 Nov 2010 12:17:01 +0000, Henry Gomersall wrote: > Does anyone care about this? Is there an alternative channel for such > information, perhaps a bug report? We care, but just posting to the ML is not the best way to get minor bugs fixed, due to the high list volume. Alternative channels: (i) Fix it yourself. http://docs.scipy.org/doc/numpy/user/c-info.how-to- extend.html#NPY_INOUT_ARRAY Check the "Edit this page" link in the left sidebar. You'll need to register an account and after that ask for activation (eg. via mailing here). (ii) Fix it yourself (b). http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html (iii) File a bug ticket. -- Pauli Virtanen From pav at iki.fi Thu Nov 4 08:37:33 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 4 Nov 2010 12:37:33 +0000 (UTC) Subject: [Numpy-discussion] problem with a binary file on OS X 10.6 References: <201011041333.48152.faltet@pytables.org> Message-ID: Thu, 04 Nov 2010 13:33:48 +0100, Francesc Alted wrote: [clip] > To solve this, just apply byteswap once more: > >>>> a = np.int32(300) >>>> a.byteswap().byteswap() > 300 > > and you are done. Or directly specify big-endian byte order when reading fromfile(fp, '>i4', 3) From faltet at pytables.org Thu Nov 4 08:54:29 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 4 Nov 2010 13:54:29 +0100 Subject: [Numpy-discussion] problem with a binary file on OS X 10.6 In-Reply-To: References: <201011041333.48152.faltet@pytables.org> Message-ID: <201011041354.29198.faltet@pytables.org> A Thursday 04 November 2010 13:37:33 Pauli Virtanen escrigu?: > Thu, 04 Nov 2010 13:33:48 +0100, Francesc Alted wrote: > [clip] > > > To solve this, just apply byteswap once more: > >>>> a = np.int32(300) > >>>> a.byteswap().byteswap() > > > > 300 > > > > and you are done. > > Or directly specify big-endian byte order when reading > > fromfile(fp, '>i4', 3) Much, much better :-) -- Francesc Alted From ralf.gommers at googlemail.com Thu Nov 4 09:13:03 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 4 Nov 2010 21:13:03 +0800 Subject: [Numpy-discussion] [SciPy-User] please test/review: Scipy on OS X with Python 2.7 In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 10:55 AM, Vincent Davis wrote: > > > On Wed, Nov 3, 2010 at 6:25 PM, Ralf Gommers > wrote: >> >> Hi Vincent, >> >> On Wed, Nov 3, 2010 at 10:18 PM, Vincent Davis >> wrote: >> > >> > >> > Ok I am a little more awake now and not in zombi mode. That is ?correct >> > osx >> > 10.6 c++4.2, Doing it again and maybe correctly >> > with?python-2.7-macosx10.5 >> > and c++4.0. >> > full scipy test results here >> > https://gist.github.com/661125 >> > summary >> >> Can you tell us the build command you used, and output of the "file" >> command on the built .so files? 
Also, can you run the numpy test suite >> like so: >> ?>>> numpy.test('full') > > Ok all the info are the results and build commands > http://db.tt/McbfXS2 > I am traveling and am only able to spend sort bits of time on this so I am > not putting much thought into what is going on. I hope to have more time > after Friday. The point being, let me know what you would like me to do but > don't expect much interpretation of the results ?:-) > The output of numpy.test('full') tells me you are not actually running my farchs branch. The version should say 1.5.1dev, not 2.0.0dev. So then it's not surprising this is still failing for you.... Ralf From sippis99 at hotmail.com Thu Nov 4 09:30:59 2010 From: sippis99 at hotmail.com (=?iso-8859-1?B?T2xsaSBTaXBpbOQ=?=) Date: Thu, 4 Nov 2010 13:30:59 +0000 Subject: [Numpy-discussion] [SciPy-User] problem with a binary file on OS X 10.6 In-Reply-To: References: , , , , , , Message-ID: Seems the problem was indeed with the endian. Works like a charm now, thank you all for the quick replies! Olli -------------- next part -------------- An HTML attachment was scrubbed... URL: From braingateway at gmail.com Thu Nov 4 09:46:33 2010 From: braingateway at gmail.com (braingateway) Date: Thu, 04 Nov 2010 14:46:33 +0100 Subject: [Numpy-discussion] flattened index for Sparse Matrix? Message-ID: <4CD2B939.9020005@gmail.com> Hi Everyone, I am trying sparse matrix these days. I am wondering is there any way I can access the sparse matrix with flattened index? For example: a=numpy.matrix([[0,1,2],[3,4,5]) matrix([[0, 1, 2], [3, 4, 5]]) >>> >>>print a.flat[3] >>> 3 >>> >>> a.flat[3]=10 >>> >>> print a >>> [[ 0 1 2] [10 4 5]] How could I do the similar indexing for sparse matrix? Thanks ahead, LittleBigBrain From faltet at pytables.org Thu Nov 4 09:58:55 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 4 Nov 2010 14:58:55 +0100 Subject: [Numpy-discussion] ANN: python-blosc 1.0.2 Message-ID: <201011041458.55870.faltet@pytables.org> ==================================================== Announcing python-blosc 1.0.2 A Python wrapper for the Blosc compression library ==================================================== What is it? =========== Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contains data with relatively low entropy, like sparse data, time series, grids with regular-spaced values, etc. python-blosc is a Python package that wraps it. What is new? ============ Updated to Blosc 1.1.2. Fixes some bugs when dealing with very small buffers (typically smaller than specified typesizes). Closes #1. Basic Usage =========== [Using IPython shell and a 2-core machine below] # Create a binary string made of int (32-bit) elements >>> import array >>> a = array.array('i', range(10*1000*1000)) >>> bytes_array = a.tostring() # Compress it >>> import blosc >>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize) >>> len(bytes_array) / len(bpacked) 110 # 110x compression ratio. Not bad! # Compression speed? >>> timeit blosc.compress(bytes_array, typesize=a.itemsize) 100 loops, best of 3: 12.8 ms per loop >>> len(bytes_array) / 0.0128 / (1024*1024*1024) 2.9103830456733704 # wow, compressing at ~ 3 GB/s, that's fast! 
# Decompress it >>> bytes_array2 = blosc.decompress(bpacked) # Check whether our data have had a good trip >>> bytes_array == bytes_array2 True # yup, it seems so # Decompression speed? >>> timeit blosc.decompress(bpacked) 10 loops, best of 3: 21.3 ms per loop >>> len(bytes_array) / 0.0213 / (1024*1024*1024) 1.7489625814375185 # decompressing at ~ 1.7 GB/s is pretty good too! More examples showing other features (and using NumPy arrays) are available on the python-blosc wiki page: http://github.com/FrancescAlted/python-blosc/wiki Documentation ============= Please refer to docstrings. Start by the main package: >>> import blosc >>> help(blosc) and ask for more docstrings in the referenced functions. Download sources ================ Go to: http://github.com/FrancescAlted/python-blosc and download the most recent release from here. Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for details. Mailing list ============ There is an official mailing list for Blosc at: blosc at googlegroups.com http://groups.google.es/group/blosc ---- **Enjoy data!** -- Francesc Alted From ralf.gommers at googlemail.com Thu Nov 4 10:00:47 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 4 Nov 2010 22:00:47 +0800 Subject: [Numpy-discussion] Trac unaware of github move Message-ID: Hi, I just noticed that the Trac wiki is not displaying updates to files kept in the source tree, for example http://projects.scipy.org/numpy/wiki/TestingGuidelines is stuck at an older version. Can one of the admins point the ReST plugin to github? Ralf From pav at iki.fi Thu Nov 4 10:06:14 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 4 Nov 2010 14:06:14 +0000 (UTC) Subject: [Numpy-discussion] Trac unaware of github move References: Message-ID: Thu, 04 Nov 2010 22:00:47 +0800, Ralf Gommers wrote: > I just noticed that the Trac wiki is not displaying updates to files > kept in the source tree, for example > http://projects.scipy.org/numpy/wiki/TestingGuidelines is stuck at an > older version. > > Can one of the admins point the ReST plugin to github? That would require more work on the Trac-Git integration front: http://github.com/pv/githubsimple-trac It might be more cost-effective to just use links to the Github web interface. -- Pauli Virtanen From jsseabold at gmail.com Thu Nov 4 12:08:43 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 4 Nov 2010 12:08:43 -0400 Subject: [Numpy-discussion] create an object array with named dtype Message-ID: I just ran into this and am kind of baffled. There are other ways I could do this so it's not a huge deal, but I'm wondering if this is a bug. I want a (named) structured array with an object dtype. Is this possible? For example In [68]: import numpy as np In [69]: A = np.array((np.arange(10)[:,None], 3, 4.5), dtype=[("prob", float, (1 0,1)), ....: ("m1", float),("m2", float)]) In [70]: B = np.array((np.arange(10,20)[:,None], 3, 4.5), dtype=[("prob", float, (10,1)), ....: ("m1", float),("m2", float)]) This works fine. In [71]: C = np.empty(2, dtype=object) In [72]: C[0] = A In [73]: C[1] = B In [74]: np.all(C[0] == A) Out[74]: True In [75]: np.all(C[1] == B) Out[75]: True This doesn't. Looks like it takes a bad view on the array part (?) 
In [76]: D = np.empty(2, dtype=[("n1",object),("n2", object)]) In [77]: D["n1"] = A In [78]: D["n2"] = B In [79]: np.all(D["n1"] == A) Out[79]: False In [80]: np.all(D["n2"] == B) Out[80]: False Skipper From robert.kern at gmail.com Thu Nov 4 12:28:20 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Nov 2010 11:28:20 -0500 Subject: [Numpy-discussion] create an object array with named dtype In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 11:08, Skipper Seabold wrote: > I just ran into this and am kind of baffled. ?There are other ways I > could do this so it's not a huge deal, but I'm wondering if this is a > bug. ?I want a (named) structured array with an object dtype. ?Is this > possible? > > For example > > In [68]: import numpy as np > > In [69]: A = np.array((np.arange(10)[:,None], 3, 4.5), dtype=[("prob", float, (1 > 0,1)), > ? ....: ? ? ? ? ("m1", float),("m2", float)]) > > In [70]: B = np.array((np.arange(10,20)[:,None], 3, 4.5), dtype=[("prob", float, > ?(10,1)), > ? ....: ? ? ? ? ("m1", float),("m2", float)]) > > This works fine. > > In [71]: C = np.empty(2, dtype=object) > > In [72]: C[0] = A > > In [73]: C[1] = B > > In [74]: np.all(C[0] == A) > Out[74]: True > > In [75]: np.all(C[1] == B) > Out[75]: True > > This doesn't. ?Looks like it takes a bad view on the array part (?) > > In [76]: D = np.empty(2, dtype=[("n1",object),("n2", object)]) > > In [77]: D["n1"] = A > > In [78]: D["n2"] = B > > In [79]: np.all(D["n1"] == A) > Out[79]: False > > In [80]: np.all(D["n2"] == B) > Out[80]: False Yes, there is a problem: [~] |6> D['n1'] array([ (array([[ 3.98154716e-303], [ 2.15123073e-314], [ 1.16902381e-291], [ 3.99988181e-303], [ 6.33886224e-321], [ -1.79678332e+182], [ 1.69759663e-313], [ 3.98154716e-303], [ 3.96089848e-303], [ 3.96089848e-303]]), 3.0, 4.5), (array([[ 0.00000000e+000], [ 4.94065646e-324], [ 1.10229260e-291], [ 3.99988181e-303], [ 1.16899398e-291], [ 1.83328798e-288], [ 3.98154716e-303], [ 2.15124483e-314], [ 1.16898711e-291], [ 3.99632165e-303]]), 3.0, 4.5)], dtype=object) The data in the A['prob'] didn't get copied correctly when it got assigned. If you assign D[0]['n1'] and D[1]['n1'] separately, it works. [~] |12> D[0]['n1'] = A [~] |16> D[1]['n1'] = A [~] |17> D array([ ((array([[ 0.], [ 1.], [ 2.], [ 3.], [ 4.], [ 5.], [ 6.], [ 7.], [ 8.], [ 9.]]), 3.0, 4.5), None), ((array([[ 0.], [ 1.], [ 2.], [ 3.], [ 4.], [ 5.], [ 6.], [ 7.], [ 8.], [ 9.]]), 3.0, 4.5), None)], dtype=[('n1', '|O4'), ('n2', '|O4')]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From luc.dekoninck at intec.ugent.be Fri Nov 5 10:03:03 2010 From: luc.dekoninck at intec.ugent.be (Luc Dekoninck) Date: Fri, 5 Nov 2010 14:03:03 +0000 (UTC) Subject: [Numpy-discussion] Matlab IO Warning in mio5.py References: Message-ID: Hello, Got the same error, working on windows 7, 64 bit... Does this help? Is there a solution available? Luc From Chris.Barker at noaa.gov Fri Nov 5 11:50:08 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 05 Nov 2010 08:50:08 -0700 Subject: [Numpy-discussion] problem with a binary file on OS X 10.6 In-Reply-To: References: Message-ID: <4CD427B0.7090803@noaa.gov> On 11/4/10 5:15 AM, Olli Sipil? wrote: > I have a problem reading a binary file on OS X 10.6. 
On my desktop mac > (OS X 10.4.11, Python 2.5.4, Numpy 1.3.0), the command > > CELLS, NSIZE, NE = fromfile(fp, int32, 3), where fp is the filename, > > correctly prints "21 20 300". However, when I try the above on my laptop > using Snow Leopard (Python 2.6.6, Numpy 1.5.0), I get the numbers > "352321536 335544320 738263040". This results in a failure when trying > to read the data that comes afterwards (Numpy fails with "array too > big"). I assume this error may have something to do with 10.6 being > 64-bit and 10.4 being 32-bit; however, the Python 2.6.6. distribution is > also 32-bit. Any thoughts on this? you've specified int32, so 32-64 bit should not be the issue. however, it's likely that endian issues are -- is one of your machine PPC, and one Intel? yup, that's it: In [29]: a = np.array((21, 20, 300), dtype=np.int32) In [30]: a Out[30]: array([ 21, 20, 300]) In [31]: a.byteswap() Out[31]: array([352321536, 335544320, 738263040]) So, you can either call bytswap yourself, or, better yet specify the endianness in your dtype specifier, somethign like (untested) dtype='>i4' for big endian and dtype='>i4' for little endian. Intel is little endian, PPC is bigendian (I think -- do test to make sure!) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From faltet at pytables.org Fri Nov 5 11:59:39 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 5 Nov 2010 16:59:39 +0100 Subject: [Numpy-discussion] ANN: PyTables 2.2.1 released Message-ID: <201011051659.39926.faltet@pytables.org> =========================== Announcing PyTables 2.2.1 =========================== This is maintenance release. The upgrade is recommended for all that are running PyTables in production environments. What's new ========== Many fixes have been included, as well as a fair bunch of performance improvements. Also, the Blosc compression library has been updated to 1.1.2, in order to prevent locks in some scenarios. Finally, the new evaluation version of PyTables Pro is based on the previous Pro 2.2. In case you want to know more in detail what has changed in this version, have a look at: http://www.pytables.org/moin/ReleaseNotes/Release_2.2.1 You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://www.pytables.org/download/stable For an on-line version of the manual, visit: http://www.pytables.org/docs/manual-2.2.1 What it is? =========== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. Resources ========= About PyTables: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. 
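A minimal usage sketch, for anyone who has not tried the package yet (PyTables 2.x API, with a throw-away file name):

import numpy as np
import tables

# Write a NumPy array into an HDF5 file and read a slice back.
h5f = tables.openFile('demo.h5', mode='w')
h5f.createArray('/', 'x', np.arange(1000), title='example data')
h5f.close()

h5f = tables.openFile('demo.h5', mode='r')
first = h5f.root.x[:10]      # slices come back as NumPy arrays
h5f.close()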
---- **Enjoy data!** -- The PyTables Team -- Francesc Alted From braingateway at gmail.com Fri Nov 5 21:22:26 2010 From: braingateway at gmail.com (braingateway) Date: Sat, 06 Nov 2010 02:22:26 +0100 Subject: [Numpy-discussion] scipy.linalg.solve()'s overwrite option does not work Message-ID: <4CD4ADD2.5080904@gmail.com> Hi everyone, I believe the overwrite option is used for reduce memory usage. But I did following test, and find out it does not work at all. Maybe I misunderstood the purpose of overwrite option. If anybody could explain this, I shall highly appreciate your help. >>> a=npy.random.randn(20,20) >>> x=npy.random.randn(20,4) >>> a=npy.matrix(a) >>> x=npy.matrix(x) >>> b=a*x >>> import scipy.linalg as sla >>> a0=npy.matrix(a) >>> a is a0 False >>> b0=npy.matrix(b) >>> b is b0 False >>> X=sla.solve(a,b,overwrite_b=True,debug=True) solve:overwrite_a= False solve:overwrite_b= True >>> X is b False >>> (X==b).all() False >>> (b0==b).all() True >>> sla.solve(a,b,overwrite_a=True,overwrite_b=True,debug=True) solve:overwrite_a= True solve:overwrite_b= True >>> (a0==a).all() True >>> help(sla.solve) Help on function solve in module scipy.linalg.basic: solve(a, b, sym_pos=False, lower=False, overwrite_a=False, overwrite_b=False, debug=False) Solve the equation a x = b for x Parameters ---------- a : array, shape (M, M) b : array, shape (M,) or (M, N) sym_pos : boolean Assume a is symmetric and positive definite lower : boolean Use only data contained in the lower triangle of a, if sym_pos is true. Default is to use upper triangle. overwrite_a : boolean Allow overwriting data in a (may enhance performance) overwrite_b : boolean Allow overwriting data in b (may enhance performance) Returns ------- From ralf.gommers at googlemail.com Fri Nov 5 22:45:05 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 6 Nov 2010 10:45:05 +0800 Subject: [Numpy-discussion] Matlab IO Warning in mio5.py In-Reply-To: References: Message-ID: On Fri, Nov 5, 2010 at 10:03 PM, Luc Dekoninck wrote: > > > Hello, > > Got the same error, working on windows 7, 64 bit... > > Does this help? > Is there ?a solution available? > It's not an error but a harmless (although confusing) warning message. You should be able to filter it by adding the following to scipy/__init__.py: import warnings warnings.filterwarnings(action='ignore', message='.*__builtin__.file size changed.*') Can you check if that works for you? Cheers, Ralf From staywithpin at gmail.com Sat Nov 6 08:51:27 2010 From: staywithpin at gmail.com (qihua wu) Date: Sat, 6 Nov 2010 20:51:27 +0800 Subject: [Numpy-discussion] about SIMD (SSE2 & SSE3) Message-ID: I used the following command to install the numpy to enable the SSE3 numpy-1.5.1rc1-win32-superpack-python3.1.exe /arch sse3 Then how can I know whether numpy is running with SSE or not? I have a program to process the data from sql server using java to process 600M rows, it takes 7 hours to complete, about 4 hours is eating the cpu. I am wondering whether I can port the java to numpy to cut the 4 hours to 2hours or even less by enabling the SSE3. Any comment? -------------- next part -------------- An HTML attachment was scrubbed... URL: From damienlmoore at gmail.com Sat Nov 6 09:22:24 2010 From: damienlmoore at gmail.com (Damien Moore) Date: Sat, 6 Nov 2010 09:22:24 -0400 Subject: [Numpy-discussion] numpy.genfromtxt converters issue Message-ID: Hi List, I'm trying to import csv data as a numpy array using genfromtxt. 
The csv file contains mixed data, some floating point, others string codes and dates that I want to convert to floating point. The strange thing is that when I use the 'converters' argument to convert a subset of the columns the resulting output of genfromtxt becomes a 1d array of tuples instead of the desired 2d array of floats. I've provided a simple example below. The output I want should be numpy.array([[1,2],[3,4]]). Any thoughts on how to get my desired output would be appreciated. thanks, Damien import numpy, StringIO s=StringIO.StringIO('q1,2\nq3,4') a=numpy.genfromtxt(s,delimiter=',',converters={0:lambda s:float(s[1:])}) s=StringIO.StringIO('q1,2\nq3,4') b=numpy.genfromtxt(s,delimiter=',') a.shape (2,) b.shape (2,2) >>> a array([(1.0, 2.0), (3.0, 4.0)], dtype=[('f0', '|O4'), ('f1', '>> b array([[ NaN, 2.], [ NaN, 4.]]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From damienlmoore at gmail.com Sat Nov 6 11:52:11 2010 From: damienlmoore at gmail.com (Damien Moore) Date: Sat, 6 Nov 2010 11:52:11 -0400 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: References: Message-ID: In reply to my own question, the trivial, but massively inefficient solution is: s=StringIO.StringIO('q1,2\nq3,4') a=numpy.genfromtxt(s,delimiter=',',converters={0:lambda s:float(s[1:])}) a1 = numpy.array(a.tolist()) But what I really want to do is have genfromtxt do the conversion for me. Specifically, if I specify a dtype like: a=numpy.genfromtxt(s,delimiter=',',dtype=float,converters={0:lambda s:float(s[1:])}) then, ideally from my perspective, genfromtxt should coerce values to the dtype choice (after the converter has done its work) and output a 2d array or report an error. Instead it seems like whenever a converter is used, dtype gets ignored and a 1d array of tuples is always returned. More generally, it seems like the caller should have more control over what genfromtxt returns, whether a 1d array of tuples or the 2d array of a specific type. -------------- next part -------------- An HTML attachment was scrubbed... URL: From xscript at gmx.net Sat Nov 6 11:52:47 2010 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Sat, 06 Nov 2010 16:52:47 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: (Damien Moore's message of "Sat, 6 Nov 2010 09:22:24 -0400") References: Message-ID: <87tyjuih4g.fsf@ginnungagap.bsc.es> Damien Moore writes: > Hi List, > I'm trying to import csv data as a numpy array using genfromtxt. The csv file > contains mixed data, some floating point, others string codes and dates that I > want to convert to floating point. The strange thing is that when I use the ' > converters' argument to convert a subset of the columns the resulting output of > genfromtxt becomes a 1d array of tuples instead of the desired 2d array of > floats. I've provided a simple example below. The output I want should be > numpy.array([[1,2],[3,4]]). Any thoughts on how to get my desired output would > be appreciated. > import numpy, StringIO > s=StringIO.StringIO('q1,2\nq3,4') > a=numpy.genfromtxt(s,delimiter=',',converters={0:lambda s:float(s[1:])}) > s=StringIO.StringIO('q1,2\nq3,4') > b=numpy.genfromtxt(s,delimiter=',') > a.shape > (2,) > b.shape > (2,2) >>>> a > array([(1.0, 2.0), (3.0, 4.0)], > ????? dtype=[('f0', '|O4'), ('f1', '>>> b > array([[ NaN,?? 2.], > ?????? [ NaN,?? 
4.]]) I think this is what you want: >>> cat /tmp/test.csv 1, 2 3, 4 >>> numpy.genfromtxt("/tmp/test.csv", delimiter=",", dtype=float) array([[ 1., 2.], [ 3., 4.]]) Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From xscript at gmx.net Sat Nov 6 12:26:23 2010 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Sat, 06 Nov 2010 17:26:23 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <87tyjuih4g.fsf@ginnungagap.bsc.es> (=?utf-8?Q?=22Llu=C3=ADs?= =?utf-8?Q?=22's?= message of "Sat, 06 Nov 2010 16:52:47 +0100") References: <87tyjuih4g.fsf@ginnungagap.bsc.es> Message-ID: <87k4kqifkg.fsf@ginnungagap.bsc.es> Sorry, I got it wrong and ignored the StringIO part. Lluis Llu?s writes: > Damien Moore writes: [...] >> import numpy, StringIO >> s=StringIO.StringIO('q1,2\nq3,4') >> a=numpy.genfromtxt(s,delimiter=',',converters={0:lambda s:float(s[1:])}) >> s=StringIO.StringIO('q1,2\nq3,4') >> b=numpy.genfromtxt(s,delimiter=',') >> a.shape >> (2,) >> b.shape >> (2,2) >>>>> a >> array([(1.0, 2.0), (3.0, 4.0)], >> ????? dtype=[('f0', '|O4'), ('f1', '>>>> b >> array([[ NaN,?? 2.], >> ?????? [ NaN,?? 4.]]) > I think this is what you want: >>>> cat /tmp/test.csv > 1, 2 > 3, 4 >>>> numpy.genfromtxt("/tmp/test.csv", delimiter=",", dtype=float) > array([[ 1., 2.], > [ 3., 4.]]) > Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From josef.pktd at gmail.com Sat Nov 6 12:45:48 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 6 Nov 2010 12:45:48 -0400 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: References: Message-ID: On Sat, Nov 6, 2010 at 11:52 AM, Damien Moore wrote: > In reply to my own question, the trivial, but massively inefficient solution > is: > > s=StringIO.StringIO('q1,2\nq3,4') > a=numpy.genfromtxt(s,delimiter=',',converters={0:lambda s:float(s[1:])}) > a1 = numpy.array(a.tolist()) > > But what I really want to do is have genfromtxt do the conversion for me. > Specifically, if I specify a dtype like: > > a=numpy.genfromtxt(s,delimiter=',',dtype=float,converters={0:lambda > s:float(s[1:])}) > > then, ideally from my perspective, genfromtxt should coerce values to the > dtype choice (after the converter has done its work) and output a 2d array > or report an error. Instead it seems like whenever a converter is used, > dtype gets ignored and a 1d array of tuples is always returned. > > More generally, it seems like the caller should have more control over what > genfromtxt returns, whether a 1d array of tuples or the 2d array of a > specific type. It seems that genfromtxt doesn't check the return type from the converter in this case. 
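Another way around it, sketched here assuming the default field names ('f0', 'f1') that genfromtxt assigns when no names are given, is to stack the record fields back into a plain 2d float array afterwards:

>>> import numpy, StringIO
>>> s = StringIO.StringIO('q1,2\nq3,4')
>>> a = numpy.genfromtxt(s, delimiter=',',
...                      converters={0: lambda s: float(s[1:])})
>>> numpy.column_stack([a[name].astype(float) for name in a.dtype.names])
array([[ 1.,  2.],
       [ 3.,  4.]])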
This is how I get to what you want with some detours >>> s=StringIO.StringIO('q1,2\nq3,4') >>> a=numpy.genfromtxt(s,delimiter=',',converters={0:lambda s:float(s[1:])},dtype=[float, float]) >>> a array([(1.0, 2.0), (3.0, 4.0)], dtype=[('f0', '>> a.view(float) array([ 1., 2., 3., 4.]) >>> a.view(float).reshape(a.shape[0], -1) array([[ 1., 2.], [ 3., 4.]]) Josef > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From sunk.cs at gmail.com Sat Nov 6 15:28:04 2010 From: sunk.cs at gmail.com (K. Sun) Date: Sat, 06 Nov 2010 20:28:04 +0100 Subject: [Numpy-discussion] Optimize Floyd-Wallshall algorithm with Numpy Message-ID: <20101106192804.GA68097@comet> Hello, I wrote the following code with numpy to implement the Floyd-Wallshall algorithm to compute the pair-wise shortest path in a undirected weighted graph. It is really slow when N ~ 10k, while the same implementation in matlab is much faster. I am sorry I don't want to run it again to present some accurate comparison. Is there any suggestions to optimize this code without destroying the elegant coding style of python? Thank you very much. def floyd( dis ): ''' dis is the pair-wise distance matrix. return and update dis as the shortest path matrix w_{ij}.''' N = dis.shape[0] for k in range( N ): route = np.kron( np.ones( (N, 1) ), dis[k, :] ) dis = np.minimum( dis, route + route.T ) return dis From josef.pktd at gmail.com Sat Nov 6 15:46:17 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 6 Nov 2010 15:46:17 -0400 Subject: [Numpy-discussion] Optimize Floyd-Wallshall algorithm with Numpy In-Reply-To: <20101106192804.GA68097@comet> References: <20101106192804.GA68097@comet> Message-ID: On Sat, Nov 6, 2010 at 3:28 PM, K. Sun wrote: > Hello, > > I wrote the following code with numpy to implement the Floyd-Wallshall > algorithm to compute the pair-wise shortest path in a undirected weighted > graph. It is really slow when N ~ 10k, while the same implementation in > matlab is much faster. I am sorry I don't want to run it again to > present some accurate comparison. Is there any suggestions to optimize > this code without destroying the elegant coding style of python? > Thank you very much. > > def floyd( dis ): > ? ? ''' > ? ? dis is the pair-wise distance matrix. > ? ? return and update dis as the shortest path matrix w_{ij}.''' > > ? ? N = dis.shape[0] > ? ? for k in range( N ): > ? ? ? ? route = np.kron( np.ones( (N, 1) ), dis[k, :] ) I think your kron just does broadcasting, if you use route = dis[k:k+1, :] (I expect) you get the same results, and it would save one intermediary array > ? ? ? ? dis = np.minimum( dis, route + route.T ) Otherwise, I don't see much that I would speed up (without understanding the algorithm) Josef > > ? ? 
return dis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From wardefar at iro.umontreal.ca Sat Nov 6 15:46:35 2010 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Sat, 6 Nov 2010 15:46:35 -0400 Subject: [Numpy-discussion] about SIMD (SSE2 & SSE3) In-Reply-To: References: Message-ID: <51BD339E-A9E5-4349-897D-2DEFA4C31F61@iro.umontreal.ca> On 2010-11-06, at 8:51 AM, qihua wu wrote: > I used the following command to install the numpy to enable the SSE3 > numpy-1.5.1rc1-win32-superpack-python3.1.exe /arch sse3 > > Then how can I know whether numpy is running with SSE or not? As far as I know, the only thing that uses SSE/SSE2/SSE3 would be BLAS operations. Things like elementwise addition, multiplication, etc. are not implemented to take advantage of vectorized machine instructions, at least not yet, unless the C compiler is aggressively optimizing and doing some loop unrolling which I sort of doubt. > I have a program to process the data from sql server using java to process 600M rows, it takes 7 hours to complete, about 4 hours is eating the cpu. I am wondering whether I can port the java to numpy to cut the 4 hours to 2hours or even less by enabling the SSE3. Any comment? It's not clear that crunching data from an SQL database would be any faster with NumPy. It really depends on the specifics of your problem. David From sunk.cs at gmail.com Sat Nov 6 16:14:07 2010 From: sunk.cs at gmail.com (K. Sun) Date: Sat, 06 Nov 2010 21:14:07 +0100 Subject: [Numpy-discussion] Optimize Floyd-Wallshall algorithm with Numpy In-Reply-To: References: <20101106192804.GA68097@comet> Message-ID: <20101106201407.GA68764@comet> Thanks a lot. It works! I modify the code as follows and it runs at fast as matlab. By numpy's convention, the input and output are all ndarrays. 'route' has to be a (1xN) matrix to produce a square matrix in 'route + route.T'. def floyd( dis ): '''Floyd-Wallshall algorithm for shortest path dis is the pair-wise distance matrix. return and update dis as the shortest path matrix w_{ij}.''' N = dis.shape[0] for k in range( N ): route = np.mat( dis[k, :] ) dis = np.minimum( dis, route + route.T ) return np.array( dis ) * josef.pktd at gmail.com [2010-11-06 15:46:17 -0400]: >On Sat, Nov 6, 2010 at 3:28 PM, K. Sun wrote: >> Hello, >> >> I wrote the following code with numpy to implement the Floyd-Wallshall >> algorithm to compute the pair-wise shortest path in a undirected weighted >> graph. It is really slow when N ~ 10k, while the same implementation in >> matlab is much faster. I am sorry I don't want to run it again to >> present some accurate comparison. Is there any suggestions to optimize >> this code without destroying the elegant coding style of python? >> Thank you very much. >> >> def floyd( dis ): >> ? ? ''' >> ? ? dis is the pair-wise distance matrix. >> ? ? return and update dis as the shortest path matrix w_{ij}.''' >> >> ? ? N = dis.shape[0] >> ? ? for k in range( N ): >> ? ? ? ? route = np.kron( np.ones( (N, 1) ), dis[k, :] ) > >I think your kron just does broadcasting, if you use > >route = dis[k:k+1, :] > >(I expect) you get the same results, and it would save one intermediary array > >> ? ? ? ? dis = np.minimum( dis, route + route.T ) > >Otherwise, I don't see much that I would speed up (without >understanding the algorithm) > >Josef >> >> ? ? 
return dis >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Ke Sun, Research Assistant Viper Group, Computer Vision and Multimedia Lab University of Geneva, Switzerland Tel: +41 (0)22 379 0176 Fax: +41 (0)22 379 0250 From josef.pktd at gmail.com Sat Nov 6 16:24:29 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 6 Nov 2010 16:24:29 -0400 Subject: [Numpy-discussion] Optimize Floyd-Wallshall algorithm with Numpy In-Reply-To: <20101106201407.GA68764@comet> References: <20101106192804.GA68097@comet> <20101106201407.GA68764@comet> Message-ID: On Sat, Nov 6, 2010 at 4:14 PM, K. Sun wrote: > Thanks a lot. It works! I modify the code as follows and it runs > at fast as matlab. By numpy's convention, the input and output > are all ndarrays. 'route' has to be a (1xN) matrix to produce a > square matrix in 'route + route.T'. If you read my small print, route = dis[k:k+1, :] should create (1,N) array or alternatively dis[k, :][:,None] I don't like mixing matrices with arrays because it gets confusing and matrices are (sometimes, often?) slower. glad to hear it got faster. Josef > > def floyd( dis ): > ? ? '''Floyd-Wallshall algorithm for shortest path > > ? ? dis is the pair-wise distance matrix. > ? ? return and update dis as the shortest path matrix w_{ij}.''' > > ? ? N = dis.shape[0] > ? ? for k in range( N ): > ? ? ? ? route = np.mat( dis[k, :] ) > ? ? ? ? dis = np.minimum( dis, route + route.T ) > > ? ?return np.array( dis ) > > * josef.pktd at gmail.com [2010-11-06 15:46:17 -0400]: > >>On Sat, Nov 6, 2010 at 3:28 PM, K. Sun wrote: >>> Hello, >>> >>> I wrote the following code with numpy to implement the Floyd-Wallshall >>> algorithm to compute the pair-wise shortest path in a undirected weighted >>> graph. It is really slow when N ~ 10k, while the same implementation in >>> matlab is much faster. I am sorry I don't want to run it again to >>> present some accurate comparison. Is there any suggestions to optimize >>> this code without destroying the elegant coding style of python? >>> Thank you very much. >>> >>> def floyd( dis ): >>> ? ? ''' >>> ? ? dis is the pair-wise distance matrix. >>> ? ? return and update dis as the shortest path matrix w_{ij}.''' >>> >>> ? ? N = dis.shape[0] >>> ? ? for k in range( N ): >>> ? ? ? ? route = np.kron( np.ones( (N, 1) ), dis[k, :] ) >> >>I think your kron just does broadcasting, if you use >> >>route = dis[k:k+1, :] >> >>(I expect) you get the same results, and it would save one intermediary array >> >>> ? ? ? ? dis = np.minimum( dis, route + route.T ) >> >>Otherwise, I don't see much that I would speed up (without >>understanding the algorithm) >> >>Josef >>> >>> ? ? return dis >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>_______________________________________________ >>NumPy-Discussion mailing list >>NumPy-Discussion at scipy.org >>http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- > Ke Sun, Research Assistant > Viper Group, Computer Vision and Multimedia Lab > University of Geneva, Switzerland > Tel: +41 (0)22 379 0176 ? 
?Fax: +41 (0)22 379 0250 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsalvati at u.washington.edu Sat Nov 6 16:37:06 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Sat, 6 Nov 2010 13:37:06 -0700 Subject: [Numpy-discussion] Optimize Floyd-Wallshall algorithm with Numpy In-Reply-To: References: <20101106192804.GA68097@comet> <20101106201407.GA68764@comet> Message-ID: The difference is that dis[k,:] eliminates the first dimension since you are using a single number as an index, but dis[k:k+1,:] does not eliminate that dimension. On Sat, Nov 6, 2010 at 1:24 PM, wrote: > On Sat, Nov 6, 2010 at 4:14 PM, K. Sun wrote: >> Thanks a lot. It works! I modify the code as follows and it runs >> at fast as matlab. By numpy's convention, the input and output >> are all ndarrays. 'route' has to be a (1xN) matrix to produce a >> square matrix in 'route + route.T'. > > If you read my small print, > > route = dis[k:k+1, :] ?should create (1,N) array or alternatively > dis[k, :][:,None] > > I don't like mixing matrices with arrays because it gets confusing and > matrices are (sometimes, often?) slower. > > glad to hear it got faster. > > Josef > >> >> def floyd( dis ): >> ? ? '''Floyd-Wallshall algorithm for shortest path >> >> ? ? dis is the pair-wise distance matrix. >> ? ? return and update dis as the shortest path matrix w_{ij}.''' >> >> ? ? N = dis.shape[0] >> ? ? for k in range( N ): >> ? ? ? ? route = np.mat( dis[k, :] ) >> ? ? ? ? dis = np.minimum( dis, route + route.T ) >> >> ? ?return np.array( dis ) >> >> * josef.pktd at gmail.com [2010-11-06 15:46:17 -0400]: >> >>>On Sat, Nov 6, 2010 at 3:28 PM, K. Sun wrote: >>>> Hello, >>>> >>>> I wrote the following code with numpy to implement the Floyd-Wallshall >>>> algorithm to compute the pair-wise shortest path in a undirected weighted >>>> graph. It is really slow when N ~ 10k, while the same implementation in >>>> matlab is much faster. I am sorry I don't want to run it again to >>>> present some accurate comparison. Is there any suggestions to optimize >>>> this code without destroying the elegant coding style of python? >>>> Thank you very much. >>>> >>>> def floyd( dis ): >>>> ? ? ''' >>>> ? ? dis is the pair-wise distance matrix. >>>> ? ? return and update dis as the shortest path matrix w_{ij}.''' >>>> >>>> ? ? N = dis.shape[0] >>>> ? ? for k in range( N ): >>>> ? ? ? ? route = np.kron( np.ones( (N, 1) ), dis[k, :] ) >>> >>>I think your kron just does broadcasting, if you use >>> >>>route = dis[k:k+1, :] >>> >>>(I expect) you get the same results, and it would save one intermediary array >>> >>>> ? ? ? ? dis = np.minimum( dis, route + route.T ) >>> >>>Otherwise, I don't see much that I would speed up (without >>>understanding the algorithm) >>> >>>Josef >>>> >>>> ? ? return dis >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>_______________________________________________ >>>NumPy-Discussion mailing list >>>NumPy-Discussion at scipy.org >>>http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> -- >> Ke Sun, Research Assistant >> Viper Group, Computer Vision and Multimedia Lab >> University of Geneva, Switzerland >> Tel: +41 (0)22 379 0176 ? 
?Fax: +41 (0)22 379 0250 >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From staywithpin at gmail.com Sat Nov 6 19:46:57 2010 From: staywithpin at gmail.com (qihua wu) Date: Sun, 7 Nov 2010 07:46:57 +0800 Subject: [Numpy-discussion] about SIMD (SSE2 & SSE3) In-Reply-To: <51BD339E-A9E5-4349-897D-2DEFA4C31F61@iro.umontreal.ca> References: <51BD339E-A9E5-4349-897D-2DEFA4C31F61@iro.umontreal.ca> Message-ID: Thank David, the java program takes 3 hours to read data, after read the data into memory, it takes 4 hours to process/calculate somthing on all these data. The data is the sale data which contains both promoted sale and non-promoted sale, the program needs to predict the non-promoted sale: so input data is a serial of promoted sale and non-promoted sale, the output is a serial of non-promoted sale. e.g day 1,2,3 have the non-promoted sales, day 4 have the promoted sales, day 5,6,7 have the non-promted sales, the output for day 1~7 are all non-promoted sales. During the process, we might need to sum all the data for day 1~7, is this what you called " elementwise addition, multiplication", which can't be SIMDed in numpy? On Sun, Nov 7, 2010 at 3:46 AM, David Warde-Farley < wardefar at iro.umontreal.ca> wrote: > On 2010-11-06, at 8:51 AM, qihua wu wrote: > > > I used the following command to install the numpy to enable the SSE3 > > numpy-1.5.1rc1-win32-superpack-python3.1.exe /arch sse3 > > > > Then how can I know whether numpy is running with SSE or not? > > As far as I know, the only thing that uses SSE/SSE2/SSE3 would be BLAS > operations. Things like elementwise addition, multiplication, etc. are not > implemented to take advantage of vectorized machine instructions, at least > not yet, unless the C compiler is aggressively optimizing and doing some > loop unrolling which I sort of doubt. > > > I have a program to process the data from sql server using java to > process 600M rows, it takes 7 hours to complete, about 4 hours is eating the > cpu. I am wondering whether I can port the java to numpy to cut the 4 hours > to 2hours or even less by enabling the SSE3. Any comment? > > It's not clear that crunching data from an SQL database would be any faster > with NumPy. It really depends on the specifics of your problem. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wardefar at iro.umontreal.ca Sat Nov 6 20:34:03 2010 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Sat, 6 Nov 2010 20:34:03 -0400 Subject: [Numpy-discussion] about SIMD (SSE2 & SSE3) In-Reply-To: References: <51BD339E-A9E5-4349-897D-2DEFA4C31F61@iro.umontreal.ca> Message-ID: <9119314E-7080-4F84-B9CE-8D5F4B62C5AF@iro.umontreal.ca> On 2010-11-06, at 7:46 PM, qihua wu wrote: > day 1,2,3 have the non-promoted sales, day 4 have the promoted sales, day 5,6,7 have the non-promted sales, the output for day 1~7 are all non-promoted sales. During the process, we might need to sum all the data for day 1~7, is this what you called " elementwise addition, multiplication", which can't be SIMDed in numpy? 
Really the only thing that can be SIMDed with SSE/SSE2/SSE3 is matrix-matrix or matrix-vector multiplies, i.e. things that involve calls to the BLAS. NumPy will perform the summations you mention with efficient loops at the C level but not using SSE. I don't know how much of a speed boost this will provide over Java, as the JVM is pretty heavily optimized. David From cournape at gmail.com Sat Nov 6 21:05:57 2010 From: cournape at gmail.com (David Cournapeau) Date: Sun, 7 Nov 2010 10:05:57 +0900 Subject: [Numpy-discussion] about SIMD (SSE2 & SSE3) In-Reply-To: References: Message-ID: On Sat, Nov 6, 2010 at 9:51 PM, qihua wu wrote: > I used the following command to install the numpy to enable the SSE3 > numpy-1.5.1rc1-win32-superpack-python3.1.exe /arch sse3 The whole point of the super pack installer is to install the most optimized one possible on your machine. So you should not use the arch flag (it is meant for people who want to explicity install something which is not the most optimal one). As for your program, it depends too much on what you are doing. Keep in mind that java with the appropriate JVM is pretty fast, cheers, David From ralf.gommers at googlemail.com Sun Nov 7 02:51:30 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 7 Nov 2010 15:51:30 +0800 Subject: [Numpy-discussion] updated 1.5.1 release schedule Message-ID: Hi, Since we weren't able to stick to the original schedule for 1.5.1, here's a short note about the current status. There are two changes that need to go in before RC2: https://github.com/numpy/numpy/pull/11 https://github.com/numpy/numpy/pull/9 If no one has time for a review I'll commit those to 1.5.x by Tuesday and tag RC2. Final release should be one week after that, unless an RC3 is necessary. Cheers, Ralf From faltet at pytables.org Sun Nov 7 03:39:27 2010 From: faltet at pytables.org (Francesc Alted) Date: Sun, 7 Nov 2010 09:39:27 +0100 Subject: [Numpy-discussion] about SIMD (SSE2 & SSE3) In-Reply-To: References: <51BD339E-A9E5-4349-897D-2DEFA4C31F61@iro.umontreal.ca> Message-ID: 2010/11/7 qihua wu > Thank David, > > the java program takes 3 hours to read data, after read the data into > memory, it takes 4 hours to process/calculate somthing on all these data. > The data is the sale data which contains both promoted sale and > non-promoted sale, the program needs to predict the non-promoted sale: so > input data is a serial of promoted sale and non-promoted sale, the output is > a serial of non-promoted sale. e.g > day 1,2,3 have the non-promoted sales, day 4 have the promoted sales, day > 5,6,7 have the non-promted sales, the output for day 1~7 are all > non-promoted sales. During the process, we might need to sum all the data > for day 1~7, is this what you called " elementwise addition, > multiplication", which can't be SIMDed in numpy? > There is little sense in implementing element wise adds and mults in SIMD because these operations are memory bounded in modern computers. SIMD is only useful when you want to accelerate operations that are CPU bounded (e.g. evaluation of transcendental functions or matrix-matrix operations). You can get a better grasp on this limitation (I like to call it the starving CPU problem) by having a look at this material: https://portal.g-node.org/python-autumnschool/materials/starving_cpus It also includes exercises, so that you can do your own experiments. -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... 
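To make that concrete for the 600M-row job: the win from moving the inner loop into numpy comes from doing the aggregation as whole-array operations in C, not from SSE. A sketch with made-up stand-ins for the sales data (the sizes and the promoted flag are invented for illustration):

import numpy as np

n = 7 * 200000                       # number of daily sale records
sales = np.random.rand(n) * 100.0    # one amount per record
promoted = np.random.rand(n) < 0.2   # True where the sale was promoted

# Total of the non-promoted sales: one vectorized pass, no Python-level loop.
nonpromo_total = sales[~promoted].sum()

# Weekly (7-day) totals: reshape so each row is one week, then sum the rows.
weekly_totals = sales.reshape(-1, 7).sum(axis=1)

Whether that actually beats a tuned JVM loop still has to be measured, as David Warde-Farley and David Cournapeau point out above.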
URL: From pgmdevlist at gmail.com Sun Nov 7 10:07:28 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 7 Nov 2010 16:07:28 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: References: Message-ID: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> On Nov 6, 2010, at 2:22 PM, Damien Moore wrote: > Hi List, > > I'm trying to import csv data as a numpy array using genfromtxt. The csv file contains mixed data, some floating point, others string codes and dates that I want to convert to floating point. The strange thing is that when I use the 'converters' argument to convert a subset of the columns the resulting output of genfromtxt becomes a 1d array of tuples instead of the desired 2d array of floats. I've provided a simple example below. Please open a ticket so that I don't forget about it. Thx in advance! From ralf.gommers at googlemail.com Sun Nov 7 10:17:31 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 7 Nov 2010 23:17:31 +0800 Subject: [Numpy-discussion] Trac unaware of github move In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 10:06 PM, Pauli Virtanen wrote: > Thu, 04 Nov 2010 22:00:47 +0800, Ralf Gommers wrote: >> I just noticed that the Trac wiki is not displaying updates to files >> kept in the source tree, for example >> http://projects.scipy.org/numpy/wiki/TestingGuidelines is stuck at an >> older version. >> >> Can one of the admins point the ReST plugin to github? > > That would require more work on the Trac-Git integration front: > > ? ?http://github.com/pv/githubsimple-trac > > It might be more cost-effective to just use links to the Github web > interface. That will require renaming those files in the source tree from *.txt to *.rst, otherwise there's no way to have github render them properly. Unless I missed something. Would that be fine? Ralf From charlesr.harris at gmail.com Sun Nov 7 13:38:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 7 Nov 2010 11:38:32 -0700 Subject: [Numpy-discussion] updated 1.5.1 release schedule In-Reply-To: References: Message-ID: Hi Ralph, On Sun, Nov 7, 2010 at 1:51 AM, Ralf Gommers wrote: > Hi, > > Since we weren't able to stick to the original schedule for 1.5.1, > here's a short note about the current status. There are two changes > that need to go in before RC2: > https://github.com/numpy/numpy/pull/11 > https://github.com/numpy/numpy/pull/9 > If no one has time for a review I'll commit those to 1.5.x by Tuesday > and tag RC2. Final release should be one week after that, unless an > RC3 is necessary. > > I committed #11. I left some comments on #9, I think it should be split up a bit. It is probably easier for you to do that yourself than for me to mess with master. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Sun Nov 7 16:16:09 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sun, 7 Nov 2010 14:16:09 -0700 Subject: [Numpy-discussion] updated 1.5.1 release schedule In-Reply-To: References: Message-ID: On Sun, Nov 7, 2010 at 1:51 AM, Ralf Gommers wrote: > Hi, > > Since we weren't able to stick to the original schedule for 1.5.1, > here's a short note about the current status. There are two changes > that need to go in before RC2: > https://github.com/numpy/numpy/pull/11 > https://github.com/numpy/numpy/pull/9 > If no one has time for a review I'll commit those to 1.5.x by Tuesday > and tag RC2. 
Final release should be one week after that, unless an > RC3 is necessary. > Since we will have 2 different dmgs for python2.7 (osx10.3 and osx10.5) and I don't think there is any check in the installer to make sure the right python2.7 version is present when installing. The installer only check that python2.7 is present. I think a check should be added. I am missing somthing or there are other suggestions I would like to get this in 1.5.1rc2. I not sure the best way to make this check but I think I can come up with a solution. Also would need a useful error message. Vincent > Cheers, > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Thanks Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej at certik.cz Sun Nov 7 17:25:38 2010 From: ondrej at certik.cz (Ondrej Certik) Date: Sun, 7 Nov 2010 14:25:38 -0800 Subject: [Numpy-discussion] Trac unaware of github move In-Reply-To: References: Message-ID: On Sun, Nov 7, 2010 at 7:17 AM, Ralf Gommers wrote: > On Thu, Nov 4, 2010 at 10:06 PM, Pauli Virtanen wrote: >> Thu, 04 Nov 2010 22:00:47 +0800, Ralf Gommers wrote: >>> I just noticed that the Trac wiki is not displaying updates to files >>> kept in the source tree, for example >>> http://projects.scipy.org/numpy/wiki/TestingGuidelines is stuck at an >>> older version. >>> >>> Can one of the admins point the ReST plugin to github? >> >> That would require more work on the Trac-Git integration front: >> >> ? ?http://github.com/pv/githubsimple-trac >> >> It might be more cost-effective to just use links to the Github web >> interface. > > That will require renaming those files in the source tree from *.txt > to *.rst, otherwise there's no way to have github render them > properly. Unless I missed something. Would that be fine? I use .rst instead of .txt for my projects, so that github can render it properly. I don't know if that's the right solution, but it gets the job done. Ondrej From ralf.gommers at googlemail.com Sun Nov 7 19:09:02 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 8 Nov 2010 08:09:02 +0800 Subject: [Numpy-discussion] updated 1.5.1 release schedule In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 2:38 AM, Charles R Harris wrote: > Hi Ralph, > > On Sun, Nov 7, 2010 at 1:51 AM, Ralf Gommers > wrote: >> >> Hi, >> >> Since we weren't able to stick to the original schedule for 1.5.1, >> here's a short note about the current status. There are two changes >> that need to go in before RC2: >> https://github.com/numpy/numpy/pull/11 >> https://github.com/numpy/numpy/pull/9 >> If no one has time for a review I'll commit those to 1.5.x by Tuesday >> and tag RC2. Final release should be one week after that, unless an >> RC3 is necessary. >> > > I committed #11. I left some comments on #9, I think it should be split up a > bit. It is probably easier for you to do that yourself than for me to mess > with master. > Thanks. I'll split up #9. Whitespace cleanup separate for a single file seems like a wasted commit though - would be good to do that on all files at once. 
Cheers, Ralf From ralf.gommers at googlemail.com Sun Nov 7 19:12:22 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 8 Nov 2010 08:12:22 +0800 Subject: [Numpy-discussion] updated 1.5.1 release schedule In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 5:16 AM, Vincent Davis wrote: > > > On Sun, Nov 7, 2010 at 1:51 AM, Ralf Gommers > wrote: >> >> Hi, >> >> Since we weren't able to stick to the original schedule for 1.5.1, >> here's a short note about the current status. There are two changes >> that need to go in before RC2: >> https://github.com/numpy/numpy/pull/11 >> https://github.com/numpy/numpy/pull/9 >> If no one has time for a review I'll commit those to 1.5.x by Tuesday >> and tag RC2. Final release should be one week after that, unless an >> RC3 is necessary. > > Since we will have 2 different dmgs for python2.7 (osx10.3 and osx10.5) and > I don't think there is any check in the installer to make sure the right > python2.7 version is present when installing. The installer only check that > python2.7 is present. I think a check should be added. I am missing somthing > or there are other suggestions I would like to get this in 1.5.1rc2. I not > sure the best way to make this check but I think I can come up with a > solution. Also would need a useful error message. > Vincent To let the user know if there's a mismatch may be helpful, but we shouldn't prevent installation. In many cases mixing installers will just work. If you have a patch it's welcome, but I think it's not critical for this release. Cheers, Ralf From charlesr.harris at gmail.com Sun Nov 7 19:41:49 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 7 Nov 2010 17:41:49 -0700 Subject: [Numpy-discussion] updated 1.5.1 release schedule In-Reply-To: References: Message-ID: On Sun, Nov 7, 2010 at 5:09 PM, Ralf Gommers wrote: > On Mon, Nov 8, 2010 at 2:38 AM, Charles R Harris > wrote: > > Hi Ralph, > > > > On Sun, Nov 7, 2010 at 1:51 AM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> Hi, > >> > >> Since we weren't able to stick to the original schedule for 1.5.1, > >> here's a short note about the current status. There are two changes > >> that need to go in before RC2: > >> https://github.com/numpy/numpy/pull/11 > >> https://github.com/numpy/numpy/pull/9 > >> If no one has time for a review I'll commit those to 1.5.x by Tuesday > >> and tag RC2. Final release should be one week after that, unless an > >> RC3 is necessary. > >> > > > > I committed #11. I left some comments on #9, I think it should be split > up a > > bit. It is probably easier for you to do that yourself than for me to > mess > > with master. > > > Thanks. I'll split up #9. Whitespace cleanup separate for a single > file seems like a wasted commit though - would be good to do that on > all files at once. > > Sure. You could also do it as separate commits. In this case however, I think just two patches would be fine, one with the uncontroversial stuff and the other for further consideration. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodfellow.ian at gmail.com Mon Nov 8 09:18:45 2010 From: goodfellow.ian at gmail.com (Ian Goodfellow) Date: Mon, 8 Nov 2010 09:18:45 -0500 Subject: [Numpy-discussion] Anyone with Core i7 and Ubuntu 10.04? Message-ID: I'm wondering if anyone here has successfully built numpy with ATLAS and a Core i7 CPU on Ubuntu 10.04. If so, I could really use your help. 
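For anyone trying to reproduce the comparison, a rough sketch of how one might check from Python which BLAS a given numpy build picked up, and how long a large dot product takes. The config output and the timing comments are illustrative assumptions, not measurements from the machine described here:

import time
import numpy as np

# Show which BLAS/LAPACK the build was compiled against; an ATLAS-enabled
# build should report atlas_blas_info / atlas_info sections here.
np.show_config()

# Time a large matrix product. With an optimized ATLAS this is typically a
# few tenths of a second on 2010-era hardware; with the unoptimized
# reference BLAS it can be an order of magnitude slower.
x = np.random.rand(1000, 1000)
y = np.random.rand(1000, 1000)
t0 = time.time()
np.dot(x, y)
print("1000x1000 dot took %.3f s" % (time.time() - t0))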
I've been trying since August (see my earlier messages to this list) to get numpy running at full speed on my machine with no luck. The Ubuntu packages don't seem very fast, and numpy won't use the version of ATLAS that I compiled. It's pretty sad; anything that involves a lot of BLAS calls runs slower on this 2.8 ghz Core i7 than on an older 2.66 ghz Core 2 Quad I use at work. From sebastien.barthelemy at crans.org Mon Nov 8 11:03:39 2010 From: sebastien.barthelemy at crans.org (=?ISO-8859-15?Q?S=E9bastien_Barth=E9lemy?=) Date: Mon, 8 Nov 2010 17:03:39 +0100 Subject: [Numpy-discussion] Trac unaware of github move In-Reply-To: References: Message-ID: On Sun, 7 Nov 2010, Ralf Gommers wrote: > That will require renaming those files in the source tree from *.txt > to *.rst, otherwise there's no way to have github render them > properly. Unless I missed something. Would that be fine? I think a *.rst.txt extension would also be recognized by github. Note that the docutils FAQ advises against using .rst as a file extension : http://docutils.sourceforge.net/FAQ.html#what-s-the-standard-filename-extension-for-a-restructuredtext-file Cheers From braingateway at gmail.com Mon Nov 8 12:56:06 2010 From: braingateway at gmail.com (LittleBigBrain) Date: Mon, 8 Nov 2010 18:56:06 +0100 Subject: [Numpy-discussion] LapackError:non-native byte order Message-ID: Hi everyone, In my system '<' is the native byte-order, but unless I change the byte-order label to '=', it won't work in linalg sub-module, but in others works OK. I am not sure whether this is an expected behavior or a bug? >>> import sys >>> sys.byteorder 'little' >>> a.dtype.byteorder '<' >>> b.dtype.byteorder '<' >>> c=a*b >>> c.dtype.byteorder '=' >>> d=npy.linalg.solve(a, c) Traceback (most recent call last): File "", line 1, in d=npy.linalg.solve(a, c) File "C:\Python27\lib\site-packages\numpy\linalg\linalg.py", line 326, in solve results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0) LapackError: Parameter a has non-native byte order in lapack_lite.dgesv >>> cc=c.newbyteorder('<') >>> cc.dtype.byteorder '<' >>> d=npy.linalg.solve(a, cc) Traceback (most recent call last): File "", line 1, in d=npy.linalg.solve(a, cc) File "C:\Python27\lib\site-packages\numpy\linalg\linalg.py", line 326, in solve results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0) LapackError: Parameter a has non-native byte order in lapack_lite.dgesv >>> d=npy.linalg.solve(a.newbyteorder('='), c) >>> d.shape (2000L, 1000L) Thanks, LittleBigBrain From pav at iki.fi Mon Nov 8 13:31:31 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 08 Nov 2010 19:31:31 +0100 Subject: [Numpy-discussion] LapackError:non-native byte order In-Reply-To: References: Message-ID: <1289241091.3170.2.camel@talisman> ma, 2010-11-08 kello 18:56 +0100, LittleBigBrain kirjoitti: > In my system '<' is the native byte-order, but unless I change the > byte-order label to '=', it won't work in linalg sub-module, but in > others works OK. I am not sure whether this is an expected behavior or > a bug? > >>> import sys > >>> sys.byteorder > 'little' > >>> a.dtype.byteorder > '<' > >>> b.dtype.byteorder > '<' The error is here: it's not possible to create such dtypes via any Numpy methods -- the '<' (or '>') is always normalized to '='. Numpy and several other modules consequently assume this normalization. Where do `a` and `b` come from? 
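For reference, a small sketch of the behaviour being described, assuming a little-endian machine as in the report above. The point is that newbyteorder() records the byte-order character literally instead of normalizing it:

import numpy as np

a = np.random.rand(4, 4)
b = np.random.rand(4)

# An explicit native specification is normalized to '=' when the dtype is built:
print(np.dtype('<f8').byteorder)     # '=' on a little-endian machine

# newbyteorder() keeps the literal '<', which is what lapack_lite then rejects
# even though the underlying data are already in native order:
a_lt = a.newbyteorder('<')
print(a_lt.dtype.byteorder)          # '<'

# Relabelling back to '=' (no data conversion needed here) works, as in the
# workaround shown in the session above:
x = np.linalg.solve(a_lt.newbyteorder('='), b)

# Data that really is non-native (say, read from a big-endian file) should be
# converted rather than just relabelled:
# a_native = a_big.byteswap().newbyteorder()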
-- Pauli Virtanen From pav at iki.fi Mon Nov 8 13:34:20 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 8 Nov 2010 18:34:20 +0000 (UTC) Subject: [Numpy-discussion] LapackError:non-native byte order References: <1289241091.3170.2.camel@talisman> Message-ID: Mon, 08 Nov 2010 19:31:31 +0100, Pauli Virtanen wrote: > ma, 2010-11-08 kello 18:56 +0100, LittleBigBrain kirjoitti: >> In my system '<' is the native byte-order, but unless I change the >> byte-order label to '=', it won't work in linalg sub-module, but in >> others works OK. I am not sure whether this is an expected behavior or >> a bug? >> >>> import sys >> >>> sys.byteorder >> 'little' >> >>> a.dtype.byteorder >> '<' >> >>> b.dtype.byteorder >> '<' > > The error is here: it's not possible to create such dtypes via any Numpy > methods -- the '<' (or '>') is always normalized to '='. Numpy and > several other modules consequently assume this normalization. > > Where do `a` and `b` come from? Ok, `x.newbyteorder('<')` seems to do this. Now I'm unsure how things are supposed to work. -- Pauli Virtanen From renato.fabbri at gmail.com Mon Nov 8 14:00:34 2010 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Mon, 8 Nov 2010 17:00:34 -0200 Subject: [Numpy-discussion] OFFTOPIC: simple databases Message-ID: Dear All, i want to find simple databases, like a 5 dimensional with more than 30 samples. i am having difficult times with this. where do you get them? all the best, rf -- GNU/Linux User #479299 skype: fabbri.renato From matthew.brett at gmail.com Mon Nov 8 14:05:34 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 8 Nov 2010 11:05:34 -0800 Subject: [Numpy-discussion] LapackError:non-native byte order In-Reply-To: References: <1289241091.3170.2.camel@talisman> Message-ID: Hi, On Mon, Nov 8, 2010 at 10:34 AM, Pauli Virtanen wrote: > Mon, 08 Nov 2010 19:31:31 +0100, Pauli Virtanen wrote: > >> ma, 2010-11-08 kello 18:56 +0100, LittleBigBrain kirjoitti: >>> In my system '<' is the native byte-order, but unless I change the >>> byte-order label to '=', it won't work in linalg sub-module, but in >>> others works OK. I am not sure whether this is an expected behavior or >>> a bug? >>> >>> import sys >>> >>> sys.byteorder >>> 'little' >>> >>> a.dtype.byteorder >>> '<' >>> >>> b.dtype.byteorder >>> '<' >> >> The error is here: it's not possible to create such dtypes via any Numpy >> methods -- the '<' (or '>') is always normalized to '='. Numpy and >> several other modules consequently assume this normalization. >> >> Where do `a` and `b` come from? > > Ok, `x.newbyteorder('<')` seems to do this. Now I'm unsure how things are > supposed to work. Yes - it is puzzling that ``x.newbyteorder('<')`` makes arrays that are confusing to numpy. If numpy generally always normalizes to the system endian to '=' then should that not also be true of ``newbyteorder``? See you, Matthew From pav at iki.fi Mon Nov 8 14:06:30 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 8 Nov 2010 19:06:30 +0000 (UTC) Subject: [Numpy-discussion] OFFTOPIC: simple databases References: Message-ID: Mon, 08 Nov 2010 17:00:34 -0200, Renato Fabbri wrote: [clip: offtopic] Please post this on the scipy-user list instead, it's more suitable for misc questions. 
-- Pauli Virtanen From groups.and.lists at gmail.com Mon Nov 8 14:17:11 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 08 Nov 2010 13:17:11 -0600 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization Message-ID: Hi, I was wondering when it is better to store cholesky factor and use it to solve Ax = b, instead of storing the inverse of A. (A is a symmetric, positive-definite matrix.) Even in the repeated case, if I have the inverse of A (invA) stored, then I can solve Ax = b_i, i = 1, ... , n, by x = dot(invA, b_i). Is dot(invA, b_i) slower than cho_solve(cho_factor, b_i)? I heard calculating the inverse is not recommended, but my understanding is that numpy.linalg.inv actually solves Ax = I instead of literally calculating the inverse of A. It would be great if I can get some intuition about this. Thank you, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From groups.and.lists at gmail.com Mon Nov 8 14:22:33 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 08 Nov 2010 13:22:33 -0600 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization Message-ID: Hi, I was wondering when it is better to store cholesky factor and use it to solve Ax = b, instead of storing the inverse of A. (A is a symmetric, positive-definite matrix.) Even in the repeated case, if I have the inverse of A (invA) stored, then I can solve Ax = b_i, i = 1, ... , n, by x = dot(invA, b_i). Is dot(invA, b_i) slower than cho_solve(cho_factor, b_i)? I heard calculating the inverse is not recommended, but my understanding is that numpy.linalg.inv actually solves Ax = I instead of literally calculating the inverse of A. It would be great if I can get some intuition about this. Thank you, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Nov 8 14:23:46 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 8 Nov 2010 19:23:46 +0000 (UTC) Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization References: Message-ID: Mon, 08 Nov 2010 13:17:11 -0600, Joon wrote: > I was wondering when it is better to store cholesky factor and use it to > solve Ax = b, instead of storing the inverse of A. (A is a symmetric, > positive-definite matrix.) > > Even in the repeated case, if I have the inverse of A (invA) stored, > then I can solve Ax = b_i, i = 1, ... , n, by x = dot(invA, b_i). Is > dot(invA, b_i) slower than cho_solve(cho_factor, b_i)? Not necessarily slower, but it contains more numerical error. http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ > I heard calculating the inverse is not recommended, but my understanding > is that numpy.linalg.inv actually solves Ax = I instead of literally > calculating the inverse of A. It would be great if I can get some > intuition about this. That's the same thing as computing the inverse matrix. -- Pauli Virtanen From bsouthey at gmail.com Mon Nov 8 14:34:42 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 08 Nov 2010 13:34:42 -0600 Subject: [Numpy-discussion] Developmental version numbering with git Message-ID: <4CD850D2.7020703@gmail.com> Hi, Since the change to git the numpy version in setup.py is '2.0.0.dev' regardless because the prior numbering was determined by svn. Is there a plan to add some numbering system to numpy developmental version? Regardless of the answer, the 'numpy/numpy/version.py' will need to changed because of the reference to the svn naming. 
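For illustration only, a rough sketch of the kind of thing version.py could do instead of the old svn logic: append an abbreviated git revision to the dev version and fall back gracefully when git is not available. The names and the exact format here are made up, not a proposed patch:

import subprocess

version = '2.0.0'
released = False

def git_short_hash():
    # Ask git for the abbreviated revision of HEAD; return '' if that fails
    # (git not installed, or building from an unpacked source release).
    try:
        p = subprocess.Popen(['git', 'rev-parse', '--short', 'HEAD'],
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()
    except OSError:
        return ''
    if p.returncode != 0:
        return ''
    return out.strip().decode('ascii')

if released:
    full_version = version
else:
    ghash = git_short_hash()
    full_version = version + '.dev' + ('-' + ghash if ghash else '')

print(full_version)   # e.g. '2.0.0.dev-1b2c3d4'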
Bruce From groups.and.lists at gmail.com Mon Nov 8 14:38:45 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 08 Nov 2010 13:38:45 -0600 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization In-Reply-To: References: Message-ID: On Mon, 08 Nov 2010 13:23:46 -0600, Pauli Virtanen wrote: > Mon, 08 Nov 2010 13:17:11 -0600, Joon wrote: >> I was wondering when it is better to store cholesky factor and use it to >> solve Ax = b, instead of storing the inverse of A. (A is a symmetric, >> positive-definite matrix.) >> >> Even in the repeated case, if I have the inverse of A (invA) stored, >> then I can solve Ax = b_i, i = 1, ... , n, by x = dot(invA, b_i). Is >> dot(invA, b_i) slower than cho_solve(cho_factor, b_i)? > > Not necessarily slower, but it contains more numerical error. > > http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ > >> I heard calculating the inverse is not recommended, but my understanding >> is that numpy.linalg.inv actually solves Ax = I instead of literally >> calculating the inverse of A. It would be great if I can get some >> intuition about this. > > That's the same thing as computing the inverse matrix. > Oh I see. So I guess in invA = solve(Ax, I) and then x = dot(invA, b) case, there are more places where numerical errors occur, than just x = solve(Ax, b) case. Thank you, Joon -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From groups.and.lists at gmail.com Mon Nov 8 15:00:48 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 08 Nov 2010 14:00:48 -0600 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization In-Reply-To: References: Message-ID: On Mon, 08 Nov 2010 13:23:46 -0600, Pauli Virtanen wrote: > Mon, 08 Nov 2010 13:17:11 -0600, Joon wrote: >> I was wondering when it is better to store cholesky factor and use it to >> solve Ax = b, instead of storing the inverse of A. (A is a symmetric, >> positive-definite matrix.) >> >> Even in the repeated case, if I have the inverse of A (invA) stored, >> then I can solve Ax = b_i, i = 1, ... , n, by x = dot(invA, b_i). Is >> dot(invA, b_i) slower than cho_solve(cho_factor, b_i)? > > Not necessarily slower, but it contains more numerical error. > > http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ > >> I heard calculating the inverse is not recommended, but my understanding >> is that numpy.linalg.inv actually solves Ax = I instead of literally >> calculating the inverse of A. It would be great if I can get some >> intuition about this. > > That's the same thing as computing the inverse matrix. > Another question is, is it better to do cho_solve(cho_factor(A), b) than solve(A, b)? Thank you, Joon -- From bsouthey at gmail.com Mon Nov 8 15:06:03 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 08 Nov 2010 14:06:03 -0600 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization In-Reply-To: References: Message-ID: <4CD8582B.2060803@gmail.com> On 11/08/2010 01:38 PM, Joon wrote: > On Mon, 08 Nov 2010 13:23:46 -0600, Pauli Virtanen wrote: > > > Mon, 08 Nov 2010 13:17:11 -0600, Joon wrote: > >> I was wondering when it is better to store cholesky factor and use > it to > >> solve Ax = b, instead of storing the inverse of A. (A is a symmetric, > >> positive-definite matrix.) > >> > >> Even in the repeated case, if I have the inverse of A (invA) stored, > >> then I can solve Ax = b_i, i = 1, ... , n, by x = dot(invA, b_i). 
Is > >> dot(invA, b_i) slower than cho_solve(cho_factor, b_i)? > > > > Not necessarily slower, but it contains more numerical error. > > > > http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ > > > >> I heard calculating the inverse is not recommended, but my > understanding > >> is that numpy.linalg.inv actually solves Ax = I instead of literally > >> calculating the inverse of A. It would be great if I can get some > >> intuition about this. > > > > That's the same thing as computing the inverse matrix. > > > > Oh I see. So I guess in invA = solve(Ax, I) and then x = dot(invA, b) > case, there are more places where numerical errors occur, than just > x = solve(Ax, b) case. > > Thank you, > Joon > > > Numpy uses SVD to get the (pseudo) inverse, which is usually very accurate at getting (pseudo) inverse. There are a lot of misconceptions involved but ultimately it comes down to two options: If you need the inverse (like standard errors) then everything else is rather moot. If you are just solving a system then there are better numerical solvers available in speed and accuracy than using inverse or similar approaches. Bruce From jsseabold at gmail.com Mon Nov 8 15:14:36 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 8 Nov 2010 15:14:36 -0500 Subject: [Numpy-discussion] Catching and dealing with floating point errors Message-ID: I am doing some optimizations on random samples. In a small number of cases, the objective is not well-defined for a given sample (it's not possible to tell beforehand and hopefully won't happen much in practice). What is the most numpythonic way to handle this? It doesn't look like I can use np.seterrcall in this case (without ignoring its actual intent). Here's a toy example of the method I have come up with. import numpy as np def reset_seterr(d): """ Helper function to reset FP error-handling to user's original settings """ for action in [i+'='+"'"+d[i]+"'" for i in d]: exec(action) np.seterr(over=over, divide=divide, invalid=invalid, under=under) def log_random_sample(X): """ Toy example to catch a FP error, re-sample, and return objective """ d = np.seterr() # get original values to reset np.seterr('raise') # set to raise on fp error in order to catch try: ret = np.log(X) reset_seterr(d) return ret except: lb,ub = -1,1 # includes bad domain to test recursion X = np.random.uniform(lb,ub) reset_seterr(d) return log_random_sample(X) lb,ub = 0,0 orig_setting = np.seterr() X = np.random.uniform(lb,ub) log_random_sample(X) assert(orig_setting == np.seterr()) This seems to work, but I'm not sure it's as transparent as it could be. If it is, then maybe it will be useful to others. Skipper From pav at iki.fi Mon Nov 8 15:17:46 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 8 Nov 2010 20:17:46 +0000 (UTC) Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization References: <4CD8582B.2060803@gmail.com> Message-ID: On Mon, 08 Nov 2010 14:06:03 -0600, Bruce Southey wrote: [clip] > Numpy uses SVD to get the (pseudo) inverse, which is usually very > accurate at getting (pseudo) inverse. numpy.linalg.inv does solve(a, identity(a.shape[0], dtype=a.dtype)) It doesn't use xGETRI since that's not included in lapack_lite. numpy.linalg.pinv OTOH does use SVD, but that's probably more costly. 
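To make that concrete, a small sketch comparing the three routes. Here cho_factor and cho_solve are assumed to come from scipy.linalg (the posts above do not say which module), and the matrix is made comfortably positive definite so all three answers agree:

import numpy as np
from numpy.linalg import inv, solve
from scipy.linalg import cho_factor, cho_solve

rng = np.random.RandomState(0)
n = 200
A = rng.normal(size=(n, n))
A = np.dot(A, A.T) + n * np.eye(n)   # symmetric positive definite
B = rng.normal(size=(n, 5))          # several right-hand sides at once

# inv() behaves like solving A X = I, as described above:
assert np.allclose(inv(A), solve(A, np.eye(n)))

# Factor once, then reuse the factor for every right-hand side:
factor = cho_factor(A)
X1 = cho_solve(factor, B)

# Same answers from the general solver and from the explicit inverse,
# the latter doing more work and accumulating more rounding error:
X2 = solve(A, B)
X3 = np.dot(inv(A), B)
print(np.max(np.abs(X1 - X2)), np.max(np.abs(X1 - X3)))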
-- Pauli Virtanen From jsseabold at gmail.com Mon Nov 8 15:17:45 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 8 Nov 2010 15:17:45 -0500 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold wrote: > I am doing some optimizations on random samples. ?In a small number of > cases, the objective is not well-defined for a given sample (it's not > possible to tell beforehand and hopefully won't happen much in > practice). ?What is the most numpythonic way to handle this? ?It > doesn't look like I can use np.seterrcall in this case (without > ignoring its actual intent). ?Here's a toy example of the method I > have come up with. > > import numpy as np > > def reset_seterr(d): > ? ?""" > ? ?Helper function to reset FP error-handling to user's original settings > ? ?""" > ? ?for action in [i+'='+"'"+d[i]+"'" for i in d]: > ? ? ? ?exec(action) > ? ?np.seterr(over=over, divide=divide, invalid=invalid, under=under) > It just occurred to me that this is unsafe. Better options for resetting seterr? > def log_random_sample(X): > ? ?""" > ? ?Toy example to catch a FP error, re-sample, and return objective > ? ?""" > ? ?d = np.seterr() # get original values to reset > ? ?np.seterr('raise') # set to raise on fp error in order to catch > ? ?try: > ? ? ? ?ret = np.log(X) > ? ? ? ?reset_seterr(d) > ? ? ? ?return ret > ? ?except: > ? ? ? ?lb,ub = -1,1 ?# includes bad domain to test recursion > ? ? ? ?X = np.random.uniform(lb,ub) > ? ? ? ?reset_seterr(d) > ? ? ? ?return log_random_sample(X) > > lb,ub = 0,0 > orig_setting = np.seterr() > X = np.random.uniform(lb,ub) > log_random_sample(X) > assert(orig_setting == np.seterr()) > > This seems to work, but I'm not sure it's as transparent as it could > be. ?If it is, then maybe it will be useful to others. > > Skipper > From bsouthey at gmail.com Mon Nov 8 15:42:12 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 08 Nov 2010 14:42:12 -0600 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: Message-ID: <4CD860A4.1040106@gmail.com> On 11/08/2010 02:17 PM, Skipper Seabold wrote: > On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold wrote: >> I am doing some optimizations on random samples. In a small number of >> cases, the objective is not well-defined for a given sample (it's not >> possible to tell beforehand and hopefully won't happen much in >> practice). What is the most numpythonic way to handle this? It >> doesn't look like I can use np.seterrcall in this case (without >> ignoring its actual intent). Here's a toy example of the method I >> have come up with. >> >> import numpy as np >> >> def reset_seterr(d): >> """ >> Helper function to reset FP error-handling to user's original settings >> """ >> for action in [i+'='+"'"+d[i]+"'" for i in d]: >> exec(action) >> np.seterr(over=over, divide=divide, invalid=invalid, under=under) >> > It just occurred to me that this is unsafe. Better options for > resetting seterr? 
> >> def log_random_sample(X): >> """ >> Toy example to catch a FP error, re-sample, and return objective >> """ >> d = np.seterr() # get original values to reset >> np.seterr('raise') # set to raise on fp error in order to catch >> try: >> ret = np.log(X) >> reset_seterr(d) >> return ret >> except: >> lb,ub = -1,1 # includes bad domain to test recursion >> X = np.random.uniform(lb,ub) >> reset_seterr(d) >> return log_random_sample(X) >> >> lb,ub = 0,0 >> orig_setting = np.seterr() >> X = np.random.uniform(lb,ub) >> log_random_sample(X) >> assert(orig_setting == np.seterr()) >> >> This seems to work, but I'm not sure it's as transparent as it could >> be. If it is, then maybe it will be useful to others. >> >> Skipper >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion What do you mean by 'floating point error'? For example, log of zero is not what I would consider a 'floating point error'. In this case, if you are after a log distribution, then you should be ensuring that the lower bound to the np.random.uniform() is always greater than zero. That is, if lb <= zero then you *know* you have a problem at the very start. Bruce From warren.weckesser at enthought.com Mon Nov 8 15:45:34 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 8 Nov 2010 14:45:34 -0600 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 2:17 PM, Skipper Seabold wrote: > On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold > wrote: > > I am doing some optimizations on random samples. In a small number of > > cases, the objective is not well-defined for a given sample (it's not > > possible to tell beforehand and hopefully won't happen much in > > practice). What is the most numpythonic way to handle this? It > > doesn't look like I can use np.seterrcall in this case (without > > ignoring its actual intent). Here's a toy example of the method I > > have come up with. > > > > import numpy as np > > > > def reset_seterr(d): > > """ > > Helper function to reset FP error-handling to user's original settings > > """ > > for action in [i+'='+"'"+d[i]+"'" for i in d]: > > exec(action) > > np.seterr(over=over, divide=divide, invalid=invalid, under=under) > > > > It just occurred to me that this is unsafe. Better options for > resetting seterr? > Hey Skipper, I don't understand why you need your helper function. Why not just pass the saved dictionary back to seterr()? E.g. saved = np.seterr('raise') try: # Do something dangerous... result = whatever... except Exception: # Handle the problems... result = better result... np.seterr(**saved) return result Warren > > > def log_random_sample(X): > > """ > > Toy example to catch a FP error, re-sample, and return objective > > """ > > d = np.seterr() # get original values to reset > > np.seterr('raise') # set to raise on fp error in order to catch > > try: > > ret = np.log(X) > > reset_seterr(d) > > return ret > > except: > > lb,ub = -1,1 # includes bad domain to test recursion > > X = np.random.uniform(lb,ub) > > reset_seterr(d) > > return log_random_sample(X) > > > > lb,ub = 0,0 > > orig_setting = np.seterr() > > X = np.random.uniform(lb,ub) > > log_random_sample(X) > > assert(orig_setting == np.seterr()) > > > > This seems to work, but I'm not sure it's as transparent as it could > > be. If it is, then maybe it will be useful to others. 
> > > > Skipper > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Nov 8 15:52:18 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 8 Nov 2010 15:52:18 -0500 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: <4CD860A4.1040106@gmail.com> References: <4CD860A4.1040106@gmail.com> Message-ID: On Mon, Nov 8, 2010 at 3:42 PM, Bruce Southey wrote: > On 11/08/2010 02:17 PM, Skipper Seabold wrote: >> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold ?wrote: >>> I am doing some optimizations on random samples. ?In a small number of >>> cases, the objective is not well-defined for a given sample (it's not >>> possible to tell beforehand and hopefully won't happen much in >>> practice). ?What is the most numpythonic way to handle this? ?It >>> doesn't look like I can use np.seterrcall in this case (without >>> ignoring its actual intent). ?Here's a toy example of the method I >>> have come up with. >>> >>> import numpy as np >>> >>> def reset_seterr(d): >>> ? ? """ >>> ? ? Helper function to reset FP error-handling to user's original settings >>> ? ? """ >>> ? ? for action in [i+'='+"'"+d[i]+"'" for i in d]: >>> ? ? ? ? exec(action) >>> ? ? np.seterr(over=over, divide=divide, invalid=invalid, under=under) >>> >> It just occurred to me that this is unsafe. ?Better options for >> resetting seterr? >> >>> def log_random_sample(X): >>> ? ? """ >>> ? ? Toy example to catch a FP error, re-sample, and return objective >>> ? ? """ >>> ? ? d = np.seterr() # get original values to reset >>> ? ? np.seterr('raise') # set to raise on fp error in order to catch >>> ? ? try: >>> ? ? ? ? ret = np.log(X) >>> ? ? ? ? reset_seterr(d) >>> ? ? ? ? return ret >>> ? ? except: >>> ? ? ? ? lb,ub = -1,1 ?# includes bad domain to test recursion >>> ? ? ? ? X = np.random.uniform(lb,ub) >>> ? ? ? ? reset_seterr(d) >>> ? ? ? ? return log_random_sample(X) >>> >>> lb,ub = 0,0 >>> orig_setting = np.seterr() >>> X = np.random.uniform(lb,ub) >>> log_random_sample(X) >>> assert(orig_setting == np.seterr()) >>> >>> This seems to work, but I'm not sure it's as transparent as it could >>> be. ?If it is, then maybe it will be useful to others. >>> >>> Skipper >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > What do you mean by 'floating point error'? > For example, log of zero is not what I would consider a 'floating point > error'. > > In this case, if you are after a log distribution, then you should be > ensuring that the lower bound to the np.random.uniform() is always > greater than zero. That is, if lb <= zero then you *know* you have a > problem at the very start. > > Just a toy example to get a similar error. I call x <= 0 on purpose here. 
> Bruce > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsseabold at gmail.com Mon Nov 8 15:52:56 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 8 Nov 2010 15:52:56 -0500 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 3:45 PM, Warren Weckesser wrote: > > > On Mon, Nov 8, 2010 at 2:17 PM, Skipper Seabold wrote: >> >> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold >> wrote: >> > I am doing some optimizations on random samples. ?In a small number of >> > cases, the objective is not well-defined for a given sample (it's not >> > possible to tell beforehand and hopefully won't happen much in >> > practice). ?What is the most numpythonic way to handle this? ?It >> > doesn't look like I can use np.seterrcall in this case (without >> > ignoring its actual intent). ?Here's a toy example of the method I >> > have come up with. >> > >> > import numpy as np >> > >> > def reset_seterr(d): >> > ? ?""" >> > ? ?Helper function to reset FP error-handling to user's original >> > settings >> > ? ?""" >> > ? ?for action in [i+'='+"'"+d[i]+"'" for i in d]: >> > ? ? ? ?exec(action) >> > ? ?np.seterr(over=over, divide=divide, invalid=invalid, under=under) >> > >> >> It just occurred to me that this is unsafe. ?Better options for >> resetting seterr? > > > Hey Skipper, > > I don't understand why you need your helper function.? Why not just pass the > saved dictionary back to seterr()?? E.g. > > saved = np.seterr('raise') > try: > ??? # Do something dangerous... > ??? result = whatever... > except Exception: > ??? # Handle the problems... > ??? result = better result... > np.seterr(**saved) > return result > Ha. I knew I was forgetting something. Thanks. > > Warren > > > >> >> > def log_random_sample(X): >> > ? ?""" >> > ? ?Toy example to catch a FP error, re-sample, and return objective >> > ? ?""" >> > ? ?d = np.seterr() # get original values to reset >> > ? ?np.seterr('raise') # set to raise on fp error in order to catch >> > ? ?try: >> > ? ? ? ?ret = np.log(X) >> > ? ? ? ?reset_seterr(d) >> > ? ? ? ?return ret >> > ? ?except: >> > ? ? ? ?lb,ub = -1,1 ?# includes bad domain to test recursion >> > ? ? ? ?X = np.random.uniform(lb,ub) >> > ? ? ? ?reset_seterr(d) >> > ? ? ? ?return log_random_sample(X) >> > >> > lb,ub = 0,0 >> > orig_setting = np.seterr() >> > X = np.random.uniform(lb,ub) >> > log_random_sample(X) >> > assert(orig_setting == np.seterr()) >> > >> > This seems to work, but I'm not sure it's as transparent as it could >> > be. ?If it is, then maybe it will be useful to others. 
>> > >> > Skipper >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From damienlmoore at gmail.com Mon Nov 8 15:49:07 2010 From: damienlmoore at gmail.com (Damien Moore) Date: Mon, 8 Nov 2010 20:49:07 +0000 (UTC) Subject: [Numpy-discussion] numpy.genfromtxt converters issue References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> Message-ID: Pierre GM gmail.com> writes: > On Nov 6, 2010, at 2:22 PM, Damien Moore wrote: > > > Hi List, > > > > I'm trying to import csv data as a numpy array using genfromtxt. > [...] > Please open a ticket so that I don't forget about it. Thx in advance! > The ticket is here: http://projects.scipy.org/numpy/ticket/1665 From bsouthey at gmail.com Mon Nov 8 16:04:57 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 08 Nov 2010 15:04:57 -0600 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: <4CD860A4.1040106@gmail.com> Message-ID: <4CD865F9.6050509@gmail.com> On 11/08/2010 02:52 PM, Skipper Seabold wrote: > On Mon, Nov 8, 2010 at 3:42 PM, Bruce Southey wrote: >> On 11/08/2010 02:17 PM, Skipper Seabold wrote: >>> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold wrote: >>>> I am doing some optimizations on random samples. In a small number of >>>> cases, the objective is not well-defined for a given sample (it's not >>>> possible to tell beforehand and hopefully won't happen much in >>>> practice). What is the most numpythonic way to handle this? It >>>> doesn't look like I can use np.seterrcall in this case (without >>>> ignoring its actual intent). Here's a toy example of the method I >>>> have come up with. >>>> >>>> import numpy as np >>>> >>>> def reset_seterr(d): >>>> """ >>>> Helper function to reset FP error-handling to user's original settings >>>> """ >>>> for action in [i+'='+"'"+d[i]+"'" for i in d]: >>>> exec(action) >>>> np.seterr(over=over, divide=divide, invalid=invalid, under=under) >>>> >>> It just occurred to me that this is unsafe. Better options for >>> resetting seterr? >>> >>>> def log_random_sample(X): >>>> """ >>>> Toy example to catch a FP error, re-sample, and return objective >>>> """ >>>> d = np.seterr() # get original values to reset >>>> np.seterr('raise') # set to raise on fp error in order to catch >>>> try: >>>> ret = np.log(X) >>>> reset_seterr(d) >>>> return ret >>>> except: >>>> lb,ub = -1,1 # includes bad domain to test recursion >>>> X = np.random.uniform(lb,ub) >>>> reset_seterr(d) >>>> return log_random_sample(X) >>>> >>>> lb,ub = 0,0 >>>> orig_setting = np.seterr() >>>> X = np.random.uniform(lb,ub) >>>> log_random_sample(X) >>>> assert(orig_setting == np.seterr()) >>>> >>>> This seems to work, but I'm not sure it's as transparent as it could >>>> be. If it is, then maybe it will be useful to others. >>>> >>>> Skipper >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> What do you mean by 'floating point error'? >> For example, log of zero is not what I would consider a 'floating point >> error'. 
>> >> In this case, if you are after a log distribution, then you should be >> ensuring that the lower bound to the np.random.uniform() is always >> greater than zero. That is, if lb<= zero then you *know* you have a >> problem at the very start. >> >> > Just a toy example to get a similar error. I call x<= 0 on purpose here. > > I was aware of that. Messing about warnings is not what I consider Pythonic because you should be fixing the source of the problem. In this case, your sampling must be greater than zero. If you are sampling from a distribution, then that should be built into the call otherwise your samples will not be from the requested distribution. Bruce From jsseabold at gmail.com Mon Nov 8 16:15:38 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 8 Nov 2010 16:15:38 -0500 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: <4CD865F9.6050509@gmail.com> References: <4CD860A4.1040106@gmail.com> <4CD865F9.6050509@gmail.com> Message-ID: On Mon, Nov 8, 2010 at 4:04 PM, Bruce Southey wrote: > On 11/08/2010 02:52 PM, Skipper Seabold wrote: >> On Mon, Nov 8, 2010 at 3:42 PM, Bruce Southey ?wrote: >>> On 11/08/2010 02:17 PM, Skipper Seabold wrote: >>>> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold ? ?wrote: >>>>> I am doing some optimizations on random samples. ?In a small number of >>>>> cases, the objective is not well-defined for a given sample (it's not >>>>> possible to tell beforehand and hopefully won't happen much in >>>>> practice). ?What is the most numpythonic way to handle this? ?It >>>>> doesn't look like I can use np.seterrcall in this case (without >>>>> ignoring its actual intent). ?Here's a toy example of the method I >>>>> have come up with. >>>>> >>>>> import numpy as np >>>>> >>>>> def reset_seterr(d): >>>>> ? ? ?""" >>>>> ? ? ?Helper function to reset FP error-handling to user's original settings >>>>> ? ? ?""" >>>>> ? ? ?for action in [i+'='+"'"+d[i]+"'" for i in d]: >>>>> ? ? ? ? ?exec(action) >>>>> ? ? ?np.seterr(over=over, divide=divide, invalid=invalid, under=under) >>>>> >>>> It just occurred to me that this is unsafe. ?Better options for >>>> resetting seterr? >>>> >>>>> def log_random_sample(X): >>>>> ? ? ?""" >>>>> ? ? ?Toy example to catch a FP error, re-sample, and return objective >>>>> ? ? ?""" >>>>> ? ? ?d = np.seterr() # get original values to reset >>>>> ? ? ?np.seterr('raise') # set to raise on fp error in order to catch >>>>> ? ? ?try: >>>>> ? ? ? ? ?ret = np.log(X) >>>>> ? ? ? ? ?reset_seterr(d) >>>>> ? ? ? ? ?return ret >>>>> ? ? ?except: >>>>> ? ? ? ? ?lb,ub = -1,1 ?# includes bad domain to test recursion >>>>> ? ? ? ? ?X = np.random.uniform(lb,ub) >>>>> ? ? ? ? ?reset_seterr(d) >>>>> ? ? ? ? ?return log_random_sample(X) >>>>> >>>>> lb,ub = 0,0 >>>>> orig_setting = np.seterr() >>>>> X = np.random.uniform(lb,ub) >>>>> log_random_sample(X) >>>>> assert(orig_setting == np.seterr()) >>>>> >>>>> This seems to work, but I'm not sure it's as transparent as it could >>>>> be. ?If it is, then maybe it will be useful to others. >>>>> >>>>> Skipper >>>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> What do you mean by 'floating point error'? >>> For example, log of zero is not what I would consider a 'floating point >>> error'. 
>>> >>> In this case, if you are after a log distribution, then you should be >>> ensuring that the lower bound to the np.random.uniform() is always >>> greater than zero. That is, if lb<= zero then you *know* you have a >>> problem at the very start. >>> >>> >> Just a toy example to get a similar error. ?I call x<= 0 on purpose here. >> >> > I was aware of that. > Ah, ok. I don't mean to be short, just busy. > Messing about warnings is not what I consider Pythonic because you > should be fixing the source of the problem. In this case, your sampling > must be greater than zero. If you are sampling from a distribution, then > that should be built into the call otherwise your samples will not be > from the requested distribution. > Basically, it looks like a small sample issue with an estimator. I'm not sure about the theory yet (or the underlying numerical issue), but I've confirmed that the solution also breaks down using several different solvers with a constrained version of the primal in GAMS to ensure that it's not just a domain error or numerical underflow/overflow. So at this point I just want to catch the warning and resample. Am going to explore the "bad" cases further at a later time. Skipper From matthew.brett at gmail.com Mon Nov 8 16:17:54 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 8 Nov 2010 13:17:54 -0800 Subject: [Numpy-discussion] Developmental version numbering with git In-Reply-To: <4CD850D2.7020703@gmail.com> References: <4CD850D2.7020703@gmail.com> Message-ID: Hi, > Since the change to git the numpy version in setup.py is '2.0.0.dev' > regardless because the prior numbering was determined by svn. > > Is there a plan to add some numbering system to numpy developmental version? > > Regardless of the answer, the 'numpy/numpy/version.py' will need to > changed because of the reference to the svn naming. In case it's useful, we (nipy) went for a scheme where the version number stays as '2.0.0.dev', but we keep a record of what git commit has we are on - described here: http://web.archiveorange.com/archive/v/AW2a1CzoOZtfBfNav9hd I can post more details of the implementation if it's of any interest, Best, Matthew From warren.weckesser at enthought.com Mon Nov 8 16:20:12 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 8 Nov 2010 15:20:12 -0600 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 2:52 PM, Skipper Seabold wrote: > On Mon, Nov 8, 2010 at 3:45 PM, Warren Weckesser > wrote: > > > > > > On Mon, Nov 8, 2010 at 2:17 PM, Skipper Seabold > wrote: > >> > >> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold > >> wrote: > >> > I am doing some optimizations on random samples. In a small number of > >> > cases, the objective is not well-defined for a given sample (it's not > >> > possible to tell beforehand and hopefully won't happen much in > >> > practice). What is the most numpythonic way to handle this? It > >> > doesn't look like I can use np.seterrcall in this case (without > >> > ignoring its actual intent). Here's a toy example of the method I > >> > have come up with. 
> >> > > >> > import numpy as np > >> > > >> > def reset_seterr(d): > >> > """ > >> > Helper function to reset FP error-handling to user's original > >> > settings > >> > """ > >> > for action in [i+'='+"'"+d[i]+"'" for i in d]: > >> > exec(action) > >> > np.seterr(over=over, divide=divide, invalid=invalid, under=under) > >> > > >> > >> It just occurred to me that this is unsafe. Better options for > >> resetting seterr? > > > > > > Hey Skipper, > > > > I don't understand why you need your helper function. Why not just pass > the > > saved dictionary back to seterr()? E.g. > > > > saved = np.seterr('raise') > > try: > > # Do something dangerous... > > result = whatever... > > except Exception: > > # Handle the problems... > > result = better result... > > np.seterr(**saved) > > return result > > > > Ha. I knew I was forgetting something. Thanks. > > Your question reminded me to file an enhancement request that I've been meaning to suggest for a while: http://projects.scipy.org/numpy/ticket/1667 Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Nov 8 16:47:22 2010 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 8 Nov 2010 13:47:22 -0800 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 12:00 PM, Joon wrote: > Another question is, is it better to do cho_solve(cho_factor(A), b) than > solve(A, b)? If A is symmetric positive definite, then using the cholesky decomposition should be somewhat faster than using a more general solver. (Because, basically, the cholesky decomposition routine "knows" that your matrix is symmetric, so it only has to "look at" half of it, while a generic solver routine has to "look at" your whole matrix regardless). And indeed, that seems to be the case in numpy: In [18]: A = np.random.normal(size=(500, 500)) In [19]: A = np.dot(A, A.T) In [20]: b = np.random.normal(size=(500, 1)) In [21]: %timeit solve(A, b) 1 loops, best of 3: 147 ms per loop In [22]: %timeit cho_solve(cho_factor(A), b) 10 loops, best of 3: 92.6 ms per loop Also of note -- going via the inverse is much slower: In [23]: %timeit dot(inv(A), b) 1 loops, best of 3: 419 ms per loop (I didn't check what happens if you have to solve the same set of equations many times with A fixed and b varying, but I would still use cholesky for that case. Also, note that for solve(), cho_solve(), etc., b can be a matrix, which lets you solve for many different b vectors simultaneously.) -- Nathaniel From aarchiba at physics.mcgill.ca Mon Nov 8 17:40:57 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Mon, 8 Nov 2010 17:40:57 -0500 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization In-Reply-To: References: Message-ID: On 8 November 2010 14:38, Joon wrote: > Oh I see. So I guess in?invA = solve(Ax, I) and then x = dot(invA, b) case, > there are more places where numerical errors occur, than just x?= solve(Ax, > b) case. That's the heart of the matter, but one can be more specific. You can think of a matrix by how it acts on vectors. Taking the inverse amounts to solving Ax=b for all the standard basis vectors (0,...,0,1,0,...,0); multiplying by the inverse amounts to expressing your vector in terms of these, finding where they go, and adding them together. But it can happen that when you break your vector up like that, the images of the components are large but almost cancel. 
This sort of near-cancellation amplifies numerical errors tremendously. In comparison, solving directly, if you're using a stable algorithm, is able to avoid ever constructing these nearly-cancelling combinations explicitly. The standard reason for trying to construct an inverse is that you want to solve equations for many vectors with the same matrix. But most solution methods are implemented as a matrix factorization followed by a single cheap operation, so if this is your goal, it's better to simply keep the matrix factorization around. Anne From rowen at uw.edu Mon Nov 8 17:42:55 2010 From: rowen at uw.edu (Russell E. Owen) Date: Mon, 08 Nov 2010 14:42:55 -0800 Subject: [Numpy-discussion] updated 1.5.1 release schedule References: Message-ID: In article , Ralf Gommers wrote: > On Mon, Nov 8, 2010 at 5:16 AM, Vincent Davis > wrote: > > > > > > On Sun, Nov 7, 2010 at 1:51 AM, Ralf Gommers > > wrote: > >> > >> Hi, > >> > >> Since we weren't able to stick to the original schedule for 1.5.1, > >> here's a short note about the current status. There are two changes > >> that need to go in before RC2: > >> https://github.com/numpy/numpy/pull/11 > >> https://github.com/numpy/numpy/pull/9 > >> If no one has time for a review I'll commit those to 1.5.x by Tuesday > >> and tag RC2. Final release should be one week after that, unless an > >> RC3 is necessary. > > > > Since we will have 2 different dmgs for python2.7 (osx10.3 and osx10.5) and > > I don't think there is any check in the installer to make sure the right > > python2.7 version is present when installing. The installer only check that > > python2.7 is present. I think a check should be added. I am missing > > somthing > > or there are other suggestions I would like to get this in 1.5.1rc2. I not > > sure the best way to make this check but I think I can come up with a > > solution. Also would need a useful error message. > > Vincent > > To let the user know if there's a mismatch may be helpful, but we > shouldn't prevent installation. In many cases mixing installers will > just work. If you have a patch it's welcome, but I think it's not > critical for this release. I am strongly in favor of such a check and refusing to install on the mismatched version of Python. I am be concerned about hidden problems that emerge later -- for instance when the user bundles an application and somebody else tries to use it. That said, I don't know how to perform such a test. -- Russell From groups.and.lists at gmail.com Mon Nov 8 17:43:12 2010 From: groups.and.lists at gmail.com (Joon) Date: Mon, 08 Nov 2010 16:43:12 -0600 Subject: [Numpy-discussion] Solving Ax = b: inverse vs cholesky factorization In-Reply-To: References: Message-ID: Thanks, Nathaniel. Your reply was very helpful. -Joon On Mon, 08 Nov 2010 15:47:22 -0600, Nathaniel Smith wrote: > On Mon, Nov 8, 2010 at 12:00 PM, Joon wrote: >> Another question is, is it better to do cho_solve(cho_factor(A), b) than >> solve(A, b)? > > If A is symmetric positive definite, then using the cholesky > decomposition should be somewhat faster than using a more general > solver. (Because, basically, the cholesky decomposition routine > "knows" that your matrix is symmetric, so it only has to "look at" > half of it, while a generic solver routine has to "look at" your whole > matrix regardless). 
And indeed, that seems to be the case in numpy: > > In [18]: A = np.random.normal(size=(500, 500)) > In [19]: A = np.dot(A, A.T) > In [20]: b = np.random.normal(size=(500, 1)) > > In [21]: %timeit solve(A, b) > 1 loops, best of 3: 147 ms per loop > > In [22]: %timeit cho_solve(cho_factor(A), b) > 10 loops, best of 3: 92.6 ms per loop > > Also of note -- going via the inverse is much slower: > > In [23]: %timeit dot(inv(A), b) > 1 loops, best of 3: 419 ms per loop > > (I didn't check what happens if you have to solve the same set of > equations many times with A fixed and b varying, but I would still use > cholesky for that case. Also, note that for solve(), cho_solve(), > etc., b can be a matrix, which lets you solve for many different b > vectors simultaneously.) > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- From braingateway at gmail.com Mon Nov 8 18:38:58 2010 From: braingateway at gmail.com (braingateway) Date: Tue, 09 Nov 2010 00:38:58 +0100 Subject: [Numpy-discussion] LapackError:non-native byte order In-Reply-To: References: <1289241091.3170.2.camel@talisman> Message-ID: <4CD88A12.2020409@gmail.com> Matthew Brett : > Hi, > > On Mon, Nov 8, 2010 at 10:34 AM, Pauli Virtanen wrote: > >> Mon, 08 Nov 2010 19:31:31 +0100, Pauli Virtanen wrote: >> >> >>> ma, 2010-11-08 kello 18:56 +0100, LittleBigBrain kirjoitti: >>> >>>> In my system '<' is the native byte-order, but unless I change the >>>> byte-order label to '=', it won't work in linalg sub-module, but in >>>> others works OK. I am not sure whether this is an expected behavior or >>>> a bug? >>>> >>>>>>> import sys >>>>>>> sys.byteorder >>>>>>> >>>> 'little' >>>> >>>>>>> a.dtype.byteorder >>>>>>> >>>> '<' >>>> >>>>>>> b.dtype.byteorder >>>>>>> >>>> '<' >>>> >>> The error is here: it's not possible to create such dtypes via any Numpy >>> methods -- the '<' (or '>') is always normalized to '='. Numpy and >>> several other modules consequently assume this normalization. >>> >>> Where do `a` and `b` come from? >>> >> Ok, `x.newbyteorder('<')` seems to do this. Now I'm unsure how things are >> supposed to work. >> > > Yes - it is puzzling that ``x.newbyteorder('<')`` makes arrays that > are confusing to numpy. If numpy generally always normalizes to the > system endian to '=' then should that not also be true of > ``newbyteorder``? > > See you, > > Matthew > I agree, at it is not documented or hard to find. I think usually all users will assume '<' is the same as '=' on normal 'little' system. Thanks for reply, LittleBigBrain > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david at silveregg.co.jp Mon Nov 8 20:52:09 2010 From: david at silveregg.co.jp (David) Date: Tue, 09 Nov 2010 10:52:09 +0900 Subject: [Numpy-discussion] Anyone with Core i7 and Ubuntu 10.04? In-Reply-To: References: Message-ID: <4CD8A949.6040207@silveregg.co.jp> Hi Ian, On 11/08/2010 11:18 PM, Ian Goodfellow wrote: > I'm wondering if anyone here has successfully built numpy with ATLAS > and a Core i7 CPU on Ubuntu 10.04. If so, I could really use your > help. I've been trying since August (see my earlier messages to this > list) to get numpy running at full speed on my machine with no luck. Please tell us what error you got - saying that something did not working is really not useful to help you. 
You need to say exactly what fails, and which steps you followed before that failure. > The Ubuntu packages don't seem very fast, and numpy won't use the > version of ATLAS that I compiled. It's pretty sad; anything that > involves a lot of BLAS calls runs slower on this 2.8 ghz Core i7 than > on an older 2.66 ghz Core 2 Quad I use at work. One simple solution is to upgrade to ubuntu 10.10, which has finally a working atlas package, thanks to the work of the debian packagers. There is a version compiled for i7, cheers, David From wardefar at iro.umontreal.ca Mon Nov 8 23:33:57 2010 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 8 Nov 2010 23:33:57 -0500 Subject: [Numpy-discussion] Anyone with Core i7 and Ubuntu 10.04? In-Reply-To: <4CD8A949.6040207@silveregg.co.jp> References: <4CD8A949.6040207@silveregg.co.jp> Message-ID: <0D26CA8A-A9A7-4EA6-8E4B-0931AAE6636F@iro.umontreal.ca> On 2010-11-08, at 8:52 PM, David wrote: > Please tell us what error you got - saying that something did not > working is really not useful to help you. You need to say exactly what > fails, and which steps you followed before that failure. I think what he means is that it's very slow, there's no discernable error but dot-multiplies don't seem to be using BLAS. David From wesmckinn at gmail.com Tue Nov 9 00:24:50 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 9 Nov 2010 00:24:50 -0500 Subject: [Numpy-discussion] Anyone with Core i7 and Ubuntu 10.04? In-Reply-To: <0D26CA8A-A9A7-4EA6-8E4B-0931AAE6636F@iro.umontreal.ca> References: <4CD8A949.6040207@silveregg.co.jp> <0D26CA8A-A9A7-4EA6-8E4B-0931AAE6636F@iro.umontreal.ca> Message-ID: On Mon, Nov 8, 2010 at 11:33 PM, David Warde-Farley wrote: > On 2010-11-08, at 8:52 PM, David wrote: > >> Please tell us what error you got - saying that something did not >> working is really not useful to help you. You need to say exactly what >> fails, and which steps you followed before that failure. > > I think what he means is that it's very slow, there's no discernable error but dot-multiplies don't seem to be using BLAS. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Somewhat related topic: anyone know the status of EPD (Enthought distribution) releases on i7 processors as far as this goes? From warren.weckesser at enthought.com Tue Nov 9 02:14:09 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 9 Nov 2010 01:14:09 -0600 Subject: [Numpy-discussion] Catching and dealing with floating point errors In-Reply-To: References: Message-ID: On Mon, Nov 8, 2010 at 3:20 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > > On Mon, Nov 8, 2010 at 2:52 PM, Skipper Seabold wrote: > >> On Mon, Nov 8, 2010 at 3:45 PM, Warren Weckesser >> wrote: >> > >> > >> > On Mon, Nov 8, 2010 at 2:17 PM, Skipper Seabold >> wrote: >> >> >> >> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold >> >> wrote: >> >> > I am doing some optimizations on random samples. In a small number >> of >> >> > cases, the objective is not well-defined for a given sample (it's not >> >> > possible to tell beforehand and hopefully won't happen much in >> >> > practice). What is the most numpythonic way to handle this? It >> >> > doesn't look like I can use np.seterrcall in this case (without >> >> > ignoring its actual intent). Here's a toy example of the method I >> >> > have come up with. 
>> >> > >> >> > import numpy as np >> >> > >> >> > def reset_seterr(d): >> >> > """ >> >> > Helper function to reset FP error-handling to user's original >> >> > settings >> >> > """ >> >> > for action in [i+'='+"'"+d[i]+"'" for i in d]: >> >> > exec(action) >> >> > np.seterr(over=over, divide=divide, invalid=invalid, under=under) >> >> > >> >> >> >> It just occurred to me that this is unsafe. Better options for >> >> resetting seterr? >> > >> > >> > Hey Skipper, >> > >> > I don't understand why you need your helper function. Why not just pass >> the >> > saved dictionary back to seterr()? E.g. >> > >> > saved = np.seterr('raise') >> > try: >> > # Do something dangerous... >> > result = whatever... >> > except Exception: >> > # Handle the problems... >> > result = better result... >> > np.seterr(**saved) >> > return result >> > >> >> Ha. I knew I was forgetting something. Thanks. >> >> > Your question reminded me to file an enhancement request that I've been > meaning to suggest for a while: > http://projects.scipy.org/numpy/ticket/1667 > > I just discovered that a context manager for the error settings already exists: numpy.errstate. So a nicer way to write that code is: with np.errstate(all='raise'): try: # Do something dangerous... result = whatever... except Exception: # Handle the problems... result = better result... return result Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Tue Nov 9 06:28:10 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 9 Nov 2010 13:28:10 +0200 Subject: [Numpy-discussion] Trouble cloning numpy Github repo over HTTP In-Reply-To: References: Message-ID: On 30 September 2010 10:15, Scott Sinclair wrote: > I'm behind a firewall that doesn't allow me to use the git protocol so > I can't use the git:// URL. > > I see the following problem: > > $ git clone http://github.com/numpy/numpy.git numpy > Initialized empty Git repository in /home/scott/external_repos/numpy/.git/ > error: RPC failed; result=22, HTTP code = 417 > > I never have trouble cloning other repos off of Github over HTTP. For what it's worth, someone posted a work-around at http://support.github.com/discussions/repos/4323-error-rpc-failed-result22-http-code-411 that works for me.. Cheers, Scott From joanpeturpetersen at gmail.com Tue Nov 9 06:34:01 2010 From: joanpeturpetersen at gmail.com (=?ISO-8859-1?Q?J=F3an_Petur_Petersen?=) Date: Tue, 9 Nov 2010 11:34:01 +0000 Subject: [Numpy-discussion] Stop program when getting a warning (invalid value encountered in sqrt) Message-ID: Hi, I have write quite a large program, but I occationally get the following warning message: Warning: invalid value encountered in sqrt The programs runs for several hours finding the hyperparameters of a Gaussian Process, but occationally this warning pop up. Maybe due tue rounding errors or maybe because some numbers become nan. Can I make the program stop instead of just giving a warning, so that I get a traceback to where it comes from? For example in IPython: In [4]: def test(): ...: np.sqrt(-1) ...: np.sqrt(-1) ...: In [5]: test() Warning: invalid value encountered in sqrt Warning: invalid value encountered in sqrt Gives the warning two times instead of stopping. Best regard, J?an Petur Petersen -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scott.sinclair.za at gmail.com Tue Nov 9 06:43:44 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 9 Nov 2010 13:43:44 +0200 Subject: [Numpy-discussion] Stop program when getting a warning (invalid value encountered in sqrt) In-Reply-To: References: Message-ID: 2010/11/9 J?an Petur Petersen : > I have write quite a large program, but I occationally get the following > warning message: > > Warning: invalid value encountered in sqrt > > Can I make the program stop instead of just giving a warning, so that I get > a traceback to where it comes from? I think np.seterr(invalid='raise') should do the trick. Cheers, Scott From ralf.gommers at googlemail.com Tue Nov 9 09:19:42 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 9 Nov 2010 22:19:42 +0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 release candidate 2 Message-ID: Hi, I am pleased to announce the availability of the second release candidate of NumPy 1.5.1. This is a bug-fix release with no new features compared to 1.5.0. Please test and report any issues on the numpy mailing list. If no new issues are reported, the final 1.5.1 release will follow in a week. Binaries, sources and release notes can be found at https://sourceforge.net/projects/numpy/files/. OS X binaries are not up yet, they should follow by tomorrow. Enjoy, Ralf From ralf.gommers at googlemail.com Tue Nov 9 09:24:16 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 9 Nov 2010 22:24:16 +0800 Subject: [Numpy-discussion] Trac unaware of github move In-Reply-To: References: Message-ID: 2010/11/9 S?bastien Barth?lemy : > On Sun, 7 Nov 2010, Ralf Gommers wrote: >> That will require renaming those files in the source tree from *.txt >> to *.rst, otherwise there's no way to have github render them >> properly. Unless I missed something. Would that be fine? > > I think a *.rst.txt extension would also be recognized by github. Yes it would, documented at https://github.com/github/markup. > > Note that the docutils FAQ advises against using .rst as a file > extension : > http://docutils.sourceforge.net/FAQ.html#what-s-the-standard-filename-extension-for-a-restructuredtext-file > Good point. I agree .rst.txt is better. Ralf From scott.sinclair.za at gmail.com Tue Nov 9 10:20:49 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 9 Nov 2010 17:20:49 +0200 Subject: [Numpy-discussion] Developmental version numbering with git In-Reply-To: References: <4CD850D2.7020703@gmail.com> Message-ID: On 8 November 2010 23:17, Matthew Brett wrote: >> Since the change to git the numpy version in setup.py is '2.0.0.dev' >> regardless because the prior numbering was determined by svn. >> >> Is there a plan to add some numbering system to numpy developmental version? >> >> Regardless of the answer, the 'numpy/numpy/version.py' will need to >> changed because of the reference to the svn naming. 
> > In case it's useful, we (nipy) went for a scheme where the version > number stays as '2.0.0.dev', but we keep a record of what git commit > has we are on - described here: > > http://web.archiveorange.com/archive/v/AW2a1CzoOZtfBfNav9hd > > I can post more details of the implementation if it's of any interest, In the meantime there's a patch in that direction here: https://github.com/numpy/numpy/pull/12 Cheers, Scott From charlesr.harris at gmail.com Tue Nov 9 10:48:12 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Nov 2010 08:48:12 -0700 Subject: [Numpy-discussion] Developmental version numbering with git In-Reply-To: References: <4CD850D2.7020703@gmail.com> Message-ID: On Tue, Nov 9, 2010 at 8:20 AM, Scott Sinclair wrote: > On 8 November 2010 23:17, Matthew Brett wrote: > >> Since the change to git the numpy version in setup.py is '2.0.0.dev' > >> regardless because the prior numbering was determined by svn. > >> > >> Is there a plan to add some numbering system to numpy developmental > version? > >> > >> Regardless of the answer, the 'numpy/numpy/version.py' will need to > >> changed because of the reference to the svn naming. > > > > In case it's useful, we (nipy) went for a scheme where the version > > number stays as '2.0.0.dev', but we keep a record of what git commit > > has we are on - described here: > > > > http://web.archiveorange.com/archive/v/AW2a1CzoOZtfBfNav9hd > > > > I can post more details of the implementation if it's of any interest, > > In the meantime there's a patch in that direction here: > > https://github.com/numpy/numpy/pull/12 > Thanks. I left some comments there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodfellow.ian at gmail.com Tue Nov 9 11:51:53 2010 From: goodfellow.ian at gmail.com (Ian Goodfellow) Date: Tue, 9 Nov 2010 11:51:53 -0500 Subject: [Numpy-discussion] Anyone with Core i7 and Ubuntu 10.04? In-Reply-To: <0D26CA8A-A9A7-4EA6-8E4B-0931AAE6636F@iro.umontreal.ca> References: <4CD8A949.6040207@silveregg.co.jp> <0D26CA8A-A9A7-4EA6-8E4B-0931AAE6636F@iro.umontreal.ca> Message-ID: Yes, that's pretty much the situation. I'm mostly looking for someone who has satisfactory performance with their Core i7 so I can get some comparison information and figure out if I need to disable hyperthreading or compile atlas with different flags or what. Are the Ubuntu 10.10 atlas packages actually different from the Ubuntu 10.04 atlas packages? Depending on how I install I can get different error messages. If I install the atlas packages in Ubuntu 10.04 I get slow performance relative to other Core i7 or even Core 2 Quad machines with lower clock rates that are available to me at work. If I compile my own atlas and try to build numpy with site.cfg I get one set of error messages that I've sent to the list before. If I compile my own atlas and try to build numpy with environment variables specifying the location of ATLAS, the environment variables get ignored and it is built without atlas. If I compile my own atlas and put symlinks to it in the default search places, I get numpy libraries that ldd shows as linking to atlas but still get slow peformance. On Mon, Nov 8, 2010 at 11:33 PM, David Warde-Farley wrote: > On 2010-11-08, at 8:52 PM, David wrote: > >> Please tell us what error you got - saying that something did not >> working is really not useful to help you. You need to say exactly what >> fails, and which steps you followed before that failure. 
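Two checks usually settle this kind of report: what the build actually linked against, and how long a plain matrix product takes. The sketch below assumes nothing beyond stock NumPy; the 2000x2000 size and the rough timing expectations are illustrative, not hard thresholds.

import time
import numpy as np

# Shows which BLAS/LAPACK the build picked up (ATLAS, MKL, ...); if no
# optimized BLAS section appears here, np.dot falls back to numpy's own
# unoptimized loop.
np.show_config()

# Crude timing probe: with a working optimized BLAS a 2000x2000 double
# precision matrix product should finish in roughly a second or less on
# 2010-era quad-core hardware, and far slower without one.
a = np.random.standard_normal((2000, 2000))
t0 = time.time()
np.dot(a, a)
print("np.dot on 2000x2000 took %.2f s" % (time.time() - t0))

Comparing that number between the Core i7 and the Core 2 machines, together with the show_config output from each, narrows down whether the problem is the ATLAS build itself or numpy failing to link against it.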
> > I think what he means is that it's very slow, there's no discernable error but dot-multiplies don't seem to be using BLAS. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From vincent at vincentdavis.net Tue Nov 9 12:03:27 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 9 Nov 2010 10:03:27 -0700 Subject: [Numpy-discussion] Python2.7 version check for OSX numpy installer (please Test) Message-ID: No need to actualy install, You should get an error right away if you do not have both. /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-32 If you have python-2.7-macosx10.5.dmg installed or built python2.7 for both 32 and 64 bit it should work (let you install) If you have python-2.7-macosx10.3.dmg installed it should not. It will fail to find /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-32 I think :-) Should this check be a warning or prevent install? The installer is numpy-1.5.1rc2-py2.7-python.org-macosx10.5 I should have a numpy-1.5.1rc2-py2.7-python.org-macosx10.3 ready for testing later. http://dl.dropbox.com/u/1340248/1-nov%209%202010/numpy1.5.1rc2-Py2.7-32-64-test-python-version.dmg -- Thanks Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Nov 9 19:18:50 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 10 Nov 2010 08:18:50 +0800 Subject: [Numpy-discussion] Python2.7 version check for OSX numpy installer (please Test) In-Reply-To: References: Message-ID: Hi Vincent, On Wed, Nov 10, 2010 at 1:03 AM, Vincent Davis wrote: > No need to actualy install, You should get an error right away if you do not > have both. > /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 > /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-32 > If you have python-2.7-macosx10.5.dmg installed or built python2.7 for both > 32 and 64 bit it should work (let you install) > If you have python-2.7-macosx10.3.dmg installed it should not. It will fail > to find?/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-32 I > think :-) > Should this check be a warning or prevent install? I'd say warning. > The installer is numpy-1.5.1rc2-py2.7-python.org-macosx10.5 > I should have a?numpy-1.5.1rc2-py2.7-python.org-macosx10.3 ready for testing > later. > http://dl.dropbox.com/u/1340248/1-nov%209%202010/numpy1.5.1rc2-Py2.7-32-64-test-python-version.dmg > If you propose a patch it would be helpful if you send a github link to the list, or at least a diff. BTW, Dropbox seems to be blocked in China, so it would help me a lot if you could put binaries on your own site. Cheers, Ralf From vincent at vincentdavis.net Tue Nov 9 19:30:06 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Tue, 9 Nov 2010 17:30:06 -0700 Subject: [Numpy-discussion] Python2.7 version check for OSX numpy installer (please Test) In-Reply-To: References: Message-ID: On Tue, Nov 9, 2010 at 5:18 PM, Ralf Gommers wrote: > Hi Vincent, > > On Wed, Nov 10, 2010 at 1:03 AM, Vincent Davis > wrote: > > No need to actualy install, You should get an error right away if you do > not > > have both. 
> > /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 > > /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-32 > > If you have python-2.7-macosx10.5.dmg installed or built python2.7 for > both > > 32 and 64 bit it should work (let you install) > > If you have python-2.7-macosx10.3.dmg installed it should not. It will > fail > > to > find /Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-32 I > > think :-) > > Should this check be a warning or prevent install? > > I'd say warning. > > > The installer is numpy-1.5.1rc2-py2.7-python.org-macosx10.5 > > I should have a numpy-1.5.1rc2-py2.7-python.org-macosx10.3 ready for > testing > > later. > > > http://dl.dropbox.com/u/1340248/1-nov%209%202010/numpy1.5.1rc2-Py2.7-32-64-test-python-version.dmg > > > If you propose a patch it would be helpful if you send a github link > to the list, or at least a diff. > I am not to that point yet I am working toward using packagemaker. All I did was manualy change the plist in the pkg. I just what to be sure I am not missing somthing my looking for the files listed above to distiguish between python versions. > > BTW, Dropbox seems to be blocked in China, so it would help me a lot > if you could put binaries on your own site. > http://vincentdavis.info/numpy1.5.1rc2-Py2.7-32-64-test-python-version.dmg > Cheers, > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Thanks Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Nov 10 02:40:15 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 9 Nov 2010 23:40:15 -0800 Subject: [Numpy-discussion] review request for half type Message-ID: This set of patches fully implements a 16-bit floating point half type into NumPy. It's got a lot of changes, but I've tried to organize them logically so that it builds and the tests run after each patch. This took some tweaks to the building and ufunc generation code, so please check that, as particularly the setup.py looks like it could have subtleties I may have missed. I implemented only the half type, but a complex variant wouldn't be that hard to add once this is all set. Here are the main features: * 16-bit float as numpy.half or numpy.float16 * generates underflow/overflow signals * rounds to the nearest (with half to even) * uses character code 'j' -- The short type already had 'h', but if there's a better choice, let me know. The tests might provide the quickest insight into to workings of this code, so I would suggest looking through it first: test_half.py Here's the branch with the patches: https://github.com/m-paradox/numpy/compare/master...implement_half_dtype Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From dyamins at gmail.com Wed Nov 10 14:37:44 2010 From: dyamins at gmail.com (Dan Yamins) Date: Wed, 10 Nov 2010 14:37:44 -0500 Subject: [Numpy-discussion] C-compiler options Message-ID: Hi: I'm trying to build numpy with python27, 64-bit, on OSX 10.5.8. When I run python setup.py build, only 32-bit binaries get built. I can see what is happening is that not 64-bit flags are getting passed to the c compiler (I don't know about the fortran compiler. So I have two questions; 1) Shouldn't the system be determining that 64-bit nature automatically? 
I've used the same build process on the same machine (in what I thought were identical situations, apparently not) a number of times in the past, and it has always built correctly as 64 bit. I never needed to set compiler options. 2) How do I set C compiler options for the numpy build process? thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Nov 10 15:01:45 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Nov 2010 14:01:45 -0600 Subject: [Numpy-discussion] C-compiler options In-Reply-To: References: Message-ID: On Wed, Nov 10, 2010 at 13:37, Dan Yamins wrote: > Hi: > I'm trying to build numpy with python27, 64-bit, on OSX 10.5.8. > When I run ?python setup.py build, ? only 32-bit binaries get built. ? I can > see what is happening is that not 64-bit flags are getting passed to the c > compiler (I don't know about the fortran compiler. Are you sure that the python executable that you are using is the 64-bit python executable that you think it is? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From dyamins at gmail.com Wed Nov 10 15:07:30 2010 From: dyamins at gmail.com (Dan Yamins) Date: Wed, 10 Nov 2010 15:07:30 -0500 Subject: [Numpy-discussion] C-compiler options In-Reply-To: References: Message-ID: > Are you sure that the python executable that you are using is the > 64-bit python executable that you think it is? > Hm ... Well, I think so ... It certain is 64bit, or at least: In [1]: import platform In [2]: platform.architecture() Out[2]: ('64bit', '') Should I be checking something else? It is a python installed via macports ... thanks, Dan > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From braingateway at gmail.com Wed Nov 10 18:43:32 2010 From: braingateway at gmail.com (LittleBigBrain) Date: Thu, 11 Nov 2010 00:43:32 +0100 Subject: [Numpy-discussion] Is numpy.convolve based on LAPACK routine? Message-ID: Hi everyone, I am wondering, is numpy.convolve based on LAPACK routine? Can it be speedup by using ATLAS? Sincerely, LittleBigBrain From ralf.gommers at googlemail.com Wed Nov 10 19:08:51 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 11 Nov 2010 08:08:51 +0800 Subject: [Numpy-discussion] C-compiler options In-Reply-To: References: Message-ID: On Thu, Nov 11, 2010 at 3:37 AM, Dan Yamins wrote: > Hi: > I'm trying to build numpy with python27, 64-bit, on OSX 10.5.8. > When I run ?python setup.py build, ? only 32-bit binaries get built. ? I can > see what is happening is that not 64-bit flags are getting passed to the c > compiler (I don't know about the fortran compiler. > So I have two questions; > 1) Shouldn't the system be determining that 64-bit nature automatically? Yes, it should. This was fixed a couple of days ago in commit:435c7262592e94c8519f (master) and commit:8346ba04a5c574441304 (1.5.x). I guess you used an older revision, can you please update and try again? 
> I've used the same build process on the same machine (in what I thought were > identical situations, apparently not) a number of times in the past, and it > has always built correctly as 64 bit. ? I never needed to set compiler > options. > 2) How do I set ?C compiler options for the numpy build process? That shouldn't be necessary. Cheers, Ralf From pav at iki.fi Wed Nov 10 19:15:03 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 11 Nov 2010 00:15:03 +0000 (UTC) Subject: [Numpy-discussion] Is numpy.convolve based on LAPACK routine? References: Message-ID: On Thu, 11 Nov 2010 00:43:32 +0100, LittleBigBrain wrote: > I am wondering, is numpy.convolve based on LAPACK routine? Can it be > speedup by using ATLAS? LAPACK and Atlas do not AFAIK have convolution routines -- that's not linear algebra. MKL on the other hand would have some. The implementation in Numpy is the straightforward one, without SIMD etc. For large datasets, scipy.signal.fftconvolve should be faster. -- Pauli Virtanen From charlesr.harris at gmail.com Wed Nov 10 20:54:18 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Nov 2010 18:54:18 -0700 Subject: [Numpy-discussion] new memcpy implementation exposes bugs in some software. Message-ID: Hi All, Apparently the 64 bit version of memcpy in the Fedora 14 glibc will do the copy in the downwards rather than the usual upwards direction on some processors. This has exposed bugs where the the source and destination overlap in memory. Report and discussion can be found at fedora bugzilla. I don't know that numpy has any problems of this sort but it is worth keeping in mind. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Thu Nov 11 03:27:56 2010 From: david at silveregg.co.jp (David) Date: Thu, 11 Nov 2010 17:27:56 +0900 Subject: [Numpy-discussion] new memcpy implementation exposes bugs in some software. In-Reply-To: References: Message-ID: <4CDBA90C.3080502@silveregg.co.jp> On 11/11/2010 10:54 AM, Charles R Harris wrote: > Hi All, > > Apparently the 64 bit version of memcpy in the Fedora 14 glibc will do > the copy in the downwards rather than the usual upwards direction on > some processors. This has exposed bugs where the the source and > destination overlap in memory. Report and discussion can be found at > fedora bugzilla . I > don't know that numpy has any problems of this sort but it is worth > keeping in mind. It would be kind of cool to get a bug report from Linus, though :) David From braingateway at gmail.com Thu Nov 11 03:32:44 2010 From: braingateway at gmail.com (braingateway) Date: Thu, 11 Nov 2010 09:32:44 +0100 Subject: [Numpy-discussion] Is numpy.convolve based on LAPACK routine? In-Reply-To: References: Message-ID: <4CDBAA2C.8090406@gmail.com> Pauli Virtanen : > On Thu, 11 Nov 2010 00:43:32 +0100, LittleBigBrain wrote: > >> I am wondering, is numpy.convolve based on LAPACK routine? Can it be >> speedup by using ATLAS? >> > > LAPACK and Atlas do not AFAIK have convolution routines -- that's not > linear algebra. MKL on the other hand would have some. The implementation > in Numpy is the straightforward one, without SIMD etc. > > For large datasets, scipy.signal.fftconvolve should be faster. > > Thanks for the point. The fftconvolve is only fast when two input are both long enough say >200, other wise pure convovlution will be faster. And fftconvolve will take too more RAM than normal convolution in this case. 
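A short sketch of the trade-off being described here; the signal and kernel lengths are arbitrary choices, and the crossover point between the two methods varies by machine and dtype.

import numpy as np
from scipy.signal import fftconvolve

x = np.random.standard_normal(100000)   # long signal
h = np.random.standard_normal(512)      # kernel of a few hundred points

direct = np.convolve(x, h, mode='full')
via_fft = fftconvolve(x, h, mode='full')

# The two results agree to rounding error. For kernels this long the FFT
# route is usually much faster; for very short kernels the direct loop
# wins and needs less temporary memory.
print(np.allclose(direct, via_fft))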
I were using MKL which does have convolution routines, so I thought might ATLAS also have it. Thanks a lot! LittleBigBrain From dyamins at gmail.com Thu Nov 11 07:48:41 2010 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 11 Nov 2010 07:48:41 -0500 Subject: [Numpy-discussion] C-compiler options In-Reply-To: References: Message-ID: > Yes, it should. This was fixed a couple of days ago in > commit:435c7262592e94c8519f (master) and commit:8346ba04a5c574441304 > (1.5.x). I guess you used an older revision, can you please update and > try again? > Did this, but am having the same problem. This is the relevant output at the beginning of the build process that is showing that the wrong options being used: $python setup.py build .... C compiler: /usr/bin/gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -pipe -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes ... So then when I try to actually use the installed binary, I get this error: In [1]: import numpy --------------------------------------------------------------------------- ImportError Traceback (most recent call last) .... ----> 5 import multiarray 6 import umath 7 import _internal # for freeze programs ImportError: dlopen(numpy/core/multiarray.so, 2): no suitable image found. Did find: numpy/core/multiarray.so: mach-o, but wrong architecture Obviously consistently, $file numpy/core/multiarray.so numpy/core/multiarray.so: Mach-O bundle i386 ... Somehow, the build config script is not picking up the fact that the python interpreter I'm using is 64bit. Dan > > > I've used the same build process on the same machine (in what I thought > were > > identical situations, apparently not) a number of times in the past, and > it > > has always built correctly as 64 bit. I never needed to set compiler > > options. > > 2) How do I set C compiler options for the numpy build process? > > That shouldn't be necessary. > > Cheers, > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dyamins at gmail.com Thu Nov 11 08:09:56 2010 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 11 Nov 2010 08:09:56 -0500 Subject: [Numpy-discussion] C-compiler options In-Reply-To: References: Message-ID: This problem has nothing to do with numpy -- somehow, gcc binaries themselves were overwritten by a process that installed just 32-bit versions, ... so the problem is with the c compiler itself, and has been resolved. thanks! Dan On Thu, Nov 11, 2010 at 7:48 AM, Dan Yamins wrote: > > > >> Yes, it should. This was fixed a couple of days ago in >> commit:435c7262592e94c8519f (master) and commit:8346ba04a5c574441304 >> (1.5.x). I guess you used an older revision, can you please update and >> try again? >> > > Did this, but am having the same problem. This is the relevant output at > the beginning of the build process that is showing that the wrong options > being used: > > $python setup.py build > > .... > > C compiler: /usr/bin/gcc-4.0 -fno-strict-aliasing -fno-common -dynamic > -pipe -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes > > ... > > > So then when I try to actually use the installed binary, I get this error: > > In [1]: import numpy > --------------------------------------------------------------------------- > ImportError Traceback (most recent call last) > > .... 
> > ----> 5 import multiarray > 6 import umath > 7 import _internal # for freeze programs > > ImportError: dlopen(numpy/core/multiarray.so, 2): no suitable image found. > Did find: > numpy/core/multiarray.so: mach-o, but wrong architecture > > Obviously consistently, > > $file numpy/core/multiarray.so > numpy/core/multiarray.so: Mach-O bundle i386 > > ... > > > Somehow, the build config script is not picking up the fact that the python > interpreter I'm using is 64bit. > > Dan > > > > >> >> > I've used the same build process on the same machine (in what I thought >> were >> > identical situations, apparently not) a number of times in the past, and >> it >> > has always built correctly as 64 bit. I never needed to set compiler >> > options. >> > 2) How do I set C compiler options for the numpy build process? >> >> That shouldn't be necessary. >> >> Cheers, >> Ralf >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Nov 11 08:40:10 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 11 Nov 2010 21:40:10 +0800 Subject: [Numpy-discussion] Trac unaware of github move In-Reply-To: References: Message-ID: On Tue, Nov 9, 2010 at 10:24 PM, Ralf Gommers wrote: > 2010/11/9 S?bastien Barth?lemy : >> On Sun, 7 Nov 2010, Ralf Gommers wrote: >>> That will require renaming those files in the source tree from *.txt >>> to *.rst, otherwise there's no way to have github render them >>> properly. Unless I missed something. Would that be fine? >> >> I think a *.rst.txt extension would also be recognized by github. > > Yes it would, documented at https://github.com/github/markup. >> >> Note that the docutils FAQ advises against using .rst as a file >> extension : >> http://docutils.sourceforge.net/FAQ.html#what-s-the-standard-filename-extension-for-a-restructuredtext-file >> > Good point. I agree .rst.txt is better. Renaming all files under doc/ done in https://github.com/rgommers/numpy/tree/rest-files-extension. For an example of how github renders this see https://github.com/rgommers/numpy/blob/rest-files-extension/doc/TESTS.rst.txt Looks good to me - we can just link to it from the Trac wiki. Will push this in a few days unless I hear otherwise. Cheers, Ralf From mpf at cs.ubc.ca Thu Nov 11 09:44:45 2010 From: mpf at cs.ubc.ca (Michael Friedlander) Date: Thu, 11 Nov 2010 14:44:45 +0000 (UTC) Subject: [Numpy-discussion] specifying array sizes in random vs. ones, zeros, etc Message-ID: I'm a hopeful Matlab refugee trying to understand the numpy way. Perhaps someone can explain why some numpy functions require shape specifications in different ways. For example, below I create a random 2-by-3 array, and then a "ones" 2-by-3 array: A = numpy.random.randn(2,3) B = numpy.ones((2,3)) The first call takes 2 arguments; the 2nd takes a single tuple argument. This strikes me as inconsistent, but probably I'm not grokking some numpy subleties. Can someone please explain? From robert.kern at gmail.com Thu Nov 11 10:45:09 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Nov 2010 09:45:09 -0600 Subject: [Numpy-discussion] specifying array sizes in random vs. ones, zeros, etc In-Reply-To: References: Message-ID: On Thu, Nov 11, 2010 at 08:44, Michael Friedlander wrote: > I'm a hopeful Matlab refugee trying to understand the numpy way. 
> Perhaps someone can explain why some ?numpy functions require > shape specifications in different ways. For example, ?below I create > a random 2-by-3 array, and then a "ones" 2-by-3 array: > > A = numpy.random.randn(2,3) > B = numpy.ones((2,3)) > > The first call takes 2 arguments; the 2nd takes a single tuple argument. > This strikes me as inconsistent, but probably I'm not grokking some > numpy subleties. Can someone please explain? rand() and randn() were added as conveniences for people who were used to the MATLAB functions. numpy.random.random_sample((2,3)) and numpy.random.standard_normal((2,3)) are the preferred, more consistent functions to use. Ignore rand() and randn() if you like. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From pgmdevlist at gmail.com Thu Nov 11 13:21:48 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 11 Nov 2010 19:21:48 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> Message-ID: <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> All, Sorry for the delayed answer. I had a bit of time and examined the issue in more details: As you've seen, the output of your converter is not detected as a float, but as an object. That's an unfortunate side effect of using a lambda function such as yours: what if your input string has only 1 character ? You end up taking the float of an empty string, which raises a ValueError. In practice, that's exactly what happens below the hood when genfromtxt tries to guess the output type of the converter. It tries a single value ('1'), fails, and decides that the result must be an object... Probably not the best strategy, as it crashes in your case. But yours is a buggy case anyway. Try that instead of your lambda function {{{ def func(s): try: r = float(s[1:]) except ValueError: r = 1. return r }}} You could object that as the dtype is defined, it should take precedence over the output typeof the converter. Well, I assumed exactly the opposite: if the user took the time to define a converter, we should respect his/her choice and overwrite the dtype. Now, we can argue over the very last point: if both a converter and a dtype are specified, which one should take precedence? You have my opinion, let's hear yours. P. From xscript at gmx.net Thu Nov 11 14:31:03 2010 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Thu, 11 Nov 2010 20:31:03 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> (Pierre GM's message of "Thu, 11 Nov 2010 19:21:48 +0100") References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> Message-ID: <87y68zr72g.fsf@ginnungagap.bsc.es> Pierre GM writes: > In practice, that's exactly what happens below the hood when > genfromtxt tries to guess the output type of the converter. It tries a > single value ('1'), fails, and decides that the result must be an > object... Probably not the best strategy, as it crashes in your > case. But yours is a buggy case anyway. [...] > Now, we can argue over the very last point: if both a converter and a > dtype are specified, which one should take precedence? > You have my opinion, let's hear yours. What about delaying the calculation of converters? 
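For reference, a self-contained sketch of the converter-plus-dtype combination discussed above, written in the Python 2 idiom of this thread. The two-column format with a one-character prefix before each number is made up, and strip_prefix is simply the defensive converter suggested earlier under another name.

import numpy as np
from StringIO import StringIO   # Python 2 era; io.StringIO on Python 3

data = StringIO("x1.5 x2.0\nx2.25 x4.0\nx x8.0\n")

def strip_prefix(s):
    # Tolerate a bare prefix with nothing after it instead of crashing
    try:
        return float(s[1:])
    except ValueError:
        return 1.0

arr = np.genfromtxt(data, dtype=float,
                    converters={0: strip_prefix, 1: strip_prefix})
print(arr)

With both the converters and dtype=float given, the result comes back as a plain float array; which of the two should win when they disagree is exactly the open question in this thread.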
Instead of using type checks with fake data, 'StringConverter.update' could take an optional argument 'imput_sample' (defaulting to "1") in order to perform its checks. Then, use real data from the first (non-comment, non-names) line of the input file when calling 'StringConverter.update' in 'genfromtxt'. Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From matthew.brett at gmail.com Thu Nov 11 14:32:36 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 11 Nov 2010 11:32:36 -0800 Subject: [Numpy-discussion] Developmental version numbering with git In-Reply-To: References: <4CD850D2.7020703@gmail.com> Message-ID: Hi, On Tue, Nov 9, 2010 at 7:48 AM, Charles R Harris wrote: > > > On Tue, Nov 9, 2010 at 8:20 AM, Scott Sinclair > wrote: >> >> On 8 November 2010 23:17, Matthew Brett wrote: >> >> Since the change to git the numpy version in setup.py is '2.0.0.dev' >> >> regardless because the prior numbering was determined by svn. >> >> >> >> Is there a plan to add some numbering system to numpy developmental >> >> version? >> >> >> >> Regardless of the answer, the 'numpy/numpy/version.py' will need to >> >> changed because of the reference to the svn naming. >> > >> > In case it's useful, we (nipy) went for a scheme where the version >> > number stays as '2.0.0.dev', but we keep a record of what git commit >> > has we are on - described here: >> > >> > http://web.archiveorange.com/archive/v/AW2a1CzoOZtfBfNav9hd >> > >> > I can post more details of the implementation if it's of any interest, >> >> In the meantime there's a patch in that direction here: >> >> https://github.com/numpy/numpy/pull/12 Tiny patch for py3k attached. Should the generated numpy/version.py be in .gitignore? Is there a better name in order to signal the generated nature of the file? Best, Matthew -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-BF-py3k-fix-for-git-version-string.patch Type: application/octet-stream Size: 647 bytes Desc: not available URL: From pgmdevlist at gmail.com Thu Nov 11 14:35:34 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 11 Nov 2010 20:35:34 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <87y68zr72g.fsf@ginnungagap.bsc.es> References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> <87y68zr72g.fsf@ginnungagap.bsc.es> Message-ID: <889FD1C0-0C88-4F7A-A26F-4423D989DB4A@gmail.com> On Nov 11, 2010, at 8:31 PM, Llu?s wrote: > Pierre GM writes: > >> In practice, that's exactly what happens below the hood when >> genfromtxt tries to guess the output type of the converter. It tries a >> single value ('1'), fails, and decides that the result must be an >> object... Probably not the best strategy, as it crashes in your >> case. But yours is a buggy case anyway. > [...] >> Now, we can argue over the very last point: if both a converter and a >> dtype are specified, which one should take precedence? >> You have my opinion, let's hear yours. > > What about delaying the calculation of converters? > Instead of using type > checks with fake data, 'StringConverter.update' could take an optional > argument 'imput_sample' (defaulting to "1") in order to perform its > checks. 
> > Then, use real data from the first (non-comment, non-names) line of the > input file when calling 'StringConverter.update' in 'genfromtxt'. Mmh. That's an idea... Do you have a patch to suggest? From charlesr.harris at gmail.com Thu Nov 11 14:38:53 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Nov 2010 12:38:53 -0700 Subject: [Numpy-discussion] Merging the refactor. Message-ID: Hi All, I''d like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that. At the moment it seems to me that the changes can be broken up into three categories: 1) Movement of files and resulting changes to the build process. 2) Refactoring of the files for CPython. 3) Addition of an IronPython interface. I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be usefull to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository. I'm not intimately familiar with details of the changes that have been made in the refactor, so I welcome any thoughts by those folks involved in the work. And of course by the usual numpy people who will need to adjust to the changes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Nov 11 14:44:39 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Nov 2010 12:44:39 -0700 Subject: [Numpy-discussion] Developmental version numbering with git In-Reply-To: References: <4CD850D2.7020703@gmail.com> Message-ID: On Thu, Nov 11, 2010 at 12:32 PM, Matthew Brett wrote: > Hi, > > On Tue, Nov 9, 2010 at 7:48 AM, Charles R Harris > wrote: > > > > > > On Tue, Nov 9, 2010 at 8:20 AM, Scott Sinclair gmail.com> > > wrote: > >> > >> On 8 November 2010 23:17, Matthew Brett > wrote: > >> >> Since the change to git the numpy version in setup.py is '2.0.0.dev' > >> >> regardless because the prior numbering was determined by svn. > >> >> > >> >> Is there a plan to add some numbering system to numpy developmental > >> >> version? > >> >> > >> >> Regardless of the answer, the 'numpy/numpy/version.py' will need to > >> >> changed because of the reference to the svn naming. > >> > > >> > In case it's useful, we (nipy) went for a scheme where the version > >> > number stays as '2.0.0.dev', but we keep a record of what git commit > >> > has we are on - described here: > >> > > >> > http://web.archiveorange.com/archive/v/AW2a1CzoOZtfBfNav9hd > >> > > >> > I can post more details of the implementation if it's of any interest, > >> > >> In the meantime there's a patch in that direction here: > >> > >> https://github.com/numpy/numpy/pull/12 > > Tiny patch for py3k attached. > > Should the generated numpy/version.py be in .gitignore? Is there a > better name in order to signal the generated nature of the file? > > I thought it already was, but if not, yes, I think it should be added. 
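A minimal sketch of the kind of generated file being discussed, assuming a helper that setup.py would call at build time; the function name write_version_py, the output path and the '2.0.0.dev' string are placeholders rather than the actual patch under review.

import subprocess

def write_version_py(filename='numpy/version.py', version='2.0.0.dev'):
    # Record the abbreviated git commit next to the otherwise static
    # development version string, so installed dev builds are traceable.
    try:
        out = subprocess.Popen(['git', 'rev-parse', '--short', 'HEAD'],
                               stdout=subprocess.PIPE).communicate()[0]
        git_revision = out.strip().decode('ascii')
    except OSError:
        git_revision = 'Unknown'
    with open(filename, 'w') as f:
        f.write("# THIS FILE IS GENERATED BY THE BUILD -- DO NOT EDIT\n")
        f.write("version = '%s'\n" % version)
        f.write("git_revision = '%s'\n" % git_revision)
        f.write("full_version = version + '-' + git_revision\n")

Since such a file is rewritten on every build, keeping it out of version control, or marking it clearly as generated, avoids the noise being discussed here.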
I suppose we could add a 'generated' suffix to the name to mark it as such, but really it seems the file should go into the build directory somewhere, although that might make it difficult to access if needed in other parts of the build. Having the generated file in the main tree was something that bothered me when I committed the patch, but not enought to try to fix it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Nov 11 15:09:26 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 11 Nov 2010 12:09:26 -0800 Subject: [Numpy-discussion] Developmental version numbering with git In-Reply-To: References: <4CD850D2.7020703@gmail.com> Message-ID: Hi, On Thu, Nov 11, 2010 at 11:44 AM, Charles R Harris wrote: > On Thu, Nov 11, 2010 at 12:32 PM, Matthew Brett >> Tiny patch for py3k attached. >> >> Should the generated numpy/version.py be in .gitignore? ?Is there a >> better name in order to signal the generated nature of the file? >> > > I thought it already was, but if not, yes, I think it should be added. I > suppose we could add a 'generated' suffix to the name to mark it as such, > but really it seems the file should go into the build directory somewhere, > although that might make it difficult to access if needed in other parts of > the build. Having the generated file in the main tree was something that > bothered me when I committed the patch, but not enought to try to fix it. I never really understood what numpy/__config__.py was for, or how it came about, but I had the impression that was the place that build-time stuff got written - is that correct? Would it be a sensible place to write version information? See you, Matthew From pav at iki.fi Thu Nov 11 16:08:55 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 11 Nov 2010 21:08:55 +0000 (UTC) Subject: [Numpy-discussion] Merging the refactor. References: Message-ID: On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: > I'd like to open a discussion about the steps to be followed in merging > the numpy refactor. I have two concerns about this. First, the refactor > repository branched off some time ago and I'm concerned about code > divergence, not just in the refactoring, but in fixes going into the > master branch on github. Second, it is likely that a flag day will look > like the easiest solution and I think we should avoid that. What is a "flag day"? > At the moment it seems to me that the changes can be broken up into > three categories: > > 1) Movement of files and resulting changes to the build process. > 2) Refactoring of the files for CPython. > 3) Addition of an IronPython interface. > > I'd like to see 1) go into the master branch as soon as possible, > followed by 2) so that the changes can be tested and fixes will go into > a common repository. The main github repository can then be branched for > adding the IronPython stuff. In short, I think it would be usefull to > abandon the teoliphant fork at some point and let the work continue in a > fork of the numpy repository. The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP. 
-- Pauli Virtanen #!/bin/bash # # Graft changesets $OLD_START..$OLD_BRANCH onto $NEW_START, into a branch # $NEW_BRANCH # set -e OLD_START=7e1e5da84fc110936035660974167cd33f9e4831 # last SVN commit in old repo NEW_START=b056b23a27fe4f56f923168bb9931429765084d1 # corresponding Git commit in new repo OLD_BRANCH=origin/refactor NEW_BRANCH=new-rebased run() { echo "$ $@"; "$@"; } if git remote|grep -q numpy-upstream; then true else run git remote add numpy-upstream git://github.com/numpy/numpy.git run git fetch numpy-upstream fi run git checkout $OLD_BRANCH run git branch -D $NEW_BRANCH || true run git checkout -b $NEW_BRANCH $OLD_BRANCH # # Refilter # # - reparent the root commits # - prune unnecessary (and huge) .sdf files from history # rm -rf .git/refs/original run git filter-branch \ --index-filter 'git rm --cached --ignore-unmatch **.sdf' \ --parent-filter "sed -e ' s/-p $OLD_START/-p $NEW_START/g; s/-p c3f10ec730a5d066838b10cd7f6c9c104eb9f1cf/-p a839a427939f0c29fe4757011f86bb068ab66569/g; '" \ $NEW_BRANCH ^$OLD_START # # Make a few sanity checks # git diff $OLD_START $OLD_BRANCH > old.diff git diff $NEW_START $NEW_BRANCH > new.diff git diff $NEW_BRANCH $OLD_BRANCH > heads.diff test -s heads.diff && { echo "ERROR: New heads do not match!"; exit 1; } diff -u old.diff new.diff || { echo "ERROR: patches from start do not match!"; exit 1; } (git log $NEW_BRANCH | grep -q 'git-svn-id') && { echo "ERROR: some git-svn-id commits remain in the history"; exit 1; } echo "Everything seems OK!" From charlesr.harris at gmail.com Thu Nov 11 16:30:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Nov 2010 14:30:32 -0700 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen wrote: > On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: > > I'd like to open a discussion about the steps to be followed in merging > > the numpy refactor. I have two concerns about this. First, the refactor > > repository branched off some time ago and I'm concerned about code > > divergence, not just in the refactoring, but in fixes going into the > > master branch on github. Second, it is likely that a flag day will look > > like the easiest solution and I think we should avoid that. > > What is a "flag day"? > > It all goes in as one big commit. > > At the moment it seems to me that the changes can be broken up into > > three categories: > > > > 1) Movement of files and resulting changes to the build process. > > 2) Refactoring of the files for CPython. > > 3) Addition of an IronPython interface. > > > > I'd like to see 1) go into the master branch as soon as possible, > > followed by 2) so that the changes can be tested and fixes will go into > > a common repository. The main github repository can then be branched for > > adding the IronPython stuff. In short, I think it would be usefull to > > abandon the teoliphant fork at some point and let the work continue in a > > fork of the numpy repository. > > The first step I would like to see is to re-graft the teoliphant branch > onto the current Git history -- currently, it's still based on Git-SVN. > Re-grafting would make incremental merging and tracking easier. Luckily, > this is easy to do thanks to Git's data model (I have a script for it), > and I believe it could be useful to do it ASAP. > > I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project of interest. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Thu Nov 11 17:15:04 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 11 Nov 2010 22:15:04 +0000 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> Thanks for starting the discussion, Charles. Merging of the re-factor is a priority for me once I get back from last 9 weeks of travel I have been on (I have been travelling for business 7 of the last 9 weeks). Ilan Schnell has already been looking at how to accomplish the merge (and I have been reading up on Git so that I understand the commit model better and can possibly help without being a complete neophyte with git). Pauli's script will be very helpful. I'm very enthused about the bug-fixes, memory-leak closures, and new tests that have been added on the re-factor branch. I'm also interested in getting more community feedback on the ndarray library C-API, and the other changes that have been made. This feedback will be essential before the changes can become NumPy 2.0. I would also like to see a few more NEPs become part of NumPy 2.0 over the next several months. I have a long wish list of additional NEPS that I've only sketched in quick drafts at this point as well --- datetime finishes, geometry-information, i.e. dimension and index labels, reduce-by implementation, indirect arrays, and generator array objects. My initial guess as to how quickly we would have a NumPy 2.0 was ambitious partly because I have had almost zero time personally to work on it, and partly because we have been resource constrained which has pushed us to draw out the project a bit. But, I've come up with a long list of new features for NumPy 2.0 that I would like to hash out on the mailing lists over the next months as well. My hope is for NumPy 2.0 to come out by the end of Q1 sometime next year. My hopes may have to be tempered by limited time resources, of course. At the same time, the work on the .NET framework has pushed us to move more of SciPy to a Cython-generated set. There are additional things I would like to see SciPy improve on as well, but I am not sure who is going to work on them. If I had my dream, there would be more modularity to the packages, and an improved packaging system --- and of course, porting to Python 3k. I would like to see core SciPy be a smaller set containing a few core packages. (linear algebra, statistics, optimization, interpolation, signal processing, and image processing). Then, I would like to see scipy. packages which are released and packaged separately with the whole system available on github. The past couple of years have been very busy for me (and continue to be busy), but I am hoping that next year will allow me more time to spend on promoting sprints, and participating more in the community. I will not have the time I used to have when I was a full-time academic, but I plan to be more involved in helping promote SciPy development. With SciPy moved over to github, I think that will even be possible without my stepping on everybody else's hard work. -Travis I have forwarded th On Nov 11, 2010, at 9:30 PM, Charles R Harris wrote: > > > On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen wrote: > On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: > > I'd like to open a discussion about the steps to be followed in merging > > the numpy refactor. I have two concerns about this. 
First, the refactor > > repository branched off some time ago and I'm concerned about code > > divergence, not just in the refactoring, but in fixes going into the > > master branch on github. Second, it is likely that a flag day will look > > like the easiest solution and I think we should avoid that. > > What is a "flag day"? > > > It all goes in as one big commit. > > > At the moment it seems to me that the changes can be broken up into > > three categories: > > > > 1) Movement of files and resulting changes to the build process. > > 2) Refactoring of the files for CPython. > > 3) Addition of an IronPython interface. > > > > I'd like to see 1) go into the master branch as soon as possible, > > followed by 2) so that the changes can be tested and fixes will go into > > a common repository. The main github repository can then be branched for > > adding the IronPython stuff. In short, I think it would be usefull to > > abandon the teoliphant fork at some point and let the work continue in a > > fork of the numpy repository. > > The first step I would like to see is to re-graft the teoliphant branch > onto the current Git history -- currently, it's still based on Git-SVN. > Re-grafting would make incremental merging and tracking easier. Luckily, > this is easy to do thanks to Git's data model (I have a script for it), > and I believe it could be useful to do it ASAP. > > > I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project of interest. > > > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmccampbell at enthought.com Thu Nov 11 17:37:53 2010 From: jmccampbell at enthought.com (Jason McCampbell) Date: Thu, 11 Nov 2010 16:37:53 -0600 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: Hi Chuck, Pauli, This is indeed a good time to bring this up as we are in the process fixing Python 3 issues and then merging changes from the master tree in preparation for being able to consider merging the work. More specific comments inline below. Regards, Jason On Thu, Nov 11, 2010 at 3:30 PM, Charles R Harris wrote: > > > On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen wrote: > >> On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: >> > I'd like to open a discussion about the steps to be followed in merging >> > the numpy refactor. I have two concerns about this. First, the refactor >> > repository branched off some time ago and I'm concerned about code >> > divergence, not just in the refactoring, but in fixes going into the >> > master branch on github. Second, it is likely that a flag day will look >> > like the easiest solution and I think we should avoid that. >> >> What is a "flag day"? >> >> > It all goes in as one big commit. > > >> > At the moment it seems to me that the changes can be broken up into >> > three categories: >> > >> > 1) Movement of files and resulting changes to the build process. >> > 2) Refactoring of the files for CPython. >> > 3) Addition of an IronPython interface. >> > 1) and 2) are really the same step as we haven't moved/renamed existing files but instead moved content from the CPython interface files into new, platform-independent files. 
Specifically, there is a new top-level directory 'libndarray' that contains the platform-independent core. The existing CPython interface files remain in place, but much of the functionality is now implemented by calling into this core. Unfortunately this makes merging difficult because some changes need to be manually applied to a different file. Once all regression tests are passing on the refactor branch for both Python 2.x and 3.x (3.x is in progress) Ilan is going to start working on applying all accumulated changes. The good news is that 95% of our changes are to core/multiarray and core/umath and there are relatively few changes to these modules in the master repository. The IronPython interface lives in its own directory and is quite standalone. It just links to the .so from libndarray and just has a Visual Studio solution -- it is not part of the main build for now to avoid breaking all of the people who don't care about it. > > I'd like to see 1) go into the master branch as soon as possible, >> > followed by 2) so that the changes can be tested and fixes will go into >> > a common repository. The main github repository can then be branched for >> > adding the IronPython stuff. In short, I think it would be usefull to >> > abandon the teoliphant fork at some point and let the work continue in a >> > fork of the numpy repository. >> >> The first step I would like to see is to re-graft the teoliphant branch >> onto the current Git history -- currently, it's still based on Git-SVN. >> Re-grafting would make incremental merging and tracking easier. Luckily, >> this is easy to do thanks to Git's data model (I have a script for it), >> and I believe it could be useful to do it ASAP. >> >> > I agree that would be an excellent start. Speaking of repo surgery, you > might find esr's latest project of > interest. > We will take a look at this and the script. There is also a feature in git that allows two trees to be grafted together so the refactoring will end up as a branch on the main repository with all edits. My hope is that we can roll all of our changes into the main repository as a branch and then selectively merge to the main branch as desired. For example, as you said, the IronPython changes don't need to be merged immediate. Either way, I fully agree that we want to abandon our fork as soon as possible. If anything, it will go along way towards easing the merge and getting more eyeballs on the changes we have made so far. On Thu, Nov 11, 2010 at 3:30 PM, Charles R Harris wrote: > > > On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen wrote: > >> On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: >> > I'd like to open a discussion about the steps to be followed in merging >> > the numpy refactor. I have two concerns about this. First, the refactor >> > repository branched off some time ago and I'm concerned about code >> > divergence, not just in the refactoring, but in fixes going into the >> > master branch on github. Second, it is likely that a flag day will look >> > like the easiest solution and I think we should avoid that. >> >> What is a "flag day"? >> >> > It all goes in as one big commit. > > >> > At the moment it seems to me that the changes can be broken up into >> > three categories: >> > >> > 1) Movement of files and resulting changes to the build process. >> > 2) Refactoring of the files for CPython. >> > 3) Addition of an IronPython interface. 
>> > >> > I'd like to see 1) go into the master branch as soon as possible, >> > followed by 2) so that the changes can be tested and fixes will go into >> > a common repository. The main github repository can then be branched for >> > adding the IronPython stuff. In short, I think it would be usefull to >> > abandon the teoliphant fork at some point and let the work continue in a >> > fork of the numpy repository. >> >> The first step I would like to see is to re-graft the teoliphant branch >> onto the current Git history -- currently, it's still based on Git-SVN. >> Re-grafting would make incremental merging and tracking easier. Luckily, >> this is easy to do thanks to Git's data model (I have a script for it), >> and I believe it could be useful to do it ASAP. >> >> > I agree that would be an excellent start. Speaking of repo surgery, you > might find esr's latest project of > interest. > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Nov 11 19:36:00 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Nov 2010 00:36:00 +0000 (UTC) Subject: [Numpy-discussion] Merging the refactor. References: Message-ID: On Thu, 11 Nov 2010 21:08:55 +0000, Pauli Virtanen wrote: [clip] > The first step I would like to see is to re-graft the teoliphant branch > onto the current Git history -- currently, it's still based on Git-SVN. > Re-grafting would make incremental merging and tracking easier. Luckily, > this is easy to do thanks to Git's data model (I have a script for it), > and I believe it could be useful to do it ASAP. This needs to be added to the --parent-filter in the script, though: s/-p b629b740c9fb4685c5fd3d822efec8250d556ad4/-p 9ea50db4b5ca3a26c05cf4df364fa40f873da545/; so that it attaches the dangling datetime commits to the correct place. After this, "git rebase" of the changes in master onto the refactor branch seems to proceed reasonably. Based on a quick try, I got to 24 of 214 commits in ~ two hours, so I'd guess it'd take a few days at most to merge the changes to this direction. The conflicts don't seem too bad. The main annoyance is that in some cases (mainly the *.src files) Git fails to notice partial content moves, and generates big conflicts that need to be resolved by applying patches manually to libndarray/src/* instead of numpy/core/src/*. We probably won't want to do the merge by rebasing like this, though. The main technical question in the merging seem to be then if it's possible to rewrite the refactoring history to group changes to bigger logical chunks that would be easier to eyeball over and nicer to present to the posterity. Anyway, things are looking good :) -- Pauli Virtanen From pav at iki.fi Thu Nov 11 19:48:59 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Nov 2010 00:48:59 +0000 (UTC) Subject: [Numpy-discussion] Merging the refactor. References: Message-ID: On Thu, 11 Nov 2010 16:37:53 -0600, Jason McCampbell wrote: [clip] > We will take a look at this and the script. There is also a feature in > git that allows two trees to be grafted together so the refactoring will > end up as a branch on the main repository with all edits. Yes, this is pretty much what the script does -- it detaches the commits in the refactor branch from the Git-SVN history, and reattaches them to the new Git history. 
This changes only the DAG of the commits, and not the tree and file contents corresponding to each commit. (Git's graft feature can only add new parents, so filter-branch is needed.) > My hope is that we can roll all of our changes into the main > repository as a branch and then selectively merge to the main > branch as desired. For example, as you said, the IronPython > changes don't need to be merged immediate. I'm not sure if we should put development branches at all in the main repository. A repository like github.com/numpy/numpy-refactor might be a better solution, and also give visibility. -- Pauli Virtanen From charlesr.harris at gmail.com Thu Nov 11 20:08:50 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Nov 2010 18:08:50 -0700 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: On Thu, Nov 11, 2010 at 5:48 PM, Pauli Virtanen wrote: > On Thu, 11 Nov 2010 16:37:53 -0600, Jason McCampbell wrote: > [clip] > > We will take a look at this and the script. There is also a feature in > > git that allows two trees to be grafted together so the refactoring will > > end up as a branch on the main repository with all edits. > > Yes, this is pretty much what the script does -- it detaches the commits > in the refactor branch from the Git-SVN history, and reattaches them to > the new Git history. This changes only the DAG of the commits, and not > the tree and file contents corresponding to each commit. > > (Git's graft feature can only add new parents, so filter-branch is > needed.) > > > My hope is that we can roll all of our changes into the main > > repository as a branch and then selectively merge to the main > > branch as desired. For example, as you said, the IronPython > > changes don't need to be merged immediate. > > I'm not sure if we should put development branches at all in > the main repository. > > A repository like > > github.com/numpy/numpy-refactor > > might be a better solution, and also give visibility. > > I think that is right. The problem in merging stuff back into numpy from there will be tracking what has been merged and hasn't and consolidating things up into logical chunks. I'm not sure what the best workflow for that process will be. As to the first bits to merge, I would suggest the tests. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Nov 11 21:43:22 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Nov 2010 19:43:22 -0700 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> References: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> Message-ID: On Thu, Nov 11, 2010 at 3:15 PM, Travis Oliphant wrote: > Thanks for starting the discussion, Charles. > > Merging of the re-factor is a priority for me once I get back from last 9 > weeks of travel I have been on (I have been travelling for business 7 of the > last 9 weeks). > > Ilan Schnell has already been looking at how to accomplish the merge (and I > have been reading up on Git so that I understand the commit model better and > can possibly help without being a complete neophyte with git). Pauli's > script will be very helpful. > > I'm very enthused about the bug-fixes, memory-leak closures, and new tests > that have been added on the re-factor branch. I'm also interested in > getting more community feedback on the ndarray library C-API, and the other > changes that have been made. 
This feedback will be essential before the > changes can become NumPy 2.0. I would also like to see a few more NEPs > become part of NumPy 2.0 over the next several months. I have a long wish > list of additional NEPS that I've only sketched in quick drafts at this > point as well --- datetime finishes, geometry-information, i.e. dimension > and index labels, reduce-by implementation, indirect arrays, and generator > array objects. > > Let's not go overboard here. I think it would be a good idea to keep the numpy core as unencumbered as possible. Adding things that let other people build stuff is great, but putting too much at the core will likely make maintenance more difficult and the barrier to entry higher. IMHO, the core of numpy functionality is access to strided memory, topped with ufuncs. Linear algebra, random numbers, etc are add-ons, but useful ones to combine with the core package. I think that index labels are already pushing the limits. What do you want to do with datetime? We could remove it from the current trunk and leave it to come in with the refactoring when it is ready. > My initial guess as to how quickly we would have a NumPy 2.0 was ambitious > partly because I have had almost zero time personally to work on it, and > partly because we have been resource constrained which has pushed us to draw > out the project a bit. But, I've come up with a long list of new > features for NumPy 2.0 that I would like to hash out on the mailing lists > over the next months as well. My hope is for NumPy 2.0 to come out by the > end of Q1 sometime next year. My hopes may have to be tempered by limited > time resources, of course. > > The rule of thumb is to multiply software time estimates by four. The multiplication needs to be done by someone uninvolved because programmers usually think they have already accounted for the unexpected time requirements. > At the same time, the work on the .NET framework has pushed us to move more > of SciPy to a Cython-generated set. There are additional things I would > like to see SciPy improve on as well, but I am not sure who is going to work > on them. If I had my dream, there would be more modularity to the > packages, and an improved packaging system --- and of course, porting to > Python 3k. I would like to see core SciPy be a smaller set containing a > few core packages. (linear algebra, statistics, optimization, > interpolation, signal processing, and image processing). Then, I would > like to see scipy. packages which are released and packaged > separately with the whole system available on github. > > The past couple of years have been very busy for me (and continue to be > busy), but I am hoping that next year will allow me more time to spend on > promoting sprints, and participating more in the community. I will not have > the time I used to have when I was a full-time academic, but I plan to be > more involved in helping promote SciPy development. With SciPy moved over > to github, I think that will even be possible without my stepping on > everybody else's hard work. > > Oh, the guys in the corner offices always say that. Somehow it doesn't work out that way, someone has to keep the business going. The best way to keep working at the bench, so to speak, is to avoid promotion in the first place. I'm afraid it may be too late for you ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Thu Nov 11 22:02:55 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 12 Nov 2010 12:02:55 +0900 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> References: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> Message-ID: On Fri, Nov 12, 2010 at 7:15 AM, Travis Oliphant wrote: > At the same time, the work on the .NET framework has pushed us to move more > of SciPy to a Cython-generated set. ? There are additional things I would > like to see SciPy improve on as well, but I am not sure who is going to work > on them. ? If I had my dream, there would be more modularity to the > packages, and an improved packaging system --- and of course, porting to > Python 3k. I don't exactly where we are there, but Pauli and me took a look at scipy for python 3 at euroscipy in Paris, and I think it is mostly a matter of low hanging fruits. Most (all ?) changes are in the trunk already. > ? I would like to see core SciPy be a smaller set containing a > few core packages. ? (linear algebra, statistics, optimization, > interpolation, signal processing, and image processing). ? Then, I would > like to see scipy. packages which are released and packaged > separately with the whole system available on github. While I agree with the sentiment, I think it would be a mistake to do so before we have the infrastructure to actually deliver packages and so on. I understand there is a bit of a chicken and egg issue as well. I spent most if not all my free time in 2010 to work on that issue, and I will summarize the current status in a separate email to the ML to avoid disrupting the main discussion on the refactoring, cheers, David From dagss at student.matnat.uio.no Fri Nov 12 02:40:18 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 12 Nov 2010 08:40:18 +0100 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> References: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> Message-ID: <4CDCEF62.8090101@student.matnat.uio.no> On 11/11/2010 11:15 PM, Travis Oliphant wrote: > Thanks for starting the discussion, Charles. > > Merging of the re-factor is a priority for me once I get back from > last 9 weeks of travel I have been on (I have been travelling for > business 7 of the last 9 weeks). > > Ilan Schnell has already been looking at how to accomplish the merge > (and I have been reading up on Git so that I understand the commit > model better and can possibly help without being a complete neophyte > with git). Pauli's script will be very helpful. > > I'm very enthused about the bug-fixes, memory-leak closures, and new > tests that have been added on the re-factor branch. I'm also > interested in getting more community feedback on the ndarray library > C-API, and the other changes that have been made. This feedback > will be essential before the changes can become NumPy 2.0. I would > also like to see a few more NEPs become part of NumPy 2.0 over the > next several months. I have a long wish list of additional NEPS that > I've only sketched in quick drafts at this point as well --- datetime > finishes, geometry-information, i.e. dimension and index labels, > reduce-by implementation, indirect arrays, and generator array objects. 
> > My initial guess as to how quickly we would have a NumPy 2.0 was > ambitious partly because I have had almost zero time personally to > work on it, and partly because we have been resource constrained which > has pushed us to draw out the project a bit. But, I've come up > with a long list of new features for NumPy 2.0 that I would like to > hash out on the mailing lists over the next months as well. My hope > is for NumPy 2.0 to come out by the end of Q1 sometime next year. My > hopes may have to be tempered by limited time resources, of course. Conventionally, 2.0 would be the preferred point to break backwards compatability (and changes that could break stability), while simply adding new backwards compatible features can just as well be done in 2.1. IMO the crucial question is: Would it be possible to split this long list you have in mind in this fashion? And how much remains that will break backwards compatibility or cause instability? Dag Sverre > > At the same time, the work on the .NET framework has pushed us to move > more of SciPy to a Cython-generated set. There are additional things > I would like to see SciPy improve on as well, but I am not sure who is > going to work on them. If I had my dream, there would be more > modularity to the packages, and an improved packaging system --- and > of course, porting to Python 3k. I would like to see core SciPy be a > smaller set containing a few core packages. (linear algebra, > statistics, optimization, interpolation, signal processing, and image > processing). Then, I would like to see scipy. packages which > are released and packaged separately with the whole system available > on github. > > The past couple of years have been very busy for me (and continue to > be busy), but I am hoping that next year will allow me more time to > spend on promoting sprints, and participating more in the community. > I will not have the time I used to have when I was a full-time > academic, but I plan to be more involved in helping promote SciPy > development. With SciPy moved over to github, I think that will even > be possible without my stepping on everybody else's hard work. > > -Travis > > > > > > > I have forwarded th > On Nov 11, 2010, at 9:30 PM, Charles R Harris wrote: > >> >> >> On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen > > wrote: >> >> On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: >> > I'd like to open a discussion about the steps to be followed in >> merging >> > the numpy refactor. I have two concerns about this. First, the >> refactor >> > repository branched off some time ago and I'm concerned about code >> > divergence, not just in the refactoring, but in fixes going >> into the >> > master branch on github. Second, it is likely that a flag day >> will look >> > like the easiest solution and I think we should avoid that. >> >> What is a "flag day"? >> >> >> It all goes in as one big commit. >> >> > At the moment it seems to me that the changes can be broken up into >> > three categories: >> > >> > 1) Movement of files and resulting changes to the build process. >> > 2) Refactoring of the files for CPython. >> > 3) Addition of an IronPython interface. >> > >> > I'd like to see 1) go into the master branch as soon as possible, >> > followed by 2) so that the changes can be tested and fixes will >> go into >> > a common repository. The main github repository can then be >> branched for >> > adding the IronPython stuff. 
In short, I think it would be >> usefull to >> > abandon the teoliphant fork at some point and let the work >> continue in a >> > fork of the numpy repository. >> >> The first step I would like to see is to re-graft the teoliphant >> branch >> onto the current Git history -- currently, it's still based on >> Git-SVN. >> Re-grafting would make incremental merging and tracking easier. >> Luckily, >> this is easy to do thanks to Git's data model (I have a script >> for it), >> and I believe it could be useful to do it ASAP. >> >> >> I agree that would be an excellent start. Speaking of repo surgery, >> you might find esr's latest project >> of interest. >> >> >> >> Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Nov 12 05:20:25 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Nov 2010 10:20:25 +0000 (UTC) Subject: [Numpy-discussion] Merging the refactor. References: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> Message-ID: Fri, 12 Nov 2010 12:02:55 +0900, David Cournapeau wrote: [clip: Python 3 on Scipy] > I don't exactly where we are there, but Pauli and me took a look at > scipy for python 3 at euroscipy in Paris, and I think it is mostly a > matter of low hanging fruits. Most (all ?) changes are in the trunk > already. Only scipy.weave is left to do. Otherwise, the test suite passes on Python 3. -- Pauli Virtanen From lou_boog2000 at yahoo.com Fri Nov 12 10:02:51 2010 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Fri, 12 Nov 2010 07:02:51 -0800 (PST) Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? Message-ID: <928441.88239.qm@web34408.mail.mud.yahoo.com> I ran across what seems to be a change in how numerics are handled in Python 2.6 or Numpy 1.3.0 or both, I'm not sure. I've recently switched from using Python 2.4 and Numpy 1.0.3 to using the Python 2.6 and Numpy 1.3.0 that comes with SAGE which is a large mathematical package. But the issue seems to be a Python one, not a SAGE one. Here is a short example of code that gives the new behavior: # ---- Return the angle between two vectors ------------ def get_angle(v1,v2): '''v1 and v2 are 1 dimensional numpy arrays''' # Make unit vectors out of v1 and v2 v1norm=sqrt(dot(v1,v1)) v2norm=sqrt(dot(v2,v2)) v1unit=v1/v1norm v2unit=v2/v2norm ang=acos(dot(v1unit,v2unit)) return ang When using Python 2.6 with Numpy 1.3.0 and v1 and v2 are parallel the dot product of v1unit and v2unit sometimes gives 1.0+epsilon where epsilon is the smallest step in the floating point numbers around 1.0 as given in the sys module. This causes acos to throw a Domain Exception. This does not happen when I use Python 2.4 with Numpy 1.0.3. I have put in a check for the exception situation and the code works fine again. I am wondering if there are other changes that I should be aware of. Does anyone know the origin of the change above or other differences in the handling of numerics between the two versions? Thanks for any insight. -- Lou Pecora, my views are my own. 
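A minimal sketch of one common guard against the 1.0 + epsilon problem described above: clamp the dot product back into [-1, 1] before taking the inverse cosine. This is hypothetical illustration code, not the exception check Lou actually added, and the function name is invented:

import numpy as np

def get_angle_clamped(v1, v2):
    '''Angle between two 1-d vectors, tolerant of roundoff in the dot product.'''
    v1unit = v1 / np.sqrt(np.dot(v1, v1))
    v2unit = v2 / np.sqrt(np.dot(v2, v2))
    # Roundoff can push the dot product of two unit vectors slightly outside
    # [-1, 1]; clip it back so arccos never sees an out-of-domain value.
    c = np.clip(np.dot(v1unit, v2unit), -1.0, 1.0)
    return np.arccos(c)

With the clamp in place, parallel vectors give exactly 0.0 and anti-parallel vectors give pi, whatever the Python/NumPy combination.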
From bsouthey at gmail.com Fri Nov 12 10:21:10 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 12 Nov 2010 09:21:10 -0600 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: <1EE0C490-C521-4C0B-B8AC-9BBC2E8FDD4A@enthought.com> Message-ID: <4CDD5B66.9010805@gmail.com> On 11/11/2010 09:02 PM, David Cournapeau wrote: > On Fri, Nov 12, 2010 at 7:15 AM, Travis Oliphant wrote: > >> At the same time, the work on the .NET framework has pushed us to move more >> of SciPy to a Cython-generated set. There are additional things I would >> like to see SciPy improve on as well, but I am not sure who is going to work >> on them. If I had my dream, there would be more modularity to the >> packages, and an improved packaging system --- and of course, porting to >> Python 3k. > I don't exactly where we are there, but Pauli and me took a look at > scipy for python 3 at euroscipy in Paris, and I think it is mostly a > matter of low hanging fruits. Most (all ?) changes are in the trunk > already. > >> I would like to see core SciPy be a smaller set containing a >> few core packages. (linear algebra, statistics, optimization, >> interpolation, signal processing, and image processing). Then, I would >> like to see scipy. packages which are released and packaged >> separately with the whole system available on github. > While I agree with the sentiment, I think it would be a mistake to do > so before we have the infrastructure to actually deliver packages and > so on. I understand there is a bit of a chicken and egg issue as well. > I spent most if not all my free time in 2010 to work on that issue, > and I will summarize the current status in a separate email to the ML > to avoid disrupting the main discussion on the refactoring, > > cheers, > > David I agree with David comment because splitting requires effective package management to handle all these splits. Also there seems to be little point in splitting if users tend to require more than just the core. The problem with splitting things too finely is that these create more problems that it is worth. We have already experienced incompatibility problems in numarray's short history with at least masked arrays and the datetime addition. Related to this, can the refactoring be used to make future developments of numpy and scipy especially in terms of packaging easier? I can see that moving or renaming of directories and files to more convenient places or names could be easily done at this time. Bruce From charlesr.harris at gmail.com Fri Nov 12 10:34:58 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Nov 2010 08:34:58 -0700 Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? In-Reply-To: <928441.88239.qm@web34408.mail.mud.yahoo.com> References: <928441.88239.qm@web34408.mail.mud.yahoo.com> Message-ID: On Fri, Nov 12, 2010 at 8:02 AM, Lou Pecora wrote: > I ran across what seems to be a change in how numerics are handled in > Python 2.6 > or Numpy 1.3.0 or both, I'm not sure. I've recently switched from using > Python > 2.4 and Numpy 1.0.3 to using the Python 2.6 and Numpy 1.3.0 that comes with > SAGE > which is a large mathematical package. But the issue seems to be a Python > one, > not a SAGE one. 
> Here is a short example of code that gives the new behavior: > > # ---- Return the angle between two vectors ------------ > def get_angle(v1,v2): > '''v1 and v2 are 1 dimensional numpy arrays''' > # Make unit vectors out of v1 and v2 > v1norm=sqrt(dot(v1,v1)) > v2norm=sqrt(dot(v2,v2)) > v1unit=v1/v1norm > v2unit=v2/v2norm > ang=acos(dot(v1unit,v2unit)) > return ang > > When using Python 2.6 with Numpy 1.3.0 and v1 and v2 are parallel the dot > product of v1unit and v2unit sometimes gives 1.0+epsilon where epsilon is > the > smallest step in the floating point numbers around 1.0 as given in the sys > module. This causes acos to throw a Domain Exception. This does not happen > when > I use Python 2.4 with Numpy 1.0.3. > > Probably an accident or slight difference in compiler optimization. Are you running on a 32 bit system by any chance? > > I have put in a check for the exception situation and the code works fine > again. I am wondering if there are other changes that I should be aware > of. > Does anyone know the origin of the change above or other differences in the > handling of numerics between the two versions? > Thanks for any insight. > The check should have been there in the first place, the straight forward method is subject to roundoff error. It isn't very accurate for almost identical vectors either, you would do better to work with the difference vector in that case: 2*arcsin(||v1 - v2||/2) when v1 and v2 are normalized, or you could try 2*arctan2(||v1 - v2||, ||v1 + v2||). It is also useful to see how the Householder reflection deals with the problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Nov 12 10:39:54 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Nov 2010 09:39:54 -0600 Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? In-Reply-To: <928441.88239.qm@web34408.mail.mud.yahoo.com> References: <928441.88239.qm@web34408.mail.mud.yahoo.com> Message-ID: On Fri, Nov 12, 2010 at 09:02, Lou Pecora wrote: > I ran across what seems to be a change in how numerics are handled in Python 2.6 > or Numpy 1.3.0 or both, I'm not sure. I've recently switched from using Python > 2.4 and Numpy 1.0.3 to using the Python 2.6 and Numpy 1.3.0 that comes with SAGE > which is a large mathematical package. But the issue seems to be a Python one, > not a SAGE one. > > Here is a short example of code that gives the new behavior: > > # ---- Return the angle between two vectors ------------ > def get_angle(v1,v2): >     '''v1 and v2 are 1 dimensional numpy arrays''' >     # Make unit vectors out of v1 and v2 >     v1norm=sqrt(dot(v1,v1)) >     v2norm=sqrt(dot(v2,v2)) >     v1unit=v1/v1norm >     v2unit=v2/v2norm >     ang=acos(dot(v1unit,v2unit)) >     return ang > > When using Python 2.6 with Numpy 1.3.0 and v1 and v2 are parallel the dot > product of v1unit and v2unit sometimes gives 1.0+epsilon where epsilon is the > smallest step in the floating point numbers around 1.0 as given in the sys > module. This causes acos to throw a Domain Exception. This does not happen when > I use Python 2.4 with Numpy 1.0.3. acos() is not a numpy function. It comes from the stdlib math module. Python 2.6 tightened up many of the border cases for the math functions. That is probably where the behavior difference comes from.
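For reference, the difference-vector formulas Chuck suggests above translate directly into NumPy. This is only a sketch under the assumption, stated above, that v1 and v2 are already normalized; the function names are invented for illustration:

import numpy as np

def angle_arcsin(v1, v2):
    # theta = 2*arcsin(||v1 - v2|| / 2); well suited to nearly parallel unit vectors.
    return 2.0 * np.arcsin(0.5 * np.linalg.norm(v1 - v2))

def angle_arctan2(v1, v2):
    # theta = 2*arctan2(||v1 - v2||, ||v1 + v2||); behaves well for both
    # nearly parallel and nearly anti-parallel unit vectors.
    return 2.0 * np.arctan2(np.linalg.norm(v1 - v2), np.linalg.norm(v1 + v2))

Because neither formula feeds a near-1.0 value into arccos, the domain problem does not arise.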
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From lou_boog2000 at yahoo.com Fri Nov 12 10:58:39 2010 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Fri, 12 Nov 2010 07:58:39 -0800 (PST) Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? In-Reply-To: References: <928441.88239.qm@web34408.mail.mud.yahoo.com> Message-ID: <329681.20172.qm@web34404.mail.mud.yahoo.com> ----- Original Message ---- From: Robert Kern To: Discussion of Numerical Python Sent: Fri, November 12, 2010 10:39:54 AM Subject: Re: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? On Fri, Nov 12, 2010 at 09:02, Lou Pecora wrote: > I ran across what seems to be a change in how numerics are handled in Python >2.6 > or Numpy 1.3.0 or both, I'm not sure. I've recently switched from using Python > 2.4 and Numpy 1.0.3 to using the Python 2.6 and Numpy 1.3.0 that comes with >SAGE > which is a large mathematical package. But the issue seems to be a Python one, > not a SAGE one. > > Here is a short example of code that gives the new behavior: > > # ---- Return the angle between two vectors ------------ > def get_angle(v1,v2): > '''v1 and v2 are 1 dimensional numpy arrays''' > # Make unit vectors out of v1 and v2 > v1norm=sqrt(dot(v1,v1)) > v2norm=sqrt(dot(v2,v2)) > v1unit=v1/v1norm > v2unit=v2/v2norm > ang=acos(dot(v1unit,v2unit)) > return ang > > When using Python 2.6 with Numpy 1.3.0 and v1 and v2 are parallel the dot > product of v1unit and v2unit sometimes gives 1.0+epsilon where epsilon is the > smallest step in the floating point numbers around 1.0 as given in the sys > module. This causes acos to throw a Domain Exception. This does not happen when > I use Python 2.4 with Numpy 1.0.3. acos() is not a numpy function. It comes from the stdlib math module. Python 2.6 tightened up many of the border cases for the math functions. That is probably where the behavior difference comes from. -- Robert Kern Thanks, Robert. I thought some math functions were replaced by numpy functions that can also operate on arrays when using from numpy import *. That's the reason I asked about numpy. I know I should change this. But your explanation sounds like it is indeed in Py 2.6 where they tightened things up. I'll just leave the check for exceptions in place and use it more often now. -- Lou Pecora, my views are my own. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Fri Nov 12 11:14:31 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Nov 2010 10:14:31 -0600 Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? In-Reply-To: <329681.20172.qm@web34404.mail.mud.yahoo.com> References: <928441.88239.qm@web34408.mail.mud.yahoo.com> <329681.20172.qm@web34404.mail.mud.yahoo.com> Message-ID: On Fri, Nov 12, 2010 at 09:58, Lou Pecora wrote: > Thanks, Robert. ?I thought some math functions were replaced by numpy functions > that can also operate on arrays when using from numpy import *. That's the > reason I asked about numpy. ?I know I should change this. No, we don't touch the math module at all. That would be evil. 
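The distinction is easy to see at the interpreter: numpy spells its inverse cosine arccos, so "from numpy import *" never shadows math.acos, and the two handle a value just above 1.0 differently. A small sketch; the exact warning behaviour depends on the numpy error state and version:

import math
import numpy as np

x = 1.0 + 2.0**-52          # the first double-precision float above 1.0

try:
    math.acos(x)            # stdlib acos: ValueError ("math domain error")
except ValueError:
    print("math.acos raised a domain error")

with np.errstate(invalid='ignore'):
    print(np.arccos(x))     # numpy's arccos returns nan instead of raising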
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From charlesr.harris at gmail.com Fri Nov 12 11:24:56 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Nov 2010 09:24:56 -0700 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: On Thu, Nov 11, 2010 at 5:48 PM, Pauli Virtanen wrote: > On Thu, 11 Nov 2010 16:37:53 -0600, Jason McCampbell wrote: > [clip] > > We will take a look at this and the script. There is also a feature in > > git that allows two trees to be grafted together so the refactoring will > > end up as a branch on the main repository with all edits. > > Yes, this is pretty much what the script does -- it detaches the commits > in the refactor branch from the Git-SVN history, and reattaches them to > the new Git history. This changes only the DAG of the commits, and not > the tree and file contents corresponding to each commit. > > (Git's graft feature can only add new parents, so filter-branch is > needed.) > > > My hope is that we can roll all of our changes into the main > > repository as a branch and then selectively merge to the main > > branch as desired. For example, as you said, the IronPython > > changes don't need to be merged immediate. > > I'm not sure if we should put development branches at all in > the main repository. > > A repository like > > github.com/numpy/numpy-refactor > > might be a better solution, and also give visibility. > > The teoliphant repository is usually quiet on the weekends. Would it be reasonable to make github.com/numpy/numpy-refactor this weekend and ask the refactor folks to start their work there next Monday? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lou_boog2000 at yahoo.com Fri Nov 12 11:26:10 2010 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Fri, 12 Nov 2010 08:26:10 -0800 (PST) Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? In-Reply-To: References: <928441.88239.qm@web34408.mail.mud.yahoo.com> Message-ID: <416898.41798.qm@web34408.mail.mud.yahoo.com> ________________________________ From: Charles R Harris To: Discussion of Numerical Python Sent: Fri, November 12, 2010 10:34:58 AM Subject: Re: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? On Fri, Nov 12, 2010 at 8:02 AM, Lou Pecora wrote: I ran across what seems to be a change in how numerics are handled in Python 2.6 >or Numpy 1.3.0 or both, I'm not sure. I've recently switched from using Python >2.4 and Numpy 1.0.3 to using the Python 2.6 and Numpy 1.3.0 that comes with SAGE >which is a large mathematical package. But the issue seems to be a Python one, >not a SAGE one. > >Here is a short example of code that gives the new behavior: > ># ---- Return the angle between two vectors ------------ >def get_angle(v1,v2): > '''v1 and v2 are 1 dimensional numpy arrays''' > # Make unit vectors out of v1 and v2 > v1norm=sqrt(dot(v1,v1)) > v2norm=sqrt(dot(v2,v2)) > v1unit=v1/v1norm > v2unit=v2/v2norm > ang=acos(dot(v1unit,v2unit)) > return ang > >When using Python 2.6 with Numpy 1.3.0 and v1 and v2 are parallel the dot >product of v1unit and v2unit sometimes gives 1.0+epsilon where epsilon is the >smallest step in the floating point numbers around 1.0 as given in the sys >module. This causes acos to throw a Domain Exception. 
This does not happen when >I use Python 2.4 with Numpy 1.0.3. > > Probably an accident or slight difference in compiler optimization. Are you running on a 32 bit system by any chance? Yes, I am running on a 32 bit system. Mac OS X 10.4.11. >I have put in a check for the exception situation and the code works fine >again. I am wondering if there are other changes that I should be aware of. >Does anyone know the origin of the change above or other differences in the >handling of numerics between the two versions? >Thanks for any insight. > The check should have been there in the first place, the straight forward method is subject to roundoff error. It isn't very accurate for almost identical vectors either, you would do better to work with the difference vector in that case: 2*arcsin(||v1 - v2||/2) when v1 and v2 are normalized, or you could try 2*arctan2(||v1 - v2||, ||v1 + v2||). It is also useful to see how the Householder reflection deals with the problem. I left the exception catch in place. You're right. I hadn't thought about the Householder reflection. Good point. Thanks. -- Lou Pecora, my views are my own. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Nov 12 11:56:20 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Nov 2010 16:56:20 +0000 (UTC) Subject: [Numpy-discussion] Merging the refactor. References: Message-ID: Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: [clip] > The teoliphant repository is usually quiet on the weekends. Would it be > reasonable to make github.com/numpy/numpy-refactor this weekend and ask > the refactor folks to start their work there next Monday? Sure: https://github.com/numpy/numpy-refactor I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it. -- Pauli Virtanen From jmccampbell at enthought.com Fri Nov 12 15:37:19 2010 From: jmccampbell at enthought.com (Jason McCampbell) Date: Fri, 12 Nov 2010 14:37:19 -0600 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: On Fri, Nov 12, 2010 at 10:56 AM, Pauli Virtanen wrote: > Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: > [clip] > > The teoliphant repository is usually quiet on the weekends. Would it be > > reasonable to make github.com/numpy/numpy-refactor this weekend and ask > > the refactor folks to start their work there next Monday? > > Sure: > > https://github.com/numpy/numpy-refactor > > I can re-sync/scrap it later on if needed, depending on what the > refactoring team wants to do with it. > I think it's even easier than that. If someone creates an empty repository and adds me (user: jasonmccampbell) as a contributor I should be able to add it as a remote for my current repository and push it any time. That said, it might make sense to wait a week as Ilan is working on the merge now. Our plan is to create a clone of the master repository and create a refactoring branch off the trunk. We can then graft on our current branch (which is not connected to the master trunk), do the merge, then push this new refactor branch. This keeps us from having a repo with both an old, un-rooted branch plus the new, correct refactor branch. I'm open either way, just wanted to throw this out there. 
Jason > > -- > Pauli Virtanen > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Nov 12 15:53:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Nov 2010 13:53:37 -0700 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: On Fri, Nov 12, 2010 at 1:37 PM, Jason McCampbell wrote: > > > On Fri, Nov 12, 2010 at 10:56 AM, Pauli Virtanen wrote: > >> Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: >> [clip] >> > The teoliphant repository is usually quiet on the weekends. Would it be >> > reasonable to make github.com/numpy/numpy-refactor this weekend and ask >> > the refactor folks to start their work there next Monday? >> >> Sure: >> >> https://github.com/numpy/numpy-refactor >> >> I can re-sync/scrap it later on if needed, depending on what the >> refactoring team wants to do with it. >> > > I think it's even easier than that. If someone creates an empty repository > and adds me (user: jasonmccampbell) as a contributor I should be able to add > it as a remote for my current repository and push it any time. > > Well, Pauli already has your stuff in the new repository. Why not just clone it and continue your work there? > That said, it might make sense to wait a week as Ilan is working on the > merge now. Our plan is to create a clone of the master repository and > create a refactoring branch off the trunk. > But that is already done. Although I don't think doing it again will be problem. > We can then graft on our current branch (which is not connected to the > master trunk), do the merge, then push this new refactor branch. This keeps > us from having a repo with both an old, un-rooted branch plus the new, > correct refactor branch. > > But it is already grafted. Unless you are thinking of making a branch in numpy/numpy, which might be a bad idea. > I'm open either way, just wanted to throw this out there. > > Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Fri Nov 12 15:56:35 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 12 Nov 2010 12:56:35 -0800 Subject: [Numpy-discussion] Change in Python/Numpy numerics with Py version 2.6 ? In-Reply-To: <416898.41798.qm@web34408.mail.mud.yahoo.com> References: <928441.88239.qm@web34408.mail.mud.yahoo.com> <416898.41798.qm@web34408.mail.mud.yahoo.com> Message-ID: <4CDDAA03.4000708@noaa.gov> Lou, as for your confusion about where "acos()" came from: You are right that if you do: from math import * and from numpy import * numpy will override some of the math functions, but not "acos()", because numpy has "arccos()" function instead. And all of this is a good reminder to not use "import *" import numpy as np is the most common way to import numpy these days -- yes it reasults in a bit of extra typing, but "namespaces are one honking great idea": http://www.python.org/dev/peps/pep-0020/ -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pav at iki.fi Fri Nov 12 16:31:21 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 12 Nov 2010 21:31:21 +0000 (UTC) Subject: [Numpy-discussion] Merging the refactor. References: Message-ID: On Fri, 12 Nov 2010 14:37:19 -0600, Jason McCampbell wrote: >> Sure: >> >> https://github.com/numpy/numpy-refactor >> >> I can re-sync/scrap it later on if needed, depending on what the >> refactoring team wants to do with it. Ok, maybe to clarify: - That repo is already created, - It contains your refactoring work, grafted on the current Git history, so you can either start merging using it, or first re-do the graft if you want to do it yourselves, - You (and also the rest of the team) have push permissions there. Cheers, Pauli PS. You can verify that the contents of the trees are exactly what you had before the grafting: $ git cat-file commit origin/refactor tree 85170987b6d3582b7928d46eda98bdfb394e0ea7 parent fec0175e306016d0eff688f63912ecd30946dcbb parent 7383a3bbed494aa92be61faeac2054fb609a1ab1 author Ilan Schnell 1289517493 -0600 committer Ilan Schnell 1289517493 -0600 ... $ git cat-file commit new-rebased tree 85170987b6d3582b7928d46eda98bdfb394e0ea7 parent 5e24bd3a9c2bdbd3bb5e92b03997831f15c22e4b parent e7caa5d73912a04ade9b4a327f58788ab5d9d585 author Ilan Schnell 1289517493 -0600 committer Ilan Schnell 1289517493 -0600 The tree hashes coincide, which means that the state of the tree at the two commits is exactly identical. From Chris.Barker at noaa.gov Fri Nov 12 16:40:14 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 12 Nov 2010 13:40:14 -0800 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> Message-ID: <4CDDB43E.3080301@noaa.gov> On 11/11/10 10:21 AM, Pierre GM wrote: > Now, we can argue over the very last point: if both a converter and a dtype are specified, which one should take precedence? > You have my opinion, let's hear yours. I say raise an exception -- it's a programmer's error -- make that clear. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jmccampbell at enthought.com Fri Nov 12 17:47:47 2010 From: jmccampbell at enthought.com (Jason McCampbell) Date: Fri, 12 Nov 2010 16:47:47 -0600 Subject: [Numpy-discussion] Merging the refactor. In-Reply-To: References: Message-ID: Pauli, Thanks a lot for doing this, it helps a lot. Ilan was on another project this morning so this helps get the merge process started faster. It looks like it is auto-merging changes from Travis's repository because several recent changes are moved over. I will double check, but we should be able to switch to using this repository now. Thanks, Jason On Fri, Nov 12, 2010 at 3:31 PM, Pauli Virtanen wrote: > On Fri, 12 Nov 2010 14:37:19 -0600, Jason McCampbell wrote: > >> Sure: > >> > >> https://github.com/numpy/numpy-refactor > >> > >> I can re-sync/scrap it later on if needed, depending on what the > >> refactoring team wants to do with it. 
> > Ok, maybe to clarify: > > - That repo is already created, > > - It contains your refactoring work, grafted on the current Git history, > so you can either start merging using it, or first re-do the graft if > you want to do it yourselves, > > - You (and also the rest of the team) have push permissions there. > > Cheers, > Pauli > > > PS. > > You can verify that the contents of the trees are exactly what you had > before the grafting: > > $ git cat-file commit origin/refactor > tree 85170987b6d3582b7928d46eda98bdfb394e0ea7 > parent fec0175e306016d0eff688f63912ecd30946dcbb > parent 7383a3bbed494aa92be61faeac2054fb609a1ab1 > author Ilan Schnell 1289517493 -0600 > committer Ilan Schnell 1289517493 -0600 > ... > > $ git cat-file commit new-rebased > tree 85170987b6d3582b7928d46eda98bdfb394e0ea7 > parent 5e24bd3a9c2bdbd3bb5e92b03997831f15c22e4b > parent e7caa5d73912a04ade9b4a327f58788ab5d9d585 > author Ilan Schnell 1289517493 -0600 > committer Ilan Schnell 1289517493 -0600 > > The tree hashes coincide, which means that the state of the tree at the > two commits is exactly identical. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Fri Nov 12 21:42:44 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 13 Nov 2010 10:42:44 +0800 Subject: [Numpy-discussion] bzr mirror Message-ID: Hi, While cleaning up the numpy wiki start page I came across a bzr mirror that still pointed to svn, https://launchpad.net/numpy, originally registered by Jarrod. It would be good to either point that to git or delete it. I couldn't see how to report or do anything about that on Launchpad, but that's maybe just me - I can never find anything there. For now I've removed the link to it on Trac, if the mirror gets updated please put it back. Cheers, Ralf From dsdale24 at gmail.com Sat Nov 13 07:27:56 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 13 Nov 2010 07:27:56 -0500 Subject: [Numpy-discussion] bzr mirror In-Reply-To: References: Message-ID: On Fri, Nov 12, 2010 at 9:42 PM, Ralf Gommers wrote: > Hi, > > While cleaning up the numpy wiki start page I came across a bzr mirror > that still pointed to svn, https://launchpad.net/numpy, originally > registered by Jarrod. It would be good to either point that to git or > delete it. I couldn't see how to report or do anything about that on > Launchpad, but that's maybe just me - I can never find anything there. > > For now I've removed the link to it on Trac, if the mirror gets > updated please put it back. From From dsdale24 at gmail.com Sat Nov 13 07:35:18 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 13 Nov 2010 07:35:18 -0500 Subject: [Numpy-discussion] bzr mirror In-Reply-To: References: Message-ID: On Sat, Nov 13, 2010 at 7:27 AM, Darren Dale wrote: > On Fri, Nov 12, 2010 at 9:42 PM, Ralf Gommers > wrote: >> Hi, >> >> While cleaning up the numpy wiki start page I came across a bzr mirror >> that still pointed to svn, https://launchpad.net/numpy, originally >> registered by Jarrod. It would be good to either point that to git or >> delete it. I couldn't see how to report or do anything about that on >> Launchpad, but that's maybe just me - I can never find anything there. >> >> For now I've removed the link to it on Trac, if the mirror gets >> updated please put it back. 
Comment 8 at https://bugs.launchpad.net/launchpad-registry/+bug/38349 : "Ask to deactivate the project at https://answers.launchpad.net/launchpad/+addquestion If the project has no data that is useful to the community, it will be deactivated. If the project has code or bugs, the community may still use the project even if the maintainers are no interested in it. Launchpad admins will not deactivate projects that the community can use. Consider transferring maintainership to another user." Note the continued use of "deactivate" throughout the answer to repeated inquiries of how to delete a project. From https://help.launchpad.net/PrivacyPolicy : "Launchpad retains all data submitted by users permanently. Except in the circumstances listed below, Launchpad will only delete data if required to do so by law or if data (including files, PPA submissions, bug reports, bug comments, bug attachments, and translations) is inappropriate. Canonical reserves the right to determine whether data is inappropriate. Spam, malicious code, and defamation are considered inappropriate. Where data is deleted, it will be removed from the Launchpad database but it may continue to exist in backup archives which are maintained by Canonical." From ralf.gommers at googlemail.com Sat Nov 13 10:13:32 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 13 Nov 2010 23:13:32 +0800 Subject: [Numpy-discussion] bzr mirror In-Reply-To: References: Message-ID: On Sat, Nov 13, 2010 at 8:35 PM, Darren Dale wrote: > On Sat, Nov 13, 2010 at 7:27 AM, Darren Dale wrote: >> On Fri, Nov 12, 2010 at 9:42 PM, Ralf Gommers >> wrote: >>> Hi, >>> >>> While cleaning up the numpy wiki start page I came across a bzr mirror >>> that still pointed to svn, https://launchpad.net/numpy, originally >>> registered by Jarrod. It would be good to either point that to git or >>> delete it. I couldn't see how to report or do anything about that on >>> Launchpad, but that's maybe just me - I can never find anything there. >>> >>> For now I've removed the link to it on Trac, if the mirror gets >>> updated please put it back. > > > Comment 8 at https://bugs.launchpad.net/launchpad-registry/+bug/38349 : Thanks for finding that. > > "Ask to deactivate the project at > ? ?https://answers.launchpad.net/launchpad/+addquestion > > If the project has no data that is useful to the community, it will be > deactivated. If the project has code or bugs, the community may still > use the project even if the maintainers are no interested in it. > Launchpad admins will not deactivate projects that the community can > use. Consider transferring maintainership to another user." > > Note the continued use of "deactivate" throughout the answer to > repeated inquiries of how to delete a project. From > https://help.launchpad.net/PrivacyPolicy : > > "Launchpad retains all data submitted by users permanently. What a pain. Guess it's up to Jarrod to submit the deactivation request. Let's hope they don't consider outdated mirrored files "data useful to the community". Ralf From vincent at vincentdavis.net Sat Nov 13 20:16:05 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 13 Nov 2010 18:16:05 -0700 Subject: [Numpy-discussion] pyc and pyo files in dmg and other python3 questions also a dmg for python3.1 to try Message-ID: The questions below regard the osx dmg installer, not sure about how this applies to other installers. I noticed that pyc and pyo files are included in the binaries. Is there a reason for this? 
I have removed them in the dmg for python3.1 I see that f2py is installed in /usr/local/bin/. f2py is also located in versions/2.7/bin/ but I am not sure when it gets installed there (versions/2.7/bin/) and why id does not from the dmg installer. Any insight into this? What should the permissions be on the installed numpy files. When I install via $python3.1 setup.py install the owner = VMD(me) group=admin. To match the other dmg installs I have to run $sudo python3.1 setup.py install to get owner=root and group=admin I am kind of working backwords. That is the numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg was created by replacing the pkg installer file in the dmg from numpy/python2.7 with that from for numpy/python3.1 created with packagemaker gui. The packagemaker setup can be saved as a packagemaker document *.pmdoc which is just a collection of xml files. I can't find any python packages/scripts to edit/create .pmdoc files but that should be easy. Then I still need to have an automated script for the dmg (packagemaker only builds the installer) I have not looked for a python script for this but I assume bdist can be used as a guide (not sure about license issues) Are there better tools build dmgs using python? I realize there is a lot I don't know about building binaries but I am interested so any advice on how to proceed or what the plans are for building py3 binaries would be appreciated and I will try to contribute. Finally here is a py3 binary to try. http://vincentdavis.info/installers/numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg -- Thanks Vincent Davis 720-301-3003 From charlesr.harris at gmail.com Sat Nov 13 21:41:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 13 Nov 2010 19:41:32 -0700 Subject: [Numpy-discussion] Problems testing the floating point flags Message-ID: Hi All, This is in reference to numpy ticket #1671and the comments on pull request 13 . The original problem was that the gcc compiler was reordering the instructions so that the floating point flags were tested before the computation that needed to be checked. The compiler couldn't know that the flags were being tested in the original code because it didn't know that about PyUFunc_getfperr(), although the fact that floating point computations have side effects should probably have limited any code reordering given that unknown. However, even when the macro using the glibc function fetestexcept was used the problem persisted. This is a known bug against gcc >= 4.1 that hasn't been addressed in the last four years and it seems unlikely that it will be fixed anytime soon. The upshot is that there is no reliable way to check the floating point flags using either PyUFunc_getfperr or the macro UFUNC_CHECK_STATUS. Enter the ugly workarounds. If a floating point operation produces a value, it is possible to pass a void pointer to that value as a dummy argument to a flag checking routine which will force the value to be computed before the function is called. There are other games that can be played with volatile but I am not convinced that they are robust or portable across compilers. An added complication is that PyUFunc_getfperr is part of the numpy API and the macro UFUNC_CHECK_STATUS is public so if we add workarounds they need new names. There is also the question if we want to expose them. In any case, suggestions as to names and approaches are welcome. And if anyone has a better solution it would be great to hear it. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Nov 14 09:20:27 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 14 Nov 2010 22:20:27 +0800 Subject: [Numpy-discussion] pyc and pyo files in dmg and other python3 questions also a dmg for python3.1 to try In-Reply-To: References: Message-ID: On Sun, Nov 14, 2010 at 9:16 AM, Vincent Davis wrote: > The questions below regard the osx dmg installer, not sure about how > this applies to other installers. > > I noticed that pyc and pyo files are included in the binaries. Is > there a reason for this? I have removed them in the dmg for python3.1 > Some reasons I can think of are reducing startup time, avoiding permissions issues for writing .pyc or .pyo files, and consistency (you're sure all installs have the same .pyo files). > > I see that f2py is installed in /usr/local/bin/. f2py is also located > in versions/2.7/bin/ but I am not sure when it gets installed there > (versions/2.7/bin/) and why id does not from the dmg installer. Any > insight into this? > Should come from the installer right, where else can it come from? No idea at what point during install though. > > What should the permissions be on the installed numpy files. When I > install via $python3.1 setup.py install the owner = VMD(me) > group=admin. To match the other dmg installs I have to run $sudo > python3.1 setup.py install to get owner=root and group=admin > > I am kind of working backwords. That is the > numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg was created by > replacing the pkg installer file in the dmg from numpy/python2.7 with > that from for numpy/python3.1 created with packagemaker gui. > The packagemaker setup can be saved as a packagemaker document *.pmdoc > which is just a collection of xml files. I can't find any python > packages/scripts to edit/create .pmdoc files but that should be easy. > Then I still need to have an automated script for the dmg > (packagemaker only builds the installer) I have not looked for a > python script for this but I assume bdist can be used as a guide (not > sure about license issues) Are there better tools build dmgs using > python? > Look at the numpy/tools/numpy-macosx-installer/new-create-dmg script and how it's called from the paver dmg task. If you have an mpkg that should work with very few changes. > > I realize there is a lot I don't know about building binaries but I am > interested so any advice on how to proceed or what the plans are for > building py3 binaries would be appreciated and I will try to > contribute. > > Finally here is a py3 binary to try. > > http://vincentdavis.info/installers/numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg > > That works for me. The only small issue I can see is that the Finder window is huge when it opens, wider than my screen. Cheers, Ralf > -- > Thanks > Vincent Davis > 720-301-3003 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vincent at vincentdavis.net Sun Nov 14 10:03:23 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sun, 14 Nov 2010 08:03:23 -0700 Subject: [Numpy-discussion] pyc and pyo files in dmg and other python3 questions also a dmg for python3.1 to try In-Reply-To: References: Message-ID: On Sun, Nov 14, 2010 at 7:20 AM, Ralf Gommers wrote: > > > On Sun, Nov 14, 2010 at 9:16 AM, Vincent Davis > wrote: >> >> The questions below regard the osx dmg installer, not sure about how >> this applies to other installers. >> >> I noticed that pyc and pyo files are included in the binaries. Is >> there a reason for this? I have removed them in the dmg for python3.1 > > Some reasons I can think of are reducing startup time, avoiding permissions > issues for writing .pyc or .pyo files, and consistency (you're sure all > installs have the same .pyo files). Right but are they not recreated when the corresponding py file is accessed? So the speed to would only apply to the first run after install. >> >> I see that f2py is installed in /usr/local/bin/. f2py is also located >> in versions/2.7/bin/ but I am not sure when it gets installed there >> (versions/2.7/bin/) and why id does not from the dmg installer. Any >> insight into this? > > Should come from the installer right, where else can it come from? No idea > at what point during install though. >> >> What should the permissions be on the installed numpy files. When I >> install via $python3.1 setup.py install the owner = VMD(me) >> group=admin. To match the other dmg installs I have to run $sudo >> python3.1 setup.py install ?to get owner=root and group=admin >> >> I am kind of working backwords. That is the >> numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg was created by >> replacing the pkg installer file in the dmg from numpy/python2.7 with >> that from for numpy/python3.1 created with packagemaker gui. >> The packagemaker setup can be saved as a packagemaker document *.pmdoc >> which is just a collection of xml files. I can't find any python >> packages/scripts to edit/create .pmdoc files but that should be easy. >> Then I still need to have an automated script for the dmg >> (packagemaker only builds the installer) I have not looked for a >> python script for this but I assume bdist can be used as a guide (not >> sure about license issues) Are there better tools build dmgs using >> python? > > Look at the numpy/tools/numpy-macosx-installer/new-create-dmg script and how > it's called from the paver dmg task. If you have an mpkg that should work > with very few changes. Good point. I'll look at that. >> >> I realize there is a lot I don't know about building binaries but I am >> interested so any advice on how to proceed or what the plans are for >> building py3 binaries would be appreciated and I will try to >> contribute. >> >> Finally here is a py3 binary to try. >> >> http://vincentdavis.info/installers/numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg >> > That works for me. The only small issue I can see is that the Finder window > is huge when it opens, wider than my screen. That is a feature :-) I guess I need to work on that a bit. It seems the dmg layout is not preserved when converting between read-write and compressed. Or I am otherwise somehow loosing the formatting from the original dmg. I need to look at the new-create-dmg and use that. 
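Installers typically byte-compile up front with the standard library's compileall rather than relying on .pyc files being regenerated at import time. A rough sketch of what such a step can look like; the staging path is invented for illustration and is not taken from the numpy packaging scripts:

import compileall

# Byte-compile the staged site-packages tree before it is wrapped into the
# .pkg, so the root-owned install never needs to be writable at import time.
compileall.compile_dir(
    "stage/Library/Frameworks/Python.framework/Versions/3.1/"
    "lib/python3.1/site-packages/numpy",
    quiet=True)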
Vincent > > Cheers, > Ralf > >> >> -- >> Thanks >> Vincent Davis >> 720-301-3003 >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Thanks Vincent Davis 720-301-3003 From charlesr.harris at gmail.com Sun Nov 14 10:07:14 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Nov 2010 08:07:14 -0700 Subject: [Numpy-discussion] Problems testing the floating point flags In-Reply-To: References: Message-ID: On Sat, Nov 13, 2010 at 7:41 PM, Charles R Harris wrote: > Hi All, > > This is in reference to numpy ticket #1671and the comments on pull > request 13 . The original problem > was that the gcc compiler was reordering the instructions so that the > floating point flags were tested before the computation that needed to be > checked. The compiler couldn't know that the flags were being tested in the > original code because it didn't know that about PyUFunc_getfperr(), although > the fact that floating point computations have side effects should probably > have limited any code reordering given that unknown. However, even when the > macro using the glibc function fetestexcept was used the problem persisted. > This is a known bug against > gcc >= 4.1 that hasn't been addressed in the last four years and it seems > unlikely that it will be fixed anytime soon. The upshot is that there is no > reliable way to check the floating point flags using either PyUFunc_getfperr > or the macro UFUNC_CHECK_STATUS. > > Enter the ugly workarounds. If a floating point operation produces a value, > it is possible to pass a void pointer to that value as a dummy argument to a > flag checking routine which will force the value to be computed before the > function is called. There are other games that can be played with volatile > but I am not convinced that they are robust or portable across compilers. An > added complication is that PyUFunc_getfperr is part of the numpy API and the > macro UFUNC_CHECK_STATUS is public so if we add workarounds they need new > names. There is also the question if we want to expose them. In any case, > suggestions as to names and approaches are welcome. And if anyone has a > better solution it would be great to hear it. > > Another possible solution is like so: static __attribute__ ((noinline)) int fpecheck(int *status) { *status = PyUFunc_getfperr(); return 0; } static __attribute__ ((noinline)) int fpeclear(int *status) { PyUFunc_clearfperr(); return 0; } int myfunc(void) { int status; fpeclear(&status); do { stuff; } while (fpecheck(&status)); return status; } Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Nov 14 14:36:18 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 14 Nov 2010 09:36:18 -1000 Subject: [Numpy-discussion] pyc and pyo files in dmg and other python3 questions also a dmg for python3.1 to try In-Reply-To: References: Message-ID: <4CE03A32.6080800@hawaii.edu> On 11/14/2010 05:03 AM, Vincent Davis wrote: > On Sun, Nov 14, 2010 at 7:20 AM, Ralf Gommers > wrote: >> >> >> On Sun, Nov 14, 2010 at 9:16 AM, Vincent Davis >> wrote: >>> >>> The questions below regard the osx dmg installer, not sure about how >>> this applies to other installers. 
>>> >>> I noticed that pyc and pyo files are included in the binaries. Is >>> there a reason for this? I have removed them in the dmg for python3.1 >> >> Some reasons I can think of are reducing startup time, avoiding permissions >> issues for writing .pyc or .pyo files, and consistency (you're sure all >> installs have the same .pyo files). > > Right but are they not recreated when the corresponding py file is > accessed? So the speed to would only apply to the first run after > install. They won't be written unless the user has write permission in the directories where the py files reside, which is normally not the case when an installer is used, or any system-wide installation is done. Inclusion of .pyc and/or .pyo files is a standard part of the installation process for any platform. There is no advantage in trying to remove them. Eric > >>> >>> I see that f2py is installed in /usr/local/bin/. f2py is also located >>> in versions/2.7/bin/ but I am not sure when it gets installed there >>> (versions/2.7/bin/) and why id does not from the dmg installer. Any >>> insight into this? >> >> Should come from the installer right, where else can it come from? No idea >> at what point during install though. >>> >>> What should the permissions be on the installed numpy files. When I >>> install via $python3.1 setup.py install the owner = VMD(me) >>> group=admin. To match the other dmg installs I have to run $sudo >>> python3.1 setup.py install to get owner=root and group=admin >>> >>> I am kind of working backwords. That is the >>> numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg was created by >>> replacing the pkg installer file in the dmg from numpy/python2.7 with >>> that from for numpy/python3.1 created with packagemaker gui. >>> The packagemaker setup can be saved as a packagemaker document *.pmdoc >>> which is just a collection of xml files. I can't find any python >>> packages/scripts to edit/create .pmdoc files but that should be easy. >>> Then I still need to have an automated script for the dmg >>> (packagemaker only builds the installer) I have not looked for a >>> python script for this but I assume bdist can be used as a guide (not >>> sure about license issues) Are there better tools build dmgs using >>> python? >> >> Look at the numpy/tools/numpy-macosx-installer/new-create-dmg script and how >> it's called from the paver dmg task. If you have an mpkg that should work >> with very few changes. > > Good point. I'll look at that. > >>> >>> I realize there is a lot I don't know about building binaries but I am >>> interested so any advice on how to proceed or what the plans are for >>> building py3 binaries would be appreciated and I will try to >>> contribute. >>> >>> Finally here is a py3 binary to try. >>> >>> http://vincentdavis.info/installers/numpy-1.5.1rc2-py3.1.2-python.org-macosx10.3.dmg >>> >> That works for me. The only small issue I can see is that the Finder window >> is huge when it opens, wider than my screen. > > That is a feature :-) I guess I need to work on that a bit. It seems > the dmg layout is not preserved when converting between read-write and > compressed. Or I am otherwise somehow loosing the formatting from the > original dmg. I need to look at the new-create-dmg and use that. 
> > Vincent > > >> >> Cheers, >> Ralf >> >>> >>> -- >>> Thanks >>> Vincent Davis >>> 720-301-3003 >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > From mwwiebe at gmail.com Sun Nov 14 15:05:42 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 14 Nov 2010 12:05:42 -0800 Subject: [Numpy-discussion] Problems testing the floating point flags In-Reply-To: References: Message-ID: On Sat, Nov 13, 2010 at 6:41 PM, Charles R Harris wrote: > Hi All, > > This is in reference to numpy ticket #1671and the comments on pull > request 13 . The original problem > was that the gcc compiler was reordering the instructions so that the > floating point flags were tested before the computation that needed to be > checked. The compiler couldn't know that the flags were being tested in the > original code because it didn't know that about PyUFunc_getfperr(), although > the fact that floating point computations have side effects should probably > have limited any code reordering given that unknown. However, even when the > macro using the glibc function fetestexcept was used the problem persisted. > This is a known bug against > gcc >= 4.1 that hasn't been addressed in the last four years and it seems > unlikely that it will be fixed anytime soon. The upshot is that there is no > reliable way to check the floating point flags using either PyUFunc_getfperr > or the macro UFUNC_CHECK_STATUS. > > Enter the ugly workarounds. If a floating point operation produces a value, > it is possible to pass a void pointer to that value as a dummy argument to a > flag checking routine which will force the value to be computed before the > function is called. There are other games that can be played with volatile > but I am not convinced that they are robust or portable across compilers. An > added complication is that PyUFunc_getfperr is part of the numpy API and the > macro UFUNC_CHECK_STATUS is public so if we add workarounds they need new > names. There is also the question if we want to expose them. In any case, > suggestions as to names and approaches are welcome. And if anyone has a > better solution it would be great to hear it. > > Chuck > The ways we have thought of to solve this are: 1) Declare the result of the computation volatile. This fixed everything for me, and is a very minimal patch (add the volatile, and cast it away when passing the output pointer parameter to avoid a warning). Volatile is usually applied to variables whose state is visible outside the current thread, or for low-level device driver programming to prevent reordering, so this usage on a local variable may be a little suspect. On the other hand, it worked for me, and with the added unit test, any other systems out there which exhibit this or similar bugs would be flagged during testing. volatile float result = arg1 / arg2; int retstatus = PyUFunc_getfperr(); if (retstatus) ... 2) Use a compiler memory barrier to block reordering around PyUFunc_getfperr(). Unfortunately, the suggested barrier for gcc (asm volatile("" ::: "memory");) did not work, and the bug remained. 3) Enforce ordering by creating a dependency. Since the compiler doesn't consider the FP state as important, a different dependency can be used. 
One invasive way is to modify PyUFunc_getfperr to take a void* parameter, then pass a pointer to the result as a parameter. For this to work, PyUFunc_getfperr needs to be opaque to the compiler, for instance by being non-inline or called through a function pointer. If we want to force any modules linking to NumPy to check if they're susceptible to the bug, changing the API like this, with appropriate documentation, may be the way to go. float result = arg1 / arg2; int retstatus = PyUFunc_getfperr(&result); if (retstatus) ... Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sun Nov 14 15:09:22 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 14 Nov 2010 12:09:22 -0800 Subject: [Numpy-discussion] Problems testing the floating point flags In-Reply-To: References: Message-ID: On Sun, Nov 14, 2010 at 7:07 AM, Charles R Harris wrote: > > Another possible solution is like so: > > static __attribute__ ((noinline)) int > fpecheck(int *status) > { > *status = PyUFunc_getfperr(); > return 0; > } > > static __attribute__ ((noinline)) int > fpeclear(int *status) > { > PyUFunc_clearfperr(); > return 0; > } > > int myfunc(void) > { > int status; > > fpeclear(&status); > do { > stuff; > } while (fpecheck(&status)); > return status; > } > > While this may work in this particular case because of compiler specifics, I don't think this creates a reliable ordering dependency because of the form of 'stuff'. It will be loop invariant, so the compiler may do as follows: do { result = arg1 / arg2; } while (fpecheck(&status)); Pulling out the loop-invariant statement: result = arg1 / arg2; do { } while (fpecheck(&status)); Since the loop doesn't use result, it may feel free to reorder: do { } while (fpecheck(&status)); result = arg1 / arg2; producing the bug again. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Nov 14 15:21:31 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Nov 2010 13:21:31 -0700 Subject: [Numpy-discussion] Problems testing the floating point flags In-Reply-To: References: Message-ID: On Sun, Nov 14, 2010 at 1:09 PM, Mark Wiebe wrote: > > > On Sun, Nov 14, 2010 at 7:07 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> Another possible solution is like so: >> >> static __attribute__ ((noinline)) int >> fpecheck(int *status) >> { >> *status = PyUFunc_getfperr(); >> return 0; >> } >> >> static __attribute__ ((noinline)) int >> fpeclear(int *status) >> { >> PyUFunc_clearfperr(); >> return 0; >> } >> >> int myfunc(void) >> { >> int status; >> >> fpeclear(&status); >> do { >> stuff; >> } while (fpecheck(&status)); >> return status; >> } >> >> > While this may work in this particular case because of compiler specifics, > I don't think this creates a reliable ordering dependency because of the > form of 'stuff'. It will be loop invariant, so the compiler may do as > follows: > > do { > result = arg1 / arg2; > } while (fpecheck(&status)); > > Pulling out the loop-invariant statement: > > result = arg1 / arg2; > do { > } while (fpecheck(&status)); > > Since the loop doesn't use result, it may feel free to reorder: > > do { > } while (fpecheck(&status)); > result = arg1 / arg2; > > producing the bug again. > > Good point. I was trying to keep the fpeclear in front of the code to be tested. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
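The exchange above is about C-level code generation, but it may help to see the user-visible behaviour that is at stake. A small Python-level sketch follows; it only shows what the flag checking is supposed to deliver, and says nothing about how the C side avoids the reordering:

import numpy as np

# If the compiler moves the flag check ahead of the divide, an error like
# this one can go unreported in the C loops. Seen from Python, the intended
# behaviour is:
with np.errstate(divide='raise'):
    try:
        np.float64(1.0) / np.float64(0.0)
    except FloatingPointError as e:
        print('caught: %s' % e)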
URL: From charlesr.harris at gmail.com Sun Nov 14 15:29:03 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Nov 2010 13:29:03 -0700 Subject: [Numpy-discussion] Where did the github numpy repository go? Message-ID: I keep getting page does not exist. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sun Nov 14 15:29:12 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 14 Nov 2010 12:29:12 -0800 Subject: [Numpy-discussion] Problems testing the floating point flags In-Reply-To: References: Message-ID: On Sun, Nov 14, 2010 at 12:21 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Good point. I was trying to keep the fpeclear in front of the code to be > tested. > > Yeah, I hadn't considered that possibility too seriously. Hopefully as long as the compiler doesn't see a reason to reorder, it's fine. Compiler optimizers are difficult to trick, they work quite well these days. I believe the reason it reorders in the case of this bug is that it notices that by shifting the divide down to the bottom, the control flow path returning an error never needs to do the divide, and will thus run faster. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From xscript at gmx.net Sun Nov 14 15:30:50 2010 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Sun, 14 Nov 2010 21:30:50 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <889FD1C0-0C88-4F7A-A26F-4423D989DB4A@gmail.com> (Pierre GM's message of "Thu, 11 Nov 2010 20:35:34 +0100") References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> <87y68zr72g.fsf@ginnungagap.bsc.es> <889FD1C0-0C88-4F7A-A26F-4423D989DB4A@gmail.com> Message-ID: <871v6n4phh.fsf@fulla.xlab.taz> Pierre GM writes: > On Nov 11, 2010, at 8:31 PM, Llu?s wrote: >> Pierre GM writes: >> >>> In practice, that's exactly what happens below the hood when >>> genfromtxt tries to guess the output type of the converter. It tries a >>> single value ('1'), fails, and decides that the result must be an >>> object... Probably not the best strategy, as it crashes in your >>> case. But yours is a buggy case anyway. >> [...] >>> Now, we can argue over the very last point: if both a converter and a >>> dtype are specified, which one should take precedence? >>> You have my opinion, let's hear yours. >> >> What about delaying the calculation of converters? >> Instead of using type >> checks with fake data, 'StringConverter.update' could take an optional >> argument 'imput_sample' (defaulting to "1") in order to perform its >> checks. >> >> Then, use real data from the first (non-comment, non-names) line of the >> input file when calling 'StringConverter.update' in 'genfromtxt'. > Mmh. That's an idea... Do you have a patch to suggest? This will work as long as 'first_values' is assured to always contain valid data and as long as its indexes are equivalent to those in converters (which I simply haven't checked). -------------- next part -------------- A non-text attachment was scrubbed... Name: numpy-#1665.patch Type: text/x-diff Size: 1654 bytes Desc: not available URL: -------------- next part -------------- apa! -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." 
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From pgmdevlist at gmail.com Sun Nov 14 15:34:31 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 14 Nov 2010 21:34:31 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <871v6n4phh.fsf@fulla.xlab.taz> References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> <87y68zr72g.fsf@ginnungagap.bsc.es> <889FD1C0-0C88-4F7A-A26F-4423D989DB4A@gmail.com> <871v6n4phh.fsf@fulla.xlab.taz> Message-ID: On Nov 14, 2010, at 9:30 PM, Llu?s wrote: > This will work as long as 'first_values' is assured to always contain > valid data and as long as its indexes are equivalent to those in > converters (which I simply haven't checked). I beat you to it, actually ;) Check the git push I committed earlier today. I followed exactly the same approach as yours (well, almost: I do check whether the first line is empty (there were names) and then fall back on the default testing value ('1')... From rkraft4 at gmail.com Sun Nov 14 15:48:27 2010 From: rkraft4 at gmail.com (Robin Kraft) Date: Sun, 14 Nov 2010 15:48:27 -0500 Subject: [Numpy-discussion] Where did the github numpy repository go? In-Reply-To: References: Message-ID: <0530B0C1-C5F0-459E-983E-A01CC47A4D05@gmail.com> Git is having some kind of major outage: http://status.github.com/ "The site and git access is unavailable due to a database failure. We're researching the issue." On Nov 14, 2010, at 3:29 PM, numpy-discussion-request at scipy.org wrote: > > Message: 5 > Date: Sun, 14 Nov 2010 13:29:03 -0700 > From: Charles R Harris > Subject: [Numpy-discussion] Where did the github numpy repository go? > To: numpy-discussion > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > I keep getting page does not exist. > > Chuck > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20101114/3381d8a4/attachment.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 50, Issue 32 > ************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Nov 14 17:10:10 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 14 Nov 2010 23:10:10 +0100 Subject: [Numpy-discussion] Where did the github numpy repository go? In-Reply-To: References: Message-ID: <20101114221010.GB2347@phare.normalesup.org> On Sun, Nov 14, 2010 at 01:29:03PM -0700, Charles R Harris wrote: > I keep getting page does not exist. It looks like github is having difficulties currently. From matthew.brett at gmail.com Sun Nov 14 17:18:10 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 14 Nov 2010 14:18:10 -0800 Subject: [Numpy-discussion] Where did the github numpy repository go? In-Reply-To: <0530B0C1-C5F0-459E-983E-A01CC47A4D05@gmail.com> References: <0530B0C1-C5F0-459E-983E-A01CC47A4D05@gmail.com> Message-ID: On Sun, Nov 14, 2010 at 12:48 PM, Robin Kraft wrote: > Git is having some kind of major outage: > http://status.github.com/ > "The site and git access is unavailable due to a database failure. We're > researching the issue." A good excuse for a long lazy Sunday... 
Matthew From Ross.Wilson at ga.gov.au Sun Nov 14 22:55:30 2010 From: Ross.Wilson at ga.gov.au (Ross.Wilson at ga.gov.au) Date: Mon, 15 Nov 2010 14:55:30 +1100 Subject: [Numpy-discussion] Simple broadcasting? Or not so simple?? [SEC=UNCLASSIFIED] Message-ID: <89AD395DEDA30242BDBE4740B0E608CA675A21D510@EXCCR01.agso.gov.au> Dear list, I thought I understood broadcasting, but now I'm not so sure. I've simplified as much as I can, so here goes. I have two input arrays of shape (1, 3, 1). I want to select elements from one or other of the input arrays depending on whether the corresponding element of a third array exceeds a threshold. My simplest code is: --------- import numpy as np a = np.array([[[1],[2],[3]]]) b = np.array([[[4],[5],[6]]]) x = np.array([[[1],[1],[2]]]) result = np.where(x > 1.5, a, b) ---------- and works as expected. Now, my understanding of broadcasting is that if the 'x' array is defined as np.array([[[1]]]) then broadcasting will ensure the result array will contain elements from array 'b'. That is, the program will behave as if 'x' had shape of (1,3,1) with three elements each of value 1. I tested that and got the result I expected. However, when I ran the test on another machine, it failed with an "array dimensions must agree" error. On the failing machine numpy.__version__ returns '1.2.0'. Machines on which the broadcasting works as I expect I see '1.3.0' (or later) in numpy.__version__. Have broadcast rules changed since 1.2.0? Or maybe I just don't understand broadcasting? Thanks in advance, Ross Wilson From robert.kern at gmail.com Sun Nov 14 23:07:12 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 14 Nov 2010 22:07:12 -0600 Subject: [Numpy-discussion] Simple broadcasting? Or not so simple?? [SEC=UNCLASSIFIED] In-Reply-To: <89AD395DEDA30242BDBE4740B0E608CA675A21D510@EXCCR01.agso.gov.au> References: <89AD395DEDA30242BDBE4740B0E608CA675A21D510@EXCCR01.agso.gov.au> Message-ID: On Sun, Nov 14, 2010 at 21:55, wrote: > Dear list, > > I thought I understood broadcasting, but now I'm not so sure. > > I've simplified as much as I can, so here goes. ?I have two input arrays of shape (1, 3, 1). ?I want to select elements from one or other of the input arrays depending on whether the corresponding element of a third array exceeds a threshold. ?My simplest code is: > --------- > import numpy as np > a = np.array([[[1],[2],[3]]]) > b = np.array([[[4],[5],[6]]]) > > x = np.array([[[1],[1],[2]]]) > > result = np.where(x > 1.5, a, b) > ---------- > and works as expected. > > Now, my understanding of broadcasting is that if the 'x' array is defined as np.array([[[1]]]) then broadcasting will ensure the result array will contain elements from array 'b'. ?That is, the program will behave as if 'x' had shape of (1,3,1) with three elements each of value 1. ?I tested that and got the result I expected. > > However, when I ran the test on another machine, it failed with an "array dimensions must agree" error. ?On the failing machine numpy.__version__ returns '1.2.0'. ?Machines on which the broadcasting works as I expect I see '1.3.0' (or later) in numpy.__version__. > > Have broadcast rules changed since 1.2.0? ?Or maybe I just don't understand broadcasting? I'm not certain, but earlier versions of numpy.where() may not have broadcasted their arguments. Not every function taking arrays as arguments does broadcasting. 
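To make the answer above concrete, a small sketch of the shape-(1, 1, 1) case from the original question; on 1.3.0 and later (as reported above) the condition broadcasts against the (1, 3, 1) inputs, while on 1.2.0 the same call reportedly raises the dimension error being discussed:

import numpy as np

a = np.array([[[1], [2], [3]]])   # shape (1, 3, 1)
b = np.array([[[4], [5], [6]]])   # shape (1, 3, 1)
x = np.array([[[1]]])             # shape (1, 1, 1)

# x behaves as if it were (1, 3, 1) filled with 1, so every element fails
# the x > 1.5 test and is taken from b.
result = np.where(x > 1.5, a, b)
print(result.squeeze())           # [4 5 6]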
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From schlesin at cshl.edu Mon Nov 15 00:23:40 2010 From: schlesin at cshl.edu (Felix) Date: Mon, 15 Nov 2010 05:23:40 +0000 (UTC) Subject: [Numpy-discussion] Workaround for Ticket #1504 (MKL linking) Message-ID: is there any workaround or fix for the problem described in Ticket 1504? http://projects.scipy.org/numpy/ticket/1504 Using static linking sounds like it could be the easiest solution. Can numpy.distutils be used to do that? Thank you for any tips Felix From dagss at student.matnat.uio.no Mon Nov 15 02:00:37 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 15 Nov 2010 08:00:37 +0100 Subject: [Numpy-discussion] Workaround for Ticket #1504 (MKL linking) In-Reply-To: References: Message-ID: <4CE0DA95.60907@student.matnat.uio.no> On 11/15/2010 06:23 AM, Felix wrote: > is there any workaround or fix for the problem described in Ticket > 1504? > http://projects.scipy.org/numpy/ticket/1504 > > Using static linking sounds like it could be the easiest solution. Can > numpy.distutils be used to do that? > You can try to see if sys.setdlopenflags works for you, it does for me: http://www.mail-archive.com/numpy-discussion at scipy.org/msg23151.html Dag Sverre From zinka4u at gmail.com Mon Nov 15 05:02:48 2010 From: zinka4u at gmail.com (srinivas zinka) Date: Mon, 15 Nov 2010 19:02:48 +0900 Subject: [Numpy-discussion] Regrading Numpy Documentation ... Message-ID: Hi, I downloaded the "Numpy reference guide" in HTML format from the following link: http://docs.scipy.org/doc/ My intension is to use this documentation in "offline mode". But, in offline mode, I am unable to search the document using "quick search" option. (However, I can search the same document on the documentation website). I would like to know, if there is any other way to search the entire HTML reference guide in offline mode. Thank you, zinka -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Mon Nov 15 06:27:37 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 15 Nov 2010 13:27:37 +0200 Subject: [Numpy-discussion] Regrading Numpy Documentation ... In-Reply-To: References: Message-ID: On 15 November 2010 12:02, srinivas zinka wrote: > I downloaded the "Numpy reference guide" in HTML format from the following > link: > http://docs.scipy.org/doc/ > My intension is to use this documentation in "offline mode". > But, in offline mode, ?I am unable to search the document using "quick > search" option. > (However, I can search the same document on the documentation website). > I would like to know, if there is any other way to search the entire > HTML reference guide in offline mode. That's strange. I've just downloaded the zip file of HTML pages and the quick search works fine in offline mode. Can you specify which zip file you downloaded e.g. http://docs.scipy.org/doc/numpy/numpy-html.zip, http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip etc. Exactly what goes wrong when you try to use the "quick search"? 
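Returning to the MKL linking thread a few messages up: the sys.setdlopenflags workaround that Dag Sverre links to boils down to something like the sketch below. This is only a rough outline of the idea for Linux (RTLD_GLOBAL is taken from ctypes here; Pythons without ctypes would need the dl or DLFCN module instead), and it has not been checked against every MKL version:

import sys
import ctypes

# Import numpy with RTLD_GLOBAL so the MKL shared libraries can resolve
# each other's symbols, then restore the original dlopen flags.
_old_flags = sys.getdlopenflags()
sys.setdlopenflags(_old_flags | ctypes.RTLD_GLOBAL)
import numpy
sys.setdlopenflags(_old_flags)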
Cheers, Scott From xscript at gmx.net Mon Nov 15 07:07:58 2010 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Mon, 15 Nov 2010 13:07:58 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: (Pierre GM's message of "Sun, 14 Nov 2010 21:34:31 +0100") References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> <87y68zr72g.fsf@ginnungagap.bsc.es> <889FD1C0-0C88-4F7A-A26F-4423D989DB4A@gmail.com> <871v6n4phh.fsf@fulla.xlab.taz> Message-ID: <87tyjin61t.fsf@ginnungagap.bsc.es> Pierre GM writes: > On Nov 14, 2010, at 9:30 PM, Llu?s wrote: >> This will work as long as 'first_values' is assured to always contain >> valid data and as long as its indexes are equivalent to those in >> converters (which I simply haven't checked). > I beat you to it, actually ;) Argh! :) > Check the git push I committed earlier today. I followed exactly the > same approach as yours (well, almost: I do check whether the first > line is empty (there were names) and then fall back on the default > testing value ('1')... Just out of curiosity... why do you check for the length of 'first_line'? Now I've looked into the code, and it seems that 'first_values' will always contain valid contents: https://github.com/numpy/numpy/blob/de4de92be21e4dda3665648ad5102b3729d4e0b0/numpy/lib/npyio.py#L1209 Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From zinka4u at gmail.com Mon Nov 15 07:15:37 2010 From: zinka4u at gmail.com (srinivas zinka) Date: Mon, 15 Nov 2010 21:15:37 +0900 Subject: [Numpy-discussion] Regrading Numpy Documentation ... In-Reply-To: References: Message-ID: Thank you for the reply. I just downloaded the following zip file: http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip When I try to search for some thing (e.g., "array"), it keeps on searching (see the attached file). At the same time, I am able to search the HTML files downloaded from the python website: http://docs.python.org/ This is only happening on my Ubuntu system. But, on Windows, I have no problem with searching. I am not sure what the problem is. But, I think it has some thing to do with the operating system or JAVA!. by the way, these are my system specifications: OS: Ubuntu 10.10 Browser: Chromium On Mon, Nov 15, 2010 at 8:27 PM, Scott Sinclair wrote: > On 15 November 2010 12:02, srinivas zinka wrote: > > I downloaded the "Numpy reference guide" in HTML format from the > following > > link: > > http://docs.scipy.org/doc/ > > My intension is to use this documentation in "offline mode". > > But, in offline mode, I am unable to search the document using "quick > > search" option. > > (However, I can search the same document on the documentation website). > > I would like to know, if there is any other way to search the entire > > HTML reference guide in offline mode. > > That's strange. I've just downloaded the zip file of HTML pages and > the quick search works fine in offline mode. > > Can you specify which zip file you downloaded e.g. > http://docs.scipy.org/doc/numpy/numpy-html.zip, > http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip etc. > > Exactly what goes wrong when you try to use the "quick search"? 
> > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot-Search ? NumPy v1.5 Manual (DRAFT) - Chromium.png Type: image/png Size: 150475 bytes Desc: not available URL: From pgmdevlist at gmail.com Mon Nov 15 07:36:09 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 15 Nov 2010 13:36:09 +0100 Subject: [Numpy-discussion] numpy.genfromtxt converters issue In-Reply-To: <87tyjin61t.fsf@ginnungagap.bsc.es> References: <66F2753F-6F60-4E1B-9A10-960EBE1F614C@gmail.com> <3F81849E-3C3F-406B-8BB6-C20299F38124@gmail.com> <87y68zr72g.fsf@ginnungagap.bsc.es> <889FD1C0-0C88-4F7A-A26F-4423D989DB4A@gmail.com> <871v6n4phh.fsf@fulla.xlab.taz> <87tyjin61t.fsf@ginnungagap.bsc.es> Message-ID: <6DE569F0-C863-4194-BFAB-90CE324553F2@gmail.com> On Nov 15, 2010, at 1:07 PM, Llu?s wrote: > Pierre GM writes: > >> On Nov 14, 2010, at 9:30 PM, Llu?s wrote: > >>> This will work as long as 'first_values' is assured to always contain >>> valid data and as long as its indexes are equivalent to those in >>> converters (which I simply haven't checked). > >> I beat you to it, actually ;) > > Argh! :) In all good spirit :) > >> Check the git push I committed earlier today. I followed exactly the >> same approach as yours (well, almost: I do check whether the first >> line is empty (there were names) and then fall back on the default >> testing value ('1')... > > Just out of curiosity... why do you check for the length of > 'first_line'? Now I've looked into the code, and it seems that > 'first_values' will always contain valid contents: > https://github.com/numpy/numpy/blob/de4de92be21e4dda3665648ad5102b3729d4e0b0/numpy/lib/npyio.py#L1209 Valid content, not necessarily valid values: you can have column names, if you chose names=True. In that case, you wouldn't want to use these names as testing values. An easy way to catch whether the first values are names is to check the first_line: if it's '', then it was names... From robert.kern at gmail.com Mon Nov 15 09:02:46 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Nov 2010 08:02:46 -0600 Subject: [Numpy-discussion] Regrading Numpy Documentation ... In-Reply-To: References: Message-ID: On Mon, Nov 15, 2010 at 06:15, srinivas zinka wrote: > Thank you for the reply. > I just downloaded?the following zip file: > http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip > When I try to search for some thing (e.g., "array"),?it keeps on > searching?(see the attached file). > At?the?same time, I am able to search the HTML files downloaded from the > python website: > http://docs.python.org/ > This is only happening on my Ubuntu system. But, on?Windows,?I have no > problem with searching. > I am not sure what the problem is. But, I think it has some thing to do with > the operating system or?JAVA!. Check your JavaScript settings. The search functionality is implemented in JavaScript. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From pv+numpy at math.duke.edu Mon Nov 15 09:32:34 2010 From: pv+numpy at math.duke.edu (pv+numpy at math.duke.edu) Date: Mon, 15 Nov 2010 09:32:34 -0500 (EST) Subject: [Numpy-discussion] Printing formatted numerical values Message-ID: Hi, what is the best way to print (to a file or to stdout) formatted numerical values? Analogously to C's printf("%d %g",x,y) etc? Numpy Documentation only discusses input *from* a file, or output of entire arrays. (np.savetxt()) I just want tab or space-delimited output of selected formatted values. In the absence of numpy documentation on this matter, I tried to follow python documentation and find errors. Below is ipython -pylab transcript, which apparently complains that an int32 variable is an object of type 'str'. How should I understand this? Does python not understand that numpy.int32 is an integer? Thank you! In [2] import numpy as np In [3]: w = np.arange (1,5,dtype=np.int32).reshape((2,2)) In [4]: w Out[4]: array([[1, 2], [3, 4]], dtype=int32) In [5]: w[0,0] Out[5]: 1 In [6]: w[0,0].class Out[6]: In [7]: print('{0:2d}'.format(w[0,0])) ValueError? Traceback (most recent call last) /home/p/o/dev/thesis/python/wavelet/ in () ValueError?: Unknown format code 'd' for object of type 'str' -- From kwgoodman at gmail.com Mon Nov 15 09:57:01 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 15 Nov 2010 06:57:01 -0800 Subject: [Numpy-discussion] getitem and slice Message-ID: There is more than one way to form the same slice. For example, a[:2] and a[0:2] and a[0:2:] pass slice objects to getitem with the same start, stop and step. Is there any way to get a hold of the exact character sequence the user used to form the slice? That is, I'd like to know if the user entered ":2", "0:2" or "0:2:". Context: Say I have a 2d array-like object with axis 0 named 'space' and axis 1 named 'time', I'd like to be able to make the following distinctions: a['time':2] --> a[:,2] a['time':2:] --> a[:,2:] a['space':2] --> a[2] a['space':2:] --> a[2:] From gerrit.holl at gmail.com Mon Nov 15 10:12:28 2010 From: gerrit.holl at gmail.com (Gerrit Holl) Date: Mon, 15 Nov 2010 16:12:28 +0100 Subject: [Numpy-discussion] Printing formatted numerical values In-Reply-To: References: Message-ID: On 15 November 2010 15:32, wrote: > Hi, what is the best way to print (to a file or to stdout) formatted > numerical values? Analogously to C's printf("%d %g",x,y) etc? Use the .tofile() method: numpy.random.random(5).tofile(sys.stdout, ' ', '%s') 0.230466435867 0.609443784908 0.353855676828 0.552641723317 0.186418931597 (works only on "real files", afaik, not on StringIO or similar) cheers, Gerrit. -- Exploring space at http://gerrit-explores.blogspot.com/ Personal homepage at http://www.topjaklont.org/ Asperger Syndroom: http://www.topjaklont.org/nl/asperger.html From robert.kern at gmail.com Mon Nov 15 10:20:43 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Nov 2010 09:20:43 -0600 Subject: [Numpy-discussion] Printing formatted numerical values In-Reply-To: References: Message-ID: On Mon, Nov 15, 2010 at 08:32, wrote: > Hi, what is the best way to print (to a file or to stdout) formatted > numerical values? Analogously to C's printf("%d %g",x,y) etc? > > Numpy Documentation only discusses input *from* a file, or output of > entire arrays. (np.savetxt()) I just want tab or space-delimited output of > selected formatted values. > > In the absence of numpy documentation on this matter, I tried to follow > python documentation and find errors. 
Below is ipython > -pylab transcript, which apparently complains that an int32 variable is > an object of type 'str'. How should I understand this? Does python not > understand that numpy.int32 is an integer? Correct. On a 64-bit system, numpy.int32 does not subtype from int. The format codes do strict type-checking. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From robert.kern at gmail.com Mon Nov 15 10:23:04 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Nov 2010 09:23:04 -0600 Subject: [Numpy-discussion] getitem and slice In-Reply-To: References: Message-ID: On Mon, Nov 15, 2010 at 08:57, Keith Goodman wrote: > There is more than one way to form the same slice. For example, a[:2] > and a[0:2] and a[0:2:] pass slice objects to getitem with the same > start, stop and step. Not quite correct. a[:2] passes slice(None, 2, None) whereas the next two pass slice(0, 2, None). > Is there any way to get a hold of the exact > character sequence the user used to form the slice? That is, I'd like > to know if the user entered ":2", "0:2" or "0:2:". No, there is no way to get this information. > Context: > > Say I have a 2d array-like object with axis 0 named 'space' and axis 1 > named 'time', I'd like to be able to make the following distinctions: > > a['time':2] --> a[:,2] > a['time':2:] --> a[:,2:] > > a['space':2] --> a[2] > a['space':2:] --> a[2:] Sorry, [i:j] is semantically equivalent to [i:j:]. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From dave.hirschfeld at gmail.com Mon Nov 15 10:44:11 2010 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Mon, 15 Nov 2010 15:44:11 +0000 (UTC) Subject: [Numpy-discussion] Printing formatted numerical values References: Message-ID: math.duke.edu> writes: > > Hi, what is the best way to print (to a file or to stdout) formatted > numerical values? Analogously to C's printf("%d %g",x,y) etc? > For stdout you can simply do: In [26]: w, x, y, z = np.randint(0,100,4) In [27]: type(w) Out[27]: In [28]: print("%f %g %e %d" % (w,x,y,z,)) 68.000000 64 1.300000e+01 57 In [29]: w, x, y, z Out[29]: (68, 64, 13, 57) For a file I would use np.savetxt HTH, Dave From jkhilmer at gmail.com Mon Nov 15 11:18:56 2010 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Mon, 15 Nov 2010 09:18:56 -0700 Subject: [Numpy-discussion] Printing formatted numerical values In-Reply-To: References: Message-ID: Is there a convention for dealing with NaN and Inf? I've found that trusting the default behavior is a very bad idea: ----------------------------------- from numpy import * x = zeros((5,7)) x[:,3:] = nan x[:,-1] = inf savetxt('problem_array.txt',x,delimiter='\t') x2 = loadtxt('problem_array.txt') print(x.shape) # (5, 7) print(x2.shape) # (5, 4) print(x2[0,:]) print(open('problem_array.txt').readline().strip().split('\t')) ----------------------------------- On my system the loaded array is reduced in size relative to the saved array. Does savetxt need some conditionals to deal with special values like these? Jonathan On Mon, Nov 15, 2010 at 8:44 AM, Dave Hirschfeld wrote: > ? math.duke.edu> writes: > >> >> Hi, what is the best way to print (to a file or to stdout) formatted >> numerical values? 
Analogously to C's printf("%d %g",x,y) etc? >> > > For stdout you can simply do: > > In [26]: w, x, y, z = np.randint(0,100,4) > > In [27]: type(w) > Out[27]: > > In [28]: print("%f %g %e %d" % (w,x,y,z,)) > 68.000000 64 1.300000e+01 57 > > In [29]: w, x, y, z > Out[29]: (68, 64, 13, 57) > > For a file I would use np.savetxt > > HTH, > Dave > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From zinka4u at gmail.com Mon Nov 15 12:04:21 2010 From: zinka4u at gmail.com (srinivas zinka) Date: Tue, 16 Nov 2010 02:04:21 +0900 Subject: [Numpy-discussion] Regrading Numpy Documentation ... In-Reply-To: References: Message-ID: But, I have no problem searching "python HTML documentation" which is also created by Sphinx. I don't know much about JavaScript. But, don't they both use the same JavaScript? On Mon, Nov 15, 2010 at 11:02 PM, Robert Kern wrote: > On Mon, Nov 15, 2010 at 06:15, srinivas zinka wrote: > > Thank you for the reply. > > I just downloaded the following zip file: > > http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip > > When I try to search for some thing (e.g., "array"), it keeps on > > searching (see the attached file). > > At the same time, I am able to search the HTML files downloaded from the > > python website: > > http://docs.python.org/ > > This is only happening on my Ubuntu system. But, on Windows, I have no > > problem with searching. > > I am not sure what the problem is. But, I think it has some thing to do > with > > the operating system or JAVA!. > > Check your JavaScript settings. The search functionality is > implemented in JavaScript. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From schlesin at cshl.edu Mon Nov 15 12:48:38 2010 From: schlesin at cshl.edu (Felix) Date: Mon, 15 Nov 2010 17:48:38 +0000 (UTC) Subject: [Numpy-discussion] Workaround for Ticket #1504 (MKL linking) References: <4CE0DA95.60907@student.matnat.uio.no> Message-ID: On Nov 15, 2:00?am, Dag Sverre Seljebotn wrote: > On 11/15/2010 06:23 AM, Felix wrote: > > > is there any workaround or fix for the problem described in Ticket > > 1504? > >http://projects.scipy.org/numpy/ticket/1504 > > You can try to see if sys.setdlopenflags works for you, it does for me: > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg23151.html That worked, thank you. Is this patch going to make it into numpy? Being able to use a recent version of the MKL seems important. Or is the long-term preferred solution still static linking? Felix From bsouthey at gmail.com Mon Nov 15 13:01:34 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 15 Nov 2010 12:01:34 -0600 Subject: [Numpy-discussion] Workaround for Ticket #1504 (MKL linking) In-Reply-To: References: <4CE0DA95.60907@student.matnat.uio.no> Message-ID: <4CE1757E.60001@gmail.com> On 11/15/2010 11:48 AM, Felix wrote: > On Nov 15, 2:00 am, Dag Sverre Seljebotn wrote: >> On 11/15/2010 06:23 AM, Felix wrote: >> >>> is there any workaround or fix for the problem described in Ticket >>> 1504? 
>>> http://projects.scipy.org/numpy/ticket/1504 >> You can try to see if sys.setdlopenflags works for you, it does for me: >> >> http://www.mail-archive.com/numpy-discussion at scipy.org/msg23151.html > That worked, thank you. Is this patch going to make it into numpy? > Being able to use a recent version of the MKL seems important. > Or is the long-term preferred solution still static linking? > > Felix > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Just a note that patch this has to work for Python 2.4 as Python 2.4 lacks ctypes. This means that 'import ctypes' may or may not fail depending on whether or not ctypes package (http://python.net/crew/theller/ctypes/) has been installed. Obviously the patch also has to work with different MKL versions but I don't use it. Bruce From dagss at student.matnat.uio.no Mon Nov 15 13:10:39 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 15 Nov 2010 19:10:39 +0100 Subject: [Numpy-discussion] Workaround for Ticket #1504 (MKL linking) In-Reply-To: <4CE1757E.60001@gmail.com> References: <4CE0DA95.60907@student.matnat.uio.no> <4CE1757E.60001@gmail.com> Message-ID: <4CE1779F.8030003@student.matnat.uio.no> On 11/15/2010 07:01 PM, Bruce Southey wrote: > On 11/15/2010 11:48 AM, Felix wrote: > >> On Nov 15, 2:00 am, Dag Sverre Seljebotn wrote: >> >>> On 11/15/2010 06:23 AM, Felix wrote: >>> >>> >>>> is there any workaround or fix for the problem described in Ticket >>>> 1504? >>>> http://projects.scipy.org/numpy/ticket/1504 >>>> >>> You can try to see if sys.setdlopenflags works for you, it does for me: >>> >>> http://www.mail-archive.com/numpy-discussion at scipy.org/msg23151.html >>> >> That worked, thank you. Is this patch going to make it into numpy? >> Being able to use a recent version of the MKL seems important. >> Or is the long-term preferred solution still static linking? >> >> Felix >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > Just a note that patch this has to work for Python 2.4 as Python 2.4 > lacks ctypes. This means that 'import ctypes' may or may not fail > depending on whether or not ctypes package > (http://python.net/crew/theller/ctypes/) has been installed. > With a little more work one could probably fall back to import from the "dl" module in Python 2.4 (or, get the flag from the OS by writing a three-line Cython module). It obviously also need to fail gracefully on Windows if it doesn't already and so on. David C. was -1 on inclusion in that thread, and noone spoke up in support, and so I did not bother to spend time improving the patch. It obviously need a little bit of polish before being submitted. > Obviously the patch also has to work with different MKL versions but I > don't use it. > The patch is enabling a very generic mechanism and certainly won't stop older MKL versions from working. Dag Sverre From pav at iki.fi Mon Nov 15 14:35:54 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 15 Nov 2010 19:35:54 +0000 (UTC) Subject: [Numpy-discussion] Implementing __format__ References: Message-ID: On Mon, 15 Nov 2010 09:20:43 -0600, Robert Kern wrote: [clip] > Correct. On a 64-bit system, numpy.int32 does not subtype from int. The > format codes do strict type-checking. 
One can argue that this is a bug in Python or Numpy: "%d" % numpy.int16(1) "{0:d}".format(numpy.int16(1)) the first one works fine whereas the second does not, and it is not clear what is gained by disallowing the latter. [To answer the OP -- the workaround is to use the "%d" formatting codes, not the format() method, which is a recent addition in Python.] To make it work via changes in Numpy: scalars should implement a __format__ method. Two choices: either we parse the formatting string ourselves, or forward formatting to Python. The PITA in implementing this is in parsing the format string. Doing formatting ourselves would allow e.g. formatting of long doubles properly, which cannot be done via the %-syntax. http://projects.scipy.org/numpy/ticket/1675 -- Pauli Virtanen From Chris.Barker at noaa.gov Mon Nov 15 14:44:12 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 15 Nov 2010 11:44:12 -0800 Subject: [Numpy-discussion] Implementing __format__ In-Reply-To: References: Message-ID: <4CE18D8C.4030309@noaa.gov> On 11/15/10 11:35 AM, Pauli Virtanen wrote: > One can argue that this is a bug in Python or Numpy: > > "%d" % numpy.int16(1) > > "{0:d}".format(numpy.int16(1)) > To make it work via changes in Numpy: scalars should implement a > __format__ method. Two choices: either we parse the formatting string > ourselves, or forward formatting to Python. The PITA in implementing this > is in parsing the format string. > > Doing formatting ourselves would allow e.g. formatting of long doubles > properly, which cannot be done via the %-syntax. Also good handling of NaN, which there is other custom code in numpy to handle better than python (in string parsing, anyway). -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Mon Nov 15 15:06:17 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Nov 2010 13:06:17 -0700 Subject: [Numpy-discussion] Problems testing the floating point flags In-Reply-To: References: Message-ID: On Sun, Nov 14, 2010 at 1:29 PM, Mark Wiebe wrote: > On Sun, Nov 14, 2010 at 12:21 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Good point. I was trying to keep the fpeclear in front of the code to be >> tested. >> >> > Yeah, I hadn't considered that possibility too seriously. Hopefully as > long as the compiler doesn't see a reason to reorder, it's fine. Compiler > optimizers are difficult to trick, they work quite well these days. I > believe the reason it reorders in the case of this bug is that it notices > that by shifting the divide down to the bottom, the control flow path > returning an error never needs to do the divide, and will thus run faster. > > It looks like no one here has better ideas. Given that there is no way to guarantee that the compiler behaves absent the pragma I think implementing the simplest solution that works in practice is the way to go. It needs a documentation blurb in the code to flag it, we should fix the other locations where the error is checked, and we should have tests for all of them so we can verify that the compiler doesn't develop new "features" in the future. There aren't too many places to deal with, so I think that is doable. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
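To make the scalar-formatting thread above concrete, a short sketch of the practical workaround until the scalars grow a __format__ method: stick with %-formatting, or cast to a builtin type before using str.format (calling {0:2d} on the raw scalar is what fails on the reporter's platform):

import numpy as np

v = np.int32(42)

print('%d %g' % (v, np.float64(2.5)))   # %-formatting handles numpy scalars
print('{0:2d}'.format(int(v)))          # cast first when using str.format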
URL: From ben.root at ou.edu Mon Nov 15 17:43:02 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 15 Nov 2010 16:43:02 -0600 Subject: [Numpy-discussion] dtypes error in recfunctions Message-ID: Hello, I was using append_fields() in numpy.lib.recfunctions when I discovered a slight logic mistake in handling the dtypes argument. The code first checks to see if dtypes is None. If so, it then guesses the dtype info from the input data. Then, it goes to see if the dtypes is not a sequence and puts it into a list. This is where the logic breaks down. It then proceeds to use that dtypes to start merging the data. However, if you pass in a sequence of dtype, that condition gets skipped and the rest of the function has to figure out the dtype (and ignores the supplied names as well). I have attached a patch to fix this. I did not check the rest of the module to see if this mis-logic crops up anywhere else. Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: recfuncs.patch Type: application/octet-stream Size: 688 bytes Desc: not available URL: From wardefar at iro.umontreal.ca Mon Nov 15 17:52:56 2010 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 15 Nov 2010 17:52:56 -0500 Subject: [Numpy-discussion] --nocapture with numpy.testing.Tester Message-ID: <20101115225256.GA24809@ravage> Hi, I'm trying to use numpy.testing.Tester to run tests for another, numpy-based project. It works beautifully, except for the fact that I can't seem to silence output (i.e. NOSE_NOCAPTURE/--nocapture/-s). I've tried to call test with extra_argv=['-s'] and also tried subclassing to muck with prepare_test_args but nothing seems to work, I still get stdout captured. Does anyone know what I'm doing wrong? David From pgmdevlist at gmail.com Mon Nov 15 17:57:02 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 15 Nov 2010 23:57:02 +0100 Subject: [Numpy-discussion] dtypes error in recfunctions In-Reply-To: References: Message-ID: <1EE8C5D5-9CD7-4AB4-9284-8BBE18C5557A@gmail.com> On Nov 15, 2010, at 11:43 PM, Benjamin Root wrote: > Hello, > > I was using append_fields() in numpy.lib.recfunctions when I discovered a slight logic mistake in handling the dtypes argument. > > The code first checks to see if dtypes is None. If so, it then guesses the dtype info from the input data. Then, it goes to see if the dtypes is not a sequence and puts it into a list. This is where the logic breaks down. It then proceeds to use that dtypes to start merging the data. However, if you pass in a sequence of dtype, that condition gets skipped and the rest of the function has to figure out the dtype (and ignores the supplied names as well). Good call. I never advertised these functions just because I thought they hadn't been thoroughly tested. Thanks a million! > I have attached a patch to fix this. I did not check the rest of the module to see if this mis-logic crops up anywhere else. Please file a ticket, otherwise I'm more than likely to forget about it... And please attach your patch! Thanks again P. 
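For anyone trying to reproduce the report, a minimal sketch of the call pattern in question (the field names and values here are made up); with the fix, an explicitly supplied dtypes sequence should be honoured instead of being re-guessed from the data:

import numpy as np
from numpy.lib import recfunctions as rfn

base = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', int), ('b', float)])

# Append a new field, passing the dtype explicitly rather than letting
# append_fields guess it from the data.
out = rfn.append_fields(base, 'c', data=[10, 20],
                        dtypes=[np.int32], usemask=False)
print(out.dtype.names)   # ('a', 'b', 'c')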
From ben.root at ou.edu Mon Nov 15 19:21:10 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 15 Nov 2010 18:21:10 -0600 Subject: [Numpy-discussion] dtypes error in recfunctions In-Reply-To: <1EE8C5D5-9CD7-4AB4-9284-8BBE18C5557A@gmail.com> References: <1EE8C5D5-9CD7-4AB4-9284-8BBE18C5557A@gmail.com> Message-ID: On Mon, Nov 15, 2010 at 4:57 PM, Pierre GM wrote: > > On Nov 15, 2010, at 11:43 PM, Benjamin Root wrote: > > > Hello, > > > > I was using append_fields() in numpy.lib.recfunctions when I discovered a > slight logic mistake in handling the dtypes argument. > > > > The code first checks to see if dtypes is None. If so, it then guesses > the dtype info from the input data. Then, it goes to see if the dtypes is > not a sequence and puts it into a list. This is where the logic breaks > down. It then proceeds to use that dtypes to start merging the data. > However, if you pass in a sequence of dtype, that condition gets skipped > and the rest of the function has to figure out the dtype (and ignores the > supplied names as well). > > Good call. I never advertised these functions just because I thought they > hadn't been thoroughly tested. Thanks a million! > > > > I have attached a patch to fix this. I did not check the rest of the > module to see if this mis-logic crops up anywhere else. > > Please file a ticket, otherwise I'm more than likely to forget about it... > And please attach your patch! > Thanks again > P. > No problem. It is ticket 1676. http://projects.scipy.org/numpy/ticket/1676 Glad to be of help! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From schlesin at cshl.edu Mon Nov 15 19:38:54 2010 From: schlesin at cshl.edu (Felix) Date: Tue, 16 Nov 2010 00:38:54 +0000 (UTC) Subject: [Numpy-discussion] Workaround for Ticket #1504 (MKL linking) References: <4CE0DA95.60907@student.matnat.uio.no> <4CE1757E.60001@gmail.com> Message-ID: Bruce Southey gmail.com> writes: > On 11/15/2010 11:48 AM, Felix wrote: > > On Nov 15, 2:00 am, Dag Sverre Seljebotn wrote: > >> On 11/15/2010 06:23 AM, Felix wrote: > >> > >>> is there any workaround or fix for the problem described in Ticket > >>> 1504? > >>> http://projects.scipy.org/numpy/ticket/1504 > >> You can try to see if sys.setdlopenflags works for you, it does for me: > > That worked, thank you. Is this patch going to make it into numpy? > Just a note that patch this has to work for Python 2.4 as Python 2.4 > lacks ctypes. This means that 'import ctypes' may or may not fail > depending on whether or not ctypes package > (http://python.net/crew/theller/ctypes/) has been installed. > > Obviously the patch also has to work with different MKL versions but I > don't use it. Unfortunately I cannot test this patch with different MKL versions or under Windows. But the current code is clearly broken for an important case, namely the newest (or even second newest) MKL generation on Linux (and probably on windows as well from what I understand). Unless there is a way to work around the problem by static linking I would be +1 on including a similar patch. I do not know how to make it work on python 2.4 but I am happy to test any alternative on 2.6 of course. In the worst casepython 2.4 would fall back to the current behavior if ctypes is not installed. Felix From scott.sinclair.za at gmail.com Tue Nov 16 01:42:34 2010 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 16 Nov 2010 08:42:34 +0200 Subject: [Numpy-discussion] Regrading Numpy Documentation ... 
In-Reply-To: References: Message-ID: On 15 November 2010 14:15, srinivas zinka wrote: > Thank you for the reply. > I just downloaded?the following zip file: > http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip > When I try to search for some thing (e.g., "array"),?it keeps on > searching?(see the attached file). > At?the?same time, I am able to search the HTML files downloaded from the > python website: > http://docs.python.org/ > This is only happening on my Ubuntu system. But, on?Windows,?I have no > problem with searching. > I am not sure what the problem is. But, I think it has some thing to do with > the operating system or?JAVA!. > by the way, these are my system specifications: > OS:?Ubuntu?10.10 > Browser: Chromium I do see the same problem as you using Chromium on Ubuntu, but have no trouble using Firefox. Even more strange, it only happens with the Numpy and Scipy documentation, not Sphinx documentation I've built for my own projects. I guess the workaround is to try using Firefox on Ubuntu.. Cheers, Scott From zinka4u at gmail.com Tue Nov 16 05:19:17 2010 From: zinka4u at gmail.com (srinivas zinka) Date: Tue, 16 Nov 2010 19:19:17 +0900 Subject: [Numpy-discussion] Regrading Numpy Documentation ... In-Reply-To: References: Message-ID: @Scott Thanks for the tip. On Tue, Nov 16, 2010 at 3:42 PM, Scott Sinclair wrote: > On 15 November 2010 14:15, srinivas zinka wrote: > > Thank you for the reply. > > I just downloaded the following zip file: > > http://docs.scipy.org/doc/numpy-1.5.x/numpy-html.zip > > When I try to search for some thing (e.g., "array"), it keeps on > > searching (see the attached file). > > At the same time, I am able to search the HTML files downloaded from the > > python website: > > http://docs.python.org/ > > This is only happening on my Ubuntu system. But, on Windows, I have no > > problem with searching. > > I am not sure what the problem is. But, I think it has some thing to do > with > > the operating system or JAVA!. > > by the way, these are my system specifications: > > OS: Ubuntu 10.10 > > Browser: Chromium > > I do see the same problem as you using Chromium on Ubuntu, but have no > trouble using Firefox. > > Even more strange, it only happens with the Numpy and Scipy > documentation, not Sphinx documentation I've built for my own > projects. > > I guess the workaround is to try using Firefox on Ubuntu.. > > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sotanez at gmail.com Tue Nov 16 08:13:25 2010 From: sotanez at gmail.com (SoTaNeZ) Date: Tue, 16 Nov 2010 14:13:25 +0100 Subject: [Numpy-discussion] "numpy.linalg.linalg.LinAlgError: Singular matrix" using "numpy.linalg.solve" Message-ID: Hi all. 
I got this exception while executin numpy.linalg.solve(a,b) being: a = array([[ 1.00000000e+000, -4.19430400e+006, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, -2.47845883e-119, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, 0.00000000e+000, -2.68435456e+008, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, -3.38813179e-021, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, -4.09600000e+003, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, -2.56000000e+002, 0.00000000e+000, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, -7.20575940e+016, 0.00000000e+000], [ 1.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000, -1.56250000e-002], [ 1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.00000000e+000]]) b = array([0, 0, 0, 0, 0, 0, 0, 0, 1]) I guess some numbers in a are too big or too small, aren't they? Thanks, David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsdale24 at gmail.com Tue Nov 16 09:20:29 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 16 Nov 2010 09:20:29 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion Message-ID: I am wrapping up a small package to parse a particular ascii-encoded file format generated by a program we use heavily here at the lab. (In the unlikely event that you work at a synchrotron, and use Certified Scientific's "spec" program, and are actually interested, the code is currently available at https://github.com/darrendale/praxes/tree/specformat/praxes/io/spec/ .) I have been benchmarking the project against another python package developed by a colleague, which is an extension module written in pure C. My python/cython project takes about twice as long to parse and index a file (~0.8 seconds for 100MB), which is acceptable. However, actually converting ascii strings to numpy arrays, which is done using numpy.fromstring, takes a factor of 10 longer than the extension module. So I am wondering about the performance of np.fromstring: import time import numpy as np s = b'1 ' * 2048 *1200 d = time.time() x = np.fromstring(s) print time.time() - d From william.ratcliff at gmail.com Tue Nov 16 09:26:10 2010 From: william.ratcliff at gmail.com (william ratcliff) Date: Tue, 16 Nov 2010 09:26:10 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: Message-ID: Actually, I do use spec when I have synchotron experiments. But why are your files so large? On Nov 16, 2010 9:20 AM, "Darren Dale" wrote: > I am wrapping up a small package to parse a particular ascii-encoded > file format generated by a program we use heavily here at the lab. 
(In > the unlikely event that you work at a synchrotron, and use Certified > Scientific's "spec" program, and are actually interested, the code is > currently available at > https://github.com/darrendale/praxes/tree/specformat/praxes/io/spec/ > .) > > I have been benchmarking the project against another python package > developed by a colleague, which is an extension module written in pure > C. My python/cython project takes about twice as long to parse and > index a file (~0.8 seconds for 100MB), which is acceptable. However, > actually converting ascii strings to numpy arrays, which is done using > numpy.fromstring, takes a factor of 10 longer than the extension > module. So I am wondering about the performance of np.fromstring: > > import time > import numpy as np > s = b'1 ' * 2048 *1200 > d = time.time() > x = np.fromstring(s) > print time.time() - d > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsdale24 at gmail.com Tue Nov 16 09:29:54 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 16 Nov 2010 09:29:54 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: Message-ID: Sorry, I accidentally hit send long before I was finished writing. But to answer your question, they contain many 2048-element multi-channel analyzer spectra. Darren On Tue, Nov 16, 2010 at 9:26 AM, william ratcliff wrote: > Actually, > I do use spec when I have synchotron experiments.? But why are your files so > large? > > On Nov 16, 2010 9:20 AM, "Darren Dale" wrote: >> I am wrapping up a small package to parse a particular ascii-encoded >> file format generated by a program we use heavily here at the lab. (In >> the unlikely event that you work at a synchrotron, and use Certified >> Scientific's "spec" program, and are actually interested, the code is >> currently available at >> https://github.com/darrendale/praxes/tree/specformat/praxes/io/spec/ >> .) >> >> I have been benchmarking the project against another python package >> developed by a colleague, which is an extension module written in pure >> C. My python/cython project takes about twice as long to parse and >> index a file (~0.8 seconds for 100MB), which is acceptable. However, >> actually converting ascii strings to numpy arrays, which is done using >> numpy.fromstring, takes a factor of 10 longer than the extension >> module. So I am wondering about the performance of np.fromstring: >> >> import time >> import numpy as np >> s = b'1 ' * 2048 *1200 >> d = time.time() >> x = np.fromstring(s) >> print time.time() - d >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pav at iki.fi Tue Nov 16 09:31:11 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 16 Nov 2010 14:31:11 +0000 (UTC) Subject: [Numpy-discussion] seeking advice on a fast string->array conversion References: Message-ID: Tue, 16 Nov 2010 09:20:29 -0500, Darren Dale wrote: [clip] > module. 
So I am wondering about the performance of np.fromstring: Fromstring is slow, probably because it must work around locale- dependence of the underlying C parsing functions. Moreover, the Numpy parsing mechanism generates many indirect calls. -- Pauli Virtanen From dsdale24 at gmail.com Tue Nov 16 09:41:04 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 16 Nov 2010 09:41:04 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: Message-ID: Apologies, I accidentally hit send... On Tue, Nov 16, 2010 at 9:20 AM, Darren Dale wrote: > I am wrapping up a small package to parse a particular ascii-encoded > file format generated by a program we use heavily here at the lab. (In > the unlikely event that you work at a synchrotron, and use Certified > Scientific's "spec" program, and are actually interested, the code is > currently available at > https://github.com/darrendale/praxes/tree/specformat/praxes/io/spec/ > .) > > I have been benchmarking the project against another python package > developed by a colleague, which is an extension module written in pure > C. My python/cython project takes about twice as long to parse and > index a file (~0.8 seconds for 100MB), which is acceptable. However, > actually converting ascii strings to numpy arrays, which is done using > numpy.fromstring, ?takes a factor of 10 longer than the extension > module. So I am wondering about the performance of np.fromstring: import time import numpy as np s = b'1 ' * 2048 *1200 d = time.time() x = np.fromstring(s, dtype='d', sep=b' ') print time.time() - d That takes about 1.3 seconds on my machine. A similar metric for the extension module is to load 1200 of these 2048-element arrays from the file: d=time.time() x=[s.mca(i+1) for i in xrange(1200)] print time.time()-d That takes about 0.127 seconds on my machine. This discrepancy is unacceptable for my usecase, so I need to develop an alternative to fromstring. Here is bit of testing with cython: import time cdef extern from 'stdlib.h': double atof(char*) py_string = '100' cdef char* c_string = py_string cdef int i, j j=2048*1200 d = time.time() while iarray conversion References: Message-ID: Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote: [clip] > That loop takes 0.33 seconds to execute, which is a good start. I need > some help converting this example to return an actual numpy array. Could > anyone please offer a suggestion? Easiest way is probably to use ndarray buffers and resize them when needed. For example: https://github.com/pv/scipy-work/blob/enh/interpnd-smooth/scipy/spatial/qhull.pyx#L980 From josef.pktd at gmail.com Tue Nov 16 10:11:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Nov 2010 10:11:16 -0500 Subject: [Numpy-discussion] "numpy.linalg.linalg.LinAlgError: Singular matrix" using "numpy.linalg.solve" In-Reply-To: References: Message-ID: On Tue, Nov 16, 2010 at 8:13 AM, SoTaNeZ wrote: > Hi all. > > I got this exception while executin numpy.linalg.solve(a,b) being: > > a = array([[? 1.00000000e+000,? -4.19430400e+006,?? 0.00000000e+000, > ????????????????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ????????????????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000], > ?????????????? [? 1.00000000e+000,?? 0.00000000e+000,? -2.47845883e-119, > ????????? ??????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ????????? ??????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000], > ??????????? ?? [? 1.00000000e+000,?? 
0.00000000e+000,?? 0.00000000e+000, > ??????? ???????? -2.68435456e+008,?? 0.00000000e+000,?? 0.00000000e+000, > ??????? ????????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000], > ??????? ?????? [? 1.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ??????? ????????? 0.00000000e+000,? -3.38813179e-021,?? 0.00000000e+000, > ??????? ????????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000], > ??????? ?????? [? 1.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ????????? ??????? 0.00000000e+000,?? 0.00000000e+000,? -4.09600000e+003, > ????????? ??????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000], > ????????? ???? [? 1.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ?????????? ?????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ???????? ??????? -2.56000000e+002,?? 0.00000000e+000,?? 0.00000000e+000], > ???????? ????? [? 1.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ????????? ??????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ?????????? ?????? 0.00000000e+000,? -7.20575940e+016,?? 0.00000000e+000], > ??????? ?????? [? 1.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ???????? ???????? 0.00000000e+000,?? 0.00000000e+000,?? 0.00000000e+000, > ???????? ???????? 0.00000000e+000,?? 0.00000000e+000,? -1.56250000e-002], > ??????? ?????? [? 1.00000000e+000,?? 1.00000000e+000,?? 1.00000000e+000, > ???????? ???????? 1.00000000e+000,?? 1.00000000e+000,?? 1.00000000e+000, > ???????? ???????? 1.00000000e+000,?? 1.00000000e+000,?? 1.00000000e+000]]) > > b = array([0, 0, 0, 0, 0, 0, 0, 0, 1]) > > I guess some numbers in a are too big or too small, aren't they? some eigenvalues are very close to zero >>> np.linalg.eigh(a)[0] array([ -2.87298335e+00, -1.60293723e-16, -3.43582743e-17, -1.24477525e-32, -1.84567471e-33, 3.95359079e-33, 8.81541828e-18, 4.09615908e-16, 4.87298335e+00]) if you need to get a robust solution, you could use lstsq, but whether the solution makes sense will depend on your application (I guess it should be equivalent to using the pinv) >>> np.linalg.lstsq(a,b) (array([ -5.91579259e-08, 2.21469552e-14, 2.31028827e-10, -2.06502900e-16, 2.31028892e-10, 5.95746983e-08, 1.51994027e-05, -8.20980166e-25, 2.31042993e-10]), array([], dtype=float64), 5, array([ 7.20575940e+16, 2.68435456e+08, 4.19430411e+06, 4.09600024e+03, 2.56003891e+02, 2.30353618e+00, 1.30462749e+00, 1.03982867e-02, 1.95397635e-10])) Josef > > Thanks, > David. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From dsdale24 at gmail.com Tue Nov 16 10:31:31 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 16 Nov 2010 10:31:31 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: Message-ID: On Tue, Nov 16, 2010 at 9:55 AM, Pauli Virtanen wrote: > Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote: > [clip] >> That loop takes 0.33 seconds to execute, which is a good start. I need >> some help converting this example to return an actual numpy array. Could >> anyone please offer a suggestion? > > Easiest way is probably to use ndarray buffers and resize them when > needed. For example: > > https://github.com/pv/scipy-work/blob/enh/interpnd-smooth/scipy/spatial/qhull.pyx#L980 Thank you Pauli. 
That makes it *incredibly* simple:

import time
cimport numpy as np
import numpy as np

cdef extern from 'stdlib.h':
    double atof(char*)

def test():
    py_string = '100'
    cdef char* c_string = py_string
    cdef int i, j
    cdef double val
    i = 0
    j = 2048*1200
    cdef np.ndarray[np.float64_t, ndim=1] ret

    ret_arr = np.empty((2048*1200,), dtype=np.float64)
    ret = ret_arr

    d = time.time()
    while i < j:
        c_string = py_string
        ret[i] = atof(c_string)
        i += 1
    ret_arr.shape = (1200, 2048)
    print ret_arr, ret_arr.shape, time.time()-d

The loop now takes only 0.11 seconds to execute. Thanks again. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Chris.Barker at noaa.gov Tue Nov 16 11:46 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 16 Nov 2010 08:46 -0800 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: Message-ID: <4CE2B55B.6040905@noaa.gov> On 11/16/10 7:31 AM, Darren Dale wrote: > On Tue, Nov 16, 2010 at 9:55 AM, Pauli Virtanen wrote: >> Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote: >> [clip] >>> That loop takes 0.33 seconds to execute, which is a good start. I need >>> some help converting this example to return an actual numpy array. Could >>> anyone please offer a suggestion? Darren, It's interesting that you found fromstring() so slow -- I've put some time into trying to get fromfile() and fromstring() to be a bit more robust and featurefull, but found it to be some really painful code to work on -- but it didn't dawn on my that it would be slow too! I saw all the layers of function calls, but I still thought that would be minimal compared to the actual string parsing. I guess not. Shows that you never know where your bottlenecks are without profiling. "Slow" is relative, of course, but since the whole point of fromfile/string is performance (otherwise, we'd just parse with python), it would be nice to get them as fast as possible. I had been thinking that the way to make a good fromfile was Cython, so you've inspired me to think about it some more. Would you be interested in extending what you're doing to a more general purpose tool? Anyway, a comment or two: > cdef extern from 'stdlib.h': > double atof(char*) One thing I found with the current numpy code is that the use of the ato* functions is a source of a lot of bugs (all of them?) the core problem is error handling -- you have to do a lot of pointer checking to see if a call was successful, and with the fromfile code, that error handling is not done in all the layers of calls. Anyone know what the advantage of ato* is over scanf()/fscanf()? Also, why are you doing string parsing rather than parsing the files directly, wouldn't that be a bit faster? I've got some C extension code for simple parsing of text files into arrays of floats or doubles (using fscanf). I'd be curious how the performance compares to what you've got. Let me know if you're interested. -Chris > def test(): > py_string = '100' > cdef char* c_string = py_string > cdef int i, j > cdef double val > i = 0 > j = 2048*1200 > cdef np.ndarray[np.float64_t, ndim=1] ret > > ret_arr = np.empty((2048*1200,), dtype=np.float64) > ret = ret_arr > > d = time.time() > while i < j: > c_string = py_string > ret[i] = atof(c_string) > i += 1 > ret_arr.shape = (1200, 2048) > print ret_arr, ret_arr.shape, time.time()-d > > The loop now takes only 0.11 seconds to execute. Thanks again. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From dsdale24 at gmail.com Tue Nov 16 11:57:16 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 16 Nov 2010 11:57:16 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: <4CE2B55B.6040905@noaa.gov> References: <4CE2B55B.6040905@noaa.gov> Message-ID: On Tue, Nov 16, 2010 at 11:46 AM, Christopher Barker wrote: > On 11/16/10 7:31 AM, Darren Dale wrote: >> On Tue, Nov 16, 2010 at 9:55 AM, Pauli Virtanen ?wrote: >>> Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote: >>> [clip] >>>> That loop takes 0.33 seconds to execute, which is a good start. I need >>>> some help converting this example to return an actual numpy array. Could >>>> anyone please offer a suggestion? > > Darren, > > It's interesting that you found fromstring() so slow -- I've put some > time into trying to get fromfile() and fromstring() to be a bit more > robust and featurefull, but found it to be some really painful code to > work on -- but it didn't dawn on my that it would be slow too! I saw all > the layers of function calls, but I still thought that would be minimal > compared to the actual string parsing. I guess not. Shows that you never > know where your bottlenecks are without profiling. > > "Slow" is relative, of course, but since the whole point of > fromfile/string is performance (otherwise, we'd just parse with python), > it would be nice to get them as fast as possible. > > I had been thinking that the way to make a good fromfile was Cython, so > you've inspired me to think about it some more. Would you be interested > in extending what you're doing to a more general purpose tool? > > Anyway, ?a comment or two: >> cdef extern from 'stdlib.h': >> ? ? ?double atof(char*) > > One thing I found with the current numpy code is that the use of the > ato* functions is a source of a lot of bugs (all of them?) the core > problem is error handling -- you have to do a lot of pointer checking to > see if a call was successful, and with the fromfile code, that error > handling is not done in all the layers of calls. In my case, I am making an assumption about the integrity of the file. > Anyone know what the advantage of ato* is over scanf()/fscanf()? > > Also, why are you doing string parsing rather than parsing the files > directly, wouldn't that be a bit faster? Rank inexperience, I guess. I don't understand what you have in mind. scanf/fscanf don't actually convert strings to numbers, do they? > I've got some C extension code for simple parsing of text files into > arrays of floats or doubles (using fscanf). I'd be curious how the > performance compares to what you've got. Let me know if you're interested. I'm curious, yes. Darren From Chris.Barker at noaa.gov Tue Nov 16 13:01:11 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 16 Nov 2010 10:01:11 -0800 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: <4CE2B55B.6040905@noaa.gov> Message-ID: <4CE2C6E7.4020908@noaa.gov> On 11/16/10 8:57 AM, Darren Dale wrote: > In my case, I am making an assumption about the integrity of the file. That does make things easier, but less universal. I guess this is the whole trade-off about "reusable code". It sure it a lot easier to write code that does the one thing you need than something general purpose. 
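A minimal pure-Python guard for that integrity assumption -- it does not make the parser itself more robust, it only refuses to accept a silently short result -- might look like this sketch (the helper name and the premise that the expected element count is known from the file format are assumptions, not anything from numpy or the thread):

import numpy as np

def checked_fromstring(text, expected_count, dtype=np.float64, sep=' '):
    # Parse with np.fromstring, then fail loudly if the element count
    # does not match what the file format promises.
    arr = np.fromstring(text, dtype=dtype, sep=sep)
    if arr.size != expected_count:
        raise ValueError('expected %d values, parsed %d; the input is '
                         'probably malformed' % (expected_count, arr.size))
    return arr

For the spec-style blocks discussed above, expected_count would be 1200 * 2048 and the parsed array could then be reshaped to (1200, 2048).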
>> Anyone know what the advantage of ato* is over scanf()/fscanf()? >> >> Also, why are you doing string parsing rather than parsing the files >> directly, wouldn't that be a bit faster? > > Rank inexperience, I guess. I don't understand what you have in mind. if your goal is to read numbers from an ascii file, you can use fromfile() directly, rather than reading the file (or some of it) into a string, and then using fromstring(). Also, in C, you can use fscanf to read the file directly (of course, under the hood, it's putting stuff in stings somewhere along the line, but presumably in an optimized way. > scanf/fscanf don't actually convert strings to numbers, do they? yes, that's exactly what they do. http://en.wikipedia.org/wiki/Scanf The C lib may very well use ato* under the hood. My idea at this point is to write a function in Cython to takes a file and a numpy dtype, converts the dtype to a scanf format string, then calls fscanf (or scanf) to parse out the file. My existing scanner code more or less does that, but the format string is hard-code to be either for floats or doubles. >> I've got some C extension code for simple parsing of text files into >> arrays of floats or doubles (using fscanf). I'd be curious how the >> performance compares to what you've got. Let me know if you're interested. > > I'm curious, yes. OK -- I'll whip up a test similar to yours -- stay tuned! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Tue Nov 16 13:44:10 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 16 Nov 2010 10:44:10 -0800 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: <4CE2C6E7.4020908@noaa.gov> References: <4CE2B55B.6040905@noaa.gov> <4CE2C6E7.4020908@noaa.gov> Message-ID: <4CE2D0FA.9060608@noaa.gov> On 11/16/10 10:01 AM, Christopher Barker wrote: > OK -- I'll whip up a test similar to yours -- stay tuned! 
Here's what I've done: import numpy as np from maproomlib.utility import file_scanner def gen_file(): f = file('test.dat', 'w') for i in range(1200): f.write('1 ' * 2048) f.write('\n') f.close() def read_file1(): """ read unknown length: doubles""" f = file('test.dat') arr = file_scanner.FileScan(f) f.close() return arr def read_file2(): """ read known length: doubles""" f = file('test.dat') arr = file_scanner.FileScanN(f, 1200*2048) f.close() return arr def read_file3(): """ read known length: singles""" f = file('test.dat') arr = file_scanner.FileScanN_single(f, 1200*2048) f.close() return arr def read_fromfile1(): """ read unknown length with fromfile(): singles""" f = file('test.dat') arr = np.fromfile(f, dtype=np.float32, sep=' ') f.close() return arr def read_fromfile2(): """ read unknown length with fromfile(): doubles""" f = file('test.dat') arr = np.fromfile(f, dtype=np.float64, sep=' ') f.close() return arr def read_fromstring1(): """ read unknown length with fromstring(): singles""" f = file('test.dat') str = f.read() arr = np.fromstring(str, dtype=np.float32, sep=' ') f.close() return arr And the results (ipython's timeit): In [40]: timeit test.read_fromfile1() 1 loops, best of 3: 561 ms per loop In [41]: timeit test.read_fromfile2() 1 loops, best of 3: 570 ms per loop In [42]: timeit test.read_file1() 1 loops, best of 3: 336 ms per loop In [43]: timeit test.read_file2() 1 loops, best of 3: 341 ms per loop In [44]: timeit test.read_file3() 1 loops, best of 3: 515 ms per loop In [46]: timeit test.read_fromstring1() 1 loops, best of 3: 301 ms per loop So my filescanner is faster, but not radically so, than fromfile(). However, reading the whole file into a string, then using fromstring() is, in fact, tne fastest method -- interesting -- shows you why you need to profile! Also, with my code, reading singles is slower than doubles -- odd. Perhaps the C lib fscanf read doubles anyway, then converts to singles? Anyway, for my needs, my file_scanner and fromfile() are fast enough, and much faster than parsing the files with Python. My issue with fromfile is flexibility and robustness -- it's buggy in the face of ill-formed files. See the list archives and the bug reports for more detail. Still, it seems your very basic method is indeed a faster way to go. I've enclosed the files. It's currently built as part of a larger lib, so no setup.py -- though it could be written easily enough. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: file_scan_module.c URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_simple_large.py Type: application/x-python Size: 1354 bytes Desc: not available URL: From gregwh at gmail.com Tue Nov 16 16:53:17 2010 From: gregwh at gmail.com (greg whittier) Date: Tue, 16 Nov 2010 16:53:17 -0500 Subject: [Numpy-discussion] broadcasting with numpy.interp Message-ID: Hi all, I'd like to be able to speed up the following code. 
def replace_dead(cube, dead): # cube.shape == (320, 640, 1200) # dead.shape == (320, 640) # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True bands = np.arange(0, cube.shape[0]) for line in range(cube.shape[1]): dead_bands = bands[dead[:, line] == True] good_bands = bands[dead[:, line] == False] for sample in range(cube.shape[2]): # interp returns fp[0] for x < xp[0] and fp[-1] for x > xp[-1] cube[dead_bands, line, sample] = \ np.interp(dead_bands, good_bands, cube[good_bands, line, sample]) Removing that last loop via some sort of broadcasting seems like it should be possible, but that doesn't seem to work with interp. While interp allows the x-coordinates of interpolation points to be a multi-dimensional array, it expects the x- and y-coordinates of the data points to be 1-d arrays. Any suggestions for speeding this up? Thanks, Greg From antony.lee at berkeley.edu Wed Nov 17 00:33:34 2010 From: antony.lee at berkeley.edu (Antony Lee) Date: Tue, 16 Nov 2010 21:33:34 -0800 Subject: [Numpy-discussion] unicode string for specifying dtype Message-ID: I just ran into the following: >>> np.dtype(u"f4") Traceback (most recent call last): File "", line 1, in TypeError: data type not understood Is that the expected behaviour? Thanks in advance, Antony Lee -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgautier at gmail.com Wed Nov 17 02:40:50 2010 From: lgautier at gmail.com (Laurent Gautier) Date: Wed, 17 Nov 2010 08:40:50 +0100 Subject: [Numpy-discussion] RuntimeWarning: Item size computed from the PEP 3118 Message-ID: Hi, I am developping a package using the buffer interface, and with Python 2.7 - Numpy 1.5, the following annoying warning has been reported. __main__:1: RuntimeWarning: Item size computed from the PEP 3118 buffer format string does not match the actual item size. Beside warning it appears that all is fine. The source of the warning can be traced down the C level with the following boolean expression being true: descr->elsize != view->itemsize I tried tracing a bit further but I am a little confused by what is the intent (a lot of nested calls). my buffer format is 'f', and the view has itemsize set as: view->itemsize = sizeof(double); Any pointer regarding what might be going on ? Thanks, Laurent From pav at iki.fi Wed Nov 17 05:07:20 2010 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 17 Nov 2010 10:07:20 +0000 (UTC) Subject: [Numpy-discussion] RuntimeWarning: Item size computed from the PEP 3118 References: Message-ID: Wed, 17 Nov 2010 08:40:50 +0100, Laurent Gautier wrote: [clip] > __main__:1: RuntimeWarning: Item size computed from the PEP 3118 buffer > format string does not match the actual item size. [clip] > I tried tracing a bit further but I am a little confused by what is the > intent (a lot of nested calls). Numpy is warning you that the view that you passed to it is inconsistent, and it indicates that it refuses to consider it as a PEP 3118 buffer. If you use it on Python 2.x, it probably falls back to using the legacy buffer interface which does not care about the format string. > my buffer format is 'f', and the view has itemsize set as: > view->itemsize = sizeof(double); > > Any pointer regarding what might be going on ? See http://docs.python.org/library/struct.html#format-strings The format 'f' means 32-bit float, so the view itemsize should be sizeof(float) and not sizeof(double). 
If your view really contains floats in sizeof(double) bytes, the buffer format string probably should indicate the padding. -- Pauli Virtanen From seb.haase at gmail.com Wed Nov 17 13:20:06 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 17 Nov 2010 19:20:06 +0100 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? Message-ID: Hi, >>> import numpy as np >>> a = np.arange(4) >>> a[1.8] 1 >>> a[ np.array(1.8) ] Traceback (most recent call last): File "", line 1, in IndexError: arrays used as indices must be of integer (or boolean) type >>> Why does numpy not accept float arrays as indices ? I was very happy and quite surprised once I found out that it worked at all for Python float scalars, but would it not just be consequent to also allow float ndarrays then ? Regards, Sebastian Haase From robert.kern at gmail.com Wed Nov 17 13:26:21 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Nov 2010 12:26:21 -0600 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 12:20, Sebastian Haase wrote: > Hi, >>>> import numpy as np >>>> a = np.arange(4) >>>> a[1.8] > 1 >>>> a[ np.array(1.8) ] > Traceback (most recent call last): > ?File "", line 1, in > IndexError: arrays used as indices must be of integer (or boolean) type >>>> > > Why does numpy not accept float arrays as indices ? > I was very happy and quite surprised once I found out that it worked > at all for Python float scalars, > but would it not just be consequent to also allow float ndarrays then ? It only works for float scalars by accident. Do not rely on it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From seb.haase at gmail.com Wed Nov 17 13:32:33 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 17 Nov 2010 19:32:33 +0100 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 7:26 PM, Robert Kern wrote: > On Wed, Nov 17, 2010 at 12:20, Sebastian Haase wrote: >> Hi, >>>>> import numpy as np >>>>> a = np.arange(4) >>>>> a[1.8] >> 1 >>>>> a[ np.array(1.8) ] >> Traceback (most recent call last): >> ?File "", line 1, in >> IndexError: arrays used as indices must be of integer (or boolean) type >>>>> >> >> Why does numpy not accept float arrays as indices ? >> I was very happy and quite surprised once I found out that it worked >> at all for Python float scalars, >> but would it not just be consequent to also allow float ndarrays then ? > > It only works for float scalars by accident. Do not rely on it. > Could you be more specific ? As a feature, it for sure can be useful. Alternatively, could one think of some kind of index generator functions that could be used to still use float arrays as indices, without the need for an extra int-valued copy (in memory) of the original and without considerable speed penalty ? Thanks for your reply, Sebastian From njs at pobox.com Wed Nov 17 13:48:03 2010 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Nov 2010 10:48:03 -0800 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? 
In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 10:32 AM, Sebastian Haase wrote: > On Wed, Nov 17, 2010 at 7:26 PM, Robert Kern wrote: >> On Wed, Nov 17, 2010 at 12:20, Sebastian Haase wrote: >>> Why does numpy not accept float arrays as indices ? >>> I was very happy and quite surprised once I found out that it worked >>> at all for Python float scalars, >>> but would it not just be consequent to also allow float ndarrays then ? >> >> It only works for float scalars by accident. Do not rely on it. > > Could you be more specific ? ?As a feature, it for sure can be useful. I think Robert Kern has the same intuition as me: that supporting float indices is pointless. So, can you give any *specific examples* of things you can do with float indices that would be difficult or more expensive using integer indices? That's probably the best way to convince people. -- Nathaniel From seb.haase at gmail.com Wed Nov 17 14:11:11 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 17 Nov 2010 20:11:11 +0100 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 7:48 PM, Nathaniel Smith wrote: > On Wed, Nov 17, 2010 at 10:32 AM, Sebastian Haase wrote: >> On Wed, Nov 17, 2010 at 7:26 PM, Robert Kern wrote: >>> On Wed, Nov 17, 2010 at 12:20, Sebastian Haase wrote: >>>> Why does numpy not accept float arrays as indices ? >>>> I was very happy and quite surprised once I found out that it worked >>>> at all for Python float scalars, >>>> but would it not just be consequent to also allow float ndarrays then ? >>> >>> It only works for float scalars by accident. Do not rely on it. >> >> Could you be more specific ? ?As a feature, it for sure can be useful. > > I think Robert Kern has the same intuition as me: that supporting > float indices is pointless. So, can you give any *specific examples* > of things you can do with float indices that would be difficult or > more expensive using integer indices? That's probably the best way to > convince people. > > -- Nathaniel Well, suppose you have 2 vectors of floating point coordinates `x` and `y` and you want to do operations utilizing fancy indexing like image[ [x,y] ] += 1 As I just realized, this specific case seems to be addressed by histogram2d, however, if float indices would work this would of course be much more general: higher dimensionality and not just '+=' operations. Finally, I just started wondering if numexpr could help here: then one could, for example, even do proper rounding (like: image[ [x+.5,y+5] ] += 1) without creating a temporary array. Regards, Sebastian From robert.kern at gmail.com Wed Nov 17 14:15:44 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Nov 2010 13:15:44 -0600 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 13:11, Sebastian Haase wrote: > On Wed, Nov 17, 2010 at 7:48 PM, Nathaniel Smith wrote: >> On Wed, Nov 17, 2010 at 10:32 AM, Sebastian Haase wrote: >>> On Wed, Nov 17, 2010 at 7:26 PM, Robert Kern wrote: >>>> On Wed, Nov 17, 2010 at 12:20, Sebastian Haase wrote: >>>>> Why does numpy not accept float arrays as indices ? >>>>> I was very happy and quite surprised once I found out that it worked >>>>> at all for Python float scalars, >>>>> but would it not just be consequent to also allow float ndarrays then ? 
>>>> >>>> It only works for float scalars by accident. Do not rely on it. >>> >>> Could you be more specific ? ?As a feature, it for sure can be useful. >> >> I think Robert Kern has the same intuition as me: that supporting >> float indices is pointless. So, can you give any *specific examples* >> of things you can do with float indices that would be difficult or >> more expensive using integer indices? That's probably the best way to >> convince people. >> >> -- Nathaniel > Well, > suppose you have 2 vectors of floating point coordinates `x` and `y` > and you want to do operations utilizing fancy indexing like > image[ [x,y] ] += ?1 > > As I just realized, this specific case seems to be addressed by histogram2d, > however, if float indices would work this would of course be much more > general: higher dimensionality and not just '+=' operations. Actually, it wouldn't work even if x and y were integers. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From seb.haase at gmail.com Wed Nov 17 14:27:59 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 17 Nov 2010 20:27:59 +0100 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 8:15 PM, Robert Kern wrote: > On Wed, Nov 17, 2010 at 13:11, Sebastian Haase wrote: >> On Wed, Nov 17, 2010 at 7:48 PM, Nathaniel Smith wrote: >>> On Wed, Nov 17, 2010 at 10:32 AM, Sebastian Haase wrote: >>>> On Wed, Nov 17, 2010 at 7:26 PM, Robert Kern wrote: >>>>> On Wed, Nov 17, 2010 at 12:20, Sebastian Haase wrote: >>>>>> Why does numpy not accept float arrays as indices ? >>>>>> I was very happy and quite surprised once I found out that it worked >>>>>> at all for Python float scalars, >>>>>> but would it not just be consequent to also allow float ndarrays then ? >>>>> >>>>> It only works for float scalars by accident. Do not rely on it. >>>> >>>> Could you be more specific ? ?As a feature, it for sure can be useful. >>> >>> I think Robert Kern has the same intuition as me: that supporting >>> float indices is pointless. So, can you give any *specific examples* >>> of things you can do with float indices that would be difficult or >>> more expensive using integer indices? That's probably the best way to >>> convince people. >>> >>> -- Nathaniel >> Well, >> suppose you have 2 vectors of floating point coordinates `x` and `y` >> and you want to do operations utilizing fancy indexing like >> image[ [x,y] ] += ?1 >> >> As I just realized, this specific case seems to be addressed by histogram2d, >> however, if float indices would work this would of course be much more >> general: higher dimensionality and not just '+=' operations. > > Actually, it wouldn't work even if x and y were integers. > I guess you are right again - see this simplified 1d test: >>> a = np.zeros(4, int) >>> a [0 0 0 0] >>> a[ [1,3] ] += 1 >>> a [0 1 0 1] >>> a[ [1,3,1] ] += 1 >>> a [0 2 0 2] >>> >>> a = np.zeros(4, int) >>> a [0 0 0 0] >>> a[ [np.array((1,3))] ] += 1 >>> a [0 1 0 1] >>> a[ [np.array((1,3,1))] ] += 1 >>> a [0 2 0 2] So, the fancy indexing appears to treat arrays exactly like plain lists. And my idea of using it for operating on a sequence of indices appears to work at first, but then in case of duplicate indices (1 in my example) the += works only once .... I don't understand ... 
Thanks for any hint, Sebastian From robert.kern at gmail.com Wed Nov 17 14:40:15 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Nov 2010 13:40:15 -0600 Subject: [Numpy-discussion] Python scalar float indices into array work - but not for array indices - why ? In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 13:27, Sebastian Haase wrote: > I guess you are right again - see this simplified 1d test: >>>> a = np.zeros(4, int) >>>> a > [0 0 0 0] >>>> a[ [1,3] ] += 1 >>>> a > [0 1 0 1] >>>> a[ [1,3,1] ] += 1 >>>> a > [0 2 0 2] >>>> >>>> a = np.zeros(4, int) >>>> a > [0 0 0 0] >>>> a[ [np.array((1,3))] ] += 1 >>>> a > [0 1 0 1] >>>> a[ [np.array((1,3,1))] ] += 1 >>>> a > [0 2 0 2] > > So, the fancy indexing appears to treat arrays exactly like plain lists. > And my idea of using it for operating on a sequence of indices appears > to work at first, > but then in case of duplicate indices (1 in my example) the += works > only once .... > I don't understand ... As we've discussed several times before on this list, "foo[i] += 1" is not an atomic operation. It breaks down into the equivalent code: tmp = foo.__getitem__(i) tmp = tmp.__iadd__(1) foo.__setitem__(i, tmp) In the case of fancy indexing, tmp is not a view on foo. Each of the duplicate indices makes a copy of the data. Those copies are incremented independently, then they are shoved back into the original foo array. At no point does the array know that these methods are being called because of this special combination of operators and that you want it to behave like a histogram. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From faltet at pytables.org Wed Nov 17 14:43:00 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 17 Nov 2010 20:43:00 +0100 Subject: [Numpy-discussion] [ANN] Blosc 1.0.3 released Message-ID: <201011172043.00559.faltet@pytables.org> ==================================================== Announcing python-blosc 1.0.3 A Python wrapper for the Blosc compression library ==================================================== What is it? =========== Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contains data with relatively low entropy, like sparse data, time series, grids with regular-spaced values, etc. python-blosc is a Python package that wraps it. What is new? ============ Blosc has been updated to 1.1.3, allowing much improved compression ratio under some circumstances. Also, the number of cores on Windows platform is detected correctly now (thanks to Han Genuit). Last, but not least, Windows binaries for Python 2.6 and 2.7 are provided (both in 32-bit and 64-bit flavors). For more info, you can see the release notes in: https://github.com/FrancescAlted/python-blosc/wiki/Release-notes Basic Usage =========== # Create a binary string made of int (32-bit) elements >>> import array >>> a = array.array('i', range(10*1000*1000)) >>> bytes_array = a.tostring() # Compress it >>> import blosc >>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize) >>> len(bytes_array) / len(bpacked) 110 # 110x compression ratio. Not bad! # Compression speed? 
>>> from timeit import timeit >>> timeit("blosc.compress(bytes_array, a.itemsize)", "import blosc, array; \ a = array.array('i', range(10*1000*1000)); \ bytes_array = a.tostring()", \ number=10) 0.040534019470214844 >>> len(bytes_array)*10 / 0.0405 / (1024*1024*1024) 9.1982476505232444 # wow, compressing at ~ 9 GB/s. That's fast! # This is actually much faster than a `memcpy` system call >>> timeit("ctypes.memmove(b.buffer_info()[0], a.buffer_info()[0], \ len(a)*a.itemsize)", "import array, ctypes; \ a = array.array('i', range(10*1000*1000)); \ b = a[::-1]", number=10) 0.10316681861877441 >>> len(bytes_array)*10 / 0.1031 / (1024*1024*1024) 3.6132786600018565 # ~ 3 GB/s is memcpy speed # Decompress it >>> bytes_array2 = blosc.decompress(bpacked) # Check whether our data have had a good trip >>> bytes_array == bytes_array2 True # yup, it seems so # Decompression speed? >>> timeit("s2 = blosc.decompress(bpacked)", "import blosc, array; \ a = array.array('i', range(10*1000*1000)); \ bytes_array = a.tostring(); \ bpacked = blosc.compress(bytes_array, a.itemsize)", \ number=10) 0.083872079849243164 > len(bytes_array)*10 / 0.0838 / (1024*1024*1024) 4.4454538167803275 # decompressing at ~ 4.4 GB/s is pretty good too! [Using a machine with 8 physical cores with hyper-threading] The above examples use maximum compression level 9 (default), and although lower compression levels produce smaller compression ratios, they are also faster (reaching speeds exceeding 11 GB/s). More examples showing other features (and using NumPy arrays) are available on the python-blosc wiki page: http://github.com/FrancescAlted/python-blosc/wiki Documentation ============= Please refer to docstrings. Start by the main package: >>> import blosc >>> help(blosc) and ask for more docstrings in the referenced functions. Download sources ================ Go to: http://github.com/FrancescAlted/python-blosc and download the most recent release from here. Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for details. Mailing list ============ There is an official mailing list for Blosc at: blosc at googlegroups.com http://groups.google.es/group/blosc ---- **Enjoy data!** -- Francesc Alted From scopatz at gmail.com Wed Nov 17 16:09:43 2010 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 17 Nov 2010 15:09:43 -0600 Subject: [Numpy-discussion] unicode string for specifying dtype In-Reply-To: References: Message-ID: Hi Antony This seems to work for me... What version of python/numpy are you using? Be Well Anthony On Tue, Nov 16, 2010 at 11:33 PM, Antony Lee wrote: > I just ran into the following: > > >>> np.dtype(u"f4") > Traceback (most recent call last): > File "", line 1, in > TypeError: data type not understood > > Is that the expected behaviour? > > Thanks in advance, > Antony Lee > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ischnell at enthought.com Wed Nov 17 16:15:32 2010 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 17 Nov 2010 15:15:32 -0600 Subject: [Numpy-discussion] unicode string for specifying dtype In-Reply-To: References: Message-ID: I can confirm that the TypeError appears in numpy 1.4.0, i.e. the version in the current EPD 6.3-1. - Ilan On Wed, Nov 17, 2010 at 3:09 PM, Anthony Scopatz wrote: > Hi Antony > This seems to work for me... 
?What?version of python/numpy?are you using? > Be Well > Anthony > > On Tue, Nov 16, 2010 at 11:33 PM, Antony Lee > wrote: >> >> I just ran into the following: >> >> >>> np.dtype(u"f4") >> Traceback (most recent call last): >> ? File "", line 1, in >> TypeError: data type not understood >> >> Is that the expected behaviour? >> >> Thanks in advance, >> Antony Lee >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From dvr002 at gmail.com Wed Nov 17 19:49:39 2010 From: dvr002 at gmail.com (Venkat) Date: Thu, 18 Nov 2010 06:19:39 +0530 Subject: [Numpy-discussion] unicode string for specifying dtype In-Reply-To: References: Message-ID: I too got same error message. I am using Ubuntu 10.04, with python 2.6.5, numpy 1:1.3.0 Does it require to change to some other version? Thanks & Regards Venkat On Thu, Nov 18, 2010 at 2:45 AM, Ilan Schnell wrote: > I can confirm that the TypeError appears in numpy 1.4.0, i.e. the > version in the current EPD 6.3-1. > > - Ilan > > > On Wed, Nov 17, 2010 at 3:09 PM, Anthony Scopatz > wrote: > > Hi Antony > > This seems to work for me... What version of python/numpy are you using? > > Be Well > > Anthony > > > > On Tue, Nov 16, 2010 at 11:33 PM, Antony Lee > > wrote: > >> > >> I just ran into the following: > >> > >> >>> np.dtype(u"f4") > >> Traceback (most recent call last): > >> File "", line 1, in > >> TypeError: data type not understood > >> > >> Is that the expected behaviour? > >> > >> Thanks in advance, > >> Antony Lee > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- ******************************************************************************* D.Venkat Research Scholar Dept of Physics IISc, Bangalore India-560 012 ******************************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From ischnell at enthought.com Wed Nov 17 20:09:40 2010 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 17 Nov 2010 19:09:40 -0600 Subject: [Numpy-discussion] unicode string for specifying dtype In-Reply-To: References: Message-ID: Anthony Scopatz was using trunk from June 30, 2010 and it was fixed, so updating to numpy 1.5.0 should fix it. - Ilan On Wed, Nov 17, 2010 at 6:49 PM, Venkat wrote: > I too got same error message. > I am using Ubuntu 10.04, with python 2.6.5, numpy 1:1.3.0 > > Does it require to change to some other version? > > > Thanks & Regards > Venkat > > > On Thu, Nov 18, 2010 at 2:45 AM, Ilan Schnell > wrote: >> >> I can confirm that the TypeError appears in numpy 1.4.0, i.e. the >> version in the current EPD 6.3-1. >> >> - Ilan >> >> >> On Wed, Nov 17, 2010 at 3:09 PM, Anthony Scopatz >> wrote: >> > Hi Antony >> > This seems to work for me... 
?What?version of python/numpy?are you >> > using? >> > Be Well >> > Anthony >> > >> > On Tue, Nov 16, 2010 at 11:33 PM, Antony Lee >> > wrote: >> >> >> >> I just ran into the following: >> >> >> >> >>> np.dtype(u"f4") >> >> Traceback (most recent call last): >> >> ? File "", line 1, in >> >> TypeError: data type not understood >> >> >> >> Is that the expected behaviour? >> >> >> >> Thanks in advance, >> >> Antony Lee >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > ******************************************************************************* > D.Venkat > Research Scholar > Dept of Physics > IISc, Bangalore > India-560 012 > ******************************************************************************** > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From nbigaouette at gmail.com Wed Nov 17 20:19:21 2010 From: nbigaouette at gmail.com (Nicolas Bigaouette) Date: Wed, 17 Nov 2010 20:19:21 -0500 Subject: [Numpy-discussion] Regrading Numpy Documentation ... In-Reply-To: References: Message-ID: Chrome might have some feature where local javascript can't be executed... just an idea.... -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Nov 17 20:47:58 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 17 Nov 2010 17:47:58 -0800 Subject: [Numpy-discussion] half-float review/pull request Message-ID: I've created a ticket and a pull request for the half-float type I implemented. Since my previous email, I've moved the half implementation into npymath so modules using numpy can do half-float code as well. There are a few key things that in my mind should get some discussion before saying this is final: * The character code for half. I chose 'j' because 'h' was already taken by the 16-bit integer type, but maybe there's a reason to choose a different letter. * The API in npymath (or libndarray once refactoring is merged). I've tried to make it clean and follow numpy or ieee-754 conventions, but maybe it could be improved a bit. Here's the pull request: https://github.com/numpy/numpy/pull/16 Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgautier at gmail.com Wed Nov 17 22:36:34 2010 From: lgautier at gmail.com (Laurent Gautier) Date: Thu, 18 Nov 2010 04:36:34 +0100 Subject: [Numpy-discussion] RuntimeWarning: Item size computed from the PEP 3118 In-Reply-To: References: Message-ID: <4CE49F42.3070703@gmail.com> Thanks for the reply. On 11/17/10 7:00 PM, Pauli Virtanen wrote: > > Wed, 17 Nov 2010 08:40:50 +0100, Laurent Gautier wrote: > [clip] >> > __main__:1: RuntimeWarning: Item size computed from the PEP 3118 buffer >> > format string does not match the actual item size. > [clip] >> > I tried tracing a bit further but I am a little confused by what is the >> > intent (a lot of nested calls). 
> Numpy is warning you that the view that you passed to it is inconsistent, > and it indicates that it refuses to consider it as a PEP 3118 buffer. If > you use it on Python 2.x, it probably falls back to using the legacy > buffer interface which does not care about the format string. That would be the __array_struct__ attribute, I suppose. >> > my buffer format is 'f', and the view has itemsize set as: >> > view->itemsize = sizeof(double); >> > >> > Any pointer regarding what might be going on ? > Seehttp://docs.python.org/library/struct.html#format-strings > > The format 'f' means 32-bit float, so the view itemsize should be > sizeof(float) and not sizeof(double). If your view really contains floats > in sizeof(double) bytes, the buffer format string probably should > indicate the padding. It seems that I have been mislead by some documentation (which I cannot find again). I had something that 'f' indicated a "Python float", which I understand to be represented by a double at C-level. Laurent > -- Pauli Virtanen From dvr002 at gmail.com Thu Nov 18 09:49:21 2010 From: dvr002 at gmail.com (Venkat) Date: Thu, 18 Nov 2010 20:19:21 +0530 Subject: [Numpy-discussion] How to import input data to make ndarray for batch processing? Message-ID: Hi All, I am new to Numpy (also Scipy). I am trying to reshape my text data which is in one single column (10,000 rows). I want the data to be in 100x100 array form. I have many files to convert like this. All of them are having file names like 0, 1, 2, ....500. with out any extension. Actually, I renamed actual files so that I can import them in Matlab for batch processing. Since Matlab also new for me, I thought I will try Numpy first. Can any body help me in writing the script to do this for making batch processing. Thanks in advance, Venkat -- ******************************************************************************* D.Venkat Research Scholar Dept of Physics IISc, Bangalore India-560 012 ******************************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Thu Nov 18 10:02:33 2010 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 18 Nov 2010 07:02:33 -0800 Subject: [Numpy-discussion] How to import input data to make ndarray for batch processing? In-Reply-To: References: Message-ID: <26FC23E7C398A64083C980D16001012D0452F79611@VA3DIAXVS361.RED001.local> Do you want to save the file to disk as 100x100 matrices, or just to read them into the memory? Are the files in ascii or binary format? Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Venkat [dvr002 at gmail.com] Sent: 18 November 2010 16:49 To: Discussion of Numerical Python Subject: [Numpy-discussion] How to import input data to make ndarray for batch processing? Hi All, I am new to Numpy (also Scipy). I am trying to reshape my text data which is in one single column (10,000 rows). I want the data to be in 100x100 array form. I have many files to convert like this. All of them are having file names like 0, 1, 2, ....500. with out any extension. Actually, I renamed actual files so that I can import them in Matlab for batch processing. Since Matlab also new for me, I thought I will try Numpy first. Can any body help me in writing the script to do this for making batch processing. 
Thanks in advance, Venkat -- ******************************************************************************* D.Venkat Research Scholar Dept of Physics IISc, Bangalore India-560 012 ******************************************************************************** From dave.hirschfeld at gmail.com Thu Nov 18 10:40:57 2010 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 18 Nov 2010 15:40:57 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?How_to_import_input_data_to_make_nda?= =?utf-8?q?rray_for=09batch_processing=3F?= References: Message-ID: Venkat gmail.com> writes: > > Hi All,I am new to Numpy (also Scipy).I am trying to reshape my text data which is in one single column (10,000 rows).I want the data to be in 100x100 array form.I have many files to convert like this. All of them are having file names like 0, 1, 2, ....500. with out any extension. > Actually, I renamed actual files so that I can import them in Matlab for batch processing.Since Matlab also new for me, I thought I will try Numpy first. Can any body help me in writing the script to do this for making batch processing. Thanks in advance,Venkat In [2]: dummy_data = np.random.randn(100,100) In [3]: dummy_data.shape Out[3]: (100, 100) In [4]: dummy_data.flatten().shape Out[4]: (10000,) In [5]: np.savetxt('dummy_data.txt', dummy_data.flatten()) In [6]: !less dummy_data.txt 2.571031186906808100e-01 1.566790681796508500e+00 -6.846267829937818800e-01 3.271332705287631200e-01 -7.482409829656505600e-02 1.429095432454441600e-01 -6.888841694801869400e-01 -5.319842186383831900e-01 -4.047786844569442600e-01 -6.696045994533519300e-01 -4.895085917712171400e-01 6.969419814656118200e-01 6.656815445278644300e-01 7.455135053686292600e-01 ... In [7]: data = np.loadtxt('dummy_data.txt') In [8]: data.shape Out[8]: (10000,) In [9]: data = data.reshape(100, 100) In [10]: data.shape Out[10]: (100, 100) In [11]: np.allclose(dummy_data, data) Out[11]: True HTH, Dave From silva at lma.cnrs-mrs.fr Thu Nov 18 10:55:31 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 18 Nov 2010 12:55:31 -0300 Subject: [Numpy-discussion] How to import input data to make ndarray for batch processing? In-Reply-To: References: Message-ID: <1290095731.1782.5.camel@florian-desktop> El jeu., 18-11-2010 a las 20:19 +0530, Venkat escribi?: > I have many files to convert like this. All of them are having file > names like 0, 1, 2, ....500. with out any extension. > Actually, I renamed actual files so that I can import them in Matlab > for batch processing. Since Matlab also new for me, I thought I will > try Numpy first. > Can any body help me in writing the script to do this for making batch > processing. One point that others did not answer is the 'batch' part. If your files are named sequentially, you can 'template' the argument you pass to the loader function. For example, if you load with numpy.loadtxt your data that is stored in files named 'mydata0', 'mydata1', .... 'mydata511', your batch processing may look like that for ind in xrange(512): filename = 'mydata%d' % ind data = numpy.loadtxt(filename, ... ) #... your processing on single file with adapted range of indices (see xrange doc), string formatting (see string doc) and arguments to loader function -- Fabrice Silva From ndbecker2 at gmail.com Thu Nov 18 11:38:41 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 18 Nov 2010 11:38:41 -0500 Subject: [Numpy-discussion] numpy + amdlibm? Message-ID: Anyone tried building numpy with amdlibm? 
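Putting together the loadtxt/reshape recipe and the file-name loop suggested above for the batch-processing question, a minimal end-to-end sketch might look like the following. The 0...500 range and the 100x100 shape come from the original question; the output file names and the use of savetxt are assumptions, not part of the thread:

    import numpy as np

    for ind in range(501):                    # files named 0, 1, ..., 500, no extension
        data = np.loadtxt('%d' % ind)         # one column of 10,000 values -> shape (10000,)
        data = data.reshape(100, 100)         # 100x100 array
        # ... per-file processing goes here ...
        np.savetxt('out_%d.txt' % ind, data)  # hypothetical output name

For many files, np.fromfile(filename, sep=' ') can be swapped in for loadtxt to speed up the reading step for simple whitespace-separated data.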
From kwgoodman at gmail.com Thu Nov 18 12:51:04 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 18 Nov 2010 09:51:04 -0800 Subject: [Numpy-discussion] Returning numpy scalars in cython functions Message-ID: The cython function below returns a long int: @cython.boundscheck(False) def mysum(np.ndarray[np.int64_t, ndim=1] a): "sum of 1d numpy array with dtype=np.int64." cdef Py_ssize_t i cdef int asize = a.shape[0] cdef np.int64_t asum = 0 for i in range(asize): asum += a[i] return asum What's the best way to make it return a numpy long int, or whatever it is called, that has dtype, ndim, size, etc. class methods? The only thing I could come up with is changing the last line to return np.array(asum)[()] It works. And adds some overhead: >> a = np.arange(10) >> timeit mysum(a) 10000000 loops, best of 3: 167 ns per loop >> timeit mysum2(a) 1000000 loops, best of 3: 984 ns per loop And for scale: >> timeit np.sum(a) 100000 loops, best of 3: 3.3 us per loop I'm new to cython. Did I miss any optimizations in the mysum function above? From faltet at pytables.org Thu Nov 18 13:08:00 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 18 Nov 2010 19:08:00 +0100 Subject: [Numpy-discussion] Returning numpy scalars in cython functions In-Reply-To: References: Message-ID: <201011181908.00036.faltet@pytables.org> A Thursday 18 November 2010 18:51:04 Keith Goodman escrigu?: > The cython function below returns a long int: > > @cython.boundscheck(False) > def mysum(np.ndarray[np.int64_t, ndim=1] a): > "sum of 1d numpy array with dtype=np.int64." > cdef Py_ssize_t i > cdef int asize = a.shape[0] > cdef np.int64_t asum = 0 > for i in range(asize): > asum += a[i] > return asum > > What's the best way to make it return a numpy long int, or whatever > it is called, that has dtype, ndim, size, etc. class methods? The > only thing I could come up with is changing the last line to > > return np.array(asum)[()] > > It works. 
And adds some overhead: > >> a = np.arange(10) > >> timeit mysum(a) > > 10000000 loops, best of 3: 167 ns per loop > > >> timeit mysum2(a) > > 1000000 loops, best of 3: 984 ns per loop > > And for scale: > >> timeit np.sum(a) > > 100000 loops, best of 3: 3.3 us per loop Perhaps the scalar constructor is your best bet: >>> type(np.array(2)[()]) >>> type(np.int_(2)) >>> timeit np.array(2)[()] 1000000 loops, best of 3: 791 ns per loop >>> timeit np.int_(2) 1000000 loops, best of 3: 234 ns per loop -- Francesc Alted From faltet at pytables.org Thu Nov 18 13:12:19 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 18 Nov 2010 19:12:19 +0100 Subject: [Numpy-discussion] Returning numpy scalars in cython functions In-Reply-To: <201011181908.00036.faltet@pytables.org> References: <201011181908.00036.faltet@pytables.org> Message-ID: <201011181912.19422.faltet@pytables.org> A Thursday 18 November 2010 19:08:00 Francesc Alted escrigu?: > >>> type(np.int_(2)) Err, for maximum portability you can use the int64 constructor: >>> type(np.int64(2)) Cheers, -- Francesc Alted From kwgoodman at gmail.com Thu Nov 18 13:14:59 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 18 Nov 2010 10:14:59 -0800 Subject: [Numpy-discussion] Returning numpy scalars in cython functions In-Reply-To: <201011181908.00036.faltet@pytables.org> References: <201011181908.00036.faltet@pytables.org> Message-ID: On Thu, Nov 18, 2010 at 10:08 AM, Francesc Alted wrote: > A Thursday 18 November 2010 18:51:04 Keith Goodman escrigu?: >> What's the best way to make it return a numpy long int, or whatever >> it is called, that has dtype, ndim, size, etc. class methods? The >> only thing I could come up with is changing the last line to >> >> ? ? return np.array(asum)[()] >> > Perhaps the scalar constructor is your best bet: > >>>> type(np.array(2)[()]) > >>>> type(np.int_(2)) > >>>> timeit np.array(2)[()] > 1000000 loops, best of 3: 791 ns per loop >>>> timeit np.int_(2) > 1000000 loops, best of 3: 234 ns per loop Perfect! Thank you. >> a = np.arange(10) >> timeit mysum2(a) 1000000 loops, best of 3: 1.16 us per loop >> timeit mysum2_francesc(a) 1000000 loops, best of 3: 451 ns per loop I also added @cython.wraparound(False). From Chris.Barker at noaa.gov Thu Nov 18 13:43:44 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 18 Nov 2010 10:43:44 -0800 Subject: [Numpy-discussion] How to import input data to make ndarray for batch processing? In-Reply-To: References: Message-ID: <4CE573E0.9070802@noaa.gov> On 11/18/10 7:40 AM, Dave Hirschfeld wrote: > In [7]: data = np.loadtxt('dummy_data.txt') or, faster: data = np.fromfile('dummy_data.txt', dtype=np.float64, sep = ' ') fromfile() is not very flexible, and doesn't have good error handling, but it's a lot faster than loadtxt for the simple cases like this. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From lutz.maibaum at gmail.com Thu Nov 18 16:05:14 2010 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Thu, 18 Nov 2010 13:05:14 -0800 Subject: [Numpy-discussion] How to import input data to make ndarray for batch processing? In-Reply-To: References: Message-ID: On Nov 18, 2010, at 6:49 AM, Venkat wrote: > I am trying to reshape my text data which is in one single column (10,000 rows). > I want the data to be in 100x100 array form. 
If all you want to do is converting the actual files, and you are using a unix-ish operating system, you don't even need python: paste - - - - - - - - - - < filename > newfilename should do the trick, without any assumptions on the type of data or change in precision due to reading/writing. Hope this helps, Lutz From bradknox at cs.utexas.edu Thu Nov 18 17:36:31 2010 From: bradknox at cs.utexas.edu (W Bradley Knox) Date: Thu, 18 Nov 2010 16:36:31 -0600 Subject: [Numpy-discussion] Test failures on 2.6 Message-ID: I'm having almost exactly the same problem, but with Python 2.6.1, Numpy 1.2.1, and Nose 0.11.3. Nobody responded to TJ the first time around, so any advice would be greatly appreciated. Thanks, Brad ---------------------------------------------------------------------- From: T J gmail.com> Subject: Test failures on 2.6 Newsgroups: gmane.comp.python.numeric.general Date: 2008-10-05 20:53:22 GMT (2 years, 6 weeks, 1 day, 13 hours and 32 minutes ago) Hi, I'm getting a couple of test failures with Python 2.6, Numpy 1.2.0, Nose 0.10.4: nose version 0.10.4 ..........................................................................................................................................................................................................................................................................................................................................................................................................................................F................K.................................................................................................................................................................................................................................................................................................................................................................... .............................................................................................................................................................................................. ......./share/home/me/usr/lib/python2.6/site-packages/numpy/lib/tests/test_io.py:68: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(c.readlines(), ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./share/home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1315: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(store._mask, True) /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1322: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(store._mask, True) /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1989: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(test.mask, [0,1,0,0,0,0,0,0,0,0]) ...............................................E................................................................................................................................................................................ 
====================================================================== ERROR: Tests the min/max functions with explicit outputs ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py", line 653, in test_minmax_funcs_with_output result = npfunc(xm,axis=0,out=nout) File "/home/me/usr/lib/python2.6/site-packages/numpy/core/fromnumeric.py", line 1525, in amin return amin(axis, out) File "/home/me/usr/lib/python2.6/site-packages/numpy/ma/core.py", line 2978, in min np.putmask(out, newmask, np.nan) ValueError: cannot convert float NaN to integer ====================================================================== FAIL: test_umath.TestComplexFunctions.test_against_cmath ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/me/usr/lib/python2.6/site-packages/nose-0.10.4-py2.6.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/home/me/usr/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 268, in test_against_cmath assert abs(a - b) < atol, "%s %s: %s; cmath: %s"%(fname,p,a,b) AssertionError: arcsin 2: (1.57079632679-1.31695789692j); cmath: (1.57079632679+1.31695789692j) ---------------------------------------------------------------------- Ran 1726 tests in 8.856s FAILED (KNOWNFAIL=1, errors=1, failures=1) From pgmdevlist at gmail.com Thu Nov 18 17:56:19 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 18 Nov 2010 23:56:19 +0100 Subject: [Numpy-discussion] Test failures on 2.6 In-Reply-To: References: Message-ID: <691DA53F-C49C-499E-B26B-6BB392ED726F@gmail.com> On Oct 5, 2008, at 10:53 PM, T J wrote: > Hi, > > I'm getting a couple of test failures with Python 2.6, Numpy 1.2.0, Nose 0.10.4: Wow, 1.2.0 ? That's fairly ancient. I gather the bugs in numpy.ma have been corrected since (they don't really look familiar, though). And with a more recent numpy ? From SSharma84 at slb.com Thu Nov 18 21:48:54 2010 From: SSharma84 at slb.com (Sachin Kumar Sharma) Date: Fri, 19 Nov 2010 02:48:54 +0000 Subject: [Numpy-discussion] Advise for numerical programming content (New python user) Message-ID: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> Users, I am an average Fortran user. I am new to python and I am currently evaluating options and functionalities of numerical programming and related 2d and 3d graphic outputs with python. Kindly share your experience in scientific programming with python like how do you like it, comparison with Fortran and C++. Which version of python + numpy+scipy are compatible with each other or if any other numerical analysis package is available (I am working on windows environment.) Does graphic output like maps, histogram, crossplot, tornado charts is good enough with basic installation or needs some additional packages? Your feedback is valuable for me to start. Thanks & Regards Sachin ************************************************************************ Sachin Kumar Sharma Senior Geomodeler - Samarang Project (IPM) Field Development & Production Services (DCS) Schlumberger Sdn. Bhd., 7th Floor, West Wing, Rohas Perkasa, No. 8 Jalan Perak, Kuala Lumpur, 50450, Malaysia Mobile: +60 12 2196443 * Email: ssharma84 at exchange.slb.com sachin_sharma at petronas.com.my -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Thu Nov 18 21:55:23 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 18 Nov 2010 21:55:23 -0500 Subject: [Numpy-discussion] Advise for numerical programming content (New python user) In-Reply-To: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> References: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> Message-ID: <4CE5E71B.7030509@gmail.com> On 11/18/2010 9:48 PM, Sachin Kumar Sharma wrote: > Does graphic output like maps, histogram, crossplot, tornado charts is good enough with basic installation or needs some additional packages? For the graphics, you should probably first consider Matplotlib. For your other questions, perhaps look at Python Scripting for Computational Science by Hans Petter Langtangen. Alan Isaac From SSharma84 at slb.com Thu Nov 18 22:02:10 2010 From: SSharma84 at slb.com (Sachin Kumar Sharma) Date: Fri, 19 Nov 2010 03:02:10 +0000 Subject: [Numpy-discussion] Advise for numerical programming content (New python user) In-Reply-To: <4CE5E71B.7030509@gmail.com> References: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> <4CE5E71B.7030509@gmail.com> Message-ID: <75C2FED246299A478280FA1470EDA4430A966193@NL0230MBX06N2.DIR.slb.com> Thanks Alan, Best regards Sachin ************************************************************************ Sachin Kumar Sharma Senior Geomodeler -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Alan G Isaac Sent: Friday, November 19, 2010 10:55 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Advise for numerical programming content (New python user) On 11/18/2010 9:48 PM, Sachin Kumar Sharma wrote: > Does graphic output like maps, histogram, crossplot, tornado charts is good enough with basic installation or needs some additional packages? For the graphics, you should probably first consider Matplotlib. For your other questions, perhaps look at Python Scripting for Computational Science by Hans Petter Langtangen. Alan Isaac _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From zinka4u at gmail.com Fri Nov 19 00:40:53 2010 From: zinka4u at gmail.com (srinivas zinka) Date: Fri, 19 Nov 2010 14:40:53 +0900 Subject: [Numpy-discussion] Advise for numerical programming content (New python user) In-Reply-To: <75C2FED246299A478280FA1470EDA4430A966193@NL0230MBX06N2.DIR.slb.com> References: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> <4CE5E71B.7030509@gmail.com> <75C2FED246299A478280FA1470EDA4430A966193@NL0230MBX06N2.DIR.slb.com> Message-ID: For a beginner, I think "pythonxy" is a good option. Then you don't have to worry about the compatibility issue. http://www.pythonxy.com/ and as for the plotting, you can use the following packages: 2D - Matplotlib or Gnuplot (both are good ... 
but, if you want Matlab kind of environment, try Matplotlib) 3D - Mayavi or Gnuplot (I think Gnuplot has some limitations in 3D plotting) regards zinka On Fri, Nov 19, 2010 at 12:02 PM, Sachin Kumar Sharma wrote: > Thanks Alan, > > Best regards > > Sachin > > ************************************************************************ > Sachin Kumar Sharma > Senior Geomodeler > > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto: > numpy-discussion-bounces at scipy.org] On Behalf Of Alan G Isaac > Sent: Friday, November 19, 2010 10:55 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Advise for numerical programming content > (New python user) > > On 11/18/2010 9:48 PM, Sachin Kumar Sharma wrote: > > Does graphic output like maps, histogram, crossplot, tornado charts is > good enough with basic installation or needs some additional packages? > > > For the graphics, you should probably first consider Matplotlib. > For your other questions, perhaps look at > Python Scripting for Computational Science by Hans Petter Langtangen. > > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerrit.holl at gmail.com Fri Nov 19 05:21:45 2010 From: gerrit.holl at gmail.com (Gerrit Holl) Date: Fri, 19 Nov 2010 11:21:45 +0100 Subject: [Numpy-discussion] Advise for numerical programming content (New python user) In-Reply-To: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> References: <75C2FED246299A478280FA1470EDA4430A966109@NL0230MBX06N2.DIR.slb.com> Message-ID: Hi, On 19 November 2010 03:48, Sachin Kumar Sharma wrote: > Does graphic output like maps, histogram, crossplot, tornado charts is good > enough with basic installation or needs some additional packages? You might want to ask this question at the scipy mailig-list. For maps, you need basemap or PyNCL. I am presently using basemap. In the future, I will look at PyNCL, Python bindings to NCL. I think there are also Python bindings to GMT. Interestingly, both basemap and PyNCL are written by the same person (Jeff Whitaker). Probably those packages complement each other rather than doing the same thing. Others might correct me here. Gerrit. -- Exploring space at http://gerrit-explores.blogspot.com/ Personal homepage at http://www.topjaklont.org/ Asperger Syndroom: http://www.topjaklont.org/nl/asperger.html From dsdale24 at gmail.com Fri Nov 19 11:29:19 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 19 Nov 2010 11:29:19 -0500 Subject: [Numpy-discussion] seeking advice on a fast string->array conversion In-Reply-To: References: Message-ID: On Tue, Nov 16, 2010 at 10:31 AM, Darren Dale wrote: > On Tue, Nov 16, 2010 at 9:55 AM, Pauli Virtanen wrote: >> Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote: >> [clip] >>> That loop takes 0.33 seconds to execute, which is a good start. I need >>> some help converting this example to return an actual numpy array. Could >>> anyone please offer a suggestion? >> >> Easiest way is probably to use ndarray buffers and resize them when >> needed. 
For example: >> >> https://github.com/pv/scipy-work/blob/enh/interpnd-smooth/scipy/spatial/qhull.pyx#L980
> Thank you Pauli. That makes it *incredibly* simple:
>
> import time
> cimport numpy as np
> import numpy as np
>
> cdef extern from 'stdlib.h':
>     double atof(char*)
>
>
> def test():
>     py_string = '100'
>     cdef char* c_string = py_string
>     cdef int i, j
>     cdef double val
>     i = 0
>     j = 2048*1200
>     cdef np.ndarray[np.float64_t, ndim=1] ret
>
>     ret_arr = np.empty((2048*1200,), dtype=np.float64)
>     ret = ret_arr
>
>     d = time.time()
>     while i < j:
>         c_string = py_string
>         ret[i] = atof(c_string)
>         i += 1
>     ret_arr.shape = (1200, 2048)
>     print ret_arr, ret_arr.shape, time.time()-d
>
> The loop now takes only 0.11 seconds to execute. Thanks again.
>
For example:: >> import nanny as ny >> import numpy as np >> arr = np.random.rand(100, 100) >> timeit np.nansum(arr) 10000 loops, best of 3: 67.5 us per loop >> timeit ny.nansum(arr) 100000 loops, best of 3: 18.2 us per loop Let's not forget to add some NaNs:: >> arr[arr > 0.5] = np.nan >> timeit np.nansum(arr) 1000 loops, best of 3: 411 us per loop >> timeit ny.nansum(arr) 10000 loops, best of 3: 65 us per loop Nanny uses a separate Cython function for each combination of ndim, dtype, and axis. You can get rid of a lot of overhead (useful in an inner loop, e.g.) by directly importing the function that matches your problem:: >> arr = np.random.rand(10, 10) >> from nansum import nansum_2d_float64_axis1 >> timeit np.nansum(arr, axis=1) 10000 loops, best of 3: 25.5 us per loop >> timeit ny.nansum(arr, axis=1) 100000 loops, best of 3: 5.15 us per loop >> timeit nansum_2d_float64_axis1(arr) 1000000 loops, best of 3: 1.75 us per loop I put together Nanny as a way to learn Cython. It currently only supports: - functions: nansum - Operating systems: 64-bit (accumulator for int32 is hard coded to int64) - dtype: int32, int64, float64 - ndim: 1, 2, and 3 If there is interest in the project, I could continue adding the remaining NaN functions from NumPy and SciPy: nanmin, nanmax, nanmean, nanmedian (using a partial sort), nanstd. But why stop there? How about nancumsum or nanprod? Or anynan, which could short-circuit once a NaN is found? Feedback on the code or the direction of the project are welcomed. So is coding help---without that I doubt the package will ever be completed. Once nansum is complete, many of the remaining functions will be copy, paste, touch up operations. Remember, Nanny quickly protects your precious data from the corrupting influence of Mr. Nan. License ======= Nanny is distributed under a Simplified BSD license. Parts of NumPy, which has a BSD licenses, are included in Nanny. See the LICENSE file, which is distributed with Nanny, for details. Installation ============ You can grab Nanny at http://github.com/kwgoodman/nanny. nansum of ints is only supported by 64-bit operating systems at the moment. **GNU/Linux, Mac OS X, et al.** To install Nanny:: $ python setup.py build $ sudo python setup.py install Or, if you wish to specify where Nanny is installed, for example inside ``/usr/local``:: $ python setup.py build $ sudo python setup.py install --prefix=/usr/local **Windows** In order to compile the C code in Nanny you need a Windows version of the gcc compiler. MinGW (Minimalist GNU for Windows) contains gcc and has been used to successfully compile Nanny on Windows. Install MinGW and add it to your system path. Then install Nanny with the commands:: python setup.py build --compiler=mingw32 python setup.py install **Post install** After you have installed Nanny, run the suite of unit tests:: >>> import nanny >>> nanny.test() Ran 1 tests in 0.008s OK From njs at pobox.com Fri Nov 19 13:55:30 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Nov 2010 10:55:30 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 10:33 AM, Keith Goodman wrote: > Nanny uses the magic of Cython to give you a faster, drop-in replacement for > the NaN functions in NumPy and SciPy. Neat! Why not make this a patch to numpy/scipy instead? > Nanny uses a separate Cython function for each combination of ndim, dtype, and > axis. You can get rid of a lot of overhead (useful in an inner loop, e.g.) 
by > directly importing the function that matches your problem:: > > ? ?>> arr = np.random.rand(10, 10) > ? ?>> from nansum import nansum_2d_float64_axis1 If this is really useful, then better to provide a function that finds the correct function for you? best_nansum = ny.get_best_nansum(ary[0, :, :], axis=1) for i in xrange(ary.shape[0]): best_nansum(ary[i, :, :], axis=1) > - functions: nansum > - Operating systems: 64-bit (accumulator for int32 is hard coded to int64) > - dtype: int32, int64, float64 > - ndim: 1, 2, and 3 What does it even mean to do NaN operations on integers? (I'd sometimes find it *really convenient* if there were a NaN value for standard computer integers... but there isn't?) -- Nathaniel From ben.root at ou.edu Fri Nov 19 14:12:09 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 19 Nov 2010 13:12:09 -0600 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 12:55 PM, Nathaniel Smith wrote: > On Fri, Nov 19, 2010 at 10:33 AM, Keith Goodman > wrote: > > Nanny uses the magic of Cython to give you a faster, drop-in replacement > for > > the NaN functions in NumPy and SciPy. > > Neat! > > Why not make this a patch to numpy/scipy instead? > > > Nanny uses a separate Cython function for each combination of ndim, > dtype, and > > axis. You can get rid of a lot of overhead (useful in an inner loop, > e.g.) by > > directly importing the function that matches your problem:: > > > > >> arr = np.random.rand(10, 10) > > >> from nansum import nansum_2d_float64_axis1 > > If this is really useful, then better to provide a function that finds > the correct function for you? > > best_nansum = ny.get_best_nansum(ary[0, :, :], axis=1) > for i in xrange(ary.shape[0]): > best_nansum(ary[i, :, :], axis=1) > > > - functions: nansum > > - Operating systems: 64-bit (accumulator for int32 is hard coded to > int64) > > - dtype: int32, int64, float64 > > - ndim: 1, 2, and 3 > > What does it even mean to do NaN operations on integers? (I'd > sometimes find it *really convenient* if there were a NaN value for > standard computer integers... but there isn't?) > > -- Nathaniel > That's why I use masked arrays. It is dtype agnostic. I am curious if there are any lessons that were learned in making Nanny that could be applied to the masked array functions? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Nov 19 14:19:57 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 11:19:57 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 10:55 AM, Nathaniel Smith wrote: > On Fri, Nov 19, 2010 at 10:33 AM, Keith Goodman wrote: >> Nanny uses the magic of Cython to give you a faster, drop-in replacement for >> the NaN functions in NumPy and SciPy. > > Neat! > > Why not make this a patch to numpy/scipy instead? My guess is that having separate underlying functions for each dtype, ndim, and axis would be a nightmare for a large project like Numpy. But manageable for a focused project like nanny. >> Nanny uses a separate Cython function for each combination of ndim, dtype, and >> axis. You can get rid of a lot of overhead (useful in an inner loop, e.g.) by >> directly importing the function that matches your problem:: >> >> ? ?>> arr = np.random.rand(10, 10) >> ? 
?>> from nansum import nansum_2d_float64_axis1 > > If this is really useful, then better to provide a function that finds > the correct function for you? > > best_nansum = ny.get_best_nansum(ary[0, :, :], axis=1) > for i in xrange(ary.shape[0]): > ? ?best_nansum(ary[i, :, :], axis=1) That would be useful. It is what nanny.nansum does but it returns the sum instead of the function. >> - functions: nansum >> - Operating systems: 64-bit (accumulator for int32 is hard coded to int64) >> - dtype: int32, int64, float64 >> - ndim: 1, 2, and 3 > > What does it even mean to do NaN operations on integers? (I'd > sometimes find it *really convenient* if there were a NaN value for > standard computer integers... but there isn't?) Well, sometimes you write functions without knowing the dtype of the input. From kwgoodman at gmail.com Fri Nov 19 14:35:29 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 11:35:29 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 11:12 AM, Benjamin Root wrote: > That's why I use masked arrays.? It is dtype agnostic. > > I am curious if there are any lessons that were learned in making Nanny that > could be applied to the masked array functions? I suppose you could write a cython function that operates on masked arrays. But other than that, I can't think of any lessons. All I can think about is speed: >> x = np.ma.array([[1, 2], [3, 4]], mask=[[0, 1], [1, 0]]) >> timeit np.sum(x) 10000 loops, best of 3: 25.1 us per loop >> a = np.array([[1, np.nan], [np.nan, 4]]) >> timeit ny.nansum(a) 100000 loops, best of 3: 3.11 us per loop >> from nansum import nansum_2d_float64_axisNone >> timeit nansum_2d_float64_axisNone(a) 1000000 loops, best of 3: 395 ns per loop From josef.pktd at gmail.com Fri Nov 19 15:10:24 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Nov 2010 15:10:24 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 2:35 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 11:12 AM, Benjamin Root wrote: > >> That's why I use masked arrays.? It is dtype agnostic. >> >> I am curious if there are any lessons that were learned in making Nanny that >> could be applied to the masked array functions? > > I suppose you could write a cython function that operates on masked > arrays. But other than that, I can't think of any lessons. All I can > think about is speed: > >>> x = np.ma.array([[1, 2], [3, 4]], mask=[[0, 1], [1, 0]]) >>> timeit np.sum(x) > 10000 loops, best of 3: 25.1 us per loop >>> a = np.array([[1, np.nan], [np.nan, 4]]) >>> timeit ny.nansum(a) > 100000 loops, best of 3: 3.11 us per loop >>> from nansum import nansum_2d_float64_axisNone >>> timeit nansum_2d_float64_axisNone(a) > 1000000 loops, best of 3: 395 ns per loop What's the speed advantage of nanny compared to np.nansum that you have if the arrays are larger, say (1000,10) or (10000,100) axis=0 ? 
Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Fri Nov 19 15:19:13 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 19 Nov 2010 20:19:13 +0000 (UTC) Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions References: Message-ID: Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: [clip] > My guess is that having separate underlying functions for each dtype, > ndim, and axis would be a nightmare for a large project like Numpy. But > manageable for a focused project like nanny. Might be easier to migrate the nan* functions to using Ufuncs. Unless I'm missing something, np.nanmax -> np.fmax.reduce np.nanmin -> np.fmin.reduce For `nansum`, we'd need to add an ufunc `nanadd`, and for `nanargmax/min`, we'd need `argfmin/fmax'. -- Pauli Virtanen From kwgoodman at gmail.com Fri Nov 19 15:19:56 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 12:19:56 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 12:10 PM, wrote: > What's the speed advantage of nanny compared to np.nansum that you > have if the arrays are larger, say (1000,10) or (10000,100) axis=0 ? Good point. In the small examples I showed so far maybe the speed up was all in overhead. Fortunately, that's not the case: >> arr = np.random.rand(1000, 1000) >> timeit np.nansum(arr) 100 loops, best of 3: 4.79 ms per loop >> timeit ny.nansum(arr) 1000 loops, best of 3: 1.53 ms per loop >> arr[arr > 0.5] = np.nan >> timeit np.nansum(arr) 10 loops, best of 3: 44.5 ms per loop >> timeit ny.nansum(arr) 100 loops, best of 3: 6.18 ms per loop >> timeit np.nansum(arr, axis=0) 10 loops, best of 3: 52.3 ms per loop >> timeit ny.nansum(arr, axis=0) 100 loops, best of 3: 12.2 ms per loop np.nansum makes a copy of the input array and makes a mask (another copy) and then uses the mask to set the NaNs to zero in the copy. So not only is nanny faster, but it uses less memory. From kwgoodman at gmail.com Fri Nov 19 15:29:00 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 12:29:00 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: > Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: > [clip] >> My guess is that having separate underlying functions for each dtype, >> ndim, and axis would be a nightmare for a large project like Numpy. But >> manageable for a focused project like nanny. > > Might be easier to migrate the nan* functions to using Ufuncs. > > Unless I'm missing something, > > ? ? ? ?np.nanmax -> np.fmax.reduce > ? ? ? ?np.nanmin -> np.fmin.reduce > > For `nansum`, we'd need to add an ufunc `nanadd`, and for > `nanargmax/min`, we'd need `argfmin/fmax'. How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, please. 
>> arr = np.random.rand(1000, 1000) >> arr[arr > 0.5] = np.nan >> np.nanmax(arr) 0.49999625409581072 >> np.fmax.reduce(arr, axis=None) TypeError: an integer is required >> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) 0.49999625409581072 >> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) 100 loops, best of 3: 12.7 ms per loop >> timeit np.nanmax(arr) 10 loops, best of 3: 39.6 ms per loop >> timeit np.nanmax(arr, axis=0) 10 loops, best of 3: 46.5 ms per loop >> timeit np.fmax.reduce(arr, axis=0) 100 loops, best of 3: 12.7 ms per loop From kwgoodman at gmail.com Fri Nov 19 15:50:02 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 12:50:02 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: >> [clip] >>> My guess is that having separate underlying functions for each dtype, >>> ndim, and axis would be a nightmare for a large project like Numpy. But >>> manageable for a focused project like nanny. >> >> Might be easier to migrate the nan* functions to using Ufuncs. >> >> Unless I'm missing something, >> >> ? ? ? ?np.nanmax -> np.fmax.reduce >> ? ? ? ?np.nanmin -> np.fmin.reduce >> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for >> `nanargmax/min`, we'd need `argfmin/fmax'. > > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, please. > >>> arr = np.random.rand(1000, 1000) >>> arr[arr > 0.5] = np.nan >>> np.nanmax(arr) > ? 0.49999625409581072 >>> np.fmax.reduce(arr, axis=None) > > TypeError: an integer is required >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) > ? 0.49999625409581072 > >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) > 100 loops, best of 3: 12.7 ms per loop >>> timeit np.nanmax(arr) > 10 loops, best of 3: 39.6 ms per loop > >>> timeit np.nanmax(arr, axis=0) > 10 loops, best of 3: 46.5 ms per loop >>> timeit np.fmax.reduce(arr, axis=0) > 100 loops, best of 3: 12.7 ms per loop Cython is faster than np.fmax.reduce. I wrote a cython version of np.nanmax, called nanmax below. (It only handles the 2d, float64, axis=None case, but since the array is large I don't think that explains the time difference). Note that fmax.reduce is slower than np.nanmax when there are no NaNs: >> arr = np.random.rand(1000, 1000) >> timeit np.nanmax(arr) 100 loops, best of 3: 5.82 ms per loop >> timeit np.fmax.reduce(np.fmax.reduce(arr)) 100 loops, best of 3: 9.14 ms per loop >> timeit nanmax(arr) 1000 loops, best of 3: 1.17 ms per loop >> arr[arr > 0.5] = np.nan >> timeit np.nanmax(arr) 10 loops, best of 3: 45.5 ms per loop >> timeit np.fmax.reduce(np.fmax.reduce(arr)) 100 loops, best of 3: 12.7 ms per loop >> timeit nanmax(arr) 1000 loops, best of 3: 1.17 ms per loop From Chris.Barker at noaa.gov Fri Nov 19 17:18:44 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 19 Nov 2010 14:18:44 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: <4CE6F7C4.3040507@noaa.gov> On 11/19/10 11:19 AM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 10:55 AM, Nathaniel Smith wrote: >> Why not make this a patch to numpy/scipy instead? > > My guess is that having separate underlying functions for each dtype, > ndim, and axis would be a nightmare for a large project like Numpy. 
True, but: 1) Having special-cases for the most common cases is not such a bad idea. 2) could one use some sort of templating approach to get all the dtypes and such that you want? 3) as for number of dimensions, I don't think to would be to hard to generalize that -- at least for contiguous arrays. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Fri Nov 19 19:04:26 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Nov 2010 17:04:26 -0700 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: <4CE6F7C4.3040507@noaa.gov> References: <4CE6F7C4.3040507@noaa.gov> Message-ID: On Fri, Nov 19, 2010 at 3:18 PM, Christopher Barker wrote: > On 11/19/10 11:19 AM, Keith Goodman wrote: > > On Fri, Nov 19, 2010 at 10:55 AM, Nathaniel Smith wrote: > >> Why not make this a patch to numpy/scipy instead? > > > > My guess is that having separate underlying functions for each dtype, > > ndim, and axis would be a nightmare for a large project like Numpy. > > True, but: > > 1) Having special-cases for the most common cases is not such a bad idea. > > 2) could one use some sort of templating approach to get all the dtypes > and such that you want? > > 3) as for number of dimensions, I don't think to would be to hard to > generalize that -- at least for contiguous arrays. > > Note that the fmax/fmin versions can be sped up in the same way as sum.reduce was. Also, you should pass the flattened array to the routine for the axis=None case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From liukis at usc.edu Fri Nov 19 19:09:56 2010 From: liukis at usc.edu (Maria Liukis) Date: Fri, 19 Nov 2010 16:09:56 -0800 Subject: [Numpy-discussion] changing 'nan' string for np.nan in np.savetxt Message-ID: <543575E7-3BC0-440E-BEED-7F4D9411A13B@usc.edu> Hello everybody, I was wondering if there is an elegant way of overwriting 'nan' string representation with 'NaN' when saving numpy array containing np.nan values with numpy.savetxt() function. numpy.set_printoptions(nanstr='NaN') setting (which has default value of 'NaN' already) does not seem to have any effect when writing array to the file, only when printing to the screen: $ python Python 2.5.2 (r252:60911, Sep 30 2008, 15:42:03) [GCC 4.3.2 20080917 (Red Hat 4.3.2-4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy as np >>> print np.__version__ 1.3.0 >>> np.get_printoptions() {'infstr': 'Inf', 'threshold': 1000, 'suppress': False, 'linewidth': 75, 'edgeitems': 3, 'precision': 8, 'nanstr': 'NaN'} >>> >>> a = np.array([1, 2, 3, np.nan]) >>> a array([ 1., 2., 3., NaN]) >>> np.savetxt('testa.txt', a) >>> $ cat testa.txt 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 nan $ Many thanks, Masha From charlesr.harris at gmail.com Fri Nov 19 22:19:20 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Nov 2010 20:19:20 -0700 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman > wrote: > > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: > >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: > >> [clip] > >>> My guess is that having separate underlying functions for each dtype, > >>> ndim, and axis would be a nightmare for a large project like Numpy. But > >>> manageable for a focused project like nanny. > >> > >> Might be easier to migrate the nan* functions to using Ufuncs. > >> > >> Unless I'm missing something, > >> > >> np.nanmax -> np.fmax.reduce > >> np.nanmin -> np.fmin.reduce > >> > >> For `nansum`, we'd need to add an ufunc `nanadd`, and for > >> `nanargmax/min`, we'd need `argfmin/fmax'. > > > > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, > please. > > > >>> arr = np.random.rand(1000, 1000) > >>> arr[arr > 0.5] = np.nan > >>> np.nanmax(arr) > > 0.49999625409581072 > >>> np.fmax.reduce(arr, axis=None) > > > > TypeError: an integer is required > >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) > > 0.49999625409581072 > > > >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) > > 100 loops, best of 3: 12.7 ms per loop > >>> timeit np.nanmax(arr) > > 10 loops, best of 3: 39.6 ms per loop > > > >>> timeit np.nanmax(arr, axis=0) > > 10 loops, best of 3: 46.5 ms per loop > >>> timeit np.fmax.reduce(arr, axis=0) > > 100 loops, best of 3: 12.7 ms per loop > > Cython is faster than np.fmax.reduce. > > I wrote a cython version of np.nanmax, called nanmax below. (It only > handles the 2d, float64, axis=None case, but since the array is large > I don't think that explains the time difference). > > Note that fmax.reduce is slower than np.nanmax when there are no NaNs: > > >> arr = np.random.rand(1000, 1000) > >> timeit np.nanmax(arr) > 100 loops, best of 3: 5.82 ms per loop > >> timeit np.fmax.reduce(np.fmax.reduce(arr)) > 100 loops, best of 3: 9.14 ms per loop > >> timeit nanmax(arr) > 1000 loops, best of 3: 1.17 ms per loop > > >> arr[arr > 0.5] = np.nan > > >> timeit np.nanmax(arr) > 10 loops, best of 3: 45.5 ms per loop > >> timeit np.fmax.reduce(np.fmax.reduce(arr)) > 100 loops, best of 3: 12.7 ms per loop > >> timeit nanmax(arr) > 1000 loops, best of 3: 1.17 ms per loop > There seem to be some odd hardware/compiler dependencies. I get quite a different pattern of times: In [1]: arr = np.random.rand(1000, 1000) In [2]: timeit np.nanmax(arr) 100 loops, best of 3: 10.4 ms per loop In [3]: timeit np.fmax.reduce(arr.flat) 100 loops, best of 3: 2.09 ms per loop In [4]: arr[arr > 0.5] = np.nan In [5]: timeit np.nanmax(arr) 100 loops, best of 3: 12.9 ms per loop In [6]: timeit np.fmax.reduce(arr.flat) 100 loops, best of 3: 7.09 ms per loop I've tweaked fmax with the reduce loop option but the nanmax times don't look like yours at all. 
I'm also a bit surprised that you don't see any difference in times when the array contains a lot of nans. I'm running on AMD Phenom, gcc 4.4.5. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Nov 19 22:37:17 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Nov 2010 20:37:17 -0700 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 8:19 PM, Charles R Harris wrote: > > > On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman wrote: > >> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman >> wrote: >> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: >> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: >> >> [clip] >> >>> My guess is that having separate underlying functions for each dtype, >> >>> ndim, and axis would be a nightmare for a large project like Numpy. >> But >> >>> manageable for a focused project like nanny. >> >> >> >> Might be easier to migrate the nan* functions to using Ufuncs. >> >> >> >> Unless I'm missing something, >> >> >> >> np.nanmax -> np.fmax.reduce >> >> np.nanmin -> np.fmin.reduce >> >> >> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for >> >> `nanargmax/min`, we'd need `argfmin/fmax'. >> > >> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, >> please. >> > >> >>> arr = np.random.rand(1000, 1000) >> >>> arr[arr > 0.5] = np.nan >> >>> np.nanmax(arr) >> > 0.49999625409581072 >> >>> np.fmax.reduce(arr, axis=None) >> > >> > TypeError: an integer is required >> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) >> > 0.49999625409581072 >> > >> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) >> > 100 loops, best of 3: 12.7 ms per loop >> >>> timeit np.nanmax(arr) >> > 10 loops, best of 3: 39.6 ms per loop >> > >> >>> timeit np.nanmax(arr, axis=0) >> > 10 loops, best of 3: 46.5 ms per loop >> >>> timeit np.fmax.reduce(arr, axis=0) >> > 100 loops, best of 3: 12.7 ms per loop >> >> Cython is faster than np.fmax.reduce. >> >> I wrote a cython version of np.nanmax, called nanmax below. (It only >> handles the 2d, float64, axis=None case, but since the array is large >> I don't think that explains the time difference). >> >> Note that fmax.reduce is slower than np.nanmax when there are no NaNs: >> >> >> arr = np.random.rand(1000, 1000) >> >> timeit np.nanmax(arr) >> 100 loops, best of 3: 5.82 ms per loop >> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) >> 100 loops, best of 3: 9.14 ms per loop >> >> timeit nanmax(arr) >> 1000 loops, best of 3: 1.17 ms per loop >> >> >> arr[arr > 0.5] = np.nan >> >> >> timeit np.nanmax(arr) >> 10 loops, best of 3: 45.5 ms per loop >> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) >> 100 loops, best of 3: 12.7 ms per loop >> >> timeit nanmax(arr) >> 1000 loops, best of 3: 1.17 ms per loop >> > > There seem to be some odd hardware/compiler dependencies. I get quite a > different pattern of times: > > In [1]: arr = np.random.rand(1000, 1000) > > In [2]: timeit np.nanmax(arr) > 100 loops, best of 3: 10.4 ms per loop > > In [3]: timeit np.fmax.reduce(arr.flat) > 100 loops, best of 3: 2.09 ms per loop > > In [4]: arr[arr > 0.5] = np.nan > > In [5]: timeit np.nanmax(arr) > 100 loops, best of 3: 12.9 ms per loop > > In [6]: timeit np.fmax.reduce(arr.flat) > 100 loops, best of 3: 7.09 ms per loop > > > I've tweaked fmax with the reduce loop option but the nanmax times don't > look like yours at all. 
I'm also a bit surprised that > you don't see any difference in times when the array contains a lot of > nans. I'm running on AMD Phenom, gcc 4.4.5. > > However, I noticed that the build wants to be -O1 by default. I have my own CFLAGS that make it -O2, but It looks like ubuntu's python might be built with -O1. Hmm. That could certainly cause some odd timings. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Nov 19 22:42:33 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 19:42:33 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 7:19 PM, Charles R Harris wrote: > > > On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman wrote: >> >> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman >> wrote: >> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: >> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: >> >> [clip] >> >>> My guess is that having separate underlying functions for each dtype, >> >>> ndim, and axis would be a nightmare for a large project like Numpy. >> >>> But >> >>> manageable for a focused project like nanny. >> >> >> >> Might be easier to migrate the nan* functions to using Ufuncs. >> >> >> >> Unless I'm missing something, >> >> >> >> ? ? ? ?np.nanmax -> np.fmax.reduce >> >> ? ? ? ?np.nanmin -> np.fmin.reduce >> >> >> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for >> >> `nanargmax/min`, we'd need `argfmin/fmax'. >> > >> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, >> > please. >> > >> >>> arr = np.random.rand(1000, 1000) >> >>> arr[arr > 0.5] = np.nan >> >>> np.nanmax(arr) >> > ? 0.49999625409581072 >> >>> np.fmax.reduce(arr, axis=None) >> > >> > TypeError: an integer is required >> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) >> > ? 0.49999625409581072 >> > >> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) >> > 100 loops, best of 3: 12.7 ms per loop >> >>> timeit np.nanmax(arr) >> > 10 loops, best of 3: 39.6 ms per loop >> > >> >>> timeit np.nanmax(arr, axis=0) >> > 10 loops, best of 3: 46.5 ms per loop >> >>> timeit np.fmax.reduce(arr, axis=0) >> > 100 loops, best of 3: 12.7 ms per loop >> >> Cython is faster than np.fmax.reduce. >> >> I wrote a cython version of np.nanmax, called nanmax below. (It only >> handles the 2d, float64, axis=None case, but since the array is large >> I don't think that explains the time difference). >> >> Note that fmax.reduce is slower than np.nanmax when there are no NaNs: >> >> >> arr = np.random.rand(1000, 1000) >> >> timeit np.nanmax(arr) >> 100 loops, best of 3: 5.82 ms per loop >> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) >> 100 loops, best of 3: 9.14 ms per loop >> >> timeit nanmax(arr) >> 1000 loops, best of 3: 1.17 ms per loop >> >> >> arr[arr > 0.5] = np.nan >> >> >> timeit np.nanmax(arr) >> 10 loops, best of 3: 45.5 ms per loop >> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) >> 100 loops, best of 3: 12.7 ms per loop >> >> timeit nanmax(arr) >> 1000 loops, best of 3: 1.17 ms per loop > > There seem to be some odd hardware/compiler dependencies. 
I get quite a > different pattern of times: > > In [1]: arr = np.random.rand(1000, 1000) > > In [2]: timeit np.nanmax(arr) > 100 loops, best of 3: 10.4 ms per loop > > In [3]: timeit np.fmax.reduce(arr.flat) > 100 loops, best of 3: 2.09 ms per loop > > In [4]: arr[arr > 0.5] = np.nan > > In [5]: timeit np.nanmax(arr) > 100 loops, best of 3: 12.9 ms per loop > > In [6]: timeit np.fmax.reduce(arr.flat) > 100 loops, best of 3: 7.09 ms per loop > > > I've tweaked fmax with the reduce loop option but the nanmax times don't > look like yours at all. I'm also a bit surprised that > you don't see any difference in times when the array contains a lot of nans. > I'm running on AMD Phenom, gcc 4.4.5. Ubuntu 10.04 64 bit, numpy 1.4.1. Difference in which times? nanny.nanmax with and wintout NaNs? The code doesn't explictily check for NaNs (it does check for all NaNs). It basically loops through the data and does: allnan = 1 ai = ai[i,k] if ai > amax: amax = ai allnan = 0 I should make a benchmark suite. From josef.pktd at gmail.com Fri Nov 19 22:51:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Nov 2010 22:51:12 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 10:42 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 7:19 PM, Charles R Harris > wrote: >> >> >> On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman wrote: >>> >>> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman >>> wrote: >>> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: >>> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: >>> >> [clip] >>> >>> My guess is that having separate underlying functions for each dtype, >>> >>> ndim, and axis would be a nightmare for a large project like Numpy. >>> >>> But >>> >>> manageable for a focused project like nanny. >>> >> >>> >> Might be easier to migrate the nan* functions to using Ufuncs. >>> >> >>> >> Unless I'm missing something, >>> >> >>> >> ? ? ? ?np.nanmax -> np.fmax.reduce >>> >> ? ? ? ?np.nanmin -> np.fmin.reduce >>> >> >>> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for >>> >> `nanargmax/min`, we'd need `argfmin/fmax'. >>> > >>> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, >>> > please. >>> > >>> >>> arr = np.random.rand(1000, 1000) >>> >>> arr[arr > 0.5] = np.nan >>> >>> np.nanmax(arr) >>> > ? 0.49999625409581072 >>> >>> np.fmax.reduce(arr, axis=None) >>> > >>> > TypeError: an integer is required >>> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) >>> > ? 0.49999625409581072 >>> > >>> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) >>> > 100 loops, best of 3: 12.7 ms per loop >>> >>> timeit np.nanmax(arr) >>> > 10 loops, best of 3: 39.6 ms per loop >>> > >>> >>> timeit np.nanmax(arr, axis=0) >>> > 10 loops, best of 3: 46.5 ms per loop >>> >>> timeit np.fmax.reduce(arr, axis=0) >>> > 100 loops, best of 3: 12.7 ms per loop >>> >>> Cython is faster than np.fmax.reduce. >>> >>> I wrote a cython version of np.nanmax, called nanmax below. (It only >>> handles the 2d, float64, axis=None case, but since the array is large >>> I don't think that explains the time difference). 
>>> >>> Note that fmax.reduce is slower than np.nanmax when there are no NaNs: >>> >>> >> arr = np.random.rand(1000, 1000) >>> >> timeit np.nanmax(arr) >>> 100 loops, best of 3: 5.82 ms per loop >>> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) >>> 100 loops, best of 3: 9.14 ms per loop >>> >> timeit nanmax(arr) >>> 1000 loops, best of 3: 1.17 ms per loop >>> >>> >> arr[arr > 0.5] = np.nan >>> >>> >> timeit np.nanmax(arr) >>> 10 loops, best of 3: 45.5 ms per loop >>> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) >>> 100 loops, best of 3: 12.7 ms per loop >>> >> timeit nanmax(arr) >>> 1000 loops, best of 3: 1.17 ms per loop >> >> There seem to be some odd hardware/compiler dependencies. I get quite a >> different pattern of times: >> >> In [1]: arr = np.random.rand(1000, 1000) >> >> In [2]: timeit np.nanmax(arr) >> 100 loops, best of 3: 10.4 ms per loop >> >> In [3]: timeit np.fmax.reduce(arr.flat) >> 100 loops, best of 3: 2.09 ms per loop >> >> In [4]: arr[arr > 0.5] = np.nan >> >> In [5]: timeit np.nanmax(arr) >> 100 loops, best of 3: 12.9 ms per loop >> >> In [6]: timeit np.fmax.reduce(arr.flat) >> 100 loops, best of 3: 7.09 ms per loop >> >> >> I've tweaked fmax with the reduce loop option but the nanmax times don't >> look like yours at all. I'm also a bit surprised that >> you don't see any difference in times when the array contains a lot of nans. >> I'm running on AMD Phenom, gcc 4.4.5. > > Ubuntu 10.04 64 bit, numpy 1.4.1. > > Difference in which times? nanny.nanmax with and wintout NaNs? The > code doesn't explictily check for NaNs (it does check for all NaNs). > It basically loops through the data and does: > > allnan = 1 > ai = ai[i,k] > if ai > amax: > ? ?amax = ai > ? ?allnan = 0 does this give you the correct answer? >>> 1>np.nan False What's the starting value for amax? -inf? Josef > > I should make a benchmark suite. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Nov 19 22:57:04 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Nov 2010 19:57:04 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 7:51 PM, wrote: > On Fri, Nov 19, 2010 at 10:42 PM, Keith Goodman wrote: >> It basically loops through the data and does: >> >> allnan = 1 >> ai = ai[i,k] >> if ai > amax: >> ? ?amax = ai >> ? ?allnan = 0 > > does this give you the correct answer? > >>>> 1>np.nan > False Yes -- notice he does the comparison the other way, and >>> 1 < np.nan False (All comparisons involving NaN return false, including, famously, NaN == NaN, which is why we need np.isnan.) -- Nathaniel From kwgoodman at gmail.com Fri Nov 19 22:59:55 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 19:59:55 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 7:51 PM, wrote: > > does this give you the correct answer? > >>>> 1>np.nan > False > > What's the starting value for amax? -inf? Because "1 > np.nan" is False, the current running max does not get updated, which is what we want. 
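A minimal pure-Python sketch of that trick (an illustration only, not the actual Nanny source, which is Cython and covers several dtypes and axes), assuming a 1-D float64 array a:

import numpy as np

def naive_nanmax_1d(a):
    # Any comparison with NaN is False, so a NaN entry never updates amax.
    amax = -np.inf
    allnan = True
    for ai in a:
        if ai > amax:
            amax = ai
        if ai == ai:  # False only when ai is NaN
            allnan = False
    return np.nan if allnan else amax

For np.array([1.0, np.nan, 0.5]) this returns 1.0, the same as np.nanmax, without ever calling np.isnan.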
>> import nanny as ny >> np.nanmax([1, np.nan]) 1.0 >> np.nanmax([np.nan, 1]) 1.0 >> np.nanmax([np.nan, 1, np.nan]) 1.0 Starting value is -np.inf for floats and stuff like this for ints: cdef np.int32_t MININT32 = np.iinfo(np.int32).min cdef np.int64_t MININT64 = np.iinfo(np.int64).min Numpy does this: >> np.nanmax([]) ValueError: zero-size array to ufunc.reduce without identity Nanny does this: >> ny.nanmax([]) nan So I haven't taken care of that corner case yet. I'll commit nanmax to github in case anyone wants to give it a try. From charlesr.harris at gmail.com Fri Nov 19 23:05:03 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Nov 2010 21:05:03 -0700 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 8:42 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 7:19 PM, Charles R Harris > wrote: > > > > > > On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman > wrote: > >> > >> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman > >> wrote: > >> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen wrote: > >> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote: > >> >> [clip] > >> >>> My guess is that having separate underlying functions for each > dtype, > >> >>> ndim, and axis would be a nightmare for a large project like Numpy. > >> >>> But > >> >>> manageable for a focused project like nanny. > >> >> > >> >> Might be easier to migrate the nan* functions to using Ufuncs. > >> >> > >> >> Unless I'm missing something, > >> >> > >> >> np.nanmax -> np.fmax.reduce > >> >> np.nanmin -> np.fmin.reduce > >> >> > >> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for > >> >> `nanargmax/min`, we'd need `argfmin/fmax'. > >> > > >> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, > >> > please. > >> > > >> >>> arr = np.random.rand(1000, 1000) > >> >>> arr[arr > 0.5] = np.nan > >> >>> np.nanmax(arr) > >> > 0.49999625409581072 > >> >>> np.fmax.reduce(arr, axis=None) > >> > > >> > TypeError: an integer is required > >> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) > >> > 0.49999625409581072 > >> > > >> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0) > >> > 100 loops, best of 3: 12.7 ms per loop > >> >>> timeit np.nanmax(arr) > >> > 10 loops, best of 3: 39.6 ms per loop > >> > > >> >>> timeit np.nanmax(arr, axis=0) > >> > 10 loops, best of 3: 46.5 ms per loop > >> >>> timeit np.fmax.reduce(arr, axis=0) > >> > 100 loops, best of 3: 12.7 ms per loop > >> > >> Cython is faster than np.fmax.reduce. > >> > >> I wrote a cython version of np.nanmax, called nanmax below. (It only > >> handles the 2d, float64, axis=None case, but since the array is large > >> I don't think that explains the time difference). > >> > >> Note that fmax.reduce is slower than np.nanmax when there are no NaNs: > >> > >> >> arr = np.random.rand(1000, 1000) > >> >> timeit np.nanmax(arr) > >> 100 loops, best of 3: 5.82 ms per loop > >> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) > >> 100 loops, best of 3: 9.14 ms per loop > >> >> timeit nanmax(arr) > >> 1000 loops, best of 3: 1.17 ms per loop > >> > >> >> arr[arr > 0.5] = np.nan > >> > >> >> timeit np.nanmax(arr) > >> 10 loops, best of 3: 45.5 ms per loop > >> >> timeit np.fmax.reduce(np.fmax.reduce(arr)) > >> 100 loops, best of 3: 12.7 ms per loop > >> >> timeit nanmax(arr) > >> 1000 loops, best of 3: 1.17 ms per loop > > > > There seem to be some odd hardware/compiler dependencies. 
I get quite a > > different pattern of times: > > > > In [1]: arr = np.random.rand(1000, 1000) > > > > In [2]: timeit np.nanmax(arr) > > 100 loops, best of 3: 10.4 ms per loop > > > > In [3]: timeit np.fmax.reduce(arr.flat) > > 100 loops, best of 3: 2.09 ms per loop > > > > In [4]: arr[arr > 0.5] = np.nan > > > > In [5]: timeit np.nanmax(arr) > > 100 loops, best of 3: 12.9 ms per loop > > > > In [6]: timeit np.fmax.reduce(arr.flat) > > 100 loops, best of 3: 7.09 ms per loop > > > > > > I've tweaked fmax with the reduce loop option but the nanmax times don't > > look like yours at all. I'm also a bit surprised that > > you don't see any difference in times when the array contains a lot of > nans. > > I'm running on AMD Phenom, gcc 4.4.5. > > Ubuntu 10.04 64 bit, numpy 1.4.1. > > Difference in which times? nanny.nanmax with and wintout NaNs? The > code doesn't explictily check for NaNs (it does check for all NaNs). > It basically loops through the data and does: > > allnan = 1 > ai = ai[i,k] > if ai > amax: > amax = ai > allnan = 0 > > I should make a benchmark suite. > _ > This doesn't look right: @cython.boundscheck(False) @cython.wraparound(False) def nanmax_2d_float64_axisNone(np.ndarray[np.float64_t, ndim=2] a): "nanmax of 2d numpy array with dtype=np.float64 along axis=None." cdef Py_ssize_t i, j cdef int arow = a.shape[0], acol = a.shape[1], allnan = 1 cdef np.float64_t amax = 0, aij for i in range(arow): for j in range(acol): aij = a[i,j] if aij == aij: amax += aij allnan = 0 if allnan == 0: return np.float64(amax) else: return NAN It's doing a sum, not a comparison. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Nov 19 23:07:29 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 20:07:29 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 8:05 PM, Charles R Harris wrote: > This doesn't look right: > > @cython.boundscheck(False) > @cython.wraparound(False) > def nanmax_2d_float64_axisNone(np.ndarray[np.float64_t, ndim=2] a): > ????"nanmax of 2d numpy array with dtype=np.float64 along axis=None." > ????cdef Py_ssize_t i, j > ????cdef int arow = a.shape[0], acol = a.shape[1], allnan = 1 > ????cdef np.float64_t amax = 0, aij > ????for i in range(arow): > ????????for j in range(acol): > ????????????aij = a[i,j] > ????????????if aij == aij: > ????????????????amax += aij > ????????????????allnan = 0 > ????if allnan == 0: > ????????return np.float64(amax) > ????else: > ????????return NAN > > ?It's doing a sum, not a comparison. That was a placeholder. Looks at the latest commit. Sorry for the confusion. From josef.pktd at gmail.com Fri Nov 19 23:33:11 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Nov 2010 23:33:11 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 10:59 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 7:51 PM, ? wrote: >> >> does this give you the correct answer? >> >>>>> 1>np.nan >> False >> >> What's the starting value for amax? -inf? > > Because "1 > np.nan" is False, the current running max does not get > updated, which is what we want. > >>> import nanny as ny >>> np.nanmax([1, np.nan]) > ? 1.0 >>> np.nanmax([np.nan, 1]) > ? 1.0 >>> np.nanmax([np.nan, 1, np.nan]) > ? 
1.0 > > Starting value is -np.inf for floats and stuff like this for ints: > > cdef np.int32_t MININT32 = np.iinfo(np.int32).min > cdef np.int64_t MININT64 = np.iinfo(np.int64).min That's what I thought halfway through typing the question. >>> -np.inf>-np.inf False If the only value is -np.inf, you will return nan, I guess. >>> np.nanmax([-np.inf, np.nan]) -inf Josef (being picky) > > Numpy does this: > >>> np.nanmax([]) > > ValueError: zero-size array to ufunc.reduce without identity > > Nanny does this: > >>> ny.nanmax([]) > ? nan > > So I haven't taken care of that corner case yet. I'll commit nanmax to > github in case anyone wants to give it a try. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Sat Nov 20 00:29:16 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 19 Nov 2010 21:29:16 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 8:33 PM, wrote: >>>> -np.inf>-np.inf > False > > If the only value is -np.inf, you will return nan, I guess. > >>>> np.nanmax([-np.inf, np.nan]) > -inf That's a great corner case. Thanks, Josef. This looks like it would fix it: change if ai > amax: amax = ai to if ai >= amax: amax = ai From dsdale24 at gmail.com Sat Nov 20 10:01:50 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 20 Nov 2010 10:01:50 -0500 Subject: [Numpy-discussion] problem with numpy/cython on python3, ok with python2 Message-ID: I just installed numpy for both python2 and 3 from an up-to-date checkout of the 1.5.x branch. I am attempting to cythonize the following code with cython-0.13: --- cimport numpy as np import numpy as np def test(): cdef np.ndarray[np.float64_t, ndim=1] ret ret_arr = np.zeros((20,), dtype=np.float64) ret = ret_arr --- I have this setup.py file: --- from distutils.core import setup from distutils.extension import Extension from Cython.Distutils import build_ext import numpy setup( cmdclass = {'build_ext': build_ext}, ext_modules = [ Extension( "test_open", ["test_open.pyx"], include_dirs=[numpy.get_include()] ) ] ) --- When I run "python setup.py build_ext --inplace", everything is fine. When I run "python3 setup.py build_ext --inplace", I get an error: running build_ext cythoning test_open.pyx to test_open.c Error converting Pyrex file to C: ------------------------------------------------------------ ... # For use in situations where ndarray can't replace PyArrayObject*, # like PyArrayObject**. pass ctypedef class numpy.ndarray [object PyArrayObject]: cdef __cythonbufferdefaults__ = {"mode": "strided"} ^ ------------------------------------------------------------ /home/darren/.local/lib/python3.1/site-packages/Cython/Includes/numpy.pxd:173:49: "mode" is not a buffer option Error converting Pyrex file to C: ------------------------------------------------------------ ... 
cimport numpy as np import numpy as np def test(): cdef np.ndarray[np.float64_t, ndim=1] ret ^ ------------------------------------------------------------ /home/darren/temp/test/test_open.pyx:6:8: 'ndarray' is not a type identifier building 'test_open' extension gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/darren/.local/lib/python3.1/site-packages/numpy/core/include -I/usr/include/python3.1 -c test_open.c -o build/temp.linux-x86_64-3.1/test_open.o test_open.c:1: error: #error Do not use this file, it is the result of a failed Cython compilation. error: command 'gcc' failed with exit status 1 Is this a bug, or is there a problem with my example? Thanks, Darren From pav at iki.fi Sat Nov 20 10:10:14 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 20 Nov 2010 15:10:14 +0000 (UTC) Subject: [Numpy-discussion] problem with numpy/cython on python3, ok with python2 References: Message-ID: On Sat, 20 Nov 2010 10:01:50 -0500, Darren Dale wrote: > I just installed numpy for both python2 and 3 from an up-to-date > checkout of the 1.5.x branch. > > I am attempting to cythonize the following code with cython-0.13: I think you should ask on the Cython list -- I don't think this issue invokes any Numpy code. -- Pauli Virtanen From pav at iki.fi Sat Nov 20 15:57:07 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 20 Nov 2010 20:57:07 +0000 (UTC) Subject: [Numpy-discussion] Yet more merges in maintenance/1.5.x Message-ID: Hi, There are more unnecessary merge commits in the git history. Before you do "git push", check git log --oneline --graph git log --oneline --graph upstream/maintenance/1.5.x.. so that you don't push unnecessary stuff. (Also, don't use "git pull".) $?git log --oneline --graph upstream/maintenance/1.5.x * cc8d516 BUG: Fix exception handling for python 3k. * 15b875a Merge branch 'maintenance/1.5.x' of github.com:numpy/numpy into |\ | * de969d7 REL: set version to 1.5.1. * | 3d429f1 Merge branch 'maintenance/1.5.x' of github.com:numpy/numpy in |\ \ | |/ | * 3304791 REL: set version number to 1.5.1rc2 and set released=True. | * e885a7a TST: mark longdouble tests as knownfail on OS X PPC. | * 516aa4a TST: mark longdouble tests for spacing/nextafter as knownfail o | * 08ccb3a TST: silence ldexp overflow warning. | * 8346ba0 BUG: fix issue with incorrect Fortran arch flags. Closes #1399. | * c0bd3df TST: core: disable C99 complex tests also on Solaris if it seem | * c6504f5 TST: core: mark test_ldexp_overflow as known failure on Python | * 082956b TST: core: fix test_fromfile_tofile_seeks to work on Windows (c * | f819262 Merge branch 'maintenance/1.5.x' of github.com:numpy/numpy in |\ \ | |/ | * 29c38c6 REL: set released=False again, and minor fix in paver script. | * 97cb28e REL: set version number to 1.5.1rc1, released=True. | * 4b18dc8 REL: Set start and end tags for the Changelog. | * 2aa0317 REL: make the OS X installer naming scheme correspond to what i | * 3bc3174 BUG: on Windows the sysconfig module does not contain CFLAGS in | * c9f2514 REL: add a note on #1399 to the 1.5.1 release notes. * | c035b13 DOC: recommend to turn on deprecation warnings for Python >= 2. |/ * 427d3fc Merge branch 'maintenance/1.5.x' of github.com:numpy/numpy into |\ | * b55eacd BUG: core: implement a long-int loop for ldexp, for cases where | * 0e792e6 BUG: core: adjust ComplexWarning location frame up by one, so t | * 4e177d3 BUG: get fortran arch flags from C arch flags if available. 
Clo | * c8e1315 BUG: DOC: fix invalid vdot documentation (cherry picked from co From pav at iki.fi Sat Nov 20 16:05:35 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 20 Nov 2010 21:05:35 +0000 (UTC) Subject: [Numpy-discussion] Yet more merges in maintenance/1.5.x References: Message-ID: On Sat, 20 Nov 2010 20:57:07 +0000, Pauli Virtanen wrote: [clip] > There are more unnecessary merge commits in the git history. I did a forced push to clear those up. -- Pauli Virtanen From charlesr.harris at gmail.com Sat Nov 20 16:22:58 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Nov 2010 14:22:58 -0700 Subject: [Numpy-discussion] Yet more merges in maintenance/1.5.x In-Reply-To: References: Message-ID: On Sat, Nov 20, 2010 at 2:05 PM, Pauli Virtanen wrote: > On Sat, 20 Nov 2010 20:57:07 +0000, Pauli Virtanen wrote: > [clip] > > There are more unnecessary merge commits in the git history. > > I did a forced push to clear those up. > > It was a pull from upstream to update that did it, when I noticed I decided not to do a forced push cleanup because, well, it's risky for a public repository. But I'm not sure why the pull didn't just fast forward, I suspect something is off in my local branch. I'll just delete it and co again from upstream. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Sat Nov 20 18:39:59 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 20 Nov 2010 15:39:59 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Fri, Nov 19, 2010 at 7:42 PM, Keith Goodman wrote: > I should make a benchmark suite. >> ny.benchit(verbose=False) Nanny performance benchmark Nanny 0.0.1dev Numpy 1.4.1 Speed is numpy time divided by nanny time NaN means all NaNs Speed Test Shape dtype NaN? 6.6770 nansum(a, axis=-1) (500,500) int64 4.6612 nansum(a, axis=-1) (10000,) float64 9.0351 nansum(a, axis=-1) (500,500) int32 3.0746 nansum(a, axis=-1) (500,500) float64 11.5740 nansum(a, axis=-1) (10000,) int32 6.4484 nansum(a, axis=-1) (10000,) int64 51.3917 nansum(a, axis=-1) (500,500) float64 NaN 13.8692 nansum(a, axis=-1) (10000,) float64 NaN 6.5327 nanmax(a, axis=-1) (500,500) int64 8.8222 nanmax(a, axis=-1) (10000,) float64 0.2059 nanmax(a, axis=-1) (500,500) int32 6.9262 nanmax(a, axis=-1) (500,500) float64 5.0688 nanmax(a, axis=-1) (10000,) int32 6.5605 nanmax(a, axis=-1) (10000,) int64 48.4850 nanmax(a, axis=-1) (500,500) float64 NaN 14.6289 nanmax(a, axis=-1) (10000,) float64 NaN You can also use the makefile to run the benchmark: make bench From wesmckinn at gmail.com Sat Nov 20 18:54:32 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 20 Nov 2010 18:54:32 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sat, Nov 20, 2010 at 6:39 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 7:42 PM, Keith Goodman wrote: >> I should make a benchmark suite. > >>> ny.benchit(verbose=False) > Nanny performance benchmark > ? ?Nanny 0.0.1dev > ? ?Numpy 1.4.1 > ? ?Speed is numpy time divided by nanny time > ? ?NaN means all NaNs > ? Speed ? Test ? ? ? ? ? ? ? ?Shape ? ? ? ?dtype ? ?NaN? > ? 6.6770 ?nansum(a, axis=-1) ?(500,500) ? ?int64 > ? 4.6612 ?nansum(a, axis=-1) ?(10000,) ? ? float64 > ? 9.0351 ?nansum(a, axis=-1) ?(500,500) ? ?int32 > ? 3.0746 ?nansum(a, axis=-1) ?(500,500) ? ?float64 > ?11.5740 ?nansum(a, axis=-1) ?(10000,) ? ? int32 > ? 6.4484 ?nansum(a, axis=-1) ?(10000,) ? ? 
int64 > ?51.3917 ?nansum(a, axis=-1) ?(500,500) ? ?float64 ?NaN > ?13.8692 ?nansum(a, axis=-1) ?(10000,) ? ? float64 ?NaN > ? 6.5327 ?nanmax(a, axis=-1) ?(500,500) ? ?int64 > ? 8.8222 ?nanmax(a, axis=-1) ?(10000,) ? ? float64 > ? 0.2059 ?nanmax(a, axis=-1) ?(500,500) ? ?int32 > ? 6.9262 ?nanmax(a, axis=-1) ?(500,500) ? ?float64 > ? 5.0688 ?nanmax(a, axis=-1) ?(10000,) ? ? int32 > ? 6.5605 ?nanmax(a, axis=-1) ?(10000,) ? ? int64 > ?48.4850 ?nanmax(a, axis=-1) ?(500,500) ? ?float64 ?NaN > ?14.6289 ?nanmax(a, axis=-1) ?(10000,) ? ? float64 ?NaN > > You can also use the makefile to run the benchmark: make bench > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Keith (and others), What would you think about creating a library of mostly Cython-based "domain specific functions"? So stuff like rolling statistical moments, nan* functions like you have here, and all that-- NumPy-array only functions that don't necessarily belong in NumPy or SciPy (but could be included on down the road). You were already talking about this on the statsmodels mailing list for larry. I spent a lot of time writing a bunch of these for pandas over the last couple of years, and I would have relatively few qualms about moving these outside of pandas and introducing a dependency. You could do the same for larry-- then we'd all be relying on the same well-vetted and tested codebase. - Wes From wesmckinn at gmail.com Sat Nov 20 18:56:22 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 20 Nov 2010 18:56:22 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sat, Nov 20, 2010 at 6:54 PM, Wes McKinney wrote: > On Sat, Nov 20, 2010 at 6:39 PM, Keith Goodman wrote: >> On Fri, Nov 19, 2010 at 7:42 PM, Keith Goodman wrote: >>> I should make a benchmark suite. >> >>>> ny.benchit(verbose=False) >> Nanny performance benchmark >> ? ?Nanny 0.0.1dev >> ? ?Numpy 1.4.1 >> ? ?Speed is numpy time divided by nanny time >> ? ?NaN means all NaNs >> ? Speed ? Test ? ? ? ? ? ? ? ?Shape ? ? ? ?dtype ? ?NaN? >> ? 6.6770 ?nansum(a, axis=-1) ?(500,500) ? ?int64 >> ? 4.6612 ?nansum(a, axis=-1) ?(10000,) ? ? float64 >> ? 9.0351 ?nansum(a, axis=-1) ?(500,500) ? ?int32 >> ? 3.0746 ?nansum(a, axis=-1) ?(500,500) ? ?float64 >> ?11.5740 ?nansum(a, axis=-1) ?(10000,) ? ? int32 >> ? 6.4484 ?nansum(a, axis=-1) ?(10000,) ? ? int64 >> ?51.3917 ?nansum(a, axis=-1) ?(500,500) ? ?float64 ?NaN >> ?13.8692 ?nansum(a, axis=-1) ?(10000,) ? ? float64 ?NaN >> ? 6.5327 ?nanmax(a, axis=-1) ?(500,500) ? ?int64 >> ? 8.8222 ?nanmax(a, axis=-1) ?(10000,) ? ? float64 >> ? 0.2059 ?nanmax(a, axis=-1) ?(500,500) ? ?int32 >> ? 6.9262 ?nanmax(a, axis=-1) ?(500,500) ? ?float64 >> ? 5.0688 ?nanmax(a, axis=-1) ?(10000,) ? ? int32 >> ? 6.5605 ?nanmax(a, axis=-1) ?(10000,) ? ? int64 >> ?48.4850 ?nanmax(a, axis=-1) ?(500,500) ? ?float64 ?NaN >> ?14.6289 ?nanmax(a, axis=-1) ?(10000,) ? ? float64 ?NaN >> >> You can also use the makefile to run the benchmark: make bench >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Keith (and others), > > What would you think about creating a library of mostly Cython-based > "domain specific functions"? 
So stuff like rolling statistical > moments, nan* functions like you have here, and all that-- NumPy-array > only functions that don't necessarily belong in NumPy or SciPy (but > could be included on down the road). You were already talking about > this on the statsmodels mailing list for larry. I spent a lot of time > writing a bunch of these for pandas over the last couple of years, and > I would have relatively few qualms about moving these outside of > pandas and introducing a dependency. You could do the same for larry-- > then we'd all be relying on the same well-vetted and tested codebase. > > - Wes > By the way I wouldn't mind pushing all of my datetime-related code (date range generation, date offsets, etc.) into this new library, too. From kwgoodman at gmail.com Sat Nov 20 19:24:44 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 20 Nov 2010 16:24:44 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: > Keith (and others), > > What would you think about creating a library of mostly Cython-based > "domain specific functions"? So stuff like rolling statistical > moments, nan* functions like you have here, and all that-- NumPy-array > only functions that don't necessarily belong in NumPy or SciPy (but > could be included on down the road). You were already talking about > this on the statsmodels mailing list for larry. I spent a lot of time > writing a bunch of these for pandas over the last couple of years, and > I would have relatively few qualms about moving these outside of > pandas and introducing a dependency. You could do the same for larry-- > then we'd all be relying on the same well-vetted and tested codebase. I've started working on moving window statistics cython functions. I plan to make it into a package called Roly (for rolling). The signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, window, axis=-1), etc. I think of Nanny and Roly as two separate packages. A narrow focus is good for a new package. But maybe each package could be a subpackage in a super package? Would the function signatures in Nanny (exact duplicates of the corresponding functions in Numpy and Scipy) work for pandas? I plan to use Nanny in larry. I'll try to get the structure of the Nanny package in place. But if it doesn't attract any interest after that then I may fold it into larry. From charlesr.harris at gmail.com Sat Nov 20 23:41:18 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Nov 2010 21:41:18 -0700 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sat, Nov 20, 2010 at 4:39 PM, Keith Goodman wrote: > On Fri, Nov 19, 2010 at 7:42 PM, Keith Goodman > wrote: > > I should make a benchmark suite. > > >> ny.benchit(verbose=False) > Nanny performance benchmark > Nanny 0.0.1dev > Numpy 1.4.1 > Speed is numpy time divided by nanny time > NaN means all NaNs > Speed Test Shape dtype NaN? 
> 6.6770 nansum(a, axis=-1) (500,500) int64 > 4.6612 nansum(a, axis=-1) (10000,) float64 > 9.0351 nansum(a, axis=-1) (500,500) int32 > 3.0746 nansum(a, axis=-1) (500,500) float64 > 11.5740 nansum(a, axis=-1) (10000,) int32 > 6.4484 nansum(a, axis=-1) (10000,) int64 > 51.3917 nansum(a, axis=-1) (500,500) float64 NaN > 13.8692 nansum(a, axis=-1) (10000,) float64 NaN > 6.5327 nanmax(a, axis=-1) (500,500) int64 > 8.8222 nanmax(a, axis=-1) (10000,) float64 > 0.2059 nanmax(a, axis=-1) (500,500) int32 > 6.9262 nanmax(a, axis=-1) (500,500) float64 > 5.0688 nanmax(a, axis=-1) (10000,) int32 > 6.5605 nanmax(a, axis=-1) (10000,) int64 > 48.4850 nanmax(a, axis=-1) (500,500) float64 NaN > 14.6289 nanmax(a, axis=-1) (10000,) float64 NaN > > Here's what I get using (my current) np.fmax.reduce in place of nanmax. Speed Test Shape dtype NaN? 3.3717 nansum(a, axis=-1) (500,500) int64 5.1639 nansum(a, axis=-1) (10000,) float64 3.8308 nansum(a, axis=-1) (500,500) int32 6.0854 nansum(a, axis=-1) (500,500) float64 8.7821 nansum(a, axis=-1) (10000,) int32 1.1716 nansum(a, axis=-1) (10000,) int64 5.5777 nansum(a, axis=-1) (500,500) float64 NaN 5.8718 nansum(a, axis=-1) (10000,) float64 NaN 0.5419 nanmax(a, axis=-1) (500,500) int64 2.8732 nanmax(a, axis=-1) (10000,) float64 0.0301 nanmax(a, axis=-1) (500,500) int32 2.7437 nanmax(a, axis=-1) (500,500) float64 0.7868 nanmax(a, axis=-1) (10000,) int32 0.5535 nanmax(a, axis=-1) (10000,) int64 2.8715 nanmax(a, axis=-1) (500,500) float64 NaN 2.5937 nanmax(a, axis=-1) (10000,) float64 NaN I think the really small int32 ratio is due to timing granularity. For random ints in the range 0..99 the results are not quite as good for fmax, which I find puzzling. Speed Test Shape dtype NaN? 3.4021 nansum(a, axis=-1) (500,500) int64 5.5913 nansum(a, axis=-1) (10000,) float64 4.4569 nansum(a, axis=-1) (500,500) int32 6.6202 nansum(a, axis=-1) (500,500) float64 7.1847 nansum(a, axis=-1) (10000,) int32 2.0448 nansum(a, axis=-1) (10000,) int64 6.0257 nansum(a, axis=-1) (500,500) float64 NaN 6.3172 nansum(a, axis=-1) (10000,) float64 NaN 0.9598 nanmax(a, axis=-1) (500,500) int64 3.2407 nanmax(a, axis=-1) (10000,) float64 0.0520 nanmax(a, axis=-1) (500,500) int32 3.1954 nanmax(a, axis=-1) (500,500) float64 1.5538 nanmax(a, axis=-1) (10000,) int32 0.3716 nanmax(a, axis=-1) (10000,) int64 3.2372 nanmax(a, axis=-1) (500,500) float64 NaN 2.5633 nanmax(a, axis=-1) (10000,) float64 NaN Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Nov 21 02:04:23 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 21 Nov 2010 15:04:23 +0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 Message-ID: Hi, I am pleased to announce the availability of NumPy 1.5.1. This bug-fix release comes almost 3 months after the 1.5.0 release, it contains no new features compared to 1.5.0. Binaries, sources and release notes can be found at https://sourceforge.net/projects/numpy/files/. Thank you to everyone who contributed to this release. Enjoy, The numpy developers. ========================= NumPy 1.5.1 Release Notes ========================= Numpy 1.5.1 is a bug-fix release with no new features compared to 1.5.0. Numpy source code location changed ================================== Numpy has stopped using SVN as the version control system, and moved to Git. 
The development source code for Numpy can from now on be found at http://github.com/numpy/numpy Note on GCC versions ==================== On non-x86 platforms, Numpy can trigger a bug in the recent GCC compiler versions 4.5.0 and 4.5.1: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45967 We recommend not using these versions of GCC for compiling Numpy on these platforms. Bugs fixed ========== Of the following, #1605 is important for Cython modules. - #937: linalg: lstsq should always return real residual - #1196: lib: fix negative indices in s_ and index_exp - #1287: core: fix uint64 -> Python int cast - #1491: core: richcompare should return Py_NotImplemented when undefined - #1517: lib: close file handles after use in numpy.lib.npyio.* - #1605: core: ensure PEP 3118 buffers can be released in exception handler - #1617: core: fix clongdouble cast to Python complex() - #1625: core: fix detection for ``isfinite`` routine - #1626: core: fix compilation with Solaris 10 / Sun Studio 12.1 Scipy could not be built against Numpy 1.5.0 on OS X due to a numpy.distutils bug, #1399. This issue is fixed now. - #1399: distutils: use C arch flags for Fortran compilation on OS X. Python 3 specific; #1610 is important for any I/O: - #----: f2py: make f2py script runnable on Python 3 - #1604: distutils: potential infinite loop in numpy.distutils - #1609: core: use accelerated BLAS, when available - #1610: core: ensure tofile and fromfile maintain file handle positions Checksums ========= b3db7d1ccfc3640b4c33b7911dbceabc release/installers/numpy-1.5.1-py2.5-python.org-macosx10.3.dmg 55f5863856485bbb005b77014edcd34a release/installers/numpy-1.5.1-py2.6-python.org-macosx10.3.dmg 420113e2a30712668445050a0f38e7a6 release/installers/numpy-1.5.1-py2.7-python.org-macosx10.3.dmg 757885ab8d64cf060ef629800da2e65c release/installers/numpy-1.5.1-py2.7-python.org-macosx10.5.dmg 11e60c3f7f3c86fcb5facf88c3981fd3 release/installers/numpy-1.5.1-win32-superpack-python2.5.exe 3fc14943dc2fcf740d8c204455e68aa7 release/installers/numpy-1.5.1-win32-superpack-python2.6.exe a352acce86c8b2cfb247e38339e27fd0 release/installers/numpy-1.5.1-win32-superpack-python2.7.exe 160de9794e4a239c9da1196a5eb30f7e release/installers/numpy-1.5.1-win32-superpack-python3.1.exe 376ef150df41b5353944ab742145352d release/installers/numpy-1.5.1.tar.gz ab6045070c0de5016fdf94dd2a79638b release/installers/numpy-1.5.1.zip -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Sun Nov 21 13:25:40 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 21 Nov 2010 13:25:40 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman wrote: > On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: > >> Keith (and others), >> >> What would you think about creating a library of mostly Cython-based >> "domain specific functions"? So stuff like rolling statistical >> moments, nan* functions like you have here, and all that-- NumPy-array >> only functions that don't necessarily belong in NumPy or SciPy (but >> could be included on down the road). You were already talking about >> this on the statsmodels mailing list for larry. I spent a lot of time >> writing a bunch of these for pandas over the last couple of years, and >> I would have relatively few qualms about moving these outside of >> pandas and introducing a dependency. 
You could do the same for larry-- >> then we'd all be relying on the same well-vetted and tested codebase. > > I've started working on moving window statistics cython functions. I > plan to make it into a package called Roly (for rolling). The > signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, > window, axis=-1), etc. > > I think of Nanny and Roly as two separate packages. A narrow focus is > good for a new package. But maybe each package could be a subpackage > in a super package? > > Would the function signatures in Nanny (exact duplicates of the > corresponding functions in Numpy and Scipy) work for pandas? I plan to > use Nanny in larry. I'll try to get the structure of the Nanny package > in place. But if it doesn't attract any interest after that then I may > fold it into larry. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Why make multiple packages? It seems like all these functions are somewhat related: practical tools for real-world data analysis (where observations are often missing). I suspect having everything under one hood would create more interest than chopping things up-- would be very useful to folks in many different disciplines (finance, economics, statistics, etc.). In R, for example, NA-handling is just a part of every day life. Of course in R there is a special NA value which is distinct from NaN-- many folks object to the use of NaN for missing values. The alternative is masked arrays, but in my case I wasn't willing to sacrifice so much performance for purity's sake. I could certainly use the nan* functions to replace code in pandas where I've handled things in a somewhat ad hoc way. From eadrogue at gmx.net Sun Nov 21 14:15:00 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Sun, 21 Nov 2010 20:15:00 +0100 Subject: [Numpy-discussion] indexing question Message-ID: <20101121191500.GA17912@doriath.local> Hi, Suppose an array of shape (N,2,2), that is N arrays of shape (2,2). I want to select an element (x,y) from each one of the subarrays, so I get a 1-dimensional array of length N. For instance: In [228]: t=np.arange(8).reshape(2,2,2) In [229]: t Out[229]: array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) In [230]: x=[0,1] In [231]: y=[1,1] In [232]: t[[0,1],x,y] Out[232]: array([1, 7]) This way, I get the elements (0,1) and (1,1) which is what I wanted. The question is: is it possible to omit the [0,1] in the index? Thanks in advance. -- Ernest From jsalvati at u.washington.edu Sun Nov 21 14:28:56 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Sun, 21 Nov 2010 11:28:56 -0800 Subject: [Numpy-discussion] indexing question In-Reply-To: <20101121191500.GA17912@doriath.local> References: <20101121191500.GA17912@doriath.local> Message-ID: yes use the symbol ':' so you want t[:,x,y] 2010/11/21 Ernest Adrogu? : > Hi, > > Suppose an array of shape (N,2,2), that is N arrays of > shape (2,2). I want to select an element (x,y) from each one > of the subarrays, so I get a 1-dimensional array of length > N. For instance: > > In [228]: t=np.arange(8).reshape(2,2,2) > > In [229]: t > Out[229]: > array([[[0, 1], > ? ? ? ?[2, 3]], > > ? ? ? [[4, 5], > ? ? ? ?[6, 7]]]) > > In [230]: x=[0,1] > > In [231]: y=[1,1] > > In [232]: t[[0,1],x,y] > Out[232]: array([1, 7]) > > This way, I get the elements (0,1) and (1,1) which is what > I wanted. The question is: is it possible to omit the [0,1] > in the index? 
> > Thanks in advance. > > -- > Ernest > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsalvati at u.washington.edu Sun Nov 21 14:29:55 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Sun, 21 Nov 2010 11:29:55 -0800 Subject: [Numpy-discussion] indexing question In-Reply-To: References: <20101121191500.GA17912@doriath.local> Message-ID: read about basic slicing : http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html On Sun, Nov 21, 2010 at 11:28 AM, John Salvatier wrote: > yes use the symbol ':' > > so you want > > t[:,x,y] > > 2010/11/21 Ernest Adrogu? : >> Hi, >> >> Suppose an array of shape (N,2,2), that is N arrays of >> shape (2,2). I want to select an element (x,y) from each one >> of the subarrays, so I get a 1-dimensional array of length >> N. For instance: >> >> In [228]: t=np.arange(8).reshape(2,2,2) >> >> In [229]: t >> Out[229]: >> array([[[0, 1], >> ? ? ? ?[2, 3]], >> >> ? ? ? [[4, 5], >> ? ? ? ?[6, 7]]]) >> >> In [230]: x=[0,1] >> >> In [231]: y=[1,1] >> >> In [232]: t[[0,1],x,y] >> Out[232]: array([1, 7]) >> >> This way, I get the elements (0,1) and (1,1) which is what >> I wanted. The question is: is it possible to omit the [0,1] >> in the index? >> >> Thanks in advance. >> >> -- >> Ernest >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From eadrogue at gmx.net Sun Nov 21 14:37:01 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Sun, 21 Nov 2010 20:37:01 +0100 Subject: [Numpy-discussion] indexing question In-Reply-To: References: <20101121191500.GA17912@doriath.local> Message-ID: <20101121193701.GA18049@doriath.local> Hi, 21/11/10 @ 11:28 (-0800), thus spake John Salvatier: > yes use the symbol ':' > > so you want > > t[:,x,y] I tried that, but it's not the same: In [307]: t[[0,1],x,y] Out[307]: array([1, 7]) In [308]: t[:,x,y] Out[308]: array([[1, 3], [5, 7]]) No? -- Ernest From kwgoodman at gmail.com Sun Nov 21 14:48:16 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 21 Nov 2010 11:48:16 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney wrote: > On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman wrote: >> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: >> >>> Keith (and others), >>> >>> What would you think about creating a library of mostly Cython-based >>> "domain specific functions"? So stuff like rolling statistical >>> moments, nan* functions like you have here, and all that-- NumPy-array >>> only functions that don't necessarily belong in NumPy or SciPy (but >>> could be included on down the road). You were already talking about >>> this on the statsmodels mailing list for larry. I spent a lot of time >>> writing a bunch of these for pandas over the last couple of years, and >>> I would have relatively few qualms about moving these outside of >>> pandas and introducing a dependency. You could do the same for larry-- >>> then we'd all be relying on the same well-vetted and tested codebase. >> >> I've started working on moving window statistics cython functions. I >> plan to make it into a package called Roly (for rolling). The >> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, >> window, axis=-1), etc. 
>> >> I think of Nanny and Roly as two separate packages. A narrow focus is >> good for a new package. But maybe each package could be a subpackage >> in a super package? >> >> Would the function signatures in Nanny (exact duplicates of the >> corresponding functions in Numpy and Scipy) work for pandas? I plan to >> use Nanny in larry. I'll try to get the structure of the Nanny package >> in place. But if it doesn't attract any interest after that then I may >> fold it into larry. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Why make multiple packages? It seems like all these functions are > somewhat related: practical tools for real-world data analysis (where > observations are often missing). I suspect having everything under one > hood would create more interest than chopping things up-- would be > very useful to folks in many different disciplines (finance, > economics, statistics, etc.). In R, for example, NA-handling is just a > part of every day life. Of course in R there is a special NA value > which is distinct from NaN-- many folks object to the use of NaN for > missing values. The alternative is masked arrays, but in my case I > wasn't willing to sacrifice so much performance for purity's sake. > > I could certainly use the nan* functions to replace code in pandas > where I've handled things in a somewhat ad hoc way. A package focused on NaN-aware functions sounds like a good idea. I think a good plan would be to start by making faster, drop-in replacements for the NaN functions that are already in numpy and scipy. That is already a lot of work. After that, one possibility is to add stuff like nancumsum, nanprod, etc. After that moving window stuff? From josef.pktd at gmail.com Sun Nov 21 15:30:27 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Nov 2010 15:30:27 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman wrote: > On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney wrote: >> On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman wrote: >>> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: >>> >>>> Keith (and others), >>>> >>>> What would you think about creating a library of mostly Cython-based >>>> "domain specific functions"? So stuff like rolling statistical >>>> moments, nan* functions like you have here, and all that-- NumPy-array >>>> only functions that don't necessarily belong in NumPy or SciPy (but >>>> could be included on down the road). You were already talking about >>>> this on the statsmodels mailing list for larry. I spent a lot of time >>>> writing a bunch of these for pandas over the last couple of years, and >>>> I would have relatively few qualms about moving these outside of >>>> pandas and introducing a dependency. You could do the same for larry-- >>>> then we'd all be relying on the same well-vetted and tested codebase. >>> >>> I've started working on moving window statistics cython functions. I >>> plan to make it into a package called Roly (for rolling). The >>> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, >>> window, axis=-1), etc. >>> >>> I think of Nanny and Roly as two separate packages. A narrow focus is >>> good for a new package. But maybe each package could be a subpackage >>> in a super package? 
>>> >>> Would the function signatures in Nanny (exact duplicates of the >>> corresponding functions in Numpy and Scipy) work for pandas? I plan to >>> use Nanny in larry. I'll try to get the structure of the Nanny package >>> in place. But if it doesn't attract any interest after that then I may >>> fold it into larry. >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> Why make multiple packages? It seems like all these functions are >> somewhat related: practical tools for real-world data analysis (where >> observations are often missing). I suspect having everything under one >> hood would create more interest than chopping things up-- would be >> very useful to folks in many different disciplines (finance, >> economics, statistics, etc.). In R, for example, NA-handling is just a >> part of every day life. Of course in R there is a special NA value >> which is distinct from NaN-- many folks object to the use of NaN for >> missing values. The alternative is masked arrays, but in my case I >> wasn't willing to sacrifice so much performance for purity's sake. >> >> I could certainly use the nan* functions to replace code in pandas >> where I've handled things in a somewhat ad hoc way. > > A package focused on NaN-aware functions sounds like a good idea. I > think a good plan would be to start by making faster, drop-in > replacements for the NaN functions that are already in numpy and > scipy. That is already a lot of work. After that, one possibility is > to add stuff like nancumsum, nanprod, etc. After that moving window > stuff? and maybe group functions after that? If there is a lot of repetition, you could use templating. Even simple string substitution, if it is only replacing the dtype, works pretty well. It would at least reduce some copy-paste. Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From friedrichromstedt at gmail.com Sun Nov 21 15:44:23 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Sun, 21 Nov 2010 21:44:23 +0100 Subject: [Numpy-discussion] Where did the github numpy repository go? In-Reply-To: References: Message-ID: 2010/11/14 Charles R Harris : > I keep getting page does not exist. The comments on the event, https://github.com/blog/744-today-s-outage, are simply great and stunning. Friedrich From kwgoodman at gmail.com Sun Nov 21 17:09:34 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 21 Nov 2010 14:09:34 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 12:30 PM, wrote: > On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman wrote: >> On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney wrote: >>> On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman wrote: >>>> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: >>>> >>>>> Keith (and others), >>>>> >>>>> What would you think about creating a library of mostly Cython-based >>>>> "domain specific functions"? So stuff like rolling statistical >>>>> moments, nan* functions like you have here, and all that-- NumPy-array >>>>> only functions that don't necessarily belong in NumPy or SciPy (but >>>>> could be included on down the road). You were already talking about >>>>> this on the statsmodels mailing list for larry. 
I spent a lot of time >>>>> writing a bunch of these for pandas over the last couple of years, and >>>>> I would have relatively few qualms about moving these outside of >>>>> pandas and introducing a dependency. You could do the same for larry-- >>>>> then we'd all be relying on the same well-vetted and tested codebase. >>>> >>>> I've started working on moving window statistics cython functions. I >>>> plan to make it into a package called Roly (for rolling). The >>>> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, >>>> window, axis=-1), etc. >>>> >>>> I think of Nanny and Roly as two separate packages. A narrow focus is >>>> good for a new package. But maybe each package could be a subpackage >>>> in a super package? >>>> >>>> Would the function signatures in Nanny (exact duplicates of the >>>> corresponding functions in Numpy and Scipy) work for pandas? I plan to >>>> use Nanny in larry. I'll try to get the structure of the Nanny package >>>> in place. But if it doesn't attract any interest after that then I may >>>> fold it into larry. >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> Why make multiple packages? It seems like all these functions are >>> somewhat related: practical tools for real-world data analysis (where >>> observations are often missing). I suspect having everything under one >>> hood would create more interest than chopping things up-- would be >>> very useful to folks in many different disciplines (finance, >>> economics, statistics, etc.). In R, for example, NA-handling is just a >>> part of every day life. Of course in R there is a special NA value >>> which is distinct from NaN-- many folks object to the use of NaN for >>> missing values. The alternative is masked arrays, but in my case I >>> wasn't willing to sacrifice so much performance for purity's sake. >>> >>> I could certainly use the nan* functions to replace code in pandas >>> where I've handled things in a somewhat ad hoc way. >> >> A package focused on NaN-aware functions sounds like a good idea. I >> think a good plan would be to start by making faster, drop-in >> replacements for the NaN functions that are already in numpy and >> scipy. That is already a lot of work. After that, one possibility is >> to add stuff like nancumsum, nanprod, etc. After that moving window >> stuff? > > and maybe group functions after that? Yes, group functions are on my list. > If there is a lot of repetition, you could use templating. Even simple > string substitution, if it is only replacing the dtype, works pretty > well. It would at least reduce some copy-paste. Unit test coverage should be good enough to mess around with trying templating. What's a good way to go? Write my own script that creates the .pyx file and call it from the make file? Or are there packages for doing the templating? I added nanmean (the first scipy function to enter nanny) and nanmin. From josef.pktd at gmail.com Sun Nov 21 18:02:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Nov 2010 18:02:05 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 5:09 PM, Keith Goodman wrote: > On Sun, Nov 21, 2010 at 12:30 PM, ? 
wrote: >> On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman wrote: >>> On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney wrote: >>>> On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman wrote: >>>>> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: >>>>> >>>>>> Keith (and others), >>>>>> >>>>>> What would you think about creating a library of mostly Cython-based >>>>>> "domain specific functions"? So stuff like rolling statistical >>>>>> moments, nan* functions like you have here, and all that-- NumPy-array >>>>>> only functions that don't necessarily belong in NumPy or SciPy (but >>>>>> could be included on down the road). You were already talking about >>>>>> this on the statsmodels mailing list for larry. I spent a lot of time >>>>>> writing a bunch of these for pandas over the last couple of years, and >>>>>> I would have relatively few qualms about moving these outside of >>>>>> pandas and introducing a dependency. You could do the same for larry-- >>>>>> then we'd all be relying on the same well-vetted and tested codebase. >>>>> >>>>> I've started working on moving window statistics cython functions. I >>>>> plan to make it into a package called Roly (for rolling). The >>>>> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, >>>>> window, axis=-1), etc. >>>>> >>>>> I think of Nanny and Roly as two separate packages. A narrow focus is >>>>> good for a new package. But maybe each package could be a subpackage >>>>> in a super package? >>>>> >>>>> Would the function signatures in Nanny (exact duplicates of the >>>>> corresponding functions in Numpy and Scipy) work for pandas? I plan to >>>>> use Nanny in larry. I'll try to get the structure of the Nanny package >>>>> in place. But if it doesn't attract any interest after that then I may >>>>> fold it into larry. >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> Why make multiple packages? It seems like all these functions are >>>> somewhat related: practical tools for real-world data analysis (where >>>> observations are often missing). I suspect having everything under one >>>> hood would create more interest than chopping things up-- would be >>>> very useful to folks in many different disciplines (finance, >>>> economics, statistics, etc.). In R, for example, NA-handling is just a >>>> part of every day life. Of course in R there is a special NA value >>>> which is distinct from NaN-- many folks object to the use of NaN for >>>> missing values. The alternative is masked arrays, but in my case I >>>> wasn't willing to sacrifice so much performance for purity's sake. >>>> >>>> I could certainly use the nan* functions to replace code in pandas >>>> where I've handled things in a somewhat ad hoc way. >>> >>> A package focused on NaN-aware functions sounds like a good idea. I >>> think a good plan would be to start by making faster, drop-in >>> replacements for the NaN functions that are already in numpy and >>> scipy. That is already a lot of work. After that, one possibility is >>> to add stuff like nancumsum, nanprod, etc. After that moving window >>> stuff? >> >> and maybe group functions after that? > > Yes, group functions are on my list. > >> If there is a lot of repetition, you could use templating. Even simple >> string substitution, if it is only replacing the dtype, works pretty >> well. It would at least reduce some copy-paste. 
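As a rough illustration of that kind of substitution (the template, dtype list, and output file name below are invented for the example, not taken from Nanny), a tiny generator script could be:

# Hypothetical: generate one Cython function per dtype by string substitution.
template = """
def nanmax_2d_DTYPE_axisNone(np.ndarray[np.DTYPE_t, ndim=2] a):
    cdef np.DTYPE_t amax = MINVAL
    # ... loop body is identical for every dtype ...
"""

fout = open("nanmax_auto.pyx", "w")
for dtype, minval in [("float64", "-np.inf"),
                      ("int32", "MININT32"),
                      ("int64", "MININT64")]:
    fout.write(template.replace("DTYPE", dtype).replace("MINVAL", minval))
fout.close()

A Makefile rule can regenerate the .pyx before cython runs, so only the template is edited by hand.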
> > Unit test coverage should be good enough to mess around with trying > templating. What's a good way to go? Write my own script that creates > the .pyx file and call it from the make file? Or are there packages > for doing the templating? Depends on the scale, I tried once with simple string templates http://codespeak.net/pipermail/cython-dev/2009-August/006614.html here is a pastbin of another version by ....(?), http://pastebin.com/f1a49143d discussed on the cython-dev mailing list. The cython list has the discussion every once in a while but I haven't seen any conclusion yet. For heavier duty templating a proper templating package (Jinja?) might be better. I'm not an expert. Josef > > I added nanmean (the first scipy function to enter nanny) and nanmin. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From wesmckinn at gmail.com Sun Nov 21 18:16:21 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 21 Nov 2010 18:16:21 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 6:02 PM, wrote: > On Sun, Nov 21, 2010 at 5:09 PM, Keith Goodman wrote: >> On Sun, Nov 21, 2010 at 12:30 PM, ? wrote: >>> On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman wrote: >>>> On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney wrote: >>>>> On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman wrote: >>>>>> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney wrote: >>>>>> >>>>>>> Keith (and others), >>>>>>> >>>>>>> What would you think about creating a library of mostly Cython-based >>>>>>> "domain specific functions"? So stuff like rolling statistical >>>>>>> moments, nan* functions like you have here, and all that-- NumPy-array >>>>>>> only functions that don't necessarily belong in NumPy or SciPy (but >>>>>>> could be included on down the road). You were already talking about >>>>>>> this on the statsmodels mailing list for larry. I spent a lot of time >>>>>>> writing a bunch of these for pandas over the last couple of years, and >>>>>>> I would have relatively few qualms about moving these outside of >>>>>>> pandas and introducing a dependency. You could do the same for larry-- >>>>>>> then we'd all be relying on the same well-vetted and tested codebase. >>>>>> >>>>>> I've started working on moving window statistics cython functions. I >>>>>> plan to make it into a package called Roly (for rolling). The >>>>>> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr, >>>>>> window, axis=-1), etc. >>>>>> >>>>>> I think of Nanny and Roly as two separate packages. A narrow focus is >>>>>> good for a new package. But maybe each package could be a subpackage >>>>>> in a super package? >>>>>> >>>>>> Would the function signatures in Nanny (exact duplicates of the >>>>>> corresponding functions in Numpy and Scipy) work for pandas? I plan to >>>>>> use Nanny in larry. I'll try to get the structure of the Nanny package >>>>>> in place. But if it doesn't attract any interest after that then I may >>>>>> fold it into larry. >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>> >>>>> Why make multiple packages? It seems like all these functions are >>>>> somewhat related: practical tools for real-world data analysis (where >>>>> observations are often missing). 
I suspect having everything under one >>>>> hood would create more interest than chopping things up-- would be >>>>> very useful to folks in many different disciplines (finance, >>>>> economics, statistics, etc.). In R, for example, NA-handling is just a >>>>> part of every day life. Of course in R there is a special NA value >>>>> which is distinct from NaN-- many folks object to the use of NaN for >>>>> missing values. The alternative is masked arrays, but in my case I >>>>> wasn't willing to sacrifice so much performance for purity's sake. >>>>> >>>>> I could certainly use the nan* functions to replace code in pandas >>>>> where I've handled things in a somewhat ad hoc way. >>>> >>>> A package focused on NaN-aware functions sounds like a good idea. I >>>> think a good plan would be to start by making faster, drop-in >>>> replacements for the NaN functions that are already in numpy and >>>> scipy. That is already a lot of work. After that, one possibility is >>>> to add stuff like nancumsum, nanprod, etc. After that moving window >>>> stuff? >>> >>> and maybe group functions after that? >> >> Yes, group functions are on my list. >> >>> If there is a lot of repetition, you could use templating. Even simple >>> string substitution, if it is only replacing the dtype, works pretty >>> well. It would at least reduce some copy-paste. >> >> Unit test coverage should be good enough to mess around with trying >> templating. What's a good way to go? Write my own script that creates >> the .pyx file and call it from the make file? Or are there packages >> for doing the templating? > > Depends on the scale, I tried once with simple string templates > http://codespeak.net/pipermail/cython-dev/2009-August/006614.html > > here is a pastbin of another version by ....(?), > http://pastebin.com/f1a49143d discussed on the cython-dev mailing > list. > > The cython list has the discussion every once in a while but I haven't > seen any conclusion yet. For heavier duty templating a proper > templating package (Jinja?) might be better. > > I'm not an expert. > > Josef > > >> >> I added nanmean (the first scipy function to enter nanny) and nanmin. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > What would you say to a single package that contains: - NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.) - moving window functions (moving_{count, sum, mean, var, std, etc.}) - core subroutines for labeled data - group-by functions - other things to add to this list? In other words, basic building computational tools for making libraries like larry, pandas, etc. and doing time series / statistical / other manipulations on real world (messy) data sets. The focus isn't so much "NaN-awareness" per se but more practical "data wrangling". I would be happy to work on such a package and to move all the Cython code I've written into it. 
There's a little bit of datarray overlap potentially but I think that's OK From kwgoodman at gmail.com Sun Nov 21 18:37:26 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 21 Nov 2010 15:37:26 -0800 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 3:16 PM, Wes McKinney wrote: > What would you say to a single package that contains: > > - NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.) I'd say yes. > - moving window functions (moving_{count, sum, mean, var, std, etc.}) Yes. BTW, we both do arr=arr.astype(float), I think, before doing the moving statistics. So I speeded things up by running the moving window backwards and writing the result in place. > - core subroutines for labeled data Not sure what this would be. Let's discuss. > - group-by functions Yes. I have some ideas on function signatures. > - other things to add to this list? A no-op function with a really long doc string! > In other words, basic building computational tools for making > libraries like larry, pandas, etc. and doing time series / statistical > / other manipulations on real world (messy) data sets. The focus isn't > so much "NaN-awareness" per se but more practical "data wrangling". I > would be happy to work on such a package and to move all the Cython > code I've written into it. There's a little bit of datarray overlap > potentially but I think that's OK Maybe we should make a list of function signatures along with brief doc strings to get a feel for what we (and hopefully others) have in mind? Where should we continue the discussion? The pystatsmodels mailing list? By now the numpy list probably thinks of NaN as "Not ANother" email from this guy. From kwgoodman at gmail.com Sun Nov 21 18:43:12 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 21 Nov 2010 15:43:12 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? Message-ID: Does np.std() make two passes through the data? Numpy: >> arr = np.random.rand(10) >> arr.std() 0.3008736260967052 Looks like an algorithm that makes one pass through the data (one for loop) wouldn't match arr.std(): >> np.sqrt((arr*arr).mean() - arr.mean()**2) 0.30087362609670526 But a slower two-pass algorithm would match arr.std(): >> np.sqrt(((arr - arr.mean())**2).mean()) 0.3008736260967052 Is there a way to get the same result as arr.std() in one pass (cython for loop) of the data? From josef.pktd at gmail.com Sun Nov 21 19:18:13 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Nov 2010 19:18:13 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 6:43 PM, Keith Goodman wrote: > Does np.std() make two passes through the data? > > Numpy: > >>> arr = np.random.rand(10) >>> arr.std() > ? 0.3008736260967052 > > Looks like an algorithm that makes one pass through the data (one for > loop) wouldn't match arr.std(): > >>> np.sqrt((arr*arr).mean() - arr.mean()**2) > ? 0.30087362609670526 > > But a slower two-pass algorithm would match arr.std(): > >>> np.sqrt(((arr - arr.mean())**2).mean()) > ? 0.3008736260967052 > > Is there a way to get the same result as arr.std() in one pass (cython > for loop) of the data? reference several times pointed to on the list is the wikipedia page, e.g. http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm I don't know about actual implementation. 
Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Sun Nov 21 19:41:22 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 21 Nov 2010 18:41:22 -0600 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 17:43, Keith Goodman wrote: > Does np.std() make two passes through the data? Yes. See PyArray_Std() in numpy/core/src/calculation.c -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From kwgoodman at gmail.com Sun Nov 21 20:49:54 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 21 Nov 2010 17:49:54 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 4:18 PM, wrote: > On Sun, Nov 21, 2010 at 6:43 PM, Keith Goodman wrote: >> Does np.std() make two passes through the data? >> >> Numpy: >> >>>> arr = np.random.rand(10) >>>> arr.std() >> ? 0.3008736260967052 >> >> Looks like an algorithm that makes one pass through the data (one for >> loop) wouldn't match arr.std(): >> >>>> np.sqrt((arr*arr).mean() - arr.mean()**2) >> ? 0.30087362609670526 >> >> But a slower two-pass algorithm would match arr.std(): >> >>>> np.sqrt(((arr - arr.mean())**2).mean()) >> ? 0.3008736260967052 >> >> Is there a way to get the same result as arr.std() in one pass (cython >> for loop) of the data? > > reference several times pointed to on the list is the wikipedia page, e.g. > http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm Unfortunately it doesn't give the same result as numpy's two pass algorithm. >From wikipedia: def var(arr): n = 0 mean = 0 M2 = 0 for x in arr: n = n + 1 delta = x - mean mean = mean + delta / n M2 = M2 + delta * (x - mean) return M2 / n This set of random samples gives matching variance: >> a = np.random.rand(100) >> a.var() 0.07867478939716277 >> var(a) 0.07867478939716277 But this sample gives a difference: >> a = np.random.rand(100) >> a.var() 0.080232196646619805 >> var(a) 0.080232196646619791 As you know, I'm trying to make a drop-in replacement for scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in part. Either that, or suck it up and store the damn mean. From robert.kern at gmail.com Sun Nov 21 20:56:36 2010 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 21 Nov 2010 19:56:36 -0600 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 19:49, Keith Goodman wrote: > But this sample gives a difference: > >>> a = np.random.rand(100) >>> a.var() > ? 0.080232196646619805 >>> var(a) > ? 0.080232196646619791 > > As you know, I'm trying to make a drop-in replacement for > scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in > part. Either that, or suck it up and store the damn mean. The difference is less than eps. Quite possibly, the one-pass version is even closer to the true value than the two-pass version. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From wesmckinn at gmail.com Sun Nov 21 21:03:22 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 21 Nov 2010 21:03:22 -0500 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 6:37 PM, Keith Goodman wrote: > On Sun, Nov 21, 2010 at 3:16 PM, Wes McKinney wrote: > >> What would you say to a single package that contains: >> >> - NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.) > > I'd say yes. > >> - moving window functions (moving_{count, sum, mean, var, std, etc.}) > > Yes. > > BTW, we both do arr=arr.astype(float), I think, before doing the > moving statistics. So I speeded things up by running the moving window > backwards and writing the result in place. > >> - core subroutines for labeled data > > Not sure what this would be. Let's discuss. Basically want to produce a indexing vector based on rules-- something to pass to ndarray.take later on. And maybe your generic binary-op function from a while back? >> - group-by functions > > Yes. I have some ideas on function signatures. > >> - other things to add to this list? > > A no-op function with a really long doc string! > >> In other words, basic building computational tools for making >> libraries like larry, pandas, etc. and doing time series / statistical >> / other manipulations on real world (messy) data sets. The focus isn't >> so much "NaN-awareness" per se but more practical "data wrangling". I >> would be happy to work on such a package and to move all the Cython >> code I've written into it. There's a little bit of datarray overlap >> potentially but I think that's OK > > Maybe we should make a list of function signatures along with brief > doc strings to get a feel for what we (and hopefully others) have in > mind? I've personally never been much for writing specs, but could be useful. We probably aren't going to get it all right on the first try, so we'll just do our best and refactor the code later if necessary. We might be well-served by collecting exemplary data sets and making a list of things we would like to be able to do easily with that data. But writing stuff like: moving_{funcname}(ndarray data, int window, int axis=0, int min_periods=window) -> ndarray group_aggregate(ndarray data, ndarray labels, int axis=0, function agg_function) -> ndarray group_transform(...) ... etc. makes sense > Where should we continue the discussion? The pystatsmodels mailing > list? By now the numpy list probably thinks of NaN as "Not ANother" > email from this guy. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Maybe let's have the next thread on SciPy-user-- I think what we're talking about is general enough to be discussed there. From kwgoodman at gmail.com Sun Nov 21 21:33:57 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 21 Nov 2010 18:33:57 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern wrote: > On Sun, Nov 21, 2010 at 19:49, Keith Goodman wrote: > >> But this sample gives a difference: >> >>>> a = np.random.rand(100) >>>> a.var() >> ? 0.080232196646619805 >>>> var(a) >> ? 0.080232196646619791 >> >> As you know, I'm trying to make a drop-in replacement for >> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in >> part. 
Either that, or suck it up and store the damn mean. > > The difference is less than eps. Quite possibly, the one-pass version > is even closer to the true value than the two-pass version. Good, it passes the Kern test. Here's an even more robust estimate: >> var(a - a.mean()) 0.080232196646619819 Which is better, numpy's two pass or the one pass on-line method? >> test() NumPy error: 9.31135e-18 Nanny error: 6.5745e-18 <-- One pass wins! def test(n=100000): numpy = 0 nanny = 0 for i in range(n): a = np.random.rand(10) truth = var(a - a.mean()) numpy += np.absolute(truth - a.var()) nanny += np.absolute(truth - var(a)) print 'NumPy error: %g' % (numpy / n) print 'Nanny error: %g' % (nanny / n) print From basherwo at ncsu.edu Mon Nov 22 01:17:17 2010 From: basherwo at ncsu.edu (Bruce Sherwood) Date: Sun, 21 Nov 2010 23:17:17 -0700 Subject: [Numpy-discussion] Slow element-by-element access? Message-ID: A colleague showed me a program using Numeric with Python 2.5 which ran much faster than the same program using numpy with Python 2.7. I distilled this down to a simple test case, characterized by a "for" loop in which he does an element-by-element calculation involving arrays: from numpy import arange # or from Numeric import arange from time import clock # Numeric 0.24 seconds; 15 times as fast as numpy # numpy 3.6 seconds N = 100000 a = arange(N) b = arange(N) t = clock() for i in range(1,N-1): pass tpass = clock()-t t = clock() for i in range(1,N-1): b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t t = clock()-t print t-tpass His calculation b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t is 15 times faster with Numeric than with numpy. It is of course the case that he should have done a single array calculation rather than use a "for" loop, and I've explained that to him. The array calculation runs at about the same speed in numpy as in Numeric. Nevertheless, I'm surprised that the element-by-element calculation is so very slow, and surely there are situations where individual element access is important in a numpy calculation. Is this a known issue? Does it matter? I was unable to find any discussion of this. Bruce Sherwood From charlesr.harris at gmail.com Mon Nov 22 01:26:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 21 Nov 2010 23:26:37 -0700 Subject: [Numpy-discussion] Slow element-by-element access? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 11:17 PM, Bruce Sherwood wrote: > A colleague showed me a program using Numeric with Python 2.5 which > ran much faster than the same program using numpy with Python 2.7. I > distilled this down to a simple test case, characterized by a "for" > loop in which he does an element-by-element calculation involving > arrays: > > from numpy import arange # or from Numeric import arange > from time import clock > # Numeric 0.24 seconds; 15 times as fast as numpy > # numpy 3.6 seconds > N = 100000 > a = arange(N) > b = arange(N) > t = clock() > for i in range(1,N-1): > pass > tpass = clock()-t > t = clock() > for i in range(1,N-1): > b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t > t = clock()-t > print t-tpass > > His calculation b[i] = a[i]-t*(a[i+1]-2*a[i]+a[i-1])+a[i]*a[i]*t is 15 > times faster with Numeric than with numpy. > > It is of course the case that he should have done a single array > calculation rather than use a "for" loop, and I've explained that to > him. The array calculation runs at about the same speed in numpy as in > Numeric. 
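For reference, a sketch of the single whole-array statement that replaces the loop above (same recurrence, written with slices; the float dtype is an addition here to avoid integer overflow in a*a, not part of the original program):

from numpy import arange
from time import clock

N = 100000
a = arange(N, dtype=float)   # float avoids integer overflow in a*a
b = arange(N, dtype=float)

t = clock()
# the same recurrence as the for loop, done as one vectorized statement
b[1:-1] = a[1:-1] - t*(a[2:] - 2*a[1:-1] + a[:-2]) + a[1:-1]*a[1:-1]*t
print clock() - t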
> > Nevertheless, I'm surprised that the element-by-element calculation is > so very slow, and surely there are situations where individual element > access is important in a numpy calculation. Is this a known issue? > Does it matter? I was unable to find any discussion of this. > > Yes, indexing is known to be slow, although I don't recall the precise reason for that. Something to do with way integers are handled or some such. There was some discussion on the list many years ago... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From hagen at zhuliguan.net Mon Nov 22 02:51:03 2010 From: hagen at zhuliguan.net (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Mon, 22 Nov 2010 08:51:03 +0100 Subject: [Numpy-discussion] categorical distributions Message-ID: Hi, numpy doesn't seem to have a function for sampling from simple categorical distributions. The easiest solution I could come up with was something like >>> from numpy.random import multinomial >>> multinomial(1, [.5, .3, .2]).nonzero()[0][0] 1 but this is bound to be inefficient as soon as the vector of probabilities gets large, especially if you want to draw multiple samples. Have I overlooked something or should this be added? - Hagen From wardefar at iro.umontreal.ca Mon Nov 22 03:16:57 2010 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 22 Nov 2010 03:16:57 -0500 Subject: [Numpy-discussion] categorical distributions In-Reply-To: References: Message-ID: On 2010-11-22, at 2:51 AM, Hagen F?rstenau wrote: > but this is bound to be inefficient as soon as the vector of > probabilities gets large, especially if you want to draw multiple samples. > > Have I overlooked something or should this be added? I think you misunderstand the point of multinomial distributions. A sample from a multinomial is simply a sample from n i.i.d. categoricals, reported as the counts for each category in the N observations. It's very easy to recover the 'categorical' samples from a 'multinomial' sample. import numpy as np a = np.random.multinomial(50, [.3, .3, .4]) b = np.zeros(50, dtype=int) upper = np.cumsum(a); lower = upper - a for value in range(len(a)): b[lower[value]:upper[value]] = value # mix up the order, in-place, if you care about them not being sorted np.random.shuffle(b) then b is a sample from the corresponding 'categorical' distribution. David From gael.varoquaux at normalesup.org Mon Nov 22 03:46:13 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 22 Nov 2010 09:46:13 +0100 Subject: [Numpy-discussion] [ANN] Nanny, faster NaN functions In-Reply-To: References: Message-ID: <20101122084613.GC18433@phare.normalesup.org> On Sun, Nov 21, 2010 at 09:03:22PM -0500, Wes McKinney wrote: > Maybe let's have the next thread on SciPy-user-- I think what we're > talking about is general enough to be discussed there. Yes, a lot of this is of general interest. I'd be particularly interested in having the NaN work land in scipy. They are many places where it would be useful (I understand that it would require more work). G From hagen at zhuliguan.net Mon Nov 22 04:18:06 2010 From: hagen at zhuliguan.net (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Mon, 22 Nov 2010 10:18:06 +0100 Subject: [Numpy-discussion] categorical distributions In-Reply-To: References: Message-ID: >> but this is bound to be inefficient as soon as the vector of >> probabilities gets large, especially if you want to draw multiple samples. >> >> Have I overlooked something or should this be added? 
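A self-contained sketch of the cumsum/searchsorted approach that comes up further down in the thread (sample_categorical is an invented name, not an existing numpy function):

import numpy as np

def sample_categorical(p, size):
    "Draw `size` independent samples (category indices) from probabilities p."
    cdf = np.cumsum(p, dtype=float)
    cdf /= cdf[-1]                        # guard against p not summing to exactly 1
    u = np.random.random_sample(size)     # uniforms on [0, 1)
    return cdf.searchsorted(u)            # first index where cdf >= u

# counts should come out roughly proportional to [.5, .3, .2]
x = sample_categorical([.5, .3, .2], 100000)
print np.bincount(x) / 1e5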
> > I think you misunderstand the point of multinomial distributions. I'm afraid the multiple samples were an afterthought and a false lead. My main use case would be consecutive samples of _different_ categorical distributions, for which abusing multinomials seems wasteful (as they have to allocate a large vector each time). Similarly for the alternative spelling >>> import numpy, random >>> a = numpy.array([.5, .3, .2]) >>> (a.cumsum()-random.random() >= 0).nonzero()[0][0] 1 ISTM that this elementary functionality deserves an implementation that's as fast as it can be. - Hagen From pav at iki.fi Mon Nov 22 04:30:05 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 22 Nov 2010 09:30:05 +0000 (UTC) Subject: [Numpy-discussion] Slow element-by-element access? References: Message-ID: Sun, 21 Nov 2010 23:26:37 -0700, Charles R Harris wrote: [clip] > Yes, indexing is known to be slow, although I don't recall the precise > reason for that. Something to do with way integers are handled or some > such. There was some discussion on the list many years ago... It could be useful if someone spent time on profiling the call overhead for integer indexing. From what I remember from that part of the code, I'm fairly sure it can be optimized. -- Pauli Virtanen From hagen at zhuliguan.net Mon Nov 22 06:05:16 2010 From: hagen at zhuliguan.net (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Mon, 22 Nov 2010 12:05:16 +0100 Subject: [Numpy-discussion] categorical distributions In-Reply-To: References: Message-ID: > ISTM that this elementary functionality deserves an implementation > that's as fast as it can be. To substantiate this, I just wrote a simple implementation of "categorical" in "numpy/random/mtrand.pyx" and it's more than 8x faster than your version for multiple samples of the same distribution and more than 3x faster than using "multinomial(1, ...)" for multiple samples of different distributions (each time tested with 1000 samples drawn from distributions over 1000 categories). I can provide it as a patch if there's any interest. - Hagen From josef.pktd at gmail.com Mon Nov 22 08:46:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 08:46:10 -0500 Subject: [Numpy-discussion] categorical distributions In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 6:05 AM, Hagen F?rstenau wrote: >> ISTM that this elementary functionality deserves an implementation >> that's as fast as it can be. > > To substantiate this, I just wrote a simple implementation of > "categorical" in "numpy/random/mtrand.pyx" and it's more than 8x faster > than your version for multiple samples of the same distribution and more > than 3x faster than using "multinomial(1, ...)" for multiple samples of > different distributions (each time tested with 1000 samples drawn from > distributions over 1000 categories). > > I can provide it as a patch if there's any interest. Can you compare the speed of your cython solution with the version of Chuck -- For instance, weight 0..3 by 1..4, then In [14]: w = arange(1,5) In [15]: p = cumsum(w)/float(w.sum()) In [16]: bincount(p.searchsorted(random(1000000)))/1e6 Out[16]: array([ 0.100336, 0.200382, 0.299132, 0.40015 ]) ------------- from numpy mailing list thread "Weighted random integers", sep. 
10 Using searchsorted hat roughly a 10 times speedup compared to my multinomial version Josef > > - Hagen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Mon Nov 22 10:32:54 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Nov 2010 08:32:54 -0700 Subject: [Numpy-discussion] Slow element-by-element access? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 2:30 AM, Pauli Virtanen wrote: > Sun, 21 Nov 2010 23:26:37 -0700, Charles R Harris wrote: > [clip] > > Yes, indexing is known to be slow, although I don't recall the precise > > reason for that. Something to do with way integers are handled or some > > such. There was some discussion on the list many years ago... > > It could be useful if someone spent time on profiling the call overhead > for integer indexing. From what I remember from that part of the code, > I'm fairly sure it can be optimized. > > I was thinking the same. Profiling the code could be very useful. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Mon Nov 22 12:03:30 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 22 Nov 2010 09:03:30 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern wrote: > On Sun, Nov 21, 2010 at 19:49, Keith Goodman wrote: > >> But this sample gives a difference: >> >>>> a = np.random.rand(100) >>>> a.var() >> ? 0.080232196646619805 >>>> var(a) >> ? 0.080232196646619791 >> >> As you know, I'm trying to make a drop-in replacement for >> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in >> part. Either that, or suck it up and store the damn mean. > > The difference is less than eps. Quite possibly, the one-pass version > is even closer to the true value than the two-pass version. I wrote 3 cython prototype implementations of nanstd for 1d float64 arrays: >> a = np.random.rand(1000000) # numpy; doesn't take care of NaNs >> a.std() 0.28852169850186793 # cython of np.sqrt(((arr - arr.mean())**2).mean()) >> nanstd_twopass(a, ddof=0) 0.28852169850186798 # cython of np.sqrt((arr*arr).mean() - arr.mean()**2) >> nanstd_simple(a, ddof=0) 0.28852169850187437 # http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm >> nanstd_online(a, ddof=0) 0.28852169850187243 # My target, scipy version >> from scipy.stats import nanstd >> nanstd(a, bias=True) 0.28852169850186798 Timing: >> timeit nanstd(a, bias=True) 10 loops, best of 3: 27.8 ms per loop >> timeit a.std() 100 loops, best of 3: 11.5 ms per loop >> timeit nanstd_twopass(a, ddof=0) 100 loops, best of 3: 3.24 ms per loop >> timeit nanstd_simple(a, ddof=0) 1000 loops, best of 3: 1.6 ms per loop >> timeit nanstd_online(a, ddof=0) 100 loops, best of 3: 10.8 ms per loop nanstd_simple is the fastest but I assume the algorithm is no good for general use? I think I'll go with nanstd_twopass. It will most closely match numpy/scipy, is more robust than nanstd_simple, and is the second fastest. Here's the code. Improvements welcomed. @cython.boundscheck(False) @cython.wraparound(False) def nanstd_simple(np.ndarray[np.float64_t, ndim=1] a, int ddof): "nanstd of 1d numpy array with dtype=np.float64 along axis=0." 
cdef Py_ssize_t i cdef int a0 = a.shape[0], count = 0 cdef np.float64_t asum = 0, a2sum=0, ai for i in range(a0): ai = a[i] if ai == ai: asum += ai a2sum += ai * ai count += 1 if count > 0: asum = asum * asum return sqrt((a2sum - asum / count) / (count - ddof)) else: return np.float64(NAN) @cython.boundscheck(False) @cython.wraparound(False) def nanstd_online(np.ndarray[np.float64_t, ndim=1] a, int ddof): "nanstd of 1d numpy array with dtype=np.float64 along axis=0." cdef Py_ssize_t i cdef int a0 = a.shape[0], n = 0 cdef np.float64_t mean = 0, M2 = 0, delta, x for i in range(a0): x = a[i] if x == x: n += 1 delta = x - mean mean = mean + delta / n M2 = M2 + delta * (x - mean) if n > 0: return np.float64(sqrt(M2 / (n - ddof))) else: return np.float64(NAN) @cython.boundscheck(False) @cython.wraparound(False) def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): "nanstd of 1d numpy array with dtype=np.float64 along axis=0." cdef Py_ssize_t i cdef int a0 = a.shape[0], count = 0 cdef np.float64_t asum = 0, a2sum=0, amean, ai, da for i in range(a0): ai = a[i] if ai == ai: asum += ai count += 1 if count > 0: amean = asum / count asum = 0 for i in range(a0): ai = a[i] if ai == ai: da = ai - amean asum += da a2sum += (da * da) asum = asum * asum return sqrt((a2sum - asum / count) / (count - ddof)) else: return np.float64(NAN) From ben.root at ou.edu Mon Nov 22 12:07:56 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 22 Nov 2010 11:07:56 -0600 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 11:03 AM, Keith Goodman wrote: > On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern > wrote: > > On Sun, Nov 21, 2010 at 19:49, Keith Goodman > wrote: > > > >> But this sample gives a difference: > >> > >>>> a = np.random.rand(100) > >>>> a.var() > >> 0.080232196646619805 > >>>> var(a) > >> 0.080232196646619791 > >> > >> As you know, I'm trying to make a drop-in replacement for > >> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in > >> part. Either that, or suck it up and store the damn mean. > > > > The difference is less than eps. Quite possibly, the one-pass version > > is even closer to the true value than the two-pass version. > > I wrote 3 cython prototype implementations of nanstd for 1d float64 arrays: > > >> a = np.random.rand(1000000) > > # numpy; doesn't take care of NaNs > >> a.std() > 0.28852169850186793 > > # cython of np.sqrt(((arr - arr.mean())**2).mean()) > >> nanstd_twopass(a, ddof=0) > 0.28852169850186798 > > # cython of np.sqrt((arr*arr).mean() - arr.mean()**2) > >> nanstd_simple(a, ddof=0) > 0.28852169850187437 > > # > http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm > >> nanstd_online(a, ddof=0) > 0.28852169850187243 > > # My target, scipy version > >> from scipy.stats import nanstd > >> nanstd(a, bias=True) > 0.28852169850186798 > > Timing: > > >> timeit nanstd(a, bias=True) > 10 loops, best of 3: 27.8 ms per loop > >> timeit a.std() > 100 loops, best of 3: 11.5 ms per loop > >> timeit nanstd_twopass(a, ddof=0) > 100 loops, best of 3: 3.24 ms per loop > >> timeit nanstd_simple(a, ddof=0) > 1000 loops, best of 3: 1.6 ms per loop > >> timeit nanstd_online(a, ddof=0) > 100 loops, best of 3: 10.8 ms per loop > > nanstd_simple is the fastest but I assume the algorithm is no good for > general use? > > I think I'll go with nanstd_twopass. 
It will most closely match > numpy/scipy, is more robust than nanstd_simple, and is the second > fastest. > > Here's the code. Improvements welcomed. > > @cython.boundscheck(False) > @cython.wraparound(False) > def nanstd_simple(np.ndarray[np.float64_t, ndim=1] a, int ddof): > "nanstd of 1d numpy array with dtype=np.float64 along axis=0." > cdef Py_ssize_t i > cdef int a0 = a.shape[0], count = 0 > cdef np.float64_t asum = 0, a2sum=0, ai > for i in range(a0): > ai = a[i] > if ai == ai: > asum += ai > a2sum += ai * ai > count += 1 > if count > 0: > asum = asum * asum > return sqrt((a2sum - asum / count) / (count - ddof)) > else: > return np.float64(NAN) > > @cython.boundscheck(False) > @cython.wraparound(False) > def nanstd_online(np.ndarray[np.float64_t, ndim=1] a, int ddof): > "nanstd of 1d numpy array with dtype=np.float64 along axis=0." > cdef Py_ssize_t i > cdef int a0 = a.shape[0], n = 0 > cdef np.float64_t mean = 0, M2 = 0, delta, x > for i in range(a0): > x = a[i] > if x == x: > n += 1 > delta = x - mean > mean = mean + delta / n > M2 = M2 + delta * (x - mean) > if n > 0: > return np.float64(sqrt(M2 / (n - ddof))) > else: > return np.float64(NAN) > > @cython.boundscheck(False) > @cython.wraparound(False) > def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): > "nanstd of 1d numpy array with dtype=np.float64 along axis=0." > cdef Py_ssize_t i > cdef int a0 = a.shape[0], count = 0 > cdef np.float64_t asum = 0, a2sum=0, amean, ai, da > for i in range(a0): > ai = a[i] > if ai == ai: > asum += ai > count += 1 > if count > 0: > amean = asum / count > asum = 0 > for i in range(a0): > ai = a[i] > if ai == ai: > da = ai - amean > asum += da > a2sum += (da * da) > asum = asum * asum > return sqrt((a2sum - asum / count) / (count - ddof)) > else: > return np.float64(NAN) > I wonder how the results would change if the size of the array was larger than the processor cache? I still can't seem to wrap my head around the idea that a two-pass algorithm would be faster than a single-pass. Is this just a big-O thing where sometimes one algorithm will be faster than the other based on the size of the problem? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Nov 22 12:13:44 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 12:13:44 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 12:07 PM, Benjamin Root wrote: > On Mon, Nov 22, 2010 at 11:03 AM, Keith Goodman wrote: >> >> On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern >> wrote: >> > On Sun, Nov 21, 2010 at 19:49, Keith Goodman >> > wrote: >> > >> >> But this sample gives a difference: >> >> >> >>>> a = np.random.rand(100) >> >>>> a.var() >> >> ? 0.080232196646619805 >> >>>> var(a) >> >> ? 0.080232196646619791 >> >> >> >> As you know, I'm trying to make a drop-in replacement for >> >> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in >> >> part. Either that, or suck it up and store the damn mean. >> > >> > The difference is less than eps. Quite possibly, the one-pass version >> > is even closer to the true value than the two-pass version. >> >> I wrote 3 cython prototype implementations of nanstd for 1d float64 >> arrays: >> >> >> a = np.random.rand(1000000) >> >> # numpy; doesn't take care of NaNs >> >> a.std() >> ? 
0.28852169850186793 >> >> # cython of np.sqrt(((arr - arr.mean())**2).mean()) >> >> nanstd_twopass(a, ddof=0) >> ? 0.28852169850186798 >> >> # cython of np.sqrt((arr*arr).mean() - arr.mean()**2) >> >> nanstd_simple(a, ddof=0) >> ? 0.28852169850187437 >> >> # >> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm >> >> nanstd_online(a, ddof=0) >> ? 0.28852169850187243 >> >> # My target, scipy version >> >> from scipy.stats import nanstd >> >> nanstd(a, bias=True) >> ? 0.28852169850186798 >> >> Timing: >> >> >> timeit nanstd(a, bias=True) >> 10 loops, best of 3: 27.8 ms per loop >> >> timeit a.std() >> 100 loops, best of 3: 11.5 ms per loop >> >> timeit nanstd_twopass(a, ddof=0) >> 100 loops, best of 3: 3.24 ms per loop >> >> timeit nanstd_simple(a, ddof=0) >> 1000 loops, best of 3: 1.6 ms per loop >> >> timeit nanstd_online(a, ddof=0) >> 100 loops, best of 3: 10.8 ms per loop >> >> nanstd_simple is the fastest but I assume the algorithm is no good for >> general use? >> >> I think I'll go with nanstd_twopass. It will most closely match >> numpy/scipy, is more robust than nanstd_simple, and is the second >> fastest. >> >> Here's the code. Improvements welcomed. >> >> @cython.boundscheck(False) >> @cython.wraparound(False) >> def nanstd_simple(np.ndarray[np.float64_t, ndim=1] a, int ddof): >> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >> ? ?cdef Py_ssize_t i >> ? ?cdef int a0 = a.shape[0], count = 0 >> ? ?cdef np.float64_t asum = 0, a2sum=0, ai >> ? ?for i in range(a0): >> ? ? ? ?ai = a[i] >> ? ? ? ?if ai == ai: >> ? ? ? ? ? ?asum += ai >> ? ? ? ? ? ?a2sum += ai * ai >> ? ? ? ? ? ?count += 1 >> ? ?if count > 0: >> ? ? ? ?asum = asum * asum >> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >> ? ?else: >> ? ? ? ?return np.float64(NAN) >> >> @cython.boundscheck(False) >> @cython.wraparound(False) >> def nanstd_online(np.ndarray[np.float64_t, ndim=1] a, int ddof): >> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >> ? ?cdef Py_ssize_t i >> ? ?cdef int a0 = a.shape[0], n = 0 >> ? ?cdef np.float64_t mean = 0, M2 = 0, delta, x >> ? ?for i in range(a0): >> ? ? ? ?x = a[i] >> ? ? ? ?if x == x: >> ? ? ? ? ? ?n += 1 >> ? ? ? ? ? ?delta = x - mean >> ? ? ? ? ? ?mean = mean + delta / n >> ? ? ? ? ? ?M2 = M2 + delta * (x - mean) >> ? ?if n > 0: >> ? ? ? ?return np.float64(sqrt(M2 / (n - ddof))) >> ? ?else: >> ? ? ? ?return np.float64(NAN) >> >> @cython.boundscheck(False) >> @cython.wraparound(False) >> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >> ? ?cdef Py_ssize_t i >> ? ?cdef int a0 = a.shape[0], count = 0 >> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >> ? ?for i in range(a0): >> ? ? ? ?ai = a[i] >> ? ? ? ?if ai == ai: >> ? ? ? ? ? ?asum += ai >> ? ? ? ? ? ?count += 1 >> ? ?if count > 0: >> ? ? ? ?amean = asum / count >> ? ? ? ?asum = 0 >> ? ? ? ?for i in range(a0): >> ? ? ? ? ? ?ai = a[i] >> ? ? ? ? ? ?if ai == ai: >> ? ? ? ? ? ? ? ?da = ai - amean >> ? ? ? ? ? ? ? ?asum += da >> ? ? ? ? ? ? ? ?a2sum += (da * da) >> ? ? ? ?asum = asum * asum >> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >> ? ?else: >> ? ? ? ?return np.float64(NAN) > > I wonder how the results would change if the size of the array was larger > than the processor cache?? I still can't seem to wrap my head around the > idea that a two-pass algorithm would be faster than a single-pass.? 
Is this > just a big-O thing where sometimes one algorithm will be faster than the > other based on the size of the problem? nanstd_online requires too many divisions according to the Wikipedia article, and is useful mainly if the array doesn't fit in memory. Two pass would provide precision that we would expect in numpy, but I don't know if anyone ever tested the NIST problems for basic statistics. Josef > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From kwgoodman at gmail.com Mon Nov 22 12:28:27 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 22 Nov 2010 09:28:27 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 9:13 AM, wrote: > Two pass would provide precision that we would expect in numpy, but I > don't know if anyone ever tested the NIST problems for basic > statistics. Here are the results for their most difficult dataset. But I guess running one test doesn't mean anything. http://www.itl.nist.gov/div898/strd/univ/addinfo/numacc4.html >> np.absolute(a.std(ddof=1) - 0.1) 5.5884095961911129e-10 >> np.absolute(nanstd_online(a, ddof=1) - 0.1) 5.5890501948763216e-10 >> np.absolute(nanstd_simple(a, ddof=1) - 0.1) nan # Ha! >> np.absolute(nanstd_twopass(a, ddof=1) - 0.1) 5.5879308125117433e-10 From oliphant at enthought.com Mon Nov 22 12:42:03 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 22 Nov 2010 11:42:03 -0600 Subject: [Numpy-discussion] Slow element-by-element access? In-Reply-To: References: Message-ID: Basically, indexing in Python is a little slower, the number of things indexing can do is more varied, and more to the point, the objects returned from arrays are NumPy scalars (which have math which is not optimized). If you do element-by-element indexing, it's generally best to use Python lists (this has always been true, even with Numeric). The item method on NumPy arrays will speed up these kind of loops: from numpy import arange from time import clock N = 100000 a = arange(N) b = arange(N) al = a.tolist() bl = b.tolist() t = clock() for i in range(1, N-1): pass tpass = clock() - t t = clock() ai = a.item for i in range(1, N-1): val = ai(i) - t*(ai(i+1) - 2*ai(i) + ai(i-1)) + ai(i)*ai(i)*t b.itemset(i, val % (2**31)) tfast = clock() - t t = clock() for i in range(1, N-1): val = a[i] - t*(a[i+1] - 2*a[i] + a[i-1]) + a[i]*a[i]*t b[i] = val % (2**31) tslow = clock() - t t = clock() for i in range(1, N-1): val = al[i] - t*(al[i+1] - 2*al[i] + al[i-1]) + al[i]*al[i]*t bl[i] = val % (2**31) tlist = clock() - t print tpass, tfast, tslow, tlist On my system, the list version is the fastest, while the itemset method is about 10x faster than the full indexing approach. The item method not only does faster indexing, but it also returns Python scalars rather than NumPy scalars. -Travis From josef.pktd at gmail.com Mon Nov 22 13:12:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 13:12:14 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 12:28 PM, Keith Goodman wrote: > On Mon, Nov 22, 2010 at 9:13 AM, ? wrote: > >> Two pass would provide precision that we would expect in numpy, but I >> don't know if anyone ever tested the NIST problems for basic >> statistics. 
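For anyone reproducing the numbers below, a sketch of how the test array a can be built (the layout is taken from my reading of the NIST description -- 1001 observations, 10000000.2 followed by alternating 10000000.1 and 10000000.3, certified standard deviation 0.1 -- and should be double-checked against the page itself):

import numpy as np

a = np.empty(1001)
a[0] = 10000000.2
a[1::2] = 10000000.1
a[2::2] = 10000000.3

print np.absolute(a.std(ddof=1) - 0.1)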
> > Here are the results for their most difficult dataset. But I guess > running one test doesn't mean anything. > > http://www.itl.nist.gov/div898/strd/univ/addinfo/numacc4.html > >>> np.absolute(a.std(ddof=1) - 0.1) > ? 5.5884095961911129e-10 >>> np.absolute(nanstd_online(a, ddof=1) - 0.1) > ? 5.5890501948763216e-10 >>> np.absolute(nanstd_simple(a, ddof=1) - 0.1) > ? nan ?# Ha! >>> np.absolute(nanstd_twopass(a, ddof=1) - 0.1) > ? 5.5879308125117433e-10 Thanks, e-10 is better than I expected for a tough test, but confirms that I don't trust any statistics by more than 6 to 10 decimals or digits. Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Mon Nov 22 13:26:08 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 22 Nov 2010 10:26:08 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: > @cython.boundscheck(False) > @cython.wraparound(False) > def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): > ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." > ? ?cdef Py_ssize_t i > ? ?cdef int a0 = a.shape[0], count = 0 > ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da > ? ?for i in range(a0): > ? ? ? ?ai = a[i] > ? ? ? ?if ai == ai: > ? ? ? ? ? ?asum += ai > ? ? ? ? ? ?count += 1 > ? ?if count > 0: > ? ? ? ?amean = asum / count > ? ? ? ?asum = 0 > ? ? ? ?for i in range(a0): > ? ? ? ? ? ?ai = a[i] > ? ? ? ? ? ?if ai == ai: > ? ? ? ? ? ? ? ?da = ai - amean > ? ? ? ? ? ? ? ?asum += da > ? ? ? ? ? ? ? ?a2sum += (da * da) > ? ? ? ?asum = asum * asum > ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) > ? ?else: > ? ? ? ?return np.float64(NAN) This is 5% faster: @cython.boundscheck(False) @cython.wraparound(False) def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): "nanstd of 1d numpy array with dtype=np.float64 along axis=0." cdef Py_ssize_t i cdef int a0 = a.shape[0], count = 0 cdef np.float64_t asum = 0, amean, ai for i in range(a0): ai = a[i] if ai == ai: asum += ai count += 1 if count > 0: amean = asum / count asum = 0 for i in range(a0): ai = a[i] if ai == ai: ai -= amean asum += (ai * ai) return sqrt(asum / (count - ddof)) else: return np.float64(NAN) From josef.pktd at gmail.com Mon Nov 22 13:32:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 13:32:37 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman wrote: > On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: > >> @cython.boundscheck(False) >> @cython.wraparound(False) >> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >> ? ?cdef Py_ssize_t i >> ? ?cdef int a0 = a.shape[0], count = 0 >> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >> ? ?for i in range(a0): >> ? ? ? ?ai = a[i] >> ? ? ? ?if ai == ai: >> ? ? ? ? ? ?asum += ai >> ? ? ? ? ? ?count += 1 >> ? ?if count > 0: >> ? ? ? ?amean = asum / count >> ? ? ? ?asum = 0 >> ? ? ? ?for i in range(a0): >> ? ? ? ? ? ?ai = a[i] >> ? ? ? ? ? ?if ai == ai: >> ? ? ? ? ? ? ? ?da = ai - amean >> ? ? ? ? ? ? ? ?asum += da >> ? ? ? ? ? ? ? ?a2sum += (da * da) >> ? ? ? ?asum = asum * asum >> ? ? ? 
?return sqrt((a2sum - asum / count) / (count - ddof)) >> ? ?else: >> ? ? ? ?return np.float64(NAN) > > This is 5% faster: > > @cython.boundscheck(False) > @cython.wraparound(False) > def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): > ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." > ? ?cdef Py_ssize_t i > ? ?cdef int a0 = a.shape[0], count = 0 > ? ?cdef np.float64_t asum = 0, amean, ai > ? ?for i in range(a0): > ? ? ? ?ai = a[i] > ? ? ? ?if ai == ai: > ? ? ? ? ? ?asum += ai > ? ? ? ? ? ?count += 1 > ? ?if count > 0: > ? ? ? ?amean = asum / count > ? ? ? ?asum = 0 > ? ? ? ?for i in range(a0): > ? ? ? ? ? ?ai = a[i] > ? ? ? ? ? ?if ai == ai: > ? ? ? ? ? ? ? ?ai -= amean > ? ? ? ? ? ? ? ?asum += (ai * ai) > ? ? ? ?return sqrt(asum / (count - ddof)) > ? ?else: > ? ? ? ?return np.float64(NAN) I think it would be better to write nanvar instead of nanstd and take the square root only in a delegating nanstd, instead of the other way around. (Also a change that should be made in scipy.stats) Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Mon Nov 22 13:39:31 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 22 Nov 2010 10:39:31 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 10:32 AM, wrote: > On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman wrote: >> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: >> >>> @cython.boundscheck(False) >>> @cython.wraparound(False) >>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>> ? ?cdef Py_ssize_t i >>> ? ?cdef int a0 = a.shape[0], count = 0 >>> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >>> ? ?for i in range(a0): >>> ? ? ? ?ai = a[i] >>> ? ? ? ?if ai == ai: >>> ? ? ? ? ? ?asum += ai >>> ? ? ? ? ? ?count += 1 >>> ? ?if count > 0: >>> ? ? ? ?amean = asum / count >>> ? ? ? ?asum = 0 >>> ? ? ? ?for i in range(a0): >>> ? ? ? ? ? ?ai = a[i] >>> ? ? ? ? ? ?if ai == ai: >>> ? ? ? ? ? ? ? ?da = ai - amean >>> ? ? ? ? ? ? ? ?asum += da >>> ? ? ? ? ? ? ? ?a2sum += (da * da) >>> ? ? ? ?asum = asum * asum >>> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >>> ? ?else: >>> ? ? ? ?return np.float64(NAN) >> >> This is 5% faster: >> >> @cython.boundscheck(False) >> @cython.wraparound(False) >> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): >> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >> ? ?cdef Py_ssize_t i >> ? ?cdef int a0 = a.shape[0], count = 0 >> ? ?cdef np.float64_t asum = 0, amean, ai >> ? ?for i in range(a0): >> ? ? ? ?ai = a[i] >> ? ? ? ?if ai == ai: >> ? ? ? ? ? ?asum += ai >> ? ? ? ? ? ?count += 1 >> ? ?if count > 0: >> ? ? ? ?amean = asum / count >> ? ? ? ?asum = 0 >> ? ? ? ?for i in range(a0): >> ? ? ? ? ? ?ai = a[i] >> ? ? ? ? ? ?if ai == ai: >> ? ? ? ? ? ? ? ?ai -= amean >> ? ? ? ? ? ? ? ?asum += (ai * ai) >> ? ? ? ?return sqrt(asum / (count - ddof)) >> ? ?else: >> ? ? ? ?return np.float64(NAN) > > I think it would be better to write nanvar instead of nanstd and take > the square root only in a delegating nanstd, instead of the other way > around. (Also a change that should be made in scipy.stats) Yeah, I noticed that numpy does that. I was planning to have separate var and std functions. 
Here's why (from the readme file, but maybe I should template it, the sqrt automatically converts large ddof to NaN): Under the hood Nanny uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in ny.nanmax, for example, is in checking that your axis is within range, converting non-array data to an array, and selecting the function to use to calculate nanmax. You can get rid of the overhead by doing all this before you, say, enter an inner loop: >>> arr = np.random.rand(10,10) >>> axis = 0 >>> func, a = ny.func.nanmax_selector(arr, axis) >>> func.__name__ 'nanmax_2d_float64_axis0' Let's see how much faster than runs: >> timeit np.nanmax(arr, axis=0) 10000 loops, best of 3: 25.7 us per loop >> timeit ny.nanmax(arr, axis=0) 100000 loops, best of 3: 5.25 us per loop >> timeit func(a) 100000 loops, best of 3: 2.5 us per loop Note that func is faster than the Numpy's non-nan version of max: >> timeit arr.max(axis=0) 100000 loops, best of 3: 3.28 us per loop So adding NaN protection to your inner loops has a negative cost! From josef.pktd at gmail.com Mon Nov 22 13:51:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 13:51:10 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman wrote: > On Mon, Nov 22, 2010 at 10:32 AM, ? wrote: >> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman wrote: >>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: >>> >>>> @cython.boundscheck(False) >>>> @cython.wraparound(False) >>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>> ? ?cdef Py_ssize_t i >>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >>>> ? ?for i in range(a0): >>>> ? ? ? ?ai = a[i] >>>> ? ? ? ?if ai == ai: >>>> ? ? ? ? ? ?asum += ai >>>> ? ? ? ? ? ?count += 1 >>>> ? ?if count > 0: >>>> ? ? ? ?amean = asum / count >>>> ? ? ? ?asum = 0 >>>> ? ? ? ?for i in range(a0): >>>> ? ? ? ? ? ?ai = a[i] >>>> ? ? ? ? ? ?if ai == ai: >>>> ? ? ? ? ? ? ? ?da = ai - amean >>>> ? ? ? ? ? ? ? ?asum += da >>>> ? ? ? ? ? ? ? ?a2sum += (da * da) >>>> ? ? ? ?asum = asum * asum >>>> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >>>> ? ?else: >>>> ? ? ? ?return np.float64(NAN) >>> >>> This is 5% faster: >>> >>> @cython.boundscheck(False) >>> @cython.wraparound(False) >>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>> ? ?cdef Py_ssize_t i >>> ? ?cdef int a0 = a.shape[0], count = 0 >>> ? ?cdef np.float64_t asum = 0, amean, ai >>> ? ?for i in range(a0): >>> ? ? ? ?ai = a[i] >>> ? ? ? ?if ai == ai: >>> ? ? ? ? ? ?asum += ai >>> ? ? ? ? ? ?count += 1 >>> ? ?if count > 0: >>> ? ? ? ?amean = asum / count >>> ? ? ? ?asum = 0 >>> ? ? ? ?for i in range(a0): >>> ? ? ? ? ? ?ai = a[i] >>> ? ? ? ? ? ?if ai == ai: >>> ? ? ? ? ? ? ? ?ai -= amean >>> ? ? ? ? ? ? ? ?asum += (ai * ai) >>> ? ? ? ?return sqrt(asum / (count - ddof)) >>> ? ?else: >>> ? ? ? ?return np.float64(NAN) >> >> I think it would be better to write nanvar instead of nanstd and take >> the square root only in a delegating nanstd, instead of the other way >> around. (Also a change that should be made in scipy.stats) > > Yeah, I noticed that numpy does that. I was planning to have separate > var and std functions. 
Here's why (from the readme file, but maybe I > should template it, the sqrt automatically converts large ddof to > NaN): I'm not sure what you are saying, dropping the squareroot in the function doesn't require nan handling in the inner loop. If you want to return nan when count-ddof<=0, then you could just replace if count > 0: ... by if count -ddof > 0: ... Or am I missing the point? Josef > > Under the hood Nanny uses a separate Cython function for each > combination of ndim, dtype, and axis. A lot of the overhead in > ny.nanmax, for example, is in checking that your axis is within range, > converting non-array data to an array, and selecting the function to > use to calculate nanmax. > > You can get rid of the overhead by doing all this before you, say, > enter an inner loop: > >>>> arr = np.random.rand(10,10) >>>> axis = 0 >>>> func, a = ny.func.nanmax_selector(arr, axis) >>>> func.__name__ > 'nanmax_2d_float64_axis0' > > Let's see how much faster than runs: > >>> timeit np.nanmax(arr, axis=0) > 10000 loops, best of 3: 25.7 us per loop >>> timeit ny.nanmax(arr, axis=0) > 100000 loops, best of 3: 5.25 us per loop >>> timeit func(a) > 100000 loops, best of 3: 2.5 us per loop > > Note that func is faster than the Numpy's non-nan version of max: > >>> timeit arr.max(axis=0) > 100000 loops, best of 3: 3.28 us per loop > > So adding NaN protection to your inner loops has a negative cost! > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Mon Nov 22 13:59:51 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 22 Nov 2010 10:59:51 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 10:51 AM, wrote: > On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman wrote: >> On Mon, Nov 22, 2010 at 10:32 AM, ? wrote: >>> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman wrote: >>>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: >>>> >>>>> @cython.boundscheck(False) >>>>> @cython.wraparound(False) >>>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>>> ? ?cdef Py_ssize_t i >>>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>>> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >>>>> ? ?for i in range(a0): >>>>> ? ? ? ?ai = a[i] >>>>> ? ? ? ?if ai == ai: >>>>> ? ? ? ? ? ?asum += ai >>>>> ? ? ? ? ? ?count += 1 >>>>> ? ?if count > 0: >>>>> ? ? ? ?amean = asum / count >>>>> ? ? ? ?asum = 0 >>>>> ? ? ? ?for i in range(a0): >>>>> ? ? ? ? ? ?ai = a[i] >>>>> ? ? ? ? ? ?if ai == ai: >>>>> ? ? ? ? ? ? ? ?da = ai - amean >>>>> ? ? ? ? ? ? ? ?asum += da >>>>> ? ? ? ? ? ? ? ?a2sum += (da * da) >>>>> ? ? ? ?asum = asum * asum >>>>> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >>>>> ? ?else: >>>>> ? ? ? ?return np.float64(NAN) >>>> >>>> This is 5% faster: >>>> >>>> @cython.boundscheck(False) >>>> @cython.wraparound(False) >>>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>> ? ?cdef Py_ssize_t i >>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>> ? ?cdef np.float64_t asum = 0, amean, ai >>>> ? ?for i in range(a0): >>>> ? ? ? ?ai = a[i] >>>> ? ? ? ?if ai == ai: >>>> ? ? ? ? ? ?asum += ai >>>> ? ? ? ? ? ?count += 1 >>>> ? ?if count > 0: >>>> ? ? ? 
?amean = asum / count >>>> ? ? ? ?asum = 0 >>>> ? ? ? ?for i in range(a0): >>>> ? ? ? ? ? ?ai = a[i] >>>> ? ? ? ? ? ?if ai == ai: >>>> ? ? ? ? ? ? ? ?ai -= amean >>>> ? ? ? ? ? ? ? ?asum += (ai * ai) >>>> ? ? ? ?return sqrt(asum / (count - ddof)) >>>> ? ?else: >>>> ? ? ? ?return np.float64(NAN) >>> >>> I think it would be better to write nanvar instead of nanstd and take >>> the square root only in a delegating nanstd, instead of the other way >>> around. (Also a change that should be made in scipy.stats) >> >> Yeah, I noticed that numpy does that. I was planning to have separate >> var and std functions. Here's why (from the readme file, but maybe I >> should template it, the sqrt automatically converts large ddof to >> NaN): > > I'm not sure what you are saying, dropping the squareroot in the > function doesn't require nan handling in the inner loop. If you want > to return nan when count-ddof<=0, then you could just replace > > if count > 0: > ... > > by > if count -ddof > 0: > ... > > Or am I missing the point? Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to be able to return the underlying, low-level cython function (see below). So I need separate var and std versions to do that (unless I can modify default kwargs on the fly). >> Under the hood Nanny uses a separate Cython function for each >> combination of ndim, dtype, and axis. A lot of the overhead in >> ny.nanmax, for example, is in checking that your axis is within range, >> converting non-array data to an array, and selecting the function to >> use to calculate nanmax. >> >> You can get rid of the overhead by doing all this before you, say, >> enter an inner loop: >> >>>>> arr = np.random.rand(10,10) >>>>> axis = 0 >>>>> func, a = ny.func.nanmax_selector(arr, axis) >>>>> func.__name__ >> 'nanmax_2d_float64_axis0' >> >> Let's see how much faster than runs: >> >>>> timeit np.nanmax(arr, axis=0) >> 10000 loops, best of 3: 25.7 us per loop >>>> timeit ny.nanmax(arr, axis=0) >> 100000 loops, best of 3: 5.25 us per loop >>>> timeit func(a) >> 100000 loops, best of 3: 2.5 us per loop >> >> Note that func is faster than the Numpy's non-nan version of max: >> >>>> timeit arr.max(axis=0) >> 100000 loops, best of 3: 3.28 us per loop >> >> So adding NaN protection to your inner loops has a negative cost! From josef.pktd at gmail.com Mon Nov 22 14:00:33 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 14:00:33 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 1:51 PM, wrote: > On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman wrote: >> On Mon, Nov 22, 2010 at 10:32 AM, ? wrote: >>> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman wrote: >>>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: >>>> >>>>> @cython.boundscheck(False) >>>>> @cython.wraparound(False) >>>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>>> ? ?cdef Py_ssize_t i >>>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>>> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >>>>> ? ?for i in range(a0): >>>>> ? ? ? ?ai = a[i] >>>>> ? ? ? ?if ai == ai: >>>>> ? ? ? ? ? ?asum += ai >>>>> ? ? ? ? ? ?count += 1 >>>>> ? ?if count > 0: >>>>> ? ? ? ?amean = asum / count >>>>> ? ? ? ?asum = 0 >>>>> ? ? ? ?for i in range(a0): >>>>> ? ? ? ? ? ?ai = a[i] >>>>> ? ? ? ? ? ?if ai == ai: >>>>> ? ? ? ? ? ? ? ?da = ai - amean >>>>> ? ? ? ? ? ? ? 
?asum += da >>>>> ? ? ? ? ? ? ? ?a2sum += (da * da) >>>>> ? ? ? ?asum = asum * asum >>>>> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >>>>> ? ?else: >>>>> ? ? ? ?return np.float64(NAN) >>>> >>>> This is 5% faster: >>>> >>>> @cython.boundscheck(False) >>>> @cython.wraparound(False) >>>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>> ? ?cdef Py_ssize_t i >>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>> ? ?cdef np.float64_t asum = 0, amean, ai >>>> ? ?for i in range(a0): >>>> ? ? ? ?ai = a[i] >>>> ? ? ? ?if ai == ai: >>>> ? ? ? ? ? ?asum += ai >>>> ? ? ? ? ? ?count += 1 >>>> ? ?if count > 0: >>>> ? ? ? ?amean = asum / count >>>> ? ? ? ?asum = 0 >>>> ? ? ? ?for i in range(a0): >>>> ? ? ? ? ? ?ai = a[i] >>>> ? ? ? ? ? ?if ai == ai: >>>> ? ? ? ? ? ? ? ?ai -= amean >>>> ? ? ? ? ? ? ? ?asum += (ai * ai) >>>> ? ? ? ?return sqrt(asum / (count - ddof)) >>>> ? ?else: >>>> ? ? ? ?return np.float64(NAN) >>> >>> I think it would be better to write nanvar instead of nanstd and take >>> the square root only in a delegating nanstd, instead of the other way >>> around. (Also a change that should be made in scipy.stats) >> >> Yeah, I noticed that numpy does that. I was planning to have separate >> var and std functions. Here's why (from the readme file, but maybe I >> should template it, >>the sqrt automatically converts large ddof to >> NaN): I don't think that works for complex numbers. (statsmodels has now a preference that calculations work also for complex numbers) Josef > > I'm not sure what you are saying, dropping the squareroot in the > function doesn't require nan handling in the inner loop. If you want > to return nan when count-ddof<=0, then you could just replace > > if count > 0: > ... > > by > if count -ddof > 0: > ... > > Or am I missing the point? > > Josef > > >> >> Under the hood Nanny uses a separate Cython function for each >> combination of ndim, dtype, and axis. A lot of the overhead in >> ny.nanmax, for example, is in checking that your axis is within range, >> converting non-array data to an array, and selecting the function to >> use to calculate nanmax. >> >> You can get rid of the overhead by doing all this before you, say, >> enter an inner loop: >> >>>>> arr = np.random.rand(10,10) >>>>> axis = 0 >>>>> func, a = ny.func.nanmax_selector(arr, axis) >>>>> func.__name__ >> 'nanmax_2d_float64_axis0' >> >> Let's see how much faster than runs: >> >>>> timeit np.nanmax(arr, axis=0) >> 10000 loops, best of 3: 25.7 us per loop >>>> timeit ny.nanmax(arr, axis=0) >> 100000 loops, best of 3: 5.25 us per loop >>>> timeit func(a) >> 100000 loops, best of 3: 2.5 us per loop >> >> Note that func is faster than the Numpy's non-nan version of max: >> >>>> timeit arr.max(axis=0) >> 100000 loops, best of 3: 3.28 us per loop >> >> So adding NaN protection to your inner loops has a negative cost! >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From kwgoodman at gmail.com Mon Nov 22 14:04:50 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 22 Nov 2010 11:04:50 -0800 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 11:00 AM, wrote: > I don't think that works for complex numbers. 
> (statsmodels has now a preference that calculations work also for > complex numbers) I'm only supporting int32, int64, float64 for now. Getting the other ints and floats should be easy. I don't have plans for complex numbers. If it's not a big change then someone can add support later. From josef.pktd at gmail.com Mon Nov 22 14:06:46 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 14:06:46 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 1:59 PM, Keith Goodman wrote: > On Mon, Nov 22, 2010 at 10:51 AM, ? wrote: >> On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman wrote: >>> On Mon, Nov 22, 2010 at 10:32 AM, ? wrote: >>>> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman wrote: >>>>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman wrote: >>>>> >>>>>> @cython.boundscheck(False) >>>>>> @cython.wraparound(False) >>>>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>>>> ? ?cdef Py_ssize_t i >>>>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>>>> ? ?cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >>>>>> ? ?for i in range(a0): >>>>>> ? ? ? ?ai = a[i] >>>>>> ? ? ? ?if ai == ai: >>>>>> ? ? ? ? ? ?asum += ai >>>>>> ? ? ? ? ? ?count += 1 >>>>>> ? ?if count > 0: >>>>>> ? ? ? ?amean = asum / count >>>>>> ? ? ? ?asum = 0 >>>>>> ? ? ? ?for i in range(a0): >>>>>> ? ? ? ? ? ?ai = a[i] >>>>>> ? ? ? ? ? ?if ai == ai: >>>>>> ? ? ? ? ? ? ? ?da = ai - amean >>>>>> ? ? ? ? ? ? ? ?asum += da >>>>>> ? ? ? ? ? ? ? ?a2sum += (da * da) >>>>>> ? ? ? ?asum = asum * asum >>>>>> ? ? ? ?return sqrt((a2sum - asum / count) / (count - ddof)) >>>>>> ? ?else: >>>>>> ? ? ? ?return np.float64(NAN) >>>>> >>>>> This is 5% faster: >>>>> >>>>> @cython.boundscheck(False) >>>>> @cython.wraparound(False) >>>>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>>> ? ?"nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>>> ? ?cdef Py_ssize_t i >>>>> ? ?cdef int a0 = a.shape[0], count = 0 >>>>> ? ?cdef np.float64_t asum = 0, amean, ai >>>>> ? ?for i in range(a0): >>>>> ? ? ? ?ai = a[i] >>>>> ? ? ? ?if ai == ai: >>>>> ? ? ? ? ? ?asum += ai >>>>> ? ? ? ? ? ?count += 1 >>>>> ? ?if count > 0: >>>>> ? ? ? ?amean = asum / count >>>>> ? ? ? ?asum = 0 >>>>> ? ? ? ?for i in range(a0): >>>>> ? ? ? ? ? ?ai = a[i] >>>>> ? ? ? ? ? ?if ai == ai: >>>>> ? ? ? ? ? ? ? ?ai -= amean >>>>> ? ? ? ? ? ? ? ?asum += (ai * ai) >>>>> ? ? ? ?return sqrt(asum / (count - ddof)) >>>>> ? ?else: >>>>> ? ? ? ?return np.float64(NAN) >>>> >>>> I think it would be better to write nanvar instead of nanstd and take >>>> the square root only in a delegating nanstd, instead of the other way >>>> around. (Also a change that should be made in scipy.stats) >>> >>> Yeah, I noticed that numpy does that. I was planning to have separate >>> var and std functions. Here's why (from the readme file, but maybe I >>> should template it, the sqrt automatically converts large ddof to >>> NaN): >> >> I'm not sure what you are saying, dropping the squareroot in the >> function doesn't require nan handling in the inner loop. If you want >> to return nan when count-ddof<=0, then you could just replace >> >> if count > 0: >> ... >> >> by >> if count -ddof > 0: >> ... >> >> Or am I missing the point? > > Yes, sorry. Ignore the sqrt/nan comment. 
The point is that I want to > be able to return the underlying, low-level cython function (see > below). So I need separate var and std versions to do that (unless I > can modify default kwargs on the fly). I think you could still delegate at the cython level, but given the amount of code duplication that is required, I don't expect it to make much difference for staying DRY. Josef > >>> Under the hood Nanny uses a separate Cython function for each >>> combination of ndim, dtype, and axis. A lot of the overhead in >>> ny.nanmax, for example, is in checking that your axis is within range, >>> converting non-array data to an array, and selecting the function to >>> use to calculate nanmax. >>> >>> You can get rid of the overhead by doing all this before you, say, >>> enter an inner loop: >>> >>>>>> arr = np.random.rand(10,10) >>>>>> axis = 0 >>>>>> func, a = ny.func.nanmax_selector(arr, axis) >>>>>> func.__name__ >>> 'nanmax_2d_float64_axis0' >>> >>> Let's see how much faster than runs: >>> >>>>> timeit np.nanmax(arr, axis=0) >>> 10000 loops, best of 3: 25.7 us per loop >>>>> timeit ny.nanmax(arr, axis=0) >>> 100000 loops, best of 3: 5.25 us per loop >>>>> timeit func(a) >>> 100000 loops, best of 3: 2.5 us per loop >>> >>> Note that func is faster than the Numpy's non-nan version of max: >>> >>>>> timeit arr.max(axis=0) >>> 100000 loops, best of 3: 3.28 us per loop >>> >>> So adding NaN protection to your inner loops has a negative cost! > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Mon Nov 22 14:08:20 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 22 Nov 2010 11:08:20 -0800 Subject: [Numpy-discussion] indexing question In-Reply-To: <20101121193701.GA18049@doriath.local> References: <20101121191500.GA17912@doriath.local> <20101121193701.GA18049@doriath.local> Message-ID: <4CEABFA4.6040502@noaa.gov> On 11/21/10 11:37 AM, Ernest Adrogu? wrote: >> so you want >> >> t[:,x,y] > > I tried that, but it's not the same: > > In [307]: t[[0,1],x,y] > Out[307]: array([1, 7]) > > In [308]: t[:,x,y] > Out[308]: > array([[1, 3], > [5, 7]]) what is your t? Here's my example, which I think matches what you asked for: In [1]: import numpy as np In [2]: a = np.arange(12) In [3]: a.shape = (3,2,2) In [4]: a Out[4]: array([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]], [[ 8, 9], [10, 11]]]) In [5]: a[:,1,0] Out[5]: array([ 2, 6, 10]) -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Mon Nov 22 14:15:00 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 14:15:00 -0500 Subject: [Numpy-discussion] Does np.std() make two passes through the data? In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 2:04 PM, Keith Goodman wrote: > On Mon, Nov 22, 2010 at 11:00 AM, ? wrote: > >> I don't think that works for complex numbers. >> (statsmodels has now a preference that calculations work also for >> complex numbers) > > I'm only supporting int32, int64, float64 for now. Getting the other > ints and floats should be easy. I don't have plans for complex > numbers. If it's not a big change then someone can add support later. Fair enough, but if you need numerical derivatives, then complex support looks useful. 
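As a side note on why complex support matters for numerical derivatives: the complex-step trick approximates f'(x) as Im(f(x + i*h))/h, which has no subtractive cancellation, but it only works if every function along the way propagates complex input instead of casting to float64. A minimal sketch (the helper name and step size are illustrative, not from this thread):

import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # complex-step approximation: f'(x) ~= Im(f(x + i*h)) / h
    # there is no subtraction of nearly equal numbers, so h can be tiny,
    # but f must accept and propagate complex values
    return np.imag(f(x + 1j * h)) / h

# derivative of x**3 + 2*x at x = 1.5; the exact value is 3*1.5**2 + 2 = 8.75
print(complex_step_derivative(lambda x: x**3 + 2*x, 1.5))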
Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsalvati at u.washington.edu Mon Nov 22 14:20:35 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 22 Nov 2010 11:20:35 -0800 Subject: [Numpy-discussion] indexing question In-Reply-To: <4CEABFA4.6040502@noaa.gov> References: <20101121191500.GA17912@doriath.local> <20101121193701.GA18049@doriath.local> <4CEABFA4.6040502@noaa.gov> Message-ID: I didn't realize the x's and y's were varying the first time around. There's probably a way to omit it, but I think the conceptually simplest way is probably what you had to begin with. Build an index by saying i = numpy.arange(0, t.shape[0]) then you can do t[i, x,y] On Mon, Nov 22, 2010 at 11:08 AM, Christopher Barker wrote: > On 11/21/10 11:37 AM, Ernest Adrogu? wrote: >>> so you want >>> >>> t[:,x,y] >> >> I tried that, but it's not the same: >> >> In [307]: t[[0,1],x,y] >> Out[307]: array([1, 7]) >> >> In [308]: t[:,x,y] >> Out[308]: >> array([[1, 3], >> ? ? ? ? [5, 7]]) > > what is your t? Here's my example, which I think matches what you asked for: > > In [1]: import numpy as np > > In [2]: a = np.arange(12) > > In [3]: a.shape = (3,2,2) > > In [4]: a > Out[4]: > array([[[ 0, ?1], > ? ? ? ? [ 2, ?3]], > > ? ? ? ?[[ 4, ?5], > ? ? ? ? [ 6, ?7]], > > ? ? ? ?[[ 8, ?9], > ? ? ? ? [10, 11]]]) > > In [5]: a[:,1,0] > Out[5]: array([ 2, ?6, 10]) > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Chris.Barker at noaa.gov Mon Nov 22 14:35:11 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 22 Nov 2010 11:35:11 -0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 In-Reply-To: References: Message-ID: <4CEAC5EF.1030205@noaa.gov> On 11/20/10 11:04 PM, Ralf Gommers wrote: > I am pleased to announce the availability of NumPy 1.5.1. > Binaries, sources and release notes can be found at > https://sourceforge.net/projects/numpy/files/. > > Thank you to everyone who contributed to this release. Yes, thanks so much -- in particular thanks to the team that build the OS-X binaries -- looks like a complete set! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthew.brett at gmail.com Mon Nov 22 14:40:07 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 22 Nov 2010 11:40:07 -0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 In-Reply-To: <4CEAC5EF.1030205@noaa.gov> References: <4CEAC5EF.1030205@noaa.gov> Message-ID: Hi, On Mon, Nov 22, 2010 at 11:35 AM, Christopher Barker wrote: > On 11/20/10 11:04 PM, Ralf Gommers wrote: >> I am pleased to announce the availability of NumPy 1.5.1. > >> Binaries, sources and release notes can be found at >> https://sourceforge.net/projects/numpy/files/. >> >> Thank you to everyone who contributed to this release. > > Yes, thanks so much -- in particular thanks to the team that build the > OS-X binaries -- looks like a complete set! 
Many thanks from me too - particularly for clearing up that annoying numpy-distuils scipy build problem. Cheers, Matthew From gael.varoquaux at normalesup.org Mon Nov 22 15:03:10 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 22 Nov 2010 21:03:10 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101122084613.GC18433@phare.normalesup.org> References: <20101122084613.GC18433@phare.normalesup.org> Message-ID: <1290456190.19533.2.camel@Nokia-N900-51-1> Hi list, does anybody have, or knows where I can find some N dimensional dichotomy optimization code in Python (BSD licensed, or equivalent)? Worst case, it does not look too bad to code, but I am interested by any advice. I haven't done my reading yet, and I don't know how ill-posed a problem it is. I had in mind starting from a set of points and iterating the computation of the objective function's value at the barycenters of these points, and updating this list of points. This does raise a few questions on what are the best possible updates. Cheers, Gael From eadrogue at gmx.net Mon Nov 22 15:01:21 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 22 Nov 2010 21:01:21 +0100 Subject: [Numpy-discussion] indexing question In-Reply-To: <4CEABFA4.6040502@noaa.gov> References: <20101121191500.GA17912@doriath.local> <20101121193701.GA18049@doriath.local> <4CEABFA4.6040502@noaa.gov> Message-ID: <20101122200121.GA25214@doriath.local> 22/11/10 @ 11:08 (-0800), thus spake Christopher Barker: > On 11/21/10 11:37 AM, Ernest Adrogu? wrote: > >>so you want > >> > >>t[:,x,y] > > > >I tried that, but it's not the same: > > > >In [307]: t[[0,1],x,y] > >Out[307]: array([1, 7]) > > > >In [308]: t[:,x,y] > >Out[308]: > >array([[1, 3], > > [5, 7]]) > > what is your t? Here's my example, which I think matches what you asked for: > > In [1]: import numpy as np > > In [2]: a = np.arange(12) > > In [3]: a.shape = (3,2,2) > > In [4]: a > Out[4]: > array([[[ 0, 1], > [ 2, 3]], > > [[ 4, 5], > [ 6, 7]], > > [[ 8, 9], > [10, 11]]]) > > In [5]: a[:,1,0] > Out[5]: array([ 2, 6, 10]) This works with scalar indices, but not with arrays. The problem is that I don't want always the same element from each subarray, but an arbitrary element, say the (1,0) from the first, the (0,0) from the second, and so on, so I have to use arrays. -- Ernest From robert.kern at gmail.com Mon Nov 22 15:04:37 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 22 Nov 2010 14:04:37 -0600 Subject: [Numpy-discussion] indexing question In-Reply-To: <20101121191500.GA17912@doriath.local> References: <20101121191500.GA17912@doriath.local> Message-ID: 2010/11/21 Ernest Adrogu? : > Hi, > > Suppose an array of shape (N,2,2), that is N arrays of > shape (2,2). I want to select an element (x,y) from each one > of the subarrays, so I get a 1-dimensional array of length > N. For instance: > > In [228]: t=np.arange(8).reshape(2,2,2) > > In [229]: t > Out[229]: > array([[[0, 1], > ? ? ? ?[2, 3]], > > ? ? ? [[4, 5], > ? ? ? ?[6, 7]]]) > > In [230]: x=[0,1] > > In [231]: y=[1,1] > > In [232]: t[[0,1],x,y] > Out[232]: array([1, 7]) > > This way, I get the elements (0,1) and (1,1) which is what > I wanted. The question is: is it possible to omit the [0,1] > in the index? 
No, but you can write generic code for it: t[np.arange(t.shape[0]), x, y] -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From eadrogue at gmx.net Mon Nov 22 15:09:51 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 22 Nov 2010 21:09:51 +0100 Subject: [Numpy-discussion] indexing question In-Reply-To: References: <20101121191500.GA17912@doriath.local> <20101121193701.GA18049@doriath.local> <4CEABFA4.6040502@noaa.gov> Message-ID: <20101122200951.GB25214@doriath.local> 22/11/10 @ 11:20 (-0800), thus spake John Salvatier: > I didn't realize the x's and y's were varying the first time around. > There's probably a way to omit it, but I think the conceptually > simplest way is probably what you had to begin with. Build an index by > saying i = numpy.arange(0, t.shape[0]) > > then you can do t[i, x,y] Exactly. I was just wondering if I can speed this up by omitting building the "arange array". This is inside a function that gets called a lot, so I suppose it would make a difference if I can get rid of it. -- Ernest From matthieu.brucher at gmail.com Mon Nov 22 15:12:45 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 22 Nov 2010 21:12:45 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <1290456190.19533.2.camel@Nokia-N900-51-1> References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> Message-ID: 2010/11/22 Gael Varoquaux : > Hi list, Hi ;) > does anybody have, or knows where I can find some N dimensional dichotomy optimization code in Python (BSD licensed, or equivalent)? I don't know any code, but it should be too difficult by bgoing through a KdTree. > Worst case, it does not look too bad to code, but I am interested by any advice. I haven't done my reading yet, and I don't know how ill-posed a problem it is. I had in mind starting from a set of points and iterating the computation of the objective function's value at the barycenters of these points, and updating this list of points. This does raise a few questions on what are the best possible updates. In this case, you may want to check Nelder-Mead algotihm (also known as down-hill simplex or polytope), which is available in scikits.optimization, but there are other implementations out there. Cheers ;) Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From eadrogue at gmx.net Mon Nov 22 15:23:34 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 22 Nov 2010 21:23:34 +0100 Subject: [Numpy-discussion] indexing question In-Reply-To: References: <20101121191500.GA17912@doriath.local> Message-ID: <20101122202334.GA25341@doriath.local> 22/11/10 @ 14:04 (-0600), thus spake Robert Kern: > > This way, I get the elements (0,1) and (1,1) which is what > > I wanted. The question is: is it possible to omit the [0,1] > > in the index? > > No, but you can write generic code for it: > > t[np.arange(t.shape[0]), x, y] Thank you. This is what I wanted to know. 
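To spell the pattern out once in runnable form (a small sketch; the reuse remark anticipates the suggestion that follows in the thread):

import numpy as np

t = np.arange(8).reshape(2, 2, 2)
x = np.array([0, 1])
y = np.array([1, 1])

idx = np.arange(t.shape[0])    # build the first-axis index once
print(t[idx, x, y])            # [1 7], same result as t[[0, 1], x, y]

# if the same selection is made many times on arrays of this length,
# idx can be kept around and reused instead of rebuilding the arange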
-- Ernest From jsalvati at u.washington.edu Mon Nov 22 15:32:45 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 22 Nov 2010 12:32:45 -0800 Subject: [Numpy-discussion] indexing question In-Reply-To: <20101122202334.GA25341@doriath.local> References: <20101121191500.GA17912@doriath.local> <20101122202334.GA25341@doriath.local> Message-ID: I think that the only speedup you will get is defining an index only once and reusing it. 2010/11/22 Ernest Adrogu? : > 22/11/10 @ 14:04 (-0600), thus spake Robert Kern: >> > This way, I get the elements (0,1) and (1,1) which is what >> > I wanted. The question is: is it possible to omit the [0,1] >> > in the index? >> >> No, but you can write generic code for it: >> >> ? t[np.arange(t.shape[0]), x, y] > > Thank you. This is what I wanted to know. > > -- > Ernest > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gael.varoquaux at normalesup.org Mon Nov 22 16:57:35 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 22 Nov 2010 22:57:35 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> Message-ID: <20101122215735.GE27341@phare.normalesup.org> On Mon, Nov 22, 2010 at 09:12:45PM +0100, Matthieu Brucher wrote: > Hi ;) Hi bro > > does anybody have, or knows where I can find some N dimensional > > dichotomy optimization code in Python (BSD licensed, or equivalent)? > I don't know any code, but it should be too difficult by bgoing > through a KdTree. I am not in terribly high-dimensional spaces, so I don't really need to use a KdTree (but we do happen to have a nice BallTree available in the scikit-learn, so I could use it just to play :>). > > Worst case, it does not look too bad to code, but I am interested by > > any advice. I haven't done my reading yet, and I don't know how > > ill-posed a problem it is. I had in mind starting from a set of > > points and iterating the computation of the objective function's > > value at the barycenters of these points, and updating this list of > > points. This does raise a few questions on what are the best possible > > updates. > In this case, you may want to check Nelder-Mead algotihm (also known > as down-hill simplex or polytope), which is available in > scikits.optimization, but there are other implementations out there. Interesting reference. I had never looked at the Nelder-Mead algorithm. I am wondering if it does what I want, thought. The reason I am looking at dichotomy optimization is that the objective function that I want to optimize has local roughness, but is globally pretty much a bell-shaped curve. Dichotomy looks like it will get quite close to the top of the curve (and I have been told that it works well on such problems). One thing that is nice with dichotomy for my needs is that it is not based on a gradient, and it works in a convex of the parameter space. Will the Nelder-Mead display such properties? It seems so to me, but I don't trust my quick read through of Wikipedia. I realize that maybe I should rephrase my question to try and draw more out of the common wealth of knowledge on this mailing list: what do people suggest to tackle this problem? 
Guided by Matthieu's suggestion, I have started looking at Powell's algorithm, and given the introduction of www.damtp.cam.ac.uk/user/na/NA_papers/NA2007_03.pdf I am wondering whether I should not investigate it. Can people provide any insights on these problems. Many thanks, Gael PS: The reason I am looking at this optimization problem is that I got tired of looking at grid searches optimize the cross-validation score on my 3-parameter estimator (not in the scikit-learn, because it is way too specific to my problems). From matthieu.brucher at gmail.com Mon Nov 22 17:12:26 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 22 Nov 2010 23:12:26 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101122215735.GE27341@phare.normalesup.org> References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> Message-ID: 2010/11/22 Gael Varoquaux : > On Mon, Nov 22, 2010 at 09:12:45PM +0100, Matthieu Brucher wrote: >> Hi ;) > > Hi bro > >> > does anybody have, or knows where I can find some N dimensional >> > dichotomy optimization code in Python (BSD licensed, or equivalent)? > >> I don't know any code, but it should be too difficult by bgoing >> through a KdTree. > > I am not in terribly high-dimensional spaces, so I don't really need to > use a KdTree (but we do happen to have a nice BallTree available in the > scikit-learn, so I could use it just to ?play :>). :D >> In this case, you may want to check Nelder-Mead algotihm (also known >> as down-hill simplex or polytope), which is available in >> scikits.optimization, but there are other implementations out there. > > Interesting reference. I had never looked at the Nelder-Mead algorithm. > I am wondering if it does what I want, thought. > > The reason I am looking at dichotomy optimization is that the objective > function that I want to optimize has local roughness, but is globally > pretty much a bell-shaped curve. Dichotomy looks like it will get quite > close to the top of the curve (and I have been told that it works well on > such problems). One thing that is nice with dichotomy for my needs is > that it is not based on a gradient, and it works in a convex of the > parameter space. It seems that a simplex is what you need. It uses the barycenter (more or less) to find a new point in the simplex. And it works well only in convex functions (but in fact almost all functions have an issue with this :D) > Will the Nelder-Mead display such properties? It seems so to me, but I > don't trust my quick read through of Wikipedia. Yes, it does need a gradient and if the function is convex, it works in a convex in the parameter space. > I realize that maybe I should rephrase my question to try and draw more > out of the common wealth of knowledge on this mailing list: what do > people suggest to tackle this problem? Guided by Matthieu's suggestion, I > have started looking at Powell's algorithm, and given the introduction of > www.damtp.cam.ac.uk/user/na/NA_papers/NA2007_03.pdf I am wondering > whether I should not investigate it. Can people provide any insights on > these problems. Indeed, Powell may also a solution. A simplex is just what is closer to what you hinted as an optimization algorithm. 
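For reference, running the Nelder-Mead that already ships in scipy.optimize on a smooth bell-shaped toy objective looks like this (objective and starting point are made up for illustration):

import numpy as np
from scipy import optimize

def neg_bell(x):
    # negative of a smooth bell-shaped surface; fmin minimizes,
    # so this finds the top of the bell, located at (1, -2)
    return -np.exp(-((x[0] - 1.0)**2 + (x[1] + 2.0)**2) / 10.0)

x0 = np.zeros(2)                        # arbitrary starting guess
xopt = optimize.fmin(neg_bell, x0, disp=0)
print(xopt)                             # roughly [1., -2.]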
> Many thanks, You're welcome ;) > Gael > > PS: The reason I am looking at this optimization problem is that I got > tired of looking at grid searches optimize the cross-validation score on > my 3-parameter estimator (not in the scikit-learn, because it is way too > specific to my problems). Perhaps you may want to combine it with genetic algorithms. We also kind of combine grid search with simplex-based optimizer with simulated annealing in some of our global optimization problems, and I think I'll try at one point to introduce genetic algorithms instead of the grid search. Your problem is simpler though if it displays some convexity. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From gael.varoquaux at normalesup.org Mon Nov 22 17:18:29 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 22 Nov 2010 23:18:29 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> Message-ID: <20101122221829.GG27341@phare.normalesup.org> On Mon, Nov 22, 2010 at 11:12:26PM +0100, Matthieu Brucher wrote: > It seems that a simplex is what you need. Ha! I am learning new fancy words. Now I can start looking clever. > > I realize that maybe I should rephrase my question to try and draw more > > out of the common wealth of knowledge on this mailing list: what do > > people suggest to tackle this problem? Guided by Matthieu's suggestion, I > > have started looking at Powell's algorithm, and given the introduction of > > www.damtp.cam.ac.uk/user/na/NA_papers/NA2007_03.pdf I am wondering > > whether I should not investigate it. Can people provide any insights on > > these problems. > Indeed, Powell may also a solution. A simplex is just what is closer > to what you hinted as an optimization algorithm. I'll do a bit more reading. > > PS: The reason I am looking at this optimization problem is that I got > > tired of looking at grid searches optimize the cross-validation score on > > my 3-parameter estimator (not in the scikit-learn, because it is way too > > specific to my problems). > Perhaps you may want to combine it with genetic algorithms. We also > kind of combine grid search with simplex-based optimizer with > simulated annealing in some of our global optimization problems, and I > think I'll try at one point to introduce genetic algorithms instead of > the grid search. Well, in the scikit, in the long run (it will take a little while) I'd like to expose other optimization methods then the GridSearchCV, so if you have code or advice to give us, we'd certainly be interested. Gael From matthieu.brucher at gmail.com Mon Nov 22 17:27:32 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 22 Nov 2010 23:27:32 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101122221829.GG27341@phare.normalesup.org> References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122221829.GG27341@phare.normalesup.org> Message-ID: 2010/11/22 Gael Varoquaux : > On Mon, Nov 22, 2010 at 11:12:26PM +0100, Matthieu Brucher wrote: >> It seems that a simplex is what you need. > > Ha! I am learning new fancy words. Now I can start looking clever. 
> >> > I realize that maybe I should rephrase my question to try and draw more >> > out of the common wealth of knowledge on this mailing list: what do >> > people suggest to tackle this problem? Guided by Matthieu's suggestion, I >> > have started looking at Powell's algorithm, and given the introduction of >> > www.damtp.cam.ac.uk/user/na/NA_papers/NA2007_03.pdf I am wondering >> > whether I should not investigate it. Can people provide any insights on >> > these problems. > >> Indeed, Powell may also a solution. A simplex is just what is closer >> to what you hinted as an optimization algorithm. > > I'll do a bit more reading. > >> > PS: The reason I am looking at this optimization problem is that I got >> > tired of looking at grid searches optimize the cross-validation score on >> > my 3-parameter estimator (not in the scikit-learn, because it is way too >> > specific to my problems). > >> Perhaps you may want to combine it with genetic algorithms. We also >> kind of combine grid search with simplex-based optimizer with >> simulated annealing in some of our global optimization problems, and I >> think I'll try at one point to introduce genetic algorithms instead of >> the grid search. > > Well, in the scikit, in the long run (it will take a little while) I'd > like to expose other optimization methods then the GridSearchCV, so if > you have code or advice to give us, we'd certainly be interested. > > Gael There is scikits.optimization partly in the externals :D But I don't think they should be in scikits.learn directly. Of course, the scikit may need access to some global optimization methods, but the most used one is already there (the grid search). Then for genetic algorithms, pyevolve is pretty much all you want (I still have to check the multiprocessing part) Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From gael.varoquaux at normalesup.org Mon Nov 22 17:36:01 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 22 Nov 2010 23:36:01 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> Message-ID: <20101122223601.GH27341@phare.normalesup.org> On Mon, Nov 22, 2010 at 11:12:26PM +0100, Matthieu Brucher wrote: > It seems that a simplex is what you need. It uses the barycenter (more > or less) to find a new point in the simplex. And it works well only in > convex functions (but in fact almost all functions have an issue with > this :D) One last question, now that I know that what I am looking for is a simplex algorithm (it indeed corresponds to what I was after), is there a reason not to use optimize.fmin? It implements a Nelder-Mead. I must admit that I don't see how I can use it to specify the convex hull of the parameters in which it operates, or restrict it to work only on integers, which are two things that I may want to do. 
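One workaround sketched further down in this thread is to wrap the objective so that arguments are cast to integers and evaluations are memoized; a rough version (the names are made up, and it does not by itself fix the small-initial-step issue discussed below):

import numpy as np

class CachedIntegerScore(object):
    # cast parameters to integers and cache evaluations, so the expensive
    # cross-validation score is only recomputed when the rounded point changes
    def __init__(self, score_func):
        self.score_func = score_func
        self.cache = {}
    def __call__(self, x):
        key = tuple(int(round(v)) for v in np.atleast_1d(x))
        if key not in self.cache:
            self.cache[key] = self.score_func(key)
        return self.cache[key]

# toy stand-in for a 3-parameter cross-validation score
score = CachedIntegerScore(lambda p: -((p[0] - 4)**2 + (p[1] - 2)**2 + p[2]))
print(score([3.7, 2.2, 1.0]))    # evaluated at (4, 2, 1) -> -1
print(score([4.2, 1.8, 1.4]))    # same rounded key, served from the cache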
Ga?l From josef.pktd at gmail.com Mon Nov 22 18:14:14 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Nov 2010 18:14:14 -0500 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122221829.GG27341@phare.normalesup.org> Message-ID: On Mon, Nov 22, 2010 at 5:27 PM, Matthieu Brucher wrote: > 2010/11/22 Gael Varoquaux : >> On Mon, Nov 22, 2010 at 11:12:26PM +0100, Matthieu Brucher wrote: >>> It seems that a simplex is what you need. >> >> Ha! I am learning new fancy words. Now I can start looking clever. >> >>> > I realize that maybe I should rephrase my question to try and draw more >>> > out of the common wealth of knowledge on this mailing list: what do >>> > people suggest to tackle this problem? Guided by Matthieu's suggestion, I >>> > have started looking at Powell's algorithm, and given the introduction of >>> > www.damtp.cam.ac.uk/user/na/NA_papers/NA2007_03.pdf I am wondering >>> > whether I should not investigate it. Can people provide any insights on >>> > these problems. >> >>> Indeed, Powell may also a solution. A simplex is just what is closer >>> to what you hinted as an optimization algorithm. >> >> I'll do a bit more reading. >> >>> > PS: The reason I am looking at this optimization problem is that I got >>> > tired of looking at grid searches optimize the cross-validation score on >>> > my 3-parameter estimator (not in the scikit-learn, because it is way too >>> > specific to my problems). >> >>> Perhaps you may want to combine it with genetic algorithms. We also >>> kind of combine grid search with simplex-based optimizer with >>> simulated annealing in some of our global optimization problems, and I >>> think I'll try at one point to introduce genetic algorithms instead of >>> the grid search. >> >> Well, in the scikit, in the long run (it will take a little while) I'd >> like to expose other optimization methods then the GridSearchCV, so if >> you have code or advice to give us, we'd certainly be interested. >> >> Gael > > There is scikits.optimization partly in the externals :D But I don't > think they should be in scikits.learn directly. Of course, the scikit > may need access to some global optimization methods, but the most used > one is already there (the grid search). > Then for genetic algorithms, pyevolve is pretty much all you want (I > still have to check the multiprocessing part) Is that license http://pyevolve.sourceforge.net/license.html BSD compatible ? Josef > > Matthieu > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From lxkain at gmail.com Mon Nov 22 18:30:59 2010 From: lxkain at gmail.com (Alexander Kain) Date: Mon, 22 Nov 2010 23:30:59 +0000 (UTC) Subject: [Numpy-discussion] Matlab IO Warning in mio5.py References: Message-ID: > It's not an error but a harmless (although confusing) warning message. > You should be able to filter it by adding the following to > scipy/__init__.py: > > import warnings > warnings.filterwarnings(action='ignore', message='.*__builtin__.file > size changed.*') > > Can you check if that works for you? Worked on my system! osx 10.6.5, python 2.6, numpy 1.5.1, scipy 0.8! 
Need to add useless stuff to satisfy the condition that there be more newtext than quoted. Argh. And more: Your email address below has to be valid. An authorization email will be sent to you after posting. You have to reply to that message to confirm that you exist. After you've confirmed your existence, the message will then be forwarded to the mailing list. From matthieu.brucher at gmail.com Tue Nov 23 02:21:55 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 23 Nov 2010 08:21:55 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101122223601.GH27341@phare.normalesup.org> References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> Message-ID: 2010/11/22 Gael Varoquaux : > On Mon, Nov 22, 2010 at 11:12:26PM +0100, Matthieu Brucher wrote: >> It seems that a simplex is what you need. It uses the barycenter (more >> or less) to find a new point in the simplex. And it works well only in >> convex functions (but in fact almost all functions have an issue with >> this :D) > > One last question, now that I know that what I am looking for is a > simplex algorithm (it indeed corresponds to what I was after), is there a > reason not to use optimize.fmin? It implements a Nelder-Mead. I must > admit that I don't see how I can use it to specify the convex hull of the > parameters in which it operates, or restrict it to work only on integers, > which are two things that I may want to do. optimize.fmin can be enough, I don't know it well enough. Nelder-Mead is not a constrained optimization algorithm, so you can't specify an outer hull. As for the integer part, I don't know if optimize.fmin is type consistent, I don't know if scikits.optimization is either, but I can check it. It should, as there is nothing impeding it. Matthieu -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From nwagner at iam.uni-stuttgart.de Tue Nov 23 02:28:31 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 23 Nov 2010 08:28:31 +0100 Subject: [Numpy-discussion] numpy.test() errors Message-ID: Hi all, There are some new test errors ====================================================================== ERROR: Test with missing and filling values ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/tests/test_io.py", line 947, in test_user_filling_values test = np.genfromtxt(StringIO(data), **kwargs) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", line 1285, in genfromtxt key = names.index(key) AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: test_user_missing_values (test_io.TestFromTxt) ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/tests/test_io.py", line 931, in test_user_missing_values **basekwargs) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", line 1657, in mafromtxt return genfromtxt(fname, **kwargs) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", line 1285, in genfromtxt key = names.index(key) AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: test_io.test_gzip_load ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.11.1-py2.5.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/tests/test_io.py", line 1255, in test_gzip_load assert_array_equal(np.load(f), a) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", line 327, in load fid = seek_gzip_factory(file) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", line 66, in seek_gzip_factory g.name = f.name AttributeError: GzipFile instance has no attribute 'name' ---------------------------------------------------------------------- Ran 3066 tests in 12.458s FAILED (KNOWNFAIL=4, errors=3) >>> numpy.__version__ '2.0.0.dev-12d0200' Nils From hagen at zhuliguan.net Tue Nov 23 03:14:14 2010 From: hagen at zhuliguan.net (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Tue, 23 Nov 2010 09:14:14 +0100 Subject: [Numpy-discussion] categorical distributions In-Reply-To: References: Message-ID: > Can you compare the speed of your cython solution with the version of Chuck For multiple samples of the same distribution, it would do more or less the same as the "searchsorted" method, so I don't expect any improvement (except for being easier to find). For multiple samples of different distributions, my version is 4-5x faster than "searchsorted(random())". This is without normalizing the probability vector, which means that you typically don't have to sum up the whole vector (and store all the intermediate sums). 
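For comparison, the searchsorted baseline named above, written out as a sketch (the full cumulative sum and normalization here are the work the posted Cython version reportedly gets to skip):

import numpy as np

def sample_categorical(p, n_samples):
    # baseline: build the cumulative distribution once, then map
    # uniform random numbers into it with searchsorted
    cdf = np.cumsum(p)
    cdf /= cdf[-1]                     # normalize in case p does not sum to 1
    return np.searchsorted(cdf, np.random.random(n_samples))

p = np.array([0.1, 0.2, 0.3, 0.4])
counts = np.bincount(sample_categorical(p, 100000))
print(counts / 100000.0)               # roughly [0.1, 0.2, 0.3, 0.4]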
- Hagen From gael.varoquaux at normalesup.org Tue Nov 23 03:52:26 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 09:52:26 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> Message-ID: <20101123085226.GA11522@phare.normalesup.org> On Tue, Nov 23, 2010 at 08:21:55AM +0100, Matthieu Brucher wrote: > optimize.fmin can be enough, I don't know it well enough. Nelder-Mead > is not a constrained optimization algorithm, so you can't specify an > outer hull. I saw that, after a bit more reading. > As for the integer part, I don't know if optimize.fmin is > type consistent, That not a problem: I wrap my function in a small object to ensure memoization, and input argument casting. The problem is that I can't tell the Nelder-Mead that the smallest jump it should attempt is .5. I can set xtol to .5, but it still attemps jumps of .001 in its initial jumps. Of course optimization on integers is fairly ill-posed, so I am asking for trouble. Gael From matthieu.brucher at gmail.com Tue Nov 23 04:18:50 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 23 Nov 2010 10:18:50 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123085226.GA11522@phare.normalesup.org> References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> Message-ID: > The problem is that I can't tell the Nelder-Mead that the smallest jump > it should attempt is .5. I can set xtol to .5, but it still attemps jumps > of .001 in its initial jumps. This is strange. It should not if the intiial points are set adequatly. You may want to check if the initial conditions make the optimization start at correct locations. > Of course optimization on integers is > fairly ill-posed, so I am asking for trouble. Indeed :D That's why GA can be a good solution as well. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From gael.varoquaux at normalesup.org Tue Nov 23 04:27:37 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 10:27:37 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> Message-ID: <20101123092737.GB11522@phare.normalesup.org> On Tue, Nov 23, 2010 at 10:18:50AM +0100, Matthieu Brucher wrote: > > The problem is that I can't tell the Nelder-Mead that the smallest jump > > it should attempt is .5. I can set xtol to .5, but it still attemps jumps > > of .001 in its initial jumps. > This is strange. It should not if the intiial points are set > adequatly. You may want to check if the initial conditions make the > optimization start at correct locations. Yes, that's excatly the problem. And it is easy to see why: in scipy.optimise.fmin, around line 186, the initial points are chosen with a relative distance of 0.00025 to the intial guess that is given. 
That's not what I want in the case of integers :). > > Of course optimization on integers is > > fairly ill-posed, so I am asking for trouble. > Indeed :D That's why GA can be a good solution as well. It's suboptimal if I know that my function is bell-shaped. Gael From sebastian.walter at gmail.com Tue Nov 23 05:13:23 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 23 Nov 2010 11:13:23 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123092737.GB11522@phare.normalesup.org> References: <20101122084613.GC18433@phare.normalesup.org> <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> Message-ID: Hello Gael, On Tue, Nov 23, 2010 at 10:27 AM, Gael Varoquaux wrote: > On Tue, Nov 23, 2010 at 10:18:50AM +0100, Matthieu Brucher wrote: >> > The problem is that I can't tell the Nelder-Mead that the smallest jump >> > it should attempt is .5. I can set xtol to .5, but it still attemps jumps >> > of .001 in its initial jumps. > >> This is strange. It should not if the intiial points are set >> adequatly. You may want to check if the initial conditions make the >> optimization start at correct locations. > > Yes, that's excatly the problem. And it is easy to see why: in > scipy.optimise.fmin, around line 186, the initial points are chosen with > a relative distance of 0.00025 to the intial guess that is given. That's > not what I want in the case of integers :). I'm not familiar with dichotomy optimization. Several techniques have been proposed to solve the problem: genetic algorithms, simulated annealing, Nelder-Mead and Powell. To be honest, I find it quite confusing that these algorithms are named in the same breath. Do you have a continuous or a discrete problem? Is your problem of the following form? min_x f(x) s.t. lo <= Ax + b <= up 0 = g(x) 0 <= h(x) An if yes, in which space does x live? cheers, Sebastian > >> > Of course optimization on integers is >> > fairly ill-posed, so I am asking for trouble. > >> Indeed :D That's why GA can be a good solution as well. > > It's suboptimal if I know that my function is bell-shaped. > > Gael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gael.varoquaux at normalesup.org Tue Nov 23 05:17:06 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 11:17:06 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> Message-ID: <20101123101706.GE7942@phare.normalesup.org> On Tue, Nov 23, 2010 at 11:13:23AM +0100, Sebastian Walter wrote: > I'm not familiar with dichotomy optimization. > Several techniques have been proposed to solve the problem: genetic > algorithms, simulated annealing, Nelder-Mead and Powell. > To be honest, I find it quite confusing that these algorithms are > named in the same breath. I am confused too. But that stems from my lack of knowledge in optimization. > Do you have a continuous or a discrete problem? Both. > Is your problem of the following form? > min_x f(x) > s.t. 
lo <= Ax + b <= up > 0 = g(x) > 0 <= h(x) No constraints. > An if yes, in which space does x live? Either in R^n, in the set of integers (unidimensional), or in the set of positive integers. Ga?l From sebastian.walter at gmail.com Tue Nov 23 05:37:02 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 23 Nov 2010 11:37:02 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123101706.GE7942@phare.normalesup.org> References: <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> Message-ID: On Tue, Nov 23, 2010 at 11:17 AM, Gael Varoquaux wrote: > On Tue, Nov 23, 2010 at 11:13:23AM +0100, Sebastian Walter wrote: >> I'm not familiar with dichotomy optimization. >> Several techniques have been proposed to solve the problem: genetic >> algorithms, simulated annealing, Nelder-Mead and Powell. >> To be honest, I find it quite confusing that these algorithms are >> named in the same breath. > > I am confused too. But that stems from my lack of knowledge in > optimization. > >> Do you have a continuous or a discrete problem? > > Both. > >> Is your problem of the following form? > >> min_x f(x) >> s.t. ? lo <= Ax + b <= up >> ? ? ? ? ? ?0 = g(x) >> ? ? ? ? ? ?0 <= h(x) > > No constraints. didn't you say that you operate only in some convex hull? > >> An if yes, in which space does x live? > > Either in R^n, in the set of integers (unidimensional), or in the set of > positive integers. According to http://openopt.org/Problems this is a mixed integer nonlinear program http://openopt.org/MINLP . I don't have experience with the solver though, but it may take a long time to run it since it uses branch-and-bound. In my field of work we typically relax the integers to real numbers, perform the optimization and then round to the next integer. This is often sufficiently close a good solution. > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gael.varoquaux at normalesup.org Tue Nov 23 05:43:19 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 11:43:19 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> Message-ID: <20101123104319.GF7942@phare.normalesup.org> On Tue, Nov 23, 2010 at 11:37:02AM +0100, Sebastian Walter wrote: > >> min_x f(x) > >> s.t. ? lo <= Ax + b <= up > >> ? ? ? ? ? ?0 = g(x) > >> ? ? ? ? ? ?0 <= h(x) > > No constraints. > didn't you say that you operate only in some convex hull? No. I have an initial guess that allows me to specify a convex hull in which the minimum should probably lie, but its not a constraint: nothing bad happens if I leave that convex hull. > > Either in R^n, in the set of integers (unidimensional), or in the set of > > positive integers. > According to http://openopt.org/Problems > this is a mixed integer nonlinear program http://openopt.org/MINLP . 
It is indead the name I know for it, however I have additional hypothesis (namely that f is roughly convex) which makes it much easier. > I don't have experience with the solver though, but it may take a long > time to run it since it uses branch-and-bound. Yes, this is too brutal: this is for non convex optimization. Dichotomy seems well-suited for finding an optimum on the set of intehers. > In my field of work we typically relax the integers to real numbers, > perform the optimization and then round to the next integer. > This is often sufficiently close a good solution. This is pretty much what I am doing, but you have to be careful: if the algorithm does jumps that are smaller than 1, it gets a zero difference between those jumps. If you are not careful, this might confuse a lot the algorithm and trick it into not converging. Thanks for your advice, Ga?l From sebastian.walter at gmail.com Tue Nov 23 08:47:10 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 23 Nov 2010 14:47:10 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123104319.GF7942@phare.normalesup.org> References: <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> Message-ID: On Tue, Nov 23, 2010 at 11:43 AM, Gael Varoquaux wrote: > On Tue, Nov 23, 2010 at 11:37:02AM +0100, Sebastian Walter wrote: >> >> min_x f(x) >> >> s.t. ? lo <= Ax + b <= up >> >> ? ? ? ? ? ?0 = g(x) >> >> ? ? ? ? ? ?0 <= h(x) > >> > No constraints. > >> didn't you say that you operate only in some convex hull? > > No. I have an initial guess that allows me to specify a convex hull in > which the minimum should probably lie, but its not a constraint: nothing > bad happens if I leave that convex hull. > >> > Either in R^n, in the set of integers (unidimensional), or in the set of >> > positive integers. >> According to ?http://openopt.org/Problems >> this is a mixed integer nonlinear program http://openopt.org/MINLP . > > It is indead the name I know for it, however I have additional hypothesis > (namely that f is roughly convex) which makes it much easier. > >> I don't have experience with the solver though, but it may take a long >> time to run it since it uses branch-and-bound. > > Yes, this is too brutal: this is for non convex optimization. > Dichotomy seems well-suited for finding an optimum on the set of > intehers. > >> In my field of work we typically relax the integers to real numbers, >> perform the optimization and then round to the next integer. >> This is often sufficiently close a good solution. > > This is pretty much what I am doing, but you have to be careful: if the > algorithm does jumps that are smaller than 1, it gets a zero difference > between those jumps. If you are not careful, this might confuse a lot the > algorithm and trick it into not converging. > ah, that clears things up a lot. Well, I don't know what the best method is to solve your problem, so take the following with a grain of salt: Wouldn't it be better to change the model than modifying the optimization algorithm? It sounds as if the resulting objective function is piecewise constant. AFAIK most optimization algorithms for continuous problems require at least Lipschitz continuous functions to work ''acceptable well''. Not sure if this is also true for Nelder-Mead. 
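To make the piecewise-constant point concrete: if the integer parameter is handled by rounding inside the objective, then steps smaller than one (such as the 0.00025-relative initial simplex steps mentioned earlier) see no change at all. A toy illustration with a made-up score:

def toy_cv_score(n_components):
    # smooth, bell-shaped score of an integer parameter (made up)
    return -(n_components - 12)**2

def relaxed_score(x):
    # continuous relaxation by rounding: piecewise constant in x
    return toy_cv_score(int(round(x)))

print([relaxed_score(x) for x in (7.0, 7.2, 7.4, 7.6)])
# -> [-25, -25, -25, -16]: flat plateaus with unit-width jumps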
> Thanks for your advice, > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gael.varoquaux at normalesup.org Tue Nov 23 08:50:57 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 14:50:57 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> Message-ID: <20101123135057.GM7942@phare.normalesup.org> On Tue, Nov 23, 2010 at 02:47:10PM +0100, Sebastian Walter wrote: > Well, I don't know what the best method is to solve your problem, so > take the following with a grain of salt: > Wouldn't it be better to change the model than modifying the > optimization algorithm? In this case, that's not possible. You can think of this parameter as the number of components in a PCA (it's actually a more complex dictionnary learning framework), so it's a parameter that is discrete, and I can't do anything about it :). > It sounds as if the resulting objective function is piecewise > constant. > AFAIK most optimization algorithms for continuous problems require at > least Lipschitz continuous functions to work ''acceptable well''. Not > sure if this is also true for Nelder-Mead. Yes correct. We do have a problem. I have a Nelder-Mead that seems to be working quite well on a few toy problems. Ga?l From josef.pktd at gmail.com Tue Nov 23 10:19:06 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 23 Nov 2010 10:19:06 -0500 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123135057.GM7942@phare.normalesup.org> References: <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> Message-ID: On Tue, Nov 23, 2010 at 8:50 AM, Gael Varoquaux wrote: > On Tue, Nov 23, 2010 at 02:47:10PM +0100, Sebastian Walter wrote: >> Well, I don't know what the best method is to solve your problem, so >> take the following with a grain of salt: >> Wouldn't it be better to change the model than modifying the >> optimization algorithm? > > In this case, that's not possible. You can think of this parameter as the > number of components in a PCA (it's actually a more complex dictionnary > learning framework), so it's a parameter that is discrete, and I can't do > anything about it :). > >> It sounds as if the resulting objective function is piecewise >> constant. > >> AFAIK most optimization algorithms for continuous problems require at >> least Lipschitz continuous functions to work ''acceptable well''. Not >> sure if this is also true for Nelder-Mead. > > Yes correct. We do have a problem. > > I have a Nelder-Mead that seems to be working quite well on a few toy > problems. Assuming your function is well behaved, one possible idea is to try replacing the integer objective function with a continuous interpolation. Or maybe fit a bellshaped curve to a few gridpoints. It might get you faster into the right neighborhood to do an exact search. 
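A crude version of that idea, with made-up grid values: evaluate the score on a coarse grid, fit a concave quadratic as the bell-shaped surrogate, and take its vertex as the centre of the neighbourhood for the exact integer search.

import numpy as np

grid = np.array([2, 6, 10, 14, 18])                  # coarse grid of the integer parameter
scores = np.array([0.61, 0.74, 0.79, 0.72, 0.55])    # made-up CV scores at those points

a, b, c = np.polyfit(grid, scores, 2)                # cheap quadratic surrogate
guess = int(round(-b / (2 * a)))                     # vertex of the parabola
print(guess)                                         # about 9 here; search exactly around it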
(There are some similar methods of using a surrogate objective function, when it is very expensive or impossible to calculate an objective function, but I never looked closely at these cases.) Josef > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From stefan at sun.ac.za Tue Nov 23 10:24:24 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Nov 2010 17:24:24 +0200 Subject: [Numpy-discussion] numpy.test() errors In-Reply-To: References: Message-ID: On Tue, Nov 23, 2010 at 9:28 AM, Nils Wagner wrote: > "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", > line 66, in seek_gzip_factory > ? ? g.name = f.name > AttributeError: GzipFile instance has no attribute 'name' This one is mine--the change was made to avoid a deprecationwarning. Which version of Python are you using? Regards St?fan From sebastian.walter at gmail.com Tue Nov 23 10:33:00 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 23 Nov 2010 16:33:00 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123135057.GM7942@phare.normalesup.org> References: <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> Message-ID: On Tue, Nov 23, 2010 at 2:50 PM, Gael Varoquaux wrote: > On Tue, Nov 23, 2010 at 02:47:10PM +0100, Sebastian Walter wrote: >> Well, I don't know what the best method is to solve your problem, so >> take the following with a grain of salt: >> Wouldn't it be better to change the model than modifying the >> optimization algorithm? > > In this case, that's not possible. You can think of this parameter as the > number of components in a PCA (it's actually a more complex dictionnary > learning framework), so it's a parameter that is discrete, and I can't do > anything about it :). In optimum experimental design one encounters MINLPs where integers define the number of rows of a matrix. At first glance it looks as if a relaxation is simply not possible: either there are additional rows or not. But with some technical transformations it is possible to reformulate the problem into a form that allows the relaxation of the integer constraint in a natural way. Maybe this is also possible in your case? Otherwise, well, let me know if you find a working solution ;) > >> It sounds as if the resulting objective function is piecewise >> constant. > >> AFAIK most optimization algorithms for continuous problems require at >> least Lipschitz continuous functions to work ''acceptable well''. Not >> sure if this is also true for Nelder-Mead. > > Yes correct. We do have a problem. > > I have a Nelder-Mead that seems to be working quite well on a few toy > problems. 
> > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gerrit.holl at gmail.com Tue Nov 23 10:39:13 2010 From: gerrit.holl at gmail.com (Gerrit Holl) Date: Tue, 23 Nov 2010 16:39:13 +0100 Subject: [Numpy-discussion] numpy.test() errors In-Reply-To: References: Message-ID: 2010/11/23 St?fan van der Walt : > On Tue, Nov 23, 2010 at 9:28 AM, Nils Wagner > wrote: >> "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", >> line 66, in seek_gzip_factory >> ? ? g.name = f.name >> AttributeError: GzipFile instance has no attribute 'name' > > This one is mine--the change was made to avoid a deprecationwarning. > Which version of Python are you using? I hope 2.5, as his site-packages directory is in lib/python2.5 :) Gerrit. From gael.varoquaux at normalesup.org Tue Nov 23 10:53:20 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 16:53:20 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> Message-ID: <20101123155319.GA24460@phare.normalesup.org> On Tue, Nov 23, 2010 at 10:19:06AM -0500, josef.pktd at gmail.com wrote: > > I have a Nelder-Mead that seems to be working quite well on a few toy > > problems. > Assuming your function is well behaved, one possible idea is to try > replacing the integer objective function with a continuous > interpolation. Or maybe fit a bellshaped curve to a few gridpoints. It > might get you faster into the right neighborhood to do an exact > search. I've actually been wondering if Gaussian Process regression (aka Krigin) would not be useful here :). It's fairly good at fitting processes for which there is very irregular information. Now I am perfectly aware that any move in this direction would require significant work, so this is more day dreaming than project planning. Gael From gael.varoquaux at normalesup.org Tue Nov 23 10:57:28 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 23 Nov 2010 16:57:28 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> Message-ID: <20101123155728.GB24460@phare.normalesup.org> On Tue, Nov 23, 2010 at 04:33:00PM +0100, Sebastian Walter wrote: > At first glance it looks as if a relaxation is simply not possible: > either there are additional rows or not. > But with some technical transformations it is possible to reformulate > the problem into a form that allows the relaxation of the integer > constraint in a natural way. > Maybe this is also possible in your case? Well, given that it is a cross-validation score that I am optimizing, there is not simple algorithm giving this score, so it's not obvious at all that there is a possible relaxation. A road to follow would be to find an oracle giving empirical risk after estimation of the penalized problem, and try to relax this oracle. 
That's two steps further than I am (I apologize if the above paragraph is incomprehensible, I am getting too much in the technivalities of my problem. > Otherwise, well, let me know if you find a working solution ;) Nelder-Mead seems to be working fine, so far. It will take a few weeks (or more) to have a real insight on what works and what doesn't. Thanks for your input, Gael From zachary.pincus at yale.edu Tue Nov 23 11:31:10 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 23 Nov 2010 11:31:10 -0500 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123155728.GB24460@phare.normalesup.org> References: <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> <20101123155728.GB24460@phare.normalesup.org> Message-ID: <5F8C0E20-A6D4-4252-9B48-7A0A79EE1C1F@yale.edu> On Nov 23, 2010, at 10:57 AM, Gael Varoquaux wrote: > On Tue, Nov 23, 2010 at 04:33:00PM +0100, Sebastian Walter wrote: >> At first glance it looks as if a relaxation is simply not possible: >> either there are additional rows or not. >> But with some technical transformations it is possible to reformulate >> the problem into a form that allows the relaxation of the integer >> constraint in a natural way. > >> Maybe this is also possible in your case? > > Well, given that it is a cross-validation score that I am optimizing, > there is not simple algorithm giving this score, so it's not obvious > at > all that there is a possible relaxation. A road to follow would be to > find an oracle giving empirical risk after estimation of the penalized > problem, and try to relax this oracle. That's two steps further than > I am > (I apologize if the above paragraph is incomprehensible, I am > getting too > much in the technivalities of my problem. > >> Otherwise, well, let me know if you find a working solution ;) > > Nelder-Mead seems to be working fine, so far. It will take a few weeks > (or more) to have a real insight on what works and what doesn't. Jumping in a little late, but it seems that simulated annealing might be a decent method here: take random steps (drawing from a distribution of integer step sizes), reject steps that fall outside the fitting range, and accept steps according to the standard annealing formula. Something with a global optimum but spikes along the way is pretty well-suited to SA in general, and it's also an easy algorithm to make work on a lattice. If you're in high dimensions, there are also bolt- on methods for biasing the steps toward "good directions" as opposed to just taking isotropic random steps. Again, pretty easy to think of discrete implementations of this... Zach From nwagner at iam.uni-stuttgart.de Tue Nov 23 12:34:17 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 23 Nov 2010 18:34:17 +0100 Subject: [Numpy-discussion] numpy.test() errors In-Reply-To: References: Message-ID: On Tue, 23 Nov 2010 16:39:13 +0100 Gerrit Holl wrote: > 2010/11/23 St?fan van der Walt : >> On Tue, Nov 23, 2010 at 9:28 AM, Nils Wagner >> wrote: >>> "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", >>> line 66, in seek_gzip_factory >>> ? ? g.name = f.name >>> AttributeError: GzipFile instance has no attribute >>>'name' >> >> This one is mine--the change was made to avoid a >>deprecationwarning. >> Which version of Python are you using? 
> > I hope 2.5, as his site-packages directory is in >lib/python2.5 :) > Exactly. Nils From matthieu.brucher at gmail.com Tue Nov 23 13:14:56 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 23 Nov 2010 19:14:56 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <5F8C0E20-A6D4-4252-9B48-7A0A79EE1C1F@yale.edu> References: <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> <20101123155728.GB24460@phare.normalesup.org> <5F8C0E20-A6D4-4252-9B48-7A0A79EE1C1F@yale.edu> Message-ID: 2010/11/23 Zachary Pincus : > > On Nov 23, 2010, at 10:57 AM, Gael Varoquaux wrote: > >> On Tue, Nov 23, 2010 at 04:33:00PM +0100, Sebastian Walter wrote: >>> At first glance it looks as if a relaxation is simply not possible: >>> either there are additional rows or not. >>> But with some technical transformations it is possible to reformulate >>> the problem into a form that allows the relaxation of the integer >>> constraint in a natural way. >> >>> Maybe this is also possible in your case? >> >> Well, given that it is a cross-validation score that I am optimizing, >> there is not simple algorithm giving this score, so it's not obvious >> at >> all that there is a possible relaxation. A road to follow would be to >> find an oracle giving empirical risk after estimation of the penalized >> problem, and try to relax this oracle. That's two steps further than >> I am >> (I apologize if the above paragraph is incomprehensible, I am >> getting too >> much in the technivalities of my problem. >> >>> Otherwise, well, let me know if you find a working solution ;) >> >> Nelder-Mead seems to be working fine, so far. It will take a few weeks >> (or more) to have a real insight on what works and what doesn't. > > Jumping in a little late, but it seems that simulated annealing might > be a decent method here: take random steps (drawing from a > distribution of integer step sizes), reject steps that fall outside > the fitting range, and accept steps according to the standard > annealing formula. There is also a simulated-annealing modification of Nelder Mead that can be of use. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From nouiz at nouiz.org Tue Nov 23 14:18:10 2010 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 23 Nov 2010 14:18:10 -0500 Subject: [Numpy-discussion] Theano 0.3 Released Message-ID: ====================== ?Announcing Theano 0.3 ====================== This is an important release. The upgrade is recommended for everybody using Theano 0.1. For those using the bleeding edge version in the mercurial repository, we encourage you to update to the `0.3` tag. This is the first major release of Theano since 0.1. Version 0.2 development started internally but it was never advertised as a release. What's New ---------- There have been too many changes since 0.1 to keep track of them all. Below is a *partial* list of changes since 0.1. ?* GPU code using NVIDIA's CUDA framework is now generated for many Ops. ?* Some interface changes since 0.1: ? ? * A new "shared variable" system which allows for reusing memory space between ? ? ? Theano functions. ? ? ? ? * A new memory contract has been formally written for Theano, ? ? ? ? ? for people who want to minimize memory copies. ? ? 
* The old module system has been deprecated. ? ? * By default, inputs to a Theano function will not be silently ? ? ? downcasted (e.g. from float64 to float32). ? ? * An error is now raised when using the result of a logical operation of ? ? ? a Theano variable in an 'if' (i.e. an implicit call to __nonzeros__). ? ? * An error is now raised when we receive a non-aligned ndarray as ? ? ? input to a function (this is not supported). ? ? * An error is raised when the list of dimensions passed to ? ? ? dimshuffle() contains duplicates or is otherwise not sensible. ? ? * Call NumPy BLAS bindings for gemv operations in addition to the ? ? ? already supported gemm. ? ? * If gcc is unavailable at import time, Theano now falls back to a ? ? ? Python-based emulation mode after raising a warning. ? ? * An error is now raised when tensor.grad is called on a non-scalar ? ? ? Theano variable (in the past we would implicitly do a sum on the ? ? ? tensor to make it a scalar). ? ? * Added support for "erf" and "erfc" functions. ?* The current default value of the parameter axis of theano.{max,min, ? argmax,argmin,max_and_argmax} is deprecated. We now use the default NumPy ? behavior of operating on the entire tensor. ?* Theano is now available from PyPI and installable through "easy_install" or ? "pip". You can download Theano from http://pypi.python.org/pypi/Theano. Description ----------- Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. Theano features: ?* tight integration with NumPy: a similar interface to NumPy's. ? numpy.ndarrays are also used internally in Theano-compiled functions. ?* transparent use of a GPU: perform data-intensive computations up to ? 140x faster than on a CPU (support for float32 only). ?* efficient symbolic differentiation: Theano can compute derivatives ? for functions of one or many inputs. ?* speed and stability optimizations: avoid nasty bugs when computing ? expressions such as log(1+ exp(x) ) for large values of x. ?* dynamic C code generation: evaluate expressions faster. ?* extensive unit-testing and self-verification: includes tools for ? detecting and diagnosing bugs and/or potential problems. Theano has been powering large-scale computationally intensive scientific research since 2007, but it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). Resources --------- About Theano: http://deeplearning.net/software/theano/ About NumPy: http://numpy.scipy.org/ About Scipy: http://www.scipy.org/ Acknowledgments --------------- I would like to thank all contributors of Theano. For this particular release, the people who have helped resolve many outstanding issues: (in alphabetical order) Frederic Bastien, James Bergstra, Guillaume Desjardins, David-Warde Farley, Ian Goodfellow, Pascal Lamblin, Razvan Pascanu and Josh Bleecher Snyder. Also, thank you to all NumPy and Scipy developers as Theano builds on its strength. 
All questions/comments are always welcome on the Theano mailing-lists ( http://deeplearning.net/software/theano/ ) From stefan at sun.ac.za Tue Nov 23 16:24:25 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Nov 2010 23:24:25 +0200 Subject: [Numpy-discussion] numpy.test() errors In-Reply-To: References: Message-ID: 2010/11/23 St?fan van der Walt : > On Tue, Nov 23, 2010 at 9:28 AM, Nils Wagner > wrote: >> "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", >> line 66, in seek_gzip_factory >> ? ? g.name = f.name >> AttributeError: GzipFile instance has no attribute 'name' > > This one is mine--the change was made to avoid a deprecationwarning. > Which version of Python are you using? OK, should be fixed. Let me know if it is working. Cheers St?fan From pav at iki.fi Tue Nov 23 16:44:49 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 23 Nov 2010 21:44:49 +0000 (UTC) Subject: [Numpy-discussion] numpy.test() errors References: Message-ID: On Tue, 23 Nov 2010 23:24:25 +0200, St?fan van der Walt wrote: > 2010/11/23 St?fan van der Walt : >> On Tue, Nov 23, 2010 at 9:28 AM, Nils Wagner >> wrote: >>> "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/lib/npyio.py", >>> line 66, in seek_gzip_factory >>> ? ? g.name = f.name >>> AttributeError: GzipFile instance has no attribute 'name' >> >> This one is mine--the change was made to avoid a deprecationwarning. >> Which version of Python are you using? > > OK, should be fixed. Let me know if it is working. There's this on Python 3.2: ====================================================================== ERROR: test_io.test_gzip_load ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/case.py", line 177, in runTest self.test(*self.arg) File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/lib/tests/test_io.py", line 1255, in test_gzip_load assert_array_equal(np.load(f), a) File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/lib/npyio.py", line 332, in load fid = seek_gzip_factory(file) File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/lib/npyio.py", line 73, in seek_gzip_factory f = GzipFile(fileobj=f.fileobj, filename=name) File "/usr/local/stow/python-3.2a4/lib/python3.2/gzip.py", line 162, in __init__ if hasattr(fileobj, 'mode'): mode = fileobj.mode File "/usr/local/stow/python-3.2a4/lib/python3.2/gzip.py", line 101, in __getattr__ return getattr(name, self.file) TypeError: getattr(): attribute name must be string and also Python 3.2a4 (r32a4:86446, Nov 20 2010, 17:59:19) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy as np >>> np.core.multiarray.__file__ '.../numpy/core/multiarray.cpython-32m.so' which leads to ====================================================================== ERROR: Failure: OSError (/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/core/multiarray.pyd: cannot open shared object file: No such file or directory) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/failure.py", line 27, in runTest reraise(self.exc_class, self.exc_val, self.tb) File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/_3.py", line 7, in reraise raise exc_class(exc_val).with_traceback(tb) File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/loader.py", line 372, in loadTestsFromName addr.filename, addr.module) File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/tests/test_ctypeslib.py", line 8, in cdll = load_library('multiarray', np.core.multiarray.__file__) File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/ctypeslib.py", line 122, in load_library raise exc File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/ctypeslib.py", line 119, in load_library return ctypes.cdll[libpath] File "/usr/local/stow/python-3.2a4/lib/python3.2/ctypes/__init__.py", line 415, in __getitem__ return getattr(self, name) File "/usr/local/stow/python-3.2a4/lib/python3.2/ctypes/__init__.py", line 410, in __getattr__ dll = self._dlltype(name) File "/usr/local/stow/python-3.2a4/lib/python3.2/ctypes/__init__.py", line 340, in __init__ self._handle = _dlopen(self._name, mode) OSError: /var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/core/multiarray.pyd: cannot open shared object file: No such file or directory From drnlmuller+scipy at gmail.com Tue Nov 23 16:56:07 2010 From: drnlmuller+scipy at gmail.com (Neil Muller) Date: Tue, 23 Nov 2010 23:56:07 +0200 Subject: [Numpy-discussion] numpy.test() errors In-Reply-To: References: Message-ID: On 23 November 2010 23:44, Pauli Virtanen wrote: > On Tue, 23 Nov 2010 23:24:25 +0200, St?fan van der Walt wrote: > There's this on Python 3.2: > > ====================================================================== > ERROR: test_io.test_gzip_load > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "/var/lib/buildslave/numpy-real/b15/../../local3/nose/case.py", line 177, in runTest > ? ?self.test(*self.arg) > ?File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/lib/tests/test_io.py", line 1255, in test_gzip_load > ? ?assert_array_equal(np.load(f), a) > ?File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/lib/npyio.py", line 332, in load > ? ?fid = seek_gzip_factory(file) > ?File "/var/lib/buildslave/numpy-real/b15/numpy-install-3.2/lib/python3.2/site-packages/numpy/lib/npyio.py", line 73, in seek_gzip_factory > ? 
?f = GzipFile(fileobj=f.fileobj, filename=name) > ?File "/usr/local/stow/python-3.2a4/lib/python3.2/gzip.py", line 162, in __init__ > ? ?if hasattr(fileobj, 'mode'): mode = fileobj.mode > ?File "/usr/local/stow/python-3.2a4/lib/python3.2/gzip.py", line 101, in __getattr__ > ? ?return getattr(name, self.file) > TypeError: getattr(): attribute name must be string This was filed and fixed during the python bug weekend (http://bugs.python.org/issue10465), so it shouldn't be a problem with a current 3.2 checkout. -- Neil Muller drnlmuller at gmail.com I've got a gmail account. Why haven't I become cool? From gael.varoquaux at normalesup.org Tue Nov 23 18:15:53 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 24 Nov 2010 00:15:53 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> <20101123155728.GB24460@phare.normalesup.org> <5F8C0E20-A6D4-4252-9B48-7A0A79EE1C1F@yale.edu> Message-ID: <20101123231553.GD2391@phare.normalesup.org> On Tue, Nov 23, 2010 at 07:14:56PM +0100, Matthieu Brucher wrote: > > Jumping in a little late, but it seems that simulated annealing might > > be a decent method here: take random steps (drawing from a > > distribution of integer step sizes), reject steps that fall outside > > the fitting range, and accept steps according to the standard > > annealing formula. > There is also a simulated-annealing modification of Nelder Mead that > can be of use. Sounds interesting. Any reference? G From kmmyung at kesti.co.kr Tue Nov 23 20:20:48 2010 From: kmmyung at kesti.co.kr (=?utf-8?B?66qF6rSR66+8?=) Date: Wed, 24 Nov 2010 10:20:48 +0900 Subject: [Numpy-discussion] question for numpy import error Message-ID: I receive the following error when I try to import numpy =========================================================== [node0]:/home/koojy/KMM> python2.6 Python 2.6.4 (r264:75706, Jul 26 2010, 16:55:18) [GCC 3.4.6 20060404 (Red Hat 3.4.6-10)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.6/site-packages/numpy/__init__.py", line 132, in import add_newdocs File "/usr/local/lib/python2.6/site-packages/numpy/add_newdocs.py", line 9, in from lib import add_newdoc File "/usr/local/lib/python2.6/site-packages/numpy/lib/__init__.py", line 13, in from polynomial import * File "/usr/local/lib/python2.6/site-packages/numpy/lib/polynomial.py", line 17, in from numpy.linalg import eigvals, lstsq File "/usr/local/lib/python2.6/site-packages/numpy/linalg/__init__.py", line 47, in from linalg import * File "/usr/local/lib/python2.6/site-packages/numpy/linalg/linalg.py", line 22, in from numpy.linalg import lapack_lite ImportError: libgfortran.so.1: cannot open shared object file: No such file or directory >>> ============================================================ OS info : Linux node0 2.6.9-78.ELsmp #1 SMP Thu Jul 24 23:54:48 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux numpy install : following the "Building by hand" of Contents of below site. http://www.scipy.org/Installing_SciPy/Linux#head-40c26a5b93b9afc7e3241e1d7fd84fe9326402e7 and To build with gfortran: python setup.py build ?fcompiler=gnu95 Any help on this error would be appreciated. K.M. Myung ================================== Korea Environmental Science & Technology Institute, inc. 
phone : 82-2-2113-0705 direct : 82-70-7098-2644 fax : 82-2-2113-0706 e-mail : kmmyung at kesti.co.kr ================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Wed Nov 24 02:53:53 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 24 Nov 2010 08:53:53 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: <20101123231553.GD2391@phare.normalesup.org> References: <20101123101706.GE7942@phare.normalesup.org> <20101123104319.GF7942@phare.normalesup.org> <20101123135057.GM7942@phare.normalesup.org> <20101123155728.GB24460@phare.normalesup.org> <5F8C0E20-A6D4-4252-9B48-7A0A79EE1C1F@yale.edu> <20101123231553.GD2391@phare.normalesup.org> Message-ID: 2010/11/24 Gael Varoquaux : > On Tue, Nov 23, 2010 at 07:14:56PM +0100, Matthieu Brucher wrote: >> > Jumping in a little late, but it seems that simulated annealing might >> > be a decent method here: take random steps (drawing from a >> > distribution of integer step sizes), reject steps that fall outside >> > the fitting range, and accept steps according to the standard >> > annealing formula. > >> There is also a simulated-annealing modification of Nelder Mead that >> can be of use. > > Sounds interesting. Any reference? Not right away, I have to check. The main difference is the possible acceptance of a contraction that doesn't lower the cost, and this is done with a temperature like simulated annealing. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From charlesr.harris at gmail.com Wed Nov 24 12:18:58 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 24 Nov 2010 10:18:58 -0700 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> Message-ID: On Tue, Nov 23, 2010 at 3:37 AM, Sebastian Walter < sebastian.walter at gmail.com> wrote: > On Tue, Nov 23, 2010 at 11:17 AM, Gael Varoquaux > wrote: > > On Tue, Nov 23, 2010 at 11:13:23AM +0100, Sebastian Walter wrote: > >> I'm not familiar with dichotomy optimization. > >> Several techniques have been proposed to solve the problem: genetic > >> algorithms, simulated annealing, Nelder-Mead and Powell. > >> To be honest, I find it quite confusing that these algorithms are > >> named in the same breath. > > > > I am confused too. But that stems from my lack of knowledge in > > optimization. > > > >> Do you have a continuous or a discrete problem? > > > > Both. > > > >> Is your problem of the following form? > > > >> min_x f(x) > >> s.t. lo <= Ax + b <= up > >> 0 = g(x) > >> 0 <= h(x) > > > > No constraints. > > didn't you say that you operate only in some convex hull? > > > > >> An if yes, in which space does x live? > > > > Either in R^n, in the set of integers (unidimensional), or in the set of > > positive integers. > According to http://openopt.org/Problems > this is a mixed integer nonlinear program http://openopt.org/MINLP . I > don't have experience with the solver though, > but it may take a long time to run it since it uses branch-and-bound. 
> In my field of work we typically relax the integers to real numbers, > perform the optimization and then round to the next integer. > This is often sufficiently close a good solution. > > I've sometimes applied a genetic algorithm to the rounding, which can improve things a bit if needed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jens.marquard.ipsen at gmail.com Wed Nov 24 12:35:34 2010 From: jens.marquard.ipsen at gmail.com (NumPyStudent) Date: Wed, 24 Nov 2010 09:35:34 -0800 (PST) Subject: [Numpy-discussion] How to export a CObject from NumPy to my own module? In-Reply-To: <30295576.post@talk.nabble.com> References: <30295576.post@talk.nabble.com> Message-ID: <30298423.post@talk.nabble.com> Well, I believe I should problably use the PyArray_API instead (http://www.velocityreviews.com/forums/t358140-re-dynamically-loaded-libraries.html), since it seems it's implemented in NumPy for this exact reason(?). Can anyone point me in the direction of a good step-step guide og using this api to export function pointers from NumPy to your own modules? NumPyStudent wrote: > > Hi > > I haven't worked very much with NumPy and Python and hope some assistance > :-) > > I need access to a functionpointer in my own Python module. > This functionpointer is available in a modified NumPy module. > I would like to pass the functionpointer to my module when I import it in > the Python runtime. > Before importing my module I've already imported the NumPy module, and > this is fine by me - > I don't necessarily need my own module to handle the import of NumPy... > > My question concerns how I should modify NumPy source code to be able to > use the method of passing CObjects as described here: > http://docs.python.org/release/2.5.2/ext/using-cobjects.html > > Can anyone tell me where I should add the C-Object in NumPy? > I tried to add it to the "multiarray" module (in the file > multiarraymodule.c), but it dosen't look like this module is available for > import (as in PyImport_ImportModule("multiarray"); in my own module - the > return value of this is null). > However, if I try PyImport_ImportModule("numpy"); it seems like I get a > module imported, but this module does not have the CObject added. This is > not surprising, since I added it to the "multiarray" module. > > I believe I can solve my problem by adding the CObject to the "numpy" > module instead, but in what file or files should I make modifications to > do this? > > Alternatively, how can I make the "multiarray" module available for import > by my own module? > > BR. > NumPyStudent > -- View this message in context: http://old.nabble.com/How-to-export-a-CObject-from-NumPy-to-my-own-module--tp30295576p30298423.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From robert.kern at gmail.com Wed Nov 24 12:40:04 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 24 Nov 2010 11:40:04 -0600 Subject: [Numpy-discussion] How to export a CObject from NumPy to my own module? In-Reply-To: <30298423.post@talk.nabble.com> References: <30295576.post@talk.nabble.com> <30298423.post@talk.nabble.com> Message-ID: On Wed, Nov 24, 2010 at 11:35, NumPyStudent wrote: > > Well, I believe I should problably use the PyArray_API instead > (http://www.velocityreviews.com/forums/t358140-re-dynamically-loaded-libraries.html), > since it seems it's implemented in NumPy for this exact reason(?). 
> Can anyone point me in the direction of a good step-step guide og using this > api to export function pointers from NumPy to your own modules? http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html?highlight=import_array -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From Chris.Barker at noaa.gov Wed Nov 24 12:59:50 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 24 Nov 2010 09:59:50 -0800 Subject: [Numpy-discussion] How to export a CObject from NumPy to my own module? In-Reply-To: <30298423.post@talk.nabble.com> References: <30295576.post@talk.nabble.com> <30298423.post@talk.nabble.com> Message-ID: <4CED5296.10004@noaa.gov> On 11/24/10 9:35 AM, NumPyStudent wrote: > > Well, I believe I should problably use the PyArray_API instead > (http://www.velocityreviews.com/forums/t358140-re-dynamically-loaded-libraries.html), > since it seems it's implemented in NumPy for this exact reason(?). > Can anyone point me in the direction of a good step-step guide og using this > api to export function pointers from NumPy to your own modules? It's not clear to me what you really are trying accomplish here. A little more detail about your problem may help. If what you are trying to do is write a C-extension that can take numpy arrays as input, and/or provide them as output, while working natively with the data inside them in C, then yes, working with the numpy C api is the way to go. However, I suggest you take a look at Cython: http://www.cython.org/ It provides a much easier way to write python extensions, and has build-in understanding of numpy arrays. It can call arbitrary C code so there are no limits to its functionality, but it will save you much pain in reference coutning, converting to/from python objects, etc. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Wed Nov 24 14:13:19 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 24 Nov 2010 12:13:19 -0700 Subject: [Numpy-discussion] Give Mark Wiebe numpy commit rights. Message-ID: Hi All, I'd like to give Mark Wiebe commit rights. That does bring up the question of commit rights proliferation, so along with that I'd like to suggest time limits on commit rights, something along the lines of "no commits for a year, lose rights". It's nothing punitive, it's just a matter of keeping some control over what happens to the repository. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Nov 24 14:25:26 2010 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 24 Nov 2010 19:25:26 +0000 (UTC) Subject: [Numpy-discussion] Give Mark Wiebe numpy commit rights. References: Message-ID: On Wed, 24 Nov 2010 12:13:19 -0700, Charles R Harris wrote: > I'd like to give Mark Wiebe commit rights. No objections from me, we do need more capable hands on the C side. Though it would be nice for someone (me?) to comment on the half-float addition before considering pushing it. > That does > bring up the question of commit rights proliferation, so along with that > I'd like to suggest time limits on commit rights, something along the > lines of "no commits for a year, lose rights". 
It's nothing punitive, > it's just a matter of keeping some control over what happens to the > repository. +0, I see no problems with this approach. -- Pauli Virtanen From charlesr.harris at gmail.com Wed Nov 24 14:33:31 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 24 Nov 2010 12:33:31 -0700 Subject: [Numpy-discussion] Give Mark Wiebe numpy commit rights. In-Reply-To: References: Message-ID: On Wed, Nov 24, 2010 at 12:25 PM, Pauli Virtanen wrote: > On Wed, 24 Nov 2010 12:13:19 -0700, Charles R Harris wrote: > > I'd like to give Mark Wiebe commit rights. > > No objections from me, we do need more capable hands on the C side. > > Though it would be nice for someone (me?) to comment on the half-float > addition before considering pushing it. > > I'd like to look at it too. I don't think it's a big rush unless we plan on a 1.6 in a few months. For which I'm thinking sometime around April/May might be the right time slot ;) > > That does > > bring up the question of commit rights proliferation, so along with that > > I'd like to suggest time limits on commit rights, something along the > > lines of "no commits for a year, lose rights". It's nothing punitive, > > it's just a matter of keeping some control over what happens to the > > repository. > > +0, I see no problems with this approach. > > That said, I haven't looked to see how many folks actually have rights at this time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Wed Nov 24 15:16:15 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Wed, 24 Nov 2010 21:16:15 +0100 Subject: [Numpy-discussion] broadcasting with numpy.interp In-Reply-To: References: Message-ID: 2010/11/16 greg whittier : > I'd like to be able to speed up the following code. > > def replace_dead(cube, dead): > ? # cube.shape == (320, 640, 1200) > ? # dead.shape == (320, 640) > ? # cube[i,j,:] are bad points to be replaced via interpolation if > dead[i,j] == True > > ? ?bands = np.arange(0, cube.shape[0]) > ? ?for line in range(cube.shape[1]): > ? ? ? ?dead_bands = bands[dead[:, line] == True] > ? ? ? ?good_bands = bands[dead[:, line] == False] > ? ? ? ?for sample in range(cube.shape[2]): > ? ? ? ? ? ?# interp returns fp[0] for x < xp[0] and fp[-1] for x > xp[-1] > ? ? ? ? ? ?cube[dead_bands, line, sample] = \ > ? ? ? ? ? ? ? ?np.interp(dead_bands, > ? ? ? ? ? ? ? ? ? ? ? ? ?good_bands, > ? ? ? ? ? ? ? ? ? ? ? ? ?cube[good_bands, line, sample]) I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following: 1) Use a 2d interpolation, taking into account all nearest neighbours. 2) For this, use a looped interpolation in this nearest-neighbour sense: a) Generate sums of all unmasked nearest-neighbour values b) Generate counts for the nearest neighbours present c) Replace the bad values by the sums divided by the count. d) Continue at (a) if there are bad values left Bad values which are neighbouring each other (>= 3) need multiple passes through the loop. It should be pretty fast. If this is what you have in mind, maybe we (or I) can make up some code. Friedrich From jeanluc.menut at free.fr Thu Nov 25 05:13:49 2010 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Thu, 25 Nov 2010 11:13:49 +0100 Subject: [Numpy-discussion] numpy speed question Message-ID: <4CEE36DD.8000105@free.fr> Hello all, I have a little question about the speed of numpy vs IDL 7.0. 
I did a very simple little check by computing just a cosine in a loop. I was quite surprised to see an order of magnitude of difference between numpy and IDL, I would have thought that for such a basic function, the speed would be approximatively the same. I suppose that some of the difference may come from the default data type of 64bits in numpy and 32 bits in IDL. Is there a way to change the numpy default data type (without recompiling) ? And I'm not an expert at all, maybe there is a better explanation, like a better use of the several CPU core by IDL ? I'm working with windows 7 64 bits on a core i7. any hint is welcome. Thanks. Here the IDL code : Julian1 = SYSTIME( /JULIAN , /UTC ) for j=0,9999 do begin for i=0,999 do begin a=cos(2*!pi*i/100.) endfor endfor Julian2 = SYSTIME( /JULIAN , /UTC ) print, (Julian2-Julian1)*86400.0 print,cpt end result: % Compiled module: $MAIN$. 2.9999837 The python code: from numpy import * from time import time time1 = time() for j in range(10000): for i in range(1000): a=cos(2*pi*i/100.) time2 = time() print time2-time1 result: In [2]: run python_test_speed.py 24.1809999943 From sebastian.walter at gmail.com Thu Nov 25 05:38:06 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Thu, 25 Nov 2010 11:38:06 +0100 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <4CEE36DD.8000105@free.fr> References: <4CEE36DD.8000105@free.fr> Message-ID: using math.cos instead of numpy.cos should be much faster. I believe this is a known issue of numpy. On Thu, Nov 25, 2010 at 11:13 AM, Jean-Luc Menut wrote: > Hello all, > > I have a little question about the speed of numpy vs IDL 7.0. I did a > very simple little check by computing just a cosine in a loop. I was > quite surprised to see an order of magnitude of difference between numpy > and IDL, I would have thought that for such a basic function, the speed > would be approximatively the same. > > I suppose that some of the difference may come from ?the default data > type of 64bits in numpy and 32 bits in IDL. Is there a way to change the > numpy default data type (without recompiling) ? > > And I'm not an expert at all, maybe there is a better explanation, like > a better use of the several CPU core by IDL ? > > I'm working with windows 7 64 bits on a core i7. > > any hint is welcome. > Thanks. > > Here the IDL code : > Julian1 = SYSTIME( /JULIAN , /UTC ) > for j=0,9999 do begin > ? for i=0,999 do begin > ? ? a=cos(2*!pi*i/100.) > ? endfor > endfor > Julian2 = SYSTIME( /JULIAN , /UTC ) > print, (Julian2-Julian1)*86400.0 > print,cpt > end > > result: > % Compiled module: $MAIN$. > ? ? ? ?2.9999837 > > > The python code: > from numpy import * > from time import time > time1 = time() > for j in range(10000): > ? ? for i in range(1000): > ? ? ? ? a=cos(2*pi*i/100.) > time2 = time() > print time2-time1 > > result: > In [2]: run python_test_speed.py > 24.1809999943 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jeanluc.menut at free.fr Thu Nov 25 05:49:14 2010 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Thu, 25 Nov 2010 11:49:14 +0100 Subject: [Numpy-discussion] numpy speed question In-Reply-To: References: <4CEE36DD.8000105@free.fr> Message-ID: <4CEE3F2A.6010702@free.fr> Le 25/11/2010 11:38, Sebastian Walter a ?crit : > using math.cos instead of numpy.cos should be much faster. > I believe this is a known issue of numpy. 
You're right, with math.cos, the code take 4.3s to run, not as fast as IDL, but a lot better. From eadrogue at gmx.net Thu Nov 25 05:51:07 2010 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Thu, 25 Nov 2010 11:51:07 +0100 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <4CEE36DD.8000105@free.fr> References: <4CEE36DD.8000105@free.fr> Message-ID: <20101125105107.GA8701@doriath.local> Hi, 25/11/10 @ 11:13 (+0100), thus spake Jean-Luc Menut: > I suppose that some of the difference may come from the default data > type of 64bits in numpy and 32 bits in IDL. Is there a way to change the > numpy default data type (without recompiling) ? This is probably not the issue. > And I'm not an expert at all, maybe there is a better explanation, like > a better use of the several CPU core by IDL ? I'm not an expert either, but the basic idea you have to get is that "for" loops in Python are slow. Numpy is not going to change this. Instead, Numpy allows you to work with "vectors" and "arrays" so that you need not putting loops in your code. So, you have to change the way you think about things, it takes a little to get used to it at first. Cheers, -- Ernest From dave.hirschfeld at gmail.com Thu Nov 25 05:49:57 2010 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 25 Nov 2010 10:49:57 +0000 (UTC) Subject: [Numpy-discussion] numpy speed question References: <4CEE36DD.8000105@free.fr> Message-ID: Jean-Luc Menut free.fr> writes: > > I have a little question about the speed of numpy vs IDL 7.0. > > Here the IDL result: > % Compiled module: $MAIN$. > 2.9999837 > > The python code: > from numpy import * > from time import time > time1 = time() > for j in range(10000): > for i in range(1000): > a=cos(2*pi*i/100.) > time2 = time() > print time2-time1 > > result: > In [2]: run python_test_speed.py > 24.1809999943 > Whilst you've imported everything from numpy you're not really using numpy - you're still using a slow Python (double) loop. The power of numpy comes from vectorising your code - i.e. applying functions to arrays of data. The example below demonstrates an 80 fold increase in speed by vectorising the calculation: def method1(): a = empty([1000, 10000]) for j in range(10000): for i in range(1000): a[i,j] = cos(2*pi*i/100.) return a # def method2(): ij = np.repeat((2*pi*np.arange(1000)/100.)[:,None], 10000, axis=1) return np.cos(ij) # In [46]: timeit method1() 1 loops, best of 3: 47.9 s per loop In [47]: timeit method2() 1 loops, best of 3: 589 ms per loop In [48]: allclose(method1(), method2()) Out[48]: True From jeanluc.menut at free.fr Thu Nov 25 05:55:24 2010 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Thu, 25 Nov 2010 11:55:24 +0100 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <20101125105107.GA8701@doriath.local> References: <4CEE36DD.8000105@free.fr> <20101125105107.GA8701@doriath.local> Message-ID: <4CEE409C.1010306@free.fr> Le 25/11/2010 11:51, Ernest Adrogu? a ?crit : > I'm not an expert either, but the basic idea you have to get is > that "for" loops in Python are slow. Numpy is not going to change > this. Instead, Numpy allows you to work with "vectors" and "arrays" > so that you need not putting loops in your code. So, you have to > change the way you think about things, it takes a little to get > used to it at first. Yes I know but IDL share this characteristics with numpy, and sometimes you cannot avoid loop. Anyway it was just a test to compare the speed of the cosine function in IDL and numpy. 
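To make the vectorisation advice above concrete, a small sketch (not from the thread) of what the array version of the benchmark's inner loop looks like:

import math
import numpy as np

# All 1000 cosine arguments of the inner loop are built as an array and
# numpy.cos is called once, instead of once per scalar value.
i = np.arange(1000)
a = np.cos(2 * np.pi * i / 100.)

# For a genuinely scalar argument, math.cos is the faster call, as
# pointed out earlier in the thread.
b = math.cos(2 * math.pi * 7 / 100.)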
From alan.isaac at gmail.com Thu Nov 25 09:00:57 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 25 Nov 2010 09:00:57 -0500 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <4CEE409C.1010306@free.fr> References: <4CEE36DD.8000105@free.fr> <20101125105107.GA8701@doriath.local> <4CEE409C.1010306@free.fr> Message-ID: <4CEE6C19.2050404@gmail.com> On 11/25/2010 5:55 AM, Jean-Luc Menut wrote: > it was just a test to compare the speed of > the cosine function in IDL and numpy The point others are trying to make is that you *instead* tested the speed of creation of a certain object type. To test the *function* speeds, feed both large arrays. >>> type(0.5) >>> type(math.cos(0.5)) >>> type(np.cos(0.5)) hth, Alan Isaac From jens.marquard.ipsen at gmail.com Thu Nov 25 10:07:46 2010 From: jens.marquard.ipsen at gmail.com (NumPyStudent) Date: Thu, 25 Nov 2010 07:07:46 -0800 (PST) Subject: [Numpy-discussion] How to export a CObject from NumPy to my own module? In-Reply-To: References: <30295576.post@talk.nabble.com> <30298423.post@talk.nabble.com> Message-ID: <30306050.post@talk.nabble.com> Thanks for your answers! -- View this message in context: http://old.nabble.com/How-to-export-a-CObject-from-NumPy-to-my-own-module--tp30295576p30306050.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From jens.marquard.ipsen at gmail.com Thu Nov 25 10:08:34 2010 From: jens.marquard.ipsen at gmail.com (NumPyStudent) Date: Thu, 25 Nov 2010 07:08:34 -0800 (PST) Subject: [Numpy-discussion] How to export a CObject from NumPy to my own module? In-Reply-To: <4CED5296.10004@noaa.gov> References: <30295576.post@talk.nabble.com> <30298423.post@talk.nabble.com> <4CED5296.10004@noaa.gov> Message-ID: <30306056.post@talk.nabble.com> Thanks for your answers! -- View this message in context: http://old.nabble.com/How-to-export-a-CObject-from-NumPy-to-my-own-module--tp30295576p30306056.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From cournape at gmail.com Thu Nov 25 17:31:13 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 26 Nov 2010 07:31:13 +0900 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <4CEE409C.1010306@free.fr> References: <4CEE36DD.8000105@free.fr> <20101125105107.GA8701@doriath.local> <4CEE409C.1010306@free.fr> Message-ID: On Thu, Nov 25, 2010 at 7:55 PM, Jean-Luc Menut wrote: > Yes I know but IDL share this characteristics with numpy, and sometimes > you cannot avoid loop. Anyway it was just a test to compare the speed of > the cosine function in IDL and numpy. No, you compared IDL looping and python looping. You did not even use numpy. Loops are slow in python, and will remain so in the near future. OTOH, there are many ways to deal with this issue in python compared to IDL (cython being a fairly popular one). David From gokhansever at gmail.com Thu Nov 25 16:34:24 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 25 Nov 2010 15:34:24 -0600 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <4CEE36DD.8000105@free.fr> References: <4CEE36DD.8000105@free.fr> Message-ID: On Thu, Nov 25, 2010 at 4:13 AM, Jean-Luc Menut wrote: > Hello all, > > I have a little question about the speed of numpy vs IDL 7.0. I did a > very simple little check by computing just a cosine in a loop. I was > quite surprised to see an order of magnitude of difference between numpy > and IDL, I would have thought that for such a basic function, the speed > would be approximatively the same. 
> > I suppose that some of the difference may come from ?the default data > type of 64bits in numpy and 32 bits in IDL. Is there a way to change the > numpy default data type (without recompiling) ? > > And I'm not an expert at all, maybe there is a better explanation, like > a better use of the several CPU core by IDL ? > > I'm working with windows 7 64 bits on a core i7. > > any hint is welcome. > Thanks. > > Here the IDL code : > Julian1 = SYSTIME( /JULIAN , /UTC ) > for j=0,9999 do begin > ? for i=0,999 do begin > ? ? a=cos(2*!pi*i/100.) > ? endfor > endfor > Julian2 = SYSTIME( /JULIAN , /UTC ) > print, (Julian2-Julian1)*86400.0 > print,cpt > end > > result: > % Compiled module: $MAIN$. > ? ? ? ?2.9999837 > > > The python code: > from numpy import * > from time import time > time1 = time() > for j in range(10000): > ? ? for i in range(1000): > ? ? ? ? a=cos(2*pi*i/100.) > time2 = time() > print time2-time1 > > result: > In [2]: run python_test_speed.py > 24.1809999943 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Vectorised numpy version already blow away the results. Here is what I get using the IDL version (with IDL v7.1): IDL> .r test_idl % Compiled module: $MAIN$. 4.0000185 I[10]: time run test_python 43.305727005 and using a Cythonized version: from math import pi cdef extern from "math.h": float cos(float) cpdef float myloop(int n1, int n2, float n3): cdef float a cdef int i, j for j in range(n1): for i in range(n2): a=cos(2*pi*i/n3) compiling the setup.py file python setup.py build_ext --inplace and importing the function into IPython from mycython import myloop I[6]: timeit myloop(10000, 1000, 100.0) 1 loops, best of 3: 2.91 s per loop -- G?khan From basherwo at ncsu.edu Fri Nov 26 11:48:39 2010 From: basherwo at ncsu.edu (Bruce Sherwood) Date: Fri, 26 Nov 2010 09:48:39 -0700 Subject: [Numpy-discussion] numpy speed question In-Reply-To: References: <4CEE36DD.8000105@free.fr> Message-ID: Although this was mentioned earlier, it's worth emphasizing that if you need to use functions such as cosine with scalar arguments, you should use math.cos(), not numpy.cos(). The numpy versions of these functions are optimized for handling array arguments and are much slower than the math versions for scalar arguments. Bruce Sherwood On Thu, Nov 25, 2010 at 2:34 PM, G?khan Sever wrote: > On Thu, Nov 25, 2010 at 4:13 AM, Jean-Luc Menut wrote: >> Hello all, >> >> I have a little question about the speed of numpy vs IDL 7.0. I did a >> very simple little check by computing just a cosine in a loop. I was >> quite surprised to see an order of magnitude of difference between numpy >> and IDL, I would have thought that for such a basic function, the speed >> would be approximatively the same. >> >> I suppose that some of the difference may come from ?the default data >> type of 64bits in numpy and 32 bits in IDL. Is there a way to change the >> numpy default data type (without recompiling) ? >> >> And I'm not an expert at all, maybe there is a better explanation, like >> a better use of the several CPU core by IDL ? >> >> I'm working with windows 7 64 bits on a core i7. >> >> any hint is welcome. >> Thanks. >> >> Here the IDL code : >> Julian1 = SYSTIME( /JULIAN , /UTC ) >> for j=0,9999 do begin >> ? for i=0,999 do begin >> ? ? a=cos(2*!pi*i/100.) >> ? 
endfor >> endfor >> Julian2 = SYSTIME( /JULIAN , /UTC ) >> print, (Julian2-Julian1)*86400.0 >> print,cpt >> end >> >> result: >> % Compiled module: $MAIN$. >> ? ? ? ?2.9999837 >> >> >> The python code: >> from numpy import * >> from time import time >> time1 = time() >> for j in range(10000): >> ? ? for i in range(1000): >> ? ? ? ? a=cos(2*pi*i/100.) >> time2 = time() >> print time2-time1 >> >> result: >> In [2]: run python_test_speed.py >> 24.1809999943 >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Vectorised numpy version already blow away the results. > > Here is what I get using the IDL version (with IDL v7.1): > > IDL> .r test_idl > % Compiled module: $MAIN$. > ? ? ? 4.0000185 > > I[10]: time run test_python > 43.305727005 > > and using a Cythonized version: > > from math import pi > > cdef extern from "math.h": > ? ?float cos(float) > > cpdef float myloop(int n1, int n2, float n3): > ? ?cdef float a > ? ?cdef int i, j > ? ?for j in range(n1): > ? ? ? ?for i in range(n2): > ? ? ? ? ? ?a=cos(2*pi*i/n3) > > compiling the setup.py file python setup.py build_ext --inplace > and importing the function into IPython > > from mycython import myloop > > I[6]: timeit myloop(10000, 1000, 100.0) > 1 loops, best of 3: 2.91 s per loop > > > -- > G?khan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david.trem at gmail.com Fri Nov 26 12:11:44 2010 From: david.trem at gmail.com (=?ISO-8859-1?Q?David_Tr=E9mouilles?=) Date: Fri, 26 Nov 2010 18:11:44 +0100 Subject: [Numpy-discussion] Weibull analysis ? Message-ID: <4CEFEA50.4060704@gmail.com> Hello, After careful Google searches, I was not successful in finding any project dealing with Weibull analysis with neither python nor numpy or scipy. So before reinventing the wheel, I ask here whether any of you have already started such a project and is eager to share. Thanks, David From faltet at pytables.org Fri Nov 26 13:03:03 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 26 Nov 2010 19:03:03 +0100 Subject: [Numpy-discussion] numpy speed question In-Reply-To: <4CEE36DD.8000105@free.fr> References: <4CEE36DD.8000105@free.fr> Message-ID: <201011261903.03072.faltet@pytables.org> A Thursday 25 November 2010 11:13:49 Jean-Luc Menut escrigu?: > Hello all, > > I have a little question about the speed of numpy vs IDL 7.0. I did a > very simple little check by computing just a cosine in a loop. I was > quite surprised to see an order of magnitude of difference between > numpy and IDL, I would have thought that for such a basic function, > the speed would be approximatively the same. > > I suppose that some of the difference may come from the default data > type of 64bits in numpy and 32 bits in IDL. Is there a way to change > the numpy default data type (without recompiling) ? > > And I'm not an expert at all, maybe there is a better explanation, > like a better use of the several CPU core by IDL ? As others have already point out, you should make sure that you use numpy.cos with arrays in order to get good performance. I don't know whether IDL is using multi-cores or not, but if you are looking for ultimate performance, you can always use Numexpr that makes use of multicores. 
For example, using a machine with 8 cores (w/ hyperthreading), we have: >>> from math import pi >>> import numpy as np >>> import numexpr as ne >>> i = np.arange(1e6) >>> %timeit np.cos(2*pi*i/100.) 10 loops, best of 3: 85.2 ms per loop >>> %timeit ne.evaluate("cos(2*pi*i/100.)") 100 loops, best of 3: 8.28 ms per loop If you don't have a machine with a lot of cores, but still want to get good performance, you can still link Numexpr against Intel's VML (Vector Math Library). For example, using Numexpr+VML with only one core (in another machine): >>> %timeit np.cos(2*pi*i/100.) 10 loops, best of 3: 66.7 ms per loop >>> ne.set_vml_num_threads(1) >>> %timeit ne.evaluate("cos(2*pi*i/100.)") 100 loops, best of 3: 9.1 ms per loop which also gives a pretty good speedup. Curiously, Numexpr+VML is not that good at using multicores in this case: >>> ne.set_vml_num_threads(2) >>> %timeit ne.evaluate("cos(2*pi*i/100.)") 10 loops, best of 3: 14.7 ms per loop I don't really know why Numexpr+VML is taking more time using 2 threads than only one, but it is probably due to Numexpr requiring better fine- tuning in combination with VML :-/ -- Francesc Alted From gerrit.holl at gmail.com Fri Nov 26 14:16:56 2010 From: gerrit.holl at gmail.com (Gerrit Holl) Date: Fri, 26 Nov 2010 20:16:56 +0100 Subject: [Numpy-discussion] merge_arrays is very slow; alternatives? In-Reply-To: References: Message-ID: Hi, upon profiling my code, I found that numpy.lib.recfunctions.merge_arrays is extremely slow; it does some 7000 rows/second. This is not acceptable for me. I have two large record arrays, or arrays with a complicated dtype. All I want to do is to merge them into one. I don't think that should have to be a very slow operation, I don't need to copy anything, I just want to view the two record arrays as one. How can I do this in a faster way? In [45]: cProfile.runctx("numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)", globals(), locals()) ? ? ? ? 225381902 function calls (150254635 primitive calls) in 166.620 CPU seconds ? Ordered by: standard name ? ncalls ?tottime ?percall ?cumtime ?percall filename:lineno(function) ? ? ? ?1 ? ?0.031 ? ?0.031 ?166.620 ?166.620 :1() ? ? 68/1 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 _internal.py:82(_array_descr) ? ? ? ?2 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 numeric.py:286(asanyarray) ? ? ? ?2 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 recfunctions.py:135(flatten_descr) ? ? ? ?1 ? ?0.000 ? ?0.000 ? ?0.001 ? ?0.001 recfunctions.py:161(zip_descr) 149165600/74038400 ?117.195 ? ?0.000 ?139.701 ? ?0.000 recfunctions.py:235(_izip_fields_flat) ?1088801 ? 12.146 ? ?0.000 ?151.847 ? ?0.000 recfunctions.py:263(izip_records) ? ? ? ?3 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 recfunctions.py:277(sentinel) ? ? ? ?1 ? ?4.599 ? ?4.599 ?166.589 ?166.589 recfunctions.py:328(merge_arrays) ? ? ? ?3 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 recfunctions.py:406() ?75127201 ? 22.506 ? ?0.000 ? 22.506 ? ?0.000 {isinstance} ? ? ? 69 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {len} ? ? ? ?1 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {map} ? ? ? ?1 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {max} ? ? ? ?2 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {method '__array__' of 'numpy.ndarray' objects} ? ? ?136 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {method 'append' of 'list' objects} ? ? ? ?1 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {method 'disable' of '_lsprof.Profiler' objects} ? ? ? ?2 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {method 'extend' of 'list' objects} ? ? ? ?2 ? ?0.000 ? ?0.000 ? ?0.000 ? ?0.000 {method 'pop' of 'list' objects} ? ? ? ?2 ? 
 0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
        1   10.142   10.142   10.142   10.142 {numpy.core.multiarray.fromiter}

Gerrit.

--
Gerrit Holl
PhD student at Department of Space Science, Luleå University of Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/

From gerrit.holl at gmail.com Fri Nov 26 14:57:30 2010
From: gerrit.holl at gmail.com (Gerrit Holl)
Date: Fri, 26 Nov 2010 20:57:30 +0100
Subject: [Numpy-discussion] merge_arrays is very slow; alternatives?
In-Reply-To: 
References: 
Message-ID: 

On 26 November 2010 20:16, Gerrit Holl wrote:
> Hi,
>
> upon profiling my code, I found that
> numpy.lib.recfunctions.merge_arrays is extremely slow; it does some
> 7000 rows/second. This is not acceptable for me.
...
> How can I do this in a faster way?

Replying to my own code here. Either I have written a much faster
implementation of this, or I am missing something. I consider it
unlikely that I write a much faster implementation of an established
numpy function with little effort, so I suspect I am missing something here.

I wrote this implementation of the flattened version of merge_arrays:

def merge_arrays(arr1, arr2):
    t1 = arr1.dtype
    t2 = arr2.dtype
    newdtype = numpy.dtype(t1.descr + t2.descr)
    newarray = numpy.empty(shape=arr1.shape, dtype=newdtype)
    for field in t1.names:
        newarray[field] = arr1[field]
    for field in t2.names:
        newarray[field] = arr2[field]
    return newarray

and benchmarks show it's almost 100 times faster for a medium-sized array:

In [211]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows[:10000], targetrows2[:10000]], flatten=True)
1 loops, best of 3: 1.01 s per loop

In [212]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows[:10000], targetrows2[:10000])
100 loops, best of 3: 10.8 ms per loop

In [214]: (merged1.view(dtype=uint64).reshape(-1, 100) == merged2.view(dtype=uint64).reshape(-1, 100)).all()
Out[214]: True

# and still 4 times faster for a small array:

In [215]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows[:10], targetrows2[:10]], flatten=True)
1000 loops, best of 3: 1.31 ms per loop

In [216]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows[:10], targetrows2[:10])
1000 loops, best of 3: 344 us per loop

# and 15 times faster for a large array (1.5 million elements):

In [218]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)
1 loops, best of 3: 110 s per loop

In [217]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows, targetrows2)
1 loops, best of 3: 7.26 s per loop

I wonder, am I missing something or have I really written a significant
improvement in less than 10 LOC? Should I file a patch for this?

Gerrit.

--
Exploring space at http://gerrit-explores.blogspot.com/
Personal homepage at http://www.topjaklont.org/
Asperger Syndroom: http://www.topjaklont.org/nl/asperger.html

From pav at iki.fi Fri Nov 26 15:27:08 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 26 Nov 2010 20:27:08 +0000 (UTC)
Subject: [Numpy-discussion] merge_arrays is very slow; alternatives?
References: 
Message-ID: 

On Fri, 26 Nov 2010 20:57:30 +0100, Gerrit Holl wrote:
[clip]
> I wonder, am I missing something or have I really written a significant
> improvement in less than 10 LOC? Should I file a patch for this?

The implementation of merge_arrays doesn't look optimal -- it seems to
actually iterate over the data, which should not be needed.
So yes, rewriting the function would be useful. The main difficulty in the rewrite seems to be appropriate mask handling, but slicing is a faster way to do that. -- Pauli Virtanen From paul.anton.letnes at gmail.com Sun Nov 28 03:46:16 2010 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 28 Nov 2010 09:46:16 +0100 Subject: [Numpy-discussion] N dimensional dichotomy optimization In-Reply-To: References: <1290456190.19533.2.camel@Nokia-N900-51-1> <20101122215735.GE27341@phare.normalesup.org> <20101122223601.GH27341@phare.normalesup.org> <20101123085226.GA11522@phare.normalesup.org> <20101123092737.GB11522@phare.normalesup.org> <20101123101706.GE7942@phare.normalesup.org> Message-ID: <6F7EB26C-86BE-4F91-B766-B0499F0455E0@gmail.com> > > > On Tue, Nov 23, 2010 at 3:37 AM, Sebastian Walter wrote: > On Tue, Nov 23, 2010 at 11:17 AM, Gael Varoquaux > wrote: > > On Tue, Nov 23, 2010 at 11:13:23AM +0100, Sebastian Walter wrote: > >> I'm not familiar with dichotomy optimization. > >> Several techniques have been proposed to solve the problem: genetic > >> algorithms, simulated annealing, Nelder-Mead and Powell. > >> To be honest, I find it quite confusing that these algorithms are > >> named in the same breath. > > > > I am confused too. But that stems from my lack of knowledge in > > optimization. > > > >> Do you have a continuous or a discrete problem? > > > > Both. I would like to advertise a bit for genetic algorithms. In my experience, they seem to be the most powerful of the optimization techniques mentioned here. In particular, they are good at getting out of local minima, and don't really care if you are talking about integer or continuous problems. As long as you can think of a good representation and good genetic operators, you should be good! I have just a little experience with pyevolve myself, but it is really easy to test GAs with pyevolve, as you just have to make a few settings to get going. One word of warning: GA performance is very sensitive to the actual parameters you choose! Especially, you should think about mutation rates, crossover rates, selection protocols, and number of crossover points. (This list came off the top of my head...) If you have any GA questions, ask, and perhaps I can come up with an answer. Paul. From basherwo at ncsu.edu Sun Nov 28 11:52:05 2010 From: basherwo at ncsu.edu (Bruce Sherwood) Date: Sun, 28 Nov 2010 09:52:05 -0700 Subject: [Numpy-discussion] Type of init_numpy()? Message-ID: For Python 2.x, init_numpy() was void on all platforms. For Python 3.1, I find experimentally that init_numpy() is int on Windows (but still void on Mac, and I think also void on Ubuntu Linux). Is this a bug? Bruce Sherwood From pav at iki.fi Sun Nov 28 12:07:19 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 28 Nov 2010 17:07:19 +0000 (UTC) Subject: [Numpy-discussion] Type of init_numpy()? References: Message-ID: On Sun, 28 Nov 2010 09:52:05 -0700, Bruce Sherwood wrote: > For Python 2.x, init_numpy() was void on all platforms. > > For Python 3.1, I find experimentally that init_numpy() is int on > Windows (but still void on Mac, and I think also void on Ubuntu Linux). > > Is this a bug? There is no symbol or function called "init_numpy" in Numpy. What symbol did you actually mean? If you meant "import_array()", it is a macro, which on Python 3 contains a "return NULL;" statement and Python 2 a "return;". 
-- Pauli Virtanen From basherwo at ncsu.edu Sun Nov 28 13:12:45 2010 From: basherwo at ncsu.edu (Bruce Sherwood) Date: Sun, 28 Nov 2010 11:12:45 -0700 Subject: [Numpy-discussion] Type of init_numpy()? In-Reply-To: References: Message-ID: Sorry for the confusion. I misspoke. I guess the issue is with import_array. I'll look more closely at what I'm seeing. Thanks. Bruce Sherwood On Sun, Nov 28, 2010 at 10:07 AM, Pauli Virtanen wrote: > On Sun, 28 Nov 2010 09:52:05 -0700, Bruce Sherwood wrote: >> For Python 2.x, init_numpy() was void on all platforms. >> >> For Python 3.1, I find experimentally that init_numpy() is int on >> Windows (but still void on Mac, and I think also void on Ubuntu Linux). >> >> Is this a bug? > > There is no symbol or function called "init_numpy" in Numpy. What symbol > did you actually mean? > > If you meant "import_array()", it is a macro, which on Python 3 contains > a "return NULL;" statement and Python 2 a "return;". > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From brodbd at uw.edu Mon Nov 29 13:32:04 2010 From: brodbd at uw.edu (David Brodbeck) Date: Mon, 29 Nov 2010 10:32:04 -0800 Subject: [Numpy-discussion] NumPy 1.5.1 on RedHat 5.5 Message-ID: I'm trying to install NumPy 1.5.1 on RedHat 5.5 and I'm having trouble getting it to find the ATLAS libraries. I did a lot of Googling but didn't find anything that helped...also looked through the install instructions, but they focus mainly on Ubuntu. The problem I'm having is it's looking for the libraries in the right location, but not finding them. e.g.: atlas_blas_info: libraries f77blas,cblas,atlas not found in /opt/python-2.5/lib libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib64/atlas libraries f77blas,cblas,atlas not found in /usr/lib64/sse2 libraries f77blas,cblas,atlas not found in /usr/lib64 libraries f77blas,cblas,atlas not found in /usr/lib/sse2 libraries f77blas,cblas,atlas not found in /usr/lib NOT AVAILABLE ...and yet... brodbd at patas:~/numpy-1.5.1$ locate libf77blas /usr/lib64/atlas/libf77blas.so.3 /usr/lib64/atlas/libf77blas.so.3.0 brodbd at patas:~/numpy-1.5.1$ locate libcblas /usr/lib64/atlas/libcblas.so.3 /usr/lib64/atlas/libcblas.so.3.0 brodbd at patas:~/numpy-1.5.1$ locate libatlas /usr/lib64/atlas/libatlas.so.3 /usr/lib64/atlas/libatlas.so.3.0 So the libraries are there, and they're where NumPy is looking for them, but it's still not finding them? Clearly I'm missing something, but I'm not sure what. -- David Brodbeck System Administrator,?Linguistics University of Washington From david at silveregg.co.jp Tue Nov 30 00:08:27 2010 From: david at silveregg.co.jp (David) Date: Tue, 30 Nov 2010 14:08:27 +0900 Subject: [Numpy-discussion] NumPy 1.5.1 on RedHat 5.5 In-Reply-To: References: Message-ID: <4CF486CB.7010308@silveregg.co.jp> On 11/30/2010 03:32 AM, David Brodbeck wrote: > I'm trying to install NumPy 1.5.1 on RedHat 5.5 and I'm having trouble > getting it to find the ATLAS libraries. I did a lot of Googling but > didn't find anything that helped...also looked through the install > instructions, but they focus mainly on Ubuntu. > > The problem I'm having is it's looking for the libraries in the right > location, but not finding them. 
e.g.: > > atlas_blas_info: > libraries f77blas,cblas,atlas not found in /opt/python-2.5/lib > libraries f77blas,cblas,atlas not found in /usr/local/lib64 > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib64/atlas > libraries f77blas,cblas,atlas not found in /usr/lib64/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib64 > libraries f77blas,cblas,atlas not found in /usr/lib/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib > NOT AVAILABLE > > ...and yet... > > brodbd at patas:~/numpy-1.5.1$ locate libf77blas > /usr/lib64/atlas/libf77blas.so.3 > /usr/lib64/atlas/libf77blas.so.3.0 > brodbd at patas:~/numpy-1.5.1$ locate libcblas > /usr/lib64/atlas/libcblas.so.3 > /usr/lib64/atlas/libcblas.so.3.0 > brodbd at patas:~/numpy-1.5.1$ locate libatlas > /usr/lib64/atlas/libatlas.so.3 > /usr/lib64/atlas/libatlas.so.3.0 the *.so.N.M are enough for binaries, but you need the *.so to link against a library. Those are generally provided in the -devel RPMS on RH distributions, cheers, David > > So the libraries are there, and they're where NumPy is looking for > them, but it's still not finding them? Clearly I'm missing something, > but I'm not sure what. > From washakie at gmail.com Tue Nov 30 11:40:56 2010 From: washakie at gmail.com (John) Date: Tue, 30 Nov 2010 17:40:56 +0100 Subject: [Numpy-discussion] subdivide array Message-ID: Hello, I have an array of data for a global grid at 1 degree resolution. It's filled with 1s and 0s, and it is just a land sea mask (not only, but as an example). I want to be able to regrid the data to higher or lower resolutions (i.e. 0.5 or 2 degrees). But if I try to use any standard interp functions, such as mpl_toolkits.basemap.interp it fails -- I assume due to the data being binary. I guess there may be a fairly easy routine to do this?? Does someone have an example? Thanks! From tim.whitcomb at nrlmry.navy.mil Tue Nov 30 11:57:07 2010 From: tim.whitcomb at nrlmry.navy.mil (Whitcomb, Mr. Tim) Date: Tue, 30 Nov 2010 08:57:07 -0800 Subject: [Numpy-discussion] subdivide array In-Reply-To: References: Message-ID: > I have an array of data for a global grid at 1 degree resolution. It's > filled with 1s and 0s, and it is just a land sea mask (not only, but > as an example). I want to be able to regrid the data to higher or > lower resolutions (i.e. 0.5 or 2 degrees). But if I try to use any > standard interp functions, such as mpl_toolkits.basemap.interp it > fails -- I assume due to the data being binary. > > I guess there may be a fairly easy routine to do this?? Does someone > have an example? > When I've had to do this, I typically set Basemap's interp to do nearest-neighbor interpolation (by setting order=0). It defaults to order=1, which is bilinear interpolation, which will destroy the binary nature of your data (as you perhaps noticed). Tim From pgmdevlist at gmail.com Tue Nov 30 11:58:07 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 30 Nov 2010 17:58:07 +0100 Subject: [Numpy-discussion] subdivide array In-Reply-To: References: Message-ID: <0C9DF07E-E015-4E1F-A048-83EFBA0F3C38@gmail.com> On Nov 30, 2010, at 5:40 PM, John wrote: > Hello, > > I have an array of data for a global grid at 1 degree resolution. It's > filled with 1s and 0s, and it is just a land sea mask (not only, but > as an example). I want to be able to regrid the data to higher or > lower resolutions (i.e. 0.5 or 2 degrees). 
But if I try to use any > standard interp functions, such as mpl_toolkits.basemap.interp it > fails -- I assume due to the data being binary. > > I guess there may be a fairly easy routine to do this?? Does someone > have an example? Just a random idea: have you tried to convert your input data to float? Hopefully you could get some values between 0 and 1 for your interpolated values, that you'll have to transform back to integers following a scheme of your choosing... From brodbd at uw.edu Tue Nov 30 12:38:23 2010 From: brodbd at uw.edu (David Brodbeck) Date: Tue, 30 Nov 2010 09:38:23 -0800 Subject: [Numpy-discussion] NumPy 1.5.1 on RedHat 5.5 In-Reply-To: <4CF486CB.7010308@silveregg.co.jp> References: <4CF486CB.7010308@silveregg.co.jp> Message-ID: On Mon, Nov 29, 2010 at 9:08 PM, David wrote: > the *.so.N.M are enough for binaries, but you need the *.so to link > against a library. Those are generally provided in the -devel RPMS on RH > distributions, Ah, right. Thank you for filling in that missing piece of information for me. I'll see if I can find development RPMs. I could have sworn I got this to build once before, too. I should have taken notes. -- David Brodbeck System Administrator,?Linguistics University of Washington From gerrit.holl at gmail.com Tue Nov 30 12:51:02 2010 From: gerrit.holl at gmail.com (Gerrit Holl) Date: Tue, 30 Nov 2010 18:51:02 +0100 Subject: [Numpy-discussion] subdivide array In-Reply-To: References: <0C9DF07E-E015-4E1F-A048-83EFBA0F3C38@gmail.com> Message-ID: On 30 November 2010 17:58, Pierre GM wrote: > On Nov 30, 2010, at 5:40 PM, John wrote: >> I have an array of data for a global grid at 1 degree resolution. It's >> filled with 1s and 0s, and it is just a land sea mask (not only, but >> as an example). I want to be able to regrid the data to higher or >> lower resolutions (i.e. 0.5 or 2 degrees). But if I try to use any >> standard interp functions, such as mpl_toolkits.basemap.interp it >> fails -- I assume due to the data being binary. >> >> I guess there may be a fairly easy routine to do this?? Does someone >> have an example? > > Just a random idea: have you tried to convert your input data to float? Hopefully you could get some values between 0 and 1 for your interpolated values, that you'll have to transform back to integers following a scheme of your choosing... I would argue that some float between 0 and 1 is an excellent representation when regridding a binary land-sea-mask onto a higher resolution. After all, this information is not there. Why should a land-sea mask be binary anyway? As if a grid-cell can only be fully ocean or fully land... BTW, I just realised Pythons convention that negative indices count from the end of the array is perfect when using a 180x360 land-sea-mask, as lon[-30] and lon[330] mean and should mean the same :) Gerrit. -- Gerrit Holl PhD student at Department of Space Science, Lule? University of Technology, Kiruna, Sweden http://www.sat.ltu.se/members/gerrit/ From kwgoodman at gmail.com Tue Nov 30 13:34:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 30 Nov 2010 10:34:24 -0800 Subject: [Numpy-discussion] Warning: invalid value encountered in subtract Message-ID: After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like "Warning: invalid value encountered in subtract" when I run unit tests (or timeit) using "python -c 'blah'" but not from an interactive session. How can I tell the warnings to go away? 
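[Editor's note: a minimal sketch, not taken from the thread, of how this class of warning is produced and how to inspect the current floating-point error handling. Whether the message is printed, turned into a Python warning, or silenced depends on the settings reported by np.geterr().]

import numpy as np

print(np.geterr())   # how 'divide', 'over', 'under' and 'invalid' errors are handled

x = np.array([np.inf])
y = x - x            # inf - inf is an invalid operation: with 'invalid' set to
                     # 'print' or 'warn' this reports "invalid value encountered
                     # in subtract" and the result is nan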
From kwgoodman at gmail.com Tue Nov 30 14:21:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 30 Nov 2010 11:21:24 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: <4A9D9432.20907@molden.no> References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: On Tue, Sep 1, 2009 at 2:37 PM, Sturla Molden wrote: > Dag Sverre Seljebotn skrev: >> >> Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the >> right type to use in this case? >> > By the way, here is a more polished version, does it look ok? > > http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py > http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx This is my favorite numpy/scipy ticket. So I am happy that I can contribute in a small way by pointing out a bug. The search for the k-th smallest element is only done over the first k elements (that's the bug) instead of over the entire array. Specifically "while l < k" should be "while l < r". I added a median function to the Bottleneck package: https://github.com/kwgoodman/bottleneck Timings: >> import bottleneck as bn >> arr = np.random.rand(100, 100) >> timeit np.median(arr) 1000 loops, best of 3: 762 us per loop >> timeit bn.median(arr) 10000 loops, best of 3: 198 us per loop What other functions could be built from a selection algorithm? nanmedian scoreatpercentile quantile knn select others? But before I add more functions to the package I need to figure out how to make a cython apply_along_axis function. For the first release I am hand coding the 1d, 2d, and 3d cases. Boring to write, hard to maintain, and doesn't solve the nd case. Does anyone have a cython apply_along_axis that takes a cython reducing function as input? The ticket has an example but I couldn't get it to run. If no one has one (the horror!) I'll begin to work on one sometime after the first release. From jsalvati at u.washington.edu Tue Nov 30 14:25:37 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 30 Nov 2010 11:25:37 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: I am very interested in this result. I have wanted to know how to do an apply_along_axis function for a while now. On Tue, Nov 30, 2010 at 11:21 AM, Keith Goodman wrote: > On Tue, Sep 1, 2009 at 2:37 PM, Sturla Molden wrote: > > Dag Sverre Seljebotn skrev: > >> > >> Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the > >> right type to use in this case? > >> > > By the way, here is a more polished version, does it look ok? > > > > > http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py > > http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx > > This is my favorite numpy/scipy ticket. So I am happy that I can > contribute in a small way by pointing out a bug. The search for the > k-th smallest element is only done over the first k elements (that's > the bug) instead of over the entire array. Specifically "while l < k" > should be "while l < r". 
> > I added a median function to the Bottleneck package: > https://github.com/kwgoodman/bottleneck > > Timings: > > >> import bottleneck as bn > >> arr = np.random.rand(100, 100) > >> timeit np.median(arr) > 1000 loops, best of 3: 762 us per loop > >> timeit bn.median(arr) > 10000 loops, best of 3: 198 us per loop > > What other functions could be built from a selection algorithm? > > nanmedian > scoreatpercentile > quantile > knn > select > others? > > But before I add more functions to the package I need to figure out > how to make a cython apply_along_axis function. For the first release > I am hand coding the 1d, 2d, and 3d cases. Boring to write, hard to > maintain, and doesn't solve the nd case. > > Does anyone have a cython apply_along_axis that takes a cython > reducing function as input? The ticket has an example but I couldn't > get it to run. If no one has one (the horror!) I'll begin to work on > one sometime after the first release. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Tue Nov 30 14:35:01 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 30 Nov 2010 11:35:01 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier wrote: > I am very interested in this result. I have wanted to know how to do an My first thought was to write the reducing function like this cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a): but cython doesn't allow np.ndarray in a cdef. That's why the ticket (URL earlier in the thread) uses a (the?) buffer interface to the array. The particular example in the ticket is not a reducing function, it works on the data in place. We can change that. I can set up a sandbox in the Bottleneck project if anyone (John!) is interested in working on this. I plan to get a first (preview) release out soon and then take a break. The first project for the second release is this; second project is templating. Then the third release is just turning the crank and adding more functions. From brodbd at uw.edu Tue Nov 30 14:40:08 2010 From: brodbd at uw.edu (David Brodbeck) Date: Tue, 30 Nov 2010 11:40:08 -0800 Subject: [Numpy-discussion] NumPy 1.5.1 on RedHat 5.5 In-Reply-To: References: <4CF486CB.7010308@silveregg.co.jp> Message-ID: On Tue, Nov 30, 2010 at 9:38 AM, David Brodbeck wrote: > On Mon, Nov 29, 2010 at 9:08 PM, David wrote: >> the *.so.N.M are enough for binaries, but you need the *.so to link >> against a library. Those are generally provided in the -devel RPMS on RH >> distributions, > > Ah, right. Thank you for filling in that missing piece of information > for me. ?I'll see if I can find development RPMs. > > I could have sworn I got this to build once before, too. ?I should > have taken notes. Turns out there is no atlas-devel package, so I changed tactics and installed blas, blas-devel, lapack, and lapack-devel, instead. This was enough to get both NumPy and SciPy built. 
However, now SciPy is segfaulting when I try to run the test suite: brodbd at patas:~$ python2.5 Python 2.5.5 (r255:77872, May 17 2010, 14:07:05) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy; >>> scipy.test(); Running unit tests for scipy NumPy version 1.5.1 NumPy is installed in /opt/python-2.5/lib/python2.5/site-packages/numpy SciPy version 0.8.0 SciPy is installed in /opt/python-2.5/lib/python2.5/site-packages/scipy Python version 2.5.5 (r255:77872, May 17 2010, 14:07:05) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] nose version 0.11.2 ................................................................................................................................................................................/opt/python-2.5/lib/python2.5/site-packages/scipy/fftpack/tests/test_basic.py:404: ComplexWarning: Casting complex values to real discards the imaginary part y1 = fftn(x.astype(np.float32)) /opt/python-2.5/lib/python2.5/site-packages/scipy/fftpack/tests/test_basic.py:405: ComplexWarning: Casting complex values to real discards the imaginary part y2 = fftn(x.astype(np.float64)).astype(np.complex64) /opt/python-2.5/lib/python2.5/site-packages/scipy/fftpack/tests/test_basic.py:413: ComplexWarning: Casting complex values to real discards the imaginary part y1 = fftn(x.astype(np.float32)) /opt/python-2.5/lib/python2.5/site-packages/scipy/fftpack/tests/test_basic.py:414: ComplexWarning: Casting complex values to real discards the imaginary part y2 = fftn(x.astype(np.float64)).astype(np.complex64) ......................K.............................................................................................................................K..K............................................................Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply .........Warning: divide by zero encountered in log Warning: invalid value encountered in multiply Warning: divide by zero encountered in log Warning: invalid value encountered in multiply ......................................................................................................................................................................................................................................................................................................./opt/python-2.5/lib/python2.5/site-packages/scipy/io/recaster.py:328: ComplexWarning: Casting complex values to real discards the imaginary part test_arr = arr.astype(T) ../opt/python-2.5/lib/python2.5/site-packages/scipy/io/recaster.py:375: ComplexWarning: Casting complex values to real discards the imaginary part return arr.astype(idt) ......................................................................................F..FF.........................................../opt/python-2.5/lib/python2.5/site-packages/scipy/lib/blas/tests/test_fblas.py:86: ComplexWarning: Casting complex values to real discards 
the imaginary part self.blas_func(x,y,n=3,incy=5) ....../opt/python-2.5/lib/python2.5/site-packages/scipy/lib/blas/tests/test_fblas.py:196: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) .................../opt/python-2.5/lib/python2.5/site-packages/scipy/lib/blas/tests/test_fblas.py:279: ComplexWarning: Casting complex values to real discards the imaginary part self.blas_func(x,y,n=3,incy=5) ..................................................................SSSSSS......SSSSSS......SSSS...................................................F.Segmentation fault -- David Brodbeck System Administrator,?Linguistics University of Washington From matthew.brett at gmail.com Tue Nov 30 14:58:55 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 30 Nov 2010 11:58:55 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: Hi, On Tue, Nov 30, 2010 at 11:35 AM, Keith Goodman wrote: > On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier > wrote: >> I am very interested in this result. I have wanted to know how to do an > > My first thought was to write the reducing function like this > > cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a): > > but cython doesn't allow np.ndarray in a cdef. Sorry for the ill-considered hasty reply, but do you mean that this: import numpy as np cimport numpy as cnp cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): return np.nanmean(a) # just a placeholder is not allowed? It works for me. Is it a cython version thing? (I've got 0.13), See you, Matthew From kwgoodman at gmail.com Tue Nov 30 15:06:42 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 30 Nov 2010 12:06:42 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: On Tue, Nov 30, 2010 at 11:58 AM, Matthew Brett wrote: > Hi, > > On Tue, Nov 30, 2010 at 11:35 AM, Keith Goodman wrote: >> On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier >> wrote: >>> I am very interested in this result. I have wanted to know how to do an >> >> My first thought was to write the reducing function like this >> >> cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a): >> >> but cython doesn't allow np.ndarray in a cdef. > > Sorry for the ill-considered hasty reply, but do you mean that this: > > import numpy as np > cimport numpy as cnp > > cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): > ? ?return np.nanmean(a) ?# just a placeholder > > is not allowed? ?It works for me. ?Is it a cython version thing? > (I've got 0.13), Oh, that's nice! I'm using 0.11.2. OK, time to upgrade. From brodbd at uw.edu Tue Nov 30 16:01:40 2010 From: brodbd at uw.edu (David Brodbeck) Date: Tue, 30 Nov 2010 13:01:40 -0800 Subject: [Numpy-discussion] NumPy 1.5.1 on RedHat 5.5 In-Reply-To: References: <4CF486CB.7010308@silveregg.co.jp> Message-ID: On Tue, Nov 30, 2010 at 11:40 AM, David Brodbeck wrote: > On Tue, Nov 30, 2010 at 9:38 AM, David Brodbeck wrote: > Turns out there is no atlas-devel package, so I changed tactics and > installed blas, blas-devel, lapack, and lapack-devel, instead. ?This > was enough to get both NumPy and SciPy built. However, now SciPy is > segfaulting when I try to run the test suite: Never mind, I got it. 
It appears to have been an ABI mismatch. Building with
--fcompiler=gnu95 fixed it.

--
David Brodbeck
System Administrator, Linguistics
University of Washington

From jsseabold at gmail.com Tue Nov 30 16:41:20 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Tue, 30 Nov 2010 16:41:20 -0500
Subject: [Numpy-discussion] Warning: invalid value encountered in subtract
In-Reply-To: 
References: 
Message-ID: 

On Tue, Nov 30, 2010 at 1:34 PM, Keith Goodman wrote:
> After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like
> "Warning: invalid value encountered in subtract" when I run unit tests
> (or timeit) using "python -c 'blah'" but not from an interactive
> session. How can I tell the warnings to go away?

If it's this type of floating point related stuff, you can use np.seterr

In [1]: import numpy as np

In [2]: np.log(1./np.array(0))
Warning: divide by zero encountered in divide
Out[2]: inf

In [3]: orig_settings = np.seterr()

In [4]: np.seterr(all="ignore")
Out[4]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignore'}

In [5]: np.log(1./np.array(0))
Out[5]: inf

In [6]: np.seterr(**orig_settings)
Out[6]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'}

In [7]: np.log(1./np.array(0))
Warning: divide by zero encountered in divide
Out[7]: inf

I have been using the orig_settings so that I can take over the
control of this from the user and then set it back to how it was.

Skipper

From kwgoodman at gmail.com Tue Nov 30 17:22:44 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Tue, 30 Nov 2010 14:22:44 -0800
Subject: [Numpy-discussion] Warning: invalid value encountered in subtract
In-Reply-To: 
References: 
Message-ID: 

On Tue, Nov 30, 2010 at 1:41 PM, Skipper Seabold wrote:
> On Tue, Nov 30, 2010 at 1:34 PM, Keith Goodman wrote:
>> After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like
>> "Warning: invalid value encountered in subtract" when I run unit tests
>> (or timeit) using "python -c 'blah'" but not from an interactive
>> session. How can I tell the warnings to go away?

From robert.kern at gmail.com Tue Nov 30 17:25:32 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Nov 2010 16:25:32 -0600 Subject: [Numpy-discussion] Warning: invalid value encountered in subtract In-Reply-To: References: Message-ID: On Tue, Nov 30, 2010 at 16:22, Keith Goodman wrote: > On Tue, Nov 30, 2010 at 1:41 PM, Skipper Seabold wrote: >> On Tue, Nov 30, 2010 at 1:34 PM, Keith Goodman wrote: >>> After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like >>> "Warning: invalid value encountered in subtract" when I run unit tests >>> (or timeit) using "python -c 'blah'" but not from an interactive >>> session. How can I tell the warnings to go away? >> >> If it's this type of floating point related stuff, you can use np.seterr >> >> In [1]: import numpy as np >> >> In [2]: np.log(1./np.array(0)) >> Warning: divide by zero encountered in divide >> Out[2]: inf >> >> In [3]: orig_settings = np.seterr() >> >> In [4]: np.seterr(all="ignore") >> Out[4]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignor >> e'} >> >> In [5]: np.log(1./np.array(0)) >> Out[5]: inf >> >> In [6]: np.seterr(**orig_settings) >> Out[6]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ig >> nore'} >> >> In [7]: np.log(1./np.array(0)) >> Warning: divide by zero encountered in divide >> Out[7]: inf >> >> I have been using the orig_settings so that I can take over the >> control of this from the user and then set it back to how it was. > > Thank, Skipper. That works. Do you wrap it in a try...except? And then > raise whatever brought you to the exception? Sounds like a pain. > > Is it considered OK for a package to change the state of np.seterr if > there is an error? Silly question. I'm just looking for an easy fix. with np.errstate(invalid='ignore'): ... -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From kwgoodman at gmail.com Tue Nov 30 17:27:19 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 30 Nov 2010 14:27:19 -0800 Subject: [Numpy-discussion] Warning: invalid value encountered in subtract In-Reply-To: References: Message-ID: On Tue, Nov 30, 2010 at 2:25 PM, Robert Kern wrote: > On Tue, Nov 30, 2010 at 16:22, Keith Goodman wrote: >> On Tue, Nov 30, 2010 at 1:41 PM, Skipper Seabold wrote: >>> On Tue, Nov 30, 2010 at 1:34 PM, Keith Goodman wrote: >>>> After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like >>>> "Warning: invalid value encountered in subtract" when I run unit tests >>>> (or timeit) using "python -c 'blah'" but not from an interactive >>>> session. How can I tell the warnings to go away? 
>>> >>> If it's this type of floating point related stuff, you can use np.seterr >>> >>> In [1]: import numpy as np >>> >>> In [2]: np.log(1./np.array(0)) >>> Warning: divide by zero encountered in divide >>> Out[2]: inf >>> >>> In [3]: orig_settings = np.seterr() >>> >>> In [4]: np.seterr(all="ignore") >>> Out[4]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignor >>> e'} >>> >>> In [5]: np.log(1./np.array(0)) >>> Out[5]: inf >>> >>> In [6]: np.seterr(**orig_settings) >>> Out[6]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ig >>> nore'} >>> >>> In [7]: np.log(1./np.array(0)) >>> Warning: divide by zero encountered in divide >>> Out[7]: inf >>> >>> I have been using the orig_settings so that I can take over the >>> control of this from the user and then set it back to how it was. >> >> Thank, Skipper. That works. Do you wrap it in a try...except? And then >> raise whatever brought you to the exception? Sounds like a pain. >> >> Is it considered OK for a package to change the state of np.seterr if >> there is an error? Silly question. I'm just looking for an easy fix. > > with np.errstate(invalid='ignore'): Ah! Thank you, Robert! From pgmdevlist at gmail.com Tue Nov 30 17:30:36 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 30 Nov 2010 23:30:36 +0100 Subject: [Numpy-discussion] Warning: invalid value encountered in subtract In-Reply-To: References: Message-ID: <4F3561BD-22ED-462D-AF6F-78B56966CB7A@gmail.com> On Nov 30, 2010, at 11:22 PM, Keith Goodman wrote: > On Tue, Nov 30, 2010 at 1:41 PM, Skipper Seabold wrote: >> On Tue, Nov 30, 2010 at 1:34 PM, Keith Goodman wrote: >>> After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like >>> "Warning: invalid value encountered in subtract" when I run unit tests >>> (or timeit) using "python -c 'blah'" but not from an interactive >>> session. How can I tell the warnings to go away? >> >> If it's this type of floating point related stuff, you can use np.seterr >> >> In [1]: import numpy as np >> >> In [2]: np.log(1./np.array(0)) >> Warning: divide by zero encountered in divide >> Out[2]: inf >> >> In [3]: orig_settings = np.seterr() >> >> In [4]: np.seterr(all="ignore") >> Out[4]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignor >> e'} >> >> In [5]: np.log(1./np.array(0)) >> Out[5]: inf >> >> In [6]: np.seterr(**orig_settings) >> Out[6]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ig >> nore'} >> >> In [7]: np.log(1./np.array(0)) >> Warning: divide by zero encountered in divide >> Out[7]: inf >> >> I have been using the orig_settings so that I can take over the >> control of this from the user and then set it back to how it was. > > Thank, Skipper. That works. Do you wrap it in a try...except? And then > raise whatever brought you to the exception? Sounds like a pain. > > Is it considered OK for a package to change the state of np.seterr if > there is an error? Silly question. I'm just looking for an easy fix. I had to go through the try/except/set-and-reset-the-error-options dance myself in numpy.ma a few months ago. I realized that setting errors globally in a module (as was the case before) was a tad too sneaky. Sure, it was a bit of a pain, but at least you're not hiding anything.... 
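[Editor's note: to summarise the two approaches discussed in this thread, here is a short sketch. The context-manager form restores the previous error state automatically when the block exits, even if the wrapped code raises, which avoids the try/except bookkeeping mentioned above.]

import numpy as np

# Save-and-restore form (as Skipper describes), made exception-safe:
orig = np.seterr(invalid='ignore')
try:
    y = np.array([np.inf]) - np.array([np.inf])   # no message while ignored
finally:
    np.seterr(**orig)                             # previous settings restored

# Context-manager form (as Robert Kern suggests):
with np.errstate(invalid='ignore'):
    y = np.array([np.inf]) - np.array([np.inf])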
From jsalvati at u.washington.edu Tue Nov 30 22:22:00 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 30 Nov 2010 19:22:00 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: I think last time I looked into how to apply a function along an axis I thought that the PyArray_IterAllButAxis would not work for that task ( http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_IterAllButAxis), but I think perhaps I misunderstood it. I'm looking into how to use it. On Tue, Nov 30, 2010 at 12:06 PM, Keith Goodman wrote: > On Tue, Nov 30, 2010 at 11:58 AM, Matthew Brett > wrote: > > Hi, > > > > On Tue, Nov 30, 2010 at 11:35 AM, Keith Goodman > wrote: > >> On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier > >> wrote: > >>> I am very interested in this result. I have wanted to know how to do an > >> > >> My first thought was to write the reducing function like this > >> > >> cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a): > >> > >> but cython doesn't allow np.ndarray in a cdef. > > > > Sorry for the ill-considered hasty reply, but do you mean that this: > > > > import numpy as np > > cimport numpy as cnp > > > > cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): > > return np.nanmean(a) # just a placeholder > > > > is not allowed? It works for me. Is it a cython version thing? > > (I've got 0.13), > > Oh, that's nice! I'm using 0.11.2. OK, time to upgrade. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
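[Editor's note: the thread above is after a fast Cython apply_along_axis. As a point of reference only, here is a pure-Python sketch of the semantics being discussed: move the reduction axis last, collapse the rest, and call a 1-d reducing function once per row. The helper name apply_reduce is illustrative; the per-row Python call is exactly the overhead a Cython version would remove, and numpy.apply_along_axis offers similar Python-level behaviour.]

import numpy as np

def apply_reduce(func1d, axis, arr):
    arr = np.asarray(arr)
    rolled = np.rollaxis(arr, axis, arr.ndim)      # move the reduction axis to the end
    rows = rolled.reshape(-1, rolled.shape[-1])    # collapse the remaining axes to rows
    out = np.array([func1d(row) for row in rows])  # one Python-level call per row (the slow part)
    return out.reshape(rolled.shape[:-1])

a = np.random.rand(4, 5, 6)
print(apply_reduce(np.median, 1, a).shape)         # (4, 6), same as np.median(a, axis=1).shape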