From faltet at carabos.com Sat Jan 1 14:01:13 2005 From: faltet at carabos.com (Francesc Altet) Date: Sat Jan 1 14:01:13 2005 Subject: [Numpy-discussion] Padding policy in CharArrays Message-ID: <200501012244.23474.faltet@carabos.com> Hi, I'm experiencing some problems derived from the fact that CharArrays in numarray are padded with spaces. That leads to somewhat curious consequences like this: In [180]: a=strings.array(None, itemsize = 4, shape=1) In [181]: a[0] = '0' In [182]: a >= '0\x00\x00\x00\x01' Out[182]: array([1], type=Bool) # Incorrect but... In [183]: a[0] >= '0\x00\x00\x00\x01' Out[183]: False # correct While this is not a bug (see the padding policy for chararrays) I think it would be much better to use '\x00' as the default padding. Would there be any problem with that? If yes, well, I've found a workaround for this, but quite inelegant I'm afraid :-/ Have a Happy New Year! -- Francesc Altet >qo< http://www.carabos.com/ Cárabos Coop. V. V V Enjoy Data "" From jmiller at stsci.edu Mon Jan 3 08:41:27 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 3 08:41:27 2005 Subject: [Numpy-discussion] Padding policy in CharArrays In-Reply-To: <200501012244.23474.faltet@carabos.com> References: <200501012244.23474.faltet@carabos.com> Message-ID: <1104770406.26038.118.camel@halloween.stsci.edu> On Sat, 2005-01-01 at 16:44, Francesc Altet wrote: > Hi, > > I'm experiencing some problems derived from the fact that CharArrays in > numarray are padded with spaces. That leads to somewhat curious consequences > like this: > > In [180]: a=strings.array(None, itemsize = 4, shape=1) > In [181]: a[0] = '0' > In [182]: a >= '0\x00\x00\x00\x01' > Out[182]: array([1], type=Bool) # Incorrect > > but... > > In [183]: a[0] >= '0\x00\x00\x00\x01' > Out[183]: False # correct > > While this is not a bug (see the padding policy for chararrays) I think it > would be much better to use '\x00' as the default padding. Would there be any problem > with that?
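The asymmetry quoted above falls straight out of byte-wise string comparison once the pad byte is accounted for; a minimal pure-Python sketch (no numarray required — just the two padding policies compared side by side):

```python
value = '0'
probe = '0\x00\x00\x00\x01'

# CharArray pads stored items out to itemsize (4 here) with spaces.
space_padded = value.ljust(4, ' ')    # '0   '
# The proposed alternative: pad with NUL bytes instead.
nul_padded = value.ljust(4, '\x00')   # '0\x00\x00\x00'

# A space (0x20) sorts above NUL (0x00), so the space-padded item
# compares greater than the probe even though the stored value is not.
print(space_padded >= probe)  # True  -- the surprising Out[182]
print(nul_padded >= probe)    # False -- the expected Out[183]
```

With NUL padding, comparisons against longer probe strings behave like the unpadded scalar case in Out[183], which is exactly the behavior being asked for.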
The design intent of numarray.strings was that you could use RawCharArray, the baseclass of CharArray, for NULL padded arrays. I tried it out like this: >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray) >>> a[0] = '0\0\0\0' >>> print repr(a >= '0\x00\x00\x00\x01') array([0], type=Bool) You'll note that I "hand padded" the assigned value; because RawCharArray is a little used feature, it needs more work. I think RawCharArray currently makes only partial/inconsistent use of NULL padding. > If yes, well, I've found a workaround on this, but quite > inelegant I'm afraid :-/ Give RawCharArray a try; it *is* the basis of CharArray, so it basically works but there will likely be a few issues to sort out. My guess is that anything that really needs fixing can be added for numarray-1.2. Regards, Todd From faltet at carabos.com Mon Jan 3 13:26:27 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Jan 3 13:26:27 2005 Subject: [Numpy-discussion] Padding policy in CharArrays In-Reply-To: <1104770406.26038.118.camel@halloween.stsci.edu> References: <200501012244.23474.faltet@carabos.com> <1104770406.26038.118.camel@halloween.stsci.edu> Message-ID: <200501032225.15238.faltet@carabos.com> On Monday 03 January 2005 17:40, Todd Miller wrote: > >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray) > >>> a[0] = '0\0\0\0' > >>> print repr(a >= '0\x00\x00\x00\x01') > array([0], type=Bool) > > You'll note that I "hand padded" the assigned value; because > RawCharArray is a little used feature, it needs more work. I think > RawCharArray currently makes only partial/inconsistent use of NULL padding. Well, I've already tried that, but what I would like is to be able to assign values *and* have them padded with NULLs.
However, using a RawCharArray does not allow this: >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray) >>> a RawCharArray([' ']) >>> a[0] = str(0) Traceback (most recent call last): File "", line 1, in ? File "/usr/local/lib/python2.4/site-packages/numarray/strings.py", line 185, in _setitem where[bo:bo+self._itemsize] = self.pad(value)[0:self._itemsize] TypeError: right operand length must match slice length > Give RawCharArray a try; it *is* the basis of CharArray, so it > basically works but there will likely be a few issues to sort out. My > guess is that anything that really needs fixing can be added for > numarray-1.2. Mmm, perhaps having the possibility to select the pad value at CharArray creation time would be nice. Cheers, -- Francesc Altet >qo< http://www.carabos.com/ Cárabos Coop. V. V V Enjoy Data "" From simon at arrowtheory.com Mon Jan 3 14:42:30 2005 From: simon at arrowtheory.com (Simon Burton) Date: Mon Jan 3 14:42:30 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks Message-ID: <41D9CA01.9040108@arrowtheory.com> Hi all, Here are some benchmarks, measured in seconds on a 1.5GHz Celeron. Each test does a matrix add (1000x1000), mul (1000x1000) and eigenvalue find (500x500). Matlab: 0.0562 1.5180 3.7630 Numeric: 0.0962309813499 1.73247330189 3.72153270245 numarray: 7.17220497131 19.3960719109 5.72376401424 I have attached the code. Looks like numarray is way behind on the basic linear algebra stuff. We have (so far) chosen to go with numarray for our scientific computations, but will be needing fast add/multiply. I am surmising that these methods just have not been plugged into the native BLAS/ATLAS lib. We will also be needing other solvers from LAPACK such as dpotrf/dposv, and some of the special functions (Bessel) already implemented in scipy. I understand that Todd is working on scipy/numarray compatibility.
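The shape of such a benchmark harness can be sketched in pure Python (a sketch only — Simon's actual benchmark.py attachment was scrubbed from the archive, and these tiny stand-ins say nothing about 1000x1000 BLAS-backed performance):

```python
import time

def timed(fn, *args):
    """Run fn(*args) once and return (elapsed_seconds, result)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - t0, result

# Tiny pure-Python stand-ins for the matrix add and multiply being timed.
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_mul(a, b):
    bt = list(zip(*b))  # transpose b so rows pair with columns
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

dt_add, s = timed(mat_add, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
dt_mul, p = timed(mat_mul, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(s)  # [[6, 8], [10, 12]]
print(p)  # [[19, 22], [43, 50]]
```

In the real test each operation runs on 1000x1000 arrays (500x500 for the eigenvalue step), so the timings measure the underlying BLAS/LAPACK rather than Python overhead.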
How is that progressing, and what should we be doing to get the above functionality into numarray? I have been poking around the code already, and am able to help out with that. bye for now, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com -------------- next part -------------- A non-text attachment was scrubbed... Name: benchmark.py Type: text/x-python Size: 1786 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: bench.m Type: text/x-objcsrc Size: 272 bytes Desc: not available URL: From rbastian at club-internet.fr Tue Jan 4 05:21:16 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Tue Jan 4 05:21:16 2005 Subject: [Numpy-discussion] install, numpy Message-ID: <05010405560602.00761@rbastian> Hi, I tried "python setup.py install" (Python2.4) in order to get Numeric-23.6 messages : running install running build running build_py running build_ext building 'lapack_lite' extension gcc -pthread -shared build/temp.linux-i686-2.4/Src/lapack_litemodule.o -L/usr/lib/atlas -llapack -lcblas -lf77blas -latlas -lg2c -o build/lib.linux-i686-2.4/lapack_lite.so error : /usr/i486-suse-linux/bin/ld: cannot find -llapack collect2: ld returned 1 exit status error: command 'gcc' failed with exit status 1 Please, what is missing? -- René Bastian http://www.musiques-rb.org : Musique en Python From Chris.Barker at noaa.gov Tue Jan 4 09:30:30 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 4 09:30:30 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <05010405560602.00761@rbastian> References: <05010405560602.00761@rbastian> Message-ID: <41DAD148.9090703@noaa.gov> René Bastian wrote: > I tried "python setup.py install" (Python2.4) in order to get Numeric-23.6 > /usr/i486-suse-linux/bin/ld: cannot find -llapack AARRGG!! I can't believe this bug is still there!
Who is responsible for maintaining the setup.py for Numeric? This has been discussed numerous times on this list; does there need to be a bug report officially filed somewhere? Anyway, the problem is that it's looking for lapack libs that you don't have. By default, setup.py is supposed to be configured to use only the built-in lapack-lite, so it should build anywhere. I've looked in the setup.py, and found that it's closer, but apparently not fixed. I've got lapack on my system, so it's hard for me to test, but try making these changes in setup.py: # delete all but the first one in this list if using your own LAPACK/BLAS #This looks to be right: sourcelist = [os.path.join('Src', 'lapack_litemodule.c'), # os.path.join('Src', 'blas_lite.c'), # os.path.join('Src', 'f2c_lite.c'), # os.path.join('Src', 'zlapack_lite.c'), # os.path.join('Src', 'dlapack_lite.c') ] # set these to use your own BLAS; #library_dirs_list = ['/usr/lib/atlas'] library_dirs_list = [] #libraries_list = ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] # if you also set `use_dotblas` (see below), you'll need: # ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] libraries_list = [] # set to true (1), if you also want BLAS optimized #matrixmultiply/dot/innerproduct #use_dotblas = 1 use_dotblas = 0 #include_dirs = ['/usr/include/atlas'] # You may need to set this to find cblas.h include_dirs = [] # e.g. on UNIX using ATLAS this should be ['/usr/include/atlas'] Note that some of those may be harmless, even if they don't exist, but it won't hurt to get rid of paths you don't have anyway. Also, if you are doing any linear algebra, you'll get much better performance with a native lapack, such as the atlas one, so you may want to get that installed, rather than making this fix. Search this list for lapack and/or atlas, to learn how. Suse is likely to provide an atlas rpm. -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rbastian at club-internet.fr Tue Jan 4 09:51:27 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Tue Jan 4 09:51:27 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DAD148.9090703@noaa.gov> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> Message-ID: <05010410263100.00761@rbastian> Thanks! Now it works. I will see if numpy is faster than numarray for my music/audio business. On Tuesday 4 January 2005 18:24, Chris Barker wrote: > René Bastian wrote: > > I tried "python setup.py install" (Python2.4) in order to get > > Numeric-23.6 > > > > /usr/i486-suse-linux/bin/ld: cannot find -llapack > > AARRGG!! I can't believe this bug is still there! Who is responsible for > maintaining the setup.py for Numeric? This has been discussed numerous > times on this list, does there need to be a bug report officially filed > somewhere? > > Anyway, the problem is that it's looking for lapack libs that you don't > have. By default, setup.py is supposed to be configured to use only the > built-in lapack-lite, so it should build anywhere. > > I've looked in the setup.py, and found that it's closer, but apparently > not fixed.
I've got lapack on my system, so it's hard for me to test, > but try making these changes in setup.py: > > # delete all but the first one in this list if using your own LAPACK/BLAS > > #This looks to be right: > > sourcelist = [os.path.join('Src', 'lapack_litemodule.c'), > # os.path.join('Src', 'blas_lite.c'), > # os.path.join('Src', 'f2c_lite.c'), > # os.path.join('Src', 'zlapack_lite.c'), > # os.path.join('Src', 'dlapack_lite.c') > ] > # set these to use your own BLAS; > > #library_dirs_list = ['/usr/lib/atlas'] > library_dirs_list = [] > #libraries_list = ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] > # if you also set `use_dotblas` (see below), you'll > need: > # ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] > > libraries_list = [] > > # set to true (1), if you also want BLAS optimized > #matrixmultiply/dot/innerproduct > #use_dotblas = 1 > use_dotblas = 0 > #include_dirs = ['/usr/include/atlas'] # You may need to set this to find cblas.h > include_dirs = [] > > > > Note that some of those may be harmless, even if they don't exist, but > it won't hurt to get rid of paths you don't have anyway. > > Also, if you are doing any linear algebra, you'll get much better > performance with a native lapack, such as the atlas one, so you may want > to get that installed, rather than making this fix. search this list for > lapack and/or atlas, to learn how. > > Suse is likely to provide an atlas rpm. > > -Chris -- René Bastian http://www.musiques-rb.org : Musique en Python From stephen.walton at csun.edu Tue Jan 4 10:11:27 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Tue Jan 4 10:11:27 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DAD148.9090703@noaa.gov> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> Message-ID: <41DADC0C.3000501@csun.edu> Chris Barker wrote: > Suse is likely to provide an atlas rpm. How?
I've struggled with trying to create ATLAS RPMs in order to maintain it locally, but the rpm program doesn't have the hooks to distinguish between the various architectures ATLAS supports; ATLAS distinguishes between Athlon and Pentium, for example, and an ATLAS library built on the former core dumps on the latter. It would also be nice if ATLAS could be made a shared library, but that's also not supported at this time. Without it, Numeric and numarray built against ATLAS inherit its architecture dependence. It makes maintaining all of this at our site a real pain. From jbrandmeyer at earthlink.net Tue Jan 4 10:50:48 2005 From: jbrandmeyer at earthlink.net (Jonathan Brandmeyer) Date: Tue Jan 4 10:50:48 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DADC0C.3000501@csun.edu> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> <41DADC0C.3000501@csun.edu> Message-ID: <1104864489.26580.15.camel@illuvatar> On Tue, 2005-01-04 at 10:10 -0800, Stephen Walton wrote: > Chris Barker wrote: > > > Suse is likely to provide an atlas rpm. > > How? I've struggled with trying to create ATLAS RPMs in order to > maintain it locally, but the rpm program doesn't have the hooks to > distinguish between the various architectures ATLAS supports; it > distinguishes between Athlon and Pentium, for example, and an ATLAS > library built on the former core dumps on the latter. > > It would also be nice if ATLAS could be made a shared library, but > that's also not supported at this time. ATLAS is built as a shared library in Debian, named libatlas.so.3 and libblas.so.3. > Without it, Numeric and > numarray built against ATLAS inherit its architecture dependence. It > makes maintaining all of this at our site a real pain. In Debian there are several packages that "Provide" the atlas shared libraries (atlas3-base, atlas3-sse, atlas3-sse2, atlas3-3dnow). A dependent package would "Depend" on the generic name.
I don't know anything about how RPM's support for "Provides" works, but I would assume that you can manage something similar. The preinstall scripts for each one verify that the CPU supports the instructions used in the package. HTH, -Jonathan From rbastian at club-internet.fr Tue Jan 4 13:35:07 2005 From: rbastian at club-internet.fr (=?iso-8859-1?q?Ren=E9=20Bastian?=) Date: Tue Jan 4 13:35:07 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <05010414075901.00761@rbastian> Hm, Numeric is installed but : Traceback (most recent call last): File "benchmark.py", line 93, in ? test00() File "benchmark.py", line 8, in test00 import RandomArray as RA File "/usr/local/lib/python2.4/site-packages/Numeric/RandomArray.py", line 3, in ? import LinearAlgebra File "/usr/local/lib/python2.4/site-packages/Numeric/LinearAlgebra.py", line 8, in ? import lapack_lite ImportError: /usr/local/lib/python2.4/site-packages/Numeric/lapack_lite.so: undefined symbol: dgesdd_ Something wrong ? -- René Bastian http://www.musiques-rb.org : Musique en Python From perry at stsci.edu Tue Jan 4 13:54:28 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jan 4 13:54:28 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <573FE986-5E9B-11D9-BF21-000A95B68E50@stsci.edu> On Jan 3, 2005, at 5:41 PM, Simon Burton wrote: > I understand that Todd is working on scipy/numarray compatability. How > is that progressing, and what should we be doing to get the above > functionality into numarray ? I have been pokeing around the code > already, and am able to help out with that.
Todd has done the first phase of making changes to numarray to handle generalized ufuncs (the area of greatest incompatibility) and also make the necessary changes to scipy_base to support both numarray and Numeric. We are currently waiting for some feedback on the acceptability of these changes so we can continue modifying the rest of scipy to support numarray. So it is going ahead. I figure that once these changes are accepted and an agreement is reached on how setup.py should handle dual builds that anyone should be able to contribute to the porting effort. I hope that can happen in a few weeks. I don't know if that is quick enough for your needs. But you are welcome to take a look at what Todd has already checked into CVS in scipy (as a branch). Perry Greenfield From Chris.Barker at noaa.gov Tue Jan 4 15:33:54 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 4 15:33:54 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DADC0C.3000501@csun.edu> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> <41DADC0C.3000501@csun.edu> Message-ID: <41DB2667.8050908@noaa.gov> Stephen Walton wrote: > How? I've struggled with trying to create ATLAS RPMs in order to > maintain it locally, but the rpm program doesn't have the hooks to > distinguish between the various architectures ATLAS supports; it > distinguishes between Athlon and Pentium, for example, and an ATLAS > library built on the former core dumps on the latter. Sorry, I can't help here. I'm running Gentoo, which has a "compile everything yourself" philosophy! René Bastian wrote: > Numeric is installed but : > > Traceback (most recent call last): > File "benchmark.py", line 93, in ? > test00() > File "benchmark.py", line 8, in test00 > import RandomArray as RA > File "/usr/local/lib/python2.4/site-packages/Numeric/RandomArray.py", line > 3, in ? > import LinearAlgebra > File "/usr/local/lib/python2.4/site-packages/Numeric/LinearAlgebra.py", > line 8, in ?
> import lapack_lite > ImportError: /usr/local/lib/python2.4/site-packages/Numeric/lapack_lite.so: > undefined symbol: dgesdd_ > > Something wrong ? Sorry, I'm kind of out of my depth here, but one thing I would try is to trash the build directory of Numeric, and build again, just to make sure you're re-building everything. You might want to delete the /usr/lib/python2.3/site-packages/Numeric Directory too, before installing. - Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From NadavH at VisionSense.com Wed Jan 5 02:31:00 2005 From: NadavH at VisionSense.com (Nadav Horesh) Date: Wed Jan 5 02:31:00 2005 Subject: [Numpy-discussion] A bug in numarray sign function. Message-ID: <41DBC107.7070702@VisionSense.com> in numarraycore.py line 1504 should be changed from return zeros(shape(m))-less(m,0)+greater(m,0) to return zeros(shape(m))-ufunc.less(m,0)+ufunc.greater(m,0) otherwise sign function raises an error: /usr/local/lib/python2.4/site-packages/numarray/numarraycore.py in sign(m) 1502 """ 1503 m = asarray(m) -> 1504 return zeros(shape(m))-less(m,0)+greater(m,0) 1505 1506 def alltrue(array, axis=0): NameError: global name 'less' is not defined Nadav. From jmiller at stsci.edu Wed Jan 5 06:11:31 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 5 06:11:31 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <1104934210.31516.1563.camel@halloween.stsci.edu> Hi Simon, I found a benchmark bug which explains the performance difference in +. 
Here are my times with the modified benchmark (Python-2.4 gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5) on 1.7 GHz P-IV w/ 2G): numarray + : 0.0540893316269 numarray matrixmultiply : 16.9448821545 numarray eigenvalues : 9.67254910469 Numeric + : 0.0653991508484 Numeric matrixmultiply : 33.0565470934 Numeric eigenvalues : 9.44225819111 So, for large arrays with a simple to install / built-in linear algebra system, numarray is doing just fine. Looking at your results, I think you may have been comparing numarray built using a built-in blas_lite versus Numeric using ATLAS. There, I think numarray *is* behind but that is fixable with some effort. The key is porting and integrating the Numeric dotblas package with numarray. I've been looking at that some today... err, yesterday, apparently I forgot to hit "send". On Mon, 2005-01-03 at 17:41, Simon Burton wrote: > Hi all, > > Here are some benchmarks, measured in seconds on a 1.5GHz celeron. > Each test does a matrix add (1000x1000), mul (1000x1000) and eigenvalue > find (500x500). > > Matlab: > 0.0562 > 1.5180 > 3.7630 > > Numeric: > 0.0962309813499 > 1.73247330189 > 3.72153270245 > > numarray: > 7.17220497131 > 19.3960719109 > 5.72376401424 > > I understand that Todd is working on scipy/numarray compatability. How > is that progressing, and what should we be doing to get the above > functionality into numarray ? Perry already addressed this. > I have been pokeing around the code > already, and am able to help out with that. Regards, Todd -------------- next part -------------- A non-text attachment was scrubbed... Name: benchmark.py Type: text/x-python Size: 2055 bytes Desc: not available URL: From stephen.walton at csun.edu Wed Jan 5 09:16:05 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Jan 5 09:16:05 2005 Subject: [Numpy-discussion] A bug in numarray sign function. 
In-Reply-To: <41DBC107.7070702@VisionSense.com> References: <41DBC107.7070702@VisionSense.com> Message-ID: <41DC208F.5060605@csun.edu> Nadav Horesh wrote: > in numarraycore.py line 1504 should be changed from > > return zeros(shape(m))-less(m,0)+greater(m,0) What version of numarray are you running? I don't see this code in version 1.1.1. (I did a 'grep zeros(shape(m)) *.py' in /usr/lib/python2.3/site-packages/numarray and got no matches; line 1504 of my copy of numarraycore.py doesn't look anything like the above.) From stephen.walton at csun.edu Wed Jan 5 09:34:28 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Jan 5 09:34:28 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1104934210.31516.1563.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> Message-ID: <41DC24DD.10002@csun.edu> Todd, I ran your updated benchmark on a Dell Precision 350 (P4 at 2.26GHz) with numarray 1.1.1 and Numeric 23.6 both built against ATLAS.
> My results were: > > numarray + : 0.026392891407 > numarray matrixmultiply : 4.37110900879 > numarray eigenvalues : 2.95166471004 > > Numeric + : 0.0369043111801 > Numeric matrixmultiply : 0.69968931675 > Numeric eigenvalues : 2.81557621956 > > Might there still be a matrixmultiply problem somewhere? That's what "dotblas" does; it replaces matrixmultiply() and innerproduct() with versions which are dependent on a laundry list of numerical libraries. Numeric has dotblas, numarray doesn't. I'm looking into it now. The port itself is trivial, but integrating it into the numarray package structure has my head spinning a little. Regards, Todd From simon at arrowtheory.com Wed Jan 5 16:25:00 2005 From: simon at arrowtheory.com (Simon Burton) Date: Wed Jan 5 16:25:00 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1104934210.31516.1563.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> Message-ID: <41DC84FA.1090703@arrowtheory.com> Todd Miller wrote: >Hi Simon, > >I found a benchmark bug which explains the performance difference in +. >Here are my times with the modified benchmark (Python-2.4 gcc version >3.2.2 20030222 (Red Hat Linux 3.2.2-5) on 1.7 GHz P-IV w/ 2G): > >numarray + : 0.0540893316269 >numarray matrixmultiply : 16.9448821545 >numarray eigenvalues : 9.67254910469 > >Numeric + : 0.0653991508484 >Numeric matrixmultiply : 33.0565470934 >Numeric eigenvalues : 9.44225819111 > >So, for large arrays with a simple to install / built-in linear algebra >system, numarray is doing just fine. > >Looking at your results, I think you may have been comparing numarray >built using a built-in blas_lite versus Numeric using ATLAS. There, I >think numarray *is* behind but that is fixable with some effort. The >key is porting and integrating the Numeric dotblas package with >numarray. I've been looking at that some today... 
err, yesterday, >apparently I forgot to hit "send". > > Wow, those results look great, Todd. I have double checked my install. However, the numarray multiply is still x10 slower. This is set in addons: lapack_libs = ['lapack', 'f77blas', 'cblas', 'atlas', 'blas'] and, at runtime, python has loaded: 40771000-40cb6000 r-xp 00000000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40cb6000-40cb9000 rw-p 00545000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40cb9000-40dbd000 rw-p 00000000 00:00 0 40dbd000-40dd7000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 40dd7000-40dd8000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 40dd8000-40df7000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 40df7000-40df8000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 40df8000-4110b000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 4110b000-4110f000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 4110f000-41454000 r-xp 00000000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 41454000-41458000 rw-p 00345000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 41458000-41472000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 41472000-41473000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 ( as well as lapack_lite2.so ) So I assumed that since the eigenvalues came in fast ATLAS was alive and well. 
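The library listings above are the interpreter process's memory map; a small sketch of how the same information can be grabbed from inside Python on Linux (assumes a /proc filesystem and returns an empty list elsewhere; the helper name is made up here):

```python
def loaded_numeric_libs(substrings=("atlas", "lapack", "blas")):
    """Scan /proc/self/maps for mapped shared libraries whose paths
    mention any of the given substrings (ATLAS/LAPACK/BLAS here)."""
    hits = []
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                if any(s in line.lower() for s in substrings):
                    hits.append(line.rstrip())
    except OSError:
        pass  # not Linux, or /proc unavailable
    return sorted(set(hits))

for line in loaded_numeric_libs():
    print(line)
```

If ATLAS is really being used, its shared objects show up here after the first linear-algebra call, which is how one can tell an ATLAS-backed build from a blas_lite one.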
Also, the above libs are exactly what Numeric uses: 4033b000-40880000 r-xp 00000000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40880000-40883000 rw-p 00545000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40883000-40987000 rw-p 00000000 00:00 0 40987000-409a6000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 409a6000-409a7000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 409a7000-409c1000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 409c1000-409c2000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 409c2000-40cd5000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 40cd5000-40cd9000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 40cd9000-40cf3000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 40cf3000-40cf4000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 40cf4000-40cf7000 rw-p 00000000 00:00 0 40cf7000-4103c000 r-xp 00000000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 4103c000-41040000 rw-p 00345000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 Any ideas? It's not even clear to me where the matrixmultiply is taking place. I couldn't find it in my lapack_lite2.so even though there seems to be a lite version of dgemm in the source. But then dgemm is not referenced anywhere else in the numarray source. Coming from the other end, I traced matrixmultiply to an _ipFloat64. But then the trail went cold again :) Flummoxed. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From stephen.walton at csun.edu Wed Jan 5 17:59:12 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Jan 5 17:59:12 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC84FA.1090703@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> Message-ID: <41DC9B35.2070801@csun.edu> Simon Burton wrote: > I have double checked my install.
However, the numarray multiply is > still x10 slower. > > This is set in addons: > lapack_libs = ['lapack', 'f77blas', 'cblas', 'atlas', 'blas'] One more thing: did you set the environment variable USE_LAPACK before building; i.e., env USE_LAPACK=1 python setup.py build in your numarray directory? Just checking. From simon at arrowtheory.com Wed Jan 5 21:15:02 2005 From: simon at arrowtheory.com (Simon Burton) Date: Wed Jan 5 21:15:02 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC9B35.2070801@csun.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <41DC9B35.2070801@csun.edu> Message-ID: <20050106160431.0e910010.simon@arrowtheory.com> On Wed, 05 Jan 2005 17:58:13 -0800 Stephen Walton wrote: > One more thing: did you set the environment variable USE_LAPACK before > building; i.e., > > env USE_LAPACK=1 python setup.py build > > in your numarray directory? Just checking. > yes, i also checked it using a print statement. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From NadavH at VisionSense.com Thu Jan 6 04:26:29 2005 From: NadavH at VisionSense.com (Nadav Horesh) Date: Thu Jan 6 04:26:29 2005 Subject: [Numpy-discussion] A bug in numarray sign function. In-Reply-To: <41DC208F.5060605@csun.edu> References: <41DBC107.7070702@VisionSense.com> <41DC208F.5060605@csun.edu> Message-ID: <41DD2D8F.8000803@VisionSense.com> >>> print numarray.__version__ 1.2a It is from the CVS repositoty Nadav Stephen Walton wrote: > Nadav Horesh wrote: > >> in numarraycore.py line 1504 should be changed from >> >> return zeros(shape(m))-less(m,0)+greater(m,0) > > > What version of numarray are you running? I don't see this code in > version 1.1.1. 
(I did a 'grep zeros(shape(m)) *.py' in > /usr/lib/python2.3/site-packages/numarray and got no matches; line > 1504 of my copy of numarraycore.py doesn't look anything like the above. > > From southey at uiuc.edu Thu Jan 6 06:31:34 2005 From: southey at uiuc.edu (Bruce Southey) Date: Thu Jan 6 06:31:34 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks Message-ID: Hi, While on the subject of benchmarks, I thought I would point out a really excellent book by Hans Petter Langtangen: 'Python Scripting for Computational Science' (Springer, 2004: http://www.springeronline.com/sgw/cda/frontpage/0,0,4-0-22-17627636-0,0.html ). The book web site is http://folk.uio.no/hpl/scripting/ which also has the scripts. There is considerable detailed material on using Numeric and numarray as well as using Python callbacks from C/C++ and Fortran. It also addresses GUI programming and other topics in Python including regular expressions. One of the really great things about this book is the discussion on how to improve code with reference to a single example called gridloop. Gridloop just evaluates a function (the actual function used was 'sin(x*y) + 8*x') over a rectangular grid and stores the results in an array. There are well over 25 versions from using straight C, Fortran and C++ to using Python and Numerical Python. These benchmarks are on different ways to implement this gridloop function in Fortran, C/C++, numarray, Numeric and Python callbacks from C/C++ and Fortran. In the vectorized form relative to the F77 version, numarray (v0.9) was 2.7 times slower and Numeric (v23) was 3.0 times slower. Another item that appeared was that since the sin function is scalar, there was a huge difference in the Python implementation between using math.sin (140 times slower than F77), Numeric.sin (230 times slower than F77) and numarray.sin (350 times slower than F77).
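The gridloop example Bruce describes is easy to sketch in pure Python (a sketch of the naive scalar loop that the book's vectorized and compiled variants are measured against, using sin(x*y) + 8*x as the test function; the book's own code differs in detail):

```python
import math

def gridloop(xcoor, ycoor, f):
    """Evaluate f over a rectangular grid: a[i][j] = f(xcoor[i], ycoor[j])."""
    return [[f(x, y) for y in ycoor] for x in xcoor]

def myfunc(x, y):
    return math.sin(x * y) + 8 * x

a = gridloop([0.0, 1.0], [0.0, 2.0], myfunc)
print(a[1][1])  # sin(2.0) + 8.0, about 8.9093
```

The vectorized versions replace the double loop with whole-array operations, which is where the 2.7x/3.0x-vs-F77 numbers above come from; the scalar-function timings compare calling math.sin versus the array packages' sin on single numbers inside this loop.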
Perhaps this suggests that these namespaces should check for scalar arguments before using the vectorized versions. Regards, Bruce Southey From Jack.Jansen at cwi.nl Thu Jan 6 06:55:24 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Jan 6 06:55:24 2005 Subject: [Numpy-discussion] A request for new distributions Message-ID: I'm catching up with half a year of back postings on the numpy list, and the messages about automatically using vecLib on the Macintosh sound very interesting. I maintain the official Package Manager databases for MacPython (which allow users to install a handful of common packages with one mouseclick), and Numeric and numarray have been in there since day one. I'm revising all the packages again at the moment (I do that about once a year, or on request), and I had to manually fiddle Numeric 23.6 to actually build on the Mac. This is a bit of a bother, as I have to keep a separate source distribution instead of just referring to the official one, but more important is the fact that neither Numeric nor numarray support vecLib out of the box. So, a plea for help: I don't know what the usual frequency for Numeric and numarray distributions is, but it would be very helpful for me (and for anyone using Numeric/numarray on the Mac) if new distributions were available that contained the fixes mentioned in the November discussions... -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From jmiller at stsci.edu Thu Jan 6 08:12:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 08:12:00 2005 Subject: [Numpy-discussion] A bug in numarray sign function. 
In-Reply-To: <41DD2D8F.8000803@VisionSense.com> References: <41DBC107.7070702@VisionSense.com> <41DC208F.5060605@csun.edu> <41DD2D8F.8000803@VisionSense.com> Message-ID: <1105027838.10516.190.camel@halloween.stsci.edu> On Thu, 2005-01-06 at 07:22, Nadav Horesh wrote: > >>> print numarray.__version__ > 1.2a > > It is from the CVS repositoty > > Nadav Thanks Nadav. Fixed in CVS. Regards, Todd > > Stephen Walton wrote: > > > Nadav Horesh wrote: > > > >> in numarraycore.py line 1504 should be changed from > >> > >> return zeros(shape(m))-less(m,0)+greater(m,0) > > > > > > What version of numarray are you running? I don't see this code in > > version 1.1.1. (I did a 'grep zeros(shape(m)) *.py' in > > /usr/lib/python2.3/site-packages/numarray and got no matches; line > > 1504 of my copy of numarraycore.py doesn't look anything like the above. > > > -- From jmiller at stsci.edu Thu Jan 6 08:18:59 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 08:18:59 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC84FA.1090703@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> Message-ID: <1105028223.10516.210.camel@halloween.stsci.edu> On Wed, 2005-01-05 at 19:23, Simon Burton wrote: > Todd Miller wrote: > > >Hi Simon, > > > >I found a benchmark bug which explains the performance difference in +. 
> >Here are my times with the modified benchmark (Python-2.4 gcc version > >3.2.2 20030222 (Red Hat Linux 3.2.2-5) on 1.7 GHz P-IV w/ 2G): > > > >numarray + : 0.0540893316269 > >numarray matrixmultiply : 16.9448821545 > >numarray eigenvalues : 9.67254910469 > > > >Numeric + : 0.0653991508484 > >Numeric matrixmultiply : 33.0565470934 > >Numeric eigenvalues : 9.44225819111 > > > >So, for large arrays with a simple to install / built-in linear algebra > >system, numarray is doing just fine. > > > >Looking at your results, I think you may have been comparing numarray > >built using a built-in blas_lite versus Numeric using ATLAS. There, I > >think numarray *is* behind but that is fixable with some effort. The > >key is porting and integrating the Numeric dotblas package with > >numarray. I've been looking at that some today... err, yesterday, > >apparently I forgot to hit "send". > > > > > > Wow, those results look great, Todd. > > I have double checked my install. However, the numarray multiply is > still x10 slower. 
> > This is set in addons: > lapack_libs = ['lapack', 'f77blas', 'cblas', 'atlas', 'blas'] > > and, at runtime, python has loaded: > 40771000-40cb6000 r-xp 00000000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40cb6000-40cb9000 rw-p 00545000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40cb9000-40dbd000 rw-p 00000000 00:00 0 > 40dbd000-40dd7000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 40dd7000-40dd8000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 40dd8000-40df7000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 40df7000-40df8000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 40df8000-4110b000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 4110b000-4110f000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 4110f000-41454000 r-xp 00000000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > 41454000-41458000 rw-p 00345000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > 41458000-41472000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > 41472000-41473000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > ( as well as lapack_lite2.so ) > > So I assumed that since the eigenvalues came in fast ATLAS was alive and > well. 
> Also, the above libs are exactly what Numeric uses: > 4033b000-40880000 r-xp 00000000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40880000-40883000 rw-p 00545000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40883000-40987000 rw-p 00000000 00:00 0 > 40987000-409a6000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 409a6000-409a7000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 409a7000-409c1000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 409c1000-409c2000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 409c2000-40cd5000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 40cd5000-40cd9000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 40cd9000-40cf3000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > 40cf3000-40cf4000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > 40cf4000-40cf7000 rw-p 00000000 00:00 0 > 40cf7000-4103c000 r-xp 00000000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > 4103c000-41040000 rw-p 00345000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > > > Any ideas ? What we've got here is... a'falya to communicate. numarray's dot/matrixmultiply() and innerproduct() have never been implemented in terms of a BLAS. *That's* the problem. Numeric has the "dotblas" extension which augments the built-in versions of these functions with ones that are souped up using external libraries. I ported dotblas yesterday and checked it into numarray CVS yesterday afternoon. I fixed the last doctest artifact and re-arranged a little this morning. If you're working from CVS you should be able to see the new performance optimization now by doing an update and building/linking against the right libraries. To do that I: setenv USE_LAPACK 1 setenv LINALG_LIB setenv LINALG_INCLUDE Have a look. I think numarray is slightly ahead now. 
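For anyone following along, the recipe above can be spelled out as a shell session (a sketch: the csh-style setenv lines are translated to POSIX sh, and the two paths are placeholders that depend on where ATLAS is installed on your system):

```shell
# Enable the BLAS-backed dotblas extension when building numarray from CVS.
export USE_LAPACK=1
# Placeholder paths -- point these at your own ATLAS/LAPACK installation.
export LINALG_LIB=/usr/lib/atlas
export LINALG_INCLUDE=/usr/include/atlas
# Then, from the numarray source tree:
# python setup.py build
echo "building with USE_LAPACK=$USE_LAPACK"
```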
Todd From stephen.walton at csun.edu Thu Jan 6 09:40:26 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 6 09:40:26 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1105028223.10516.210.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> Message-ID: <41DD77C0.10402@csun.edu> Todd Miller wrote: >What we've got here is... a'falya to communicate. > > > Yeah, include me in that category as well. >I ported dotblas yesterday and checked it into numarray CVS yesterday >afternoon. I fixed the last doctest artifact and re-arranged a little >this morning. > > > I just checked out the 1.2a CVS and am getting the same result I did before, with matrix multiplies about a factor of 7 slower in numarray than numeric. Now, I'm building with the Absoft compiler, and am wondering if some glitch in the build process is causing ATLAS not to be used. How did Simon Burton get the list of libraries loaded by python after importing numarray? I should check that next. By the way, libg2c still needs to be linked against with the Absoft compiler; add 'g2c' to 'lapack_libs' between 'atlas' and 'f90math'. Steve From jmiller at stsci.edu Thu Jan 6 12:25:43 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 12:25:43 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DD77C0.10402@csun.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> <41DD77C0.10402@csun.edu> Message-ID: <1105043050.10516.557.camel@halloween.stsci.edu> > >I ported dotblas yesterday and checked it into numarray CVS yesterday > >afternoon. I fixed the last doctest artifact and re-arranged a little > >this morning. 
> > > > > > I just checked out the 1.2a CVS and am getting the same result I did > before, with matrix multiplies about a factor of 7 slower in numarray > than numeric. Try: >>> import numarray.dotblas as db >>> db.USING_BLAS 1 # 0 means the import fails Then try: >>> import numarray._dotblas and the traceback should identify the problem. > Now, I'm building with the Absoft compiler, and am > wondering if some glitch in the build process is causing ATLAS not to be > used. How did Simon Burton get the list of libraries loaded by python > after importing numarray? I should check that next. > > By the way, libg2c still needs to be linked against with the Absoft > compiler; add 'g2c' to 'lapack_libs' between 'atlas' and 'f90math'. > > Steve -- From faltet at carabos.com Thu Jan 6 13:17:05 2005 From: faltet at carabos.com (Francesc Altet) Date: Thu Jan 6 13:17:05 2005 Subject: [Numpy-discussion] Padding policy in CharArrays In-Reply-To: <200501032225.15238.faltet@carabos.com> References: <200501012244.23474.faltet@carabos.com> <1104770406.26038.118.camel@halloween.stsci.edu> <200501032225.15238.faltet@carabos.com> Message-ID: <200501061853.20673.faltet@carabos.com> On Monday 03 January 2005 22:25, Francesc Altet wrote: > Mmm, perhaps having the possibility to select the pad value in CharArray > creation time would be nice. I've ended up making an implementation of this in numarray. 
With the patches (against numarray 1.1.1) I'm attaching, the following works: >>> b=strings.array(['0'], itemsize = 4, padc="\x00") >>> b.raw() RawCharArray(['0\x00\x00\x00']) >>> b.raw() >= '0\x00\x00\x00\x01' array([0], type=Bool) While the actual behaviour in numarray 1.1.1 is: >>> b=strings.array(['0'], itemsize = 4) >>> b.raw() RawCharArray(['0 ']) >>> b.raw() >= '0\x00\x00\x00\x01' array([1], type=Bool) As you may have already noted, I've added a new parameter named padc to the CharArray/RawCharArray constructor, with the space (" ") as the default pad character for backward compatibility. All the current tests for CharArray pass with the patch applied. The new functionality is restricted to what I needed, but I guess it should be easily extended to be completely consistent in other cases. Feel free to add the patch to numarray if you feel it to be appropriate. Cheers, -- Francesc Altet >qo< http://www.carabos.com/ Cárabos Coop. V. Enjoy Data -------------- next part -------------- A non-text attachment was scrubbed... Name: _chararraymodule.c.patch Type: text/x-diff Size: 1085 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: strings.py.patch Type: text/x-diff Size: 7664 bytes Desc: not available URL: From jmiller at stsci.edu Thu Jan 6 14:11:48 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 14:11:48 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: References: Message-ID: <1105049407.10516.809.camel@halloween.stsci.edu> On Thu, 2005-01-06 at 09:54, Jack Jansen wrote: > I'm catching up with half a year of back postings on the numpy list, > and the messages about automatically using vecLib on the macintosh > sound very interesting. 
> > I maintain the official Package Manager databases for MacPython (which > allow users to install a handful of common packages with one > mouseclick), and Numeric and numarray have been in there since day one. > I'm revising all the packages again at the moment (I do that about once > a year, or on request), and I had to manually fiddle Numeric 23.6 to > actually build on the Mac. This is a bit of a bother, as I have to keep > a separate source distribution in stead of just referring to the > official one, but more important is the fact that neither Numeric nor > numarray support vecLib out of the box. > > So, a plea for help: I don't know what the usual frequency for Numeric > and numarray distributions is, but it would be very helpful for me (and > for anyone using Numeric/numarray on the Mac) if new distributions were > available that contained the fixes mentioned in the november > discussions... numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks I hope. For numarray-1.2 on the Mac, I think all you will need to do to get a vecLib build is: python setup.py install --use_lapack Regards, Todd From jmiller at stsci.edu Thu Jan 6 14:40:01 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 14:40:01 2005 Subject: [Numpy-discussion] Padding policy in CharArrays In-Reply-To: <200501061853.20673.faltet@carabos.com> References: <200501012244.23474.faltet@carabos.com> <1104770406.26038.118.camel@halloween.stsci.edu> <200501032225.15238.faltet@carabos.com> <200501061853.20673.faltet@carabos.com> Message-ID: <1105051133.10516.819.camel@halloween.stsci.edu> In some kind of cosmic irony, your bona-fide-patch was filed as Junk Mail by my filter. Anyway, thanks, it's committed in CVS. I added the extra code to handle the PadAll case you flagged as "to be corrected." 
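As an aside for readers without numarray at hand, the ordering pitfall behind this padc patch can be reproduced with plain Python strings (a sketch of the semantics only; numarray applies the padding internally):

```python
# CharArray pads stored items with spaces (0x20); RawCharArray with
# padc='\x00' pads with NULs. Because ' ' sorts above '\x00', the two
# paddings order differently against the same probe string.
item = '0'
space_padded = item.ljust(4, ' ')     # '0   '           -- CharArray-style
null_padded = item.ljust(4, '\x00')   # '0\x00\x00\x00'  -- padc='\x00' style

probe = '0\x00\x00\x00\x01'
print(space_padded >= probe)  # True  (the surprising result)
print(null_padded >= probe)   # False (matches plain-string intuition)
```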
Regards, Todd On Thu, 2005-01-06 at 12:53, Francesc Altet wrote: > A Dilluns 03 Gener 2005 22:25, Francesc Altet va escriure: > > Mmm, perhaps having the possibility to select the pad value in CharArray > > creation time would be nice. > > I've ended making an implementation of this in numarray. With the patches > (against numarray 1.1.1) I'm attaching, the next works: > > >>> b=strings.array(['0'], itemsize = 4, padc="\x00") > >>> b.raw() > RawCharArray(['0\x00\x00\x00']) > >>> b.raw() >= '0\x00\x00\x00\x01' > array([0], type=Bool) > > While the actual behaviour in numarray 1.1.1 is: > > >>> b=strings.array(['0'], itemsize = 4) > >>> b.raw() > RawCharArray(['0 ']) > >>> b.raw() >= '0\x00\x00\x00\x01' > array([1], type=Bool) > > As you may have already noted, I've added a new parameter named padc to the > CharArray/RawCharArray constructor being the default pad character value the > space (" "), for backward compatibility. All the current tests for CharArray > passes with patch applied. > > The new functionality is restricted to what I needed, but I guess it should > be easily extended to be completely consistent in other cases. Feel free to > add the patch to numarray if you feel it to be appropriate. > > Cheers, -- From stephen.walton at csun.edu Thu Jan 6 14:44:21 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 6 14:44:21 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1105043050.10516.557.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> <41DD77C0.10402@csun.edu> <1105043050.10516.557.camel@halloween.stsci.edu> Message-ID: <41DDBEF5.6060306@csun.edu> Todd Miller wrote: > Try: > > > >>>>import numarray.dotblas as db >>>>db.USING_BLAS >>>> You must have changed something in CVS today, or maybe things just propagated slowly. 
I didn't get the dotblas.py file until a couple of hours ago. Everything's hunky-dory now. Steve From simon at arrowtheory.com Thu Jan 6 15:57:19 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Jan 6 15:57:19 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1105028223.10516.210.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> Message-ID: <41DDD040.9020505@arrowtheory.com> Todd Miller wrote: > >I ported dotblas yesterday and checked it into numarray CVS yesterday >afternoon. I fixed the last doctest artifact and re-arranged a little >this morning. > > > OK, it works great! One thing: to get it to compile I needed to change the includes in _dotblas.c to: #include "libnumarray.h" #include "arrayobject.h" from #include "numarray/libnumarray.h" #include "numarray/arrayobject.h" because we are using "-IInclude/numarray" Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From prabhu_r at users.sf.net Thu Jan 6 20:32:03 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Thu Jan 6 20:32:03 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface Message-ID: <16862.4265.62327.648134@monster.linux.in> Hi Numarray developers, Numeric arrays support the buffer interface by providing an array_as_buffer structure in the type object definition by doing this: (PyBufferProcs *)&array_as_buffer, /*tp_as_buffer*/ and adding this to the tp_flags: Py_TPFLAGS_HAVE_GETCHARBUFFER), /*tp_flags*/ This is very handy when one needs to pass void arrays into C/C++ code and is used to pass data from Numeric to C or C++ libraries very efficiently. In particular, this is very useful when passing Numeric array data to VTK. I noticed that numarray does not support this interface. 
My feature request is that numarray arrays also support this buffer interface (if possible). Thanks! cheers, prabhu p.s. I'm not on this list so please cc me in on any messages. Thanks! From Jack.Jansen at cwi.nl Fri Jan 7 01:34:32 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Fri Jan 7 01:34:32 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <1105049407.10516.809.camel@halloween.stsci.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> Message-ID: <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> On 6 Jan 2005, at 23:10, Todd Miller wrote: > numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks > I hope. For numarray-1.2 on the Mac, I think all you will need to do > to get a vecLib build is: > > python setup.py install --use_lapack Is there a reason to require the "--use_lapack"? I.e. are there any adverse consequences to using it? -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From rkern at ucsd.edu Fri Jan 7 01:54:11 2005 From: rkern at ucsd.edu (Robert Kern) Date: Fri Jan 7 01:54:11 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> Message-ID: <41DE5C2D.7050400@ucsd.edu> Jack Jansen wrote: > > On 6 Jan 2005, at 23:10, Todd Miller wrote: > >> numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks >> I hope. For numarray-1.2 on the Mac, I think all you will need to do >> to get a vecLib build is: >> >> python setup.py install --use_lapack > > > Is there a reason to require the "--use_lapack"? I.e. are there any > adverse consequences to using it? On other platforms, one has to edit the setup scripts to add the information about where the libraries are. The default fallback is to use the unoptimized version packaged with numarray. 
The alternative would be to add autoconf-like capabilities to the setup script such that it could determine if the libraries were in the default places (and valid!), then fall back to the lite versions if not. On the Mac, --use_lapack should have no adverse consequences, if I'm reading you right. On other platforms, numarray might fail to build correctly if one hadn't supplied the necessary information. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From Jack.Jansen at cwi.nl Fri Jan 7 06:19:20 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Fri Jan 7 06:19:20 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DE5C2D.7050400@ucsd.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> Message-ID: <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> On 7 Jan 2005, at 10:53, Robert Kern wrote: > Jack Jansen wrote: >> On 6 Jan 2005, at 23:10, Todd Miller wrote: >>> numarray-1.2 is relatively near at hand, sometime in the next 2-3 >>> weeks >>> I hope. For numarray-1.2 on the Mac, I think all you will need to >>> do >>> to get a vecLib build is: >>> >>> python setup.py install --use_lapack >> Is there a reason to require the "--use_lapack"? I.e. are there any >> adverse consequences to using it? > > On other platforms, one has to edit the setup scripts to add the > information about where the libraries are. The default fallback is to > use the unoptimized version packaged with numarray. > > The alternative would be to add autoconf-like capabilities to the > setup script such that it could determine if the libraries were in the > default places (and valid!), then fall back to the lite versions if > not. Ah, I see. So the problem is really that the library detection code hasn't been written. 
Hmm, having a look at the code, it seems that it should be fairly simple to fix (but I'm not completely sure I understand the interdependencies between setup.py, generate.py and addons.py, so I don't dare create a patch). If the whole lapack section of addons was restructured like if os.environ.has_key('LINALG_LIB'): set things up for using that path elif os.path.exists('/usr/local/lib/atlas'): use that elif os.path.exists('/System/Library/Frameworks/vecLib.framework'): use that else: use builtin blas_atlas I think it would have the same functionality as now but without need for the --use_lapack option. OTOH I may be oversimplifying things, I have no idea how these numerical libraries would normally be installed on Linux or other unixen, let alone on Windows. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From chrisperkins99 at gmail.com Fri Jan 7 06:20:09 2005 From: chrisperkins99 at gmail.com (Chris Perkins) Date: Fri Jan 7 06:20:09 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <16862.4265.62327.648134@monster.linux.in> References: <16862.4265.62327.648134@monster.linux.in> Message-ID: <184a9f5a0501070619c29386@mail.gmail.com> On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran wrote: > > I noticed that numarray does not support this interface. My feature > request is that numarray arrays also support this buffer interface (if > possible). > I second this request. Note that numarray arrays have a member called "_data" that does support the buffer interface. I have been using code like this for a while (pseudocode): def asBuffer(a): if a is a numarray array: return a._data elif a is a Numeric array: return a else: ... do something else But it would be nice if the numarray array supported the buffer interface directly. I have no idea how hard or easy this would be to do. 
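Chris's pseudocode can be fleshed out into a runnable duck-typed helper (a sketch resting on the assumption, stated in his message, that numarray exposes its memory through a private `_data` attribute; the stand-in class below is purely illustrative):

```python
def as_buffer(a):
    """Return an object supporting the buffer interface for `a`.

    numarray arrays keep their memory in a private `_data` attribute
    that does support the buffer interface; Numeric arrays support it
    directly, so they pass through unchanged.
    """
    data = getattr(a, '_data', None)
    if data is not None:
        return data    # numarray-style object
    return a           # Numeric-style (or anything already buffer-like)

# Stand-in mimicking a numarray array, for illustration only:
class FakeNumarray:
    _data = b'raw bytes'

print(as_buffer(FakeNumarray()))  # b'raw bytes'
print(as_buffer(b'plain'))        # b'plain'
```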
Chris Perkins From jmiller at stsci.edu Fri Jan 7 06:25:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 06:25:02 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DE5C2D.7050400@ucsd.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> Message-ID: <1105107831.14757.72.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 04:53, Robert Kern wrote: > Jack Jansen wrote: > > > > On 6 Jan 2005, at 23:10, Todd Miller wrote: > > > >> numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks > >> I hope. For numarray-1.2 on the Mac, I think all you will need to do > >> to get a vecLib build is: > >> > >> python setup.py install --use_lapack > > > > > > Is there a reason to require the "--use_lapack"? I.e. are there any > > adverse consequences to using it? > > On other platforms, one has to edit the setup scripts to add the > information about where the libraries are. The default fallback is to > use the unoptimized version packaged with numarray. > > The alternative would be to add autoconf-like capabilities to the setup > script such that it could determine if the libraries were in the default > places (and valid!), then fall back to the lite versions if not. > > On the Mac, --use_lapack should have no adverse consequences, if I'm > reading you right. On other platforms, numarray might fail to build > correctly if one hadn't supplied the necessary information. Since I'm not a Mac user, I'll beat a dead horse. Are we all agreed that: 1. vecLib is universally available on OS-X. 2. Using vecLib rather than blaslite is preferred. If so, I'll make it the default. 
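Putting Jack's earlier pseudocode together with the default being agreed here, the detection order might look something like the following (a hypothetical sketch; the real addons.py logic may differ, and `os.environ.get` stands in for the era's `has_key` idiom):

```python
import os
import sys

def pick_linalg():
    # An explicit user override always wins.
    if os.environ.get('LINALG_LIB'):
        return 'user-specified LAPACK/BLAS'
    # vecLib ships with every OS X install, so prefer it there.
    if sys.platform == 'darwin' and os.path.exists(
            '/System/Library/Frameworks/vecLib.framework'):
        return 'vecLib'
    # Common ATLAS locations on Linux and friends.
    for path in ('/usr/lib/atlas', '/usr/local/lib/atlas'):
        if os.path.exists(path):
            return 'ATLAS'
    # Fall back to the unoptimized bundled implementation.
    return 'bundled blas_lite'
```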
Regards, Todd From rkern at ucsd.edu Fri Jan 7 06:34:13 2005 From: rkern at ucsd.edu (Robert Kern) Date: Fri Jan 7 06:34:13 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <1105107831.14757.72.camel@halloween.stsci.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <1105107831.14757.72.camel@halloween.stsci.edu> Message-ID: <41DE9D97.2090707@ucsd.edu> Todd Miller wrote: > Since I'm not a Mac user, I'll beat a dead horse. Are we all agreed > that: > > 1. vecLib is universally available on OS-X. Yes. > 2. Using vecLib rather than blaslite is preferred. Yes. > If so, I'll make it the default. Woohoo! Thank you! -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From jmiller at stsci.edu Fri Jan 7 06:45:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 06:45:04 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <184a9f5a0501070619c29386@mail.gmail.com> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> Message-ID: <1105109054.14757.129.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 09:19, Chris Perkins wrote: > On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran > wrote: > > > > I noticed that numarray does not support this interface. My feature > > request is that numarray arrays also support this buffer interface (if > > possible). > > > > I second this request. > > Note that numarray arrays have a member called "_data" that does > support the buffer interface. I have been using code like this for a > while (pseudocode): > > def asBuffer(a): > if a is a numarray array: > return a._data > elif a is a Numeric array: > return a > else: ... do something else > > But it would be nice if the numarray array supported the buffer > interface directly. 
I have no idea how hard or easy this would be to > do. Without looking at code, my guess is that the C source level compatibility of numarray with Numeric will enable a "direct graft" of the buffer protocol code from Numeric to numarray. I think it will be easy... but... numarray has a concept of "misbehaved arrays", i.e. arrays in the binary format of another platform and therefore byte-swapped, or arrays spread across records and therefore possibly noncontiguous or misaligned. I think these buffers are likely unusable so providing access to them is a mistake. Misbehaved arrays probably don't arise in the work of most users, but they are a possibility that has to be accounted for. For cases of misbehaved arrays, I think raising a ValueError exception is necessary. How does that sound? Regards, Todd From jmiller at stsci.edu Fri Jan 7 07:10:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 07:10:22 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <41DEA2A5.2020100@ucsd.edu> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> <1105109054.14757.129.camel@halloween.stsci.edu> <41DEA2A5.2020100@ucsd.edu> Message-ID: <1105110589.15167.5.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 09:54, Robert Kern wrote: > Todd Miller wrote: > > > numarray has a concept of "misbehaved arrays", i.e. arrays in the > > binary format of another platform and therefore byte-swapped, or arrays > > spread across records and therefore possibly noncontiguous or > > misaligned. I think these buffers are likely unusable so providing > > access to them is a mistake. Misbehaved arrays probably don't arise in > > the work of most users, but they are a possibility that has to be > > accounted for. > > > > For cases of misbehaved arrays, I think raising a ValueError exception > > is necessary. How does that sound? 
> > For the byteswapped case, could I still get a buffer object around the > raw data by using _data? If so, I vote +1. Sure. Alternately, you could make a copy of the array which will automatically be well behaved and therefore usable in C. Regards, Todd From prabhu_r at users.sf.net Fri Jan 7 09:47:05 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Fri Jan 7 09:47:05 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <184a9f5a0501070619c29386@mail.gmail.com> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> Message-ID: <16862.51916.126290.46177@monster.linux.in> >>>>> "CP" == Chris Perkins writes: CP> On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran CP> wrote: >> >> I noticed that numarray does not support this interface. My >> feature request is that numarray arrays also support this >> buffer interface (if possible). CP> I second this request. CP> Note that numarray arrays have a member called "_data" that CP> does support the buffer interface. I have been using code Aha! Thanks for that hint. :) cheers, prabhu From prabhu_r at users.sf.net Fri Jan 7 09:55:10 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Fri Jan 7 09:55:10 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <1105109054.14757.129.camel@halloween.stsci.edu> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> <1105109054.14757.129.camel@halloween.stsci.edu> Message-ID: <16862.52414.137855.751814@monster.linux.in> >>>>> "TM" == Todd Miller writes: TM> On Fri, 2005-01-07 at 09:19, Chris Perkins wrote: >> On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran >> wrote: >> > >> > I noticed that numarray does not support this interface. My >> > feature request is that numarray arrays also support this >> > buffer interface (if possible). >> >> I second this request. 
[...] TM> Without looking at code, my guess is that the C source level TM> compatibility of numarray with Numeric will enable a "direct TM> graft" of the buffer protocol code from Numeric to numarray. TM> I think it will be easy... but... TM> numarray has a concept of "misbehaved arrays", i.e. arrays in TM> the binary format of another platform and therefore TM> byte-swapped, or arrays spread across records and therefore TM> possibly noncontiguous or misaligned. I think these buffers TM> are likely unusable so providing access to them is a mistake. TM> Misbehaved arrays probably don't arise in the work of most TM> users, but they are a possibility that has to be accounted TM> for. TM> For cases of misbehaved arrays, I think raising a ValueError TM> exception is necessary. How does that sound? I think that sounds reasonable. In my particular use case I always flatten (ravel) the array before using it as a buffer. I guess that in cases where the array is non-contiguous or misaligned a copy of the data is made on ravel so these would not be a problem for me. For misbehaved arrays, a ValueError with a decent error message would be perfect! Anyway, thanks for considering the feature request! 
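Todd's point about byte-swapped ("misbehaved") buffers can be illustrated without numarray at all. The sketch below uses only the stdlib struct module; the names value, big, wrong, and fixed are mine, not from the thread — it shows why handing raw byte-swapped data to native-order code fails, and why "make a copy" (re-packing in native order) fixes it.

```python
# Illustration (plain stdlib, not numarray): why a byte-swapped buffer
# is unusable by code that expects native byte order.
import struct

value = 1.5
big = struct.pack('>d', value)      # the double 1.5 in big-endian bytes

# Reading those bytes with the wrong byte order yields garbage:
(wrong,) = struct.unpack('<d', big)
assert wrong != value

# "Make a copy of the array" amounts to re-reading the data in the
# order it was actually written, then re-packing it natively:
(fixed,) = struct.unpack('>d', big)
native = struct.pack('=d', fixed)   # now safe for native-order C code
assert struct.unpack('=d', native) == (value,)
```

This is the sense in which a copy is "automatically well behaved": the copy is laid out in the byte order the consuming C code expects.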
cheers, prabhu From Fernando.Perez at colorado.edu Fri Jan 7 10:34:09 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Jan 7 10:34:09 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> Message-ID: <41DED610.8080409@colorado.edu> Jack Jansen wrote: > If the whole lapack section of addons was restructured like > if os.environ.has_key('LINALG_LIB'): > set things up for using that path > elif os.path.exists('/usr/local/lib/atlas') > use that > elif os.path.exists('/System/Library/Frameworks/vecLib.framework') > use that > else > use builtin blas_atlas If I may ask, it would be great if /usr/lib/(atlas/ATLAS) were added to these default search paths, like the scipy setup.py file does. In a mixed-architecture environment, where /usr/local is often NFS shared, one must put things like ATLAS in machine-specific locations. One simple solution is to put it directly in /usr/lib/atlas, instead of /usr/local/lib/atlas, since /usr/lib is rarely NFS-shared. This gives a way to share over NFS the bulk of things which are built from source, while leaving architecture-specific things in a location where they don't cause conflicts. Numpy and scipy already have this search path, so hopefully numarray can adopt the same convention as well. It's nice to be able to just unpack those and, without needing to set absolutely anything, simply say './setup.py bdist_rpm' and be done :) Cheers, f From rbastian at club-internet.fr Fri Jan 7 12:21:02 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Fri Jan 7 12:21:02 2005 Subject: [Numpy-discussion] ImportError Message-ID: <05010712542904.00761@rbastian> Hi, I use python2.4+numarray1.1.1. 
import numarray.convolve as conv produces an ImportError : No module named convolve Can you help me ? -- Ren? Bastian http://www.musiques-rb.org : Musique en Python From jmiller at stsci.edu Fri Jan 7 12:24:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 12:24:02 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DED610.8080409@colorado.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> Message-ID: <1105129384.15167.31.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 13:33, Fernando Perez wrote: > Jack Jansen wrote: > > > If the whole lapack section of addons was restructured like > > if os.environ.has_key('LINALG_LIB'): > > set things up for using that path > > elif os.path.exists('/usr/local/lib/atlas') > > use that > > elif os.path.exists('/System/Library/Frameworks/vecLib.framework') > > use that > > else > > use builtin blas_atlas > > If I may ask, it would be great if /usr/lib/(atlas/ATLAS) were added to these > default search paths, like the scipy setup.py file does. In a > mixed-architecture environment, where /usr/local is often NFS shared, one must > put things like ATLAS in machine-specific locations. One simple solution is > to put it directly in /usr/lib/atlas, instead of /usr/local/lib/atlas, since > /usr/lib is rarely NFS-shared. This gives a way to share over NFS the bulk of > things which are built from source, while leaving architecture-specific things > in a location where they don't cause conflicts. > > Numpy and scipy already have this search path, so hopefully numarray can adopt > the same convention as well. 
It's nice to be able to just unpack those and, > without needing to set absolutely anything, simply say './setup.py bdist_rpm' > and be done :) > > Cheers, > > f These sound like reasonable ideas but I want to mull it over some and I'm pretty much out of time this week... I'm supposed to be working on the scipy to numarray port. Both ideas look like they may be easy but I'm out of oomph and... they may not. Regards, Todd From jmiller at stsci.edu Fri Jan 7 12:27:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 12:27:04 2005 Subject: [Numpy-discussion] ImportError In-Reply-To: <05010712542904.00761@rbastian> References: <05010712542904.00761@rbastian> Message-ID: <1105129589.15167.35.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 06:54, Ren? Bastian wrote: > Hi, > > I use python2.4+numarray1.1.1. > > import numarray.convolve as conv > > produces an ImportError : No module named convolve > > Can you help me ? The above works fine for me. I'd suggest deleting your current numarray install and "build" tree and re-installing. What's your OS and processor? From Fernando.Perez at colorado.edu Fri Jan 7 12:39:01 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Jan 7 12:39:01 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <1105129384.15167.31.camel@halloween.stsci.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> <1105129384.15167.31.camel@halloween.stsci.edu> Message-ID: <41DEF32D.8040301@colorado.edu> Todd Miller wrote: [path ideas] > These sound like reasonable ideas but I want to mull it over some and > I'm pretty much out of time this week... I'm supposed to be working on > the scipy to numarray port. Both ideas look like they may be easy but > I'm out of oomph and... they may not. No worries. 
Right now I'm enjoying setting up a yum-based system for handling Atlas/Numeric/scipy on a group of machines with architecture-specific ATLASes (P3, P4, P4+hyperthreading,...). So I sort of have these build issues very much in front of me, but there's no hurry in incorporating them into numarray. The scipy work is definitely a priority. But thanks for considering the input. Cheers, f From rbastian at club-internet.fr Fri Jan 7 12:53:06 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Fri Jan 7 12:53:06 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <05010712542904.00761@rbastian> References: <05010712542904.00761@rbastian> Message-ID: <05010713271405.00761@rbastian> Le Vendredi 7 Janvier 2005 12:54, René Bastian a écrit : > Hi, > > I use python2.4+numarray1.1.1. > > import numarray.convolve as conv > > produces an ImportError : No module named convolve > > Can you help me ? This is the error message : Traceback (most recent call last): File "afolia01.py", line 11, in ? from filtres import * File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? #-------------------------- ImportError: No module named convolve Line 5 is a commented line! -- René Bastian http://www.musiques-rb.org : Musique en Python From jmiller at stsci.edu Fri Jan 7 13:06:18 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 13:06:18 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <05010713271405.00761@rbastian> References: <05010712542904.00761@rbastian> <05010713271405.00761@rbastian> Message-ID: <1105131949.15167.40.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 07:27, René Bastian wrote: > Le Vendredi 7 Janvier 2005 12:54, René Bastian a écrit : > > Hi, > > > > I use python2.4+numarray1.1.1. > > > > import numarray.convolve as conv > > > > produces an ImportError : No module named convolve > > > > Can you help me ?
> > This is the error message : > > Traceback (most recent call last): > File "afolia01.py", line 11, in ? > from filtres import * > File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? > #-------------------------- > ImportError: No module named convolve > > Line 5 is a commented line! Show us maybe the first 10 lines of filtres.py. From rbastian at club-internet.fr Fri Jan 7 13:25:06 2005 From: rbastian at club-internet.fr (=?utf-8?q?Ren=C3=A9=20Bastian?=) Date: Fri Jan 7 13:25:06 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <1105131949.15167.40.camel@halloween.stsci.edu> References: <05010712542904.00761@rbastian> <05010713271405.00761@rbastian> <1105131949.15167.40.camel@halloween.stsci.edu> Message-ID: <05010713591108.00761@rbastian> Le Vendredi 7 Janvier 2005 22:05, Todd Miller a écrit : > On Fri, 2005-01-07 at 07:27, René Bastian wrote: > > Le Vendredi 7 Janvier 2005 12:54, René Bastian a écrit : > > > Hi, > > > > > > I use python2.4+numarray1.1.1. > > > > > > import numarray.convolve as conv > > > > > > produces an ImportError : No module named convolve > > > > > > Can you help me ? > > > > This is the error message : > > > > Traceback (most recent call last): > > File "afolia01.py", line 11, in ? > > from filtres import * > > File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? > > #-------------------------- > > ImportError: No module named convolve > > > > Line 5 is a commented line! > > Show us maybe the first 10 lines of filtres.py. These are the first 14 lines of filtres.py (the tags are HTML-like)
#-*-python-*-
#-*-coding:latin-1-*-
import numarray as NA
import numarray.convolve as conv
#--------------------------

from utiles import sr, midi2freq, aff
from math import sqrt
import Canal

"""
une collection de filtres
"""
#----------------------------------------------------------
filtres.py alone works ... but if it is imported, it doesn't. Here are the first 23 lines of an application (which imports filtres) where it doesn't work:
#-*-python-*-
#-*-coding:Latin 1-*-
"""
suite d'evenements
 1 son seul ; tr\`es fort
 2 sons superpos\'e ; de moins en moins fort
 ....
 n sons superposes
 séparés par des écarts aléatoires

v0.colognestereo.raw est la version sans valeur 'amplitude' ni 'enveloppe'
"""
import random
import numarray.random_array as RA
#import Convolve
import numarray.convolve as Convolve
from numarray import *
from profils import *
from filtres import *
import utiles as U
from ondesNumarray import *
from oscillateurs import *
-- Ren? Bastian http://www.musiques-rb.org : Musique en Python From pearu at cens.ioc.ee Fri Jan 7 13:26:02 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Fri Jan 7 13:26:02 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> Message-ID: On Fri, 7 Jan 2005, Jack Jansen wrote: > > The alternative would be to add autoconf-like capabilities to the > > setup script such that it could determine if the libraries were in the > > default places (and valid!), then fall back to the lite versions if > > not. > > Ah, I see. So the problem is really that the library detection code > hasn't been written. FYI, scipy_distutils has rather general lapack/blas/atlas detection code facilities. It first looks for atlas/lapack libraries, then for blas/lapack libraries, and finally for blas/lapack Fortran sources that scipy_distutils would compile behind the scenes. See scipy_distutils/system_info.py and scipy/Lib/lib/lapack/setup_lapack.py for more information. Pearu From bsder at mail.allcaps.org Fri Jan 7 15:11:03 2005 From: bsder at mail.allcaps.org (Andrew P. Lentvorski, Jr.) 
Date: Fri Jan 7 15:11:03 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DED610.8080409@colorado.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> Message-ID: <435F784E-6101-11D9-A3AA-000A95C874EE@mail.allcaps.org> On Jan 7, 2005, at 10:33 AM, Fernando Perez wrote: > Jack Jansen wrote: > >> If the whole lapack section of addons was restructured like >> if os.environ.has_key('LINALG_LIB'): >> set things up for using that path >> elif os.path.exists('/usr/local/lib/atlas') >> use that >> elif os.path.exists('/System/Library/Frameworks/vecLib.framework') >> use that >> else >> use builtin blas_atlas > > If I may ask, it would be great if /usr/lib/(atlas/ATLAS) were added > to these default search paths, like the scipy setup.py file does. I would much rather that previous snippet of code look something like: if sys.scipypath: # Or some other flag/global/something use whatever is indicated else: if os.environ.has_key('LINALG_LIB'): set things up for using that path elif os.path.exists('/usr/local/lib/atlas') use that elif os.path.exists('/System/Library/Frameworks/vecLib.framework') use that else use builtin blas_atlas In addition, some of us do not trust anything in /usr for production work. This is to help make our system administrators lives easier. If I only use things from, say /tools, the sysadmins can completely erase and reload workstations for the purposes of bug fixes, security updates, etc. without disturbing my work. This prevents, "GAAAHHH! You upgraded the machine and now everything is using Foo_1.1.1 instead of Foo_1.1.0 and now everything is broken." Most Linux distributions are particularly bad about this. This affliction is also known as "Perl Hell". 
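Jack's pseudocode, quoted above, could be sketched as a runnable cascade like this. The LINALG_LIB variable and the candidate paths come from the thread itself; the function name pick_linalg and the None-means-builtin convention are my own inventions for illustration, not actual numarray setup.py code.

```python
import os

def pick_linalg(search_paths=('/usr/local/lib/atlas',
                              '/usr/lib/atlas',
                              '/System/Library/Frameworks/vecLib.framework')):
    """Return the first usable LAPACK/ATLAS location, or None to mean
    'fall back to the builtin lite blas/lapack'."""
    override = os.environ.get('LINALG_LIB')
    if override:
        return override              # explicit user override wins
    for path in search_paths:        # then the well-known locations
        if os.path.exists(path):
            return path
    return None                      # else: use builtin blas_lite

choice = pick_linalg()
```

A site-specific prefix like Andrew's /tools could then be supported by simply prepending it to search_paths, and Fernando's /usr/lib/atlas case is just one more entry in the default list.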
;) -a From stephen.walton at csun.edu Fri Jan 7 16:15:01 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Jan 7 16:15:01 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DEF32D.8040301@colorado.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> <1105129384.15167.31.camel@halloween.stsci.edu> <41DEF32D.8040301@colorado.edu> Message-ID: <41DF25B2.3080901@csun.edu> Fernando Perez wrote: > No worries. Right now I'm enjoying setting up a yum-based system for > handling Atlas/Numeric/scipy on a group of machines with > architecture-specific ATLASes (P3, P4, P4+hyperthreading,...). Ooh, ooh, I want a copy when you're done! Is 'enjoying' the right verb there? From Fernando.Perez at colorado.edu Fri Jan 7 16:17:04 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Jan 7 16:17:04 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DF25B2.3080901@csun.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> <1105129384.15167.31.camel@halloween.stsci.edu> <41DEF32D.8040301@colorado.edu> <41DF25B2.3080901@csun.edu> Message-ID: <41DF2647.3000509@colorado.edu> Stephen Walton wrote: > Fernando Perez wrote: > > >>No worries. Right now I'm enjoying setting up a yum-based system for >>handling Atlas/Numeric/scipy on a group of machines with >>architecture-specific ATLASes (P3, P4, P4+hyperthreading,...). > > > Ooh, ooh, I want a copy when you're done! I actually had you in mind this morning, b/c I remember you've asked about this before. My solution is messy, but I think it's going to work OK. I'll probably post a little writeup about it later. 
It may be useful to people. > Is 'enjoying' the right verb there? As they say in mountaineering, "it doesn't have to be fun to be fun" :) Cheers, f From rbastian at club-internet.fr Sat Jan 8 00:57:01 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Sat Jan 8 00:57:01 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <05010713271405.00761@rbastian> References: <05010712542904.00761@rbastian> <05010713271405.00761@rbastian> Message-ID: <05010801300900.00763@rbastian> There was an error in the pythoneon.pth file in the site-packages directory (circular references?). I rewrote this file and now all works. Thanks and apologies ... rbastian Le Vendredi 7 Janvier 2005 13:27, René Bastian a écrit : > Le Vendredi 7 Janvier 2005 12:54, René Bastian a écrit : > > Hi, > > > > I use python2.4+numarray1.1.1. > > > > import numarray.convolve as conv > > > > produces an ImportError : No module named convolve > > > > Can you help me ? > > This is the error message : > > Traceback (most recent call last): > File "afolia01.py", line 11, in ? > from filtres import * > File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? > #-------------------------- > ImportError: No module named convolve > > Line 5 is a commented line! From Chris.Barker at noaa.gov Mon Jan 10 11:07:11 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Mon Jan 10 11:07:11 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: References: Message-ID: <41E2D110.3080905@noaa.gov> Pearu Peterson wrote: > FYI, scipy_distutils has rather general lapack/blas/atlas detection code > facilities. It first looks for atlas/lapack libraries, then for > blas/lapack libraries This makes me happy, as Gentoo Linux puts atlas in: /usr/lib/blas/atlas /usr/lib/lapack/atlas However, I'm all for any system that "just works" on the most common systems, and allows me to specify my weirdo system if need be.
Being an OS-X user, I'd be very happy if it "just works" there. I expect to tweak things when I'm running Gentoo. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jochen at fhi-berlin.mpg.de Wed Jan 12 09:19:12 2005 From: jochen at fhi-berlin.mpg.de (=?iso-8859-1?Q?Jochen_K=FCpper?=) Date: Wed Jan 12 09:19:12 2005 Subject: [Numpy-discussion] numarray dotblas Message-ID: <9ellay5zxo.fsf@gowron.rz-berlin.mpg.de> I needed the following patch to build _dotblas on a fresh system: ,---- | Index: Src/_dotblas.c | =================================================================== | RCS file: /cvsroot/numpy/numarray/Src/_dotblas.c,v | retrieving revision 1.2 | diff -u -u -r1.2 _dotblas.c | --- Src/_dotblas.c 5 Jan 2005 19:57:08 -0000 1.2 | +++ Src/_dotblas.c 12 Jan 2005 17:16:12 -0000 | @@ -10,8 +10,8 @@ | | | #include "Python.h" | -#include "numarray/libnumarray.h" | -#include "numarray/arrayobject.h" | +#include "libnumarray.h" | +#include "arrayobject.h" | #include | | #include `---- Alternatively -IInclude must be added to the compile flags (setup.py: headers()). Greetings, Jochen -- Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de Libert?, ?galit?, Fraternit? GnuPG key: CC1B0B4D (Part 3 you find in my messages before fall 2003.) From jmiller at stsci.edu Wed Jan 12 10:43:19 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 12 10:43:19 2005 Subject: [Numpy-discussion] numarray dotblas In-Reply-To: <9ellay5zxo.fsf@gowron.rz-berlin.mpg.de> References: <9ellay5zxo.fsf@gowron.rz-berlin.mpg.de> Message-ID: <1105555358.24423.15.camel@halloween.stsci.edu> Thanks Jochen. It's committed now. 
Cheers, Todd On Wed, 2005-01-12 at 12:18, Jochen K?pper wrote: > I needed the following patch to build _dotblas on a fresh system: > ,---- > | Index: Src/_dotblas.c > | =================================================================== > | RCS file: /cvsroot/numpy/numarray/Src/_dotblas.c,v > | retrieving revision 1.2 > | diff -u -u -r1.2 _dotblas.c > | --- Src/_dotblas.c 5 Jan 2005 19:57:08 -0000 1.2 > | +++ Src/_dotblas.c 12 Jan 2005 17:16:12 -0000 > | @@ -10,8 +10,8 @@ > | > | > | #include "Python.h" > | -#include "numarray/libnumarray.h" > | -#include "numarray/arrayobject.h" > | +#include "libnumarray.h" > | +#include "arrayobject.h" > | #include > | > | #include > `---- > Alternatively -IInclude must be added to the compile flags (setup.py: headers()). > > Greetings, > Jochen -- From klimek at grc.nasa.gov Wed Jan 12 13:10:25 2005 From: klimek at grc.nasa.gov (Bob Klimek) Date: Wed Jan 12 13:10:25 2005 Subject: [Numpy-discussion] position of objects? Message-ID: <41E592EB.6090209@grc.nasa.gov> Hi, Is there a way to obtain the positions (coordinates) of objects that were found with find_objects() function (in nd_image)? Specifically what I'm looking for is the coordinates of the bounding box (for a 2d array it would be upper-left and lower-right). Regards, Bob From jmiller at stsci.edu Wed Jan 12 16:36:17 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 12 16:36:17 2005 Subject: [Numpy-discussion] Simplifying array() Message-ID: <1105576535.24423.213.camel@halloween.stsci.edu> Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array(). 
The patch is here if you want to look at it yourself: http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse One item I thought needed some discussion was the removal of two features: > * array() does too much. E.g., handling file/memory instances for > 'sequence'. There's fromfile for the former, and users needing > the latter functionality should be clued up enough to > instantiate NumArray directly. I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them? I think strings.py and records.py also have "over-stuffed" array() functions... so consistency bids us to streamline those as well. Regards, Todd From rkern at ucsd.edu Wed Jan 12 17:03:31 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Jan 12 17:03:31 2005 Subject: [Numpy-discussion] Matrix-SIG archives Message-ID: <41E5C8A8.5020303@ucsd.edu> It looks like the mailing list archives for the Matrix-SIG and other retired SIGs are down at the moment. I've alerted the python.org webmaster, but in the meantime, does anyone have the early archives sitting around somewhere? I'm trying to answer a question about the motivations of a particular design decision in Numeric (why dot(A,B) doesn't do conjugation on A when A is complex). Thanks in advance. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From florian.proff.schulze at gmx.net Thu Jan 13 01:32:00 2005 From: florian.proff.schulze at gmx.net (Florian Schulze) Date: Thu Jan 13 01:32:00 2005 Subject: [Numpy-discussion] Re: Simplifying array() References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: On 12 Jan 2005 19:35:36 -0500, Todd Miller wrote: > One item I thought needed some discussion was the removal of two > features: > >> * array() does too much. 
E.g., handling file/memory instances for >> 'sequence'. There's fromfile for the former, and users needing >> the latter functionality should be clued up enough to >> instantiate NumArray directly. > > I agree with this myself. Does anyone care if they will no longer be > able to construct an array from a file or buffer object using array() > rather than fromfile() or NumArray(), respectively? Is a deprecation > process necessary to remove them? IMHO, array should just delegate to other functions based on the arguments, then it can remain backward compatible. I use the from buffer functionality quite often and it would be nice if there would at least be a new function frombuffer or frommemory. Regards, Florian Schulze From faltet at carabos.com Thu Jan 13 01:35:14 2005 From: faltet at carabos.com (Francesc Altet) Date: Thu Jan 13 01:35:14 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105576535.24423.213.camel@halloween.stsci.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: <200501131034.10856.faltet@carabos.com> A Dijous 13 Gener 2005 01:35, Todd Miller va escriure: > > * array() does too much. E.g., handling file/memory instances for > > 'sequence'. There's fromfile for the former, and users needing > > the latter functionality should be clued up enough to > > instantiate NumArray directly. > > I agree with this myself. Does anyone care if they will no longer be > able to construct an array from a file or buffer object using array() > rather than fromfile() or NumArray(), respectively? Is a deprecation > process necessary to remove them? For me is fine. I always call the array() factory function in order to get a buffer object, so no problem. > I think strings.py and records.py also have "over-stuffed" array() > functions... so consistency bids us to streamline those as well. I agree. Cheers, -- >OO< ? Francesc Altet || http://www.carabos.com/ V ?V ? Carabos Coop. V. || Who is your data daddy? 
PyTables "" From konrad.hinsen at laposte.net Thu Jan 13 05:33:16 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Jan 13 05:33:16 2005 Subject: [Numpy-discussion] Matrix-SIG archives In-Reply-To: <41E5C8A8.5020303@ucsd.edu> References: <41E5C8A8.5020303@ucsd.edu> Message-ID: On Jan 13, 2005, at 2:02, Robert Kern wrote: > does anyone have the early archives sitting around somewhere? I'm > trying to answer a question about the motivations of a particular > design decision in Numeric (why dot(A,B) doesn't do conjugation on A > when A is complex). I don't have the archives either, but I can answer that one from memory. The fundamental decision was to separate the concepts of "array" (structured collection of data items of identical type) and "vector", "matrix" or "tensor" (mathematical objects with specific properties that are numerically represented by arrays). Arrays are just that, their operations are defined in terms of operations on their element. Numeric.dot() does multiplication followed by summing on the last dimension of the first argument and the first dimension of the second, no matter what type the elements have. Konrad. 
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr --------------------------------------------------------------------- From konrad.hinsen at laposte.net Thu Jan 13 05:41:14 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Jan 13 05:41:14 2005 Subject: [Numpy-discussion] ScientificPython with numarray support Message-ID: A development release of Scientific Python that supports numarray as an alternative to NumPy (choice made at installation time) is now available at http://starship.python.net/~hinsen/ScientificPython/ or http://dirac.cnrs-orleans.fr/ScientificPython/ (Search for "2.51".) Note that some modules do not work under numarray because they rely on a NumPy feature that is not currently implemented in numarray. They are listed in the README file. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Thu Jan 13 07:27:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jan 13 07:27:16 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105576535.24423.213.camel@halloween.stsci.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: <41E69311.3030308@sympatico.ca> Todd Miller wrote: >Someone (way to go Rory!) recently posted a patch (woohoo!) for >numarray which I think bears a little discussion since it involves >the re-write of a fundamental numarray function: array().
> >The patch is here if you want to look at it yourself: > >http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse > >One item I thought needed some discussion was the removal of two >features: > > > >> * array() does too much. E.g., handling file/memory instances for >> 'sequence'. There's fromfile for the former, and users needing >> the latter functionality should be clued up enough to >> instantiate NumArray directly. >> >> > >I agree with this myself. Does anyone care if they will no longer be >able to construct an array from a file or buffer object using array() >rather than fromfile() or NumArray(), respectively? Is a deprecation >process necessary to remove them? > > I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help. Currently, neither the word "class" or "NumArray" appear in the doc index. Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them. It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances? Is the function asarray redundant? I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies. >I think strings.py and records.py also have "over-stuffed" array() >functions... so consistency bids us to streamline those as well. > >Regards, >Todd > > > Thanks to Rory for initiating this. Colin W. 
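Colin's question about acceptable sequence content comes down to a "no ragged edges" rule. A minimal shape-checker in plain Python makes the rule concrete; the helper shape_of is hypothetical, for illustration only, not a numarray function:

```python
def shape_of(seq):
    """Return the shape of a nested sequence, raising ValueError if
    sublists have inconsistent lengths (ragged edges)."""
    if not isinstance(seq, (list, tuple)):
        return ()                    # treat anything else as a scalar element
    shapes = [shape_of(item) for item in seq]
    if len(set(shapes)) > 1:
        raise ValueError("ragged nested sequence")
    return (len(seq),) + (shapes[0] if shapes else ())

assert shape_of([[1, 2], [3, 4], [5, 6]]) == (3, 2)   # acceptable: 3x2
```

By this criterion a sequence mixing numbers and array-like elements is fine exactly when every element contributes the same sub-shape, and rejected otherwise.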
From jmiller at stsci.edu Thu Jan 13 08:17:20 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 13 08:17:20 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E69311.3030308@sympatico.ca> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> Message-ID: <1105632994.3169.32.camel@jaytmiller.comcast.net> On Thu, 2005-01-13 at 10:26 -0500, Colin J. Williams wrote: > Todd Miller wrote: > > >Someone (way to go Rory!) recently posted a patch (woohoo!) for > >numarray which I think bears a little discussion since it involves > >the re-write of a fundamental numarray function: array(). > >The patch fixes a number of bugs and deconvolutes the logic of array(). > > > >The patch is here if you want to look at it yourself: > > > >http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse > > > >One item I thought needed some discussion was the removal of two > >features: > > > > > > > >> * array() does too much. E.g., handling file/memory instances for > >> 'sequence'. There's fromfile for the former, and users needing > >> the latter functionality should be clued up enough to > >> instantiate NumArray directly. > >> > >> > > > >I agree with this myself. Does anyone care if they will no longer be > >able to construct an array from a file or buffer object using array() > >rather than fromfile() or NumArray(), respectively? Is a deprecation > >process necessary to remove them? > > > > > I would suggest deprecation on the way to removal. For the newcomer, > who is not yet "clued up" > some advice on the instantiation of NumArray would help. That's fair. The docstring for NumArray needs beefing up along the same lines as Rory's work on array(). I initially liked Florian's idea of frombuffer() but since I can't think of how it's not identical to NumArray(), I'm not sure there's any point. > Rory leaves in type and typecode. It would be good to eliminate this > apparent overlap. 
Why not > deprecate and then drop type? Some people like type. I don't want to touch this. > It would be good to clarify the acceptable content of a sequence. A > list, perhaps with sublists, of > numbers is clear enough but what about a sequence of NumArray instances > or even a sequence > of numbers, mixed with NumArray instances? The patch has a new docstring which spells out the array() construction algorithm. Lists of arrays would be seen as "numerical sequences". > Is the function asarray redundant? Yes, but it's clear and also needed for backward compatibility with Numeric. Besides, it's not just redundant, it's an idiom... > I suggest that the copy parameter be of the BoolType. This probably has > no practical impact but > it is consistent with current Python usage and makes it clear that this > is a Yes/No parameter, > rather than specifying a number of copies. Fair enough. Backward compatibility dictates not *requiring* a bool, but using it as a default is fine. From tim.hochberg at cox.net Thu Jan 13 09:01:15 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Jan 13 09:01:15 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E69311.3030308@sympatico.ca> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> Message-ID: <41E6A91A.7080209@cox.net> Colin J. Williams wrote: > Todd Miller wrote: > >> Someone (way to go Rory!) recently posted a patch (woohoo!) for >> numarray which I think bears a little discussion since it involves >> the re-write of a fundamental numarray function: array(). >> The patch fixes a number of bugs and deconvolutes the logic of array(). >> >> The patch is here if you want to look at it yourself: >> >> http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse >> >> One item I thought needed some discussion was the removal of two >> features: >> >> >> >>> * array() does too much. E.g., handling file/memory instances for >>> 'sequence'. 
There's fromfile for the former, and users needing >>> the latter functionality should be clued up enough to >>> instantiate NumArray directly. >>> >> >> >> I agree with this myself. Does anyone care if they will no longer be >> able to construct an array from a file or buffer object using array() >> rather than fromfile() or NumArray(), respectively? Is a deprecation >> process necessary to remove them? >> This isn't going to cause me pain, FWIW. > I would suggest deprecation on the way to removal. For the newcomer, > who is not yet "clued up" > some advice on the instantiation of NumArray would help. Currently, > neither the word "class" nor > "NumArray" appears in the doc index. > > Rory leaves in type and typecode. It would be good to eliminate this > apparent overlap. Why not > deprecate and then drop type? As a compromise, either could be > accepted as a NumArray.__init__ > argument, since it is easy to distinguish between them. I thought typecode was eventually going away, not type. Either way, it makes sense to drop one of them eventually. This should definitely go through a period of deprecation, though: it will certainly require that I fix a bunch of my code. > It would be good to clarify the acceptable content of a sequence. A > list, perhaps with sublists, of > numbers is clear enough but what about a sequence of NumArray > instances or even a sequence > of numbers, mixed with NumArray instances? Isn't any sequence that is composed of numbers or subsequences acceptable, as long as it has a consistent shape (no ragged edges)? > > Is the function asarray redundant? No, the copy=False parameter is redundant ;) Well, as a pair they are redundant, but if I were going to get rid of something, I'd get rid of copy, because it's lying: copy=False sometimes copies (when the sequence is not an array) and sometimes does not (when the sequence is an array). A better name would be alwaysCopy, but better still would be to just get rid of it altogether and rely on asarray.
(asarray may be implemented using the copy parameter now, but that would be easy to fix.) While we're at it, savespace should get nuked too (all with appropriate deprecations, I suppose), so the final signature of array would be: array(sequence=None, type=None, shape=None) Hmm. That's still too complicated. It really should be array(sequence, type=None) I believe that other uses can be more clearly accomplished using zeros and reshape. Of course that has drastic backward compatibility issues and even with generous usage of deprecations might not help the transition much. Still, that's probably what I'd shoot for if it were an option. > > I suggest that the copy parameter be of the BoolType. This probably > has no practical impact but > it is consistent with current Python usage and makes it clear that > this is a Yes/No parameter, > rather than specifying a number of copies. > >> I think strings.py and records.py also have "over-stuffed" array() >> functions... so consistency bids us to streamline those as well. >> Regards, >> Todd >> >> >> > Thanks to Rory for initiating this. Agreed. -tim From perry at stsci.edu Thu Jan 13 09:54:12 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jan 13 09:54:12 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E6A91A.7080209@cox.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> Message-ID: <1DA08A62-658C-11D9-B8A8-000A95B68E50@stsci.edu> On Jan 13, 2005, at 12:00 PM, Tim Hochberg wrote: > Colin J. Williams wrote: > >> Rory leaves in type and typecode. It would be good to eliminate this >> apparent overlap. Why not >> deprecate and then drop type? As a compromise, either could be >> accepted as a NumArray.__init__ >> argument, since it is easy to distinguish between them. > > I thought typecode was eventually going away, not type. Either way, it > makes sense to drop one of them > eventually.
This should definitely go through a period of deprecation, > though: it will certainly require that I > fix a bunch of my code. Tim is right about this. The rationale was that typecode is inaccurate since types are no longer represented by letter codes (one can still use them for backward compatibility). From juenglin at cs.pdx.edu Thu Jan 13 10:26:15 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Thu Jan 13 10:26:15 2005 Subject: [Numpy-discussion] iterating over an array Message-ID: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> Hi, I have an application where I cannot avoid (afaict) looping over one array dimension. So I thought it might help to speed up the code by setting up the data in a way such that the dimension to loop over is the first dimension. This allows me to write for data in a: do sth with data instead of for i in range(len(a)): data = a[i] do sth with data and would save the indexing operation. To my surprise it didn't make a difference in terms of speed. A little timing experiment suggests that the first version is actually slightly slower than the second: >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))' >>> Timer('for row in a: pass', setup).timeit(number=1000) 13.495718955993652 >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 12.162748098373413 I noticed that the array object does not have a special method __iter__, so apparently, no attempts have been made so far to make array iteration fast. Do you think it's possible to speed things up by implementing an __iter__ method? This is high on my wish list and I would help with implementing it, appreciating any advice.
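Ralf's question above can be prototyped in pure Python. The class below is a made-up stand-in (Rows is not numarray's NumArray); it shows the __iter__ idea: yield rows by walking the flat buffer once, rather than performing an index operation on every step:

```python
class Rows:
    """Toy 2-D container over row-major flat storage (illustrative only)."""

    def __init__(self, flat, ncols):
        self.flat = list(flat)
        self.ncols = ncols

    def __len__(self):
        return len(self.flat) // self.ncols

    def __getitem__(self, i):
        # Indexed path: one slice per call, analogous to building a
        # fresh row array on every a[i].
        if not 0 <= i < len(self):
            raise IndexError(i)
        n = self.ncols
        return self.flat[i * n:(i + 1) * n]

    def __iter__(self):
        # Explicit iterator: walks the flat buffer directly instead of
        # re-entering __getitem__ for every index.
        n = self.ncols
        for start in range(0, len(self.flat), n):
            yield self.flat[start:start + n]

a = Rows(range(2000), 2)
assert list(a)[:2] == [[0, 1], [2, 3]]
assert [a[i] for i in range(len(a))] == list(a)
```

In pure Python the two paths come out close, which matches Ralf's measurement; the question for numarray is whether a dedicated iterator can avoid per-step overhead that the generic getitem path pays.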
Thanks, Ralf From juenglin at cs.pdx.edu Thu Jan 13 10:31:14 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Thu Jan 13 10:31:14 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E6A91A.7080209@cox.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> Message-ID: <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> On Thu, 2005-01-13 at 09:00, Tim Hochberg wrote: > > It would be good to clarify the acceptable content of a sequence. > > A list, perhaps with sublists, of numbers is clear enough but what > > about a sequence of NumArray instances or even a sequence of > > numbers, mixed with NumArray instances? > > Isn't any sequence that is composed of numbers or subsequences > acceptable, as long as it has a consistent shape (no ragged edges)? Why not make it a little more general and accept iterable objects? From http://docs.python.org/lib/module-array.html : array( typecode[, initializer]) Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, string, or iterable over elements of the appropriate type. Changed in version 2.4: Formerly, only lists or strings were accepted. If given a list or string, the initializer is passed to the new array's fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method. Ralf From verveer at embl.de Thu Jan 13 12:47:09 2005 From: verveer at embl.de (Peter Verveer) Date: Thu Jan 13 12:47:09 2005 Subject: [Numpy-discussion] position of objects? In-Reply-To: <41E592EB.6090209@grc.nasa.gov> References: <41E592EB.6090209@grc.nasa.gov> Message-ID: <39D8BD7A-65A4-11D9-B5CA-000D932805AC@embl.de> > Is there a way to obtain the positions (coordinates) of objects that > were found with the find_objects() function (in nd_image)?
Specifically > what I'm looking for is the coordinates of the bounding box (for a 2d > array it would be upper-left and lower-right). The find_objects() function returns, for each object, the slices that define its bounding box. The slices are a tuple of slice objects, one slice object for each axis. The start and stop attributes of the slice objects can be used to find the exact position and size along each axis. Cheers, Peter From jmiller at stsci.edu Thu Jan 13 13:07:12 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 13 13:07:12 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> Message-ID: <1105650377.3325.26.camel@jaytmiller.comcast.net> On Thu, 2005-01-13 at 10:24 -0800, Ralf Juengling wrote: > Hi, > > I have an application where I cannot avoid (afaict) > looping over one array dimension. So I thought it > might help speeding up the code by setting up the > data in a way so that the dimension to loop over is > the first dimension. This allows to write > > for data in a: > do sth with data > > instead of > > for i in range(len(a)): > data = a[i] > do sth with data > > and would save the indexing operation. To my surprise > it didn't make a difference in terms of speed. A > little timing experiment suggests that the first > version is actually slightly slower than the second: > > >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))' > > >>> Timer('for row in a: pass', setup).timeit(number=1000) > 13.495718955993652 > > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 12.162748098373413 > > > I noticed that the array object does not have a special > method __iter__, so apparently, no attempts have been > made so far to make array iteration fast. Do you think > it's possible to speed things up by implementing an > __iter__ method? I'm skeptical.
My impression is that the fallback for the iteration system is to use the object's len() to determine the count and its getitem() to fetch the iteration elements, all in C without intermediate indexing objects. If numarray is to be sped up, I think the key is to speed up the indexing code and/or object creation code in numarray's _ndarraymodule.c and _numarraymodule.c. I'd be happy to be proved wrong but that's my 2 cents. Regards, Todd From Chris.Barker at noaa.gov Thu Jan 13 15:03:02 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jan 13 15:03:02 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> Message-ID: <41E6FCC7.4010309@noaa.gov> Ralf Juengling wrote: > for data in a: > do sth with data > > instead of > > for i in range(len(a)): > data = a[i] > do sth with data > Do you think > it's possible to speed things up by implementing an > __iter__ method? Frankly, I seriously doubt it would make much difference; the indexing operation would have to take a comparable period of time to your: do sth with data That is unlikely. By the way, here is a test with Python lists: setup = 'import numarray as na; a = [[i*2,i*2+1] for i in range(1000)]' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.37136483192443848 Much faster than the numarray examples (~30 on my machine). I suspect the real delay here is that each indexing operation has to create a new array (even if they do use the same data). Lists just return the item. Also, it's been discussed that numarray's generic indexing is much slower than Numeric's. This has made a huge difference when passing arrays into wxPython, for instance. Perhaps that's relevant? Here's a test with Numeric vs.
numarray: >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 1.97064208984375 >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 27.220904111862183 yup! that's it. numarray's indexing is SLOW. So it's not an iterator issue. Look in the archives of this list for discussion of why numarray's generic indexing is slow. A search for "wxPython indexing" will probably turn it up. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cjw at sympatico.ca Thu Jan 13 15:09:13 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jan 13 15:09:13 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> Message-ID: <41E6FF60.4090803@sympatico.ca> Ralf Juengling wrote: >On Thu, 2005-01-13 at 09:00, Tim Hochberg wrote: > > >>>It would be good to clarify the acceptable content of a sequence. >>>A list, perhaps with sublists, of numbers is clear enough but what >>>about a sequence of NumArray instances or even a sequence of >>>numbers, mixed with NumArray instances? >>> >>> >>Isn't any sequence that is composed of numbers or subsequences >>acceptable, as long as it has a consistent shape (no ragged edges)? >> >> > >Why not make it a little more general and accept iterable objects?
> >From http://docs.python.org/lib/module-array.html : > > >array( >typecode[, initializer]) > Return a new array whose items are restricted by typecode, and > initialized from the optional initializer value, which must be a > list, string, or iterable over elements of the appropriate type. > Changed in version 2.4: Formerly, only lists or strings were > accepted. If given a list or string, the initializer is passed > to the new array's fromlist(), fromstring(), or fromunicode() > method (see below) to add initial items to the array. Otherwise, > the iterable initializer is passed to the extend() method. > > > >Ralf > > > Yes, I'm not sure whether list comprehension produces an iterator object, but this should also be included. Similarly, instances of subclasses of NumArray should be explicitly included. I like the term "no ragged edges". Colin W. From cjw at sympatico.ca Thu Jan 13 16:45:04 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jan 13 16:45:04 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105632994.3169.32.camel@jaytmiller.comcast.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <1105632994.3169.32.camel@jaytmiller.comcast.net> Message-ID: <41E715F2.70205@sympatico.ca> Todd Miller wrote: >On Thu, 2005-01-13 at 10:26 -0500, Colin J. Williams wrote: > > >>Todd Miller wrote: >> >> >> >>>Someone (way to go Rory!) recently posted a patch (woohoo!) for >>>numarray which I think bears a little discussion since it involves >>>the re-write of a fundamental numarray function: array(). >>>The patch fixes a number of bugs and deconvolutes the logic of array(). >>> >>>The patch is here if you want to look at it yourself: >>> >>>http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse >>> >>>One item I thought needed some discussion was the removal of two >>>features: >>> >>> >>> >>> >>> >>>> * array() does too much. E.g., handling file/memory instances for >>>> 'sequence'.
There's fromfile for the former, and users needing >>>> the latter functionality should be clued up enough to >>>> instantiate NumArray directly. >>>> >>>> >>>> >>>I agree with this myself. Does anyone care if they will no longer be >>>able to construct an array from a file or buffer object using array() >>>rather than fromfile() or NumArray(), respectively? Is a deprecation >>>process necessary to remove them? >>> >>> >>> >>> >>I would suggest deprecation on the way to removal. For the newcomer, >>who is not yet "clued up" >>some advice on the instantiation of NumArray would help. >> >> > >That's fair. The docstring for NumArray needs beefing up along the same >lines as Rory's work on array(). > and, I would suggest, the documentation. >I initially liked Florian's idea of >frombuffer() but since I can't think of how it's not identical to >NumArray(), I'm not sure there's any point. > > > >>Rory leaves in type and typecode. It would be good to eliminate this >>apparent overlap. Why not >>deprecate and then drop type? >> >> > >Some people like type. I don't want to touch this. > > The basic suggestion was to drop one or the other, since one is an _nt entry (an instance or a function) while the other is a string. I recognize that "type" has become accepted in the numarray community but the same word is used by Python for a utility function. > > >>It would be good to clarify the acceptable content of a sequence. A >>list, perhaps with sublists, of >>numbers is clear enough but what about a sequence of NumArray instances >>or even a sequence >>of numbers, mixed with NumArray instances? >> >> > >The patch has a new docstring which spells out the array() construction >algorithm. Lists of arrays would be seen as "numerical sequences". > > > >>Is the function asarray redundant? >> >> > >Yes, but it's clear and also needed for backward compatibility with >Numeric. Besides, it's not just redundant, it's an idiom...
> > *asarray*( seq, type=None, typecode=None) This function converts scalars, lists and tuples to a numarray, when possible. It passes numarrays through, making copies only to convert types. In any other case a TypeError is raised. *astype*( type) The astype method returns a copy of the array converted to the specified type. As with any copy, the new array is aligned, contiguous, and in native machine byte order. If the specified type is the same as current type, a copy is /still/ made. *array*( sequence=None, typecode=None, copy=1, savespace=0, type=None, shape=None) It seems that the function array could be used in place of either the function asarray or the method astype: >>> import numarray.numerictypes as _nt >>> import numarray.numarraycore as _n >>> a= _n.array([1, 2]) >>> a array([1, 2]) >>> a._type Int32 >>> b= a.astype(_nt.Float64) >>> b._type Float64 >>> a._type Int32 >>> c= _n.array(seq= a, type= _nt.Float64) Traceback (most recent call last): File "", line 1, in ? TypeError: array() got an unexpected keyword argument 'seq' >>> c= _n.array(a, type= _nt.Float64) >>> c._type Float64 >>> > > >>I suggest that the copy parameter be of the BoolType. This probably has >>no practical impact but >>it is consistent with current Python usage and makes it clear that this >>is a Yes/No parameter, >>rather than specifying a number of copies. >> >> > >Fair enough. Backward compatibility dictates not *requiring* a bool, >but using it as a default is fine. > > > > Colin W. From das_deniz at yahoo.com Thu Jan 13 20:41:14 2005 From: das_deniz at yahoo.com (D. Bahi) Date: Thu Jan 13 20:41:14 2005 Subject: [Numpy-discussion] Numeric 23.6 for Python 2.4 (23.7 anyone) Message-ID: <20050114044049.95790.qmail@web20422.mail.yahoo.com> Hey this works for me! Thanks very much Jonathan. Want to do it again for 23.7? das __________________________________ Do you Yahoo!? All your favorites on one personal page ? Try My Yahoo! 
http://my.yahoo.com From faltet at carabos.com Fri Jan 14 00:29:22 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Jan 14 00:29:22 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <41E6FCC7.4010309@noaa.gov> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> <41E6FCC7.4010309@noaa.gov> Message-ID: <200501140928.30275.faltet@carabos.com> On Thursday, 13 January 2005 at 23:57, Chris Barker wrote: > >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 1.97064208984375 > >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 27.220904111862183 > > yup! that's it. numarray's indexing is SLOW. So it's not an iterator > issue. Look in the archives of this list for discussion of why > numarray's generic indexing is slow. A search for "wxPython indexing" > will probably turn it up. Well, if you want to really compare generic indexing speed, you can't mix array creation into the measurement, as your example seems to do. A pure indexing access test would look like: >>> setup = 'import numarray as na; a = [i*2 for i in range(2000)]' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.48835396766662598 # With Python Lists >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000*2,)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.65753912925720215 # With Numeric >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000*2,)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.89093804359436035 # With numarray That shows that numarray indexing is slower than Numeric, but not by a large extent (just 40%). The real problem with numarray (for Ralf's example) is, as is already known, array creation time. Cheers, -- >OO<
Francesc Altet || http://www.carabos.com/ Carabos Coop. V. || Who is your data daddy? PyTables From tkorvola at welho.com Fri Jan 14 02:08:20 2005 From: tkorvola at welho.com (Timo Korvola) Date: Fri Jan 14 02:08:20 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E6FF60.4090803@sympatico.ca> (Colin J. Williams's message of "Thu, 13 Jan 2005 18:08:16 -0500") References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> <41E6FF60.4090803@sympatico.ca> Message-ID: <87acrcwcg5.fsf@welho.com> "Colin J. Williams" writes: > Yes, I'm not sure whether list comprehension produces an iter object > but this should also be included. Lists are iterable but they also have a length, which is not accessible through the iterator: from a general iterator there is no way of knowing in advance how many items it will return. This may be a problem if you want to allocate memory for the values. -- Timo Korvola From jmiller at stsci.edu Fri Jan 14 06:19:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 14 06:19:22 2005 Subject: [Numpy-discussion] iterating over an array Message-ID: <1105712311.3481.2.camel@jaytmiller.comcast.net> On Fri, 2005-01-14 at 09:28 +0100, Francesc Altet wrote: > On Thursday, 13 January 2005 at 23:57, Chris Barker wrote:
> > Well, if you want to really compare generic indexing speed, you can't mix > array creation into the measurement, as your example seems to do. > > A pure indexing access test would look like: > > >>> setup = 'import numarray as na; a = [i*2 for i in range(2000)]' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 0.48835396766662598 # With Python Lists > >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000*2,)' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 0.65753912925720215 # With Numeric > >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000*2,)' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 0.89093804359436035 # With numarray > > That shows that numarray indexing is slower than Numeric, but not by a large > extent (just 40%). The real problem with numarray (for Ralf's example) is, > as is already known, array creation time. I thought we were done after what Francesc pointed out above; then I tried this: from timeit import Timer setup = 'import numarray as na; a = na.arange(2000,shape=(2000,))' print "numarray iteration: ", Timer('for i in a: pass', setup).timeit(number=1000) print "numarray simple indexing, int value:", Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) setup = 'import Numeric as na; a = na.arange(2000); a.shape=(2000,)' print "Numeric iteration: ", Timer('for i in a: pass', setup).timeit(number=1000) print "Numeric simple indexing, int value:", Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) And got: numarray iteration: 8.81474900246 numarray simple indexing, int value: 3.61732387543 Numeric iteration: 1.0384759903 Numeric simple indexing, int value: 2.18056321144 This is running on Python-2.3.4 compiled --with-debug using gcc-3.4.2 on a 1 GHz Athlon XP and FC3 Linux. Simple indexing returning an int was 66% slower for me, but iteration was roughly 750% slower.
It looks to me like there is room for significant numarray iteration improvement; I'm not sure how it needs to be done or if Numeric has any special support. Regards, Todd From ryorke at telkomsa.net Fri Jan 14 07:33:21 2005 From: ryorke at telkomsa.net (Rory Yorke) Date: Fri Jan 14 07:33:21 2005 Subject: [Numpy-discussion] Re: Simplifying array() In-Reply-To: <1105576535.24423.213.camel@halloween.stsci.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: <20050113201813.GB6528@telkomsa.net> [Todd] > I agree with this myself. Does anyone care if they will no longer be > able to construct an array from a file or buffer object using array() > rather than fromfile() or NumArray(), respectively? Is a deprecation > process necessary to remove them? There seems to be a majority opinion in favour of deprecation, though at least Florian uses the sequence-as-a-buffer feature. [Colin] > I would suggest deprecation on the way to removal. For the > newcomer, who is not yet "clued up" some advice on the instantiation > of NumArray would help. Currently, The deprecation warning could include a pointer to NumArray or fromfile, as appropriate. I think some of the Python stdlib deprecations (doctest?) do exactly this. The NumArray docs do need to be fixed, though. [Colin] > Rory leaves in type and typecode. It would be good to eliminate > this apparent overlap. Why not deprecate and then drop type? As a > compromise, either could be accepted as a NumArray.__init__ > argument, since it is easy to distinguish between them. [Perry] > Tim is right about this. The rationale was that typecode is > inaccurate since types are no longer represented by letter codes > (one can still use them for backward compatibility). Also, the type keyword matches the NumArray type method. It does have the downside of clashing with the type builtin, of course. > It would be good to clarify the acceptable content of a sequence. 
I think this is quite important, though perhaps not too difficult. I think any sequence, or nested sequences, should be accepted, provided that they are "conformally sized" (for lack of a better phrase) and that the innermost sequences contain number types. I'll try to word this more precisely for the docs. Note that a NumArray is a sequence, in the sense that it has __getitem__ and __len__ methods, and is indexed from 0 upwards. Strings are also sequences, and Alexander made a comment to the patch that array() should handle sequences of strings. Consider Numeric's behaviour: >>> array(["abc",[1,2,3]]) array([[97, 98, 99], [ 1, 2, 3]]) I think this needs to be handled in fromlist, which, I think, handles fairly general sequences, but not strings. Note that this leads to a different interpretation of array(["abcd"]) and array("abcd") According to the above, array(["abcd"]) should return array([[97,98,99,100]]) and, since plain strings go straight to fromstring, array("abcd") should return array([1684234849]) (probably dependent on endianness, what Long is, etc.). Is this acceptable? [Colin] >Is the function asarray redundant? [Tim] > No, the copy=False parameter is redundant ;) Well as a pair they are... I'm not sure I follow Tim's argument, but asarray is not redundant for a different reason: it returns any NDArray arguments without calling array. generic.ravel calls numarraycore.asarray, and so ravel()ing RecArrays, or some other non-NumArray NDArray requires asarray to remain as it is. I'm not sure if this setup is desirable, but I decided not to change too many things at once. [Colin] >I suggest that the copy parameter be of the BoolType. This >probably has no practical impact but it is consistent with current >Python usage and makes it clear that this is a Yes/No parameter, >rather than specifying a number of copies. This makes sense; as Todd noted, we shouldn't rely on it being a bool, but having False as the default value is clearer.
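Rory's "conformally sized" condition amounts to a recursive shape check with no ragged edges. A hedged sketch (shape_of is a made-up helper, not the patch's code; it treats only lists and tuples as nestable, sidestepping the string question discussed above):

```python
def shape_of(seq):
    """Return the shape of a nested sequence, rejecting ragged edges."""
    if not isinstance(seq, (list, tuple)):
        return ()  # a scalar: innermost level reached
    shapes = [shape_of(item) for item in seq]
    if len(set(shapes)) > 1:
        raise ValueError("ragged nested sequence")
    return (len(seq),) + (shapes[0] if shapes else ())

assert shape_of(5) == ()
assert shape_of([1, 2, 3]) == (3,)
assert shape_of([[1, 2], [3, 4], [5, 6]]) == (3, 2)
try:
    shape_of([[1, 2], [3]])
except ValueError:
    pass
else:
    raise AssertionError("ragged input should be rejected")
```

A sequence of NumArray instances, or numbers mixed with NumArrays, would pass the same test once arrays report their own shape at the appropriate level.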
Cheers, Rory From Chris.Barker at noaa.gov Fri Jan 14 11:54:26 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jan 14 11:54:26 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <200501140928.30275.faltet@carabos.com> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> <41E6FCC7.4010309@noaa.gov> <200501140928.30275.faltet@carabos.com> Message-ID: <41E8222B.3070900@noaa.gov> Francesc Altet wrote: > That shows that numarray indexing is slower than Numeric, but not by a large > extent (just 40%). The real problem with numarray (for Ralf's example) is, > as is already known, array creation time. Thanks for clearing this up. The case I care about (at the moment) is in wxPython's "PointListHelper". It converts whatever Python sequence you give it into a wxList of wxPoints. The sequence you give it needs to look something like a list of (x,y) tuples. An NX2 Numeric or Numarray array works just fine, but both are slower than a list of tuples, and Numarray is MUCH slower. This appears to be exactly analogous to the OP's example, of extracting a bunch of (2,) arrays from the (N,2) array. Then the two numbers must be extracted from the (2,) array, and then converted to a wxPoint. It seems the creation of all those (2,) numarrays is what's taking the time. A) Is there work going on to speed this up? B) The real solution, at least for wxPython, is to make "PointListHelper" understand numarrays, so that it can go straight from the array->data pointer to the wxList of wxPoints. One of these days I'll get around to working on that! -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jmiller at stsci.edu Fri Jan 14 13:46:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 14 13:46:13 2005 Subject: [Numpy-discussion] Re: Simplifying array() In-Reply-To: <20050113201813.GB6528@telkomsa.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <20050113201813.GB6528@telkomsa.net> Message-ID: <1105739144.5294.28.camel@jaytmiller.comcast.net> On Thu, 2005-01-13 at 22:18 +0200, Rory Yorke wrote: > [Todd] > > I agree with this myself. Does anyone care if they will no longer be > > able to construct an array from a file or buffer object using array() > > rather than fromfile() or NumArray(), respectively? Is a deprecation > > process necessary to remove them? > > There seems to be a majority opinion in favour of deprecation, though > at least Florian uses the sequence-as-a-buffer feature. By way of status, I applied and committed both Rory's patches this morning. Afterward, I added the deprecation warnings for the frombuffer() and fromfile() cases. frombuffer() is identical to NumArray(), so I did not add a new function. > [Colin] > > I would suggest deprecation on the way to removal. For the > > newcomer, who is not yet "clued up" some advice on the instantiation > > of NumArray would help. Currently, > > The deprecation warning could include a pointer to NumArray or > fromfile, as appropriate. I think some of the Python stdlib > deprecations (doctest?) do exactly this. The NumArray docs do need to > be fixed, though. I didn't touch the docs. > [Colin] > > Rory leaves in type and typecode. It would be good to eliminate > > this apparent overlap. Why not deprecate and then drop type? As a > > compromise, either could be accepted as a NumArray.__init__ > > argument, since it is easy to distinguish between them. > > [Perry] > > Tim is right about this. 
The rationale was that typecode is > > inaccurate since types are no longer represented by letter codes > > (one can still use them for backward compatibility). > > Also, the type keyword matches the NumArray type method. It does have > the downside of clashing with the type builtin, of course. IMHO, all this discussion about type/typecode is moot because typecode was added after the fact for Numeric compatibility. It really makes no sense to take it out now that we're going for interoperability with scipy. I don't like it much either, but the alternative, being incompatible, is worse. "typecode" could be factored out into the numerix layer, but that just makes life confusing; it's best that numarray works the same whether it's being used with scipy or not. > > It would be good to clarify the acceptable content of a sequence. A > > I think this is quite important, though perhaps not too difficult. I > think any sequence, or nested sequences should be accepted, provided > that they are "conformally sized" (for lack of a better phrase) and > that the innermost sequences contain number types. I'll try to word > this more precisely for the docs. > > Note that a NumArray is a sequence, in the sense that it has > __getitem__ and __len__ methods, and is index from 0 upwards. > > Strings are also sequences, and Alexander made a comment to the patch > that array() should handle sequences of strings. Consider Numeric's > behaviour: > > >>> array(["abc",[1,2,3]]) > array([[97, 98, 99], > [ 1, 2, 3]]) -1 from me. I think we're getting back into "array does too much" territory. > I think this needs to be handled in fromlist, which, I think, handles > fairly general sequences, but not strings. I think you're right, that's how it could be done.
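The fromlist handling discussed here could look something like the following sketch (hypothetical code, not numarray's actual implementation): strings are treated like any other sequence, yielding their byte values.

```python
def nested_byte_list(seq):
    # Hypothetical helper: recursively convert "conformally sized"
    # nested sequences, treating strings as sequences of byte values,
    # matching the Numeric behaviour quoted above.
    if isinstance(seq, str):
        return [ord(c) for c in seq]
    if isinstance(seq, (list, tuple)):
        return [nested_byte_list(item) for item in seq]
    return seq  # a plain number: leave it alone

print(nested_byte_list(["abc", [1, 2, 3]]))  # [[97, 98, 99], [1, 2, 3]]
```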
> Note that this leads to a different interpretation of array(["abcd"]) > and array("abcd") > > According to the above, array(["abcd"] should return > array([[97,98,99,100]]) and, since plain strings go straight to > fromstring, array("abcd") should return array([1684234849]) (probably > dependent on endianess, what Long is, etc.). Is this acceptable? I held off consolidating all the new default types to Long. Not having defaults hasn't been a problem up to now so I'm not sure Numeric compatibility is such a concern or that Long is really the best default... although it does make it easier to write doctests. Todd From haase at msg.ucsf.edu Fri Jan 14 15:24:08 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jan 14 15:24:08 2005 Subject: [Numpy-discussion] ScientificPython with numarray support In-Reply-To: References: Message-ID: <200501141523.07159.haase@msg.ucsf.edu> On Thursday 13 January 2005 05:42 am, konrad.hinsen at laposte.net wrote: > A development release of Scientific Python that supports numarray as an > alternative to NumPy Hi Konrad, This is great news ! In the readme it says: """ Note that this is a new feature and not very well tested. Feedback is welcome. Note also that the modules Scientific.Functions.Derivatives Scientific.Functions.FirstDerivatives Scientific.Functions.LeastSquares do not work correctly with numarray because they rely on a feature of Numeric that is missing in current numarray releases. """ I'm just curious what the missing feature is. The LeastSquare-fit is exactly what I'm interested in, since I couldn't find something similar anywhere else (like: It's not in SciPy, right?) Thanks, Sebastian Haase >(choice made at installation time) is now > available at > > http://starship.python.net/~hinsen/ScientificPython/ > or > http://dirac.cnrs-orleans.fr/ScientificPython/ > > (Search for "2.51".) 
> > Note that some modules do not work under numarray because they rely on > a NumPy feature that is not currently implemented in numarray. They are > listed in the README file. > > Konrad. > -- > --------------------------------------------------------------------- > Konrad Hinsen > Laboratoire Léon Brillouin, CEA Saclay, > 91191 Gif-sur-Yvette Cedex, France > Tel.: +33-1 69 08 79 25 > Fax: +33-1 69 08 82 61 > E-Mail: hinsen at llb.saclay.cea.fr > --------------------------------------------------------------------- From jdhunter at ace.bsd.uchicago.edu Fri Jan 14 16:21:22 2005 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Fri Jan 14 16:21:22 2005 Subject: [Numpy-discussion] ScientificPython with numarray support In-Reply-To: <200501141523.07159.haase@msg.ucsf.edu> (Sebastian Haase's message of "Fri, 14 Jan 2005 15:23:07 -0800") References: <200501141523.07159.haase@msg.ucsf.edu> Message-ID: >>>>> "Sebastian" == Sebastian Haase writes: Sebastian> The LeastSquare-fit is exactly what I'm interested in, Sebastian> since I couldn't find something similar anywhere else Sebastian> (like: It's not in SciPy, right?)
from scipy import exp, arange, zeros, Float, ones, transpose
from RandomArray import normal
from scipy.optimize import leastsq

parsTrue = 2.0, -.76, 0.1
distance = arange(0, 4, 0.001)

def func(pars):
    a, alpha, k = pars
    return a*exp(alpha*distance) + k

def errfunc(pars):
    return data - func(pars)  # return the error

# some pseudo data; add some noise
data = func(parsTrue) + normal(0.0, 0.1, distance.shape)

guess = 1.0, -.4, 0.0  # the initial guess of the params

best, info, ier, mesg = leastsq(errfunc, guess, full_output=1)

print 'true', parsTrue
print 'best', best

From haase at msg.ucsf.edu Fri Jan 14 17:24:19 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jan 14 17:24:19 2005 Subject: [Numpy-discussion] ScientificPython with numarray support In-Reply-To: References: <200501141523.07159.haase@msg.ucsf.edu> Message-ID: <200501141723.17142.haase@msg.ucsf.edu> On Friday 14 January 2005 04:15 pm, John Hunter wrote:

> >>>>> "Sebastian" == Sebastian Haase writes:
>
> Sebastian> The LeastSquare-fit is exactly what I'm interested in,
> Sebastian> since I couldn't find something similar anywhere else
> Sebastian> (like: It's not in SciPy, right?)
>
> from scipy import exp, arange, zeros, Float, ones, transpose
> from RandomArray import normal
> from scipy.optimize import leastsq
>
> parsTrue = 2.0, -.76, 0.1
> distance = arange(0, 4, 0.001)
>
> def func(pars):
>     a, alpha, k = pars
>     return a*exp(alpha*distance) + k
>
> def errfunc(pars):
>     return data - func(pars)  # return the error
>
> # some pseudo data; add some noise
> data = func(parsTrue) + normal(0.0, 0.1, distance.shape)
>
> guess = 1.0, -.4, 0.0  # the initial guess of the params
>
> best, info, ier, mesg = leastsq(errfunc, guess, full_output=1)
>
> print 'true', parsTrue
> print 'best', best

Thanks John, I thought it should be there. Is the code / algorithm roughly similar to what Konrad has in Scientific?
- Sebastian From gazzar at email.com Fri Jan 14 19:40:24 2005 From: gazzar at email.com (Gary Ruben) Date: Fri Jan 14 19:40:24 2005 Subject: [Numpy-discussion] ScientificPython with numarray support Message-ID: <20050115033916.3CAB3164002@ws1-4.us4.outblaze.com> Hi Sebastian, You could also use the linregress function in scipy.stats if you're doing least squares fitting of a straight line. Gary R. ----- Original Message ----- > On Friday 14 January 2005 04:15 pm, John Hunter wrote: > > >>>>> "Sebastian" == Sebastian Haase writes: > > > > Sebastian> The LeastSquare-fit is exactly what I'm interested in, > > Sebastian> since I couldn't find something similar anywhere else > > Sebastian> (like: It's not in SciPy, right?) > > > > best, info, ier, mesg = leastsq(errfunc, guess, full_output=1) > > Thanks John, > I thought it should be there. > Is the code / algorithm about similar to what Konrad has in Scientific ? > > - Sebastian From oliphant at ee.byu.edu Fri Jan 14 22:37:32 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Jan 14 22:37:32 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design Message-ID: <41E8B9D9.5040301@ee.byu.edu> Hello all, I don't comment much on numarray because I haven't used it that much, as Numeric fits my needs quite well. It does bother me that there are two communities coexisting and that work seems to get repeated several times, so recently I have looked at numarray to see how far it is from being acceptable as a real replacement for Numeric. I have some comments based on perusing its source. I don't want to seem overly critical, so please take my comments with the understanding that I appreciate the extensive work that has gone into Numarray. I do think that Numarray has made some great strides. I would really like to see a unification of Numeric and Numarray.
1) Are there plans to move the nd array entirely into C? -- I would like to see the nd array become purely a c-type. I would be willing to help here. I can see that part of the work has been done. 2) Why is the ND array C-structure so large? Why are the dimensions and strides arrays static? Why can't the extra stuff that the fancy arrays need be another structure and the numarray C structure just extended with a pointer to the extra stuff? 3) There seem to be too many files to define the array. The mixture of Python and C makes trying to understand the source very difficult. I thought one of the reasons for the re-write was to simplify the source code. 4) Object arrays must be supported. This was a bad oversight and an important feature of Numeric arrays. 5) The ufunc code interface needs to continue to be improved. I do see that some effort into understanding the old ufunc interface has taken place, which is a good sign. Again, thanks to the work that has been done. I'm really interested to see if some of these modifications can be done, as in my mind it will help the process of unifying the two camps. -Travis Oliphant From konrad.hinsen at laposte.net Sun Jan 16 03:29:20 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sun Jan 16 03:29:20 2005 Subject: [Numpy-discussion] ScientificPython with numarray support In-Reply-To: <200501141723.17142.haase@msg.ucsf.edu> References: <200501141523.07159.haase@msg.ucsf.edu> <200501141723.17142.haase@msg.ucsf.edu> Message-ID: On 15.01.2005, at 02:23, Sebastian Haase wrote: > I thought it should be there. > Is the code / algorithm about similar to what Konrad has in Scientific > ? > I don't know exactly what's in SciPy, but it's probably a variant of Levenberg-Marquardt, just like in Scientific Python.
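For readers who haven't met the method, the damped update at the heart of a Levenberg-Marquardt fit can be sketched in a few lines of plain Python. This is a toy one-parameter illustration of the idea only, not the code in Scientific Python or SciPy:

```python
# Fit y = a * t by repeatedly applying the damped Gauss-Newton update
# a += (J^T r) / (J^T J + lambda), where J is the model's derivative
# with respect to a (here simply t) and r the current residuals.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]          # data generated with a = 2, no noise

a, lam = 0.5, 1e-3                 # initial guess and damping parameter
for _ in range(50):
    r = [y - a * t for t, y in zip(ts, ys)]     # residuals
    jtj = sum(t * t for t in ts)                # J^T J (1x1 here)
    jtr = sum(t * ri for t, ri in zip(ts, r))   # J^T r
    a += jtr / (jtj + lam)                      # damped update
print(round(a, 6))  # converges to 2.0
```

In the real algorithm the damping factor is adapted at each step (raised when a step fails, lowered when it succeeds), and the Jacobian comes from differentiating the full multi-parameter model.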
However, there is one peculiarity of my implementation which is probably not shared by the SciPy one, and which is the cause of the incompatibility with numarray: the use of automatic derivatives in the linearization of the model. Most implementations use numerical differentiation. Automatic derivatives have the advantage of removing one numerical issue and one critical parameter (the differentiation step size), at the cost of somewhat limiting the applicability (the model must be expressed as an analytical function of the parameters) and of requiring a NumPy feature that numarray doesn't have (yet?). BTW, that feature was recently discussed here: it is the possibility to apply the maths functions to objects of arbitrary type. This makes it possible to apply the same numerical code to numbers and arrays but also to the number-like objects that are used for automatic derivatives. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr ------------------------------------------------------------------------------- From perry at stsci.edu Mon Jan 17 11:12:28 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jan 17 11:12:28 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: Travis Oliphant wrote: > I have some comments based on perusing its source. I don't want to > seem overly critical, so please take my comments with the understanding > that I appreciate the extensive work that has gone into Numarray. I do > think that Numarray has made some great strides. I would really like > to > see a unification of Numeric and Numarray. > > 1) Are there plans to move the nd array entirely into C? > -- I would like to see the nd array become purely a c-type. I would > be willing to help here.
I can see that part of the work has been > done. > I don't know that I would say they are definite, but I think that at some point we thought that would be necessary. We haven't yet since doing so makes it harder to change, so it would be one of the last changes to the core that we would want to do. Our current priorities are towards making all the major libraries and packages available under it first and then finishing optimization issues (another issue that has to be tackled soon is handling 64-bit addressing; apparently the work to make Python sequences use 64-bit addresses is nearing completion, so we want to be able to handle that). I expect we would want to make sure we find a way of handling that before we turn it all into C, but maybe it is just as easy doing them in the opposite order. > 2) Why is the ND array C-structure so large? Why are the dimensions > and strides array static? Why can't the extra stuff that the fancy > arrays need be another structure and the numarray C structure just > extended with a pointer to the extra stuff? When Todd moved NDArray into C, he tried to keep it simple. As such, it has no "moving parts." We think making dimensions and strides malloc'ed rather than static would be fairly easy. Making the "extra stuff" variable is something we can look at. The bottom line is that adding the variability adds complexity, and we're not sure we understand the storage economics of why we would be doing it. Numarray was designed, first and foremost, for large arrays. For that case, the array struct size is irrelevant whereas additional complexity is not. I guess we would like to see some good practical examples where the array struct size matters. Do you have code with hundreds of thousands of small arrays existing simultaneously? > 3) There seem to be too many files to define the array. The mixture of > Python and C makes trying to understand the source very difficult.
I > thought one of the reasons for the re-write was to simplify the source > code. > I think this reflects the transitional nature of going from mostly Python to a hybrid. We agree that the current state is more convoluted than it ought to be. If NDarray were all C, much of this would end (though in some respects, being all in C will make it larger and harder to understand as well). The original hope was that most of the array setup computation could be kept in Python, but that is what made it slow for small arrays (but it did allow us to implement it reasonably quickly with big array performance so that we could start using it for our own projects without a long development effort). Unfortunately, the simplification in the rewrite is offset by handling the more complex cases (byte-swapping, etc.) and extra array indexing capabilities. > 4) Object arrays must be supported. This was a bad oversight and an > important feature of Numeric arrays. > The current implementation does support them (though in a different way, and generally not as efficiently, though Todd is more up on the details here). What aspect of object arrays are you finding lacking? C-api? > 5) The ufunc code interface needs to continue to be improved. I do see > that some effort into understanding the old ufunc interface has taken > place which is a good sign. > You are probably referring to work underway to integrate with scipy (I'm assuming you are looking at the version in CVS). > Again, thanks to the work that has been done. I'm really interested to > see if some of these modifications can be done as in my mind it will > help the process of unifying the two camps. > I'm glad to see that you are taking a look at it and welcome the comments and any offers of help in improving speed.
Perry From Fernando.Perez at colorado.edu Mon Jan 17 13:25:34 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Mon Jan 17 13:25:34 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: <41EC2D14.7000203@colorado.edu> Hi all, just some comments from the sidelines, while I applaud the fact that we are moving towards a successful numeric/numarray integration. Perry Greenfield wrote: > the array struct size matters. Do you have code with hundreds of > thousands > of small arrays existing simultaneously? I do have code with perhaps ~100k 'small' arrays (12x12x12 or so) in existence simultaneously, plus a few million created temporarily as part of the calculations. Needless to say, this uses Numeric :) What's been so nice about Numeric is that even with my innermost loops (carefully) coded in python, I get very acceptable performance for real-world problems. Perry and I had this conversation over at scipy'04, so this is just a reminder. The Blitz++ project has faced similar problems of performance for their very flexible array classes, and their approach has been to have separate TinyVector/TinyMatrix classes. These offer almost none of the fancier features of the default Blitz Arrays, but they keep the same syntactic behavior and identical semantics where applicable. What they give up in flexibility, they gain in performance. I realize this requires a substantial amount of work, but perhaps it will be worthwhile in the long run. It would be great to have a numarray small_array() object which would not allow byteswapping, memory-mapping, or any of the extra features which make them memory and time consuming, but which would maintain compatibility with the regular arrays as far as arithmetic operators and ufunc application (including obviously lapack/blas/f2py usage).
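As a concrete (if deliberately naive) picture of the TinyVector idea, here is the sort of stripped-down fixed-size object being described — Python pseudocode for what would really be a C type, with a made-up class name:

```python
class Tiny3:
    # Fixed-size 3-vector: no byte-swapping, no memory mapping, no
    # striding machinery -- just arithmetic with minimal per-object cost.
    __slots__ = ("x", "y", "z")      # fixed layout, no instance dict

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):
        return Tiny3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __mul__(self, s):            # scalar multiply
        return Tiny3(self.x * s, self.y * s, self.z * s)

v = Tiny3(1.0, 2.0, 3.0) + Tiny3(4.0, 5.0, 6.0) * 2.0
print(v.x, v.y, v.z)  # 9.0 12.0 15.0
```

Blitz++ gets its real win by fixing the size at compile time so the compiler can unroll everything; the point of the sketch is only the trade-off itself: give up the fancy features, keep the arithmetic interface.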
I know I am talking from 50.000 feet up, so I'm sure once you get down to the details this will probably not be easy (I can already see difficulties with the size of the underlying C structures for C API compatibility). But in the end, I think something like this might be the only way to satisfy all the disparate usage cases for numerical arrays in scientific computing. Besides the advanced features disparity, a simple set of guidelines for the crossover points in terms of performance would allow users to choose in their own codes what to use. At any rate, I'm extremely happy to see scipy/numarray integration moving forward. My thanks to all those who are actually doing the hard work. Regards, f From juenglin at cs.pdx.edu Mon Jan 17 14:33:26 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Mon Jan 17 14:33:26 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: <1106001150.28436.300.camel@alpspitze.cs.pdx.edu> On Mon, 2005-01-17 at 11:12, Perry Greenfield wrote: > Travis Oliphant wrote: > > 3) There seem to be too many files to define the array. The mixture of > > Python and C makes trying to understand the source very difficult. I > > thought one of the reasons for the re-write was to simplify the source > > code. > > > I think this reflects the transitional nature of going from mostly > Python > to a hybrid. We agree that the current state is more convoluted than it > ought to be. If NDarray were all C, much of this would ended (though in > some respects, being all in C will make it larger, harder to understand > as well). The original hope was that most of the array setup computation > could be kept in Python but that is what made it slow for small arrays > (but it did allow us to implement it reasonably quickly with big array > performance so that we could start using for our own projects without > a long development effort). 
Unfortunately, the simplification in the > rewrite is offset by handling the more complex cases (byte-swapping, > etc.) and extra array indexing capabilities. I took a cursory look at the C API the other day and learned about this capability to process byte-swapped data. I am wondering why this is a good thing to have. Wouldn't it be enough and much easier to drop this feature and instead equip numarray IO routines with the capability to convert to and from a foreign endian to the host endian encoding? ralf From perry at stsci.edu Mon Jan 17 20:18:39 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jan 17 20:18:39 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <1106001150.28436.300.camel@alpspitze.cs.pdx.edu> Message-ID: Ralf Juengling wrote: > I took a cursory look at the C API the other day and learned about > this capability to process byte-swapped data. I am wondering why > this is a good thing to have. Wouldn't it be enough and much easier > to drop this feature and instead equip numarray IO routines with > the capability to convert to and from a foreign endian to the host > endian encoding? > Basically this feature was to allow use of memory mapped data that didn't use the native representation of the processor (also related to supporting record arrays).
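Ralf's alternative — converting once at I/O time instead of carrying byte-swap support through the core — looks like this in outline (a hedged sketch using the stdlib array module, not numarray code):

```python
import array
import sys

# Two int32 values written by a big-endian machine:
foreign = b"\x00\x00\x00\x01\x00\x00\x00\x02"

a = array.array("i")        # assumes C int is 4 bytes, which holds on
a.frombytes(foreign)        # all common platforms
if sys.byteorder == "little":
    a.byteswap()            # convert once at read time...
print(list(a))              # ...then work in native order: [1, 2]
```

The memory-mapped case Perry describes is exactly where this one-time conversion is unattractive: converting would mean copying the mapped data instead of viewing it in place.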
The details are given in a paper from a couple of years ago: http://www.stsci.edu/resources/software_hardware/numarray/papers/pycon2003.pdf Perry From Chris.Barker at noaa.gov Tue Jan 18 09:49:38 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 18 09:49:38 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41EC2D14.7000203@colorado.edu> References: <41EC2D14.7000203@colorado.edu> Message-ID: <41ED4AD0.6060204@noaa.gov> Hi all, This discussion has brought up a question I have had for a while: Can anyone provide a one-paragraph description of what numarray does that gives it better large-array performance than Numeric? By the way, for what it's worth, what's kept me from switching is the small array performance, and/or the array-creation performance. I don't use very large arrays, but I do use small ones all the time. thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Tue Jan 18 10:28:37 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Jan 18 10:28:37 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: <41ED54E5.2050104@ee.byu.edu> Thanks for the comments that have been made. One of my reasons for commenting is to get an understanding of which design issues of Numarray are felt to be important and which can change. There seems to be this idea that small arrays are not worth supporting. I hope this is just due to time constraints and not some fundamental idea that small arrays should never be considered with Numarray. Otherwise, there will always be two different array implementations developing at their own pace. I really want to gauge how willing developers of numarray are to changing things.
Perry Greenfield wrote: >> 1) Are there plans to move the nd array entirely into C? >> -- I would like to see the nd array become purely a c-type. I would >> be willing to help here. I can see that part of the work has been done. >> > I don't know that I would say they are definite, but I think that at > some point we thought that would be necessary. We haven't yet since > doing so makes it harder to change so it would be one of the last > changes to the core that we would want to do. Our current priorities > are towards making all the major libraries and packages available > under it first and then finishing optimization issues (another issue > that has to be tackled soon is handling 64-bit addressing; apparently > the work to make Python sequences use 64-bit addresses is nearing > completion so we want to be able to handle that. I expect we would > want to make sure we find a way of handling that before we turn it > all into C but maybe it is just as easy doing them in the opposite > order. I do not think it would be difficult at this point to move it all to C and then make future changes there (you can always call pure Python code from C). With the structure in place and some experience behind you, now seems like as good a time as any. Especially, because now is a better time for me than any... I like what numarray is doing by not always defaulting to ints with the maybelong type. It is a good idea. > >> 2) Why is the ND array C-structure so large? Why are the dimensions >> and strides array static? Why can't the extra stuff that the fancy >> arrays need be another structure and the numarray C structure just >> extended with a pointer to the extra stuff? > > > When Todd moved NDArray into C, he tried to keep it simple. As > such, it > has no "moving parts." We think making dimensions and strides malloc'ed > rather than static would be fairly easy. Making the "extra stuff" > variable is something we can look at. 
But allocating dimensions and strides when needed is not difficult and it reduces the overhead of the ndarray object. Currently, that overhead seems extreme. I could be over-reacting here, but it just seems like it would have made more sense to expand the array object as little as possible to handle the complexity that you were searching for. It seems like more modifications were needed in the ufunc than in the arrayobject. > > The bottom line is that adding the variability adds complexity and we're > not sure we understand the storage economics of why we would doing it. > Numarray was designed, first and foremost, for large arrays. Small arrays are never going to disappear (Fernando Perez has an excellent example) and there are others. A design where a single pointer not being NULL is all that is needed to distinguish "simple" Numeric-like arrays from "fancy" numarray-like arrays seems like a great way to make sure that the simple case stays cheap. > For that case, > the array struct size is irrelevant whereas additional complexity is > not. I guess we would like to see some good practical examples where > the array struct size matters. Do you have code with hundreds of > thousands > of small arrays existing simultaneously? As mentioned before, such code exists especially when arrays become a basic datatype that you use all the time. How much complexity is really generated by offloading the extra struct material to a bigarray structure, thereby only increasing the Numeric array structure by 4 bytes instead of 200+? On another fundamental note, numarray is being sold as a replacement for Numeric. But then, on closer inspection, many things that Numeric does well, numarray is ignoring or not doing very well. I think this presents a certain amount of false advertising to new users, who don't understand the history. Most of them would probably never need the fanciness that numarray provides and would be quite satisfied with Numeric. They just want to know what others are using.
I think it is a disservice to call numarray a replacement for Numeric until it actually is. It should currently be called an "alternative implementation" focused on large arrays. This (unintentional) sleight of hand that has been occurring over the past year has been my biggest complaint with numarray. Making numarray a replacement for Numeric means that it has to support small arrays, object arrays, and ufuncs at least as well as, but preferably better than, Numeric. It should also be faster than Numeric whenever possible, because Numeric has lots of potential optimizations that have never been applied. If numarray does not do these things, then in my mind it cannot be a replacement for Numeric and should stop being called that on the numpy web site. >> 3) There seem to be too many files to define the array. The mixture of >> Python and C makes trying to understand the source very difficult. I >> thought one of the reasons for the re-write was to simplify the source >> code. >> > I think this reflects the transitional nature of going from mostly Python > to a hybrid. We agree that the current state is more convoluted than it > ought to be. If NDarray were all C, much of this would ended (though in > some respects, being all in C will make it larger, harder to understand > as well). The original hope was that most of the array setup computation > could be kept in Python but that is what made it slow for small arrays > (but it did allow us to implement it reasonably quickly with big array > performance so that we could start using for our own projects without > a long development effort). Unfortunately, the simplification in the > rewrite is offset by handling the more complex cases (byte-swapping, > etc.) and extra array indexing capabilities. I never really understood the "code is too complicated" argument anyway. I was just wondering if there is some support for reducing the number of source code files, or reorganizing them a bit.
>> 4) Object arrays must be supported. This was a bad oversight and an >> important feature of Numeric arrays. >> > The current implementation does support them (though in a different > way, and generally not as efficiently, though Todd is more up on the > details here). What aspect of object arrays are you finding lacking? > C-api? I did not see such support when I looked at it, but given the previous comment, I could easily have missed where that support is provided. I'm mainly following up on Konrad's comment that his automatic differentiation does not work with Numarray because of the missing support for object arrays. There are other applications for object arrays as well. Most of the support needs to come from the ufunc side. > >> 5) The ufunc code interface needs to continue to be improved. I do see >> that some effort into understanding the old ufunc interface has taken >> place which is a good sign. >> > You are probably referring to work underway to integrate with scipy (I'm > assuming you are looking at the version in CVS). Yes, I'm looking at the CVS version. > >> Again, thanks to the work that has been done. I'm really interested to >> see if some of these modifications can be done as in my mind it will >> help the process of unifying the two camps. >> > I'm glad to see that you are taking a look at it and welcome the > comments and > any offers of help in improving speed. > I would be interested in helping if there is support for really making numarray a replacement for Numeric, by addressing the concerns that I've outlined. As stated at the beginning, I'm really just looking for how receptive numarray developers would be to the kinds of changes I'm talking about: (1) reducing the size of the array structure, (2) moving the ndarray entirely into C, (3) improving support for object arrays, (4) improving ufunc API support. I care less about array and ufunc C-API names being the same than about the underlying capabilities being available.
Best regards,

-Travis Oliphant

From paul at pfdubois.com  Tue Jan 18 13:57:33 2005
From: paul at pfdubois.com (Paul Dubois)
Date: Tue Jan 18 13:57:33 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED54E5.2050104@ee.byu.edu>
References: <41ED54E5.2050104@ee.byu.edu>
Message-ID: <41ED85F4.3010809@pfdubois.com>

I haven't followed this discussion in detail, but with respect to space for 'descriptors', it would simply be foolish to malloc space for these. The cost is ridiculous. You simply have to decide how big a number of dimensions to allow, make it a clearly findable definition in the sources, and dimension everything that big.

Originally when we discussed this we considered 7, since that had been (and for all I know still is) the maximum array dimension in Fortran. But Jim Hugunin needed 11 or something like it for his imaging. I've seen 40 in the numarray sources, I think.

It seems to me that an application that would care about this space (it being, after all, per array object) would be unusual indeed.

If I've misunderstood what you're talking about, never mind. (:->

My advice is to make flexibility secondary to performance. It is always possible to layer on flexibility for those who want it.

From rkern at ucsd.edu  Tue Jan 18 14:17:30 2005
From: rkern at ucsd.edu (Robert Kern)
Date: Tue Jan 18 14:17:30 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED54E5.2050104@ee.byu.edu>
References: <41ED54E5.2050104@ee.byu.edu>
Message-ID: <41ED8AC0.8090207@ucsd.edu>

Travis Oliphant wrote:

>>> 4) Object arrays must be supported. This was a bad oversight and an
>>> important feature of Numeric arrays.
>>>
>> The current implementation does support them (though in a different
>> way, and generally not as efficiently, though Todd is more up on the
>> details here). What aspect of object arrays are you finding lacking?
>> C-api?
> I did not see such support when I looked at it, but given the previous
> comment, I could easily have missed where that support is provided. I'm
> mainly following up on Konrad's comment that his automatic
> differentiation does not work with numarray because of the missing
> support for object arrays. There are other applications for object
> arrays as well. Most of the support needs to come from the ufunc side.

It's tucked away in numarray.objects. Unfortunately for Konrad's application, numarray ufuncs don't recognize that they're being passed an object with the special methods defined, and they won't automatically create 0-D object "arrays". 0-D object arrays will work just fine when using operators (x+y works), but not when explicitly calling the ufuncs (add(x,y) does not work). Both methods work fine for 0-D numerical arrays.

--
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter

From oliphant at ee.byu.edu  Tue Jan 18 14:26:32 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Jan 18 14:26:32 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED85F4.3010809@pfdubois.com>
References: <41ED54E5.2050104@ee.byu.edu> <41ED85F4.3010809@pfdubois.com>
Message-ID: <41ED8CA8.5090407@ee.byu.edu>

Paul Dubois wrote:

> I haven't followed this discussion in detail but with respect to space
> for 'descriptors', it would simply be foolish to malloc space for
> these. The cost is ridiculous. You simply have to decide how big a
> number of dimensions to allow, make it a clearly findable definition
> in the sources, and dimension everything that big.
>

Thanks for this comment. I can see now that it makes sense, as it would presumably speed up small array creation. Why was this not done in the original sources?
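The space-versus-allocation tradeoff under discussion can be sketched with ctypes (the struct layouts below are illustrative only, not numarray's or Numeric's actual headers): storing dims and strides inline makes the descriptor larger but lets each array be created with a single allocation, while pointer-based storage keeps the struct small at the cost of extra mallocs and frees per array.

```python
import ctypes

MAXDIM = 40  # the static limit mentioned in the discussion

class FixedDescr(ctypes.Structure):
    # dims/strides stored inline: one allocation per array, larger struct
    _fields_ = [("nd", ctypes.c_int),
                ("dims", ctypes.c_long * MAXDIM),
                ("strides", ctypes.c_long * MAXDIM)]

class MallocDescr(ctypes.Structure):
    # dims/strides behind pointers: smaller struct, but each array needs
    # two extra allocations (and frees) at creation time
    _fields_ = [("nd", ctypes.c_int),
                ("dims", ctypes.POINTER(ctypes.c_long)),
                ("strides", ctypes.POINTER(ctypes.c_long))]

print(ctypes.sizeof(FixedDescr) > ctypes.sizeof(MallocDescr))  # True
```

The inline version spends a few hundred bytes per array object to avoid any per-creation malloc traffic, which is exactly Paul's point about the cost being "per array object" and therefore usually negligible.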
> Originally when we discussed this we considered 7, since that had been
> (and for all I know still is) the maximum array dimension in Fortran.
> But Jim Hugunin needed 11 or something like it for his imaging. I've
> seen 40 in the numarray sources, I think.
>
> It seems to me that an application that would care about this space
> (it being, after all, per array object) would be unusual indeed.
>
> If I've misunderstood what you're talking about, never mind. (:->

I think you've understood this part of it and have given good advice.

> My advice is to make flexibility secondary to performance. It is
> always possible to layer on flexibility for those who want it.

I like this attitude.

-Travis

From perry at stsci.edu  Tue Jan 18 17:52:46 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Jan 18 17:52:46 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED54E5.2050104@ee.byu.edu>
Message-ID:

Travis Oliphant wrote:

> Thanks for the comments that have been made. One of my reasons for
> commenting is to get an understanding of which design issues of Numarray
> are felt to be important and which can change. There seems to be this
> idea that small arrays are not worth supporting. I hope this is just
> due to time constraints and not some fundamental idea that small arrays
> should never be considered with Numarray. Otherwise, there will
> always be two different array implementations developing at their
> own pace.

I wouldn't say that we are "hostile" to small arrays. We do only have limited resources and can't do everything we would like. More on this below though.

> I really want to gauge how willing developers of numarray are to
> changing things.

Without going into all the details below, I think I can address this point. I suppose it all depends on what you mean by "how willing developers of numarray are to changing things."
If you mean: are we open to changes to numarray that speed up small arrays (and address other noted shortcomings)? Yes, certainly (so long as they don't hurt the large-array issues significantly). If it means: will we drop everything and address all these issues immediately ourselves? No, we have other things to do regarding numarray that have higher priority before we can address these things. I would have a very hard time justifying the effort when there are other things needed by STScI more. We would love it if others could address them sooner though. More on related issues below.

>> 1) Are there plans to move the nd array entirely into C?
[...]
> I do not think it would be difficult at this point to move it all to C
> and then make future changes there (you can always call pure Python code
> from C). With the structure in place and some experience behind you,
> now seems like as good a time as any. Especially because now is a
> better time for me than any... I like what numarray is doing by not
> always defaulting to ints with the maybelong type. It is a good idea.

I hope that is true, but we've found moving things to C a bigger effort than we would like. I'd like to be proved wrong by someone who can tackle it sooner than we can.

>> 2) Why is the ND array C-structure so large? Why are the dimensions
>> and strides array static? Why can't the extra stuff that the fancy
>> arrays need be another structure and the numarray C structure just
>> extended with a pointer to the extra stuff?
>
> When Todd moved NDArray into C, he tried to keep it simple. As such, it
> has no "moving parts." We think making dimensions and strides malloc'ed
> rather than static would be fairly easy. Making the "extra stuff"
> variable is something we can look at.

> But allocating dimensions and strides when needed is not difficult, and
> it reduces the overhead of the ndarray object. Currently, that overhead
> seems extreme. I could be over-reacting here, but it just seems like it
> would have made more sense to expand the array object as little as
> possible to handle the complexity that you were searching for. It seems
> like more modifications were needed in the ufunc than in the arrayobject.

I'm not convinced that this is a big issue, but we have no objection to someone making this change. But it falls well below small array performance in priority for us.

> > The bottom line is that adding the variability adds complexity, and we're
> > not sure we understand the storage economics of why we would be doing it.
> > Numarray was designed, first and foremost, for large arrays.
>
> Small arrays are never going to disappear (Fernando Perez has an
> excellent example) and there are others. A design where a single
> pointer not being NULL is all that is needed to distinguish "simple"
> Numeric-like arrays from "fancy" numarray-like arrays seems like a great
> way to make sure that

I won't quarrel with that (but I'm not sure what you are suggesting in the bigger picture).

> On another fundamental note, numarray is being sold as a replacement for
> Numeric. But, then, on closer inspection many things that Numeric does
> well, numarray is ignoring or not doing very well. I think this
> presents a certain amount of false advertising to new users, who don't
> understand the history. Most of them would probably never need the
> fanciness that numarray provides and would be quite satisfied with
> Numeric. They just want to know what others are using. I think it is
> a disservice to call numarray a replacement for Numeric until it
> actually is. It should currently be called an "alternative
> implementation" focused on large arrays. This (unintentional) sleight of
> hand that has been occurring over the past year has been my biggest
> complaint with numarray.
> Making numarray a replacement for Numeric
> means that it has to support small arrays, object arrays, and ufuncs at
> least as well as, but preferably better than, Numeric. It should also be
> faster than Numeric whenever possible, because Numeric has lots of
> potential optimizations that have never been applied. If numarray does
> not do these things, then in my mind it cannot be a replacement for
> Numeric and should stop being called that on the numpy web site.

It distresses me to be accused of false advertising. We were pretty up front at the beginning of the process of writing numarray that the approach we would be taking would likely mean slower small array performance. There were those (like you and Eric) who expressed concern about that, but it wasn't at all clear what the consensus was regarding how much it could change and be acceptable. (I recall at one point when IDL was ported from Fortran to C, which resulted in a factor of 2 overall slowdown in speed. People didn't accuse RSI of providing something that wasn't a replacement for IDL.)

The fact was that at the time we started, several thought that backward compatibility wasn't that important. We didn't even try at the beginning to make the C-API the same. At the start, there was no claim that numarray would be an exact replacement for Numeric. (And I didn't hear huge objections at the time on the point, and some actually encouraged a break with how Numeric did things.) Many of the attempts to provide backward compatibility have come well after the first implementations. We have striven to provide the full functionality of what Numeric had as we went to version 1.0. Sure, there are some holes for object arrays.

So the issue of whether numarray is a replacement or not seems to be arguing over what the intent of the project was. Paul Dubois wrote the numpy page that makes that reference, and sure, I didn't object to it. (But why didn't you at the time?
It's been there a long time, and the goals and direction of numarray have been quite visible for a long time. This wasn't some dark, secret project. Many of the things you are complaining about have been true for some time.) If people want to call numarray an alternative implementation, I'm fine with that.

It was a replacement in our case. If we didn't develop it, we likely wouldn't be using Python in the full sense that we are now. Numeric wasn't really an option. At the time, many supported the idea of a reimplementation, so it seemed like a good opportunity to add what we needed and do that. Obviously, we misread the importance of small array performance for a significant part of the community. (But I keep saying, if small array performance is really that important, it would seem to me that much bigger wins are available, as Fernando mentioned.)

It's been clear for the better part of a year that it would be a long time before there was any sort of unification between the two. That distressed me, as I'm sure it did you. So some sort of useful sharing of libraries and packages seemed like the obvious way to go. In more specialized areas, there would be some divergence (e.g., we have dependencies on record arrays that we just can't provide in Numeric). I can no longer justify sinking many more months of work into numarray for issues of no value to STScI (other than the hope that it would convince others to switch, which isn't clear at all that it would). We need to move towards providing a lot of the tools that are available for Numeric. I can justify that work.

The current situation is far from ideal (Paul called it "insane" at scipy, if you prefer more colorful language). What we have are two camps that cannot afford to give up the capabilities that are unique to each version. But with most of the C-API compatible, and a way of coding most libraries (except for ufuncs) to be compatible with both, we certainly can improve the situation.
If you can help remove the biggest obstacle, small array performance, so that we could unify the two, I would be thrilled, but most of the effort can't come from us, at least not in the near term (next year). We can help at some level.

[...]

> I never really understood the "code is too complicated" argument

You lost me on this one. You mean the complaint that it was too complicated in Numeric way back?

> anyway. I was just wondering if there is some support for reducing the
> number of source code files, or reorganizing them a bit.

Yes, I'd say that this has relatively high priority. It would be nice to have feedback and advice on how to do this best.

> >> 4) Object arrays must be supported. This was a bad oversight and an
> >> important feature of Numeric arrays.
> >>
> > The current implementation does support them (though in a different
> > way, and generally not as efficiently, though Todd is more up on the
> > details here). What aspect of object arrays are you finding lacking?
> > C-api?
>
> I did not see such support when I looked at it, but given the previous
> comment, I could easily have missed where that support is provided. I'm
> mainly following up on Konrad's comment that his automatic
> differentiation does not work with numarray because of the missing
> support for object arrays. There are other applications for object
> arrays as well. Most of the support needs to come from the ufunc side.

I think Robert Kern pointed to the issue in a subsequent message.

> >> Again, thanks for the work that has been done. I'm really interested to
> >> see if some of these modifications can be done, as in my mind it will
> >> help the process of unifying the two camps.
> >>
> I'm glad to see that you are taking a look at it and welcome the
> comments and any offers of help in improving speed.
> I would be interested in helping if there is support for really making
> numarray a true replacement for Numeric, by addressing the concerns that
> I've outlined. As stated at the beginning, I'm really just looking
> for how receptive numarray developers would be to the kinds of changes
> I'm talking about: (1) reducing the size of the array structure, (2)
> moving the ndarray entirely into C, (3) improving support for object
> arrays, (4) improving ufunc API support.

I'm not exactly sure what you mean by (4). If you mean having a compatible API to Numeric, that seems like a lot of work, since the way ufuncs work in numarray is quite different. But you may mean something else.

Perry

From perry at stsci.edu  Tue Jan 18 17:55:38 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Jan 18 17:55:38 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED85F4.3010809@pfdubois.com>
Message-ID:

Paul Dubois wrote:

> I haven't followed this discussion in detail but with respect to space
> for 'descriptors', it would simply be foolish to malloc space for these.
> The cost is ridiculous. You simply have to decide how big a number of
> dimensions to allow, make it a clearly findable definition in the
> sources, and dimension everything that big.
>
> Originally when we discussed this we considered 7, since that had been
> (and for all I know still is) the maximum array dimension in Fortran.
> But Jim Hugunin needed 11 or something like it for his imaging. I've
> seen 40 in the numarray sources I think.
>

Actually, 40 came from Numeric. It may have been reduced to 11, but I'm sure it was 40 at one point. Jim even had a comment in the code to the effect that if someone needed more than 40, he wanted to see the problem that needed that. If people think it is too high, I'd be very happy to reduce it.

Perry

From cookedm at physics.mcmaster.ca  Tue Jan 18 19:04:34 2005
From: cookedm at physics.mcmaster.ca (David M.
Cooke)
Date: Tue Jan 18 19:04:34 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED8AC0.8090207@ucsd.edu> (Robert Kern's message of "Tue, 18 Jan 2005 14:16:32 -0800")
References: <41ED54E5.2050104@ee.byu.edu> <41ED8AC0.8090207@ucsd.edu>
Message-ID:

Robert Kern writes:

> Travis Oliphant wrote:
>
>>>> 4) Object arrays must be supported. This was a bad oversight and an
>>>> important feature of Numeric arrays.
>>>>
>>> The current implementation does support them (though in a different
>>> way, and generally not as efficiently, though Todd is more up on the
>>> details here). What aspect of object arrays are you finding lacking?
>>> C-api?
>>
>> I did not see such support when I looked at it, but given the
>> previous comment, I could easily have missed where that support is
>> provided. I'm mainly following up on Konrad's comment that his
>> automatic differentiation does not work with numarray because of the
>> missing support for object arrays. There are other applications for
>> object arrays as well. Most of the support needs to come from the
>> ufunc side.
>
> It's tucked away in numarray.objects. Unfortunately for Konrad's
> application, numarray ufuncs don't recognize that they're being passed
> an object with the special methods defined, and they won't automatically
> create 0-D object "arrays". 0-D object arrays will work just fine when
> using operators (x+y works), but not when explicitly calling the
> ufuncs (add(x,y) does not work). Both methods work fine for 0-D
> numerical arrays.

Are the 0-D object arrays necessary for this? The behaviour that Konrad needs is this (highly abstracted):

class A:
    def __add__(self, other):
        return 0.1
    def sin(self):
        return 0.5

Then:

>>> a = A()
>>> a + a
0.10000000000000001
>>> Numeric.add(a,a)
0.10000000000000001
>>> Numeric.sin(a)
0.5

The Numeric ufuncs, if the argument isn't an array, look for a method of the right name (here, sin) on the object, and call that.
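That method-call fallback can be sketched in a few lines of plain Python, with the stdlib math module standing in for Numeric's numeric core (a simplified sketch, not Numeric's actual ufunc code):

```python
import math

def sin(x):
    """A 'universal' sin in the Numeric spirit: handle numbers directly,
    otherwise delegate to the argument's own sin() method."""
    try:
        return math.sin(x)
    except TypeError:
        # Not a plain number: fall back to the object's method
        return x.sin()

class A:
    def sin(self):
        return 0.5

print(sin(0.0))  # 0.0
print(sin(A()))  # 0.5
```

Any object defining the right methods then works transparently with the "ufunc", which is exactly what Konrad's DerivVar objects rely on.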
You could define a delegate class that does this with something like

class MathFunctionDelegate:
    def __init__(self, fallback=Numeric):
        self._fallback = fallback
    def add(self, a, b):
        try:
            return a + b
        except TypeError:
            return self._fallback.add(a, b)
    def sin(self, x):
        sin = getattr(x, 'sin', None)
        if sin is None:
            return self._fallback.sin(x)
        else:
            return sin(x)
    ... etc. ...

(This could be a module, too. This just allows parameterisation.)

In ScientificPython, FirstDerivatives.py has a method of the DerivVar class that looks like this:

def sin(self):
    v = Numeric.sin(self.value)
    d = Numeric.cos(self.value)
    return DerivVar(v, map(lambda x,f=d: f*x, self.deriv))

Add something like this to the __init__:

    self._mathfuncs = MathFunctionDelegate(Numeric)

and that sin method becomes

def sin(self):
    v = self._mathfuncs.sin(self.value)
    d = self._mathfuncs.cos(self.value)
    return DerivVar(v, map(lambda x,f=d: f*x, self.deriv))

That's not quite perfect, as the user has to use a mathfuncs object also; that's why having Numeric or numarray do the delegation automatically is nice. This would work equally well with numarray (or the math or cmath modules!) replacing Numeric.

You could get fancy and be polymorphic: choose the right module to use depending on the type of the argument (Numeric arrays use Numeric, floats use math, etc.). If this was a module instead, you could have registration of types. I'll call this module numpy.
Here's a possible (low-level) usage:

import numpy
import Numeric, numarray, math, cmath
from Scientific.Functions import Derivatives

numpy.register_type(Numeric.arraytype, Numeric)
numpy.register_type(numarray.NumArray, numarray)
numpy.register_type(float, math)
numpy.register_type(complex, cmath)
numpy.register_type(Derivatives.DerivVar, Derivatives.derivate_math)
numpy.default_constructor(numarray.array)

a = numpy.array([1,2,3])    # makes a numarray
b = Numeric.array([1,2,3])  # Numeric array
print numpy.sin(a), numpy.sin(b)

Things to consider with this would be:
* how to handle a + b
* where should the registering of types be done? (Probably by the
  packages themselves)
* more complex predicates for registering handlers? (to handle
  subclasses, etc.)
etc.

Ok, I hope that's not too rambling. But the idea is that neither Numeric nor numarray need to provide the delegation ability.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From konrad.hinsen at laposte.net  Wed Jan 19 03:39:04 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 03:39:04 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED54E5.2050104@ee.byu.edu>
References: <41ED54E5.2050104@ee.byu.edu>
Message-ID:

On 18.01.2005, at 19:26, Travis Oliphant wrote:

> On another fundamental note, numarray is being sold as a replacement
> for Numeric. But, then, on closer inspection many things that Numeric
> does well, numarray is ignoring or not doing very well. I think this
> presents a certain amount of false advertising to new users, who don't
> understand the history. Most of them would probably never need the
> fanciness that

I agree with that. I regularly get questions from people who download my code and then wonder why it "still" uses NumPy instead of the "newer" numarray.
The reason is that my code has nothing to gain from numarray, as it uses many small arrays and few, if any, very large ones. I have no problem explaining that, but the fact that the question arises shows that there is a wrong perception among many newcomers of the relation between NumPy and numarray.

> comment, I could easily have missed where that support is provided.
> I'm mainly following up on Konrad's comment that his automatic
> differentiation does not work with numarray because of the missing
> support for object arrays. There are other applications for object
> arrays as well. Most of the

While I agree that object arrays are useful, they have nothing to do with the missing feature that I mentioned recently. That one concerns only ufuncs. In NumPy, they use a method call when presented with an object type they cannot handle directly. In numarray, they just produce an error message in that case.

Returning to object arrays, I have used them occasionally but never in any of my public code, because there have been lots of minor bugs concerning them in all versions of NumPy. It would be nice if numarray could do a better job there.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From konrad.hinsen at laposte.net  Wed Jan 19 04:32:01 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 04:32:01 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To:
References:
Message-ID: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net>

On 19.01.2005, at 02:56, Perry Greenfield wrote:

> It distresses me to be accused of false advertising.
> We were pretty
> up front at the beginning of the process of writing numarray that the

It's not you, or the numarray team in general, that is being accused. Actually, I doubt that any single person is responsible for the current state of misinformation. Those are the wonders of the open-source world. I saw Travis' post more as a request for clarification than as an accusation against anyone in particular. As you describe very well, there is a gap between past intents and what has actually happened.

> concern about that), but it wasn't at all clear what the consensus
> was regarding how much it could change and be acceptable. (I recall

It's probably still not clear. Perhaps there is no consensus at all.

> The current situation is far from ideal (Paul called it "insane"
> at scipy if you prefer more colorful language). What we have are
> two camps that cannot afford to give up the capabilities that are
> unique to each version. But with most of the C-API compatible, and
> a way of coding most libraries (except for Ufuncs) to be compatible
> with both, we certainly can improve the situation.

I am not sure that compatibility is really the main issue. In the typical scientific computing installation, NumPy and numarray are building blocks. Some people use them without even being aware of them, indirectly through other libraries.

In a building-block world, two bricks should either be equivalent or be able to coexist. The original intention was to make NumPy and numarray equivalent, but this is not what they are at the moment. But they do not coexist very well either. While it is easy to install both of them, every library that builds on them uses one or the other (and to make it worse, it is not always easy to figure out which one is used if both are available). Sooner or later, anyone who uses multiple libraries that are array clients is going to have a compatibility issue, which will probably be hard to understand because both sides' arrays look so very similar.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From konrad.hinsen at laposte.net  Wed Jan 19 04:48:13 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 04:48:13 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To:
References: <41ED54E5.2050104@ee.byu.edu> <41ED8AC0.8090207@ucsd.edu>
Message-ID: <3EF499F4-6A18-11D9-B1A9-000A95AB5F10@laposte.net>

On 19.01.2005, at 04:03, David M. Cooke wrote:

> That's not quite perfect, as the user has to use a mathfuncs object
> also; that's why having Numeric or numarray do the delegation
> automatically is nice.

Exactly. It is an important practical feature of automatic derivatives that you can use them with nearly any existing mathematical code. If you have to import the math functions from somewhere else, then you have to adapt all that code, which in the case of code imported from some other module means rewriting it.

More importantly, that approach doesn't scale to larger installations. If two different modules use it to provide generalized math functions, then the math functions of the two will not be interchangeable. In fact, it was exactly that kind of missing universality that was the motivation for the ufunc code in NumPy ("u" for "universal"). Before NumPy, we had math (for float) and cmath (for complex), but there was no simple way to write code that would accept either float or complex, even though that is often useful. Ufuncs would work on float, complex, arrays of either type, and "anything else" through the method call mechanism.

> If this was a module instead, you could have registration of types.
> I'll call this module numpy.
> Here's a possible (low-level) usage:

Yes, a universal module with a registry would be another viable solution. But the whole community would have to agree on one such module to make it useful.

> Things to consider with this would be:
> * how to handle a + b

a + b is just operator.add(a, b). The same mechanism would work.

> * where should the registering of types be done? (Probably by the
> packages themselves)

Probably. The method call approach has an advantage here: no registry is required. In fact, if we could start all over again, I would argue for a math function module to be part of core Python that does nothing else but convert function calls into method calls. After all, math functions are just syntactic sugar for what functionally *is* a method call.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From perry at stsci.edu  Wed Jan 19 08:44:18 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Jan 19 08:44:18 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net>
References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net>
Message-ID: <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu>

On Jan 19, 2005, at 7:31 AM, konrad.hinsen at laposte.net wrote:

>> The current situation is far from ideal (Paul called it "insane"
>> at scipy if you prefer more colorful language). What we have are
>> two camps that cannot afford to give up the capabilities that are
>> unique to each version. But with most of the C-API compatible, and
>> a way of coding most libraries (except for Ufuncs) to be compatible
>> with both, we certainly can improve the situation.
>
> I am not sure that compatibility is really the main issue. In the
> typical scientific computing installation, NumPy and numarray are
> building blocks. Some people use them without even being aware of
> them, indirectly through other libraries.
>
> In a building-block world, two bricks should be either equivalent or
> be able to coexist. The original intention was to make NumPy and
> numarray equivalent, but this is not what they are at the

Just to clarify, the intention to make them equivalent was not there originally (and some encouraged the idea that there be a break with Numpy compatibility). But that has grown to be a much bigger goal over time.

> moment. But they do not coexist very well either. While it is easy to
> install both of them, every library that builds on them uses one or
> the other (and to make it worse, it is not always easy to figure out
> which one is used if both are available). Sooner or later, anyone who
> uses multiple libraries that are array clients is going to have a
> compatibility issue, which will probably be hard to understand because
> both sides' arrays look so very similar.
>

No doubt that supporting both introduces more work, but for the most part, I think that with the exception of some parts (namely the ufunc C-api), it should be possible to write a library that supports both with little conditional code. That does mean not using some features of numarray, or depending on some of the different behaviors of Numeric (e.g., scalar coercion rules), so that requires understanding the subsets to use. And that does cost. But one doesn't need to have two separate libraries. In such cases I'm hoping there is no need to mix different flavors of arrays. You either use Numeric arrays consistently or numarrays consistently. And if the two can be unified, then this will just be an intermediate solution.
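The "little conditional code" typically reduces to choosing one array module at import time; a minimal sketch (the helper name here is hypothetical, and since neither Numeric nor numarray is installed on a modern system, the demo adds the stdlib math module as a last resort so it runs anywhere):

```python
def first_available(*names):
    """Return the first importable module from names; the pattern a
    library can use to support Numeric and numarray with one code base."""
    for name in names:
        try:
            return __import__(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (names,))

# A 2005-era library would ask for ("numarray", "Numeric"); the math
# fallback is only here so the sketch works without either installed.
N = first_available("numarray", "Numeric", "math")
print(N.__name__)
```

The rest of the library then refers only to N, confining the flavor choice to one place, which is why staying within the common subset of the two APIs matters.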
Perry From perry at stsci.edu Wed Jan 19 08:45:32 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jan 19 08:45:32 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: <41ED54E5.2050104@ee.byu.edu> Message-ID: <807F24EF-6A39-11D9-B8A8-000A95B68E50@stsci.edu> Konrad Hinsen wrote: >> comment, I could easily have missed where that support is provided. >> I'm mainly following up on Konrad's comment that his Automatic >> differentiation does not work with Numarray because of the missing >> support for object arrays. There are other applications for object >> arrays as well. Most of the > > While I agree that object arrays are useful, they have nothing to do > with the missing feature that I mentioned recently. That one concerns > only ufuncs. In NumPy, they use a method call when presented with an > object type they cannot handle directly. In numarray, they just > produce an error message in that case. > > Returning to object arrays, I have used them occasionally but never in > any of my public code, because there have been lots of minor bugs > concerning them in all versions of NumPy. It would be nice if numarray > could do a better job there. > This is a good point. In fact, when we started thinking about implementing object arrays, it looked trickier than it first appeared. One needs to ensure that all the objects referenced in the arrays have their reference counts appropriately adjusted with all operations. At that time it was quite easy to segfault Numeric using object arrays, I'm guessing for this reason. Perhaps those problems have since been fixed. I don't recall the exact manipulations that caused the segfaults, but they were simple operations; and I don't know if the same problems remain.
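The bookkeeping Perry describes can be watched from the Python side with sys.getrefcount (CPython-specific). The plain list below just stands in for the slots of an object array, and Marker is a throwaway class invented for this demonstration:

```python
# Each slot that stores an object owns one reference to it.  Python
# containers do this bookkeeping automatically; a C implementation of
# object arrays has to Py_INCREF/Py_DECREF by hand on every store,
# overwrite, and deallocation -- missing one is how segfaults happen.
import sys

class Marker(object):
    pass

obj = Marker()
base = sys.getrefcount(obj)   # baseline (includes getrefcount's temp ref)

cells = [obj, obj, obj]       # storing obj in three slots adds three refs
assert sys.getrefcount(obj) == base + 3

cells[0] = None               # overwriting a slot must release one ref
assert sys.getrefcount(obj) == base + 2

del cells                     # freeing the whole "array" releases the rest
assert sys.getrefcount(obj) == base
```

Forgetting any one of these adjustments in C either leaks memory or leaves a dangling pointer, which is the classic way object arrays segfault.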
Perry From Chris.Barker at noaa.gov Wed Jan 19 09:37:42 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed Jan 19 09:37:42 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <6332FB22-69A4-11D9-B8A8-000A95B68E50@stsci.edu> References: <41EC2D14.7000203@colorado.edu> <41ED4AD0.6060204@noaa.gov> <6332FB22-69A4-11D9-B8A8-000A95B68E50@stsci.edu> Message-ID: <41EE990F.8050709@noaa.gov> Perry Greenfield wrote: > On Jan 18, 2005, at 12:43 PM, Chris Barker wrote: >> Can anyone provide a one-paragraph description of what numarray does >> that gives it better large-array performance than Numeric? > > It has two aspects: one is speed, but for us it was more about memory. Thanks for the summary, I have a better idea of the issues now. It doesn't look, to my untrained eyes, like any of these are contrary to small array performance, so I'm hopeful that the grand convergence can occur. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From klimek at grc.nasa.gov Wed Jan 19 11:26:37 2005 From: klimek at grc.nasa.gov (Bob Klimek) Date: Wed Jan 19 11:26:37 2005 Subject: [Numpy-discussion] position of objects? In-Reply-To: <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de> References: <41E592EB.6090209@grc.nasa.gov> <39D8BD7A-65A4-11D9-B5CA-000D932805AC@embl.de> <41E82498.3070101@grc.nasa.gov> <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de> Message-ID: <41EEB4D1.6060100@grc.nasa.gov> Peter Verveer wrote: > The watershed_ift() is a somewhat unusual implementation of watershed. > In principle it does the same as a normal watershed, except that it > does not produce watershed lines. I implemented this one, because with > the current implementation of binary morphology it is a bit cumbersome > to implement more common approaches. That will hopefully change in the > future.
Well, it might turn out to still be useful. From what I'm reading, watershed from markers can do some interesting things. See the library below. > The procedure you show below seems to be based on a normal watershed. > I am not completely sure how the Image-J implementation works, but one > way to do that would be to do a watershed on the distance transform of > that image (actually you would use the negative of the distance > transform, with the local minima of that as the seeds). You could do > that with watershed_ift, in this case it would give you two labeled > objects, that in contrast to your example would however touch each > other. To do the exact same as below a watershed is needed that also > gives watershed lines. I'll give this procedure a try. Even if the labeled objects touch, some code could perhaps separate the objects by changing the touching pixels to 0. > Prompted by your earlier questions about skeletons I had a look at > what it would take to properly implement skeletons and other > morphology based algorithms, such as watersheds, and I found that I > need to rethink and reimplement the basic morphology operations first. ... Well, improving things is always good but from what I can see it's not bad right now. If you are going to be changing things, one minor suggestion from me would be to make indices of label() (and sum(), mean(), ...) and find_objects() the same. For example, in an image containing two objects, label() returns a list of three: 0, 1, and 2, where 0 is the background and the two objects are labeled 1 and 2. But find_objects() returns a list of two (indices in the list being 0 and 1). It's not a big deal but in a for-loop it gets a little messy. It also forces me to do things like the following example, which requires the loop to start at 1 (to skip the background) and run the range to n+1 to capture the second object.
labeled, n = ND.label(binImage)
objList = ND.sum(binImage, labeled, range(n+1))
for i in range(1, len(objList)):
    print 'object %d pixels: %d ' % (i, objList[i])

On a different note, I came across a morphology library which looks very promising. http://www.mmorph.com/pymorph/ I've contacted one of the authors of the package (R. Lotufo) and he indicated that they are thinking about updating it to run under numarray probably in about 6 months. Perhaps you and they could join forces. The only potential problem I see is that their code is designed strictly for 2D grayscale and binary images whereas you are trying to keep it general for any number of dimensions. Regards, Bob From konrad.hinsen at laposte.net Wed Jan 19 13:45:03 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Wed Jan 19 13:45:03 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu> References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net> <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu> Message-ID: <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net> On 19.01.2005, at 17:43, Perry Greenfield wrote: > Just to clarify, the intention to make them equivalent was not > originally true (and some encouraged the idea that there be a break > with Numpy compatibility). But that has grown to be a much bigger goal > over time. If my memory serves me well, the original intention was to have a new implementation that could replace the old one feature- and performance-wise but without promising API compatibility. What we have now is the opposite. > No doubt that supporting both introduces more work, but for the most > part, I think that with the exception of some parts (namely the ufunc > C-API), it should be possible to write a library that supports both with > little conditional code. Yes, certainly.
But not everybody is going to do it, for whatever reasons, if only for lack of time or dependencies on exclusive features. So one day, there will be library A that requires NumPy and library B that requires numarray (that day may already have arrived). If I want to use both A and B in my code, I can expect to run into problems and unpleasant debugging sessions. Konrad. From perry at stsci.edu Wed Jan 19 14:10:02 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jan 19 14:10:02 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net> References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net> <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu> <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net> Message-ID: I'd like to clarify our position on this a bit in case previous messages have given a wrong or incomplete impression. 1) We don't deny that small array performance is important to many users. We understand that. But it generally isn't important for our projects, and in the list of things to do for numarray, we can't give it high priority this year. We have devoted resources to this issue in the past couple of years (but without sufficient success to persuade many to switch for that reason alone), and it is hard to continue to put much more resources into this not knowing whether it will be enough of an improvement to satisfy those that really need it. 2) This doesn't mean we think it isn't important to add as soon as it can be done. That is, we aren't trying to prevent such improvements from being made.
3) We hope that there are people out there for whom this is important who would like to see a numarray/Numeric unification, have some experience with the internals of one or the other (or are willing to learn), and are willing to devote the time to help make numarray faster (if you can rewrite everything from scratch and satisfy both worlds, that would make us just as happy :-). 4) We are willing to help in the near term as far as helping explain how things currently work, where possible improvements can be made, helping in design discussions, reviewing proposed or actual changes, and doing the testing and integration of such changes. 5) But the onus of doing the actual implementation can't be on us for reasons I've already given. But besides those I think it is important that whoever does this should have a strong stake in the success of this (i.e., the performance improvements are important for their projects). Perry From perry at stsci.edu Thu Jan 20 06:37:36 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jan 20 06:37:36 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net> <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu> <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net> Message-ID: On a different note, we will update the numarray home page to better reflect the current situation with regard to Numeric, particularly to clarify that there is no official consensus regarding it as a replacement for Numeric (but also to spell out what the differences are so that people wondering about which to use will have a better idea to base their choice on, and to give an idea of what our development plans and priorities are). We're fairly busy at the moment so it may take a few days for such updates to the web page to happen. I'll post a message when that happens so that those interested can look at them and provide comments if they feel they are not accurate.
I'll also contact Paul Dubois about updating the numpy page. Perry From jrennie at csail.mit.edu Thu Jan 20 07:25:49 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Thu Jan 20 07:25:49 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <20050120152427.GC30466@csail.mit.edu> I have access to a variety of Intel machines running Debian Sarge, and I'm trying to decide between numarray and Numeric for some experiments I'm about to run, so I thought I'd try out this benchmark. I need fast matrix multiplication and element-wise operations. Here are the results I see:

Celeron/2.8GHz
--------------
Matlab:   0.0475  1.44  5.78
Numeric:  0.0842  1.19  6.28
numarray: 7.62    9.78  Floating point exception

Pentium4/2.8GHz
---------------
Matlab:   0.0143  1.00  3.08
Numeric:  0.0653  1.19  6.26
numarray: 3.46    8.30  Floating point exception

DualXeon/3.06GHz
----------------
Matlab:   0.0102  0.886  2.71
Numeric:  0.0272  10.2   2.46
numarray: 2.23    3.43   Floating point exception

Numarray performance is pitiful. Numeric ain't bad, except for that matrixmultiply on the Xeon. As luck would have it, our cpu-cycle-servers are all Xeons, and the main big computations I have to do are matrix multiplies... Grrr... All three machines are Debian Sarge with atlas3-sse2 plus all the python2.3 packages installed. I had to include /usr/lib/atlas/sse2 in my LD_LIBRARY_PATH. Anyone have any clue why the Xeon would balk at the Numeric matrixmultiply? Thinking it might be an atlas3-sse2 issue, I tried atlas-sse:

Xeon/atlas3-sse/Numeric:  0.0269  10.2  2.44
Xeon/atlas3-sse/numarray: 2.24    3.41  2.48

Apparently, there's a bug in the sse2 libraries that numarray is tripping... Still horrible Numeric/matrixmultiply performance... Interesting that sse2 doesn't provide a performance boost over sse. I tried it on another Xeon machine... same bad Numeric/matrixmultiply performance.
I tried atlas3-base (386 instructions only):

Xeon/atlas3-base/Numeric:  0.0269  10.2  2.60
Xeon/atlas3-base/numarray: 2.23    3.41  2.54

Sheesh! No worse than the libraries w/ sse instructions... But still, no improvement in the Numeric/matrixmultiply test. Next, refblas3/lapack3:

Xeon/Numeric:  0.0271  3.45  2.72
Xeon/numarray: 2.24    3.42  2.62

Progress! Though, the Numeric/matrixmultiply is still four times slower than Matlab... As far as I can tell, I'm out of (Debian Sarge) libraries to try... Any ideas as to why the Numeric matrixmultiply would be so slow on the Xeon? Thanks, Jason P.S. I had to move the import statements to the top of the file to get benchmark.py to work. As a sanity check, I tried only importing sys, time, Numeric, and RandomArray, defining test10. I then called test10(). Same results as above. From perry at stsci.edu Thu Jan 20 07:34:00 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jan 20 07:34:00 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <20050120152427.GC30466@csail.mit.edu> References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu> Message-ID: Could you at least give enough information to understand what the benchmark is? (Size of arrays, what the 3 columns are, etc). What is the benchmark code? I see references to benchmark.py but there doesn't appear to be any attachment. Thanks, Perry On Jan 20, 2005, at 10:24 AM, Jason Rennie wrote: > I have access to a variety of intel machines running Debian Sarge, and > I'm trying to decide between numarray and Numeric for some experiments > I'm about to run, so I thought I'd try out this benchmark. I need > fast matrix multiplication and element-wise operations.
Here are the > results I see: > > Celeron/2.8GHz > -------------- > Matlab: 0.0475 1.44 5.78 > Numeric: 0.0842 1.19 6.28 > numarray: 7.62 9.78 Floating point exception > > Pentium4/2.8GHz > --------------- > Matlab: 0.0143 1.00 3.08 > Numeric: 0.0653 1.19 6.26 > numarray: 3.46 8.30 Floating point exception > > DualXeon/3.06GHz > ---------------- > Matlab: 0.0102 0.886 2.71 > Numeric: 0.0272 10.2 2.46 > numarray: 2.23 3.43 Floating point exception > > Numarray performance is pitiful. Numeric ain't bad, except for that > matrixmultiply on the Xeon. As luck would have it, our > cpu-cycle-servers are all Xeons, and the main big computations I have > to do are matrix multiplies... Grrr... > > All three machines are Debian Sarge with atlas3-sse2 plus all the > python2.3 packages installed. I had to include /usr/lib/atlas/sse2 in > my LD_LIBRARY_PATH. Anyone have any clue why the Xeon would balk at > the Numeric matrixmultiply? Thinking it might be an atlas3-sse2 > issue, I tried atlas-sse: > > Xeon/atlas3-sse/Numeric: 0.0269 10.2 2.44 > Xeon/atlas3-sse/numarray: 2.24 3.41 2.48 > > Apparently, there's a bug in the sse2 libraries that numarray is > tripping... Still horrible Numeric/matrixmultiply > performance... Interesting that sse2 doesn't provide a performance > boost over sse. I tried it on another Xeon machine... same bad > Numeric/matrixmultiply performance. I tried atlas3-base (386 > instructions only): > > Xeon/atlas3-base/Numeric: 0.0269 10.2 2.60 > Xeon/atlas3-base/numarray: 2.23 3.41 2.54 > > Sheesh! No worse than the libraries w/ sse instructions... But > still, no improvement in the Numeric/matrixmultiply test. Next, > refblas3/lapack3: > > Xeon/Numeric: 0.0271 3.45 2.72 > Xeon/numarray: 2.24 3.42 2.62 > > Progress! Though, the Numeric/matrixmultiply is still four times > slower than Matlab... > > As far as I can tell, I'm out of (Debian Sarge) libraries to > try... Any ideas as to why the Numeric matrixmultiply would be so slow > on the Xeon?
> > Thanks, > > Jason > > P.S. I had to move the import statements to the top of the file to get > benchmark.py to work. As a sanity check, I tried only importing sys, > time, Numeric, and RandomArray, defining test10. I then called > test10(). Same results as above. From jrennie at csail.mit.edu Thu Jan 20 07:45:45 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Thu Jan 20 07:45:45 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu> Message-ID: <20050120154419.GF30466@csail.mit.edu> On Thu, Jan 20, 2005 at 10:33:48AM -0500, Perry Greenfield wrote: > Could you at least give enough information to understand the what the > benchmark is? (Size of arrays, what the 3 columns are, etc). What is > the benchmark code? I see references to benchmark.py but there doesn't > appear to be any attachment. It's Simon Burton's benchmark.py and bench.m code. Only modification I made was to move the imports to the top. Matlab code is identical. See attached for the exact code. Jason -------------- next part -------------- A non-text attachment was scrubbed...
Name: benchmark.py Type: text/x-python Size: 1552 bytes Desc: not available URL: -------------- next part --------------

a=randn(1000,1000);
b=randn(1000,1000);

N=100;
t0 = cputime;
for i=1:N
  c = a+b;
end
t = cputime-t0;
t = t/N

N=10;
t0 = cputime;
for i=1:N
  c = a*b;
end
t = cputime-t0;
t = t/N

a=randn(500,500);
N=10;
t0 = cputime;
for i=1:N
  c = eig(a);
end
t = cputime-t0;
t = t/N

From Peter.Chang at nottingham.ac.uk Thu Jan 20 08:14:09 2005 From: Peter.Chang at nottingham.ac.uk (Peter Chang) Date: Thu Jan 20 08:14:09 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <20050120154419.GF30466@csail.mit.edu> Message-ID: On Thu, 20 Jan 2005, Jason Rennie wrote: > It's Simon Burton's benchmark.py and bench.m code. Only modification > I made was to move the imports to the top. Matlab code is identical. > See attached for the exact code. There are errors in benchmark.py and the Matlab code isn't identical:
1) a missing division by count in test01()
2) a different default value for count in test11()
3) the Matlab code uses normally distributed random numbers whereas the Numeric/numarray code uses uniformly distributed random numbers.
Peter From ivilata at carabos.com Thu Jan 20 08:14:56 2005 From: ivilata at carabos.com (Ivan Vilata i Balaguer) Date: Thu Jan 20 08:14:56 2005 Subject: [Numpy-discussion] 'copy' argument in records.array Message-ID: <20050120161346.GC4102@tardis.terramar.selidor.net> Hi all! I have seen that records.array() has a boolean 'copy' argument which indicates whether to copy the 'sequence' object when it already is an array. However, the written documentation does not mention it anywhere.
Is this an officially supported argument? Thank you, Ivan PS: Would you mind cc:'ing me? I am not subscribed to the list. import disclaimer -- Ivan Vilata i Balaguer >qo< http://www.carabos.com/ Cárabos Coop. V. V V Enjoy Data "" -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From jmiller at stsci.edu Thu Jan 20 08:27:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 20 08:27:00 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <20050120154419.GF30466@csail.mit.edu> References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu> <20050120154419.GF30466@csail.mit.edu> Message-ID: <1106238362.16482.87.camel@halloween.stsci.edu> On Thu, 2005-01-20 at 10:44, Jason Rennie wrote: > On Thu, Jan 20, 2005 at 10:33:48AM -0500, Perry Greenfield wrote: > > Could you at least give enough information to understand the what the > > benchmark is? (Size of arrays, what the 3 columns are, etc). What is > > the benchmark code? I see references to benchmark.py but there doesn't > > appear to be any attachment. > > It's Simon Burton's benchmark.py and bench.m code. Only modification > I made was to move the imports to the top. Matlab code is identical. > See attached for the exact code. > > Jason Sigh. We discussed this some last week and as a result I ported Numeric's dotblas to numarray. Here's what I get running from numarray CVS and Numeric-23.7 both built with the latest blas, LAPACK, and ATLAS I could find and run on a 1.7 GHz P-IV:

t= 0.0697661995888
t= 0.910463786125
t= 9.6143862009
t= 6.44409584999
t= 0.939763069153
t= 9.36037609577

Note that there's a bug in the benchmark (which has already been reported on this list) which explains the 100x difference in the first test case.
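The effect of that bug is easy to reproduce: the benchmark timed a loop of `count` iterations but, in one test, forgot to divide the elapsed time by `count`, so the reported per-operation time was inflated by the iteration count. A corrected timing helper looks like the following sketch (`time_per_call` is an illustrative name, not a function from benchmark.py):

```python
# Average the cost of one call over `count` repetitions.  Reporting the
# raw elapsed total instead of dividing by `count` is exactly the kind
# of bug that makes a result look ~100x slower than it really is.
import time

def time_per_call(func, count=100):
    """Return the average wall-clock seconds for a single func() call."""
    t0 = time.time()
    for _ in range(count):
        func()
    return (time.time() - t0) / count   # <-- the division that was missing
```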
Here's the results I get with a corrected version of the benchmark:

numarray + : 0.0632889986038
numarray matrixmultiply : 0.91903450489
numarray eigenvalues : 8.78720998764
Numeric + : 0.0704428911209
Numeric matrixmultiply : 0.912343025208
Numeric eigenvalues : 8.919506073

I think this is a closed issue, at least as far as the numarray/Numeric comparison goes. -------------- next part -------------- A non-text attachment was scrubbed... Name: benchmark.py Type: text/x-python Size: 2051 bytes Desc: not available URL: From jrennie at csail.mit.edu Thu Jan 20 10:18:40 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Thu Jan 20 10:18:40 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: References: <20050120154419.GF30466@csail.mit.edu> Message-ID: <20050120181752.GG30466@csail.mit.edu> On Thu, Jan 20, 2005 at 04:12:56PM +0000, Peter Chang wrote: > 1) a missing division by count in test01() > 2) a different default value for count in test11() My bad. Should have used Todd Miller's revised version. > 3) the Matlab code uses normally distributed random numbers whereas the > Numeric/numarray code uses uniformly distributed random numbers. Good point. Revised numbers:

                                        +      mmult  eigen
Xeon/3.06GHz/refblas3/lapack3/numarray: .0224  3.14   2.63
Xeon/3.06GHz/refblas3/lapack3/Numeric:  .0268  3.45   2.73
Xeon/3.06GHz/atlas3-base/numarray:      .0225  3.40   2.52
Xeon/3.06GHz/atlas3-base/Numeric:       .0268  1.04   2.57
Xeon/3.06GHz/atlas3-sse/numarray:       .0224  3.42   2.54
Xeon/3.06GHz/atlas3-sse/Numeric:        .0269  1.05   2.58
Xeon/3.06GHz/atlas3-sse2/numarray:      .0225  3.41   FP Exception
Xeon/3.06GHz/atlas3-sse2/Numeric:       .0269  FP Exc FP Exception
Celeron/2.8GHz/atlas-base/numarray:     .0814  11.3   6.53
Celeron/2.8GHz/atlas-base/Numeric:      .0918  1.70   6.50
P4/2.8GHz/atlas-base/numarray:          .0262  4.58   2.96
P4/2.8GHz/atlas-base/Numeric:           .0318  1.15   3.00
Xeon/3.06GHz/Matlab:                    .0102  .886   2.70
P4/2.8GHz/Matlab:                       .0143  1.00   3.07

Very comparable (Numeric vs.
numarray) except matrixmultiply, which I guess is explained by the Debian sarge python2.3-numarray (v1.1.1) not using the dotblas package/routine, as Todd Miller explained in an earlier post. I'll be looking forward to the Debian numarray release that includes dotblas. Looks like it will edge out Numeric across the board (on the Xeon) once that's in place. For now, I'll be happy with Numeric/atlas3-base. The Matlab numbers use uniform random matrices. All code attached. Todd, Peter: sorry for the confusion I propagated. Jason -------------- next part -------------- A non-text attachment was scrubbed... Name: miller-benchmark.py Type: text/x-python Size: 2055 bytes Desc: not available URL: -------------- next part --------------

a=rand(1000,1000);
b=rand(1000,1000);

N=100;
t0 = cputime;
for i=1:N
  c = a+b;
end
t = cputime-t0;
t = t/N

N=10;
t0 = cputime;
for i=1:N
  c = a*b;
end
t = cputime-t0;
t = t/N

a=rand(500,500);
N=10;
t0 = cputime;
for i=1:N
  c = eig(a);
end
t = cputime-t0;
t = t/N

From simon at arrowtheory.com Thu Jan 20 16:17:49 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Jan 20 16:17:49 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <20050120152427.GC30466@csail.mit.edu> References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu> Message-ID: <20050121111610.678e5f6b.simon@arrowtheory.com> On Thu, 20 Jan 2005 10:24:27 -0500 Jason Rennie wrote: > All three machines are Debian Sarge with atlas3-sse2 plus all the > python2.3 packages installed. I had to include /usr/lib/atlas/sse2 in > my LD_LIBRARY_PATH. Anyone have any clue why the Xeon would balk at > the Numeric matrixmultiply? Thinking it might be an atlas3-sse2 > issue, I tried atlas-sse: > > Xeon/atlas3-sse/Numeric: 0.0269 10.2 2.44 > Xeon/atlas3-sse/numarray: 2.24 3.41 2.48 > > Apparently, there's a bug in the sse2 libraries that numarray is > tripping... Yes, we have the same problem here (Xeons with debian-sarge).
It all works fine on the base atlas but blows up on atlas3-sse2. I also compiled ATLAS 3.6 for sse2 and the same floating point exception happens. The next thing would be to write a simple c program that trips this exception, because I'm not convinced it is ATLAS's fault. Doesn't Numeric use the same library calls, and if so, why doesn't it also trip this exception? One other thing I noticed was that atlas3-sse was not noticeably faster than atlas3-base. (And I'm sorry about that bad benchmark code..) bye for now, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From perry at stsci.edu Thu Jan 20 16:57:40 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jan 20 16:57:40 2005 Subject: [Numpy-discussion] Numeric web page In-Reply-To: <20050121111610.678e5f6b.simon@arrowtheory.com> Message-ID: Enthought has agreed to host the Numeric home page and will make the necessary changes to the content. On the other front, when I reviewed the current numarray home page, it seemed pretty much fine as is. It could stand some updating, and more detail on our development plans, but it didn't appear misleading to me. If anyone disagrees let me know what wording you consider improper or incorrect. Perry From yunmao at gmail.com Thu Jan 20 17:32:26 2005 From: yunmao at gmail.com (Yun Mao) Date: Thu Jan 20 17:32:26 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array Message-ID: <7cffadfa05012017293f833a87@mail.gmail.com> Hi everyone, I have two questions: 1. When I do v = u[:, :], it seems u and v still point to the same memory. e.g. When I do v[1,1]=0, u[1,1] will be zeroed out as well. What's the right way to duplicate an array? Now I have to do v = dot(u, identity(N)), which is kind of silly. 2. Is there a way to do Matlab style slicing? e.g.
if I have

i = array([0, 2])
x = array([1.1, 2.2, 3.3, 4.4])

I wish y = x(i) would give me [1.1, 3.3] Now I'm using map, but it gets a little annoying when there are two dimensions. Any ideas? Thanks!!! -Y From simon at arrowtheory.com Thu Jan 20 17:45:42 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Jan 20 17:45:42 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <7cffadfa05012017293f833a87@mail.gmail.com> References: <7cffadfa05012017293f833a87@mail.gmail.com> Message-ID: <20050121124417.15da0438.simon@arrowtheory.com> On Thu, 20 Jan 2005 20:29:26 -0500 Yun Mao wrote: > Hi everyone, > I have two questions: > 1. When I do v = u[:, :], it seems u and v still point to the same > memory. e.g. When I do v[1,1]=0, u[1,1] will be zero out as well. > What's the right way to duplicate an array? Now I have to do v = > dot(u, identity(N)), which is kind of silly. v = na.array(u) > > 2. Is there a way to do Matlab style slicing? e.g. if I have > i = array([0, 2]) > x = array([1.1, 2.2, 3.3, 4.4]) > I wish y = x(i) would give me [1.1, 3.3] > Now I'm using map, but it gets a little annoying when there are two > dimensions. Any ideas? have a look at the "take" method. Simon.
Turns out the FP exception is a known bug in libc. Numeric does trip it (see my "revised numbers" post). For info on the bug, see, e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294 I also found that it was discussed earlier on this list. Search for "floating point exception weirdness". Jason From stephen.walton at csun.edu Thu Jan 20 19:11:16 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 20 19:11:16 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <20050121124417.15da0438.simon@arrowtheory.com> References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> Message-ID: <41F07291.7070507@csun.edu> Simon Burton wrote: >On Thu, 20 Jan 2005 20:29:26 -0500 >Yun Mao wrote: > > > >>What's the right way to duplicate an array? >> >> > >v = na.array(u) > > v=u.copy() From konrad.hinsen at laposte.net Fri Jan 21 00:50:14 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Fri Jan 21 00:50:14 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <7cffadfa05012017293f833a87@mail.gmail.com> References: <7cffadfa05012017293f833a87@mail.gmail.com> Message-ID: <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net> On 21.01.2005, at 02:29, Yun Mao wrote: > 1. When I do v = u[:, :], it seems u and v still point to the same > memory. e.g. When I do v[1,1]=0, u[1,1] will be zero out as well. > What's the right way to duplicate an array? Now I have to do v = > dot(u, identity(N)), which is kind of silly. There are several ways to make a copy of an array. My personal preference is

import copy
v = copy(u)

because this is a general mechanism that works for all Python objects. > 2. Is there a way to do Matlab style slicing? e.g. if I have > i = array([0, 2]) > x = array([1.1, 2.2, 3.3, 4.4]) > I wish y = x(i) would give me [1.1, 3.3] > Now I'm using map, but it gets a little annoying when there are two > dimensions.
Any ideas?

y = Numeric.take(x, i)

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From konrad.hinsen at laposte.net Fri Jan 21 02:20:55 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Fri Jan 21 02:20:55 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net>
Message-ID: <33B69E6E-6B96-11D9-AB0C-000A95999556@laposte.net>

On Jan 21, 2005, at 9:48, konrad.hinsen at laposte.net wrote:

> There are several ways to make a copy of an array. My personal
> preference is
>
> import copy
> v = copy(u)

That's of course

import copy
v = copy.copy(u)

or

from copy import copy
v = copy(u)

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
---------------------------------------------------------------------

From faltet at carabos.com Fri Jan 21 05:16:12 2005
From: faltet at carabos.com (Francesc Altet)
Date: Fri Jan 21 05:16:12 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
Message-ID: <200501211413.51663.faltet@carabos.com>

Hi List,

I would like to make a formal proposal regarding the subject of previous discussions on this list. This message is a bit long, but I've tried my best to expose my thoughts as clearly as possible.

Is Numarray a good replacement of Numeric?
==========================================

There has been some debate lately regarding the appropriateness of claiming numarray to be a replacement for Numeric. Perhaps the main source for this claim has been the home page of the Numeric project [1]:

"""
If you are new to Numerical Python, please use Numarray. The older module, Numeric, is unsupported. At this writing Numarray is slower for very small arrays but faster for large ones. Numarray contains facilities to help you convert older code to use it. Some parts of the community have not made the switch yet but the Numarray libraries have been carefully named differently so that Numeric and Numarray can coexist in one application.
"""

This paragraph gives the impression that Numeric was going to be deprecated. While I recognize that I was among those whom this statement led to think of numarray as a kind of 'Next Generation of Numeric', it now seems (from the previous discussions) that this was a somewhat unfortunate/misleading observation. In fact, Perry Greenfield, one of the main authors of numarray, will be taking some steps to correct that observation in the near future [2].

However, I'd like to believe (and with me, quite a few more people for sure) that the mentioned statement, apart from creating some confusion, would eventually ease the long-term convergence of both packages. This would be great not only to unify efforts, but also to allow the inclusion of Numeric/Numarray in the Python Standard Library, which would be a Good Thing.

Numarray vs Numeric: Pros and Cons
==================================

It's worth remembering that Numeric was a major breakthrough in introducing the capability to deal with large (homogeneous) datasets in Python in a very efficient manner. In my opinion Numarray is, generally speaking, a very good package as well, with many interesting new features that Numeric lacks.
Among the main advantages of Numarray vs Numeric I can list the following (although I may be a bit misled here by my own use cases of both libraries):

- Memory-mapped objects: Allow working with on-disk numarray objects as if they were in memory.

- RecArrays: Objects that allow dealing with heterogeneous datasets (tables) in an efficient manner. This ought to be very beneficial in many fields.

- CharArrays: Allow working with large amounts of fixed- and variable-length strings. I find this implementation much more powerful than Numeric's.

- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2]) and x = 2*arange(6), x[ind] results in array([8, 8, 0, 4])

- New design interface: We should not forget that numarray has been designed from the ground up with Python Library integration in mind (or at least, this is my impression). So, it should have more chances (if there is some hope) of entering the Standard Library than Numeric.

[See [3] for a more accurate description of the differences.]

At this point, it would also be fair to recognize the important effort that has been made by the Numarray crew (and others) to create a fairly good replacement for Numeric: the API is getting closer bit by bit, the numerix module makes it easier for an application to support both Numeric and numarray (see [5] for a concrete case of switching between Numeric and Numarray in SciPy or [6] for matplotlib), the current effort to support Numarray in SciPy, and last but not least, their good responsiveness to enhancements in that respect.

The real problem for Numarray: Object Creation Time
===================================================

On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a banal thing at first glance, but it is not in many cases.
One example recently reported in this list is:

>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334

So, numarray performs 10 times slower than Numeric here not because its indexing code is 10 times slower, but mainly because object creation is between 5 and 10 times slower, and the loop above implies an object creation on each iteration.

Another use case where object creation time is important can be seen in [4].

Proposal for making of Numarray a real Numeric 'NG' (Next Generation)
=====================================================================

Provided that the most important reason (IMO) not to consider Numarray a good replacement for Numeric is object creation time, I would like to propose a coordinated effort to solve this precise issue.

First of all, it would be nice if the people most experienced with Numarray (i.e. the Numarray crew) would give a deep analysis of this, ending with a series of small, self-contained benchmark files that clearly expose the possible bottlenecks. This may be hard to do, but it is crucial. Once the problem has been reduced to optimizing these small, self-contained benchmarks, they can be made publicly accessible together with an explanation of what the problem is and what the benchmarks are intended for. After this, I suggest a call for contributions (on this list and the scipy list, for example) on optimizing this code, and sparking discussions on that (a Wiki can work great here). I'm pretty sure that there is enough brainpower and enough challenge-hungry people on these lists to contribute to solving the problem.
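The per-row loop benchmarked above can still be reproduced; here is a hedged sketch using modern NumPy as a stand-in for both 2005-era packages (an assumption — the `numpy` import is not from the original post), showing that each `a[i]` subscript manufactures a fresh array object even though no data is copied, which is exactly the per-iteration cost being discussed:

```python
from timeit import Timer
import numpy as np  # assumption: modern stand-in for Numeric/numarray

a = np.arange(2000)
a.shape = (1000, 2)

# Each subscript builds a brand-new view object over the same buffer:
r0, r1 = a[0], a[0]
print(r0 is r1)  # False: two distinct wrapper objects, identical data

# The benchmark from the post, verbatim except for the import:
setup = "import numpy; a = numpy.arange(2000); a.shape = (1000, 2)"
t = Timer("for i in range(len(a)): row = a[i]", setup).timeit(number=100)
print(t > 0.0)  # the absolute number depends entirely on the machine
```

The object-identity check is the point: 100 passes over 1000 rows means 100,000 small objects created and destroyed, so per-object creation cost dominates.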
If after these efforts there are issues that can't be solved yet, at least the problem would be much more narrowly defined, and many more people could think about it (hopefully, the solution may not depend on the intricacies of Numeric/Numarray), so it may be possible to send it to the general Python list and hope that some guru would be willing to help us with it.

Well, this is my proposal. Uh, sorry for the length of the message. Perhaps you may think that I've smoked too much and maybe you are right. However, I'm so convinced that such a Numeric/Numarray unification is going to be a Very Good Thing that I have spent some time making this proposal (and look forward to contributing in some way or another if this is going to be done).

Cheers,

[1] http://www.pfdubois.com/numpy/
[2] http://sourceforge.net/mailarchive/message.php?msg_id=10608642
[3] http://stsdas.stsci.edu/numarray/numarray-1.1.html/node18.html
[4] http://sourceforge.net/mailarchive/message.php?msg_id=10582525
[5] http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2299767
[6] http://matplotlib.sourceforge.net/matplotlib.numerix.html

--
Francesc Altet
http://www.carabos.com/
Càrabos Coop. V. -- Enjoy Data

From aisaac at american.edu Fri Jan 21 06:29:00 2005
From: aisaac at american.edu (Alan G Isaac)
Date: Fri Jan 21 06:29:00 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <33B69E6E-6B96-11D9-AB0C-000A95999556@laposte.net>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net> <33B69E6E-6B96-11D9-AB0C-000A95999556@laposte.net>
Message-ID:

On Fri, 21 Jan 2005, konrad.hinsen at laposte.net apparently wrote:

> There are several ways to make a copy of an array.

Are there any other considerations in making this choice?
Thank you,
Alan Isaac

From jrennie at csail.mit.edu Fri Jan 21 07:58:55 2005
From: jrennie at csail.mit.edu (Jason Rennie)
Date: Fri Jan 21 07:58:55 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
In-Reply-To: <200501211413.51663.faltet@carabos.com>
References: <200501211413.51663.faltet@carabos.com>
Message-ID: <20050121155712.GD16747@csail.mit.edu>

On Fri, Jan 21, 2005 at 02:13:45PM +0100, Francesc Altet wrote:
> """
> If you are new to Numerical Python, please use Numarray. The older module,
> Numeric, is unsupported. At this writing Numarray is slower for very small
> arrays but faster for large ones. Numarray contains facilities to help you
> convert older code to use it. Some parts of the community have not made the
> switch yet but the Numarray libraries have been carefully named differently
> so that Numeric and Numarray can coexist in one application.
> """

Another problem is that Numeric is extremely poorly advertised/marketed.

- There is no single keyword for Numeric: it is referred to as "Numerical", "Numeric" and "numpy". Both "Numerical" and "numpy" are also used to refer to numarray.

- Numeric does not have a home page of its own. The Sourceforge "Numerical" page lists both numarray and Numeric (which, coincidentally, is referred to as "numpy").

- The #1 & #2 Google results for "numeric python" are the numpy.org page, which is out-of-date, and advertises numarray as being a replacement for Numeric. Plus, what appears to be the main link for Numeric, "Release 22.0", points to a page with both numarray and Numeric releases, numarray first, and Numeric releases named "numpy". Could you try to be more confusing?

- None of the top 10 Google links for "numeric python" point to the Sourceforge page.

- A "numeric python" search on sourceforge lists 24 projects before the Numerical Python page.
Jason

From klimek at grc.nasa.gov Fri Jan 21 08:22:01 2005
From: klimek at grc.nasa.gov (Bob Klimek)
Date: Fri Jan 21 08:22:01 2005
Subject: [Numpy-discussion] position of objects?
In-Reply-To: <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de>
References: <41E592EB.6090209@grc.nasa.gov> <39D8BD7A-65A4-11D9-B5CA-000D932805AC@embl.de> <41E82498.3070101@grc.nasa.gov> <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de>
Message-ID: <41F12C47.3080404@grc.nasa.gov>

Peter Verveer wrote:

> The procedure you show below seems to be based on a normal watershed.
> I am not completely sure how the Image-J implementation works, but one
> way to do that would be to do a watershed on the distance transform of
> that image (actually you would use the negative of the distance
> transform, with the local minima of that as the seeds). You could do
> that with watershed_ift, in this case it would give you two labeled
> objects, that in contrast to your example would however touch each
> other. To do the exact same as below a watershed is needed that also
> gives watershed lines.

Hi Peter,

I thought I'd try your suggestion above but I'm falling short. Where I stall is at local minima (or local maxima if you don't invert the image). Currently there is no local minima (or maxima) function in nd_image, is there (or am I missing it)?

Bob

From perry at stsci.edu Fri Jan 21 08:49:10 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Jan 21 08:49:10 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
In-Reply-To: <200501211413.51663.faltet@carabos.com>
References: <200501211413.51663.faltet@carabos.com>
Message-ID: <15F3A864-6BCC-11D9-B8A8-000A95B68E50@stsci.edu>

On Jan 21, 2005, at 8:13 AM, Francesc Altet wrote:

> Hi List,
>
> I would like to make a formal proposal regarding with the subject of
> previous discussions in that list. This message is a bit long, but I've
> tried my best to expose my thoughts as clearly as possible.
> [...]
I think Francesc has summarized things very well and offered up some good ideas for how to proceed in speeding up small array performance. Particularly key is understanding exactly where the time is going in the processing. We (read Todd, really) have some suspicions about what the bottlenecks are, and I'll include his conclusions about these below. I just caution that getting good benchmark information to determine this correctly can be more difficult than it would first seem, for the reasons he mentions. But anyway, I'm certainly supportive of getting an effort going to address this issue (again, we can give support as we described before, but if it is to be done in the near term, others will have to actually do most of the work). A wiki page sounds like a good idea, and it probably should be hosted on scipy.org. If we see any response to this I'll ask to have one set up.

> The real problem for Numarray: Object Creation Time
> ===================================================
>
> On the other hand, the main drawback of Numarray vs Numeric is, in my
> opinion, its poor performance regarding object creation. This might look
> like a banal thing at first glance, but it is not in many cases. One example
> recently reported in this list is:
>
> >>> from timeit import Timer
> >>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
> 0.12782907485961914
> >>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
> 1.2013700008392334
>
> So, numarray performs 10 times slower than Numeric not because its indexing
> access code would be 10 times slower, but mainly due to the fact that object
> creation is between 5 and 10 times slower, and the loop above implies an
> object creation on each iteration.
>
> Other case of use where object creation time is important can be seen in
> [4].

It is perhaps too narrow to focus on just array creation. It likely is the biggest factor, but there may be other issues as well. For the above case it's possible that the indexing mechanism itself can be sped up, and that is likely part of the ratio of speeds being 5 to 10 times slower.

Todd's comments:

Here's a little input for how someone can continue looking at this. Here's the semi-solid info I have at the moment on ufunc execution time; included within it is a breakdown of some of the costs in the C-API function NA_NewAllFromBuffer() located in newarray.ch. I haven't been working on this; this is where I left off.

My timing module, numarray.teacup, may be useful to someone else trying to measure timing; the accuracy of the measurements is questionable, either due to bugs or the intrusiveness of the inline code disturbing the processor cache (it does dictionary operations for each timing measurement). I tried to design it so that timing measurements can be nested, with limited success. Nevertheless, as a rough guide that provides microsecond-level measurements, I have found it useful. It only works on linux. Build numarray like this:

% rm -rf build
% python setup.py install --timing --force

Then do this to see the cost of the generated output array in an add():

>>> import numarray as na
>>> a = na.arange(10)
>>> b = a.copy()
>>> for i in range(101):
...     jnk = na.add(a,b)
...
>>> import numarray.teacup as tc
>>> tc.report()
Src/_ufuncmodule.c    _cache_exec2     fast         count: 101    avg_usec: 4.73   cycles: 0
Src/_ufuncmodule.c    _cache_lookup2   broadcast    count: 101    avg_usec: 4.46   cycles: 0
Src/_ufuncmodule.c    _cache_lookup2   hit or miss  count: 101    avg_usec: 27.50  cycles: 6
Src/_ufuncmodule.c    _cache_lookup2   hit output   count: 100    avg_usec: 25.22  cycles: 5
Src/_ufuncmodule.c    _cache_lookup2   internal     count: 101    avg_usec: 5.20   cycles: 0
Src/_ufuncmodule.c    _cache_lookup2   miss         count: 0      avg_usec: nan    cycles: 0
Src/_ufuncmodule.c    cached_dispatch2 exec         count: 101    avg_usec: 13.65  cycles: 1
Src/_ufuncmodule.c    cached_dispatch2 lookup       count: 101    avg_usec: 37.35  cycles: 9
Src/_ufuncmodule.c    cached_dispatch2 overall      count: 101    avg_usec: 53.36  cycles: 12
Src/libnumarraymodule.c  NewArray      __new__      count: 304    avg_usec: 8.12   cycles: 0
Src/libnumarraymodule.c  NewArray      buffer       count: 304    avg_usec: 5.37   cycles: 0
Src/libnumarraymodule.c  NewArray      misc         count: 304    avg_usec: 0.25   cycles: 0
Src/libnumarraymodule.c  NewArray      type         count: 304    avg_usec: 0.27   cycles: 0
Src/libnumarraymodule.c  NewArray      update       count: 304    avg_usec: 1.16   cycles: 0
Src/libteacupmodule.c    calibration   nested       count: 999999 avg_usec: -0.00  cycles: 1
Src/libteacupmodule.c    calibration   top          count: 999999 avg_usec: -0.00  cycles: 0

I would caution anyone working on this that there are at least three locations in the code (some of it redundancy inserted for the purpose of performance optimization, some of it the consequences of having a class hierarchy) that need to be considered: _ndarraymodule.c, _numarraymodule.c, and newarray.ch.

My suspicions:

1. Having an independent buffer/memory object rather than just mallocing the array storage. This is expensive in a lot of ways: it's an extra hidden object and also a Python function call. The ways I've thought of for eliminating this add complexity and make numarray even more modal than it already is.

2.
Being implemented as a new style class; this is an unknown cost and involves the creation of still other extra objects, like the object() dictionary, but presumably that has been fairly well optimized already. Calling up the object hierarchy to build the object (__new__) probably has additional overheads.

Things to try:

1. Retain a free-list/cache of small objects and re-use them rather than creating/destroying all the time. Use a constant storage size and fit any small array into that space. I think this is the killer technique that would solve the small array problem without kludging up everything else. Do this first, and only then see if (2) or (3) need to be done.

2. Flatten the class hierarchy more (at least for construction) and remove any redundancy by refactoring.

3. Build in a malloc/free mode for array storage which bypasses the memorymodule completely and creates buffer objects when _data is accessed. Use the OWN_DATA bit in arrayobject.flags.

>> The real problem for Numarray: Object Creation Time
>> ===================================================
>>
>> On the other hand, the main drawback of Numarray vs Numeric is, in my
>> opinion, its poor performance regarding object creation. This might look
>> like a banal thing at first glance, but it is not in many cases.
>> One example recently reported in this list is:
>>
>> >>> from timeit import Timer
>> >>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
>> 0.12782907485961914
>> >>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
>> 1.2013700008392334
>>
>> So, numarray performs 10 times slower than Numeric not because its indexing
>> access code would be 10 times slower, but mainly due to the fact that object
>> creation is between 5 and 10 times slower, and the loop above implies an
>> object creation on each iteration.
>>
>> Other case of use where object creation time is important can be seen in
>> [4].

One thing to note here is that NumArray() is really used to create numarray arrays, while array() is used to create Numeric arrays. In numarray, array() is a Python function which can be optimized to C in its own right. That alone will not fix the problem though. NumArray itself must be optimized.

>> Proposal for making of Numarray a real Numeric 'NG' (Next Generation)
>> =====================================================================
>>
>> Provided that the most important reason (IMO) to not consider Numarray to be
>> a good replacement of Numeric is object creation time, I would like to
>> propose a coordinated effort to solve this precise issue.

I think that is one place to optimize, and the best I'm aware of, but there's a lot of Python in numarray, and a single "." is enough to blow performance out of the water. I think this problem is easily solvable for small arrays with a modest effort. There are a lot of others though (moving the NumArray number protocol to C is one that comes to mind.)

From paul at pfdubois.com Fri Jan 21 10:21:05 2005
From: paul at pfdubois.com (Paul F.
Dubois)
Date: Fri Jan 21 10:21:05 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
In-Reply-To: <20050121155712.GD16747@csail.mit.edu>
References: <200501211413.51663.faltet@carabos.com> <20050121155712.GD16747@csail.mit.edu>
Message-ID: <41F1474D.1060801@pfdubois.com>

Somehow 349,000+ accesses to the Numeric home page have occurred despite the fact that those searching for it did not get a good education at MIT. I had it on my own site so as to be able to use better tools than SF lets you do. We're in the middle of changing the ownership of the page due to my impending retirement, so perhaps that caused some confusion.

The Numeric/numpy/Numerical thing has a long funny history. You had to be there. It isn't right but it is what it is.

When I was leading the project there was a general feeling that a lot of the things we wanted to do with Numeric were going to be very hard to do with the existing implementation, some of which was generated by a code generator that had gotten lost, and some of which was impenetrable because it was written by a genius who went to (you guessed it) MIT. My intention was to replace Numeric with a quickly-written better implementation. That is why the Numeric page says what it says. I've left it that way as a reminder of the goal, which I continue to believe is important.

Besides cleaning it up, the other motivation was to back off the 'performance at all cost' design enough that we would be 'safe' enough to qualify for the Python distribution and become a standard module. Numeric was written without many safety checks *on purpose*. Over time opinions about that philosophy changed.

In fact, the team that wrote numarray did not do what I asked for, leading to the present confusion but also to, as noted by Altet, some nice features.
I think it was unfortunate that this happened, but as with most open source projects the person doing the work does the work the way they want, and partly to satisfy their own needs. But they do the work, all credit to them. I'm not complaining.

There are really only a couple of problems (object arrays and array creation time) that need to be fixed. What is wrong with the array creation time is obvious. It is written in Python and has too much flexibility, which costs time to decode. Make a raw C-level creator with less choice and I bet it will be ok.

Somebody help these guys; this isn't a product, it is an open source project. Let's get to the promised land and retire Numeric/Numerical/numpy.

Jason Rennie wrote:

> On Fri, Jan 21, 2005 at 02:13:45PM +0100, Francesc Altet wrote:
>
>>"""
>>If you are new to Numerical Python, please use Numarray. The older module,
>>Numeric, is unsupported. At this writing Numarray is slower for very small
>>arrays but faster for large ones. Numarray contains facilities to help you
>>convert older code to use it. Some parts of the community have not made the
>>switch yet but the Numarray libraries have been carefully named differently
>>so that Numeric and Numarray can coexist in one application.
>>"""
>
> Another problem is that Numeric is extremely poorly advertised/marketed.
>
> - There is no single keyword for Numeric: it is referred to as "Numerical",
> "Numeric" and "numpy". Both "Numerical" and "numpy" are also used to
> refer to numarray.
>
> - Numeric does not have a home page of its own. The Sourceforge
> "Numerical" page lists both numarray and Numeric (which, coincidentally,
> is referred to as "numpy").
>
> - The #1 & #2 Google results for "numeric python" are the numpy.org
> page, which is out-of-date, and advertises numarray as being a replacement
> for Numeric.
> Plus, what appears to be the main link for Numeric,
> "Release 22.0" points to a page with both numarray and Numeric
> releases, numarray first, and Numeric releases named "numpy". Could
> you try to be more confusing?
>
> - None of the top 10 Google links for "numeric python" point to the
> Sourceforge page.
>
> - A "numeric python" search on sourceforge lists 24 projects before the
> Numerical Python page.
>
> Jason

From jrennie at csail.mit.edu Fri Jan 21 11:32:58 2005
From: jrennie at csail.mit.edu (Jason Rennie)
Date: Fri Jan 21 11:32:58 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
In-Reply-To: <41F1474D.1060801@pfdubois.com>
References: <200501211413.51663.faltet@carabos.com> <20050121155712.GD16747@csail.mit.edu> <41F1474D.1060801@pfdubois.com>
Message-ID: <20050121193009.GA19684@csail.mit.edu>

On Fri, Jan 21, 2005 at 10:17:49AM -0800, Paul F. Dubois wrote:
> Somehow 349,000+ accesses to the Numeric home page have occurred despite
> the fact that those searching for it did not get a good education at
> MIT. I had it on my own site so as to be able to use better tools than
> SF lets you do. We're in the middle of changing the ownership of the
> page due to my impending retirement, so perhaps that caused some confusion.

Sorry if I came off "big headed." Was just trying to point out that, to an outsider, it's, well, confusing.
And, there are some very simple things that could be done to alleviate the confusion: a Numeric (not Numerical, not numarray) home page, consistent nomenclature. I'm not asking you to take your page down. I agree, it's a cool snapshot of history. And, I agree with you: it's often easier to host a home page on your own server. I've gone through hell trying to host the ifile home page on Savannah. I just think there needs to be a "Numeric" page somewhere with updated release information, pointers to current documentation, a short explanation of how Numeric is different from numarray, and maybe a short synopsis of the history behind the project(s). :)

I'm also not trying to belittle the great achievements that are Numeric and numarray. I think these are both awesome packages. I sure can't claim to have written anything as useful.

Jason

From cookedm at physics.mcmaster.ca Fri Jan 21 13:25:55 2005
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Fri Jan 21 13:25:55 2005
Subject: [Numpy-discussion] Speeding up Numeric
Message-ID:

Following up on the discussion of small-array performance, I decided to profile Numeric and numarray. I've been playing with Pyrex, implementing a vector class (for doubles) that runs as fast as I can make it, and it's beating Numeric and numarray by a good bit, so I figured those two were doing something. I can't say much about numarray right now (most of the runtime is in complicated Python code, with weird callbacks to and from C code), but for Numeric it was easy to find a problem.

First, profiling Python extensions is not easy. I've played with using the -pg flag to the GCC C compiler, but I couldn't find a way to profile the extension itself (even with Python built with -pg). So I see about three ways to profile an extension:

1) Wrap each function in the extension in a pair of calls to something that keeps track of time in the function. This is what the numarray.teacup module does.
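Approach 1) can be sketched in pure Python as a rough illustration — the `timed` decorator and `_timings` table below are invented for this example and are not teacup's actual API (teacup does the equivalent in C, with inline macros):

```python
import time
from collections import defaultdict

# Hypothetical accumulator: function name -> [number of calls, total seconds]
_timings = defaultdict(lambda: [0, 0.0])

def timed(fn):
    """Wrap fn in a pair of clock reads, accumulating per-function totals."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            rec = _timings[fn.__name__]
            rec[0] += 1                          # one more call
            rec[1] += time.perf_counter() - t0   # time spent inside fn
    return wrapper

@timed
def add(a, b):
    return a + b

for _ in range(101):
    add(2, 3)

calls, total = _timings["add"]
print(calls)  # 101
```

The same idea scales down poorly in C for exactly the reason mentioned later in the post: the bookkeeping itself (here a dictionary update per call) perturbs what is being measured.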
This is unsatisfying, intrusive, labour-intensive, and it does manually what -pg does automatically.

2) Compile the extension into the Python executable (where both the extension and Python have been compiled with the -pg flag). Unfortunately, as far as I can tell, this is not possible with distutils. If you have another build framework, however, it's not that hard to do. I've had some success with this approach with other extensions.

3) Use oprofile (http://oprofile.sourceforge.net/), which runs on Linux on an x86 processor. This is the approach that I've used here. oprofile is a combination of a kernel module for Linux, a daemon for collecting sample data, and several tools to analyse the samples. It periodically polls the processor performance counters, and records which code is running. It's a system-level profiler: it profiles _everything_ that's running on the system. One obstacle is that it does require root access.

Short tutorial on using oprofile
--------------------------------

Using oprofile on Debian with a 2.6.x kernel is easy (replace sudo with your favourite "be-root" method):

$ sudo apt-get install oprofile
$ sudo modprobe oprofile   # make sure the kernel module is installed

Now, start the oprofile daemon. On my machine, --event=default just looks at retired CPU instructions. Read the oprofile documentation for more info.

$ sudo opcontrol --event=default --start
$ (run code)
$ opcontrol --dump   # dump the statistics to disk
                     # this is the only thing a non-root user can do
$ sudo opcontrol --stop   # we don't need the daemon anymore

To do another profile run, you need to reset the data:

$ sudo opcontrol --reset

You should be able to do the above while the daemon is running, but the daemon crashes on me when I do that; I find I end up having to also clear the old statistics manually:

$ sudo rm -rf /var/lib/oprofile/samples

Once you've collected samples, you can analyse the results.
Here, I'll be looking at adding two 1000-element arrays with the following code:

import Numeric as NA
a = NA.arange(1000.0)
b = NA.arange(1000.0)
for i in xrange(10000000):
    a + b

This takes 1m14s on my machine (an AMD64 3200+ running in 64-bit mode). So, where I have (run code) above, I'd do

$ python x.py

Once I've collected the samples, I can analyse them. Note that samples are collected on a per-application basis; if you've got other processes using python, they'll be included. You could copy the python binary to another location and use that for the analysis; then your program would be the only one picked up by the following analysis.

$ opstack -t 1 /usr/bin/python
  self      %       child    %     image name      symbol name
132281    10.5031       0    0     python          (no symbols)
-------------------------------------------------------------------------------
704810    55.9618       0    0     _numpy.so       check_array
-------------------------------------------------------------------------------
309384    24.5650       0    0     umath.so        DOUBLE_add
-------------------------------------------------------------------------------
112974     8.9701       0    0     libc-2.3.2.so   (no symbols)
-------------------------------------------------------------------------------

The -t 1 limits the display to those routines taking more than 1% of the runtime. 10% for python, and 10% for the C-library probably aren't so bad (I'm thinking that's calls to malloc() and friends). However, the big problem is that only 25% of the time is actually doing useful work. What's check_array doing? We can delve deeper:

$ mkdir profile-Numeric
$ opannotate --source -o profile-Numeric \
    --search-dir= /usr/bin/python

Now profile-Numeric//Src has annotated copies of the source for the Numeric extension modules.
The definition of check_array is in ufuncobject.c, which gives us

   386  0.0286 :void check_array(PyArrayObject *ap) { /* check_array total: 704810 55.9618 */
               :    double *data;
               :    int i, n;
   371  0.0275 :    if (ap->descr->type_num == PyArray_DOUBLE || ap->descr->type_num == PyArray_CDOUBLE) {
    89  0.0066 :        data = (double *)ap->data;
   758  0.0563 :        n = PyArray_Size((PyObject *)ap);
    46  0.0034 :        if (ap->descr->type_num == PyArray_CDOUBLE) n *= 2;
700662 51.9988 :        for(i=0; i<n; i++) CHECK(data[i]);
               :    }
               :}

So 52% of the total runtime is spent in that one loop, checking every element of every result: CHECK(x) sets errno when x falls outside [-HUGE_VAL, HUGE_VAL] (or, when HAVE_FINITE is defined, when finite(x) is false). Without HAVE_FINITE, it won't even catch overflow to infinity:

>>> import Numeric
>>> a = Numeric.array([1e308])
>>> a + a
array([ inf])

It will catch NaN's though. It's obvious when you realize that HUGE_VAL is inf; inf <= inf is true. With HAVE_FINITE defined, I get, for the same a array,

>>> a + a
OverflowError: math range error

The Numeric documentation has this to say about the check_return parameter to PyUFunc_FromFuncAndData (which determines whether check_array is called):

    Usually best to set to 1. If this is non-zero then returned matrices
    will be cleaned up so that rank-0 arrays will be returned as python
    scalars. Also, if non-zero, then any math error that sets the errno
    global variable will cause an appropriate Python exception to be
    raised.

Note that the rank-0 array -> scalar conversion happens regardless; check_return doesn't affect this at all.

Removing check_array
--------------------

Commenting out the body of check_array in ufuncobject.c speeds up the script above by *two* times. On my iBook (a G4 800), it speeds it up by *four* times.

Using timeit.py:

$ python /usr/lib/python2.3/timeit.py \
    -s 'import Numeric as NA; N=1e4; a=NA.arange(N)/N; b=NA.arange(N)/N' \
    'a+b'

I get for various values of N:

   N      Stock      Numeric       numarray   numarray
          Numeric    without       recent     1.1.1
          23.7       check_array   CVS

  1e1       1.13       1.08          10.5        9.9
  1e2       1.73       1.35          10.8       10.6
  1e3       6.91       3.2           13.3       12.9
  1e4      83.3       42.5           52.8       52.3
  1e5    4890       4420           4520       4510
  1e6   52700      47400          47100      47000
  1e7  532000     473000         476000     474000

Numeric is as fast as numarray now!
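As a cross-check of the factor-of-two claim: the benchmark script uses 1000-element arrays, so the relevant row of the timings is N = 1e3. The arithmetic, as a quick sketch:

```python
# Timings from the N = 1e3 row above: stock Numeric vs. Numeric with
# check_array's body commented out.
stock = 6.91
no_check = 3.2
speedup = stock / no_check   # about 2.2x, the factor of "two" quoted above
```

The N = 1e4 row gives essentially the same ratio (83.3 / 42.5, about 2.0x), so the saving is stable until the arrays fall out of cache.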
:-) The 10x change in per-element speed between 1e4 and 1e5 is due to cache effects.

   N      Stock      Numeric       numarray   numarray
          Numeric    without       recent     1.1.1
          23.7       check_array   CVS

  1e1       1.31       1.28          8.49       7.64
  1e2       5.86       5.44         14.4       12.1
  1e3      51.8       48            70.4       54.5
  1e4     542        502           643        508
  1e5    7480       6880          7430       6850
  1e6   77500      70700         82700      69100
  1e7  775000     710000        860000     694000

Numeric is faster than numarray from CVS, but there seems to be a regression. Without check_array, Numeric is almost as fast as numarray 1.1.1.

Remarks
-------

- I'd rather have my speed than checks for NaN's. Have that in a separate function (I'm willing to write one), or do numarray-style processor flag checks (tougher).

- General plea: *please*, *please*, when releasing a library for which speed is a selling point, profile it first!

- Doing the same profiling on numarray finds 15% of the time actually adding, 65% somewhere in python, and 15% in libc.

- I'm still fiddling. Using the three-argument form of Numeric.add (so no memory allocation needs to be done), 64% of the time is now spent adding; I think that could be better. The Pyrex vector class I'm working on does 80% adding (with memory allocation for the result).

Hope this helps :-)

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From paul at pfdubois.com Fri Jan 21 13:43:53 2005 From: paul at pfdubois.com (Paul F. Dubois) Date: Fri Jan 21 13:43:53 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: References: Message-ID: <41F176E5.4080806@pfdubois.com>

As I mentioned in a recent post, the original Numeric philosophy was damn the torpedoes, full steam ahead; performance first, safety second. There was a deliberate decision not to handle NaN, inf, or anything like it, and if you overflowed you should die.
Unfortunately the original high-performance community did not remain the only community, and there were lots of complaints about dying, it being considered unPythonic to die. High-performance people don't mind dying so much; to me it just means my algorithm is wrong and I need to hear about it. But for a calculator for a biologist or student of linear algebra that's not the right answer. While I haven't researched the source history, I doubt that checking was in there before. And putting it in should have been the result of a long public discussion first. Perhaps there was one and I missed it, since I haven't paid too much attention the last few years (my present project involves Numeric only a tiny amount). When I retire maybe I will write a high-performance one. But it will be in C++ and half the people will hate me. (:-> Speed kills. But some of us have to drive fast. From Chris.Barker at noaa.gov Fri Jan 21 14:25:04 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jan 21 14:25:04 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F176E5.4080806@pfdubois.com> References: <41F176E5.4080806@pfdubois.com> Message-ID: <41F1718D.4010601@noaa.gov> Paul F. Dubois wrote: > But for a calculator for a biologist or student of linear > algebra that's not the right answer. I'm neither, but my needs are similar, and I really want Numeric to NOT stop, and just keep going with a NaN, inf or -inf. IMHO, that is the only intelligent way for an array package to behave, and you get a performance boost as well! However, my understanding is that the IEEE special values are not universally supported by compilers and/or math libraries, so this poses a problem. I wonder if it's still an issue or if all the compilers of interest have the requisite support? > When I retire maybe I will write a high-performance one. But it will be > in C++ and half the people will hate me. (:-> Maybe a python wrapper around Blitz++ ? I would love that! 
David, If you could apply your skills to profiling array creation in numarray, you'd be doing a great service! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Fri Jan 21 14:57:48 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jan 21 14:57:48 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <20050121124417.15da0438.simon@arrowtheory.com> References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> Message-ID: <41F1796E.7040303@noaa.gov> Simon Burton wrote: > On Thu, 20 Jan 2005 20:29:26 -0500 > Yun Mao wrote: >>2. Is there a way to do Matlab style slicing? e.g. if I have >> i = array([0, 2]) >> x = array([1.1, 2.2, 3.3, 4.4]) >> I wish y = x(i) would give me [1.1, 3.3] > > have a look at the "take" method. or use numarray: >>> import numarray as N >>> i = N.array([0, 2]) >>> x = N.array([1.1, 2.2, 3.3, 4.4]) >>> y = x[i] >>> y array([ 1.1, 3.3]) >>> -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Fri Jan 21 15:37:49 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Jan 21 15:37:49 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <41F1796E.7040303@noaa.gov> References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov> Message-ID: <41F191D7.9040906@ee.byu.edu> Chris Barker wrote: > > > Simon Burton wrote: > >> On Thu, 20 Jan 2005 20:29:26 -0500 >> Yun Mao wrote: > > >>> 2. Is there a way to do Matlab style slicing? e.g. 
if I have
>>>   i = array([0, 2])
>>>   x = array([1.1, 2.2, 3.3, 4.4])
>>> I wish y = x(i) would give me [1.1, 3.3]
>>
>> have a look at the "take" method.
>
> or use numarray:
>
> >>> import numarray as N
> >>> i = N.array([0, 2])
> >>> x = N.array([1.1, 2.2, 3.3, 4.4])
> >>> y = x[i]
> >>> y
> array([ 1.1, 3.3])
> >>>
>

Or use scipy:

from scipy import *
alter_numeric()
i = array([0,2])
x = array([1.1,2.2,3.3,4.4])
y = x[i]
print y

[1.1 3.3]

From oliphant at ee.byu.edu Fri Jan 21 15:38:49 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Jan 21 15:38:49 2005 Subject: [Fwd: Re: [Numpy-discussion] Speeding up Numeric] Message-ID: <41F19247.4010100@ee.byu.edu>

Original message sent to wrong address. Forwarding to correct address.

-------------- next part --------------
An embedded message was scrubbed...
From: unknown sender
Subject: no subject
Date: no date
Size: 38
URL:

From oliphant at ee.byu.edu Fri Jan 21 18:32:59 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri, 21 Jan 2005 16:32:59 -0700 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: References: Message-ID: <41F1912B.1050605@ee.byu.edu>

David M. Cooke wrote:

>$ opstack -t 1 /usr/bin/python
>  self     %        child    %     image name       symbol name
>132281  10.5031        0     0     python           (no symbols)
>-------------------------------------------------------------------------------
>704810  55.9618        0     0     _numpy.so        check_array
>-------------------------------------------------------------------------------
>309384  24.5650        0     0     umath.so         DOUBLE_add
>-------------------------------------------------------------------------------
>112974   8.9701        0     0     libc-2.3.2.so    (no symbols)
>-------------------------------------------------------------------------------
>However, the big problem is that only 25% of the time is actually doing
>useful work. What's check_array doing? We can delve deeper:
>

Thanks for this *excellent* tutorial and analysis. I would love to see more of this.

I've never liked the check_array concept. In fact, if you use SciPy, which makes several changes to things that Numeric does, check_array never runs, because self->check_return is 0 for all SciPy ufuncs (e.g. those in fastumath). So perhaps some of these basic benchmarks should be run by SciPy users. I forgot about this little speed-up that SciPy users enjoy all the time. SciPy has also added inf and NaN.

I would be very willing to remove check_array from all Numeric ufuncs and create a separate interface for checking results, after the fact. What is the attitude of the community?

-Travis

From oliphant at ee.byu.edu Fri Jan 21 15:53:56 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Jan 21 15:53:56 2005 Subject: [Numpy-discussion] updating Numeric Message-ID: <41F195A0.6080207@ee.byu.edu>

I would like to try to extensively update Numeric. Because of the changes, I would like to call it Numeric3.0

The goal is to create something that is an easier link between current Numeric and future numarray. It will be largely based on the current Numeric code base (same overall structure).

I'm looking for feedback and criticism, so don't hesitate to tell me what you think (positive or negative). I'll put on my thickest skin :-)

Changes: (in priority order)
================

1) Add support for multidimensional indexing
2) Change coercion rule to the "scalars don't count" rule
3) Add support for bool, long long (__int64), unsigned long long, long double, complex long double, and unicode character arrays
4) Move to a new-style c-type (i.e.
support for array objects being sub-classable, which was added in Python 2.2)
5) Add support for relevant parts of numarray's C-API (to allow code written for numarray that just uses basic homogeneous data to work with Numeric)
6) Add full IEEE support
7) Add a warning system much like numarray's for reporting errors (eliminate check_array and friends)
8) Optimize the ufuncs wherever possible: I can see a couple of possibilities but would be interested in any help here
9) Other things I'm forgetting...

Why is it not numarray? I think there is a need for the tight code-base of Numeric to continue with incremental improvements that keep the same concept of an array of homogeneous data types.

If sub-classing in C works well, then perhaps someday numarray could subclass Numeric for an even better link between the two.

I have not given up on numarray and Numeric merging someday; I just think we need an update to Numeric that moves Numeric forward in directions that numarray has paved, without sacrificing the things that Numeric already does well.

-Travis O.

From juenglin at cs.pdx.edu Fri Jan 21 17:11:06 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Fri Jan 21 17:11:06 2005 Subject: [Numpy-discussion] updating Numeric In-Reply-To: <41F195A0.6080207@ee.byu.edu> References: <41F195A0.6080207@ee.byu.edu> Message-ID: <1106356067.18238.31.camel@alpspitze.cs.pdx.edu>

On Fri, 2005-01-21 at 15:52, Travis Oliphant wrote:
> I would like to try to extensively update Numeric. Because of the
> changes, I would like to call it Numeric3.0
>
> The goal is to create something that is an easier link between current
> Numeric and future numarray. It will be largely based on the current
> Numeric code base (same overall structure).
>
> I'm looking for feedback and criticism so don't hesitate to tell me what
> you think (positive or negative). I'll put on my thickest skin :-)

So yeah, I think you should drive to Baltimore, visit these guys at STSCI and ...
get really drunk together! ralf > > Changes: (in priority order) > ================ > > 1) Add support for multidimensional indexing > 2) Change coercion rule to "scalars don't count rule" > 3) Add support for bool, long long (__int64), unsigned long long, long > double, complex long double, and unicode character arrays > 4) Move to a new-style c-type (i.e. support for array objects being > sub-classable which was added in Python 2.2) > 5) Add support for relevant parts of numarray's C-API (to allow code > written for numarray that just uses basic homogeneous data to work with > Numeric) > 6) Add full IEEE support > 7) Add warning system much like numarray for reporting errors (eliminate > check_array and friends). > 8) optimize the ufuncs where-ever possible: I can see a couple of > possibilities but would be interested in any help here. > 9) other things I'm forgetting.... > > Why it is not numarray? I think there is a need for the tight > code-base of Numeric to continue with incremental improvements that > keeps the same concept of an array of homogeneous data types. > > If sub-classing in c works well, then perhaps someday, numarray could > subclass Numeric for an even improved link between the two. > > I have not given up on the numarray and Numeric merging someday, I just > think we need an update to Numeric that moves Numeric forward in > directions that numarray has paved without sacrificing the things that > Numeric already does well. > > -Travis O. > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. 
> Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From stevech1097 at yahoo.com.au Fri Jan 21 19:15:14 2005 From: stevech1097 at yahoo.com.au (Steve Chaplin) Date: Fri Jan 21 19:15:14 2005 Subject: [Numpy-discussion] Re: Speeding up Numeric In-Reply-To: <20050121234055.8D6BC16054@sc8-sf-spam2.sourceforge.net> References: <20050121234055.8D6BC16054@sc8-sf-spam2.sourceforge.net> Message-ID: <1106363633.2903.7.camel@f1> On Fri, 2005-01-21 at 15:38 -0800, David M. Cooke wrote: > > First, profiling Python extensions is not easy. I've played with using Which version of Python are you using? In 2.4 the profile module has been updated so it can now profile C extension functions. Steve From konrad.hinsen at laposte.net Sat Jan 22 00:11:00 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sat Jan 22 00:11:00 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F176E5.4080806@pfdubois.com> References: <41F176E5.4080806@pfdubois.com> Message-ID: On 21.01.2005, at 22:40, Paul F. Dubois wrote: > As I mentioned in a recent post, the original Numeric philosphy was > damn the torpedos full steam ahead; performance first, safety second. > There was a deliberate decision not to handle NaN, inf, or anything > like it, and if you overflowed you should die. There was also at some time the idea of having a "safe" version of the code (added checks as a compile-time option) and an installer that compiled both with different module names such that one could ultimately choose at run time which one to use. I really liked that idea, but it never got implemented (there was a "safe" version of ufunc in some versions but it was no different from the standard one). Konrad. 
From Fernando.Perez at colorado.edu Sat Jan 22 00:36:45 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Sat Jan 22 00:36:45 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: References: <41F176E5.4080806@pfdubois.com> Message-ID: <41F21022.8090904@colorado.edu> konrad.hinsen at laposte.net wrote: > On 21.01.2005, at 22:40, Paul F. Dubois wrote: > > >>As I mentioned in a recent post, the original Numeric philosphy was >>damn the torpedos full steam ahead; performance first, safety second. >>There was a deliberate decision not to handle NaN, inf, or anything >>like it, and if you overflowed you should die. > > > There was also at some time the idea of having a "safe" version of the > code (added checks as a compile-time option) and an installer that > compiled both with different module names such that one could > ultimately choose at run time which one to use. I really liked that > idea, but it never got implemented (there was a "safe" version of ufunc > in some versions but it was no different from the standard one). I really like this approach. The Blitz++ library offers something similar: if you build your code with -DBZ_DEBUG, it activates a ton of safety checks which are normally off. The performance plummets, but it can save you days of debugging, since most pointer/memory errors are flagged instantly where they occur, instead of causing the usual inscrutable segfaults. F2PY also has the debug_capi flag which provides similar services, and I've found it to be tremendously useful on a few occasions. It would be great to be able to simply use: #import Numeric import Numeric_safe as Numeric to have a safe, debug-enabled version active. The availability of such a version would also free the developers from having to cater too much to safety considerations in the default version. 
The default could be advertised as 'fast car, no brakes, learn to jump out before going off a cliff', with the _debug 'family minivan' being there if safety were needed. Cheers, f From cookedm at physics.mcmaster.ca Sat Jan 22 08:18:15 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Sat Jan 22 08:18:15 2005 Subject: [Numpy-discussion] Re: Speeding up Numeric In-Reply-To: <1106363633.2903.7.camel@f1> References: <20050121234055.8D6BC16054@sc8-sf-spam2.sourceforge.net> <1106363633.2903.7.camel@f1> Message-ID: <20050122161725.GA6968@arbutus.physics.mcmaster.ca> On Sat, Jan 22, 2005 at 11:13:52AM +0800, Steve Chaplin wrote: > On Fri, 2005-01-21 at 15:38 -0800, David M. Cooke wrote: > > > First, profiling Python extensions is not easy. I've played with using > Which version of Python are you using? In 2.4 the profile module has > been updated so it can now profile C extension functions. > > Steve Sorry, I should have mentioned: I'm using 2.3 (it's still the default under Debian). But the Python profiler can only keep track of how much time is spent in the C extension; it can't determine the hotspots in the extension itself. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From perry at stsci.edu Sat Jan 22 08:26:03 2005 From: perry at stsci.edu (Perry Greenfield) Date: Sat Jan 22 08:26:03 2005 Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG' In-Reply-To: <41F1474D.1060801@pfdubois.com> Message-ID: Paul Dubois wrote: > My intention was to replace Numeric with a quickly-written better > implementation. That is why the Numeric page says what it says. I've > left it that way as a reminder of the goal, which I continue to believe > is important. 
Besides cleaning it up, the other motivation was to back > off the 'performance at all cost' design enough that we would be 'safe' > enough to qualify for the Python distribution and become a standard > module. Numeric was written without many safety checks *on purpose*. > Over time opinions about that philosophy changed. > > In fact, the team that wrote numarray did not do what I asked for, > leading to the present confusion but also to, as noted by Altet, some > nice features. I think it was unfortunate that this happened but as with > most open source projects the person doing the work does the work the > way they want and partly to satisfy their own needs. But they do the > work, all credit to them. I'm not complaining. > Just to clarify, if we could have found a way of doing a basic version and layering on the extra features, we would have. To take a specific example, if you want to be able to access data in a buffer that is spaced by intervals not a multiple of the data element size (which is what recarray needs to do), then one needs to handle non-aligned data in the basic version (otherwise segfaults will happen). I couldn't see a way of handling such arrays without the mechanism for handling non-aligned data being built into the basic mechanism (if someone else can, I'd like to see it). So it's a good design approach, but sometimes things can't work that way. Perry From perry at stsci.edu Sat Jan 22 08:41:16 2005 From: perry at stsci.edu (Perry Greenfield) Date: Sat Jan 22 08:41:16 2005 Subject: [Numpy-discussion] updating Numeric In-Reply-To: <41F195A0.6080207@ee.byu.edu> Message-ID: Travis Oliphant wrote: I'm not as negative on this as one might guess. While not as nice as having one package that satisfies both extremes (small vs large), making the two have identical behavior for the capabilities they share would make life much easier for users and those that want to write libraries that support both.
(Though note that I'm not sure it would be possible to subclass one to get the other as I noted in my reply to Paul, though I would like to be proved wrong). So I consider this a positive step myself. Perry > I would like to try to extensively update Numeric. Because of the > changes, I would like to call it Numeric3.0 > > The goal is to create something that is an easier link between current > Numeric and future numarray. It will be largely based on the current > Numeric code base (same overall structure). > > I'm looking for feedback and criticism so don't hesitate to tell me what > you think (positive or negative). I'll put on my thickest skin :-) > > Changes: (in priority order) > ================ > > 1) Add support for multidimensional indexing > 2) Change coercion rule to "scalars don't count rule" > 3) Add support for bool, long long (__int64), unsigned long long, long > double, complex long double, and unicode character arrays > 4) Move to a new-style c-type (i.e. support for array objects being > sub-classable which was added in Python 2.2) > 5) Add support for relevant parts of numarray's C-API (to allow code > written for numarray that just uses basic homogeneous data to work with > Numeric) > 6) Add full IEEE support > 7) Add warning system much like numarray for reporting errors (eliminate > check_array and friends). > 8) optimize the ufuncs where-ever possible: I can see a couple of > possibilities but would be interested in any help here. > 9) other things I'm forgetting.... > > Why it is not numarray? I think there is a need for the tight > code-base of Numeric to continue with incremental improvements that > keeps the same concept of an array of homogeneous data types. > > If sub-classing in c works well, then perhaps someday, numarray could > subclass Numeric for an even improved link between the two. 
> > I have not given up on the numarray and Numeric merging someday, I just > think we need an update to Numeric that moves Numeric forward in > directions that numarray has paved without sacrificing the things that > Numeric already does well. > > -Travis O. From tim.hochberg at cox.net Sat Jan 22 11:17:10 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sat Jan 22 11:17:10 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F21022.8090904@colorado.edu> References: <41F176E5.4080806@pfdubois.com> <41F21022.8090904@colorado.edu> Message-ID: <41F2A681.1080106@cox.net> Fernando Perez wrote: > konrad.hinsen at laposte.net wrote: > >> On 21.01.2005, at 22:40, Paul F. Dubois wrote: >> >> >>> As I mentioned in a recent post, the original Numeric philosophy was >>> damn the torpedoes, full steam ahead; performance first, safety >>> second. There was a deliberate decision not to handle NaN, inf, or >>> anything like it, and if you overflowed you should die. >> >> >> >> There was also at some time the idea of having a "safe" version of >> the code (added checks as a compile-time option) and an installer >> that compiled both with different module names such that one could >> ultimately choose at run time which one to use. I really liked that >> idea, but it never got implemented (there was a "safe" version of >> ufunc in some versions but it was no different from the standard one). > > > I really like this approach.
The Blitz++ library offers something > similar: if you build your code with -DBZ_DEBUG, it activates a ton of > safety checks which are normally off. The performance plummets, but > it can save you days of debugging, since most pointer/memory errors > are flagged instantly where they occur, instead of causing the usual > inscrutable segfaults. > > F2PY also has the debug_capi flag which provides similar services, and > I've found it to be tremendously useful on a few occasions. > > It would be great to be able to simply use: > > #import Numeric > import Numeric_safe as Numeric > > to have a safe, debug-enabled version active. The availability of > such a version would also free the developers from having to cater too > much to safety considerations in the default version. The default > could be advertised as 'fast car, no brakes, learn to jump out before > going off a cliff', with the _debug 'family minivan' being there if > safety were needed. Before embarking on such a project, I'd urge that some careful profiling be done. My gut feeling is that, for most functions, no significant speedup would result from omitting the range checks that prevent segfaults. In the cases where removal of such checks would help in C (item access, very small arrays, etc.), their execution time will be dwarfed by Python's overhead. Without care, one runs the risk of ending up with a minivan with no brakes; something no one needs. 'take' is a likely exception since it involves range checking at every element. But if only a few functions get changed, two versions of the library is a bad idea; two versions of the functions in question would be better. Particularly since, in my experience, speed is simply not critical for most of my numeric code; for the 5% or so where speed is critical, I could use the unsafe functions and be more careful. This would be easier if the few differing functions were part of the main library. I don't have a good feel for the NaN/Inf checking.
If it's possible to hoist the checking to outside all of the loops, then the above arguments probably apply here as well. If not, this might be a candidate for an 'unsafe' library. That seems more reasonable to me as I'm much more tolerant of NaNs than segfaults. That's my two cents anyway. -tim From jrennie at csail.mit.edu Sat Jan 22 15:09:10 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Sat Jan 22 15:09:10 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F21022.8090904@colorado.edu> References: <41F176E5.4080806@pfdubois.com> <41F21022.8090904@colorado.edu> Message-ID: <20050122230836.GA7023@csail.mit.edu> On Sat, Jan 22, 2005 at 01:34:42AM -0700, Fernando Perez wrote: > the _debug 'family minivan' being there if safety were needed. Don't know about that analogy :) Minivans are more likely to roll over than your typical car. Maybe the Volvo S80 would be better: http://www.safercar.gov/NCAP/Cars/3285.html But, I have to say that I love your "import Numeric_safe as Numeric" idea :) Jason From jrennie at csail.mit.edu Sat Jan 22 15:12:06 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Sat Jan 22 15:12:06 2005 Subject: [Fwd: Re: [Numpy-discussion] Speeding up Numeric] In-Reply-To: <41F19247.4010100@ee.byu.edu> References: <41F19247.4010100@ee.byu.edu> Message-ID: <20050122231116.GA7058@csail.mit.edu> On Fri, Jan 21, 2005 at 04:37:43PM -0700, Travis Oliphant wrote: > I would be very willing to remove check_array from all Numeric ufuncs > and create a separate interface for checking results, after the fact. > What is the attitude of the community. Sounds great to me.
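In modern-Python terms, the after-the-fact checking interface Travis proposes might look like this sketch. math.isfinite stands in for C's finite(), and check_result is a made-up name for illustration, not an actual Numeric or SciPy function:

```python
import math

def check_result(values):
    # Scan a finished result once for non-finite entries, instead of
    # checking inside every ufunc inner loop.  Unlike the HUGE_VAL
    # comparison in check_array, isfinite() rejects both inf and NaN.
    return [i for i, x in enumerate(values) if not math.isfinite(x)]

bad = check_result([1.0, math.inf, 2.0, math.nan])
# bad == [1, 3]: the positions of the inf and the NaN
```

Because the scan is separate from the arithmetic, callers who want Paul's "full steam ahead" behaviour simply never call it.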
Jason From Norbert.Nemec.list at gmx.de Sun Jan 23 03:21:14 2005 From: Norbert.Nemec.list at gmx.de (Norbert Nemec) Date: Sun Jan 23 03:21:14 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F2A681.1080106@cox.net> References: <41F21022.8090904@colorado.edu> <41F2A681.1080106@cox.net> Message-ID: <200501231220.33894.Norbert.Nemec.list@gmx.de> On Saturday, 22 January 2005 20:16, Tim Hochberg wrote: > I don't have a good feel for the NaN/Inf checking. If it's possible to > hoist the checking to outside all of the loops, then the above arguments > probably apply here as well. If not, this might be a candidate for an > 'unsafe' library. That seems more reasonable to me as I'm much more > tolerant of NaNs than segfaults. Why do we need NaN/Inf checking anyway? The whole point of IEEE754 is to give operations on NaNs and Infs clearly defined results, eliminating many unnecessary checks. I think it was one of the fundamental flaws in the design of Python not to include IEEE754 from the very beginning. Leaving the details of floating point handling completely to the implementation calls for incompatibilities and results in a situation where you can only work by trial and error instead of relying on some defined standard. -- _________________________________________Norbert Nemec Bernhardstr. 2 ... D-93053 Regensburg Tel: 0941 - 2009638 ...
Mobil: 0179 - 7475199 eMail: From konrad.hinsen at laposte.net Mon Jan 24 01:50:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Jan 24 01:50:28 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <200501231220.33894.Norbert.Nemec.list@gmx.de> References: <41F21022.8090904@colorado.edu> <41F2A681.1080106@cox.net> <200501231220.33894.Norbert.Nemec.list@gmx.de> Message-ID: <779B39A0-6DED-11D9-B904-000A95999556@laposte.net>

On Jan 23, 2005, at 12:20, Norbert Nemec wrote:

> I think it was one of the fundamental flaws in the design of Python not to
> include IEEE754 from the very beginning. Leaving the details of floating

Python is written in C, so it couldn't make more promises about floats than the C standard does, at least not without an enormous effort. Not even the floating-point units of modern CPUs respect IEEE in all respects.

Konrad.
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr ---------------------------------------------------------------------

From jeanluc.menut at free.fr Mon Jan 24 04:49:07 2005 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Mon Jan 24 04:49:07 2005 Subject: [Numpy-discussion] Bug or feature ? Message-ID: <41F4EE4C.7000204@free.fr>

Hello,

I'm using numarray, and I'm wondering about the behaviour of some functions, which seems odd to me:

1) When I want to sum all the elements of an array, I can do sum(array) or array.sum().

With the first method:

>>> a
array([[1, 2],
       [3, 4]])
>>> numarray.sum(a)
array([4, 6])

It seems to be impossible to sum all the elements with sum(array).
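numarray's functional sum() reduces along the first axis only, which is why it returns [4, 6] here; applying the reduction twice collapses the remaining axis and gives the grand total. A pure-Python sketch of that behaviour, with nested lists standing in for arrays:

```python
a = [[1, 2], [3, 4]]

# Like numarray.sum(a): reduce along axis 0, adding the rows element-wise.
axis0 = [sum(col) for col in zip(*a)]   # [4, 6]

# Reducing once more collapses the last axis: the sum of every element.
total = sum(axis0)                      # 10
```

So sum(sum(a)) is one way to total a rank-2 array with the functional form, matching what a.sum() returns below.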
With the second:

>>> a.sum()
10L

In this case it's OK, but:

>>> b
array(1)
>>> b.sum()
0L

I know that it's stupid to sum only one element, but it forces me to plan for that case in my program (and I don't want to check for each case) if I don't want an error.

2) When I want to replace one element in an array:

>>> a[0,0]=1.1
>>> a
array([[1, 2],
       [3, 4]])

I know that I created the array as an array of integers, but at least I would expect an error message.

Does anybody know whether these behaviours are bugs or not? If not, I would be glad to have some explanation of the reasons behind them.

Thanks for your help,
Jean-Luc

From jmiller at stsci.edu Mon Jan 24 05:06:06 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 05:06:06 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <41F4EE4C.7000204@free.fr> References: <41F4EE4C.7000204@free.fr> Message-ID: <1106571937.5350.14.camel@jaytmiller.comcast.net>

On Mon, 2005-01-24 at 13:47 +0100, Jean-Luc Menut wrote:
> array([[1, 2],
>        [3, 4]])
>
> >>> numarray.sum(a)
> array([4, 6])
>
> It seems to be impossible to sum all the elements with sum(array).
>
> With the second:
>
> >>> a.sum()
> 10L
>
> In this case it's OK, but:
>
> >>> b
> array(1)
>
> >>> b.sum()
> 0L

This is clearly a bug. In general, rank-0 and zero-length array handling is buggy in numarray because my awareness of these issues came after the fact, and these issues were not priorities in Perry's initial design, which was, after all, to process huge astronomical images, memory mapped and across platforms. We've considered ripping out rank-0 altogether several times.

> I know that it's stupid to sum only one element, but it forces me to
> plan for that case in my program (and I don't want to check for each
> case) if I don't want an error.
> > > 2) When I want to replace one element in an array:
> > >>> a[0,0]=1.1
> >>> a
> array([[1, 2],
>        [3, 4]])
>
> I know that I created the array as an array of integers, but at least I
> would expect an error message.

numarray and Numeric, consciously, don't work that way. So, no, you can't expect that.

> Does anybody know whether these behaviours are bugs or not?

rank-0 yes, silent truncation no.

Regards,
Todd

From Norbert.Nemec.list at gmx.de Mon Jan 24 06:51:12 2005 From: Norbert.Nemec.list at gmx.de (Norbert Nemec) Date: Mon Jan 24 06:51:12 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <1106571937.5350.14.camel@jaytmiller.comcast.net> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> Message-ID: <200501241550.27643.Norbert.Nemec.list@gmx.de>

On Monday, 24 January 2005 14:05, Todd Miller wrote:
> We've considered ripping out rank-0 altogether several times.

Glad you didn't do it! I use them all the time to simplify my code and avoid special-case checking. So far, I haven't hit any serious bug, so I assume the code is mostly working for rank-0 arrays by now. Let's hunt down the remaining bugs... :-)

-- _________________________________________ Norbert Nemec Bernhardstr. 2 ... D-93053 Regensburg Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199 eMail:

From jeanluc.menut at free.fr Mon Jan 24 07:20:18 2005 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Mon Jan 24 07:20:18 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <1106571937.5350.14.camel@jaytmiller.comcast.net> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> Message-ID: <41F51209.6010202@free.fr>

Hello,

> numarray and Numeric, consciously, don't work that way. So, no, you
> can't expect that.

> rank-0 yes, silent truncation no.

I'm sorry, I don't understand very well what a silent truncation is.
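A plain-Python sketch of the coercion rule under discussion (the helper name here is illustrative, not numarray API; Numeric and numarray stored floats into integer arrays with C-style truncation toward zero, raising no error):

```python
# "Silent truncation": storing a float into an integer array coerces the
# value to the array's element type, truncating toward zero, with no
# exception or warning.  Plain-Python stand-in for that rule:
def store_into_int_array(a, i, j, value):
    a[i][j] = int(value)   # 1.1 -> 1, -1.9 -> -1, silently
    return a

a = [[1, 2], [3, 4]]
store_into_int_array(a, 0, 0, 1.1)
print(a)   # [[1, 2], [3, 4]] -- the 1.1 arrived as 1, with no error
```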
When I write:

>>> a = array([[1, 2],[3, 4]])
>>> a[0,0]=1.1

I cannot expect to have a = array([[1.1, 2],[3, 4]])? How can I solve this problem? Is it possible to force an array to be an array of floats?

Thanks,
Jean-Luc

From faltet at carabos.com Mon Jan 24 07:23:17 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Jan 24 07:23:17 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <200501241550.27643.Norbert.Nemec.list@gmx.de> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> <200501241550.27643.Norbert.Nemec.list@gmx.de> Message-ID: <200501241622.45659.faltet@carabos.com>

On Monday, 24 January 2005 15:50, Norbert Nemec wrote:
> On Monday, 24 January 2005 14:05, Todd Miller wrote:
> > We've considered ripping out rank-0 altogether several times.
>
> Glad you didn't do it! I use them all the time to simplify my code and
> avoid special-case checking. So far, I haven't hit any serious bug, so I
> assume the code is mostly working for rank-0 arrays by now. Let's hunt
> down the remaining bugs... :-)

This is also my case. rank-0 arrays appear naturally in my code and, as far as I can tell, they work quite well.

-- Francesc Altet http://www.carabos.com/ Cárabos Coop. V. Enjoy Data

From konrad.hinsen at laposte.net Mon Jan 24 07:30:20 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Jan 24 07:30:20 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <41F51209.6010202@free.fr> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> <41F51209.6010202@free.fr> Message-ID:

On Jan 24, 2005, at 16:19, Jean-Luc Menut wrote:
> When I write:
> >>> a = array([[1, 2],[3, 4]])
> >>> a[0,0]=1.1
>
> I cannot expect to have a = array([[1.1, 2],[3, 4]])?

No. All elements in an array are of the same type.

> How can I solve this problem? Is it possible to force an array to be
> an array of floats?
Yes, at creation time:

from Numeric import array, Float
a = array([[1, 2], [3, 4]], Float)

will create a float array. All the integers are then converted to floats.

Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr ---------------------------------------------------------------------

From jmiller at stsci.edu Mon Jan 24 07:31:24 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 07:31:24 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <41F5112D.2060305@free.fr> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> <41F5112D.2060305@free.fr> Message-ID: <1106580666.5350.95.camel@jaytmiller.comcast.net>

On Mon, 2005-01-24 at 16:15 +0100, Jean-Luc Menut wrote:
> Hello,
>
> > numarray and Numeric, consciously, don't work that way. So, no, you
> > can't expect that.
>
> > rank-0 yes, silent truncation no.
>
> I'm sorry, I don't understand very well what a silent truncation is.

By silent truncation, I mean the fact that 1.1 is floored to 1 without an exception or warning.

> When I write:
> >>> a = array([[1, 2],[3, 4]])
> >>> a[0,0]=1.1
>
> I cannot expect to have a = array([[1.1, 2],[3, 4]])?

This works and the result will be a Float64 array containing 1.1, 2.0, ...

> How can I solve this problem? Is it possible to force an array to be
> an array of floats?

Sure. For numarray or Numeric:

a = array([[1, 2],[3, 4]], typecode=Float64)

From jmiller at stsci.edu Mon Jan 24 08:04:15 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 08:04:15 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: References: Message-ID: <1106582644.5350.102.camel@jaytmiller.comcast.net>

> 3) Use oprofile (http://oprofile.sourceforge.net/), which runs on
> Linux on an x86 processor.
This is the approach that I've used here. > oprofile is a combination of a kernel module for Linux, a daemon > for collecting sample data, and several tools to analyse the > samples. It periodically polls the processor performance counters, > and records which code is running. It's a system-level profiler: it > profiles _everything_ that's running on the system. One obstacle is > that does require root access. This looked fantastic so I tried it over the weekend. On Fedora Core 3, I couldn't get any information about numarray runtime (in the shared libraries), only Python. Ditto with Numeric, although from your post you apparently got great results including information on Numeric .so's. I'm curious: has anyone else tried this for numarray (or Numeric) on Fedora Core 3? Does anyone have a working profile script? > Numeric is faster (with the check_array() feature deletion) > than numarray from CVS, but there seems to be regression. (in numarray performance) Don't take this the wrong way, but how confident are you that the speed differences are real? (With my own benchmarking numbers, there is always too much fuzz to split hairs like this.) > Without check_array, Numeric is almost as fast as as > numarray 1.1.1. > > Remarks > ------- > > - I'd rather have my speed than checks for NaN's. Have that in a > separate function (I'm willing to write one), or do numarray-style > processor flag checks (tougher). > > - General plea: *please*, *please*, when releasing a library for which > speed is a selling point, profile it first! > > - doing the same profiling on numarray finds 15% of the time actually > adding, 65% somewhere in python, and 15% in libc. Part of this is because the numarray number protocol is still in Python. > - I'm still fiddling. Using the three-argument form of Numeric.add (so add(a,b) and add(a,b,c) are what I've focused on for profiling numarray until the number protocol is moved to C. 
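The two- versus three-argument add() distinction mentioned above can be sketched in plain Python (an illustrative stand-in only; the real Numeric.add operates on whole arrays in C):

```python
# add(a, b) must allocate a fresh result array; add(a, b, c) writes into
# a caller-supplied output and avoids that allocation.  Plain-Python
# stand-in for the two calling conventions:
def add(a, b, c=None):
    if c is None:
        c = [0] * len(a)          # the allocation the 3-arg form avoids
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c

a, b = [1, 2, 3], [10, 20, 30]
out = [0, 0, 0]
print(add(a, b))              # [11, 22, 33], freshly allocated
print(add(a, b, out) is out)  # True: result written in place
```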
I've held off doing that because the numarray number protocol is complicated by subclassing issues I'm not sure are fully resolved. Regards, Todd From jh at oobleck.astro.cornell.edu Mon Jan 24 08:55:41 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Mon Jan 24 08:55:41 2005 Subject: [Numpy-discussion] updating Numeric In-Reply-To: <20050122041343.8BAAF89CDB@sc8-sf-spam1.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20050122041343.8BAAF89CDB@sc8-sf-spam1.sourceforge.net> Message-ID: <200501241654.j0OGsTXU024858@oobleck.astro.cornell.edu> Hi Travis, Perry may not see a problem with updating numeric, but I do, at least in the short term. It takes resources and time away from the issue at hand, which I believe you yourself raised. Certainly they are your resources to allocate as you wish (this being open source), but please consider the following. This whole dispute arises from a single question to which we don't yet know the answer: WHY is numarray slower than numeric for small arrays? Why not just do the work to answer the question? Then we can have a discussion on the direction we want to go in that is based on actual information. At this point we know each others' opinions and why we hold them, but we don't have the key information to make any decisions. Let's say, hypothetically, that there is a way to fix numarray to be fast in small arrays without breaking it in other important ways. Would it really be worth perpetuating numeric rather than working on unifying the packages and the community? If the problems are not fundamental to our respective values, they can be fixed, and we can move forward with the great volume of work that's needed to make this a viable data analysis environment for the masses. If the problems *are* fundamental to our values, we can work on compromise solutions knowing *what* we are actually working around, and unifying elsewhere when possible. 
We wouldn't waste any more years wringing our hands about unification. Perry (and others) have summarized a few ideas on why numarray might be slower. One of those ideas, namely the use of new-style classes in numarray, might mean that all the code-bumming in the world won't fix the problem. Numeric fans would likely say that speed is worth the inconvenience of old-style coding. Numarray fans wouldn't, and that would be that: we'd be in the realm of co-existence solutions. We'd move forward implementing them, documenting them, etc. I would think it worthwhile to check at least that possibility before proceeding with other work. To check it, someone who is familiar with numeric needs to convert it (or an appropriate subset of it) to new-style classes, and profile both versions. If the array creation time jumps by the factor we've seen, we need look no further. Rather, we'd need to focus the discussion on whether to continue using new-style classes in numarray. Assuming the two packages *do* have irreconcilable differences, then a coexistence approach makes a lot of sense, and a numeric update would be an important first step. We've talked about two approaches: user chooses a package at runtime, or things start "light" and the software detects cases where "heavy" features get used and augments those arrays on the fly. We know what needs to be done to figure out where the problems lie. Why not work on that next, and put this argument to bed once and for all? --jh-- From ryorke at telkomsa.net Mon Jan 24 10:54:16 2005 From: ryorke at telkomsa.net (Rory Yorke) Date: Mon Jan 24 10:54:16 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <1106582644.5350.102.camel@jaytmiller.comcast.net> References: <1106582644.5350.102.camel@jaytmiller.comcast.net> Message-ID: <87k6q264w1.fsf@localhost.localdomain> Todd Miller writes: > This looked fantastic so I tried it over the weekend. 
On Fedora Core 3, > I couldn't get any information about numarray runtime (in the shared > libraries), only Python. Ditto with Numeric, although from your post > you apparently got great results including information on Numeric .so's. > I'm curious: has anyone else tried this for numarray (or Numeric) on > Fedora Core 3? Does anyone have a working profile script? I think you need to have --separate=lib when invoking opcontrol. (See later for an example.) Some comments on oprofile: - I think the oprofile tools (opcontrol, opreport etc.) are separate from the oprofile module, which is part of the kernel. I installed oprofile-0.8.1 from source, and it works with my standard Ubuntu kernel. It is easy to install it in a non-standard location ($HOME/usr on my system). - I think opstack is part of oprofile 0.8 (or maybe 0.8.1) -- it wasn't in the 0.7.1 package available for Ubuntu. Also, to actually get callgraphs (from opstack), you need a patched kernel; see here: http://oprofile.sf.net/patches/ - I think you probably *shouldn't* compile with -pg if you use oprofile, but you should use -g. To profile shared libraries, I also tried the following: - sprof. Some sort of dark art glibc tool. I couldn't get this to work with dlopen()'ed libraries (in which class I believe Python C extensions fall). - qprof (http://www.hpl.hp.com/research/linux/qprof/). Almost worked, but I couldn't get it to identify symbols in shared libraries. Their page has a list of other profilers. I also tried the Python 2.4 profile module; it does support C-extension functions as advertised, but it seemed to miss object instantiation calls (_numarray._numarray's instantiation, in this case). 
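For comparison, a self-contained sketch of profiling an allocation-heavy loop with the standard library's profiler (cProfile here, the C-implemented successor of the profile module discussed above; the workload is a made-up stand-in for small-array creation):

```python
import cProfile
import io
import pstats

def work():
    out = []
    for i in range(10000):
        out.append([i, i + 1000])   # stands in for small-array creation
    return out

prof = cProfile.Profile()
prof.enable()
work()
prof.disable()

# Render the top five entries by internal time into a string.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("tottime").print_stats(5)
report = buf.getvalue()
print("function calls" in report)   # True: the report's summary line
```

Like the profile module, this sees Python-level and C-extension calls but can still miss work done inside object allocation paths, which is why the system-level oprofile session below is the complementary view.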
Sample oprofile usage on my Ubuntu box:

rory at foo:~/hack/numarray/profile $ cat longadd.py
import numarray as na
a = na.arange(1000.0)
b = na.arange(1000.0)
for i in xrange(1000000):
    a + b
rory at foo:~/hack/numarray/profile $ sudo modprobe oprofile
Password:
rory at foo:~/hack/numarray/profile $ sudo ~/usr/bin/opcontrol --start --separate=lib
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
rory at foo:~/hack/numarray/profile $ sudo ~/usr/bin/opcontrol --reset
Signalling daemon... done
rory at foo:~/hack/numarray/profile $ python2.4 longadd.py
rory at foo:~/hack/numarray/profile $ sudo ~/usr/bin/opcontrol --shutdown
Stopping profiling.
Killing daemon.
rory at foo:~/hack/numarray/profile $ opreport -t 2 -l $(which python2.4)
CPU: Athlon, speed 1836.45 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name        symbol name
47122    11.2430  _ufuncFloat64.so  add_ddxd_vvxv
26731     6.3778  python2.4         PyEval_EvalFrame
24122     5.7553  libc-2.3.2.so     memset
21228     5.0648  python2.4         lookdict_string
10583     2.5250  python2.4         PyObject_GenericGetAttr
 9131     2.1786  libc-2.3.2.so     mcount
 9026     2.1535  python2.4         PyDict_GetItem
 8968     2.1397  python2.4         PyType_IsSubtype

(The idea wasn't really to discuss the results, but anyway: the prominence of memset is a little odd -- are destination arrays zeroed before being assigned the sum result?)

To get the libc symbols you need a libc with debug symbols -- on Ubuntu this is the libc-dbg package; I don't know what it'll be on Fedora or other systems. Set the LD_LIBRARY_PATH variable to force these debug libraries to be loaded:

export LD_LIBRARY_PATH=/usr/lib/debug

This is probably not all that useful -- I suppose it might be interesting if one generates callgraphs. I don't (yet) have a modified kernel, so I haven't tried this.
Have fun,

Rory

From jmiller at stsci.edu Mon Jan 24 13:13:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 13:13:13 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <87k6q264w1.fsf@localhost.localdomain> References: <1106582644.5350.102.camel@jaytmiller.comcast.net> <87k6q264w1.fsf@localhost.localdomain> Message-ID: <1106601157.5361.42.camel@jaytmiller.comcast.net>

On Mon, 2005-01-24 at 20:47 +0200, Rory Yorke wrote:
> Todd Miller writes:
>
> > This looked fantastic so I tried it over the weekend. On Fedora Core 3,
> > I couldn't get any information about numarray runtime (in the shared
> > libraries), only Python. Ditto with Numeric, although from your post
> > you apparently got great results including information on Numeric .so's.
> > I'm curious: has anyone else tried this for numarray (or Numeric) on
> > Fedora Core 3? Does anyone have a working profile script?
>
> I think you need to have --separate=lib when invoking opcontrol. (See
> later for an example.)

Thanks! That and using a more liberal "opreport -t" setting got it.

> - I think opstack is part of oprofile 0.8 (or maybe 0.8.1) -- it
> wasn't in the 0.7.1 package available for Ubuntu. Also, to actually
> get callgraphs (from opstack), you need a patched kernel; see here:
>
> http://oprofile.sf.net/patches/

Ugh. Well, that won't happen today for me either.

> - I think you probably *shouldn't* compile with -pg if you use
> oprofile, but you should use -g.
>
> To profile shared libraries, I also tried the following:
>
> - sprof. Some sort of dark art glibc tool. I couldn't get this to work
> with dlopen()'ed libraries (in which class I believe Python C
> extensions fall).
>
> - qprof (http://www.hpl.hp.com/research/linux/qprof/). Almost worked,
> but I couldn't get it to identify symbols in shared libraries. Their
> page has a list of other profilers.

I tried gprof too but couldn't get much out of it. As David noted, gprof is a pain to use with distutils too.
> I also tried the Python 2.4 profile module; it does support
> C-extension functions as advertised, but it seemed to miss object
> instantiation calls (_numarray._numarray's instantiation, in this
> case).

I think the thing to focus on is building an object cache for "almost-new" small NumArrays; that could potentially short circuit memory object allocation/deallocation costs, NumArray object hierarchical allocation/deallocation costs, etc.

> rory at foo:~/hack/numarray/profile $ opreport -t 2 -l $(which python2.4)
> CPU: Athlon, speed 1836.45 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name        symbol name
> 47122    11.2430  _ufuncFloat64.so  add_ddxd_vvxv
> 26731     6.3778  python2.4         PyEval_EvalFrame
> 24122     5.7553  libc-2.3.2.so     memset
> 21228     5.0648  python2.4         lookdict_string
> 10583     2.5250  python2.4         PyObject_GenericGetAttr
>  9131     2.1786  libc-2.3.2.so     mcount
>  9026     2.1535  python2.4         PyDict_GetItem
>  8968     2.1397  python2.4         PyType_IsSubtype
>
> (The idea wasn't really to discuss the results, but anyway: The
> prominence of memset is a little odd -- are destination arrays zeroed
> before being assigned the sum result?)

Yes, the API routines which allocate the output array zero it. I've tried to remove this in the past but at least one of the add-on packages (linear_algebra or fft I think) wasn't stable w/o the zeroing.

> Have fun,

Better already. Thanks again!

Todd
From curzio.basso at unibas.ch Tue Jan 25 05:12:03 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Tue Jan 25 05:12:03 2005 Subject: [Numpy-discussion] Matrix products Message-ID: <41F6455C.9000003@unibas.ch>

Hi all,

assume that I have a matrix A with shape = (m,n), what I would like to compute is a matrix B with shape = (m, n, n) such as

B[i] = NA.matrixmultiply(A[i, :, NA.NewAxis], A[i, NA.NewAxis])

e.g. if A is

array([[0, 1],
       [2, 3]])

then B would be

array([[[0, 0],
        [0, 1]],
       [[4, 6],
        [6, 9]]])

Does anyone know how to do this without using loops?

thanks

From konrad.hinsen at laposte.net Tue Jan 25 05:33:01 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Tue Jan 25 05:33:01 2005 Subject: [Numpy-discussion] Matrix products In-Reply-To: <41F6455C.9000003@unibas.ch> References: <41F6455C.9000003@unibas.ch> Message-ID:

On Jan 25, 2005, at 14:10, Curzio Basso wrote:
> assume that I have a matrix A with shape = (m,n), what I would like to
> compute is a matrix B with shape = (m, n, n) such as
> B[i] = NA.matrixmultiply(A[i, :, NA.NewAxis], A[i, NA.NewAxis])
...
> Does anyone know how to do this without using loops?
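For concreteness, the quantity being asked for is, per row, the outer product of that row with itself: B[i][j][k] = A[i][j] * A[i][k]. A plain-Python check of the example (numarray itself is long unmaintained, so lists stand in for arrays here):

```python
# Stack, for each row of A, the outer product of the row with itself.
# This is the loop-free result the broadcast expression computes at once.
A = [[0, 1], [2, 3]]
B = [[[aj * ak for ak in row] for aj in row] for row in A]
print(B)   # [[[0, 0], [0, 1]], [[4, 6], [6, 9]]]
```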
How about

A[:, :, NewAxis]*A[:, NewAxis, :]

That works for your example at least. I am not quite sure why you use matrixmultiply in your definition as there doesn't seem to be any summation.

Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr ---------------------------------------------------------------------

From curzio.basso at unibas.ch Tue Jan 25 05:49:05 2005 From: curzio.basso at unibas.ch (Curzio Basso) Date: Tue Jan 25 05:49:05 2005 Subject: [Numpy-discussion] Matrix products In-Reply-To: References: <41F6455C.9000003@unibas.ch> Message-ID: <41F64E44.8090500@unibas.ch>

konrad.hinsen at laposte.net wrote:
> How about
>
> A[:, :, NewAxis]*A[:, NewAxis, :]
>
> That works for your example at least. I am not quite sure why you use
> matrixmultiply in your definition as there doesn't seem to be any
> summation.

You're right, it was my mistake to use matrixmultiply. Thanks a lot.

From focke at slac.stanford.edu Tue Jan 25 08:22:02 2005 From: focke at slac.stanford.edu (Warren Focke) Date: Tue Jan 25 08:22:02 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <779B39A0-6DED-11D9-B904-000A95999556@laposte.net> References: <41F21022.8090904@colorado.edu> <41F2A681.1080106@cox.net> <200501231220.33894.Norbert.Nemec.list@gmx.de> <779B39A0-6DED-11D9-B904-000A95999556@laposte.net> Message-ID:

On Mon, 24 Jan 2005 konrad.hinsen at laposte.net wrote:

> On Jan 23, 2005, at 12:20, Norbert Nemec wrote:
>
> > I think, it was one of the fundamental flaws in the design of Python
> > not to include IEEE754 from the very beginning. Leaving the details of
> > floating
>
> Python is written in C, so it couldn't make more promises about floats
> than the C standard does, at least not without an enormous effort.
Not > even the floating-point units of modern CPUs respect IEEE in all > respects. And even if that effort had been put into Pythno, Numeric probably would've sidestepped it for performance. Note that Python does give platform-independent behavior for integer division and mod, while Numeric just gives whatever your C platform does. Warren Focke From chrisperkins99 at gmail.com Tue Jan 25 12:31:02 2005 From: chrisperkins99 at gmail.com (Chris Perkins) Date: Tue Jan 25 12:31:02 2005 Subject: [Numpy-discussion] Missing mail In-Reply-To: <41EE990F.8050709@noaa.gov> References: <41EC2D14.7000203@colorado.edu> <41ED4AD0.6060204@noaa.gov> <6332FB22-69A4-11D9-B8A8-000A95B68E50@stsci.edu> <41EE990F.8050709@noaa.gov> Message-ID: <184a9f5a05012512301f054818@mail.gmail.com> Can anyone find the missing mail from the thread below? I have Chris's original question and his thanks for the reply, but not Perry's reply. I can't seem to conjure up the right incantations to get Google to find it, and I am also quite interested in the answer. Could someone forward me Perry's email or point me to somewhere that it's archived, please and thank you? Chris Perkins On Wed, 19 Jan 2005 09:29:51 -0800, Chris Barker wrote: > > > Perry Greenfield wrote: > > On Jan 18, 2005, at 12:43 PM, Chris Barker wrote: > >> Can anyone provide a one-paragraph description of what numarray does > >> that gives it better large-array performance than Numeric? > > > > It has two aspects: one is speed, but for us it was more about memory. > > Thanks for the summary, I have a better idea of the issues now. > > It doesn't look, to my untrained eyes, like any of these are contrary to > small array performance, so I'm hopeful that the grand convergence can > occur. > > -Chris > > -- > Christopher Barker, Ph.D. 
> Oceanographer > From perry at stsci.edu Tue Jan 25 19:52:09 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jan 25 19:52:09 2005 Subject: FW: [Numpy-discussion] Speeding up numarray -- questions on its design Message-ID: Hmmm, it looks like it was sent only to Chris. My mistake. -- Perry -----Original Message----- From: Perry Greenfield [mailto:perry at stsci.edu] Sent: Tuesday, January 18, 2005 5:58 PM To: Chris Barker Cc: Perry Greenfield Subject: Re: [Numpy-discussion] Speeding up numarray -- questions on its design On Jan 18, 2005, at 12:43 PM, Chris Barker wrote: > Hi all, > > This discussion has brought up a question I have had for a while: > > Can anyone provide a one-paragraph description of what numarray does > that gives it better large-array performance than Numeric? It has two aspects: one is speed, but for us it was more about memory. It is likely faster (for simpler cases, i.e., ones that don't involve strides, byteswaps or type conversions) because the C code for the loop is as simple as can be resulting in better optimizations. But we haven't done careful research on that. It has a number of aspects that lessen memory demands: 1) fewer temporaries created, particularly for type conversions. 2) avoids the memory wasting scalar type coercions that Numeric has. 3) allows use of memory mapping. This one is at the moment not a strong advantage due to the fact that the current limit is due to Python. Interesting large arrays sizes are bumping into the Python limit making this less useful. But when this goes away (this year I hope) it is again a useful tool for minimizing memory demands. There are other advantages, but these are the primary ones that relate to large array performance that I recall offhand (Todd may recall others). 
Perry

From lkemohawk at yahoo.com Tue Jan 25 21:42:04 2005 From: lkemohawk at yahoo.com (kevin lester) Date: Tue Jan 25 21:42:04 2005 Subject: [Numpy-discussion] array_str(arr,precision=4,suppress_small=1) Message-ID: <20050126054129.27510.qmail@web53905.mail.yahoo.com>

Can someone please tell me why I can't control the output of my arrays? Neither numarray.array_str(...) nor sys.float_output_suppress_small works to keep the numerals out of exponential form when printed to my stdout.

Thank you much,
Kevin

From Chris.Barker at noaa.gov Tue Jan 25 22:20:03 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 25 22:20:03 2005 Subject: [Numpy-discussion] Missing mail Message-ID:

----- Original Message -----
From: Chris Perkins
> Can anyone find the missing mail from the thread below?

Here it is; Perry might have sent it only to me, which I imagine was an oversight.

-Chris

On Jan 18, 2005, at 12:43 PM, Chris Barker wrote:
> Hi all,
>
> This discussion has brought up a question I have had for a while:
>
> Can anyone provide a one-paragraph description of what numarray does
> that gives it better large-array performance than Numeric?

It has two aspects: one is speed, but for us it was more about memory. It is likely faster (for simpler cases, i.e., ones that don't involve strides, byteswaps or type conversions) because the C code for the loop is as simple as can be, resulting in better optimizations. But we haven't done careful research on that.

It has a number of aspects that lessen memory demands: 1) fewer temporaries created, particularly for type conversions. 2) avoids the memory-wasting scalar type coercions that Numeric has. 3) allows use of memory mapping. This one is at the moment not a strong advantage due to the fact that the current limit is due to Python.
Interesting large arrays sizes are bumping into the Python limit making this less useful. But when this goes away (this year I hope) it is again a useful tool for minimizing memory demands. There are other advantages, but these are the primary ones that relate to large array performance that I recall offhand (Todd may recall others). Perry From jh at oobleck.astro.cornell.edu Wed Jan 26 10:31:03 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Wed Jan 26 10:31:03 2005 Subject: [Numpy-discussion] wiki to resolve numeric vs. numarray Message-ID: <200501261830.j0QIU5Ug010130@oobleck.astro.cornell.edu> Since the arguments in the numeric vs. numarray debate have been spread over many months, they have been hard for me and others to follow. There is now a wiki on scipy.org for summaries of the main points and requests for tasks to be done in resolving the issue: http://www.scipy.org/wikis/featurerequests/ArrayMathCore The wiki states facts about each package that are relevant to the debate, and the desires people have for an eventual array package. Under "Desires", please feel free to pose questions that need to be resolved and tasks that need to be done. If you are interested in helping to resolve this split, consider taking on one of those tasks. Do post to the mailing list to see if anyone else is doing it and to get feedback on how to do it well. 
--jh--

From aisaac at american.edu Thu Jan 27 07:41:01 2005 From: aisaac at american.edu (Alan G Isaac) Date: Thu Jan 27 07:41:01 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <41F191D7.9040906@ee.byu.edu> References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov> <41F191D7.9040906@ee.byu.edu> Message-ID:

On Fri, 21 Jan 2005, Travis Oliphant apparently wrote:
> from scipy import *
> alter_numeric()
> i = array([0,2])
> x = array([1.1,2.2,3.3,4.4])
> y = x[i]

This ^ gives me an invalid index error. scipy version 0.3.0_266.4242

Alan Isaac

From faltet at carabos.com Thu Jan 27 12:48:40 2005 From: faltet at carabos.com (Francesc Altet) Date: Thu Jan 27 12:48:40 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <1106601157.5361.42.camel@jaytmiller.comcast.net> References: <87k6q264w1.fsf@localhost.localdomain> <1106601157.5361.42.camel@jaytmiller.comcast.net> Message-ID: <200501272136.07558.faltet@carabos.com>

Hi,

After a while of waiting for some free time, I'm playing with the excellent oprofile myself, trying to help in reducing numarray creation time. For that goal, I selected the next small benchmark:

import numarray
a = numarray.arange(2000)
a.shape = (1000, 2)
for j in xrange(1000):
    for i in range(len(a)):
        row = a[i]

I know that it mixes creation with indexing cost, but as the indexing cost of numarray is only a bit slower (perhaps a 40%) than Numeric, while array creation time is 5 to 10 times slower, I think this benchmark may provide a good starting point to see what's going on.
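A reproducible variant of this micro-benchmark can be written with the stdlib timeit module (plain Python lists stand in for the numarray here; the loop structure, repeatedly fetching row a[i] of a (1000, 2) array, is what is being timed):

```python
# Time repeated per-row indexing, the operation whose creation cost is
# being profiled in the thread above.
import timeit

setup = "a = [[i, i + 1000] for i in range(1000)]"
stmt = """
for i in range(len(a)):
    row = a[i]
"""
elapsed = timeit.timeit(stmt, setup=setup, number=100)
print("100 sweeps over 1000 rows took %.4f s" % elapsed)
```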
For numarray, I've got the following results:

samples  %        image name          symbol name
902      7.3238   python              PyEval_EvalFrame
835      6.7798   python              lookdict_string
408      3.3128   python              PyObject_GenericGetAttr
384      3.1179   python              PyDict_GetItem
383      3.1098   libc-2.3.2.so       memcpy
358      2.9068   libpthread-0.10.so  __pthread_alt_unlock
293      2.3790   python              _PyString_Eq
273      2.2166   libnumarray.so      NA_updateStatus
273      2.2166   python              PyType_IsSubtype
271      2.2004   python              countformat
252      2.0461   libc-2.3.2.so       memset
249      2.0218   python              string_hash
248      2.0136   _ndarray.so         _universalIndexing

while for Numeric I've got this:

samples  %        image name          symbol name
279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
216      12.1144  libc-2.3.2.so       memmove
187      10.4879  python              lookdict_string
162      9.0858   python              PyEval_EvalFrame
144      8.0763   libpthread-0.10.so  __pthread_alt_lock
126      7.0667   libpthread-0.10.so  __pthread_alt_trylock
56       3.1408   python              PyDict_SetItem
53       2.9725   libpthread-0.10.so  __GI___pthread_mutex_unlock
45       2.5238   _numpy.so           PyArray_FromDimsAndDataAndDescr
39       2.1873   libc-2.3.2.so       __malloc
36       2.0191   libc-2.3.2.so       __cfree

One preliminary result is that numarray spends a lot more time in Python space than Numeric does, as Todd already said here. The problem is that, as I have not yet patched my kernel, I can't get the call tree, so I can't find what is ultimately responsible for that.

So, I've tried to run the profile module included in the standard library in order to see which are the hot spots in Python:

$ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
1       19.220   19.220   25.290   25.290   create-numarray.py:1(?)
999999  5.530    0.000    5.530    0.000    numarraycore.py:514(__del__)
1753    0.160    0.000    0.160    0.000    :0(eval)
1       0.060    0.060    0.340    0.340    numarraycore.py:3(?)
1       0.050    0.050    0.390    0.390    generic.py:8(?)
1       0.040    0.040    0.490    0.490    numarrayall.py:1(?)
3455    0.040    0.000    0.040    0.000    :0(len)
1       0.030    0.030    0.190    0.190    ufunc.py:1504(_makeCUFuncDict)
51      0.030    0.001    0.070    0.001    ufunc.py:184(_nIOArgs)
3572    0.030    0.000    0.030    0.000    :0(has_key)
2582    0.020    0.000    0.020    0.000    :0(append)
1000    0.020    0.000    0.020    0.000    :0(range)
1       0.010    0.010    0.010    0.010    generic.py:510(_stridesFromShape)
42/1    0.010    0.000    25.290   25.290   :1(?)

but, to tell the truth, I can't really see where the time is exactly consumed. Perhaps somebody with more experience can shed more light on this?

Another thing that I find intriguing has to do with Numeric and the oprofile output. Let me repeat it:

samples  %        image name          symbol name
279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
216      12.1144  libc-2.3.2.so       memmove
187      10.4879  python              lookdict_string
162      9.0858   python              PyEval_EvalFrame
144      8.0763   libpthread-0.10.so  __pthread_alt_lock
126      7.0667   libpthread-0.10.so  __pthread_alt_trylock
56       3.1408   python              PyDict_SetItem
53       2.9725   libpthread-0.10.so  __GI___pthread_mutex_unlock
45       2.5238   _numpy.so           PyArray_FromDimsAndDataAndDescr
39       2.1873   libc-2.3.2.so       __malloc
36       2.0191   libc-2.3.2.so       __cfree

We can see that a lot of the time in the benchmark using Numeric is consumed in libc space (37% or so). However, only about 16% is used in memory-related tasks (memmove, malloc and free), while the rest seems to be used in thread issues (??). Again, can anyone explain why the pthread* routines take so much time, or why they appear here at all? Perhaps getting rid of these calls might improve the Numeric performance even further.

Cheers,

--
Francesc Altet >qo< http://www.carabos.com/
Càrabos Coop. V.
"Enjoy Data"

From stephen.walton at csun.edu Thu Jan 27 13:51:07 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 27 13:51:07 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov> <41F191D7.9040906@ee.byu.edu> Message-ID: <41F9621F.5040300@csun.edu>

Alan G Isaac wrote:
> On Fri, 21 Jan 2005, Travis Oliphant apparently wrote:
>> from scipy import *
>> alter_numeric()
>> i = array([0,2])
>> x = array([1.1,2.2,3.3,4.4])
>> y = x[i]
>
> This ^ gives me an invalid index error.
> scipy version 0.3.0_266.4242

Travis's example works for me at scipy 0.3.2_302.4549 (from CVS), Numeric 23.6, numarray 1.1.1, all on FC3.

From oliphant at ee.byu.edu Thu Jan 27 14:31:06 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Jan 27 14:31:06 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov> <41F191D7.9040906@ee.byu.edu> Message-ID: <41F96A4A.4080506@ee.byu.edu>

Alan G Isaac wrote:
> On Fri, 21 Jan 2005, Travis Oliphant apparently wrote:
>> from scipy import *
>> alter_numeric()
>> i = array([0,2])
>> x = array([1.1,2.2,3.3,4.4])
>> y = x[i]
>
> This ^ gives me an invalid index error.
> scipy version 0.3.0_266.4242

Your version of scipy is apparently too low.
Mine is 0.3.2_299.4506

-Travis

From jmiller at stsci.edu Fri Jan 28 03:48:05 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 28 03:48:05 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <200501272136.07558.faltet@carabos.com> References: <87k6q264w1.fsf@localhost.localdomain> <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> Message-ID: <1106912892.6118.25.camel@jaytmiller.comcast.net>

I got some insight into what I think is the tall pole in the profile: sub-array creation is implemented using views. The generic indexing code does a view() Python callback because object arrays override view(). Faster view() creation for numerical arrays can be achieved like this by avoiding the callback:

Index: Src/_ndarraymodule.c
===================================================================
RCS file: /cvsroot/numpy/numarray/Src/_ndarraymodule.c,v
retrieving revision 1.75
diff -c -r1.75 _ndarraymodule.c
*** Src/_ndarraymodule.c	14 Jan 2005 14:13:22 -0000	1.75
--- Src/_ndarraymodule.c	28 Jan 2005 11:15:50 -0000
***************
*** 453,460 ****
          }
      } else {  /* partially subscripted --> subarray */
          long i;
!         result = (PyArrayObject *)
!             PyObject_CallMethod((PyObject *) self,"view",NULL);
          if (!result) goto _exit;

          result->nd = result->nstrides = self->nd - nindices;
--- 453,463 ----
          }
      } else {  /* partially subscripted --> subarray */
          long i;
!         if (NA_NumArrayCheck((PyObject *)self))
!             result = _view(self);
!         else
!             result = (PyArrayObject *) PyObject_CallMethod(
!                 (PyObject *) self,"view",NULL);
          if (!result) goto _exit;

          result->nd = result->nstrides = self->nd - nindices;

I committed the patch above to CVS for now. This optimization makes view() "non-overridable" for NumArray subclasses, so there is probably a better way of doing this.

One other thing that struck me looking at your profile, and it has been discussed before, is that NumArray.__del__() needs to be pushed (back) down into C.
Getting rid of __del__ would also synergize well with making an object freelist, one aspect of which is capturing unneeded objects rather than destroying them.

Thanks for the profile.

Regards,
Todd

On Thu, 2005-01-27 at 21:36 +0100, Francesc Altet wrote:
> Hi,
>
> After a while of waiting for some free time, I'm playing with the
> excellent oprofile, trying to help in reducing numarray creation time.
> [...]

From Norbert.Nemec.list at gmx.de Fri Jan 28 12:34:37 2005 From: Norbert.Nemec.list at gmx.de (Norbert Nemec) Date: Fri Jan 28 12:34:37 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <200501272136.07558.faltet@carabos.com> References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> Message-ID: <200501282116.55184.Norbert.Nemec.list@gmx.de>

On Thursday 27 January 2005 21:36, Francesc Altet wrote:
> So, I've tried to run the profile module included in the standard
> library in order to see which are the hot spots in python:
>
> $ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
> 1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds
>
> Ordered by: internal time
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 19.220 19.220 25.290 25.290 create-numarray.py:1(?)
> 999999 5.530 0.000 5.530 0.000 numarraycore.py:514(__del__)

Might it actually be that (at least part of) the speed problem lies in __del__? I don't have any tools for benchmarking at hand, so I can only ask others to experiment, but I recall that it already struck me as odd a little while ago that hitting Ctrl-C in the middle of numarray calculations nearly always gave me a backtrace ending inside a __del__ function.

Should be trivial to test: deactivate __del__ completely for a test run.

Ciao,
Norbert

--
_________________________________________
Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail:

From Norbert.Nemec.list at gmx.de Fri Jan 28 13:39:12 2005 From: Norbert.Nemec.list at gmx.de (Norbert Nemec) Date: Fri Jan 28 13:39:12 2005 Subject: [Numpy-discussion] 30% speedup when deactivating NumArray.__del__ !!! In-Reply-To: <200501272136.07558.faltet@carabos.com> References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> Message-ID: <200501282223.56850.Norbert.Nemec.list@gmx.de>

Hi there,

indeed my suspicion has proven correct:

On Thursday 27 January 2005 21:36, Francesc Altet wrote:
[...]
> For that goal, I selected the following small benchmark:
>
> import numarray
> a = numarray.arange(2000)
> a.shape=(1000,2)
> for j in xrange(1000):
>     for i in range(len(a)):
>         row=a[i]
[...]
> So, I've tried to run the profile module included in the standard
> library in order to see which are the hot spots in python:
>
> $ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
> 1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds
>
> Ordered by: internal time
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 19.220 19.220 25.290 25.290 create-numarray.py:1(?)
> 999999 5.530 0.000 5.530 0.000 numarraycore.py:514(__del__)
> 1753 0.160 0.000 0.160 0.000 :0(eval)
[...]
This benchmark made me suspicious, since I had already found it odd before that killing a numarray calculation with Ctrl-C nearly always gives a backtrace starting in __del__. I did the simple thing: commented out the NumArray.__del__ routine (numarraycore.py, line 514, current CVS version). The result is astonishing:

Vanilla numarray:

nobbi at Marvin:~/tmp $ time python create-array.py
real 0m9.457s
user 0m8.851s
sys 0m0.038s

NumArray.__del__ commented out:

nobbi at Marvin:~/tmp $ time python create-array.py
real 0m6.512s
user 0m6.065s
sys 0m0.021s

30% speedup !!!!!!

Detailed benchmarking shows similar results. I don't think I have to go on about this at this point. It seems clear that __del__ has to be avoided in such a central position.

Ciao,
Norbert

--
_________________________________________
Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail:

From faltet at carabos.com Fri Jan 28 14:30:54 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Jan 28 14:30:54 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <1106912892.6118.25.camel@jaytmiller.comcast.net> References: <200501272136.07558.faltet@carabos.com> <1106912892.6118.25.camel@jaytmiller.comcast.net> Message-ID: <200501282327.21093.faltet@carabos.com>

Hi Todd,

Nice to see that you achieved a good speed-up with your optimization patch. With the following code:

import numarray
a = numarray.arange(2000)
a.shape=(1000,2)
for j in xrange(1000):
    for i in range(len(a)):
        row=a[i]

and original numarray-1.1.1 it took 11.254s (pentium4 at 2GHz). With your patch, this time has been reduced to 7.816s. Now, following your suggestion to push NumArray.__del__ down into C, I've got a good speed-up as well: 5.332s. This is more than twice as fast as the unpatched numarray 1.1.1. There is still a long way until we can catch Numeric (1.123s), but it is a first step :)

The patch. Please revise it, as I'm not very used to dealing with pure C extensions (just a Pyrex user):

Index: Lib/numarraycore.py
===================================================================
RCS file: /cvsroot/numpy/numarray/Lib/numarraycore.py,v
retrieving revision 1.101
diff -r1.101 numarraycore.py
696,699c696,699
<     def __del__(self):
<         if self._shadows != None:
<             self._shadows._copyFrom(self)
<             self._shadows = None
---
>     def __del__(self):
>         if self._shadows != None:
>             self._shadows._copyFrom(self)
>             self._shadows = None
Index: Src/_numarraymodule.c
===================================================================
RCS file: /cvsroot/numpy/numarray/Src/_numarraymodule.c,v
retrieving revision 1.65
diff -r1.65 _numarraymodule.c
399a400,411
> static void
> _numarray_dealloc(PyObject *self)
> {
>     PyArrayObject *selfa = (PyArrayObject *) self;
>
>     if (selfa->_shadows != NULL) {
>         _copyFrom(selfa->_shadows, self);
>         selfa->_shadows = NULL;
>     }
>     self->ob_type->tp_free(self);
> }
>
421c433
<     0,                    /* tp_dealloc */
---
>     _numarray_dealloc,    /* tp_dealloc */

The profile with the new optimizations now looks like:

samples  %       image name      symbol name
453      8.6319  python          PyEval_EvalFrame
372      7.0884  python          lookdict_string
349      6.6502  python          string_hash
271      5.1639  libc-2.3.2.so   _wordcopy_bwd_aligned
210      4.0015  libnumarray.so  NA_updateStatus
194      3.6966  python          _PyString_Eq
185      3.5252  libc-2.3.2.so   __GI___strcasecmp
162      3.0869  python          subtype_dealloc
158      3.0107  libc-2.3.2.so   _int_malloc
147      2.8011  libnumarray.so  isBufferWriteable
145      2.7630  python          PyDict_SetItem
135      2.5724  _ndarray.so     _view
131      2.4962  python          PyObject_GenericGetAttr
122      2.3247  python          PyDict_GetItem
100      1.9055  python          PyString_InternInPlace
94       1.7912  libnumarray.so  getReadBufferDataPtr
77       1.4672  _ndarray.so     _simpleIndexingCore

i.e. time spent in libc and libnumarray is going up in the list, as it should. Now we have to concentrate on other points of optimization.
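For what it's worth, the cost of a Python-level __del__ can be reproduced entirely outside numarray. The following is a small illustrative sketch (run with a current Python; the class names are invented for the example) that times object churn with and without a trivial __del__:

```python
import timeit

class Plain(object):
    """No finalizer: instances use the fast default deallocation path."""
    pass

class WithDel(object):
    """Identical, except a do-nothing __del__ forces a Python-level
    callback on every deallocation."""
    def __del__(self):
        pass

N = 100000
# Passing a callable to timeit creates (and immediately drops) N instances.
t_plain = timeit.timeit(Plain, number=N)
t_del = timeit.timeit(WithDel, number=N)

print("without __del__: %.4fs" % t_plain)
print("with __del__:    %.4fs" % t_del)
```

The exact ratio varies by interpreter version, but the __del__ variant is consistently slower, which matches the direction of the 30% result Norbert measured.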
Perhaps it is a good time to try recompiling the kernel and getting the call tree...

Cheers,

On Friday 28 January 2005 12:48, Todd Miller wrote:
> I got some insight into what I think is the tall pole in the profile:
> sub-array creation is implemented using views. The generic indexing
> code does a view() Python callback because object arrays override
> view(). Faster view() creation for numerical arrays can be achieved
> by avoiding the callback.
> [...]
> I committed the patch above to CVS for now. This optimization makes
> view() "non-overridable" for NumArray subclasses so there is probably
> a better way of doing this.
>
> One other thing that struck me looking at your profile, and it has been
> discussed before, is that NumArray.__del__() needs to be pushed (back)
> down into C. Getting rid of __del__ would also synergize well with
> making an object freelist, one aspect of which is capturing unneeded
> objects rather than destroying them.
> [...]
> -------------------------------------------------------
> This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
> Tool for open source databases. Create drag-&-drop reports. Save time
> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
> Download a FREE copy at http://www.intelliview.com/go/osdn_nl
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--
Francesc Altet >qo< http://www.carabos.com/
Càrabos Coop. V.
??Enjoy Data "" From jmiller at stsci.edu Fri Jan 28 15:02:18 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 28 15:02:18 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <200501282327.21093.faltet@carabos.com> References: <200501272136.07558.faltet@carabos.com> <1106912892.6118.25.camel@jaytmiller.comcast.net> <200501282327.21093.faltet@carabos.com> Message-ID: <1106953001.7990.35.camel@halloween.stsci.edu> Nice work! But... IRC, there's a problem with moving __del__ down to C, possibly only for a --with-pydebug Python, I can't remember. It's a serious problem though... it dumps core. I'll try to see if I can come up with something conditionally compiled. Related note to Make Todd's Life Easy: use "cvs diff -c" to make context diffs which "patch" applies effortlessly. Thanks for getting the ball rolling. 2x is nothing to sneeze at. Todd On Fri, 2005-01-28 at 17:27, Francesc Altet wrote: > Hi Todd, > > Nice to see that you can achieved a good speed-up with your > optimization path. With the next code: > > import numarray > a = numarray.arange(2000) > a.shape=(1000,2) > for j in xrange(1000): > for i in range(len(a)): > row=a[i] > > and original numarray-1.1.1 it took 11.254s (pentium4 at 2GHz). With your > patch, this time has been reduced to 7.816s. Now, following your > suggestion to push NumArray.__del__ down into C, I've got a good > speed-up as well: 5.332s. This is more that twice as fast as the > unpatched numarray 1.1.1. There is still a long way until we can catch > Numeric (1.123s), but it is a first step :) > > The patch. 
Please, revise it as I'm not very used with dealing with > pure C extensions (just a Pyrex user): > > Index: Lib/numarraycore.py > =================================================================== > RCS file: /cvsroot/numpy/numarray/Lib/numarraycore.py,v > retrieving revision 1.101 > diff -r1.101 numarraycore.py > 696,699c696,699 > < def __del__(self): > < if self._shadows != None: > < self._shadows._copyFrom(self) > < self._shadows = None > --- > > def __del__(self): > > if self._shadows != None: > > self._shadows._copyFrom(self) > > self._shadows = None > Index: Src/_numarraymodule.c > =================================================================== > RCS file: /cvsroot/numpy/numarray/Src/_numarraymodule.c,v > retrieving revision 1.65 > diff -r1.65 _numarraymodule.c > 399a400,411 > > static void > > _numarray_dealloc(PyObject *self) > > { > > PyArrayObject *selfa = (PyArrayObject *) self; > > > > if (selfa->_shadows != NULL) { > > _copyFrom(selfa->_shadows, self); > > selfa->_shadows = NULL; > > } > > self->ob_type->tp_free(self); > > } > > > 421c433 > < 0, /* tp_dealloc */ > --- > > _numarray_dealloc, /* tp_dealloc */ > > > The profile with the new optimizations looks now like: > > samples % image name symbol name > 453 8.6319 python PyEval_EvalFrame > 372 7.0884 python lookdict_string > 349 6.6502 python string_hash > 271 5.1639 libc-2.3.2.so _wordcopy_bwd_aligned > 210 4.0015 libnumarray.so NA_updateStatus > 194 3.6966 python _PyString_Eq > 185 3.5252 libc-2.3.2.so __GI___strcasecmp > 162 3.0869 python subtype_dealloc > 158 3.0107 libc-2.3.2.so _int_malloc > 147 2.8011 libnumarray.so isBufferWriteable > 145 2.7630 python PyDict_SetItem > 135 2.5724 _ndarray.so _view > 131 2.4962 python PyObject_GenericGetAttr > 122 2.3247 python PyDict_GetItem > 100 1.9055 python PyString_InternInPlace > 94 1.7912 libnumarray.so getReadBufferDataPtr > 77 1.4672 _ndarray.so _simpleIndexingCore > > i.e. 
time spent in libc and libnumarray is going up in the list, as it > should. Now, we have to concentrate in other points of optimization. > Perhaps is a good time to have a try on recompiling the kernel and > getting the call tree... > > Cheers, > > A Divendres 28 Gener 2005 12:48, Todd Miller va escriure: > > I got some insight into what I think is the tall pole in the profile: > > sub-array creation is implemented using views. The generic indexing > > code does a view() Python callback because object arrays override view > > (). Faster view() creation for numerical arrays can be achieved like > > this by avoiding the callback: > > > > Index: Src/_ndarraymodule.c > > =================================================================== > > RCS file: /cvsroot/numpy/numarray/Src/_ndarraymodule.c,v > > retrieving revision 1.75 > > diff -c -r1.75 _ndarraymodule.c > > *** Src/_ndarraymodule.c 14 Jan 2005 14:13:22 -0000 1.75 > > --- Src/_ndarraymodule.c 28 Jan 2005 11:15:50 -0000 > > *************** > > *** 453,460 **** > > } > > } else { /* partially subscripted --> subarray */ > > long i; > > ! result = (PyArrayObject *) > > ! PyObject_CallMethod((PyObject *) > > self,"view",NULL); > > if (!result) goto _exit; > > > > result->nd = result->nstrides = self->nd - nindices; > > --- 453,463 ---- > > } > > } else { /* partially subscripted --> subarray */ > > long i; > > ! if (NA_NumArrayCheck((PyObject *)self)) > > ! result = _view(self); > > ! else > > ! result = (PyArrayObject *) PyObject_CallMethod( > > ! (PyObject *) self,"view",NULL); > > if (!result) goto _exit; > > > > result->nd = result->nstrides = self->nd - nindices; > > > > I committed the patch above to CVS for now. This optimization makes > > view() "non-overridable" for NumArray subclasses so there is probably a > > better way of doing this. > > > > One other thing that struck me looking at your profile, and it has been > > discussed before, is that NumArray.__del__() needs to be pushed (back) > > down into C. 
Getting rid of __del__ would also synergyze well with > > making an object freelist, one aspect of which is capturing unneeded > > objects rather than destroying them. > > > > Thanks for the profile. > > > > Regards, > > Todd > > > > On Thu, 2005-01-27 at 21:36 +0100, Francesc Altet wrote: > > > Hi, > > > > > > After a while of waiting for some free time, I'm playing myself with > > > the excellent oprofile, and try to help in reducing numarray creation. > > > > > > For that goal, I selected the next small benchmark: > > > > > > import numarray > > > a = numarray.arange(2000) > > > a.shape=(1000,2) > > > for j in xrange(1000): > > > for i in range(len(a)): > > > row=a[i] > > > > > > I know that it mixes creation with indexing cost, but as the indexing > > > cost of numarray is only a bit slower (perhaps a 40%) than Numeric, > > > while array creation time is 5 to 10 times slower, I think this > > > benchmark may provide a good starting point to see what's going on. > > > > > > For numarray, I've got the next results: > > > > > > samples % image name symbol name > > > 902 7.3238 python PyEval_EvalFrame > > > 835 6.7798 python lookdict_string > > > 408 3.3128 python PyObject_GenericGetAttr > > > 384 3.1179 python PyDict_GetItem > > > 383 3.1098 libc-2.3.2.so memcpy > > > 358 2.9068 libpthread-0.10.so __pthread_alt_unlock > > > 293 2.3790 python _PyString_Eq > > > 273 2.2166 libnumarray.so NA_updateStatus > > > 273 2.2166 python PyType_IsSubtype > > > 271 2.2004 python countformat > > > 252 2.0461 libc-2.3.2.so memset > > > 249 2.0218 python string_hash > > > 248 2.0136 _ndarray.so _universalIndexing > > > > > > while for Numeric I've got this: > > > > > > samples % image name symbol name > > > 279 15.6478 libpthread-0.10.so __pthread_alt_unlock > > > 216 12.1144 libc-2.3.2.so memmove > > > 187 10.4879 python lookdict_string > > > 162 9.0858 python PyEval_EvalFrame > > > 144 8.0763 libpthread-0.10.so __pthread_alt_lock > > > 126 7.0667 libpthread-0.10.so 
__pthread_alt_trylock > > > 56 3.1408 python PyDict_SetItem > > > 53 2.9725 libpthread-0.10.so __GI___pthread_mutex_unlock > > > 45 2.5238 _numpy.so > > > PyArray_FromDimsAndDataAndDescr 39 2.1873 libc-2.3.2.so > > > __malloc > > > 36 2.0191 libc-2.3.2.so __cfree > > > > > > one preliminary result is that numarray spends a lot more time in > > > Python space than do Numeric, as Todd already said here. The problem > > > is that, as I have not yet patched my kernel, I can't get the call > > > tree, and I can't look for the ultimate responsible for that. > > > > > > So, I've tried to run the profile module included in the standard > > > library in order to see which are the hot spots in python: > > > > > > $ time ~/python.nobackup/Python-2.4/python -m profile -s time > > > create-numarray.py > > > 1016105 function calls (1016064 primitive calls) in 25.290 CPU > > > seconds > > > > > > Ordered by: internal time > > > > > > ncalls tottime percall cumtime percall filename:lineno(function) > > > 1 19.220 19.220 25.290 25.290 create-numarray.py:1(?) > > > 999999 5.530 0.000 5.530 0.000 > > > numarraycore.py:514(__del__) 1753 0.160 0.000 0.160 0.000 > > > :0(eval) > > > 1 0.060 0.060 0.340 0.340 numarraycore.py:3(?) > > > 1 0.050 0.050 0.390 0.390 generic.py:8(?) > > > 1 0.040 0.040 0.490 0.490 numarrayall.py:1(?) > > > 3455 0.040 0.000 0.040 0.000 :0(len) > > > 1 0.030 0.030 0.190 0.190 > > > ufunc.py:1504(_makeCUFuncDict) 51 0.030 0.001 0.070 0.001 > > > ufunc.py:184(_nIOArgs) 3572 0.030 0.000 0.030 0.000 > > > :0(has_key) > > > 2582 0.020 0.000 0.020 0.000 :0(append) > > > 1000 0.020 0.000 0.020 0.000 :0(range) > > > 1 0.010 0.010 0.010 0.010 generic.py:510 > > > (_stridesFromShape) > > > 42/1 0.010 0.000 25.290 25.290 :1(?) > > > > > > but, to say the truth, I can't really see where the time is exactly > > > consumed. Perhaps somebody with more experience can put more light on > > > this? 
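To dig further into where the interpreter time goes, the profiler can also be driven from code and its output sorted programmatically; a minimal sketch with the stdlib tools (cProfile is the C-implemented profiler; the `profile` module used above exposes the same `pstats` interface), with a list of lists standing in for the array so the sketch is self-contained:

```python
import cProfile
import io
import pstats

def benchmark():
    # Same shape as the create-numarray.py loop.
    a = [[i, i + 1] for i in range(1000)]
    for j in range(100):
        for i in range(len(a)):
            row = a[i]

pr = cProfile.Profile()
pr.enable()
benchmark()
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("tottime").print_stats(10)
print(buf.getvalue())
```

Sorting by "tottime" (time spent in the function itself, excluding callees) is usually the quickest way to spot the hot spot that a flat "-s time" listing buries.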
> > > > > > Another thing that I find intriguing has to do with Numeric and > > > oprofile output. Let me remember: > > > > > > samples % image name symbol name > > > 279 15.6478 libpthread-0.10.so __pthread_alt_unlock > > > 216 12.1144 libc-2.3.2.so memmove > > > 187 10.4879 python lookdict_string > > > 162 9.0858 python PyEval_EvalFrame > > > 144 8.0763 libpthread-0.10.so __pthread_alt_lock > > > 126 7.0667 libpthread-0.10.so __pthread_alt_trylock > > > 56 3.1408 python PyDict_SetItem > > > 53 2.9725 libpthread-0.10.so __GI___pthread_mutex_unlock > > > 45 2.5238 _numpy.so > > > PyArray_FromDimsAndDataAndDescr 39 2.1873 libc-2.3.2.so > > > __malloc > > > 36 2.0191 libc-2.3.2.so __cfree > > > > > > we can see that a lot of the time in the benchmark using Numeric is > > > consumed in libc space (a 37% or so). However, only a 16% is used in > > > memory-related tasks (memmove, malloc and free) while the rest seems > > > to be used in thread issues (??). Again, anyone can explain why the > > > pthread* routines take so many time, or why they appear here at all?. > > > Perhaps getting rid of these calls might improve the Numeric > > > performance even further. > > > > > > Cheers, > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > > Tool for open source databases. Create drag-&-drop reports. Save time > > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. 
> > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- From pearu at cens.ioc.ee Sun Jan 30 11:28:57 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Sun Jan 30 11:28:57 2005 Subject: [Numpy-discussion] ANN: F2PY - Fortran to Python Interface Generator Message-ID: F2PY - Fortran to Python Interface Generator -------------------------------------------- I am pleased to announce the ninth public release of F2PY, version 2.45.241_1926. The purpose of the F2PY project is to provide the connection between Python and Fortran programming languages. For more information, see http://cens.ioc.ee/projects/f2py2e/ Download: http://cens.ioc.ee/projects/f2py2e/2.x/F2PY-2-latest.tar.gz http://cens.ioc.ee/projects/f2py2e/2.x/F2PY-2-latest.win32.exe http://cens.ioc.ee/projects/f2py2e/2.x/scipy_distutils-latest.tar.gz http://cens.ioc.ee/projects/f2py2e/2.x/scipy_distutils-latest.win32.exe What's new? ------------ * Added support for wrapping signed integers and processing .pyf.src template files. * F2PY fortran objects have _cpointer attribute holding a C pointer to a wrapped function or a variable. When using _cpointer as a callback argument, the overhead of Python C/API is avoided giving for using callback arguments the same performance as calling Fortran or C function from Fortran or C, at the same time retaining the flexibility of Python. * Callback arguments can be built-in functions, fortran objects, and CObjects (hold by _cpointer attribute, for instance). * New attribute: ``intent(aux)`` to save parameter values. * New command line switches: --help-link and --link- * Numerous bugs are fixed. Support for ``usercode`` statement has been improved. * Documentation updates. Enjoy, Pearu Peterson ---------------
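The `_cpointer` idea, handing the callee a raw C function pointer so that no per-call Python round-trip is needed, has a stdlib analogue in ctypes; a sketch of the concept (illustrative only, this is not the f2py API):

```python
import ctypes

# Wrap a Python function as a C-callable function pointer.
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

def square(x):
    return x * x

c_square = CALLBACK(square)

# The raw address is what gets handed to C/Fortran as the callback.
addr = ctypes.cast(c_square, ctypes.c_void_p).value
print(hex(addr), c_square(3.0))
```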

F2PY 2.45.241_1926 - The Fortran to Python Interface Generator (30-Jan-05) From andrewm at object-craft.com.au Sun Jan 30 16:57:34 2005 From: andrewm at object-craft.com.au (Andrew McNamara) Date: Sun Jan 30 16:57:34 2005 Subject: [Numpy-discussion] 30% speedup when deactivating NumArray.__del__ !!! In-Reply-To: <200501282223.56850.Norbert.Nemec.list@gmx.de> References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> <200501282223.56850.Norbert.Nemec.list@gmx.de> Message-ID: <20050131005420.D5C563C889@coffee.object-craft.com.au> >This benchmark made me suspicious since I had already found it odd before that >killing a numarray calculation with Ctrl-C nearly always gives a backtrace >starting in __del__ Much of the python machinery may have been torn down when your __del__ method is called while the interpreter is exiting (I'm asuming you're talking about a script, rather than interactive mode). Code should be prepared for anything to fail - it's quite common for parts of __builtins__ to have been disassembled, etc. The language reference has this to say: http://python.org/doc/2.3.4/ref/customization.html#l2h-174 Warning: Due to the precarious circumstances under which __del__() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead. Also, when __del__() is invoked in response to a module being deleted (e.g., when execution of the program is done), other globals referenced by the __del__() method may already have been deleted. For this reason, __del__() methods should do the absolute minimum needed to maintain external invariants. Starting with version 1.5, Python guarantees that globals whose name begins with a single underscore are deleted from their module before other globals are deleted; if no other references to such globals exist, this may help in assuring that imported modules are still available at the time when the __del__() method is called. 
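That "exceptions that occur during their execution are ignored" clause is easy to demonstrate on CPython: the error never propagates to the caller, it is only reported on sys.stderr:

```python
import io
import sys

class Fragile:
    def __del__(self):
        raise RuntimeError("boom")

obj = Fragile()
captured = io.StringIO()
sys.stderr, saved = captured, sys.stderr
del obj                    # __del__ runs here; the exception is swallowed
sys.stderr = saved
report = captured.getvalue()
print(report)
```

The `del obj` statement itself completes without raising; only the captured stderr text shows that anything went wrong inside the destructor.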
Another important caveat of classes with __del__ methods is mentioned in the library reference for the "gc" module: http://python.org/doc/2.3.4/lib/module-gc.html Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn't collect such cycles automatically because, in general, it isn't possible for Python to guess a safe order in which to run the __del__() methods. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From jmiller at stsci.edu Mon Jan 31 10:30:21 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 31 10:30:21 2005 Subject: [Numpy-discussion] 30% speedup when deactivating NumArray.__del__ !!! In-Reply-To: <20050131005420.D5C563C889@coffee.object-craft.com.au> References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> <200501282223.56850.Norbert.Nemec.list@gmx.de> <20050131005420.D5C563C889@coffee.object-craft.com.au> Message-ID: <1107196116.8508.154.camel@halloween.stsci.edu> Thanks Andrew, that was a useful summary. I wish I had more time to work on optimizing numarray personally, but I don't. Instead I'll try to share what I know of the state of __del__/tp_dealloc so that people who want to work on it can come up with something better: 1. We need __del__/tp_dealloc. (This may be controversial but I hope not). Using the destructor makes the high level C-API cleaner. Getting rid of it means changing the C-API. __del__/tp_dealloc is used to transparently copy the contents of a working array back onto an ill-behaved (byteswapped, etc...) source array at extension function exit time. 2. There's a problem with the tp_dealloc I originally implemented which causes it to segfault for a ./configure'ed --with-pydebug Python. Looking at it today, it looks like it may be an exit-time garbage collection problem. 
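The reference-cycle caveat quoted above describes Python 2 behaviour, where such cycles ended up uncollected in gc.garbage. A short sketch of the mechanics (note that CPython 3.4+, via PEP 442, can finalize these cycles, so there the destructors do run when the collector fires):

```python
import gc

finalized = []

class Node:
    def __del__(self):
        finalized.append("finalized")

# Two objects that keep each other alive: unreachable, but cyclic.
a, b = Node(), Node()
a.other, b.other = b, a
del a, b

gc.collect()               # on Python 3.4+ both __del__ methods run
print(finalized)
```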
There is no explicit garbage collection support in _numarray or _ndarray, so that may be the problem. 3. We're definitely not exploiting the "single underscore rule" yet. We use single underscores mostly to hide globals from module export. I don't think this is really critical, but that's the state of things. 4. Circular references should only be a problem for numerical arrays with "user introduced" cycles. numarray ObjectArrays have no __del__. I attached a patch against CVS that reinstates the old tp_dealloc; this shows where I left off in case someone has insight on how to fix it. I haven't tested it recently for a non-debug Python; I think it works. The patch segfaults after the C-API examples/selftest for debug Pythons: % python setup.py install --selftest Using EXTRA_COMPILE_ARGS = [] running install running build running build_py copying Lib/numinclude.py -> build/lib.linux-i686-2.4/numarray running build_ext running install_lib copying build/lib.linux-i686-2.4/numarray/numinclude.py -> /home/jmiller/work/lib/python2.4/site-packages/numarray byte-compiling /home/jmiller/work/lib/python2.4/site-packages/numarray/numinclude.py to numinclude.pyc running install_headers copying Include/numarray/numconfig.h -> /home/jmiller/work/include/python2.4/numarray running install_data Testing numarray 1.2a on Python (2, 4, 0, 'final', 0) numarray.numtest: ((0, 1231), (0, 1231)) numarray.ieeespecial: (0, 20) numarray.records: (0, 48) numarray.strings: (0, 186) numarray.memmap: (0, 82) numarray.objects: (0, 105) numarray.memorytest: (0, 16) numarray.examples.convolve: ((0, 20), (0, 20), (0, 20), (0, 20)) Segmentation fault (core dumped) That's all I can add for now. 
Regards, Todd On Mon, 2005-01-31 at 11:54 +1100, Andrew McNamara wrote: > >This benchmark made me suspicious since I had already found it odd before that > >killing a numarray calculation with Ctrl-C nearly always gives a backtrace > >starting in __del__ > > Much of the python machinery may have been torn down when your __del__ > method is called while the interpreter is exiting (I'm asuming you're > talking about a script, rather than interactive mode). Code should > be prepared for anything to fail - it's quite common for parts of > __builtins__ to have been disassembled, etc. > > The language reference has this to say: > > http://python.org/doc/2.3.4/ref/customization.html#l2h-174 > > Warning: Due to the precarious circumstances under which __del__() > methods are invoked, exceptions that occur during their execution > are ignored, and a warning is printed to sys.stderr instead. Also, > when __del__() is invoked in response to a module being deleted (e.g., > when execution of the program is done), other globals referenced by > the __del__() method may already have been deleted. For this reason, > __del__() methods should do the absolute minimum needed to maintain > external invariants. Starting with version 1.5, Python guarantees that > globals whose name begins with a single underscore are deleted from > their module before other globals are deleted; if no other references > to such globals exist, this may help in assuring that imported modules > are still available at the time when the __del__() method is called. > > Another important caveat of classes with __del__ methods is mentioned in > the library reference for the "gc" module: > > http://python.org/doc/2.3.4/lib/module-gc.html > > Objects that have __del__() methods and are part of a reference > cycle cause the entire reference cycle to be uncollectable, > including objects not necessarily in the cycle but reachable only > from it. 
Python doesn't collect such cycles automatically because, > in general, it isn't possible for Python to guess a safe order in > which to run the __del__() methods. -------------- next part -------------- ? Lib/numinclude.py ? Lib/ufunc.warnings ? Lib/codegenerator/basecode.pyc ? Lib/codegenerator/bytescode.pyc ? Lib/codegenerator/convcode.pyc ? Lib/codegenerator/sortcode.pyc ? Lib/codegenerator/template.pyc ? Lib/codegenerator/ufunccode.pyc ? Src/_ufuncmodule.new Index: Lib/numarraycore.py =================================================================== RCS file: /cvsroot/numpy/numarray/Lib/numarraycore.py,v retrieving revision 1.101 diff -c -r1.101 numarraycore.py *** Lib/numarraycore.py 25 Jan 2005 11:25:09 -0000 1.101 --- Lib/numarraycore.py 31 Jan 2005 16:36:47 -0000 *************** *** 693,703 **** v._byteorder = self._byteorder return v - def __del__(self): - if self._shadows != None: - self._shadows._copyFrom(self) - self._shadows = None - def __getstate__(self): """returns state of NumArray for pickling.""" # assert not hasattr(self, "_shadows") # Not a good idea for pickling. --- 693,698 ---- Index: Src/_numarraymodule.c =================================================================== RCS file: /cvsroot/numpy/numarray/Src/_numarraymodule.c,v retrieving revision 1.65 diff -c -r1.65 _numarraymodule.c *** Src/_numarraymodule.c 5 Jan 2005 19:49:02 -0000 1.65 --- Src/_numarraymodule.c 31 Jan 2005 16:36:47 -0000 *************** *** 105,128 **** } static PyObject * ! _numarray_shadows_get(PyArrayObject *self) { ! if (self->_shadows) { ! Py_INCREF(self->_shadows); ! return self->_shadows; ! } else { ! Py_INCREF(Py_None); ! return Py_None; } } ! static int ! _numarray_shadows_set(PyArrayObject *self, PyObject *s) { ! Py_XDECREF(self->_shadows); ! if (s) Py_INCREF(s); ! self->_shadows = s; ! return 0; } static PyObject * --- 105,138 ---- } static PyObject * ! _numarray_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { ! PyArrayObject *self; ! 
self = (PyArrayObject *) ! _numarray_type.tp_base->tp_new(type, args, kwds); ! if (!self) return NULL; ! if (!(self->descr = PyArray_DescrFromType( tAny))) { ! PyErr_Format(PyExc_RuntimeError, ! "_numarray_new: bad type number"); ! return NULL; } + return (PyObject *) self; } ! static void ! _numarray_dealloc(PyObject *self) { ! PyArrayObject *me = (PyArrayObject *) self; ! Py_INCREF(self); ! if (me->_shadows) { ! PyObject *result = PyObject_CallMethod(me->_shadows, ! "_copyFrom", "(O)", self); ! Py_XDECREF(result); /* Should be None. */ ! Py_DECREF(me->_shadows); ! me->_shadows = NULL; ! } ! self->ob_refcnt = 0; ! _numarray_type.tp_base->tp_dealloc(self); } static PyObject * *************** *** 218,226 **** } static PyGetSetDef _numarray_getsets[] = { - {"_shadows", - (getter)_numarray_shadows_get, - (setter) _numarray_shadows_set, "numeric shadows object"}, {"_type", (getter)_numarray_type_get, (setter) _numarray_type_set, "numeric type object"}, --- 228,233 ---- *************** *** 418,424 **** "numarray._numarray._numarray", sizeof(PyArrayObject), 0, ! 0, /* tp_dealloc */ 0, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ --- 425,431 ---- "numarray._numarray._numarray", sizeof(PyArrayObject), 0, ! _numarray_dealloc, /* tp_dealloc */ 0, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ *************** *** 452,458 **** 0, /* tp_dictoffset */ (initproc)_numarray_init, /* tp_init */ 0, /* tp_alloc */ ! 0, /* tp_new */ }; typedef void Sigfunc(int); --- 459,465 ---- 0, /* tp_dictoffset */ (initproc)_numarray_init, /* tp_init */ 0, /* tp_alloc */ ! _numarray_new, /* tp_new */ }; typedef void Sigfunc(int); From rays at blue-cove.com Mon Jan 31 16:58:29 2005 From: rays at blue-cove.com (Ray S) Date: Mon Jan 31 16:58:29 2005 Subject: [Numpy-discussion] Numeric array's actual data address? 
Message-ID: <5.2.0.4.2.20050131165659.12d54640@blue-cove.com> If I have an array N: >>> N = Numeric.zeros((1000,), Numeric.Float) >>> repr(N.__copy__) '' What is the actual address of the first element? Or, as an offset from the object? numarray gives us that: >>> N = numarray.zeros((1000,), numarray.Float) >>> N.info() class: shape: (1000,) strides: (8,) byteoffset: 0 bytestride: 8 itemsize: 8 aligned: 1 contiguous: 1 data: byteorder: little byteswap: 0 type: Float64 In numarray, the offset appears to be 20. If I try to use memmove() to fill a Numeric array it faults when using an offset of 20... Ray From simon at arrowtheory.com Mon Jan 31 17:21:23 2005 From: simon at arrowtheory.com (Simon Burton) Date: Mon Jan 31 17:21:23 2005 Subject: [Numpy-discussion] pyrex numarray Message-ID: <20050201121626.6150f9ff.simon@arrowtheory.com> Has anyone considered using pyrex to implement numarray ? I don't know a lot of the details but it seems to me that pyrex could unify numarray's python/c source code mixture and smooth the transition from python (==untyped pyrex) code to c (==typed pyrex) code. It would also help clueless users like me understand, and perhaps contribute to, the codebase. Also, the pyrex people have talked about applying this technique to the standard python libraries, both for readability and speed. ciao, Simon.
However, using a RawCharArray does not allow this: >>> a=strings.array(None, itemsize = 4, shape=1, kind=strings.RawCharArray) >>> a RawCharArray([' ']) >>> a[0] = str(0) Traceback (most recent call last): File "", line 1, in ? File "/usr/local/lib/python2.4/site-packages/numarray/strings.py", line 185, in _setitem where[bo:bo+self._itemsize] = self.pad(value)[0:self._itemsize] TypeError: right operand length must match slice length > Give RawCharArray a try; it *is* the basis of CharArray, so it > basically works but there will likely be a few issues to sort out. My > guess is that anything that really needs fixing can be added for > numarray-1.2.
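The comparison surprise that started this thread is just bytewise ordering of the pad character: a space (0x20) sorts above NUL (0x00), so the space-padded field compares greater while the stripped scalar compares less. The same behaviour can be reproduced with plain Python bytes:

```python
padded = b"0" + b" " * 3         # '0' stored in a 4-byte space-padded field
probe = b"0\x00\x00\x00\x01"

print(padded >= probe)           # True:  space (0x20) > NUL (0x00)
print(padded.rstrip() >= probe)  # False: b'0' is a proper prefix of probe
```

That is exactly the Out[182] vs Out[183] discrepancy: the array comparison sees the padded bytes, while the extracted scalar has been stripped.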
Mmm, perhaps having the possibility to select the pad value in CharArray creation time would be nice. Cheers, -- Francesc Altet ? >qo< ? http://www.carabos.com/ C?rabos Coop. V. ? V ?V ? Enjoy Data ? ? ? ? ? ? ? ? ? ? "" From simon at arrowtheory.com Mon Jan 3 14:42:30 2005 From: simon at arrowtheory.com (Simon Burton) Date: Mon Jan 3 14:42:30 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks Message-ID: <41D9CA01.9040108@arrowtheory.com> Hi all, Here are some benchmarks, measured in seconds on a 1.5GHz celeron. Each test does a matrix add (1000x1000), mul (1000x1000) and eigenvalue find (500x500). Matlab: 0.0562 1.5180 3.7630 Numeric: 0.0962309813499 1.73247330189 3.72153270245 numarray: 7.17220497131 19.3960719109 5.72376401424 I have attached the code. Looks like numarray is way behind on the basic linear algebra stuff. We have (so far) chosen to go with numarray for our scientific computations, but will be needing fast add/multiply. I am surmising that these methods just have not been pluged into the native BLAS/ATLAS lib. We will also be needing other solvers from LAPACK such as dpotrf/dposv, and some of the special functions (bessel) already implemented in scipy. I understand that Todd is working on scipy/numarray compatability. How is that progressing, and what should we be doing to get the above functionality into numarray ? I have been pokeing around the code already, and am able to help out with that. bye for now, Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com -------------- next part -------------- A non-text attachment was scrubbed... Name: benchmark.py Type: text/x-python Size: 1786 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: bench.m Type: text/x-objcsrc Size: 272 bytes Desc: not available URL: From rbastian at club-internet.fr Tue Jan 4 05:21:16 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Tue Jan 4 05:21:16 2005 Subject: [Numpy-discussion] install, numpy Message-ID: <05010405560602.00761@rbastian> Hi, I tried "python setup.py install" (Python2.4) in order to get Numeric-23.6 messages : running install running build running build_py running build_ext building 'lapack_lite' extension gcc -pthread -shared build/temp.linux-i686-2.4/Src/lapack_litemodule.o -L/usr/lib/atlas -llapack -lcblas -lf77blas -latlas -lg2c -o build/lib.linux-i686-2.4/lapack_lite.so error : /usr/i486-suse-linux/bin/ld: cannot find -llapack collect2: ld returned 1 exit status error: command 'gcc' failed with exit status 1 Please, what is missing ? -- Ren? Bastian http://www.musiques-rb.org : Musique en Python From Chris.Barker at noaa.gov Tue Jan 4 09:30:30 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 4 09:30:30 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <05010405560602.00761@rbastian> References: <05010405560602.00761@rbastian> Message-ID: <41DAD148.9090703@noaa.gov> Ren? Bastian wrote: > I tried "python setup.py install" (Python2.4) in order to get Numeric-23.6 > /usr/i486-suse-linux/bin/ld: cannot find -llapack AARRGG!! I can't believe this bug is still there! Who is responsible for maintaining the setup.py for Numeric? This has been discussed numerous times on this list, does there need to be a bug report officially filed somewhere? Anyway, the problem is that it's looking for lapack libs that you don't have. By default, setup.py is supposed to be configured to use only the built-in lapack-lite, so it should build anywhere. I've looked in the setup.py, and found that it's closer, but apparently not fixed. 
I've got lapack on my system, so it's hard for me to test, but try making these changes in setup.py: # delete all but the first one in this list if using your own LAPACK/BLAS #This looks to be right: sourcelist = [os.path.join('Src', 'lapack_litemodule.c'), # os.path.join('Src', 'blas_lite.c'), # os.path.join('Src', 'f2c_lite.c'), # os.path.join('Src', 'zlapack_lite.c'), # os.path.join('Src', 'dlapack_lite.c') ] # set these to use your own BLAS; #library_dirs_list = ['/usr/lib/atlas'] library_dirs_list = [] #libraries_list = ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] # if you also set `use_dotblas` (see below), you'll need: # ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] libraries_list = [] # set to true (1), if you also want BLAS optimized #matrixmultiply/dot/innerproduct #use_dotblas = 1 use_dotblas = 0 #include_dirs = ['/usr/include/atlas'] # You may need to set this to include_dirs = [] find cblas.h # e.g. on UNIX using ATLAS this should be ['/usr/include/atlas'] Note that some of those may be harmless, even if they don't exist, but it won't hurt to get rid of paths you don't have anyway. Also, if you are doing any linear algebra, you'll get much better performance with a native lapack, such as the atlas one, so you may want to get that installed, rather than making this fix. search this list for lapack and/or atlas, to learn how. Suse is likely to provide an atlas rpm. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rbastian at club-internet.fr Tue Jan 4 09:51:27 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Tue Jan 4 09:51:27 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DAD148.9090703@noaa.gov> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> Message-ID: <05010410263100.00761@rbastian> Thanks! Now it work's. 
I will see if numpy is faster than numarray for my music/audio business. Le Mardi 4 Janvier 2005 18:24, Chris Barker a ?crit : > Ren? Bastian wrote: > > I tried "python setup.py install" (Python2.4) in order to get > > Numeric-23.6 > > > > /usr/i486-suse-linux/bin/ld: cannot find -llapack > > AARRGG!! I can't believe this bug is still there! Who is responsible for > maintaining the setup.py for Numeric? This has been discussed numerous > times on this list, does there need to be a bug report officially filed > somewhere? > > Anyway, the problem is that it's looking for lapack libs that you don't > have. By default, setup.py is supposed to be configured to use only the > built-in lapack-lite, so it should build anywhere. > > I've looked in the setup.py, and found that it's closer, but apparently > not fixed. I've got lapack on my system, so it's hard for me to test, > but try making these changes in setup.py: > > # delete all but the first one in this list if using your own LAPACK/BLAS > > #This looks to be right: > > sourcelist = [os.path.join('Src', 'lapack_litemodule.c'), > # os.path.join('Src', 'blas_lite.c'), > # os.path.join('Src', 'f2c_lite.c'), > # os.path.join('Src', 'zlapack_lite.c'), > # os.path.join('Src', 'dlapack_lite.c') > ] > # set these to use your own BLAS; > > #library_dirs_list = ['/usr/lib/atlas'] > library_dirs_list = [] > #libraries_list = ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] > # if you also set `use_dotblas` (see below), you'll > need: > # ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c'] > > libraries_list = [] > > # set to true (1), if you also want BLAS optimized > #matrixmultiply/dot/innerproduct > #use_dotblas = 1 > use_dotblas = 0 > #include_dirs = ['/usr/include/atlas'] # You may need to set this to > include_dirs = [] > > > > Note that some of those may be harmless, even if they don't exist, but > it won't hurt to get rid of paths you don't have anyway. 
> > Also, if you are doing any linear algebra, you'll get much better > performance with a native lapack, such as the atlas one, so you may want > to get that installed, rather than making this fix. search this list for > lapack and/or atlas, to learn how. > > Suse is likely to provide an atlas rpm. > > -Chris -- Ren? Bastian http://www.musiques-rb.org : Musique en Python From stephen.walton at csun.edu Tue Jan 4 10:11:27 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Tue Jan 4 10:11:27 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DAD148.9090703@noaa.gov> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> Message-ID: <41DADC0C.3000501@csun.edu> Chris Barker wrote: > Suse is likely to provide an atlas rpm. How? I've struggled with trying to create ATLAS RPMs in order to maintain it locally, but the rpm program doesn't have the hooks to distinguish between the various architectures ATLAS supports; it distinguishes between Athlon and Pentium, for example, and an ATLAS library built on the former core dumps on the latter. It would also be nice if ATLAS could be made a shared library, but that's also not supported at this time. Without it, Numeric and numarray built against ATLAS inherit its architecture dependence. It makes maintaining all of this at our site a real pain. From jbrandmeyer at earthlink.net Tue Jan 4 10:50:48 2005 From: jbrandmeyer at earthlink.net (Jonathan Brandmeyer) Date: Tue Jan 4 10:50:48 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DADC0C.3000501@csun.edu> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> <41DADC0C.3000501@csun.edu> Message-ID: <1104864489.26580.15.camel@illuvatar> On Tue, 2005-01-04 at 10:10 -0800, Stephen Walton wrote: > Chris Barker wrote: > > > Suse is likely to provide an atlas rpm. > > How? 
I've struggled with trying to create ATLAS RPMs in order to > maintain it locally, but the rpm program doesn't have the hooks to > distinguish between the various architectures ATLAS supports; it > distinguishes between Athlon and Pentium, for example, and an ATLAS > library built on the former core dumps on the latter. > > It would also be nice if ATLAS could be made a shared library, but > that's also not supported at this time. ATLAS is built as a shared library in Debian, named libatlas.so.3 and libblas.so.3. > Without it, Numeric and > numarray built against ATLAS inherit its architecture dependence. It > makes maintaining all of this at our site a real pain. In Debian there are several packages that "Provide" the atlas shared libraries (atlas3-base, atlas3-sse atlas3-sse2 atlas3-3dnow). A dependent package would "Depend" on the generic name. I don't know anything about how RPM's support for "Provides" works, but I would assume that you can manage something similar. The preinstall scripts for each one verify that the CPU supports the instructions used in the package. HTH, -Jonathan From rbastian at club-internet.fr Tue Jan 4 13:35:07 2005 From: rbastian at club-internet.fr (=?iso-8859-1?q?Ren=E9=20Bastian?=) Date: Tue Jan 4 13:35:07 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <05010414075901.00761@rbastian> Hm, Numeric is installed but : Traceback (most recent call last): File "benchmark.py", line 93, in ? test00() File "benchmark.py", line 8, in test00 import RandomArray as RA File "/usr/local/lib/python2.4/site-packages/Numeric/RandomArray.py", line 3, in ? import LinearAlgebra File "/usr/local/lib/python2.4/site-packages/Numeric/LinearAlgebra.py", line 8, in ? import lapack_lite ImportError: /usr/local/lib/python2.4/site-packages/Numeric/lapack_lite.so: undefined symbol: dgesdd_ Something wrong ? -- Ren? 
Bastian http://www.musiques-rb.org : Musique en Python From perry at stsci.edu Tue Jan 4 13:54:28 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jan 4 13:54:28 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <573FE986-5E9B-11D9-BF21-000A95B68E50@stsci.edu> On Jan 3, 2005, at 5:41 PM, Simon Burton wrote: > I understand that Todd is working on scipy/numarray compatibility. How > is that progressing, and what should we be doing to get the above > functionality into numarray ? I have been poking around the code > already, and am able to help out with that. > Todd has done the first phase of making changes to numarray to handle generalized ufuncs (the area of greatest incompatibility) and also make the necessary changes to scipy_base to support both numarray and Numeric. We are currently waiting for some feedback on the acceptability of these changes so we can continue modifying the rest of scipy to support numarray. So it is going ahead. I figure that once these changes are accepted and an agreement is reached on how setup.py should handle dual builds that anyone should be able to contribute to the porting effort. I hope that can happen in a few weeks. I don't know if that is quick enough for your needs. But you are welcome to take a look at what Todd has already checked into CVS in scipy (as a branch). Perry Greenfield From Chris.Barker at noaa.gov Tue Jan 4 15:33:54 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 4 15:33:54 2005 Subject: [Numpy-discussion] install, numpy In-Reply-To: <41DADC0C.3000501@csun.edu> References: <05010405560602.00761@rbastian> <41DAD148.9090703@noaa.gov> <41DADC0C.3000501@csun.edu> Message-ID: <41DB2667.8050908@noaa.gov> Stephen Walton wrote: > How?
I've struggled with trying to create ATLAS RPMs in order to > maintain it locally, but the rpm program doesn't have the hooks to > distinguish between the various architectures ATLAS supports; it > distinguishes between Athlon and Pentium, for example, and an ATLAS > library built on the former core dumps on the latter. sorry, I can't help here. I'm running Gentoo, which has a "compile everything yourself" philosophy! René Bastian wrote: > Numeric is installed but : > > Traceback (most recent call last): > File "benchmark.py", line 93, in ? > test00() > File "benchmark.py", line 8, in test00 > import RandomArray as RA > File "/usr/local/lib/python2.4/site-packages/Numeric/RandomArray.py", line > 3, in ? > import LinearAlgebra > File "/usr/local/lib/python2.4/site-packages/Numeric/LinearAlgebra.py", > line 8, in ? > import lapack_lite > ImportError: /usr/local/lib/python2.4/site-packages/Numeric/lapack_lite.so: > undefined symbol: dgesdd_ > > Something wrong ? Sorry, I'm kind of out of my depth here, but one thing I would try is to trash the build directory of Numeric, and build again, just to make sure you're re-building everything. You might want to delete the /usr/lib/python2.3/site-packages/Numeric Directory too, before installing. - Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From NadavH at VisionSense.com Wed Jan 5 02:31:00 2005 From: NadavH at VisionSense.com (Nadav Horesh) Date: Wed Jan 5 02:31:00 2005 Subject: [Numpy-discussion] A bug in numarray sign function.
Message-ID: <41DBC107.7070702@VisionSense.com> in numarraycore.py line 1504 should be changed from return zeros(shape(m))-less(m,0)+greater(m,0) to return zeros(shape(m))-ufunc.less(m,0)+ufunc.greater(m,0) otherwise sign function raises an error: /usr/local/lib/python2.4/site-packages/numarray/numarraycore.py in sign(m) 1502 """ 1503 m = asarray(m) -> 1504 return zeros(shape(m))-less(m,0)+greater(m,0) 1505 1506 def alltrue(array, axis=0): NameError: global name 'less' is not defined Nadav. From jmiller at stsci.edu Wed Jan 5 06:11:31 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 5 06:11:31 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41D9CA01.9040108@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> Message-ID: <1104934210.31516.1563.camel@halloween.stsci.edu> Hi Simon, I found a benchmark bug which explains the performance difference in +. Here are my times with the modified benchmark (Python-2.4 gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5) on 1.7 GHz P-IV w/ 2G): numarray + : 0.0540893316269 numarray matrixmultiply : 16.9448821545 numarray eigenvalues : 9.67254910469 Numeric + : 0.0653991508484 Numeric matrixmultiply : 33.0565470934 Numeric eigenvalues : 9.44225819111 So, for large arrays with a simple to install / built-in linear algebra system, numarray is doing just fine. Looking at your results, I think you may have been comparing numarray built using a built-in blas_lite versus Numeric using ATLAS. There, I think numarray *is* behind but that is fixable with some effort. The key is porting and integrating the Numeric dotblas package with numarray. I've been looking at that some today... err, yesterday, apparently I forgot to hit "send". On Mon, 2005-01-03 at 17:41, Simon Burton wrote: > Hi all, > > Here are some benchmarks, measured in seconds on a 1.5GHz celeron. > Each test does a matrix add (1000x1000), mul (1000x1000) and eigenvalue > find (500x500). 
> > Matlab: > 0.0562 > 1.5180 > 3.7630 > > Numeric: > 0.0962309813499 > 1.73247330189 > 3.72153270245 > > numarray: > 7.17220497131 > 19.3960719109 > 5.72376401424 > > I understand that Todd is working on scipy/numarray compatibility. How > is that progressing, and what should we be doing to get the above > functionality into numarray ? Perry already addressed this. > I have been poking around the code > already, and am able to help out with that. Regards, Todd -------------- next part -------------- A non-text attachment was scrubbed... Name: benchmark.py Type: text/x-python Size: 2055 bytes Desc: not available URL: From stephen.walton at csun.edu Wed Jan 5 09:16:05 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Jan 5 09:16:05 2005 Subject: [Numpy-discussion] A bug in numarray sign function. In-Reply-To: <41DBC107.7070702@VisionSense.com> References: <41DBC107.7070702@VisionSense.com> Message-ID: <41DC208F.5060605@csun.edu> Nadav Horesh wrote: > in numarraycore.py line 1504 should be changed from > > return zeros(shape(m))-less(m,0)+greater(m,0) What version of numarray are you running? I don't see this code in version 1.1.1. (I did a 'grep zeros(shape(m)) *.py' in /usr/lib/python2.3/site-packages/numarray and got no matches; line 1504 of my copy of numarraycore.py doesn't look anything like the above. From stephen.walton at csun.edu Wed Jan 5 09:34:28 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Jan 5 09:34:28 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1104934210.31516.1563.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> Message-ID: <41DC24DD.10002@csun.edu> Todd, I ran your updated benchmark on a Dell Precision 350 (P4 at 2.26GHz) with numarray 1.1.1 and Numeric 23.6 both built against ATLAS.
My results were: numarray + : 0.026392891407 numarray matrixmultiply : 4.37110900879 numarray eigenvalues : 2.95166471004 Numeric + : 0.0369043111801 Numeric matrixmultiply : 0.69968931675 Numeric eigenvalues : 2.81557621956 Might there still be a matrixmultiply problem somewhere? From jmiller at stsci.edu Wed Jan 5 09:54:03 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 5 09:54:03 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC24DD.10002@csun.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC24DD.10002@csun.edu> Message-ID: <1104947578.31516.1752.camel@halloween.stsci.edu> On Wed, 2005-01-05 at 12:33, Stephen Walton wrote: > Todd, I ran your updated benchmark on a Dell Precision 350 (P4 at > 2.26GHz) with numarray 1.1.1 and Numeric 23.6 both built against ATLAS. > My results were: > > numarray + : 0.026392891407 > numarray matrixmultiply : 4.37110900879 > numarray eigenvalues : 2.95166471004 > > Numeric + : 0.0369043111801 > Numeric matrixmultiply : 0.69968931675 > Numeric eigenvalues : 2.81557621956 > > Might there still be a matrixmultiply problem somewhere? That's what "dotblas" does; it replaces matrixmultiply() and innerproduct() with versions which are dependent on a laundry list of numerical libraries. Numeric has dotblas, numarray doesn't. I'm looking into it now. The port itself is trivial, but integrating it into the numarray package structure has my head spinning a little. 
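The substitution described above — rebinding matrixmultiply() and innerproduct() to BLAS-backed versions when an accelerated extension can be imported — can be sketched in plain Python. This is an illustrative sketch only: the module name _dotblas and the fallback below mirror the pattern discussed in the thread, not numarray's actual code.

```python
# Hypothetical sketch of the "dotblas" pattern: at import time, try to load
# an accelerated extension and rebind the public function to it; otherwise
# fall back to a portable pure-Python version.

def _naive_dot(a, b):
    """Portable fallback: multiply two matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

try:
    from _dotblas import dot as _dot  # accelerated extension, if built
    USING_BLAS = 1
except ImportError:
    _dot = _naive_dot
    USING_BLAS = 0

def matrixmultiply(a, b):
    return _dot(a, b)

# matrixmultiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]) -> [[19, 22], [43, 50]]
```

Callers never see which implementation is active, which is why a missing or mislinked extension shows up only as slow timings.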
Regards, Todd From simon at arrowtheory.com Wed Jan 5 16:25:00 2005 From: simon at arrowtheory.com (Simon Burton) Date: Wed Jan 5 16:25:00 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1104934210.31516.1563.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> Message-ID: <41DC84FA.1090703@arrowtheory.com> Todd Miller wrote: >Hi Simon, > >I found a benchmark bug which explains the performance difference in +. >Here are my times with the modified benchmark (Python-2.4 gcc version >3.2.2 20030222 (Red Hat Linux 3.2.2-5) on 1.7 GHz P-IV w/ 2G): > >numarray + : 0.0540893316269 >numarray matrixmultiply : 16.9448821545 >numarray eigenvalues : 9.67254910469 > >Numeric + : 0.0653991508484 >Numeric matrixmultiply : 33.0565470934 >Numeric eigenvalues : 9.44225819111 > >So, for large arrays with a simple to install / built-in linear algebra >system, numarray is doing just fine. > >Looking at your results, I think you may have been comparing numarray >built using a built-in blas_lite versus Numeric using ATLAS. There, I >think numarray *is* behind but that is fixable with some effort. The >key is porting and integrating the Numeric dotblas package with >numarray. I've been looking at that some today... err, yesterday, >apparently I forgot to hit "send". > > Wow, those results look great, Todd. I have double checked my install. However, the numarray multiply is still x10 slower. 
This is set in addons: lapack_libs = ['lapack', 'f77blas', 'cblas', 'atlas', 'blas'] and, at runtime, python has loaded: 40771000-40cb6000 r-xp 00000000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40cb6000-40cb9000 rw-p 00545000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40cb9000-40dbd000 rw-p 00000000 00:00 0 40dbd000-40dd7000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 40dd7000-40dd8000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 40dd8000-40df7000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 40df7000-40df8000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 40df8000-4110b000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 4110b000-4110f000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 4110f000-41454000 r-xp 00000000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 41454000-41458000 rw-p 00345000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 41458000-41472000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 41472000-41473000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 ( as well as lapack_lite2.so ) So I assumed that since the eigenvalues came in fast ATLAS was alive and well. 
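For reference, a listing like the one above can be gathered from inside the interpreter by reading its own /proc/<pid>/maps after importing the module of interest (an assumption about how the list was produced; /proc is Linux-specific):

```python
# Linux-only sketch: list the shared objects mapped into this process.
# Do the interesting import (e.g. "import numarray") first, then call this.
import os

def loaded_shared_objects():
    """Return sorted shared-object paths from this process's memory map."""
    with open("/proc/%d/maps" % os.getpid()) as f:
        paths = {line.split()[-1] for line in f if ".so" in line}
    return sorted(p for p in paths if p.startswith("/"))
```

Filtering the result for 'atlas', 'blas', or 'lapack' shows whether the optimized libraries were actually loaded.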
Also, the above libs are exactly what Numeric uses: 4033b000-40880000 r-xp 00000000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40880000-40883000 rw-p 00545000 00:0c 783245 /usr/lib/atlas/liblapack.so.3.0 40883000-40987000 rw-p 00000000 00:00 0 40987000-409a6000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 409a6000-409a7000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 409a7000-409c1000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 409c1000-409c2000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 409c2000-40cd5000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 40cd5000-40cd9000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 40cd9000-40cf3000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 40cf3000-40cf4000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 40cf4000-40cf7000 rw-p 00000000 00:00 0 40cf7000-4103c000 r-xp 00000000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 4103c000-41040000 rw-p 00345000 00:0c 783244 /usr/lib/atlas/libblas.so.3.0 Any ideas ? It's not even clear to me where the matrixmultiply is taking place. I couldn't find it in my lapack_lite2.so even though there seems to be a lite version of dgemm in the source. But then dgemm is not referenced anywhere else in the numarray source. Coming from the other end, I traced matrixmultiply to an _ipFloat64. But then the trail went cold again :) Flummoxed. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From stephen.walton at csun.edu Wed Jan 5 17:59:12 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Wed Jan 5 17:59:12 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC84FA.1090703@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> Message-ID: <41DC9B35.2070801@csun.edu> Simon Burton wrote: > I have double checked my install.
However, the numarray multiply is > still x10 slower. > > This is set in addons: > lapack_libs = ['lapack', 'f77blas', 'cblas', 'atlas', 'blas'] One more thing: did you set the environment variable USE_LAPACK before building; i.e., env USE_LAPACK=1 python setup.py build in your numarray directory? Just checking. From simon at arrowtheory.com Wed Jan 5 21:15:02 2005 From: simon at arrowtheory.com (Simon Burton) Date: Wed Jan 5 21:15:02 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC9B35.2070801@csun.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <41DC9B35.2070801@csun.edu> Message-ID: <20050106160431.0e910010.simon@arrowtheory.com> On Wed, 05 Jan 2005 17:58:13 -0800 Stephen Walton wrote: > One more thing: did you set the environment variable USE_LAPACK before > building; i.e., > > env USE_LAPACK=1 python setup.py build > > in your numarray directory? Just checking. > Yes, I also checked it using a print statement. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From NadavH at VisionSense.com Thu Jan 6 04:26:29 2005 From: NadavH at VisionSense.com (Nadav Horesh) Date: Thu Jan 6 04:26:29 2005 Subject: [Numpy-discussion] A bug in numarray sign function. In-Reply-To: <41DC208F.5060605@csun.edu> References: <41DBC107.7070702@VisionSense.com> <41DC208F.5060605@csun.edu> Message-ID: <41DD2D8F.8000803@VisionSense.com> >>> print numarray.__version__ 1.2a It is from the CVS repository Nadav Stephen Walton wrote: > Nadav Horesh wrote: > >> in numarraycore.py line 1504 should be changed from >> >> return zeros(shape(m))-less(m,0)+greater(m,0) > > > What version of numarray are you running? I don't see this code in > version 1.1.1.
(I did a 'grep zeros(shape(m)) *.py' in > /usr/lib/python2.3/site-packages/numarray and got no matches; line > 1504 of my copy of numarraycore.py doesn't look anything like the above. > > From southey at uiuc.edu Thu Jan 6 06:31:34 2005 From: southey at uiuc.edu (Bruce Southey) Date: Thu Jan 6 06:31:34 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks Message-ID: Hi, While on the subject of benchmarks, I thought I would point out a really excellent book by Hans Petter Langtangen: 'Python Scripting for Computational Science' (Springer, 2004: http://www.springeronline.com/sgw/cda/frontpage/0,0,4-0-22-17627636-0,0.html ). The book web site is http://folk.uio.no/hpl/scripting/ which also has the scripts. There is considerable detailed material on using Numeric and numarray as well as using Python callbacks from C/C++ and Fortran. Also addresses GUI programming and other topics in Python including regular expressions. One of the really great things about this book is the discussion on how to improve code with reference to a single example called gridloop. Gridloop just evaluates a function (the actual function used was 'sin(x,y) + 8*x') over a rectangular grid and stores the results in an array. There are well over 25 versions from using straight C, Fortran and C++ to using Python and Numerical Python. These benchmarks are on different ways to implement this gridloop function in Fortran, C/C++, numarray, Numeric and Python callbacks from C/C++ and Fortran. In the vectorized form relative to the F77 version, numarray (v0.9) was 2.7 times slower and Numeric (v23) was 3.0 times slower. Another item that appeared was that since the sin function is scalar, there was a huge difference in the Python implementation between using math.sin (140 times slower than F77), Numeric.sin (230 times slower than F77) and numarray.sin (350 times slower than F77).
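The gridloop example described above can be sketched in plain Python. The signature, and the reading of the book's 'sin(x,y) + 8*x' as sin(x*y) + 8*x, are assumptions for illustration; this is the scalar variant that the vectorized Numeric/numarray versions speed up.

```python
import math

def f(x, y):
    # Assumed reading of the book's 'sin(x,y) + 8*x'
    return math.sin(x * y) + 8 * x

def gridloop(xcoor, ycoor, f):
    """Evaluate f at every (x, y) of a rectangular grid; one call per point."""
    return [[f(x, y) for y in ycoor] for x in xcoor]

# grid[i][j] holds f(xcoor[i], ycoor[j])
grid = gridloop([0.0, 1.0], [0.0, 0.5], f)
```

The per-point function-call overhead in this loop is exactly what the math.sin vs. Numeric.sin vs. numarray.sin comparison measures.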
Perhaps this suggests that the namespace should be checked for scalar arguments before using vectorized versions. Regards Bruce Southey From Jack.Jansen at cwi.nl Thu Jan 6 06:55:24 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Jan 6 06:55:24 2005 Subject: [Numpy-discussion] A request for new distributions Message-ID: I'm catching up with half a year of back postings on the numpy list, and the messages about automatically using vecLib on the Macintosh sound very interesting. I maintain the official Package Manager databases for MacPython (which allow users to install a handful of common packages with one mouseclick), and Numeric and numarray have been in there since day one. I'm revising all the packages again at the moment (I do that about once a year, or on request), and I had to manually fiddle Numeric 23.6 to actually build on the Mac. This is a bit of a bother, as I have to keep a separate source distribution instead of just referring to the official one, but more important is the fact that neither Numeric nor numarray support vecLib out of the box. So, a plea for help: I don't know what the usual frequency for Numeric and numarray distributions is, but it would be very helpful for me (and for anyone using Numeric/numarray on the Mac) if new distributions were available that contained the fixes mentioned in the November discussions... -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From jmiller at stsci.edu Thu Jan 6 08:12:00 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 08:12:00 2005 Subject: [Numpy-discussion] A bug in numarray sign function.
In-Reply-To: <41DD2D8F.8000803@VisionSense.com> References: <41DBC107.7070702@VisionSense.com> <41DC208F.5060605@csun.edu> <41DD2D8F.8000803@VisionSense.com> Message-ID: <1105027838.10516.190.camel@halloween.stsci.edu> On Thu, 2005-01-06 at 07:22, Nadav Horesh wrote: > >>> print numarray.__version__ > 1.2a > > It is from the CVS repositoty > > Nadav Thanks Nadav. Fixed in CVS. Regards, Todd > > Stephen Walton wrote: > > > Nadav Horesh wrote: > > > >> in numarraycore.py line 1504 should be changed from > >> > >> return zeros(shape(m))-less(m,0)+greater(m,0) > > > > > > What version of numarray are you running? I don't see this code in > > version 1.1.1. (I did a 'grep zeros(shape(m)) *.py' in > > /usr/lib/python2.3/site-packages/numarray and got no matches; line > > 1504 of my copy of numarraycore.py doesn't look anything like the above. > > > > > > > ------------------------------------------------------- > The SF.Net email is sponsored by: Beat the post-holiday blues > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. > It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- From jmiller at stsci.edu Thu Jan 6 08:18:59 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 08:18:59 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DC84FA.1090703@arrowtheory.com> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> Message-ID: <1105028223.10516.210.camel@halloween.stsci.edu> On Wed, 2005-01-05 at 19:23, Simon Burton wrote: > Todd Miller wrote: > > >Hi Simon, > > > >I found a benchmark bug which explains the performance difference in +. 
> >Here are my times with the modified benchmark (Python-2.4 gcc version > >3.2.2 20030222 (Red Hat Linux 3.2.2-5) on 1.7 GHz P-IV w/ 2G): > > > >numarray + : 0.0540893316269 > >numarray matrixmultiply : 16.9448821545 > >numarray eigenvalues : 9.67254910469 > > > >Numeric + : 0.0653991508484 > >Numeric matrixmultiply : 33.0565470934 > >Numeric eigenvalues : 9.44225819111 > > > >So, for large arrays with a simple to install / built-in linear algebra > >system, numarray is doing just fine. > > > >Looking at your results, I think you may have been comparing numarray > >built using a built-in blas_lite versus Numeric using ATLAS. There, I > >think numarray *is* behind but that is fixable with some effort. The > >key is porting and integrating the Numeric dotblas package with > >numarray. I've been looking at that some today... err, yesterday, > >apparently I forgot to hit "send". > > > > > > Wow, those results look great, Todd. > > I have double checked my install. However, the numarray multiply is > still x10 slower. 
> > This is set in addons: > lapack_libs = ['lapack', 'f77blas', 'cblas', 'atlas', 'blas'] > > and, at runtime, python has loaded: > 40771000-40cb6000 r-xp 00000000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40cb6000-40cb9000 rw-p 00545000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40cb9000-40dbd000 rw-p 00000000 00:00 0 > 40dbd000-40dd7000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 40dd7000-40dd8000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 40dd8000-40df7000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 40df7000-40df8000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 40df8000-4110b000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 4110b000-4110f000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 4110f000-41454000 r-xp 00000000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > 41454000-41458000 rw-p 00345000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > 41458000-41472000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > 41472000-41473000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > ( as well as lapack_lite2.so ) > > So I assumed that since the eigenvalues came in fast ATLAS was alive and > well. 
> Also, the above libs are exactly what Numeric uses: > 4033b000-40880000 r-xp 00000000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40880000-40883000 rw-p 00545000 00:0c 783245 > /usr/lib/atlas/liblapack.so.3.0 > 40883000-40987000 rw-p 00000000 00:00 0 > 40987000-409a6000 r-xp 00000000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 409a6000-409a7000 rw-p 0001e000 00:0c 783241 /usr/lib/libcblas.so.3.0 > 409a7000-409c1000 r-xp 00000000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 409c1000-409c2000 rw-p 00019000 00:0c 783242 /usr/lib/libf77blas.so.3.0 > 409c2000-40cd5000 r-xp 00000000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 40cd5000-40cd9000 rw-p 00312000 00:0c 783240 /usr/lib/libatlas.so.3.0 > 40cd9000-40cf3000 r-xp 00000000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > 40cf3000-40cf4000 rw-p 0001a000 00:0c 17227 /usr/lib/libg2c.so.0.0.0 > 40cf4000-40cf7000 rw-p 00000000 00:00 0 > 40cf7000-4103c000 r-xp 00000000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > 4103c000-41040000 rw-p 00345000 00:0c 783244 > /usr/lib/atlas/libblas.so.3.0 > > > Any ideas ? What we've got here is... a'falya to communicate. numarray's dot/matrixmultiply() and innerproduct() have never been implemented in terms of a BLAS. *That's* the problem. Numeric has the "dotblas" extension which augments the the built-in versions of these functions with ones that are souped up using external libraries. I ported dotblas yesterday and checked it into numarray CVS yesterday afternoon. I fixed the last doctest artifact and re-arranged a little this morning. If you're working from CVS you should be able to see the new performance optimization now by doing an update and building/linking against the right libraries. To do that I: setenv USE_LAPACK 1 setenv LINALG_LIB setenv LINALG_INCLUDE Have a look. I think numarray is slightly ahead now. 
Todd From stephen.walton at csun.edu Thu Jan 6 09:40:26 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 6 09:40:26 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1105028223.10516.210.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> Message-ID: <41DD77C0.10402@csun.edu> Todd Miller wrote: >What we've got here is... a'falya to communicate. > > > Yeah, include me in that category as well. >I ported dotblas yesterday and checked it into numarray CVS yesterday >afternoon. I fixed the last doctest artifact and re-arranged a little >this morning. > > > I just checked out the 1.2a CVS and am getting the same result I did before, with matrix multiplies about a factor of 7 slower in numarray than numeric. Now, I'm building with the Absoft compiler, and am wondering if some glitch in the build process is causing ATLAS not to be used. How did Simon Burton get the list of libraries loaded by python after importing numarray? I should check that next. By the way, libg2c still needs to be linked against with the Absoft compiler; add 'g2c' to 'lapack_libs' between 'atlas' and 'f90math'. Steve From jmiller at stsci.edu Thu Jan 6 12:25:43 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 12:25:43 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <41DD77C0.10402@csun.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> <41DD77C0.10402@csun.edu> Message-ID: <1105043050.10516.557.camel@halloween.stsci.edu> > >I ported dotblas yesterday and checked it into numarray CVS yesterday > >afternoon. I fixed the last doctest artifact and re-arranged a little > >this morning. 
> > > > > > > I just checked out the 1.2a CVS and am getting the same result I did > before, with matrix multiplies about a factor of 7 slower in numarray > than numeric. Try: >>> import numarray.dotblas as db >>> db.USING_BLAS 1 # 0 means the import fails Then try: >>> import numarray._dotblas and the traceback should identify the problem. > Now, I'm building with the Absoft compiler, and am > wondering if some glitch in the build process is causing ATLAS not to be > used. How did Simon Burton get the list of libraries loaded by python > after importing numarray? I should check that next. > > By the way, libg2c still needs to be linked against with the Absoft > compiler; add 'g2c' to 'lapack_libs' between 'atlas' and 'f90math'. > > Steve -- From faltet at carabos.com Thu Jan 6 13:17:05 2005 From: faltet at carabos.com (Francesc Altet) Date: Thu Jan 6 13:17:05 2005 Subject: [Numpy-discussion] Padding policy in CharArrays In-Reply-To: <200501032225.15238.faltet@carabos.com> References: <200501012244.23474.faltet@carabos.com> <1104770406.26038.118.camel@halloween.stsci.edu> <200501032225.15238.faltet@carabos.com> Message-ID: <200501061853.20673.faltet@carabos.com> A Dilluns 03 Gener 2005 22:25, Francesc Altet va escriure: > Mmm, perhaps having the possibility to select the pad value in CharArray > creation time would be nice. I've ended making an implementation of this in numarray. 
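The comparison pitfall driving this thread can be reproduced with plain Python strings; a sketch mimicking fixed-width padding (pad() is an illustrative helper, not numarray API):

```python
def pad(s, itemsize, padc):
    """Right-pad s to a fixed itemsize, like a CharArray element."""
    return s + padc * (itemsize - len(s))

space_padded = pad('0', 4, ' ')      # '0   '          -- numarray's default
null_padded = pad('0', 4, '\x00')    # '0\x00\x00\x00' -- the proposed padc

probe = '0\x00\x00\x00\x01'
# Space (0x20) sorts above NUL, so the space-padded value compares greater:
assert (space_padded >= probe) is True
# With NUL padding the shared prefix ties and the longer probe wins:
assert (null_padded >= probe) is False
```

This is why the choice of pad character, not the comparison operator, decides the result.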
With the patches (against numarray 1.1.1) I'm attaching, the following works: >>> b=strings.array(['0'], itemsize = 4, padc="\x00") >>> b.raw() RawCharArray(['0\x00\x00\x00']) >>> b.raw() >= '0\x00\x00\x00\x01' array([0], type=Bool) While the actual behaviour in numarray 1.1.1 is: >>> b=strings.array(['0'], itemsize = 4) >>> b.raw() RawCharArray(['0 ']) >>> b.raw() >= '0\x00\x00\x00\x01' array([1], type=Bool) As you may have already noted, I've added a new parameter named padc to the CharArray/RawCharArray constructor, with the default pad character being the space (" ") for backward compatibility. All the current tests for CharArray pass with the patch applied. The new functionality is restricted to what I needed, but I guess it should be easily extended to be completely consistent in other cases. Feel free to add the patch to numarray if you feel it to be appropriate. Cheers, -- Francesc Altet http://www.carabos.com/ Cárabos Coop. V. Enjoy Data -------------- next part -------------- A non-text attachment was scrubbed... Name: _chararraymodule.c.patch Type: text/x-diff Size: 1085 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: strings.py.patch Type: text/x-diff Size: 7664 bytes Desc: not available URL: From jmiller at stsci.edu Thu Jan 6 14:11:48 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 14:11:48 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: References: Message-ID: <1105049407.10516.809.camel@halloween.stsci.edu> On Thu, 2005-01-06 at 09:54, Jack Jansen wrote: > I'm catching up with half a year of back postings on the numpy list, > and the messages about automatically using vecLib on the macintosh > sound very interesting.
> > I maintain the official Package Manager databases for MacPython (which > allow users to install a handful of common packages with one > mouseclick), and Numeric and numarray have been in there since day one. > I'm revising all the packages again at the moment (I do that about once > a year, or on request), and I had to manually fiddle Numeric 23.6 to > actually build on the Mac. This is a bit of a bother, as I have to keep > a separate source distribution in stead of just referring to the > official one, but more important is the fact that neither Numeric nor > numarray support vecLib out of the box. > > So, a plea for help: I don't know what the usual frequency for Numeric > and numarray distributions is, but it would be very helpful for me (and > for anyone using Numeric/numarray on the Mac) if new distributions were > available that contained the fixes mentioned in the november > discussions... numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks I hope. For numarray-1.2 on the Mac, I think all you will need to do to get a vecLib build is: python setup.py install --use_lapack Regards, Todd From jmiller at stsci.edu Thu Jan 6 14:40:01 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 6 14:40:01 2005 Subject: [Numpy-discussion] Padding policy in CharArrays In-Reply-To: <200501061853.20673.faltet@carabos.com> References: <200501012244.23474.faltet@carabos.com> <1104770406.26038.118.camel@halloween.stsci.edu> <200501032225.15238.faltet@carabos.com> <200501061853.20673.faltet@carabos.com> Message-ID: <1105051133.10516.819.camel@halloween.stsci.edu> In some kind of cosmic irony, your bona-fide-patch was filed as Junk Mail by my filter. Anyway, thanks, it's committed in CVS. I added the extra code to handle the PadAll case you flagged as "to be corrected." 
Regards, Todd On Thu, 2005-01-06 at 12:53, Francesc Altet wrote: > A Dilluns 03 Gener 2005 22:25, Francesc Altet va escriure: > > Mmm, perhaps having the possibility to select the pad value in CharArray > > creation time would be nice. > > I've ended making an implementation of this in numarray. With the patches > (against numarray 1.1.1) I'm attaching, the next works: > > >>> b=strings.array(['0'], itemsize = 4, padc="\x00") > >>> b.raw() > RawCharArray(['0\x00\x00\x00']) > >>> b.raw() >= '0\x00\x00\x00\x01' > array([0], type=Bool) > > While the actual behaviour in numarray 1.1.1 is: > > >>> b=strings.array(['0'], itemsize = 4) > >>> b.raw() > RawCharArray(['0 ']) > >>> b.raw() >= '0\x00\x00\x00\x01' > array([1], type=Bool) > > As you may have already noted, I've added a new parameter named padc to the > CharArray/RawCharArray constructor being the default pad character value the > space (" "), for backward compatibility. All the current tests for CharArray > passes with patch applied. > > The new functionality is restricted to what I needed, but I guess it should > be easily extended to be completely consistent in other cases. Feel free to > add the patch to numarray if you feel it to be appropriate. > > Cheers, -- From stephen.walton at csun.edu Thu Jan 6 14:44:21 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 6 14:44:21 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1105043050.10516.557.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> <41DD77C0.10402@csun.edu> <1105043050.10516.557.camel@halloween.stsci.edu> Message-ID: <41DDBEF5.6060306@csun.edu> Todd Miller wrote: > Try: > > > >>>>import numarray.dotblas as db >>>>db.USING_BLAS >>>> You must have changed something in CVS today, or maybe things just propagated slowly. 
I didn't get the dotblas.py file until a couple of hours ago. Everything's hunky-dory now. Steve From simon at arrowtheory.com Thu Jan 6 15:57:19 2005 From: simon at arrowtheory.com (Simon Burton) Date: Thu Jan 6 15:57:19 2005 Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks In-Reply-To: <1105028223.10516.210.camel@halloween.stsci.edu> References: <41D9CA01.9040108@arrowtheory.com> <1104934210.31516.1563.camel@halloween.stsci.edu> <41DC84FA.1090703@arrowtheory.com> <1105028223.10516.210.camel@halloween.stsci.edu> Message-ID: <41DDD040.9020505@arrowtheory.com> Todd Miller wrote: > >I ported dotblas yesterday and checked it into numarray CVS yesterday >afternoon. I fixed the last doctest artifact and re-arranged a little >this morning. > > > OK, it works great! One thing: to get it to compile I needed to change the includes in _dotblas.c to: #include "libnumarray.h" #include "arrayobject.h" from #include "numarray/libnumarray.h" #include "numarray/arrayobject.h" because we are using "-IInclude/numarray" Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From prabhu_r at users.sf.net Thu Jan 6 20:32:03 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Thu Jan 6 20:32:03 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface Message-ID: <16862.4265.62327.648134@monster.linux.in> Hi Numarray developers, Numeric arrays support the buffer interface by providing an array_as_buffer structure in the type object definition by doing this: (PyBufferProcs *)&array_as_buffer, /*tp_as_buffer*/ and adding this to the tp_flags: Py_TPFLAGS_HAVE_GETCHARBUFFER), /*tp_flags*/ This is very handy when one needs to pass void arrays into C/C++ code and is used to pass data from Numeric to C or C++ libraries very efficiently. In particular, this is very useful when passing Numeric array data to VTK. I noticed that numarray does not support this interface.
My feature request is that numarray arrays also support this buffer interface (if possible). Thanks! cheers, prabhu p.s. I'm not on this list so please cc me in on any messages. Thanks! From Jack.Jansen at cwi.nl Fri Jan 7 01:34:32 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Fri Jan 7 01:34:32 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <1105049407.10516.809.camel@halloween.stsci.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> Message-ID: <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> On 6 Jan 2005, at 23:10, Todd Miller wrote: > numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks > I hope. For numarray-1.2 on the Mac, I think all you will need to do > to get a vecLib build is: > > python setup.py install --use_lapack Is there a reason to require the "--use_lapack"? I.e. are there any adverse consequences to using it? -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From rkern at ucsd.edu Fri Jan 7 01:54:11 2005 From: rkern at ucsd.edu (Robert Kern) Date: Fri Jan 7 01:54:11 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> Message-ID: <41DE5C2D.7050400@ucsd.edu> Jack Jansen wrote: > > On 6 Jan 2005, at 23:10, Todd Miller wrote: > >> numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks >> I hope. For numarray-1.2 on the Mac, I think all you will need to do >> to get a vecLib build is: >> >> python setup.py install --use_lapack > > > Is there a reason to require the "--use_lapack"? I.e. are there any > adverse consequences to using it? On other platforms, one has to edit the setup scripts to add the information about where the libraries are. The default fallback is to use the unoptimized version packaged with numarray. 
The alternative would be to add autoconf-like capabilities to the setup script such that it could determine if the libraries were in the default places (and valid!), then fall back to the lite versions if not. On the Mac, --use_lapack should have no adverse consequences, if I'm reading you right. On other platforms, numarray might fail to build correctly if one hadn't supplied the necessary information. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From Jack.Jansen at cwi.nl Fri Jan 7 06:19:20 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Fri Jan 7 06:19:20 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DE5C2D.7050400@ucsd.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> Message-ID: <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> On 7 Jan 2005, at 10:53, Robert Kern wrote: > Jack Jansen wrote: >> On 6 Jan 2005, at 23:10, Todd Miller wrote: >>> numarray-1.2 is relatively near at hand, sometime in the next 2-3 >>> weeks >>> I hope. For numarray-1.2 on the Mac, I think all you will need to >>> do >>> to get a vecLib build is: >>> >>> python setup.py install --use_lapack >> Is there a reason to require the "--use_lapack"? I.e. are there any >> adverse consequences to using it? > > On other platforms, one has to edit the setup scripts to add the > information about where the libraries are. The default fallback is to > use the unoptimized version packaged with numarray. > > The alternative would be to add autoconf-like capabilities to the > setup script such that it could determine if the libraries were in the > default places (and valid!), then fall back to the lite versions if > not. Ah, I see. So the problem is really that the library detection code hasn't been written. 
Hmm, having a look at the code, it seems that it should be fairly simple to fix (but I'm not completely sure I understand the interdependencies between setup.py, generate.py and addons.py, so I don't dare create a patch). If the whole lapack section of addons was restructured like if os.environ.has_key('LINALG_LIB'): set things up for using that path elif os.path.exists('/usr/local/lib/atlas') use that elif os.path.exists('/System/Library/Frameworks/vecLib.framework') use that else use builtin blas_atlas I think it would have the same functionality as now but without need for the --use_lapack option. OTOH I may be oversimplifying things, I have no idea how these numerical libraries would normally be installed on Linux or other unixen, let alone on Windows. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From chrisperkins99 at gmail.com Fri Jan 7 06:20:09 2005 From: chrisperkins99 at gmail.com (Chris Perkins) Date: Fri Jan 7 06:20:09 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <16862.4265.62327.648134@monster.linux.in> References: <16862.4265.62327.648134@monster.linux.in> Message-ID: <184a9f5a0501070619c29386@mail.gmail.com> On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran wrote: > > I noticed that numarray does not support this interface. My feature > request is that numarray arrays also support this buffer interface (if > possible). > I second this request. Note that numarray arrays have a member called "_data" that does support the buffer interface. I have been using code like this for a while (pseudocode): def asBuffer(a): if a is a numarray array: return a._data elif a is a Numeric array: return a else: ... do something else But it would be nice if the numarray array supported the buffer interface directly. I have no idea how hard or easy this would be to do.
Chris Perkins From jmiller at stsci.edu Fri Jan 7 06:25:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 06:25:02 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DE5C2D.7050400@ucsd.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> Message-ID: <1105107831.14757.72.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 04:53, Robert Kern wrote: > Jack Jansen wrote: > > > > On 6 Jan 2005, at 23:10, Todd Miller wrote: > > > >> numarray-1.2 is relatively near at hand, sometime in the next 2-3 weeks > >> I hope. For numarray-1.2 on the Mac, I think all you will need to do > >> to get a vecLib build is: > >> > >> python setup.py install --use_lapack > > > > > > Is there a reason to require the "--use_lapack"? I.e. are there any > > adverse consequences to using it? > > On other platforms, one has to edit the setup scripts to add the > information about where the libraries are. The default fallback is to > use the unoptimized version packaged with numarray. > > The alternative would be to add autoconf-like capabilities to the setup > script such that it could determine if the libraries were in the default > places (and valid!), then fall back to the lite versions if not. > > On the Mac, --use_lapack should have no adverse consequences, if I'm > reading you right. On other platforms, numarray might fail to build > correctly if one hadn't supplied the necessary information. Since I'm not a Mac user, I'll beat a dead horse. Are we all agreed that: 1. vecLib is universally available on OS-X. 2. Using vecLib rather than blaslite is preferred. If so, I'll make it the default. 
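Making vecLib the default amounts to the kind of detection cascade Jack sketched earlier in the thread. A runnable version of that idea (the search paths, the LINALG_LIB variable, and the function name are all illustrative; this is not numarray's actual addons.py logic):

```python
import os

def pick_linalg(environ=None):
    """Return the first linear-algebra library location found.

    Hypothetical search order following Jack's sketch: an explicit
    environment override, then well-known ATLAS/vecLib locations,
    then the bundled fallback sources.
    """
    if environ is None:
        environ = os.environ
    if 'LINALG_LIB' in environ:
        return environ['LINALG_LIB']
    for candidate in ('/usr/local/lib/atlas',
                      '/usr/lib/atlas',
                      '/System/Library/Frameworks/vecLib.framework'):
        if os.path.exists(candidate):
            return candidate
    return 'builtin blaslite'   # fall back to the bundled sources

# An explicit override always wins:
print(pick_linalg({'LINALG_LIB': '/opt/atlas'}))   # /opt/atlas
```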
Regards, Todd From rkern at ucsd.edu Fri Jan 7 06:34:13 2005 From: rkern at ucsd.edu (Robert Kern) Date: Fri Jan 7 06:34:13 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <1105107831.14757.72.camel@halloween.stsci.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <1105107831.14757.72.camel@halloween.stsci.edu> Message-ID: <41DE9D97.2090707@ucsd.edu> Todd Miller wrote: > Since I'm not a Mac user, I'll beat a dead horse. Are we all agreed > that: > > 1. vecLib is universally available on OS-X. Yes. > 2. Using vecLib rather than blaslite is preferred. Yes. > If so, I'll make it the default. Woohoo! Thank you! -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From jmiller at stsci.edu Fri Jan 7 06:45:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 06:45:04 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <184a9f5a0501070619c29386@mail.gmail.com> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> Message-ID: <1105109054.14757.129.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 09:19, Chris Perkins wrote: > On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran > wrote: > > > > I noticed that numarray does not support this interface. My feature > > request is that numarray arrays also support this buffer interface (if > > possible). > > > > I second this request. > > Note that numarray arrays have a member called "_data" that does > support the buffer interface. I have been using code like this for a > while (pseudocode): > > def asBuffer(a): > if a is a numarray array: > return a._data > elif a is a Numeric array: > return a > else: ... do something else > > But it would be nice if the numarray array supported the buffer > interface directly. 
I have no idea how hard or easy this would be to > do. Without looking at code, my guess is that the C source level compatibility of numarray with Numeric will enable a "direct graft" of the buffer protocol code from Numeric to numarray. I think it will be easy... but... numarray has a concept of "misbehaved arrays", i.e. arrays in the binary format of another platform and therefore byte-swapped, or arrays spread across records and therefore possibly noncontiguous or misaligned. I think these buffers are likely unusable so providing access to them is a mistake. Misbehaved arrays probably don't arise in the work of most users, but they are a possibility that has to be accounted for. For cases of misbehaved arrays, I think raising a ValueError exception is necessary. How does that sound? Regards, Todd From jmiller at stsci.edu Fri Jan 7 07:10:22 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 07:10:22 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <41DEA2A5.2020100@ucsd.edu> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> <1105109054.14757.129.camel@halloween.stsci.edu> <41DEA2A5.2020100@ucsd.edu> Message-ID: <1105110589.15167.5.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 09:54, Robert Kern wrote: > Todd Miller wrote: > > > numarray has a concept of "misbehaved arrays", i.e. arrays in the > > binary format of another platform and therefore byte-swapped, or arrays > > spread across records and therefore possibly noncontiguous or > > misaligned. I think these buffers are likely unusable so providing > > access to them is a mistake. Misbehaved arrays probably don't arise in > > the work of most users, but they are a possibility that has to be > > accounted for. > > > > For cases of misbehaved arrays, I think raising a ValueError exception > > is necessary. How does that sound? 
> > For the byteswapped case, could I still get a buffer object around the > raw data by using _data? If so, I vote +1. Sure. Alternately, you could make a copy of the array which will automatically be well behaved and therefore usable in C. Regards, Todd From prabhu_r at users.sf.net Fri Jan 7 09:47:05 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Fri Jan 7 09:47:05 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <184a9f5a0501070619c29386@mail.gmail.com> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> Message-ID: <16862.51916.126290.46177@monster.linux.in> >>>>> "CP" == Chris Perkins writes: CP> On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran CP> wrote: >> >> I noticed that numarray does not support this interface. My >> feature request is that numarray arrays also support this >> buffer interface (if possible). CP> I second this request. CP> Note that numarray arrays have a member called "_data" that CP> does support the buffer interface. I have been using code Aha! Thanks for that hint. :) cheers, prabhu From prabhu_r at users.sf.net Fri Jan 7 09:55:10 2005 From: prabhu_r at users.sf.net (Prabhu Ramachandran) Date: Fri Jan 7 09:55:10 2005 Subject: [Numpy-discussion] Numarray feature request: supporting the buffer interface In-Reply-To: <1105109054.14757.129.camel@halloween.stsci.edu> References: <16862.4265.62327.648134@monster.linux.in> <184a9f5a0501070619c29386@mail.gmail.com> <1105109054.14757.129.camel@halloween.stsci.edu> Message-ID: <16862.52414.137855.751814@monster.linux.in> >>>>> "TM" == Todd Miller writes: TM> On Fri, 2005-01-07 at 09:19, Chris Perkins wrote: >> On Fri, 7 Jan 2005 10:01:37 +0530, Prabhu Ramachandran >> wrote: >> > >> > I noticed that numarray does not support this interface. My >> > feature request is that numarray arrays also support this >> > buffer interface (if possible). >> >> I second this request. 
[...] TM> Without looking at code, my guess is that the C source level TM> compatibility of numarray with Numeric will enable a "direct TM> graft" of the buffer protocol code from Numeric to numarray. TM> I think it will be easy... but... TM> numarray has a concept of "misbehaved arrays", i.e. arrays in TM> the binary format of another platform and therefore TM> byte-swapped, or arrays spread across records and therefore TM> possibly noncontiguous or misaligned. I think these buffers TM> are likely unusable so providing access to them is a mistake. TM> Misbehaved arrays probably don't arise in the work of most TM> users, but they are a possibility that has to be accounted TM> for. TM> For cases of misbehaved arrays, I think raising a ValueError TM> exception is necessary. How does that sound? I think that sounds reasonable. In my particular use case I always flatten (ravel) the array before using it as a buffer. I guess that in cases where the array is non-contiguous or misaligned a copy of the data is made on ravel so these would not be a problem for me. For misbehaved arrays, a ValueError with a decent error message would be perfect! Anyway, thanks for considering the feature request! 
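Prabhu's working assumption here (that ravel() copies a non-contiguous array, leaving a contiguous, well-behaved buffer that is safe to hand to C) can be checked directly. A sketch using modern numpy rather than numarray itself; the flag names are numpy's:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
col = a[:, 1]            # a strided, non-contiguous view into a
flat = col.ravel()       # ravel must copy here, since col is not contiguous

print(col.flags['C_CONTIGUOUS'])    # False
print(flat.flags['C_CONTIGUOUS'])   # True
print(flat.base is None)            # True: a fresh copy owning its data

# The buffer interface the thread asks for: a contiguous array exposes
# its raw bytes directly via the buffer protocol.
print(len(memoryview(flat).tobytes()) == flat.nbytes)   # True
```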
cheers, prabhu From Fernando.Perez at colorado.edu Fri Jan 7 10:34:09 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Jan 7 10:34:09 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> Message-ID: <41DED610.8080409@colorado.edu> Jack Jansen wrote: > If the whole lapack section of addons was restructured like > if os.environ.has_key('LINALG_LIB'): > set things up for using that path > elif os.path.exists('/usr/local/lib/atlas') > use that > elif os.path.exists('/System/Library/Frameworks/vecLib.framework') > use that > else > use builtin blas_atlas If I may ask, it would be great if /usr/lib/(atlas/ATLAS) were added to these default search paths, like the scipy setup.py file does. In a mixed-architecture environment, where /usr/local is often NFS shared, one must put things like ATLAS in machine-specific locations. One simple solution is to put it directly in /usr/lib/atlas, instead of /usr/local/lib/atlas, since /usr/lib is rarely NFS-shared. This gives a way to share over NFS the bulk of things which are built from source, while leaving architecture-specific things in a location where they don't cause conflicts. Numpy and scipy already have this search path, so hopefully numarray can adopt the same convention as well. It's nice to be able to just unpack those and, without needing to set absolutely anything, simply say './setup.py bdist_rpm' and be done :) Cheers, f From rbastian at club-internet.fr Fri Jan 7 12:21:02 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Fri Jan 7 12:21:02 2005 Subject: [Numpy-discussion] ImportError Message-ID: <05010712542904.00761@rbastian> Hi, I use python2.4+numarray1.1.1. 
import numarray.convolve as conv produces an ImportError : No module named convolve Can you help me ? -- René Bastian http://www.musiques-rb.org : Musique en Python From jmiller at stsci.edu Fri Jan 7 12:24:02 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 12:24:02 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DED610.8080409@colorado.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> Message-ID: <1105129384.15167.31.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 13:33, Fernando Perez wrote: > Jack Jansen wrote: > > > If the whole lapack section of addons was restructured like > > if os.environ.has_key('LINALG_LIB'): > > set things up for using that path > > elif os.path.exists('/usr/local/lib/atlas') > > use that > > elif os.path.exists('/System/Library/Frameworks/vecLib.framework') > > use that > > else > > use builtin blas_atlas > > If I may ask, it would be great if /usr/lib/(atlas/ATLAS) were added to these > default search paths, like the scipy setup.py file does. In a > mixed-architecture environment, where /usr/local is often NFS shared, one must > put things like ATLAS in machine-specific locations. One simple solution is > to put it directly in /usr/lib/atlas, instead of /usr/local/lib/atlas, since > /usr/lib is rarely NFS-shared. This gives a way to share over NFS the bulk of > things which are built from source, while leaving architecture-specific things > in a location where they don't cause conflicts. > > Numpy and scipy already have this search path, so hopefully numarray can adopt > the same convention as well.
It's nice to be able to just unpack those and, > without needing to set absolutely anything, simply say './setup.py bdist_rpm' > and be done :) > > Cheers, > > f These sound like reasonable ideas but I want to mull it over some and I'm pretty much out of time this week... I'm supposed to be working on the scipy to numarray port. Both ideas look like they may be easy but I'm out of oomph and... they may not. Regards, Todd From jmiller at stsci.edu Fri Jan 7 12:27:04 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 12:27:04 2005 Subject: [Numpy-discussion] ImportError In-Reply-To: <05010712542904.00761@rbastian> References: <05010712542904.00761@rbastian> Message-ID: <1105129589.15167.35.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 06:54, René Bastian wrote: > Hi, > > I use python2.4+numarray1.1.1. > > import numarray.convolve as conv > > produces an ImportError : No module named convolve > > Can you help me ? The above works fine for me. I'd suggest deleting your current numarray install and "build" tree and re-installing. What's your OS and processor? From Fernando.Perez at colorado.edu Fri Jan 7 12:39:01 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Jan 7 12:39:01 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <1105129384.15167.31.camel@halloween.stsci.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> <1105129384.15167.31.camel@halloween.stsci.edu> Message-ID: <41DEF32D.8040301@colorado.edu> Todd Miller wrote: [path ideas] > These sound like reasonable ideas but I want to mull it over some and > I'm pretty much out of time this week... I'm supposed to be working on > the scipy to numarray port. Both ideas look like they may be easy but > I'm out of oomph and... they may not. No worries.
Right now I'm enjoying setting up a yum-based system for handling Atlas/Numeric/scipy on a group of machines with architecture-specific ATLASes (P3, P4, P4+hyperthreading,...). So I sort of have these build issues very much in front of me, but there's no hurry in incorporating them into numarray. The scipy work is definitely a priority. But thanks for considering the input. Cheers, f From rbastian at club-internet.fr Fri Jan 7 12:53:06 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Fri Jan 7 12:53:06 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <05010712542904.00761@rbastian> References: <05010712542904.00761@rbastian> Message-ID: <05010713271405.00761@rbastian> On Friday 7 January 2005 12:54, René Bastian wrote: > Hi, > > I use python2.4+numarray1.1.1. > > import numarray.convolve as conv > > produces an ImportError : No module named convolve > > Can you help me ? This is the error message : Traceback (most recent call last): File "afolia01.py", line 11, in ? from filtres import * File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? #-------------------------- ImportError: No module named convolve The line 5 is a commented line! -- René Bastian http://www.musiques-rb.org : Musique en Python From jmiller at stsci.edu Fri Jan 7 13:06:18 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 7 13:06:18 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <05010713271405.00761@rbastian> References: <05010712542904.00761@rbastian> <05010713271405.00761@rbastian> Message-ID: <1105131949.15167.40.camel@halloween.stsci.edu> On Fri, 2005-01-07 at 07:27, René Bastian wrote: > On Friday 7 January 2005 12:54, René Bastian wrote: > > Hi, > > > > I use python2.4+numarray1.1.1. > > > > import numarray.convolve as conv > > > > produces an ImportError : No module named convolve > > > > Can you help me ?
> > This is the error message : > > Traceback (most recent call last): > File "afolia01.py", line 11, in ? > from filtres import * > File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? > #-------------------------- > ImportError: No module named convolve > > The line 5 is a commented line! > > Show us maybe the first 10 lines of filtres.py. From rbastian at club-internet.fr Fri Jan 7 13:25:06 2005 From: rbastian at club-internet.fr (=?utf-8?q?Ren=C3=A9=20Bastian?=) Date: Fri Jan 7 13:25:06 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <1105131949.15167.40.camel@halloween.stsci.edu> References: <05010712542904.00761@rbastian> <05010713271405.00761@rbastian> <1105131949.15167.40.camel@halloween.stsci.edu> Message-ID: <05010713591108.00761@rbastian> On Friday 7 January 2005 22:05, Todd Miller wrote: > On Fri, 2005-01-07 at 07:27, René Bastian wrote: > > On Friday 7 January 2005 12:54, René Bastian wrote: > > > Hi, > > > > > > I use python2.4+numarray1.1.1. > > > > > > import numarray.convolve as conv > > > > > > produces an ImportError : No module named convolve > > > > > > Can you help me ? > > > > This is the error message : > > > > Traceback (most recent call last): > > File "afolia01.py", line 11, in ? > > from filtres import * > > File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? > > #-------------------------- > > ImportError: No module named convolve > > > > The line 5 is a commented line! > > Show us maybe the first 10 lines of filtres.py. These are the first 14 lines of filtres.py (the tags are HTML-like)

#-*-python-*-
#-*-coding:latin-1-*-
import numarray as NA
import numarray.convolve as conv
#--------------------------

from utiles import sr, midi2freq, aff
from math import sqrt
import Canal

"""
a collection of filters
"""
#----------------------------------------------------------
filtres.py alone works ... but if imported, it doesn't. Here are the first 23 lines of an application (which imports filtres) where it doesn't work:
#-*-python-*-
#-*-coding:Latin 1-*-
"""
sequence of events
 1 sound alone; very loud
 2 superposed sounds; less and less loud
 ....
 n superposed sounds
separated by random gaps

v0.colognestereo.raw is the version without an 'amplitude' value or 'enveloppe'
"""
import random
import numarray.random_array as RA
#import Convolve
import numarray.convolve as Convolve
from numarray import *
from profils import *
from filtres import *
import utiles as U
from ondesNumarray import *
from oscillateurs import *
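As it turns out further down the thread, the culprit was a broken pythoneon.pth file rather than anything in these imports. For context, a .pth file in site-packages is just a newline-separated list of directories that Python appends to sys.path during site processing. A self-contained sketch of the mechanism (the pythoneon name is taken from the thread; the temporary directory is illustrative):

```python
import os
import site
import sys
import tempfile

# Build a throwaway site directory containing a .pth file that points at
# a package directory, named after the one in the thread purely as an example.
base = tempfile.mkdtemp()
pkg_dir = os.path.join(base, 'pythoneon')
os.mkdir(pkg_dir)
with open(os.path.join(base, 'pythoneon.pth'), 'w') as f:
    f.write(pkg_dir + '\n')

# site.addsitedir processes *.pth files the same way interpreter startup
# does for site-packages: each existing directory listed is added to sys.path.
site.addsitedir(base)
print(pkg_dir in sys.path)   # True
```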
-- René Bastian http://www.musiques-rb.org : Musique en Python From pearu at cens.ioc.ee Fri Jan 7 13:26:02 2005 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Fri Jan 7 13:26:02 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> Message-ID: On Fri, 7 Jan 2005, Jack Jansen wrote: > > The alternative would be to add autoconf-like capabilities to the > > setup script such that it could determine if the libraries were in the > > default places (and valid!), then fall back to the lite versions if > > not. > > Ah, I see. So the problem is really that the library detection code > hasn't been written. FYI, scipy_distutils has rather general lapack/blas/atlas detection code facilities. It first looks for atlas/lapack libraries, then for blas/lapack libraries, and finally for blas/lapack Fortran sources that scipy_distutils would compile behind the scenes. See scipy_distutils/system_info.py and scipy/Lib/lib/lapack/setup_lapack.py for more information. Pearu From bsder at mail.allcaps.org Fri Jan 7 15:11:03 2005 From: bsder at mail.allcaps.org (Andrew P. Lentvorski, Jr.)
Date: Fri Jan 7 15:11:03 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DED610.8080409@colorado.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> Message-ID: <435F784E-6101-11D9-A3AA-000A95C874EE@mail.allcaps.org> On Jan 7, 2005, at 10:33 AM, Fernando Perez wrote: > Jack Jansen wrote: > >> If the whole lapack section of addons was restructured like >> if os.environ.has_key('LINALG_LIB'): >> set things up for using that path >> elif os.path.exists('/usr/local/lib/atlas') >> use that >> elif os.path.exists('/System/Library/Frameworks/vecLib.framework') >> use that >> else >> use builtin blas_atlas > > If I may ask, it would be great if /usr/lib/(atlas/ATLAS) were added > to these default search paths, like the scipy setup.py file does. I would much rather that the previous snippet of code look something like: if sys.scipypath: # Or some other flag/global/something use whatever is indicated else: if os.environ.has_key('LINALG_LIB'): set things up for using that path elif os.path.exists('/usr/local/lib/atlas') use that elif os.path.exists('/System/Library/Frameworks/vecLib.framework') use that else use builtin blas_atlas In addition, some of us do not trust anything in /usr for production work. This is to help make our system administrators' lives easier. If I only use things from, say /tools, the sysadmins can completely erase and reload workstations for the purposes of bug fixes, security updates, etc. without disturbing my work. This prevents: "GAAAHHH! You upgraded the machine and now everything is using Foo_1.1.1 instead of Foo_1.1.0 and now everything is broken." Most Linux distributions are particularly bad about this. This affliction is also known as "Perl Hell".
;) -a From stephen.walton at csun.edu Fri Jan 7 16:15:01 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Fri Jan 7 16:15:01 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DEF32D.8040301@colorado.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> <1105129384.15167.31.camel@halloween.stsci.edu> <41DEF32D.8040301@colorado.edu> Message-ID: <41DF25B2.3080901@csun.edu> Fernando Perez wrote: > No worries. Right now I'm enjoying setting up a yum-based system for > handling Atlas/Numeric/scipy on a group of machines with > architecture-specific ATLASes (P3, P4, P4+hyperthreading,...). Ooh, ooh, I want a copy when you're done! Is 'enjoying' the right verb there? From Fernando.Perez at colorado.edu Fri Jan 7 16:17:04 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Jan 7 16:17:04 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: <41DF25B2.3080901@csun.edu> References: <1105049407.10516.809.camel@halloween.stsci.edu> <128915B8-608F-11D9-AE74-000A958D1666@cwi.nl> <41DE5C2D.7050400@ucsd.edu> <026ED217-60B7-11D9-AE74-000A958D1666@cwi.nl> <41DED610.8080409@colorado.edu> <1105129384.15167.31.camel@halloween.stsci.edu> <41DEF32D.8040301@colorado.edu> <41DF25B2.3080901@csun.edu> Message-ID: <41DF2647.3000509@colorado.edu> Stephen Walton wrote: > Fernando Perez wrote: > > >>No worries. Right now I'm enjoying setting up a yum-based system for >>handling Atlas/Numeric/scipy on a group of machines with >>architecture-specific ATLASes (P3, P4, P4+hyperthreading,...). > > > Ooh, ooh, I want a copy when you're done! I actually had you in mind this morning, b/c I remember you've asked about this before. My solution is messy, but I think it's going to work OK. I'll probably post a little writeup about it later. 
It may be useful to people. > Is 'enjoying' the right verb there? As they say in mountaineering, "it doesn't have to be fun to be fun" :) Cheers, f From rbastian at club-internet.fr Sat Jan 8 00:57:01 2005 From: rbastian at club-internet.fr (=?iso-8859-15?q?Ren=E9=20Bastian?=) Date: Sat Jan 8 00:57:01 2005 Subject: [Numpy-discussion] Re: ImportError In-Reply-To: <05010713271405.00761@rbastian> References: <05010712542904.00761@rbastian> <05010713271405.00761@rbastian> Message-ID: <05010801300900.00763@rbastian> There was an error in the pythoneon.pth file in the site-packages directory (circular references?). I rewrote this file and now everything works. Thanks and apologies ... rbastian On Friday 7 January 2005 13:27, René Bastian wrote: > On Friday 7 January 2005 12:54, René Bastian wrote: > > Hi, > > > > I use python2.4+numarray1.1.1. > > > > import numarray.convolve as conv > > > > produces an ImportError : No module named convolve > > > > Can you help me ? > > This is the error message : > > Traceback (most recent call last): > File "afolia01.py", line 11, in ? > from filtres import * > File "/home/rbastian/pythoneon/Filtres/filtres.py", line 5, in ? > #-------------------------- > ImportError: No module named convolve > > The line 5 is a commented line! From Chris.Barker at noaa.gov Mon Jan 10 11:07:11 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Mon Jan 10 11:07:11 2005 Subject: [Numpy-discussion] A request for new distributions In-Reply-To: References: Message-ID: <41E2D110.3080905@noaa.gov> Pearu Peterson wrote: > FYI, scipy_distutils has rather general lapack/blas/atlas detection code > facilities. It first looks for atlas/lapack libraries, then for > blas/lapack libraries This makes me happy, as Gentoo Linux puts atlas in: /usr/lib/blas/atlas /usr/lib/lapack/atlas However, I'm all for any system that "just works" on the most common systems, and allows me to specify my weirdo system if need be.
Being an OS-X user, I'd be very happy if it "just works" there. I expect to tweak things when I'm running Gentoo. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jochen at fhi-berlin.mpg.de Wed Jan 12 09:19:12 2005 From: jochen at fhi-berlin.mpg.de (=?iso-8859-1?Q?Jochen_K=FCpper?=) Date: Wed Jan 12 09:19:12 2005 Subject: [Numpy-discussion] numarray dotblas Message-ID: <9ellay5zxo.fsf@gowron.rz-berlin.mpg.de> I needed the following patch to build _dotblas on a fresh system: ,---- | Index: Src/_dotblas.c | =================================================================== | RCS file: /cvsroot/numpy/numarray/Src/_dotblas.c,v | retrieving revision 1.2 | diff -u -u -r1.2 _dotblas.c | --- Src/_dotblas.c 5 Jan 2005 19:57:08 -0000 1.2 | +++ Src/_dotblas.c 12 Jan 2005 17:16:12 -0000 | @@ -10,8 +10,8 @@ | | | #include "Python.h" | -#include "numarray/libnumarray.h" | -#include "numarray/arrayobject.h" | +#include "libnumarray.h" | +#include "arrayobject.h" | #include | | #include `---- Alternatively -IInclude must be added to the compile flags (setup.py: headers()). Greetings, Jochen -- Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de Liberté, Égalité, Fraternité GnuPG key: CC1B0B4D (Part 3 you find in my messages before fall 2003.) From jmiller at stsci.edu Wed Jan 12 10:43:19 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 12 10:43:19 2005 Subject: [Numpy-discussion] numarray dotblas In-Reply-To: <9ellay5zxo.fsf@gowron.rz-berlin.mpg.de> References: <9ellay5zxo.fsf@gowron.rz-berlin.mpg.de> Message-ID: <1105555358.24423.15.camel@halloween.stsci.edu> Thanks Jochen. It's committed now.
Cheers, Todd On Wed, 2005-01-12 at 12:18, Jochen Küpper wrote: > I needed the following patch to build _dotblas on a fresh system: > ,---- > | Index: Src/_dotblas.c > | =================================================================== > | RCS file: /cvsroot/numpy/numarray/Src/_dotblas.c,v > | retrieving revision 1.2 > | diff -u -u -r1.2 _dotblas.c > | --- Src/_dotblas.c 5 Jan 2005 19:57:08 -0000 1.2 > | +++ Src/_dotblas.c 12 Jan 2005 17:16:12 -0000 > | @@ -10,8 +10,8 @@ > | > | > | #include "Python.h" > | -#include "numarray/libnumarray.h" > | -#include "numarray/arrayobject.h" > | +#include "libnumarray.h" > | +#include "arrayobject.h" > | #include > | > | #include > `---- > Alternatively -IInclude must be added to the compile flags (setup.py: headers()). > > Greetings, > Jochen -- From klimek at grc.nasa.gov Wed Jan 12 13:10:25 2005 From: klimek at grc.nasa.gov (Bob Klimek) Date: Wed Jan 12 13:10:25 2005 Subject: [Numpy-discussion] position of objects? Message-ID: <41E592EB.6090209@grc.nasa.gov> Hi, Is there a way to obtain the positions (coordinates) of objects that were found with the find_objects() function (in nd_image)? Specifically what I'm looking for is the coordinates of the bounding box (for a 2d array it would be upper-left and lower-right). Regards, Bob From jmiller at stsci.edu Wed Jan 12 16:36:17 2005 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jan 12 16:36:17 2005 Subject: [Numpy-discussion] Simplifying array() Message-ID: <1105576535.24423.213.camel@halloween.stsci.edu> Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array().
The patch is here if you want to look at it yourself: http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse One item I thought needed some discussion was the removal of two features: > * array() does too much. E.g., handling file/memory instances for > 'sequence'. There's fromfile for the former, and users needing > the latter functionality should be clued up enough to > instantiate NumArray directly. I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them? I think strings.py and records.py also have "over-stuffed" array() functions... so consistency bids us to streamline those as well. Regards, Todd From rkern at ucsd.edu Wed Jan 12 17:03:31 2005 From: rkern at ucsd.edu (Robert Kern) Date: Wed Jan 12 17:03:31 2005 Subject: [Numpy-discussion] Matrix-SIG archives Message-ID: <41E5C8A8.5020303@ucsd.edu> It looks like the mailing list archives for the Matrix-SIG and other retired SIGs are down at the moment. I've alerted the python.org webmaster, but in the meantime, does anyone have the early archives sitting around somewhere? I'm trying to answer a question about the motivations of a particular design decision in Numeric (why dot(A,B) doesn't do conjugation on A when A is complex). Thanks in advance. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From florian.proff.schulze at gmx.net Thu Jan 13 01:32:00 2005 From: florian.proff.schulze at gmx.net (Florian Schulze) Date: Thu Jan 13 01:32:00 2005 Subject: [Numpy-discussion] Re: Simplifying array() References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: On 12 Jan 2005 19:35:36 -0500, Todd Miller wrote: > One item I thought needed some discussion was the removal of two > features: > >> * array() does too much. 
E.g., handling file/memory instances for >> 'sequence'. There's fromfile for the former, and users needing >> the latter functionality should be clued up enough to >> instantiate NumArray directly. > > I agree with this myself. Does anyone care if they will no longer be > able to construct an array from a file or buffer object using array() > rather than fromfile() or NumArray(), respectively? Is a deprecation > process necessary to remove them? IMHO, array should just delegate to other functions based on the arguments, then it can remain backward compatible. I use the from-buffer functionality quite often and it would be nice if there were at least a new function frombuffer or frommemory. Regards, Florian Schulze From faltet at carabos.com Thu Jan 13 01:35:14 2005 From: faltet at carabos.com (Francesc Altet) Date: Thu Jan 13 01:35:14 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105576535.24423.213.camel@halloween.stsci.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: <200501131034.10856.faltet@carabos.com> On Thursday 13 January 2005 01:35, Todd Miller wrote: > > * array() does too much. E.g., handling file/memory instances for > > 'sequence'. There's fromfile for the former, and users needing > > the latter functionality should be clued up enough to > > instantiate NumArray directly. > > I agree with this myself. Does anyone care if they will no longer be > able to construct an array from a file or buffer object using array() > rather than fromfile() or NumArray(), respectively? Is a deprecation > process necessary to remove them? For me it is fine. I always call the array() factory function in order to get a buffer object, so no problem. > I think strings.py and records.py also have "over-stuffed" array() > functions... so consistency bids us to streamline those as well. I agree. Cheers, -- >OO< Francesc Altet || http://www.carabos.com/ V V Carabos Coop. V. || Who is your data daddy?
PyTables From konrad.hinsen at laposte.net Thu Jan 13 05:33:16 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Jan 13 05:33:16 2005 Subject: [Numpy-discussion] Matrix-SIG archives In-Reply-To: <41E5C8A8.5020303@ucsd.edu> References: <41E5C8A8.5020303@ucsd.edu> Message-ID: On Jan 13, 2005, at 2:02, Robert Kern wrote: > does anyone have the early archives sitting around somewhere? I'm > trying to answer a question about the motivations of a particular > design decision in Numeric (why dot(A,B) doesn't do conjugation on A > when A is complex). I don't have the archives either, but I can answer that one from memory. The fundamental decision was to separate the concepts of "array" (structured collection of data items of identical type) and "vector", "matrix" or "tensor" (mathematical objects with specific properties that are numerically represented by arrays). Arrays are just that: their operations are defined in terms of operations on their elements. Numeric.dot() does multiplication followed by summing on the last dimension of the first argument and the first dimension of the second, no matter what type the elements have. Konrad.
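[Konrad's point — that dot() is a purely structural multiply-and-sum over elements, with no implicit conjugation — is easy to demonstrate. The sketch below uses modern NumPy purely for illustration (Numeric and numarray are long retired); NumPy kept the same convention and offers vdot() as the variant that conjugates its first argument.]

```python
import numpy as np

a = np.array([1j, 2j])
b = np.array([1j, 2j])

# dot() multiplies and sums the elements as-is:
# (1j*1j) + (2j*2j) = -1 + -4 = -5
plain = np.dot(a, b)

# vdot() conjugates its first argument first:
# (-1j*1j) + (-2j*2j) = 1 + 4 = 5
conj = np.vdot(a, b)

print(plain, conj)  # (-5+0j) (5+0j)
```

This is also why dot(a, a) of a complex vector is not its squared norm; that requires vdot() or an explicit conjugate.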
-- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr --------------------------------------------------------------------- From konrad.hinsen at laposte.net Thu Jan 13 05:41:14 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Thu Jan 13 05:41:14 2005 Subject: [Numpy-discussion] ScientificPython with numarray support Message-ID: A development release of Scientific Python that supports numarray as an alternative to NumPy (choice made at installation time) is now available at http://starship.python.net/~hinsen/ScientificPython/ or http://dirac.cnrs-orleans.fr/ScientificPython/ (Search for "2.51".) Note that some modules do not work under numarray because they rely on a NumPy feature that is not currently implemented in numarray. They are listed in the README file. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr --------------------------------------------------------------------- From cjw at sympatico.ca Thu Jan 13 07:27:16 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jan 13 07:27:16 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105576535.24423.213.camel@halloween.stsci.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: <41E69311.3030308@sympatico.ca> Todd Miller wrote: >Someone (way to go Rory!) recently posted a patch (woohoo!) for >numarray which I think bears a little discussion since it involves >the re-write of a fundamental numarray function: array().
>The patch fixes a number of bugs and deconvolutes the logic of array().
> >The patch is here if you want to look at it yourself: > >http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse > >One item I thought needed some discussion was the removal of two >features: > > > >> * array() does too much. E.g., handling file/memory instances for >> 'sequence'. There's fromfile for the former, and users needing >> the latter functionality should be clued up enough to >> instantiate NumArray directly. >> >> > >I agree with this myself. Does anyone care if they will no longer be >able to construct an array from a file or buffer object using array() >rather than fromfile() or NumArray(), respectively? Is a deprecation >process necessary to remove them? > > I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up", some advice on the instantiation of NumArray would help. Currently, neither the word "class" nor "NumArray" appears in the doc index. Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them. It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough, but what about a sequence of NumArray instances, or even a sequence of numbers mixed with NumArray instances? Is the function asarray redundant? I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies. >I think strings.py and records.py also have "over-stuffed" array() >functions... so consistency bids us to streamline those as well. > >Regards, >Todd > > > Thanks to Rory for initiating this. Colin W.
From jmiller at stsci.edu Thu Jan 13 08:17:20 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 13 08:17:20 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E69311.3030308@sympatico.ca> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> Message-ID: <1105632994.3169.32.camel@jaytmiller.comcast.net> On Thu, 2005-01-13 at 10:26 -0500, Colin J. Williams wrote: > Todd Miller wrote: > > >Someone (way to go Rory!) recently posted a patch (woohoo!) for > >numarray which I think bears a little discussion since it involves > >the re-write of a fundamental numarray function: array(). > >The patch fixes a number of bugs and deconvolutes the logic of array(). > > > >The patch is here if you want to look at it yourself: > > > >http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse > > > >One item I thought needed some discussion was the removal of two > >features: > > > > > > > >> * array() does too much. E.g., handling file/memory instances for > >> 'sequence'. There's fromfile for the former, and users needing > >> the latter functionality should be clued up enough to > >> instantiate NumArray directly. > >> > >> > > > >I agree with this myself. Does anyone care if they will no longer be > >able to construct an array from a file or buffer object using array() > >rather than fromfile() or NumArray(), respectively? Is a deprecation > >process necessary to remove them? > > > > > I would suggest deprecation on the way to removal. For the newcomer, > who is not yet "clued up" > some advice on the instantiation of NumArray would help. That's fair. The docstring for NumArray needs beefing up along the same lines as Rory's work on array(). I initially liked Florian's idea of frombuffer() but since I can't think of how it's not identical to NumArray(), I'm not sure there's any point. > Rory leaves in type and typecode. It would be good to eliminate this > apparent overlap. 
Why not > deprecate and then drop type? Some people like type. I don't want to touch this. > It would be good to clarify the acceptable content of a sequence. A > list, perhaps with sublists, of > numbers is clear enough but what about a sequence of NumArray instances > or even a sequence > of numbers, mixed with NumArray instances? The patch has a new docstring which spells out the array() construction algorithm. Lists of arrays would be seen as "numerical sequences". > Is the function asarray redundant? Yes, but it's clear and also needed for backward compatibility with Numeric. Besides, it's not just redundant, it's an idiom... > I suggest that the copy parameter be of the BoolType. This probably has > no practical impact but > it is consistent with current Python usage and makes it clear that this > is a Yes/No parameter, > rather than specifying a number of copies. Fair enough. Backward compatibility dictates not *requiring* a bool, but using it as a default is fine. From tim.hochberg at cox.net Thu Jan 13 09:01:15 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Jan 13 09:01:15 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E69311.3030308@sympatico.ca> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> Message-ID: <41E6A91A.7080209@cox.net> Colin J. Williams wrote: > Todd Miller wrote: > >> Someone (way to go Rory!) recently posted a patch (woohoo!) for >> numarray which I think bears a little discussion since it involves >> the re-write of a fundamental numarray function: array(). >> The patch fixes a number of bugs and deconvolutes the logic of array(). >> >> The patch is here if you want to look at it yourself: >> >> http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse >> >> One item I thought needed some discussion was the removal of two >> features: >> >> >> >>> * array() does too much. E.g., handling file/memory instances for >>> 'sequence'. 
There's fromfile for the former, and users needing >>> the latter functionality should be clued up enough to >>> instantiate NumArray directly. >>> >> >> >> I agree with this myself. Does anyone care if they will no longer be >> able to construct an array from a file or buffer object using array() >> rather than fromfile() or NumArray(), respectively? Is a deprecation >> process necessary to remove them? >> This isn't going to cause me pain, FWIW. > I would suggest deprecation on the way to removal. For the newcomer, > who is not yet "clued up" > some advice on the instantiation of NumArray would help. Currently, > neither the word "class" nor > "NumArray" appears in the doc index. > > Rory leaves in type and typecode. It would be good to eliminate this > apparent overlap. Why not > deprecate and then drop type? As a compromise, either could be > accepted as a NumArray.__init__ > argument, since it is easy to distinguish between them. I thought typecode was eventually going away, not type. Either way, it makes sense to drop one of them eventually. This should definitely go through a period of deprecation though: it will certainly require that I fix a bunch of my code. > It would be good to clarify the acceptable content of a sequence. A > list, perhaps with sublists, of > numbers is clear enough but what about a sequence of NumArray > instances or even a sequence > of numbers, mixed with NumArray instances? Isn't any sequence that is composed of numbers or subsequences acceptable, as long as it has a consistent shape (no ragged edges)? > > Is the function asarray redundant? No, the copy=False parameter is redundant ;) Well, as a pair they are redundant, but if I was going to get rid of something, I'd get rid of copy, because it's lying: copy=False sometimes copies (when the sequence is not an array) and sometimes does not (when the sequence is an array). A better name would be alwaysCopy, but better still would be to just get rid of it altogether and rely on asarray.
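[Tim's "it's lying" complaint is easy to make concrete: a copy flag that means "copy unless it's already an array" is exactly the asarray idiom. A minimal sketch with modern NumPy, standing in for the retired numarray; the pass-through/copy semantics shown here are the ones under discussion.]

```python
import numpy as np

a = np.arange(4)

# asarray passes an existing array through untouched...
assert np.asarray(a) is a

# ...but still converts (and therefore allocates) anything else
b = np.asarray([0, 1, 2, 3])
assert isinstance(b, np.ndarray)

# array() copies by default, even when handed an existing array
c = np.array(a)
assert c is not a
c[0] = 99
assert a[0] == 0  # the original is untouched
```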
(asarray may be implemented using the copy parameter now, but that would be easy to fix.). While we're at it, savespace should get nuked too (all with appropriate deprecations I suppose), so the final signature of array would be: array(sequence=None, type=None, shape=None) Hmm. That's still too complicated. It really should be array(sequence, type=None) I believe that other uses can be more clearly accomplished using zeros and reshape. Of course that has drastic backward compatibility issues and even with generous usage of deprecations might not help the transition much. Still, that's probably what I'd shoot for if it were an option. > > I suggest that the copy parameter be of the BoolType. This probably > has no practical impact but > it is consistent with current Python usage and makes it clear that > this is a Yes/No parameter, > rather than specifying a number of copies. > >> I think strings.py and records.py also have "over-stuffed" array() >> functions... so consistency bids us to streamline those as well. >> Regards, >> Todd >> >> >> > Thanks to Rory for initiating this. Agreed. -tim From perry at stsci.edu Thu Jan 13 09:54:12 2005 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jan 13 09:54:12 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E6A91A.7080209@cox.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> Message-ID: <1DA08A62-658C-11D9-B8A8-000A95B68E50@stsci.edu> On Jan 13, 2005, at 12:00 PM, Tim Hochberg wrote: > Colin J. Williams wrote: > >> Rory leaves in type and typecode. It would be good to eliminate this >> apparent overlap. Why not >> deprecate and then drop type? As a compromise, either could be >> accepted as a NumArray.__init__ >> argument, since it is easy to distinguish between them. > > I thought typecode was eventually going away, not type. Either way, it > makes sense to drop one of them > eventually. 
This should definitely go through a period of deprecation > though: it will certainly require that I > fix a bunch of my code. Tim is right about this. The rationale was that typecode is inaccurate since types are no longer represented by letter codes (one can still use them for backward compatibility). From juenglin at cs.pdx.edu Thu Jan 13 10:26:15 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Thu Jan 13 10:26:15 2005 Subject: [Numpy-discussion] iterating over an array Message-ID: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> Hi, I have an application where I cannot avoid (afaikt) looping over one array dimension. So I thought it might help speed up the code by setting up the data in a way so that the dimension to loop over is the first dimension. This allows one to write for data in a: do sth with data instead of for i in range(len(a)): data = a[i] do sth with data and would save the indexing operation. To my surprise it didn't make a difference in terms of speed. A little timing experiment suggests that the first version is actually slightly slower than the second: >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))' >>> Timer('for row in a: pass', setup).timeit(number=1000) 13.495718955993652 >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 12.162748098373413 I noticed that the array object does not have a special method __iter__, so apparently, no attempts have been made so far to make array iteration fast. Do you think it's possible to speed things up by implementing an __iter__ method? This is high on my wish list and I would help with implementing it, appreciating any advice.
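[The loop Ralf times works even though numarray defines no __iter__ because Python falls back to the legacy sequence protocol: it calls __getitem__(0), __getitem__(1), ... until IndexError is raised. A minimal pure-Python illustration of that fallback, independent of numarray:]

```python
class Seq:
    """Iterable only via the legacy __getitem__ fallback -- no __iter__."""
    def __init__(self, data):
        self._data = data

    def __getitem__(self, i):
        return self._data[i]  # raises IndexError past the end, stopping the loop

s = Seq([10, 20, 30])
assert not hasattr(Seq, '__iter__')      # no iterator protocol defined...
assert list(s) == [10, 20, 30]           # ...yet for-loops and list() work
assert [x * 2 for x in s] == [20, 40, 60]
```

Every step of such a loop still goes through __getitem__, which is why the per-index cost dominates the timings in this thread.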
Thanks, Ralf From juenglin at cs.pdx.edu Thu Jan 13 10:31:14 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Thu Jan 13 10:31:14 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <41E6A91A.7080209@cox.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> Message-ID: <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> On Thu, 2005-01-13 at 09:00, Tim Hochberg wrote: > > It would be good to clarify the acceptable content of a sequence. > > A list, perhaps with sublists, of numbers is clear enough but what > > about a sequence of NumArray instances or even a sequence of > > numbers, mixed with NumArray instances? > > Isn't any sequence that is composed of numbers or subsequences > acceptable, as long as it has a consistent shape (no ragged edges)? Why not make it a little more general and accept iterable objects? From http://docs.python.org/lib/module-array.html : array( typecode[, initializer]) Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, string, or iterable over elements of the appropriate type. Changed in version 2.4: Formerly, only lists or strings were accepted. If given a list or string, the initializer is passed to the new array's fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method. Ralf From verveer at embl.de Thu Jan 13 12:47:09 2005 From: verveer at embl.de (Peter Verveer) Date: Thu Jan 13 12:47:09 2005 Subject: [Numpy-discussion] position of objects? In-Reply-To: <41E592EB.6090209@grc.nasa.gov> References: <41E592EB.6090209@grc.nasa.gov> Message-ID: <39D8BD7A-65A4-11D9-B5CA-000D932805AC@embl.de> > Is there a way to obtain the positions (coordinates) of objects that > were found with find_objects() function (in nd_image)?
Specifically > what I'm looking for is the coordinates of the bounding box (for a 2d > array it would be upper-left and lower-right). The find_objects() functions returns for each object the slices that define the bounding box of each object. The slices are a tuple of slice objects, one slice object for each axis. The start and stop attributes of slice objects can be used to find exact position and size along each axis. Cheers, Peter From jmiller at stsci.edu Thu Jan 13 13:07:12 2005 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jan 13 13:07:12 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> Message-ID: <1105650377.3325.26.camel@jaytmiller.comcast.net> On Thu, 2005-01-13 at 10:24 -0800, Ralf Juengling wrote: > Hi, > > I have an application where I cannot avoid (afaikt) > looping over one array dimension. So I thought it > might help speeding up the code by setting up the > data in a way so that the dimension to loop over is > the first dimension. This allows to write > > for data in a: > do sth with data > > instead of > > for i in range(len(a)): > data = a[i] > do sth with data > > and would save the indexing operation. To my surprise > it didn't make a difference in terms of speed. A > little timing experiment suggests, that the first > version is actually slightly slower than the second: > > >>> setup = 'import numarray as na; a = na.arange(2000,shape=(1000,2))' > > >>> Timer('for row in a: pass', setup).timeit(number=1000) > 13.495718955993652 > > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 12.162748098373413 > > > I noticed that the array object does not have a special > method __iter__, so apparently, no attempts have been > made so far to make array iteration fast. Do you think > it's possible to speed things up by implementing an > __iter__ method? I'm skeptical. 
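[Returning to Bob's question: Peter's slice-based recipe above can be written out in a few lines. The sketch below uses a hand-made slice tuple standing in for one element of find_objects()'s result, so it runs without numarray; the bbox value itself is a hypothetical example. (The same interface survives today as scipy.ndimage.find_objects.)]

```python
# One object's bounding box, in the form find_objects() returns it:
# a tuple of slice objects, one per axis.
bbox = (slice(2, 5, None), slice(7, 9, None))

# For a 2-d array, start/stop give the corners directly.
upper_left = tuple(s.start for s in bbox)       # (row, col) of the first pixel
lower_right = tuple(s.stop - 1 for s in bbox)   # inclusive last pixel
size = tuple(s.stop - s.start for s in bbox)    # extent along each axis

assert upper_left == (2, 7)
assert lower_right == (4, 8)
assert size == (3, 2)
```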
My impression is that the fallback for the iteration system is to use the object's len() to determine the count and its getitem() to fetch the iteration elements, all in C without intermediate indexing objects. If numarray is to be sped up, I think the key is to speed up the indexing code and/or object creation code in numarray's _ndarraymodule.c and _numarraymodule.c. I'd be happy to be proved wrong but that's my 2 cents. Regards, Todd From Chris.Barker at noaa.gov Thu Jan 13 15:03:02 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jan 13 15:03:02 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> Message-ID: <41E6FCC7.4010309@noaa.gov> Ralf Juengling wrote: > for data in a: > do sth with data > > instead of > > for i in range(len(a)): > data = a[i] > do sth with data > Do you think > it's possible to speed things up by implementing an > __iter__ method? Frankly, I seriously doubt it would make much difference, the indexing operation would have to take a comparable period of time to your: do sth with data That is unlikely. By the way, here is a test with Python lists: setup = 'import numarray as na; a = [[i*2,i*2+1] for i in range(1000)]' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.37136483192443848 Much faster than the numarray examples( ~ 30 on my machine). I suspect the real delay here is that each indexing operation has to create a new array (even if they do use the same data). Lists just return the item. Also, it's been discussed that numarray's generic indexing is much slower than Numeric's, for instance. This has made a huge difference when passing arrays into wxPython, for instance. Perhaps that's relevant? Here's a test with Numeric vs. 
numarray: >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 1.97064208984375 >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 27.220904111862183 yup! that's it. numarray's indexing is SLOW. So it's not an iterator issue. Look in the archives of this list for discussion of why numarray's generic indexing is slow. A search for "wxPython indexing" will probably turn it up. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cjw at sympatico.ca Thu Jan 13 15:09:13 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jan 13 15:09:13 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net> <1105641044.14230.47.camel@alpspitze.cs.pdx.edu> Message-ID: <41E6FF60.4090803@sympatico.ca> Ralf Juengling wrote: >On Thu, 2005-01-13 at 09:00, Tim Hochberg wrote: > > >>>It would be good to clarify the acceptable content of a sequence. >>>A list, perhaps with sublists, of numbers is clear enough but what >>>about a sequence of NumArray instances or even a sequence of >>>numbers, mixed with NumArray instances? >>> >>> >>Isn't any sequence that is composed of numbers or subsequences >>acceptable, as long as it has a consistent shape (no ragged edges)? >> >> > >Why not make it a little more general and accept iterable objects?
> >>From http://docs.python.org/lib/module-array.html : > > >array( >typecode[, initializer]) > Return a new array whose items are restricted by typecode, and > initialized from the optional initializer value, which must be a > list, string, or iterable over elements of the appropriate type. > Changed in version 2.4: Formerly, only lists or strings were > accepted. If given a list or string, the initializer is passed > to the new array's fromlist(), fromstring(), or fromunicode() > method (see below) to add initial items to the array. Otherwise, > the iterable initializer is passed to the extend() method. > > > >Ralf > > > Yes, I'm not sure whether list comprehension produces an iter object but this should also be included. Similarly instances of subclasses of NumArray should be explicitly included. I like the term no "ragged edges". Colin W. From cjw at sympatico.ca Thu Jan 13 16:45:04 2005 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jan 13 16:45:04 2005 Subject: [Numpy-discussion] Simplifying array() In-Reply-To: <1105632994.3169.32.camel@jaytmiller.comcast.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <41E69311.3030308@sympatico.ca> <1105632994.3169.32.camel@jaytmiller.comcast.net> Message-ID: <41E715F2.70205@sympatico.ca> Todd Miller wrote: >On Thu, 2005-01-13 at 10:26 -0500, Colin J. Williams wrote: > > >>Todd Miller wrote: >> >> >> >>>Someone (way to go Rory!) recently posted a patch (woohoo!) for >>>numarray which I think bears a little discussion since it involves >>>the re-write of a fundamental numarray function: array(). >>>The patch fixes a number of bugs and deconvolutes the logic of array(). >>> >>>The patch is here if you want to look at it yourself: >>> >>>http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse >>> >>>One item I thought needed some discussion was the removal of two >>>features: >>> >>> >>> >>> >>> >>>> * array() does too much. E.g., handling file/memory instances for >>>> 'sequence'. 
There's fromfile for the former, and users needing >>>> the latter functionality should be clued up enough to >>>> instantiate NumArray directly. >>>> >>>> >>>> >>>> >>>I agree with this myself. Does anyone care if they will no longer be >>>able to construct an array from a file or buffer object using array() >>>rather than fromfile() or NumArray(), respectively? Is a deprecation >>>process necessary to remove them? >>> >>> >>> >>> >>I would suggest deprecation on the way to removal. For the newcomer, >>who is not yet "clued up" >>some advice on the instantiation of NumArray would help. >> >> > >That's fair. The docstring for NumArray needs beefing up along the same >lines as Rory's work on array(). > and, I would suggest, the documentation. >I initially liked Florian's idea of >frombuffer() but since I can't think of how it's not identical to >NumArray(), I'm not sure there's any point. > > > >>Rory leaves in type and typecode. It would be good to eliminate this >>apparent overlap. Why not >>deprecate and then drop type? >> >> > >Some people like type. I don't want to touch this. > > The basic suggestion was to drop one or the other, since one is an _nt entry and either an instance of a function while the other is a string. I recognize that "type" has become accepted in the numarray community but the same word is used by Python for a utility function. > > >>It would be good to clarify the acceptable content of a sequence. A >>list, perhaps with sublists, of >>numbers is clear enough but what about a sequence of NumArray instances >>or even a sequence >>of numbers, mixed with NumArray instances? >> >> > >The patch has a new docstring which spells out the array() construction >algorithm. Lists of arrays would be seen as "numerical sequences". > > > >>Is the function asarray redundant? >> >> > >Yes, but it's clear and also needed for backward compatibility with >Numeric. Besides, it's not just redundant, it's an idiom... 
> > *asarray*( seq, type=None, typecode=None) This function converts scalars, lists and tuples to a numarray, when possible. It passes numarrays through, making copies only to convert types. In any other case a TypeError is raised. *astype*( type) The astype method returns a copy of the array converted to the specified type. As with any copy, the new array is aligned, contiguous, and in native machine byte order. If the specified type is the same as current type, a copy is /still/ made. *array*( sequence=None, typecode=None, copy=1, savespace=0, type=None, shape=None) It seems that the function array could be used in place of either the function asarray or the method astype: >>> import numarray.numerictypes as _nt >>> import numarray.numarraycore as _n >>> a= _n.array([1, 2]) >>> a array([1, 2]) >>> a._type Int32 >>> b= a.astype(_nt.Float64) >>> b._type Float64 >>> a._type Int32 >>> c= _n.array(seq= a, type= _nt.Float64) Traceback (most recent call last): File "", line 1, in ? TypeError: array() got an unexpected keyword argument 'seq' >>> c= _n.array(a, type= _nt.Float64) >>> c._type Float64 >>> > > >>I suggest that the copy parameter be of the BoolType. This probably has >>no practical impact but >>it is consistent with current Python usage and makes it clear that this >>is a Yes/No parameter, >>rather than specifying a number of copies. >> >> > >Fair enough. Backward compatibility dictates not *requiring* a bool, >but using it as a default is fine. > > > > Colin W. From das_deniz at yahoo.com Thu Jan 13 20:41:14 2005 From: das_deniz at yahoo.com (D. Bahi) Date: Thu Jan 13 20:41:14 2005 Subject: [Numpy-discussion] Numeric 23.6 for Python 2.4 (23.7 anyone) Message-ID: <20050114044049.95790.qmail@web20422.mail.yahoo.com> Hey this works for me! Thanks very much Jonathan. Want to do it again for 23.7? das __________________________________ Do you Yahoo!? All your favorites on one personal page ? Try My Yahoo! 
http://my.yahoo.com From faltet at carabos.com Fri Jan 14 00:29:22 2005 From: faltet at carabos.com (Francesc Altet) Date: Fri Jan 14 00:29:22 2005 Subject: [Numpy-discussion] iterating over an array In-Reply-To: <41E6FCC7.4010309@noaa.gov> References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu> <41E6FCC7.4010309@noaa.gov> Message-ID: <200501140928.30275.faltet@carabos.com> A Dijous 13 Gener 2005 23:57, Chris Barker va escriure: > >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 1.97064208984375 > >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)' > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) > 27.220904111862183 > > yup! that's it. numarray's indexing is SLOW. So it's not an iterator > issue. Look in the archives of this list for discussion of why > numarray's generic indexing is slow. A search for "wxPython indexing" > will probably turn it up. Well, if you want to really compare generic indexing speed, you can't mix array creation objects in the process, as your example seems to do. A pure indexing access test would look like: >>> setup = 'import numarray as na; a = [i*2 for i in range(2000)]' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.48835396766662598 # With Python Lists >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000*2,)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.65753912925720215 # With Numeric >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000*2,)' >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000) 0.89093804359436035 # With numarray That shows that numarray indexing is slower than Numeric, but not by a large extent (just a 40%). The real problem with numarray (for Ralf's example) is, as is already known, array creation time. Cheers, -- >OO< ? 
Francesc Altet || http://www.carabos.com/
V ?V ? Carabos Coop. V. || Who is your data daddy? PyTables
""

From tkorvola at welho.com Fri Jan 14 02:08:20 2005
From: tkorvola at welho.com (Timo Korvola)
Date: Fri Jan 14 02:08:20 2005
Subject: [Numpy-discussion] Simplifying array()
In-Reply-To: <41E6FF60.4090803@sympatico.ca> (Colin J. Williams's message of
	"Thu, 13 Jan 2005 18:08:16 -0500")
References: <1105576535.24423.213.camel@halloween.stsci.edu>
	<41E69311.3030308@sympatico.ca> <41E6A91A.7080209@cox.net>
	<1105641044.14230.47.camel@alpspitze.cs.pdx.edu>
	<41E6FF60.4090803@sympatico.ca>
Message-ID: <87acrcwcg5.fsf@welho.com>

"Colin J. Williams" writes:
> Yes, I'm not sure whether list comprehension produces an iter object
> but this should also be included.

Lists are iterable but they also have a length, which is not accessible
through the iterator: from a general iterator there is no way of knowing
in advance how many items it will return. This may be a problem if you
want to allocate memory for the values.

-- 
	Timo Korvola

From jmiller at stsci.edu Fri Jan 14 06:19:22 2005
From: jmiller at stsci.edu (Todd Miller)
Date: Fri Jan 14 06:19:22 2005
Subject: [Numpy-discussion] iterating over an array
Message-ID: <1105712311.3481.2.camel@jaytmiller.comcast.net>

On Fri, 2005-01-14 at 09:28 +0100, Francesc Altet wrote:
> A Dijous 13 Gener 2005 23:57, Chris Barker va escriure:
> > >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000,2)'
> > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
> > 1.97064208984375
> > >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000,2)'
> > >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
> > 27.220904111862183
> >
> > yup! that's it. numarray's indexing is SLOW. So it's not an iterator
> > issue. Look in the archives of this list for discussion of why
> > numarray's generic indexing is slow. A search for "wxPython indexing"
> > will probably turn it up.
>
> Well, if you want to really compare generic indexing speed, you can't mix
> array creation objects in the process, as your example seems to do.
>
> A pure indexing access test would look like:
>
> >>> setup = 'import numarray as na; a = [i*2 for i in range(2000)]'
> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
> 0.48835396766662598  # With Python Lists
> >>> setup = 'import Numeric as na; a = na.arange(2000);a.shape=(1000*2,)'
> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
> 0.65753912925720215  # With Numeric
> >>> setup = 'import numarray as na; a = na.arange(2000);a.shape=(1000*2,)'
> >>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)
> 0.89093804359436035  # With numarray
>
> That shows that numarray indexing is slower than Numeric, but not by a large
> extent (just a 40%). The real problem with numarray (for Ralf's example) is,
> as is already known, array creation time.

I thought we were done after what Francesc pointed out above, then I
tried this:

from timeit import Timer

setup = 'import numarray as na; a = na.arange(2000,shape=(2000,))'
print "numarray iteration: ", Timer('for i in a: pass', setup).timeit(number=1000)
print "numarray simple indexing, int value:", Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)

setup = 'import Numeric as na; a = na.arange(2000); a.shape=(2000,)'
print "Numeric iteration: ", Timer('for i in a: pass', setup).timeit(number=1000)
print "Numeric simple indexing, int value:", Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=1000)

And got:

numarray iteration:  8.81474900246
numarray simple indexing, int value: 3.61732387543
Numeric iteration:  1.0384759903
Numeric simple indexing, int value: 2.18056321144

This is running on Python-2.3.4 compiled --with-debug using gcc-3.4.2 on
a 1 GHz Athlon XP and FC3 Linux. Simple indexing returning an int was
66% slower for me, but iteration was 880% slower.
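A present-day rerun of the same pattern can be sketched with numpy standing in for both packages (a hypothetical illustration; the absolute numbers will differ from those above, but the shape of the measurement is identical, and it also shows the usual workaround of converting to a plain list once with tolist()):

```python
from timeit import Timer

# numpy stands in here for numarray/Numeric; the measurement pattern
# is the same as in the messages above.
setup = 'import numpy as np; a = np.arange(2000)'

# Per-element a[i] pays the scalar-extraction cost on every step.
t_index = Timer('for i in range(len(a)): row = a[i]', setup).timeit(number=100)

# Direct iteration over the array.
t_iter = Timer('for row in a: pass', setup).timeit(number=100)

# Converting once with tolist() pays the extraction cost once up front.
t_list = Timer('for row in al: pass', setup + '; al = a.tolist()').timeit(number=100)

print(t_index, t_iter, t_list)
```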
It looks to me like there is room for significant numarray iteration improvement; I'm not sure how it needs to be done or if Numeric has any special support. Regards, Todd From ryorke at telkomsa.net Fri Jan 14 07:33:21 2005 From: ryorke at telkomsa.net (Rory Yorke) Date: Fri Jan 14 07:33:21 2005 Subject: [Numpy-discussion] Re: Simplifying array() In-Reply-To: <1105576535.24423.213.camel@halloween.stsci.edu> References: <1105576535.24423.213.camel@halloween.stsci.edu> Message-ID: <20050113201813.GB6528@telkomsa.net> [Todd] > I agree with this myself. Does anyone care if they will no longer be > able to construct an array from a file or buffer object using array() > rather than fromfile() or NumArray(), respectively? Is a deprecation > process necessary to remove them? There seems to be a majority opinion in favour of deprecation, though at least Florian uses the sequence-as-a-buffer feature. [Colin] > I would suggest deprecation on the way to removal. For the > newcomer, who is not yet "clued up" some advice on the instantiation > of NumArray would help. Currently, The deprecation warning could include a pointer to NumArray or fromfile, as appropriate. I think some of the Python stdlib deprecations (doctest?) do exactly this. The NumArray docs do need to be fixed, though. [Colin] > Rory leaves in type and typecode. It would be good to eliminate > this apparent overlap. Why not deprecate and then drop type? As a > compromise, either could be accepted as a NumArray.__init__ > argument, since it is easy to distinguish between them. [Perry] > Tim is right about this. The rationale was that typecode is > inaccurate since types are no longer represented by letter codes > (one can still use them for backward compatibility). Also, the type keyword matches the NumArray type method. It does have the downside of clashing with the type builtin, of course. > It would be good to clarify the acceptable content of a sequence. 
> A

I think this is quite important, though perhaps not too difficult. I
think any sequence, or nested sequences, should be accepted, provided
that they are "conformally sized" (for lack of a better phrase) and
that the innermost sequences contain number types. I'll try to word
this more precisely for the docs.

Note that a NumArray is a sequence, in the sense that it has
__getitem__ and __len__ methods, and is indexed from 0 upwards.

Strings are also sequences, and Alexander made a comment to the patch
that array() should handle sequences of strings. Consider Numeric's
behaviour:

>>> array(["abc",[1,2,3]])
array([[97, 98, 99],
       [ 1,  2,  3]])

I think this needs to be handled in fromlist, which, I think, handles
fairly general sequences, but not strings.

Note that this leads to a different interpretation of array(["abcd"])
and array("abcd")

According to the above, array(["abcd"]) should return
array([[97,98,99,100]]) and, since plain strings go straight to
fromstring, array("abcd") should return array([1684234849]) (probably
dependent on endianness, what Long is, etc.). Is this acceptable?

[Colin]
>Is the function asarray redundant?

[Tim]
> No, the copy=False parameter is redundant ;) Well as a pair they are.

I'm not sure I follow Tim's argument, but asarray is not redundant for
a different reason: it returns any NDArray arguments without calling
array. generic.ravel calls numarraycore.asarray, and so ravel()ing
RecArrays, or some other non-NumArray NDArray, requires asarray to
remain as it is. I'm not sure if this setup is desirable, but I decided
not to change too many things at once.

[Colin]
>I suggest that the copy parameter be of the BoolType. This
>probably has no practical impact but it is consistent with current
>Python usage and makes it clear that this is a Yes/No parameter,
>rather than specifying a number of copies.

This makes sense; as Todd noted, we shouldn't rely on it being a bool,
but having False as the default value is clearer.
Cheers,

Rory

From Chris.Barker at noaa.gov Fri Jan 14 11:54:26 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Fri Jan 14 11:54:26 2005
Subject: [Numpy-discussion] iterating over an array
In-Reply-To: <200501140928.30275.faltet@carabos.com>
References: <1105640695.14230.41.camel@alpspitze.cs.pdx.edu>
	<41E6FCC7.4010309@noaa.gov> <200501140928.30275.faltet@carabos.com>
Message-ID: <41E8222B.3070900@noaa.gov>

Francesc Altet wrote:
> That shows that numarray indexing is slower than Numeric, but not by a large
> extent (just a 40%). The real problem with numarray (for Ralf's example) is,
> as is already known, array creation time.

Thanks for clearing this up. The case I care about (at the moment) is in
wxPython's "PointListHelper". It converts whatever Python sequence you
give it into a wxList of wxPoints. The sequence you give it needs to
look something like a list of (x,y) tuples. An NX2 Numeric or Numarray
array works just fine, but both are slower than a list of tuples, and
Numarray is MUCH slower.

This appears to be exactly analogous to the OP's example, of extracting
a bunch of (2,) arrays from the (N,2) array. Then the two numbers must
be extracted from the (2,) array, and then converted to a wxPoint. It
seems the creation of all those (2,) numarrays is what's taking the
time.

A) Is there work going on on speeding this up?

B) The real solution, at least for wxPython, is to make
"PointListHelper" understand numarrays, so that it can go straight from
the array->data pointer to the wxList of wxPoints. One of these days
I'll get around to working on that!

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jmiller at stsci.edu Fri Jan 14 13:46:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jan 14 13:46:13 2005 Subject: [Numpy-discussion] Re: Simplifying array() In-Reply-To: <20050113201813.GB6528@telkomsa.net> References: <1105576535.24423.213.camel@halloween.stsci.edu> <20050113201813.GB6528@telkomsa.net> Message-ID: <1105739144.5294.28.camel@jaytmiller.comcast.net> On Thu, 2005-01-13 at 22:18 +0200, Rory Yorke wrote: > [Todd] > > I agree with this myself. Does anyone care if they will no longer be > > able to construct an array from a file or buffer object using array() > > rather than fromfile() or NumArray(), respectively? Is a deprecation > > process necessary to remove them? > > There seems to be a majority opinion in favour of deprecation, though > at least Florian uses the sequence-as-a-buffer feature. By way of status, I applied and committed both Rory's patches this morning. Afterward, I added the deprecation warnings for the frombuffer() and fromfile() cases. frombuffer() is identical to NumArray(), so I did not add a new function. > [Colin] > > I would suggest deprecation on the way to removal. For the > > newcomer, who is not yet "clued up" some advice on the instantiation > > of NumArray would help. Currently, > > The deprecation warning could include a pointer to NumArray or > fromfile, as appropriate. I think some of the Python stdlib > deprecations (doctest?) do exactly this. The NumArray docs do need to > be fixed, though. I didn't touch the docs. > [Colin] > > Rory leaves in type and typecode. It would be good to eliminate > > this apparent overlap. Why not deprecate and then drop type? As a > > compromise, either could be accepted as a NumArray.__init__ > > argument, since it is easy to distinguish between them. > > [Perry] > > Tim is right about this. 
> > The rationale was that typecode is
> > inaccurate since types are no longer represented by letter codes
> > (one can still use them for backward compatibility).
>
> Also, the type keyword matches the NumArray type method. It does have
> the downside of clashing with the type builtin, of course.

IMHO, all this discussion about type/typecode is moot because typecode
was added after the fact for Numeric compatibility. It really makes no
sense to take it out now that we're going for interoperability with
scipy. I don't like it much either, but the alternative, being
incompatible, is worse. "typecode" could be factored out into the
numerix layer, but that just makes life confusing; it's best that
numarray works the same whether it's being used with scipy or not.

> > It would be good to clarify the acceptable content of a sequence. A
>
> I think this is quite important, though perhaps not too difficult. I
> think any sequence, or nested sequences should be accepted, provided
> that they are "conformally sized" (for lack of a better phrase) and
> that the innermost sequences contain number types. I'll try to word
> this more precisely for the docs.
>
> Note that a NumArray is a sequence, in the sense that it has
> __getitem__ and __len__ methods, and is index from 0 upwards.
>
> Strings are also sequences, and Alexander made a comment to the patch
> that array() should handle sequences of strings. Consider Numeric's
> behaviour:
>
> >>> array(["abc",[1,2,3]])
> array([[97, 98, 99],
>        [ 1, 2, 3]])

-1 from me. I think we're getting back into "array does too much"
territory.

> I think this needs to be handled in fromlist, which, I think, handles
> fairly general sequences, but not strings.

I think you're right, that's how it could be done.
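Sketched in isolation, the treatment being discussed might look like this (a hypothetical helper for illustration only -- not numarray's actual fromlist):

```python
# Hypothetical sketch -- not numarray's real fromlist -- of the string
# handling under discussion: an innermost str is treated as a sequence
# of byte values, matching the Numeric behaviour quoted above.
def expand_strings(seq):
    if isinstance(seq, str):
        return [ord(c) for c in seq]          # "abc" -> [97, 98, 99]
    if isinstance(seq, (list, tuple)):
        return [expand_strings(item) for item in seq]
    return seq                                # a number: leave it alone

print(expand_strings(["abc", [1, 2, 3]]))     # [[97, 98, 99], [1, 2, 3]]
```

Under this rule a string inside a list becomes a row of byte values, while a bare string would still be left to fromstring.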
> Note that this leads to a different interpretation of array(["abcd"]) > and array("abcd") > > According to the above, array(["abcd"] should return > array([[97,98,99,100]]) and, since plain strings go straight to > fromstring, array("abcd") should return array([1684234849]) (probably > dependent on endianess, what Long is, etc.). Is this acceptable? I held off consolidating all the new default types to Long. Not having defaults hasn't been a problem up to now so I'm not sure Numeric compatibility is such a concern or that Long is really the best default... although it does make it easier to write doctests. Todd From haase at msg.ucsf.edu Fri Jan 14 15:24:08 2005 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jan 14 15:24:08 2005 Subject: [Numpy-discussion] ScientificPython with numarray support In-Reply-To: References: Message-ID: <200501141523.07159.haase@msg.ucsf.edu> On Thursday 13 January 2005 05:42 am, konrad.hinsen at laposte.net wrote: > A development release of Scientific Python that supports numarray as an > alternative to NumPy Hi Konrad, This is great news ! In the readme it says: """ Note that this is a new feature and not very well tested. Feedback is welcome. Note also that the modules Scientific.Functions.Derivatives Scientific.Functions.FirstDerivatives Scientific.Functions.LeastSquares do not work correctly with numarray because they rely on a feature of Numeric that is missing in current numarray releases. """ I'm just curious what the missing feature is. The LeastSquare-fit is exactly what I'm interested in, since I couldn't find something similar anywhere else (like: It's not in SciPy, right?) Thanks, Sebastian Haase >(choice made at installation time) is now > available at > > http://starship.python.net/~hinsen/ScientificPython/ > or > http://dirac.cnrs-orleans.fr/ScientificPython/ > > (Search for "2.51".) 
>
> Note that some modules do not work under numarray because they rely on
> a NumPy feature that is currently not implemented in numarray. They are
> listed in the README file.
>
> Konrad.
> --
> ---------------------------------------------------------------------
> Konrad Hinsen
> Laboratoire Léon Brillouin, CEA Saclay,
> 91191 Gif-sur-Yvette Cedex, France
> Tel.: +33-1 69 08 79 25
> Fax: +33-1 69 08 82 61
> E-Mail: hinsen at llb.saclay.cea.fr
> ---------------------------------------------------------------------
>
>
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

From jdhunter at ace.bsd.uchicago.edu Fri Jan 14 16:21:22 2005
From: jdhunter at ace.bsd.uchicago.edu (John Hunter)
Date: Fri Jan 14 16:21:22 2005
Subject: [Numpy-discussion] ScientificPython with numarray support
In-Reply-To: <200501141523.07159.haase@msg.ucsf.edu> (Sebastian Haase's
	message of "Fri, 14 Jan 2005 15:23:07 -0800")
References: <200501141523.07159.haase@msg.ucsf.edu>
Message-ID: 

>>>>> "Sebastian" == Sebastian Haase writes:

    Sebastian> The LeastSquare-fit is exactly what I'm interested in,
    Sebastian> since I couldn't find something similar anywhere else
    Sebastian> (like: It's not in SciPy, right?)
from scipy import exp, arange, zeros, Float, ones, transpose
from RandomArray import normal
from scipy.optimize import leastsq

parsTrue = 2.0, -.76, 0.1
distance = arange(0, 4, 0.001)

def func(pars):
    a, alpha, k = pars
    return a*exp(alpha*distance) + k

def errfunc(pars):
    return data - func(pars)  # return the error

# some pseudo data; add some noise
data = func(parsTrue) + normal(0.0, 0.1, distance.shape)

guess = 1.0, -.4, 0.0  # the initial guess of the params

best, info, ier, mesg = leastsq(errfunc, guess, full_output=1)

print 'true', parsTrue
print 'best', best

From haase at msg.ucsf.edu Fri Jan 14 17:24:19 2005
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Fri Jan 14 17:24:19 2005
Subject: [Numpy-discussion] ScientificPython with numarray support
In-Reply-To: 
References: <200501141523.07159.haase@msg.ucsf.edu>
Message-ID: <200501141723.17142.haase@msg.ucsf.edu>

On Friday 14 January 2005 04:15 pm, John Hunter wrote:
> >>>>> "Sebastian" == Sebastian Haase writes:
>
> Sebastian> The LeastSquare-fit is exactly what I'm interested in,
> Sebastian> since I couldn't find something similar anywhere else
> Sebastian> (like: It's not in SciPy, right?)
>
> from scipy import exp, arange, zeros, Float, ones, transpose
> from RandomArray import normal
> from scipy.optimize import leastsq
>
> parsTrue = 2.0, -.76, 0.1
> distance = arange(0, 4, 0.001)
>
> def func(pars):
>     a, alpha, k = pars
>     return a*exp(alpha*distance) + k
>
> def errfunc(pars):
>     return data - func(pars)  # return the error
>
> # some pseudo data; add some noise
> data = func(parsTrue) + normal(0.0, 0.1, distance.shape)
>
> guess = 1.0, -.4, 0.0  # the initial guess of the params
>
> best, info, ier, mesg = leastsq(errfunc, guess, full_output=1)
>
> print 'true', parsTrue
> print 'best', best

Thanks John,
I thought it should be there.
Is the code / algorithm about similar to what Konrad has in Scientific ?
- Sebastian

From gazzar at email.com Fri Jan 14 19:40:24 2005
From: gazzar at email.com (Gary Ruben)
Date: Fri Jan 14 19:40:24 2005
Subject: [Numpy-discussion] ScientificPython with numarray support
Message-ID: <20050115033916.3CAB3164002@ws1-4.us4.outblaze.com>

Hi Sebastian,
You could also use the linregress function in scipy.stats if you're
doing least squares fitting of a straight line.
Gary R.

----- Original Message -----
> On Friday 14 January 2005 04:15 pm, John Hunter wrote:
> > >>>>> "Sebastian" == Sebastian Haase writes:
> >
> > Sebastian> The LeastSquare-fit is exactly what I'm interested in,
> > Sebastian> since I couldn't find something similar anywhere else
> > Sebastian> (like: It's not in SciPy, right?)
> >
> > best, info, ier, mesg = leastsq(errfunc, guess, full_output=1)
>
> Thanks John,
> I thought it should be there.
> Is the code / algorithm about similar to what Konrad has in Scientific ?
>
> - Sebastian

-- 
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From oliphant at ee.byu.edu Fri Jan 14 22:37:32 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Jan 14 22:37:32 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
Message-ID: <41E8B9D9.5040301@ee.byu.edu>

Hello all,

I don't comment much on numarray because I haven't used it that much, as
Numeric fits my needs quite well. It does bother me that there are two
communities coexisting and that work seems to get repeated several
times, so recently I have looked at numarray to see how far it is from
being acceptable as a real replacement for Numeric.

I have some comments based on perusing its source. I don't want to seem
overly critical, so please take my comments with the understanding that
I appreciate the extensive work that has gone into Numarray. I do think
that Numarray has made some great strides. I would really like to see a
unification of Numeric and Numarray.
1) Are there plans to move the nd array entirely into C?
   -- I would like to see the nd array become purely a c-type. I would
   be willing to help here. I can see that part of the work has been
   done.

2) Why is the ND array C-structure so large? Why are the dimensions and
   strides array static? Why can't the extra stuff that the fancy arrays
   need be another structure and the numarray C structure just extended
   with a pointer to the extra stuff?

3) There seem to be too many files to define the array. The mixture of
   Python and C makes trying to understand the source very difficult. I
   thought one of the reasons for the re-write was to simplify the
   source code.

4) Object arrays must be supported. This was a bad oversight and an
   important feature of Numeric arrays.

5) The ufunc code interface needs to continue to be improved. I do see
   that some effort into understanding the old ufunc interface has taken
   place, which is a good sign.

Again, thanks to the work that has been done. I'm really interested to
see if some of these modifications can be done, as in my mind it will
help the process of unifying the two camps.

-Travis Oliphant

From konrad.hinsen at laposte.net Sun Jan 16 03:29:20 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Sun Jan 16 03:29:20 2005
Subject: [Numpy-discussion] ScientificPython with numarray support
In-Reply-To: <200501141723.17142.haase@msg.ucsf.edu>
References: <200501141523.07159.haase@msg.ucsf.edu>
	<200501141723.17142.haase@msg.ucsf.edu>
Message-ID: 

On 15.01.2005, at 02:23, Sebastian Haase wrote:

> I thought it should be there.
> Is the code / algorithm about similar to what Konrad has in Scientific
> ?

I don't know exactly what's in SciPy, but it's probably a variant of
Levenberg-Marquardt, just like in Scientific Python.
However, there is one peculiarity of my implementation which is probably
not shared by the SciPy one, and which is the cause of the
incompatibility with numarray: the use of automatic derivatives in the
linearization of the model. Most implementations use numerical
differentiation. Automatic derivatives have the advantage of removing
one numerical issue and one critical parameter (the differentiation step
size), at the cost of somewhat limiting the applicability (the model
must be expressed as an analytical function of the parameters) and of
requiring a NumPy feature that numarray doesn't have (yet?).

BTW, that feature was recently discussed here: it is the possibility to
apply the maths functions to objects of arbitrary type. This makes it
possible to apply the same numerical code to numbers and arrays but also
to the number-like objects that are used for automatic derivatives.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From perry at stsci.edu Mon Jan 17 11:12:28 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Jan 17 11:12:28 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: 
References: 
Message-ID: 

Travis Oliphant wrote:
> I have some comments based on perusing its source. I don't want to
> seem overly critical, so please take my comments with the understanding
> that I appreciate the extensive work that has gone into Numarray. I do
> think that Numarray has made some great strides. I would really like to
> see a unification of Numeric and Numarray.
>
> 1) Are there plans to move the nd array entirely into C?
> -- I would like to see the nd array become purely a c-type. I would
> be willing to help here.
> I can see that part of the work has been
> done.
>

I don't know that I would say they are definite, but I think that at
some point we thought that would be necessary. We haven't yet since
doing so makes it harder to change, so it would be one of the last
changes to the core that we would want to do. Our current priorities are
towards making all the major libraries and packages available under it
first and then finishing optimization issues. (Another issue that has to
be tackled soon is handling 64-bit addressing; apparently the work to
make Python sequences use 64-bit addresses is nearing completion, so we
want to be able to handle that. I expect we would want to make sure we
find a way of handling that before we turn it all into C, but maybe it
is just as easy doing them in the opposite order.)

> 2) Why is the ND array C-structure so large? Why are the dimensions
> and strides array static? Why can't the extra stuff that the fancy
> arrays need be another structure and the numarray C structure just
> extended with a pointer to the extra stuff?

When Todd moved NDArray into C, he tried to keep it simple. As such, it
has no "moving parts." We think making dimensions and strides malloc'ed
rather than static would be fairly easy. Making the "extra stuff"
variable is something we can look at. The bottom line is that adding the
variability adds complexity and we're not sure we understand the storage
economics of why we would do it.

Numarray was designed, first and foremost, for large arrays. For that
case, the array struct size is irrelevant whereas additional complexity
is not. I guess we would like to see some good practical examples where
the array struct size matters. Do you have code with hundreds of
thousands of small arrays existing simultaneously?

> 3) There seem to be too many files to define the array. The mixture of
> Python and C makes trying to understand the source very difficult.
> I
> thought one of the reasons for the re-write was to simplify the source
> code.
>

I think this reflects the transitional nature of going from mostly
Python to a hybrid. We agree that the current state is more convoluted
than it ought to be. If NDarray were all C, much of this would end
(though in some respects, being all in C will make it larger and harder
to understand as well). The original hope was that most of the array
setup computation could be kept in Python, but that is what made it slow
for small arrays (it did allow us to implement it reasonably quickly
with big array performance, so that we could start using it for our own
projects without a long development effort). Unfortunately, the
simplification in the rewrite is offset by handling the more complex
cases (byte-swapping, etc.) and extra array indexing capabilities.

> 4) Object arrays must be supported. This was a bad oversight and an
> important feature of Numeric arrays.
>

The current implementation does support them (though in a different way,
and generally not as efficiently, though Todd is more up on the details
here). What aspect of object arrays are you finding lacking? C-api?

> 5) The ufunc code interface needs to continue to be improved. I do see
> that some effort into understanding the old ufunc interface has taken
> place which is a good sign.
>

You are probably referring to work underway to integrate with scipy (I'm
assuming you are looking at the version in CVS).

> Again, thanks to the work that has been done. I'm really interested to
> see if some of these modifications can be done as in my mind it will
> help the process of unifying the two camps.
>

I'm glad to see that you are taking a look at it and welcome the
comments and any offers of help in improving speed.
Perry From Fernando.Perez at colorado.edu Mon Jan 17 13:25:34 2005 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Mon Jan 17 13:25:34 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: <41EC2D14.7000203@colorado.edu> Hi all, just some comments from the sidelines, while I applaud the fact that we are moving towards a successful numeric/numarray integration. Perry Greenfield wrote: > the array struct size matters. Do you have code with hundreds of > thousands > of small arrays existing simultaneously? I do have code with perhaps ~100k 'small' arrays (12x12x12 or so) in existence simultaneously, plus a few million created temporarily as part of the calculations. Needless to say, this uses Numeric :) What's been so nice about Numeric is that even with my innermost loops (carefully) coded in python, I get very acceptable performance for real-world problems. Perry and I had this conversation over at scipy'04, so this is just a reminder. The Blitz++ project has faced similar problems of performance for their very flexible array classes, and their approach has been to have separate TinyVector/TinyMatrix classes. These offer almost none of the fancier features of the default Blitz Arrays, but they keep the same syntactic behavior and identical semantics where applicable. What they give up in flexibility, they gain in performance. I realize this requires a substantial amount of work, but perhaps it will be worthwhile in the long run. It would be great to have a numarray small_array() object which would not allow byteswapping, memory-mapping, or any of the extra features which make them memory- and time-consuming, but which would maintain compatibility with the regular arrays as far as arithmetic operators and ufunc application (including obviously lapack/blas/f2py usage).
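Fernando's TinyVector analogy can be sketched in a few lines of Python. Tiny3 below is a hypothetical illustration (not a proposed numarray API) of keeping array-like arithmetic semantics while dropping all of the general-array machinery:

```python
class Tiny3:
    """Fixed-size 3-vector: no byteswapping, no strides, no memory mapping."""
    __slots__ = ("x", "y", "z")  # slim per-instance storage, no instance dict

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):
        return Tiny3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __mul__(self, s):  # scalar multiply only
        return Tiny3(self.x * s, self.y * s, self.z * s)

v = (Tiny3(1, 2, 3) + Tiny3(4, 5, 6)) * 2
print(v.x, v.y, v.z)  # 10 14 18
```

The real win in Blitz++ comes from the fixed size being known at compile time; the Python sketch only conveys the interface idea of same semantics with less machinery.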
I know I am talking from 50,000 feet up, so I'm sure once you get down to the details this will probably not be easy (I can already see difficulties with the size of the underlying C structures for C API compatibility). But in the end, I think something like this might be the only way to satisfy all the disparate usage cases for numerical arrays in scientific computing. Besides the advanced features disparity, a simple set of guidelines for the crossover points in terms of performance would allow users to choose in their own codes what to use. At any rate, I'm extremely happy to see scipy/numarray integration moving forward. My thanks to all those who are actually doing the hard work. Regards, f From juenglin at cs.pdx.edu Mon Jan 17 14:33:26 2005 From: juenglin at cs.pdx.edu (Ralf Juengling) Date: Mon Jan 17 14:33:26 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: <1106001150.28436.300.camel@alpspitze.cs.pdx.edu> On Mon, 2005-01-17 at 11:12, Perry Greenfield wrote: > Travis Oliphant wrote: > > 3) There seem to be too many files to define the array. The mixture of > > Python and C makes trying to understand the source very difficult. I > > thought one of the reasons for the re-write was to simplify the source > > code. > > > I think this reflects the transitional nature of going from mostly > Python > to a hybrid. We agree that the current state is more convoluted than it > ought to be. If NDarray were all C, much of this would ended (though in > some respects, being all in C will make it larger, harder to understand > as well). The original hope was that most of the array setup computation > could be kept in Python but that is what made it slow for small arrays > (but it did allow us to implement it reasonably quickly with big array > performance so that we could start using for our own projects without > a long development effort).
Unfortunately, the simplification in the > rewrite is offset by handling the more complex cases (byte-swapping, > etc.) and extra array indexing capabilities. I took a cursory look at the C API the other day and learned about this capability to process byte-swapped data. I am wondering why this is a good thing to have. Wouldn't it be enough and much easier to drop this feature and instead equip numarray IO routines with the capability to convert between a foreign endian encoding and the host endian encoding? ralf From perry at stsci.edu Mon Jan 17 20:18:39 2005 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jan 17 20:18:39 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <1106001150.28436.300.camel@alpspitze.cs.pdx.edu> Message-ID: Ralf Juengling wrote: > I took a cursory look at the C API the other day and learned about > this capability to process byte-swapped data. I am wondering why > this is a good thing to have. Wouldn't it be enough and much easier > to drop this feature and instead equip numarray IO routines with > the capability to convert between a foreign endian encoding and the > host endian encoding? > Basically this feature was to allow use of memory mapped data that didn't use the native representation of the processor (also related to supporting record arrays).
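Ralf's alternative, converting endianness once at I/O time instead of tracking byte order inside the array type, can be sketched with the standard library (a simplified illustration, not numarray's actual API):

```python
import array
import struct
import sys

# Four 32-bit integers written big-endian, as a foreign-endian file might store them.
raw = struct.pack(">4i", 1, 2, 3, 4)

data = array.array("i", raw)   # reinterpret the raw bytes in host byte order
if sys.byteorder == "little":  # convert once, at read time
    data.byteswap()
print(list(data))  # [1, 2, 3, 4] on either kind of host
```

Perry's point is that this does not cover memory-mapped data, where a one-time conversion on read is not an option because the bytes on disk stay in their foreign order.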
The details are given in a paper from a couple of years ago: http://www.stsci.edu/resources/software_hardware/numarray/papers/pycon2003.pdf Perry From Chris.Barker at noaa.gov Tue Jan 18 09:49:38 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue Jan 18 09:49:38 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41EC2D14.7000203@colorado.edu> References: <41EC2D14.7000203@colorado.edu> Message-ID: <41ED4AD0.6060204@noaa.gov> Hi all, This discussion has brought up a question I have had for a while: Can anyone provide a one-paragraph description of what numarray does that gives it better large-array performance than Numeric? By the way, for what it's worth, what's kept me from switching is the small array performance, and/or the array-creation performance. I don't use very large arrays, but I do use small ones all the time. thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Tue Jan 18 10:28:37 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Jan 18 10:28:37 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: References: Message-ID: <41ED54E5.2050104@ee.byu.edu> Thanks for the comments that have been made. One of my reasons for commenting is to get an understanding of which design issues of Numarray are felt to be important and which can change. There seems to be this idea that small arrays are not worth supporting. I hope this is just due to time constraints and not some fundamental idea that small arrays should never be considered with Numarray. Otherwise, there will always be two different array implementations developing at their own pace. I really want to gauge how willing developers of numarray are to changing things.
Perry Greenfield wrote: >> 1) Are there plans to move the nd array entirely into C? >> -- I would like to see the nd array become purely a c-type. I would >> be willing to help here. I can see that part of the work has been done. >> > I don't know that I would say they are definite, but I think that at > some point we thought that would be necessary. We haven't yet since > doing so makes it harder to change so it would be one of the last > changes to the core that we would want to do. Our current priorities > are towards making all the major libraries and packages available > under it first and then finishing optimization issues (another issue > that has to be tackled soon is handling 64-bit addressing; apparently > the work to make Python sequences use 64-bit addresses is nearing > completion so we want to be able to handle that. I expect we would > want to make sure we find a way of handling that before we turn it > all into C but maybe it is just as easy doing them in the opposite > order. I do not think it would be difficult at this point to move it all to C and then make future changes there (you can always call pure Python code from C). With the structure in place and some experience behind you, now seems like as good a time as any. Especially, because now is a better time for me than any... I like what numarray is doing by not always defaulting to ints with the maybelong type. It is a good idea. > >> 2) Why is the ND array C-structure so large? Why are the dimensions >> and strides array static? Why can't the extra stuff that the fancy >> arrays need be another structure and the numarray C structure just >> extended with a pointer to the extra stuff? > > > When Todd moved NDArray into C, he tried to keep it simple. As > such, it > has no "moving parts." We think making dimensions and strides malloc'ed > rather than static would be fairly easy. Making the "extra stuff" > variable is something we can look at. 
But allocating dimensions and strides when needed is not difficult and it reduces the overhead of the ndarray object. Currently, that overhead seems extreme. I could be over-reacting here, but it just seems like it would have made more sense to expand the array object as little as possible to handle the complexity that you were searching for. It seems like more modifications were needed in the ufunc than in the arrayobject. > > The bottom line is that adding the variability adds complexity and we're > not sure we understand the storage economics of why we would doing it. > Numarray was designed, first and foremost, for large arrays. Small arrays are never going to disappear (Fernando Perez has an excellent example) and there are others. A design where a single pointer not being NULL is all that is needed to distinguish "simple" Numeric-like arrays from "fancy" numarray-like arrays seems like a great way to make sure that both kinds of arrays can be supported in one implementation. > For that case, > the array struct size is irrelevant whereas additional complexity is > not. I guess we would like to see some good practical examples where > the array struct size matters. Do you have code with hundreds of > thousands > of small arrays existing simultaneously? As mentioned before, such code exists especially when arrays become a basic datatype that you use all the time. How much complexity is really generated by offloading the extra struct material to a bigarray structure, thereby only increasing the Numeric array structure by 4 bytes instead of 200+? On another fundamental note, numarray is being sold as a replacement for Numeric. But, then, on closer inspection many things that Numeric does well, numarray is ignoring or not doing very well. I think this presents a certain amount of false advertising to new users, who don't understand the history. Most of them would probably never need the fanciness that numarray provides and would be quite satisfied with Numeric. They just want to know what others are using.
I think it is a disservice to call numarray a replacement for Numeric until it actually is. It should currently be called an "alternative implementation" focused on large arrays. This (unintentional) sleight of hand that has been occurring over the past year has been my biggest complaint with numarray. Making numarray a replacement for Numeric means that it has to support small arrays, object arrays, and ufuncs at least as well as but preferably better than Numeric. It should also be faster than Numeric whenever possible, because Numeric has lots of potential optimizations that have never been applied. If numarray does not do these things, then in my mind it cannot be a replacement for Numeric and should stop being called that on the numpy web site. >> 3) There seem to be too many files to define the array. The mixture of >> Python and C makes trying to understand the source very difficult. I >> thought one of the reasons for the re-write was to simplify the source >> code. >> > I think this reflects the transitional nature of going from mostly Python > to a hybrid. We agree that the current state is more convoluted than it > ought to be. If NDarray were all C, much of this would ended (though in > some respects, being all in C will make it larger, harder to understand > as well). The original hope was that most of the array setup computation > could be kept in Python but that is what made it slow for small arrays > (but it did allow us to implement it reasonably quickly with big array > performance so that we could start using for our own projects without > a long development effort). Unfortunately, the simplification in the > rewrite is offset by handling the more complex cases (byte-swapping, > etc.) and extra array indexing capabilities. I never really understood the "code is too complicated" argument anyway. I was just wondering if there is some support for reducing the number of source code files, or reorganizing them a bit.
>> 4) Object arrays must be supported. This was a bad oversight and an >> important feature of Numeric arrays. >> > The current implementation does support them (though in a different > way, and generally not as efficiently, though Todd is more up on the > details here). What aspect of object arrays are you finding lacking? > C-api? I did not see such support when I looked at it, but given the previous comment, I could easily have missed where that support is provided. I'm mainly following up on Konrad's comment that his Automatic differentiation does not work with Numarray because of the missing support for object arrays. There are other applications for object arrays as well. Most of the support needs to come from the ufunc side. > >> 5) The ufunc code interface needs to continue to be improved. I do see >> that some effort into understanding the old ufunc interface has taken >> place which is a good sign. >> > You are probably referring to work underway to integrate with scipy (I'm > assuming you are looking at the version in CVS). Yes, I'm looking at the CVS version. > >> Again, thanks to the work that has been done. I'm really interested to >> see if some of these modifications can be done as in my mind it will >> help the process of unifying the two camps. >> > I'm glad to see that you are taking a look at it and welcome the > comments and > any offers of help in improving speed. > I would be interested in helping if there is support for really making numarray a real replacement for Numeric, by addressing the concerns that I've outlined. As stated at the beginning, I'm really just looking for how receptive numarray developers would be to the kinds of changes I'm talking about: (1) reducing the size of the array structure, (2) moving the ndarray entirely into C, (3) improving support for object arrays, (4) improving ufunc API support. I care less about array and ufunc C-API names being the same then the underlying capabilities being available. 
Best regards, -Travis Oliphant From paul at pfdubois.com Tue Jan 18 13:57:33 2005 From: paul at pfdubois.com (Paul Dubois) Date: Tue Jan 18 13:57:33 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41ED54E5.2050104@ee.byu.edu> References: <41ED54E5.2050104@ee.byu.edu> Message-ID: <41ED85F4.3010809@pfdubois.com> I haven't followed this discussion in detail but with respect to space for 'descriptors', it would simply be foolish to malloc space for these. The cost is ridiculous. You simply have to decide how big a number of dimensions to allow, make it a clearly findable definition in the sources, and dimension everything that big. Originally when we discussed this we considered 7, since that had been (and for all I know still is) the maximum array dimension in Fortran. But Jim Hugunin needed 11 or something like it for his imaging. I've seen 40 in the numarray sources I think. It seems to me that an application that would care about this space (it being, after all, per array object) would be unusual indeed. If I've misunderstood what you're talking about, never mind. (:-> My advice is to make flexibility secondary to performance. It is always possible to layer on flexibility for those who want it. From rkern at ucsd.edu Tue Jan 18 14:17:30 2005 From: rkern at ucsd.edu (Robert Kern) Date: Tue Jan 18 14:17:30 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41ED54E5.2050104@ee.byu.edu> References: <41ED54E5.2050104@ee.byu.edu> Message-ID: <41ED8AC0.8090207@ucsd.edu> Travis Oliphant wrote: >>> 4) Object arrays must be supported. This was a bad oversight and an >>> important feature of Numeric arrays. >>> >> The current implementation does support them (though in a different >> way, and generally not as efficiently, though Todd is more up on the >> details here). What aspect of object arrays are you finding lacking? >> C-api?
> > > I did not see such support when I looked at it, but given the previous > comment, I could easily have missed where that support is provided. I'm > mainly following up on Konrad's comment that his Automatic > differentiation does not work with Numarray because of the missing > support for object arrays. There are other applications for object > arrays as well. Most of the support needs to come from the ufunc side. It's tucked away in numarray.objects. Unfortunately for Konrad's application, numarray ufuncs don't recognize that it's being passed an object with the special methods defined, and they won't automatically create 0-D object "arrays". 0-D object arrays will work just fine when using operators (x+y works), but not when explicitly calling the ufuncs (add(x,y) does not work). Both methods work fine for 0-D numerical arrays. -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From oliphant at ee.byu.edu Tue Jan 18 14:26:32 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Jan 18 14:26:32 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41ED85F4.3010809@pfdubois.com> References: <41ED54E5.2050104@ee.byu.edu> <41ED85F4.3010809@pfdubois.com> Message-ID: <41ED8CA8.5090407@ee.byu.edu> Paul Dubois wrote: > I haven't followed this discussion in detail but with respect to space > for 'descriptors', it would simply be foolish to malloc space for > these. The cost is ridiculous. You simply have to decide how big a > number of dimensions to allow, make it a clearly findable definition > in the sources, and dimension everything that big. > Thanks for this comment. I can see now that it makes sense as it would presumably speed up small array creation. Why was this not done in the original sources? 
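The trade-off under discussion (static descriptor arrays versus malloc'ed ones) can be mimicked with ctypes; the two layouts below are hypothetical illustrations, not numarray's or Numeric's actual structs:

```python
import ctypes

MAXDIM = 40  # the static dimension limit mentioned in this thread

class StaticDims(ctypes.Structure):
    # dimensions and strides stored inline: "no moving parts", but a big struct
    _fields_ = [("nd", ctypes.c_int),
                ("dimensions", ctypes.c_long * MAXDIM),
                ("strides", ctypes.c_long * MAXDIM)]

class PointerDims(ctypes.Structure):
    # dimensions and strides allocated separately: small struct, extra allocation
    _fields_ = [("nd", ctypes.c_int),
                ("dimensions", ctypes.POINTER(ctypes.c_long)),
                ("strides", ctypes.POINTER(ctypes.c_long))]

# The per-object size difference is what matters with ~100k live small arrays.
print(ctypes.sizeof(StaticDims), ctypes.sizeof(PointerDims))
```

Paul's counterpoint is the other half of the trade: the pointer version pays a malloc and free per array, which can dominate creation time for small arrays.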
> Originally when we discussed this we considered 7, since that had been > (and for all I know still is) the maximum array dimension in Fortran. > But Jim Huginin needed 11 or something like it for his imaging. I've > seen 40 in the numarray sources I think. > > It seems to me that an application that would care about this space > (it being, after all, per array object) would be unusual indeed. > > If I've misunderstood what you're talking about, never mind. (:-> I think you've understood this part of it and have given good advice. > > My advice is to make flexibility secondary to performance. It is > always possible to layer on flexibility for those who want it. I like this attitude. -Travis From perry at stsci.edu Tue Jan 18 17:52:46 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jan 18 17:52:46 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41ED54E5.2050104@ee.byu.edu> Message-ID: Travis Oliphant > > Thanks for the comments that have been made. One of my reasons for > commenting is to get an understanding of which design issues of Numarray > are felt to be important and which can change. There seems to be this > idea that small arrays are not worth supporting. I hope this is just > due to time-constraints and not some fundamental idea that small arrays > should never be considered with Numarray. Otherwise, there will > always be two different array implementations developing at their > own pace. > I wouldn't say that we are "hostile" to small arrays. We do only have limited resources and can't do everything we would like. More on this below though. > I really want to gauge how willing developers of numarray are to > changing things. > Without going into all the details below, I think I can address this point. I suppose it all depends on what you mean by "how willing developers of numarray are to changing things." 
If you mean: are we open to changes to numarray that speed up small arrays (and address other noted shortcomings)? Yes, certainly (so long as they don't hurt the large array issues significantly). If it means: will we drop everything and address all these issues immediately ourselves? No, we have other things to do regarding numarray that have higher priority before we can address these things. I would have a very hard time justifying the effort when there are other things needed by STScI more. We would love it if others could address them sooner though. More on related issues below. > >> 1) Are there plans to move the nd array entirely into C? [...] > I do not think it would be difficult at this point to move it all to C > and then make future changes there (you can always call pure Python code > from C). With the structure in place and some experience behind you, > now seems like as good a time as any. Especially, because now is a > better time for me than any... I like what numarray is doing by not > always defaulting to ints with the maybelong type. It is a good idea. > I hope that is true, but we've found moving things to C a bigger effort than we would like. I'd like to be proved wrong by someone who can tackle it sooner than we can. > > > >> 2) Why is the ND array C-structure so large? Why are the dimensions > >> and strides array static? Why can't the extra stuff that the fancy > >> arrays need be another structure and the numarray C structure just > >> extended with a pointer to the extra stuff? > > > > > > When Todd moved NDArray into C, he tried to keep it simple. As > > such, it > > has no "moving parts." We think making dimensions and strides malloc'ed > > rather than static would be fairly easy. Making the "extra stuff" > > > variable is something we can look at. > > But allocating dimensions and strides when needed is not difficult and > it reduces the overhead of the ndarray object. Currently, that overhead > seems extreme.
I could be over-reacting here, but it just seems like it > would have made more sense to expand the array object as little as > possible to handle the complexity that you were searching for. It seems > like more modifications were needed in the ufunc then in the arrayobject. > I'm not convinced that this is a big issue, but we have no objection to someone making this change. But it falls well below small array performance in priority for us. > > > > The bottom line is that adding the variability adds complexity and we're > > not sure we understand the storage economics of why we would doing it. > > Numarray was designed, first and foremost, for large arrays. > > Small arrays are never going to disappear (Fernando Perez has an > excellent example) and there are others. A design where a single > pointer not being NULL is all that is needed to distinguish "simple" > Numeric-like arrays from "fancy" numarray-like arrays seems like a great > way to make sure that > I won't quarrel with that (but I'm not sure what you are suggesting in the bigger picture). > On another fundamental note, numarray is being sold as a replacement for > Numeric. But, then, on closer inspection many things that Numeric does > well, numarray is ignoring or not doing very well. I think this > presents a certain amount of false advertising to new users, who don't > understand the history. Most of them would probably never need the > fanciness that numarray provides and would be quite satisfied with > Numeric. They just want to know what others are using. I think it is > a disservice to call numarray a replacement for Numeric until it > actually is. It should currently be called an "alternative > implementation" focused on large arrays. This (unintentional) slight of > hand that has been occurring over the past year has been my biggest > complaint with numarray. 
Making numarray a replacement for Numeric > means that it has to support small arrays, object arrays, and ufuncs at > least as well as but preferably better than Numeric. It should also be > faster than Numeric whenever possible, because Numeric has lots of > potential optimizations that have never been applied. If numarray does > not do these things, then in my mind it cannot be a replacement for > Numeric and should stop being called that on the numpy web site. > It distresses me to be accused of false advertising. We were pretty up front at the beginning of the process of writing numarray that the approach we would be taking would likely mean slower small array performance. There were those (like you and Eric) who expressed concern about that, but it wasn't at all clear what the consensus was regarding how much it could change and be acceptable. (I recall at one point when IDL was ported from Fortran to C, which resulted in a factor of 2 overall slowdown in speed. People didn't accuse RSI of providing something that wasn't a replacement for IDL.) The fact was that at the time we started, several thought that backward compatibility wasn't that important. We didn't even try at the beginning to make the C-API the same. At the start, there was no claim that numarray would be an exact replacement for Numeric. (And I didn't hear huge objections at the time on the point and some that actually encouraged a break with how Numeric did things.) Much of the attempts to provide backward compatibility have come well after the first implementations. We have striven to provide the full functionality of what Numeric had as we went to version 1.0. Sure, there are some holes for object arrays. So the issue of whether numarray is a replacement or not seems to be arguing over what the intent of the project was. Paul Dubois wrote the numpy page that makes that reference, and sure, I didn't object to it (But why didn't you at the time?
It's been there a long time, and the goals and direction of numarray have been quite visible for a long time. This wasn't some dark, secret project. Many of the things you are complaining about have been true for some time.) If people want to call numarray an alternative implementation, I'm fine with that. It was a replacement in our case. If we didn't develop it, we likely wouldn't be using Python in the full sense that we are now. Numeric wasn't really an option. At the time, many supported the idea of a reimplementation so it seemed like a good opportunity to add what we needed and do that. Obviously, we misread the importance of small array performance for a significant part of the community. (But I keep saying, if small array performance is really that important, it would seem to me that much bigger wins are available, as Fernando mentioned.) It's been clear for the better part of a year that it would be a long time before there was any sort of unification between the two. That distressed me as I'm sure it did you. So some sort of useful sharing of libraries and packages seemed like the obvious way to go. In more specialized areas, there would be some divergence (e.g., we have dependencies on record arrays that we just can't provide in Numeric). I can no longer justify sinking many more months of work into numarray for issues of no value to STScI (other than the hope that it would convince others to switch, though it isn't clear at all that it would). We need to move towards providing a lot of the tools that are available for Numeric. I can justify that work. The current situation is far from ideal (Paul called it "insane" at scipy if you prefer more colorful language). What we have are two camps that cannot afford to give up the capabilities that are unique to each version. But with most of the C-API compatible, and a way of coding most libraries (except for Ufuncs) to be compatible with both, we certainly can improve the situation.
If you can help remove the biggest obstacle, small array performance, so that we could unify the two I would be thrilled, but most of the effort can't come from us, at least not in the near term (next year). We can help at some level. [...] > I never really understood the "code is too complicated" argument You lost me on this one. You mean the complaint that it was too complicated in Numeric way back? > anyway. I was just wondering if there is some support for reducing the > number of source code files, or reorganizing them a bit. > Yes, I'd say that this has relatively high priority. It would be nice to have feedback and advice on how to do this best. > >> 4) Object arrays must be supported. This was a bad oversight and an > >> important feature of Numeric arrays. > >> > > The current implementation does support them (though in a different > > way, and generally not as efficiently, though Todd is more up on the > > details here). What aspect of object arrays are you finding lacking? > > C-api? > > I did not see such support when I looked at it, but given the previous > comment, I could easily have missed where that support is provided. I'm > mainly following up on Konrad's comment that his Automatic > differentiation does not work with Numarray because of the missing > support for object arrays. There are other applications for object > arrays as well. Most of the support needs to come from the ufunc side. > I think Robert Kern pointed to the issue in a subsequent message. > > > > > >> Again, thanks to the work that has been done. I'm really interested to > >> see if some of these modifications can be done as in my mind it will > >> help the process of unifying the two camps. > >> > > I'm glad to see that you are taking a look at it and welcome the > > comments and > > any offers of help in improving speed. 
> > > I would be interested in helping if there is support for really making > numarray a real replacement for Numeric, by addressing the concerns that > I've outlined. As stated at the beginning, I'm really just looking > for how receptive numarray developers would be to the kinds of changes > I'm talking about: (1) reducing the size of the array structure, (2) > moving the ndarray entirely into C, (3) improving support for object > arrays, (4) improving ufunc API support. > I'm not exactly sure what you mean by 4). If you mean having a compatible api to numeric, that seems like a lot of work since the way ufuncs work in numarray is quite different. But you may mean something else. Perry From perry at stsci.edu Tue Jan 18 17:55:38 2005 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jan 18 17:55:38 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <41ED85F4.3010809@pfdubois.com> Message-ID: Paul Dubois wrote: > > I haven't followed this discussion in detail but with respect to space > for 'descriptors', it would simply be foolish to malloc space for these. > The cost is ridiculous. You simply have to decide how big a number of > dimensions to allow, make it a clearly findable definition in the > sources, and dimension everything that big. > > Originally when we discussed this we considered 7, since that had been > (and for all I know still is) the maximum array dimension in Fortran. > But Jim Hugunin needed 11 or something like it for his imaging. I've > seen 40 in the numarray sources I think. > Actually, 40 came from Numeric. It may have been reduced to 11, but I'm sure it was 40 at one point. Jim even had a comment in the code to the effect that if someone needed more than 40, he wanted to see the problem that needed that. If people think it is too high, I'd be very happy to reduce it. Perry From cookedm at physics.mcmaster.ca Tue Jan 18 19:04:34 2005 From: cookedm at physics.mcmaster.ca (David M.
Cooke)
Date: Tue Jan 18 19:04:34 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED8AC0.8090207@ucsd.edu> (Robert Kern's message of "Tue, 18 Jan 2005 14:16:32 -0800")
References: <41ED54E5.2050104@ee.byu.edu> <41ED8AC0.8090207@ucsd.edu>
Message-ID: 

Robert Kern writes:

> Travis Oliphant wrote:
>
>>>> 4) Object arrays must be supported. This was a bad oversight and an
>>>> important feature of Numeric arrays.
>>>>
>>> The current implementation does support them (though in a different
>>> way, and generally not as efficiently, though Todd is more up on the
>>> details here). What aspect of object arrays are you finding lacking?
>>> C-api?
>>
>> I did not see such support when I looked at it, but given the
>> previous comment, I could easily have missed where that support is
>> provided. I'm mainly following up on Konrad's comment that his
>> Automatic differentiation does not work with Numarray because of the
>> missing support for object arrays. There are other applications for
>> object arrays as well. Most of the support needs to come from the
>> ufunc side.
>
> It's tucked away in numarray.objects. Unfortunately for Konrad's
> application, numarray ufuncs don't recognize that it's being passed an
> object with the special methods defined, and they won't automatically
> create 0-D object "arrays". 0-D object arrays will work just fine when
> using operators (x+y works), but not when explicitly calling the
> ufuncs (add(x,y) does not work). Both methods work fine for 0-D
> numerical arrays.

Are the 0-D object arrays necessary for this? The behaviour that Konrad
needs is this (highly abstracted):

    class A:
        def __add__(self, other):
            return 0.1
        def sin(self):
            return 0.5

Then:

    >>> a = A()
    >>> a + a
    0.10000000000000001
    >>> Numeric.add(a,a)
    0.10000000000000001
    >>> Numeric.sin(a)
    0.5

The Numeric ufuncs, if the argument isn't an array, look for a method of
the right name (here, sin) on the object, and call that.
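That lookup-and-call fallback can be sketched in a few lines of plain
Python. Note this is a toy illustration: `make_ufunc` and its behaviour
are my own hypothetical names, not Numeric's actual internals.

```python
import math

def make_ufunc(name, scalar_func):
    """A toy 'universal function': use the fast scalar routine for
    plain numbers, otherwise fall back to a same-named method on the
    object, the way Numeric's ufuncs do."""
    def ufunc(x):
        if isinstance(x, (int, float)):
            return scalar_func(x)
        method = getattr(x, name, None)   # Numeric-style method lookup
        if method is None:
            raise TypeError("don't know how to compute %s of %r" % (name, x))
        return method()
    return ufunc

sin = make_ufunc("sin", math.sin)

class A:
    def sin(self):
        return 0.5

print(sin(0.0))   # 0.0, via math.sin
print(sin(A()))   # 0.5, via the method-call fallback
```

This is exactly why `DerivVar`-style objects work with Numeric's ufuncs
without Numeric knowing anything about them: the object only has to
supply the right method names.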
You could define a delegate class that does this with something like

    class MathFunctionDelegate:
        def __init__(self, fallback=Numeric):
            self._fallback = fallback
        def add(self, a, b):
            try:
                return a + b
            except TypeError:
                return self._fallback.add(a, b)
        def sin(self, x):
            sin = getattr(x, 'sin', None)
            if sin is None:
                return self._fallback.sin(x)
            else:
                return sin(x)
        ... etc. ...

(This could be a module, too. This just allows parameterisation.)

In ScientificPython, FirstDerivatives.py has a method of the DerivVar
class that looks like this:

    def sin(self):
        v = Numeric.sin(self.value)
        d = Numeric.cos(self.value)
        return DerivVar(v, map(lambda x,f=d: f*x, self.deriv))

Add something like this to the __init__:

    self._mathfuncs = MathFunctionDelegate(Numeric)

and that sin method becomes

    def sin(self):
        v = self._mathfuncs.sin(self.value)
        d = self._mathfuncs.cos(self.value)
        return DerivVar(v, map(lambda x,f=d: f*x, self.deriv))

That's not quite perfect, as the user has to use a mathfuncs object
also; that's why having Numeric or numarray do the delegation
automatically is nice. This would work equally well with numarray (or
the math or cmath modules!) replacing Numeric.

You could get fancy and be polymorphic: choose the right module to use
depending on the type of the argument (Numeric arrays use Numeric,
floats use math, etc.). If this was a module instead, you could have
registration of types. I'll call this module numpy.
Here's a possible (low-level) usage:

    import numpy
    import Numeric, numarray, math, cmath
    from Scientific.Functions import Derivatives
    numpy.register_type(Numeric.arraytype, Numeric)
    numpy.register_type(numarray.NumArray, numarray)
    numpy.register_type(float, math)
    numpy.register_type(complex, cmath)
    numpy.register_type(Derivatives.DerivVar, Derivatives.derivate_math)
    numpy.default_constructor(numarray.array)

    a = numpy.array([1,2,3]) # makes a numarray
    b = Numeric.array([1,2,3]) # Numeric array
    print numpy.sin(a), numpy.sin(b)

Things to consider with this would be:

* how to handle a + b
* where should the registering of types be done? (Probably by the
  packages themselves)
* more complex predicates for registering handlers? (to handle
  subclasses, etc.)

etc.

Ok, I hope that's not too rambling. But the idea is that neither Numeric
nor numarray need to provide the delegation ability.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From konrad.hinsen at laposte.net Wed Jan 19 03:39:04 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 03:39:04 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <41ED54E5.2050104@ee.byu.edu>
References: <41ED54E5.2050104@ee.byu.edu>
Message-ID: 

On 18.01.2005, at 19:26, Travis Oliphant wrote:

> On another fundamental note, numarray is being sold as a replacement
> for Numeric. But, then, on closer inspection many things that Numeric
> does well, numarray is ignoring or not doing very well. I think this
> presents a certain amount of false advertising to new users, who don't
> understand the history. Most of them would probably never need the
> fanciness that

I agree with that. I regularly get questions from people who download my
code and then wonder why it "still" uses NumPy instead of the "newer"
numarray.
The reason is that my code has nothing to gain from numarray, as it uses
many small and few if any very large arrays. I have no problem
explaining that, but the fact that the question arises shows that there
is a wrong perception by many newcomers of the relation between NumPy
and numarray.

> comment, I could easily have missed where that support is provided.
> I'm mainly following up on Konrad's comment that his Automatic
> differentiation does not work with Numarray because of the missing
> support for object arrays. There are other applications for object
> arrays as well. Most of the

While I agree that object arrays are useful, they have nothing to do
with the missing feature that I mentioned recently. That one concerns
only ufuncs. In NumPy, they use a method call when presented with an
object type they cannot handle directly. In numarray, they just produce
an error message in that case.

Returning to object arrays, I have used them occasionally but never in
any of my public code, because there have been lots of minor bugs
concerning them in all versions of NumPy. It would be nice if numarray
could do a better job there.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From konrad.hinsen at laposte.net Wed Jan 19 04:32:01 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 04:32:01 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: 
References: 
Message-ID: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net>

On 19.01.2005, at 02:56, Perry Greenfield wrote:

> It distresses me to be accused of false advertising.
> We were pretty up front at the beginning of the process of writing
> numarray that the

It's not you, or the numarray team in general, that is being accused.
Actually I doubt that any single person is responsible for the current
state of misinformation. Those are the wonders of the OpenSource world.
I saw Travis' post more as a request for clarification than an
accusation against anyone in particular. As you describe very well,
there is a gap between past intents and what has actually happened.

> concern about that), but it wasn't at all clear what the consensus
> was regarding how much it could change and be acceptable. (I recall

It's probably still not clear. Perhaps there is no consensus at all.

> The current situation is far from ideal (Paul called it "insane"
> at scipy if you prefer more colorful language). What we have are
> two camps that cannot afford to give up the capabilities that are
> unique to each version. But with most of the C-API compatible, and
> a way of coding most libraries (except for Ufuncs) to be compatible
> with both, we certainly can improve the situation.

I am not sure that compatibility is really the main issue. In the
typical scientific computing installation, NumPy and numarray are
building blocks. Some people use them without even being aware of them,
indirectly through other libraries.

In a building-block world, two bricks should be either equivalent or be
able to coexist. The original intention was to make NumPy and numarray
equivalent, but this is not what they are at the moment. But they do not
coexist very well either. While it is easy to install both of them,
every library that builds on them uses one or the other (and to make it
worse, it is not always easy to figure out which one is used if both are
available). Sooner or later, anyone who uses multiple libraries that are
array clients is going to have a compatibility issue, which will
probably be hard to understand because both sides' arrays look so very
similar.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From konrad.hinsen at laposte.net Wed Jan 19 04:48:13 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 04:48:13 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: 
References: <41ED54E5.2050104@ee.byu.edu> <41ED8AC0.8090207@ucsd.edu>
Message-ID: <3EF499F4-6A18-11D9-B1A9-000A95AB5F10@laposte.net>

On 19.01.2005, at 04:03, David M. Cooke wrote:

> That's not quite perfect, as the user has to use a mathfuncs object
> also; that's why having Numeric or numarray do the delegation
> automatically is nice.

Exactly. It is an important practical feature of automatic derivatives
that you can use it with nearly any existing mathematical code. If you
have to import the math functions from somewhere else, then you have to
adapt all that code, which in the case of code imported from some other
module means rewriting it.

More importantly, that approach doesn't scale to larger installations.
If two different modules use it to provide generalized math functions,
then the math functions of the two will not be interchangeable.

In fact, it was exactly that kind of missing universality that was the
motivation for the ufunc code in NumPy ("u" for "universal"). Before
NumPy, we had math (for float) and cmath (for complex), but there was no
simple way to write code that would accept either float or complex even
though that is often useful. Ufuncs would work on float, complex, arrays
of either type, and "anything else" through the method call mechanism.

> If this was a module instead, you could have registration of types.
> I'll call this module numpy.
> Here's a possible (low-level) usage:

Yes, a universal module with a registry would be another viable
solution. But the whole community would have to agree on one such module
to make it useful.

> Things to consider with this would be:
> * how to handle a + b

a + b is just operator.add(a, b). The same mechanism would work.

> * where should the registering of types be done? (Probably by the
>   packages themselves)

Probably. The method call approach has an advantage here: no registry is
required.

In fact, if we could start all over again, I would argue for a math
function module to be part of core Python that does nothing else but
converting function calls into method calls. After all, math functions
are just syntactic sugar for what functionally *is* a method call.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From perry at stsci.edu Wed Jan 19 08:44:18 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Jan 19 08:44:18 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net>
References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net>
Message-ID: <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu>

On Jan 19, 2005, at 7:31 AM, konrad.hinsen at laposte.net wrote:

>> The current situation is far from ideal (Paul called it "insane"
>> at scipy if you prefer more colorful language). What we have are
>> two camps that cannot afford to give up the capabilities that are
>> unique to each version. But with most of the C-API compatible, and
>> a way of coding most libraries (except for Ufuncs) to be compatible
>> with both, we certainly can improve the situation.
> I am not sure that compatibility is really the main issue. In the
> typical scientific computing installation, NumPy and numarray are
> building blocks. Some people use them without even being aware of
> them, indirectly through other libraries.
>
> In a building-block world, two bricks should be either equivalent or
> be able to coexist. The original intention was to make NumPy and
> numarray equivalent, but this is not what they are at the

Just to clarify, the intention to make them equivalent was not
originally true (and some encouraged the idea that there be a break with
Numpy compatibility). But that has grown to be a much bigger goal over
time.

> moment. But they do not coexist very well either. While it is easy to
> install both of them, every library that builds on them uses one or
> the other (and to make it worse, it is not always easy to figure out
> which one is used if both are available). Sooner or later, anyone who
> uses multiple libraries that are array clients is going to have a
> compatibility issue, which will probably be hard to understand because
> both sides' arrays look so very similar.

No doubt that supporting both introduces more work, but for the most
part, I think that with the exception of some parts (namely the ufunc
C-api), it should be possible to write a library that supports both with
little conditional code. That does mean not using some features of
numarray, or depending on some of the different behaviors of Numeric
(e.g., scalar coercion rules), so that requires understanding the
subsets to use. And that does cost. But one doesn't need to have two
separate libraries. In such cases I'm hoping there is no need to mix
different flavors of arrays. You either use Numeric arrays consistently
or numarrays consistently. And if the two can be unified, then this will
just be an intermediate solution.
Perry

From perry at stsci.edu Wed Jan 19 08:45:32 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Jan 19 08:45:32 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: 
References: <41ED54E5.2050104@ee.byu.edu>
Message-ID: <807F24EF-6A39-11D9-B8A8-000A95B68E50@stsci.edu>

Konrad Hinsen wrote:

>> comment, I could easily have missed where that support is provided.
>> I'm mainly following up on Konrad's comment that his Automatic
>> differentiation does not work with Numarray because of the missing
>> support for object arrays. There are other applications for object
>> arrays as well. Most of the
>
> While I agree that object arrays are useful, they have nothing to do
> with the missing feature that I mentioned recently. That one concerns
> only ufuncs. In NumPy, they use a method call when presented with an
> object type they cannot handle directly. In numarray, they just
> produce an error message in that case.
>
> Returning to object arrays, I have used them occasionally but never in
> any of my public code, because there have been lots of minor bugs
> concerning them in all versions of NumPy. It would be nice if numarray
> could do a better job there.

This is a good point. In fact, when we started thinking about
implementing object arrays, it looked trickier than it first appeared.
One needs to ensure that all the objects referenced in the arrays have
their reference counts appropriately adjusted with all operations. At
that time it was quite easy to segfault Numeric using object arrays, I'm
guessing for this reason. Perhaps those problems have since been fixed.
I don't recall the exact manipulations that caused the segfaults, but
they were simple operations; and I don't know if the same problems
remain.
Perry

From Chris.Barker at noaa.gov Wed Jan 19 09:37:42 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Wed Jan 19 09:37:42 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <6332FB22-69A4-11D9-B8A8-000A95B68E50@stsci.edu>
References: <41EC2D14.7000203@colorado.edu> <41ED4AD0.6060204@noaa.gov> <6332FB22-69A4-11D9-B8A8-000A95B68E50@stsci.edu>
Message-ID: <41EE990F.8050709@noaa.gov>

Perry Greenfield wrote:

> On Jan 18, 2005, at 12:43 PM, Chris Barker wrote:
>> Can anyone provide a one-paragraph description of what numarray does
>> that gives it better large-array performance than Numeric?
>
> It has two aspects: one is speed, but for us it was more about memory.

Thanks for the summary, I have a better idea of the issues now. It
doesn't look, to my untrained eyes, like any of these are contrary to
small array performance, so I'm hopeful that the grand convergence can
occur.

-Chris
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From klimek at grc.nasa.gov Wed Jan 19 11:26:37 2005
From: klimek at grc.nasa.gov (Bob Klimek)
Date: Wed Jan 19 11:26:37 2005
Subject: [Numpy-discussion] position of objects?
In-Reply-To: <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de>
References: <41E592EB.6090209@grc.nasa.gov> <39D8BD7A-65A4-11D9-B5CA-000D932805AC@embl.de> <41E82498.3070101@grc.nasa.gov> <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de>
Message-ID: <41EEB4D1.6060100@grc.nasa.gov>

Peter Verveer wrote:

> The watershed_ift() is a somewhat unusual implementation of watershed.
> In principle it does the same as a normal watershed, except that it
> does not produce watershed lines. I implemented this one, because with
> the current implementation of binary morphology it is a bit cumbersome
> to implement more common approaches. That will hopefully change in the
> future.
Well, it might turn out to still be useful. From what I'm reading,
watershed from markers can do some interesting things. See the library
below.

> The procedure you show below seems to be based on a normal watershed.
> I am not completely sure how the Image-J implementation works, but one
> way to do that would be to do a watershed on the distance transform of
> that image (actually you would use the negative of the distance
> transform, with the local minima of that as the seeds). You could do
> that with watershed_ift, in this case it would give you two labeled
> objects, that in contrast to your example would however touch each
> other. To do the exact same as below a watershed is needed that also
> gives watershed lines.

I'll give this procedure a try. Even if the labeled objects touch, some
code could perhaps separate the objects by changing the touching pixels
to 0.

> Prompted by your earlier questions about skeletons I had a look at
> what it would take to properly implement skeletons and other
> morphology based algorithms, such as watersheds, and I found that I
> need to rethink and reimplement the basic morphology operations first.
...

Well, improving things is always good, but from what I can see it's not
bad right now. If you are going to be changing things, one minor
suggestion from me would be to make the indices of label() (and sum(),
mean(), ...) and find_objects() the same. For example, in an image
containing two objects, label() returns a list of three: 0, 1, and 2,
where 0 is the background and the two objects are labeled 1 and 2. But
find_objects() returns a list of two (indices in the list being 0 and
1). It's not a big deal, but in a for-loop it gets a little messy. It
also forces me to do things like the following example, which requires
the loop to start at 1 (to skip the background) and run the range to n+1
to capture the second object.
    labeled, n = ND.label(binImage)
    objList = ND.sum(binImage, labeled, range(n+1))
    for i in range(1, len(objList)):
        print 'object %d pixels: %d ' % (i, objList[i])

On a different note, I came across a morphology library which looks very
promising.

http://www.mmorph.com/pymorph/

I've contacted one of the authors of the package (R. Lotufo) and he
indicated that they are thinking about updating it to run under numarray
probably in about 6 months. Perhaps you and them could join forces. The
only potential problem I see is that their code is designed strictly for
2D grayscale and binary images whereas you are trying to keep it general
for any number of dimensions.

Regards,
Bob

From konrad.hinsen at laposte.net Wed Jan 19 13:45:03 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Wed Jan 19 13:45:03 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu>
References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net> <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu>
Message-ID: <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net>

On 19.01.2005, at 17:43, Perry Greenfield wrote:

> Just to clarify, the intention to make them equivalent was not
> originally true (and some encouraged the idea that there be a break
> with Numpy compatibility). But that has grown to be a much bigger goal
> over time.
But not everybody is going to do it, for whatever reasons, if only lack of time or dependencies on exclusive features. So one day, there will be library A that requires NumPy and library B that requires numarray (that day may already have arrived). If I want to use both A and B in my code, I can expect to run into problems and unpleasant debugging sessions. Konrad. From perry at stsci.edu Wed Jan 19 14:10:02 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jan 19 14:10:02 2005 Subject: [Numpy-discussion] Speeding up numarray -- questions on its design In-Reply-To: <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net> References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net> <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu> <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net> Message-ID: I'd like to clarify our position on this a bit in case previous messages have given a wrong or incomplete impression. 1) We don't deny that small array performance is important to many users. We understand that. But it generally isn't important for our projects, and in the list of things to do for numarray, we can't give it high priority this year. We have devoted resources to this issue in the past couple of years (but without sufficient success to persuade many to switch for that reason alone), and it is hard to continue to put much more resources into this not knowing whether it will be enough of an improvement to satisfy those that really need it. 2) This doesn't mean that we don't think it isn't important to add as soon as it can be done. That is, we aren't trying to prevent such improvements from being made. 
3) We hope that there are people out there for whom this is important,
who would like to see a numarray/Numeric unification, have some
experience with the internals of one or the other (or are willing to
learn), and are willing to devote the time to help make numarray faster
(if you can rewrite everything from scratch and satisfy both worlds,
that would make us just as happy :-).

4) We are willing to help in the near term as far as helping explain how
things currently work, where possible improvements can be made, helping
in design discussions, reviewing proposed or actual changes, and doing
the testing and integration of such changes.

5) But the onus of doing the actual implementation can't be on us for
reasons I've already given. But besides those I think it is important
that whoever does this should have a strong stake in the success of this
(i.e., the performance improvements are important for their projects).

Perry

From perry at stsci.edu Thu Jan 20 06:37:36 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Jan 20 06:37:36 2005
Subject: [Numpy-discussion] Speeding up numarray -- questions on its design
In-Reply-To: 
References: <01532740-6A16-11D9-B1A9-000A95AB5F10@laposte.net> <4D98AB88-6A39-11D9-B8A8-000A95B68E50@stsci.edu> <1DD86004-6A63-11D9-B1A9-000A95AB5F10@laposte.net>
Message-ID: 

On a different note, we will update the numarray home page to better
reflect the current situation with regard to Numeric, particularly to
clarify that there is no official consensus regarding it as a
replacement for Numeric (but also to spell out what the differences are
so that people wondering about which to use will have a better idea to
base their choice on, and to give an idea of what our development plans
and priorities are). We're fairly busy at the moment so it may take a
few days for such updates to the web page to happen. I'll post a message
when that happens so that those interested can look at them and provide
comments if they feel they are not accurate.
I'll also contact Paul Dubois about updating the numpy page.

Perry

From jrennie at csail.mit.edu Thu Jan 20 07:25:49 2005
From: jrennie at csail.mit.edu (Jason Rennie)
Date: Thu Jan 20 07:25:49 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: <41D9CA01.9040108@arrowtheory.com>
References: <41D9CA01.9040108@arrowtheory.com>
Message-ID: <20050120152427.GC30466@csail.mit.edu>

I have access to a variety of intel machines running Debian Sarge, and
I'm trying to decide between numarray and Numeric for some experiments
I'm about to run, so I thought I'd try out this benchmark. I need fast
matrix multiplication and element-wise operations. Here are the results
I see:

Celeron/2.8GHz
--------------
Matlab:   0.0475  1.44  5.78
Numeric:  0.0842  1.19  6.28
numarray: 7.62    9.78  Floating point exception

Pentium4/2.8GHz
---------------
Matlab:   0.0143  1.00  3.08
Numeric:  0.0653  1.19  6.26
numarray: 3.46    8.30  Floating point exception

DualXeon/3.06GHz
----------------
Matlab:   0.0102  0.886  2.71
Numeric:  0.0272  10.2   2.46
numarray: 2.23    3.43   Floating point exception

Numarray performance is pitiful. Numeric ain't bad, except for that
matrixmultiply on the Xeon. As luck would have it, our cpu-cycle-servers
are all Xeons, and the main big computations I have to do are matrix
multiplies... Grrr...

All three machines are Debian Sarge with atlas3-sse2 plus all the
python2.3 packages installed. I had to include /usr/lib/atlas/sse2 in my
LD_LIBRARY_PATH. Anyone have any clue why the Xeon would balk at the
Numeric matrixmultiply? Thinking it might be an atlas3-sse2 issue, I
tried atlas-sse:

Xeon/atlas3-sse/Numeric:  0.0269  10.2  2.44
Xeon/atlas3-sse/numarray: 2.24    3.41  2.48

Apparently, there's a bug in the sse2 libraries that numarray is
tripping... Still horrible Numeric/matrixmultiply performance...
Interesting that sse2 doesn't provide a performance boost over sse. I
tried it on another Xeon machine... same bad Numeric/matrixmultiply
performance.
I tried atlas3-base (386 instructions only):

Xeon/atlas3-base/Numeric:  0.0269  10.2  2.60
Xeon/atlas3-base/numarray: 2.23    3.41  2.54

Sheesh! No worse than the libraries w/ sse instructions... But still, no
improvement in the Numeric/matrixmultiply test. Next, refblas3/lapack3:

Xeon/Numeric:  0.0271  3.45  2.72
Xeon/numarray: 2.24    3.42  2.62

Progress! Though, the Numeric/matrixmultiply is still four times slower
than Matlab...

As far as I can tell, I'm out of (Debian Sarge) libraries to try... Any
ideas as to why the Numeric matrixmultiply would be so slow on the Xeon?

Thanks,

Jason

P.S. I had to move the import statements to the top of the file to get
benchmark.py to work. As a sanity check, I tried only importing sys,
time, Numeric, and RandomArray, defining test10. I then called test10().
Same results as above.

From perry at stsci.edu Thu Jan 20 07:34:00 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Jan 20 07:34:00 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: <20050120152427.GC30466@csail.mit.edu>
References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu>
Message-ID: 

Could you at least give enough information to understand what the
benchmark is? (Size of arrays, what the 3 columns are, etc.) What is the
benchmark code? I see references to benchmark.py but there doesn't
appear to be any attachment.

Thanks,
Perry

On Jan 20, 2005, at 10:24 AM, Jason Rennie wrote:

> I have access to a variety of intel machines running Debian Sarge, and
> I'm trying to decide between numarray and Numeric for some experiments
> I'm about to run, so I thought I'd try out this benchmark. I need
> fast matrix multiplication and element-wise operations.
Here are the > results I see: > > Celeron/2.8GHz > -------------- > Matlab: 0.0475 1.44 5.78 > Numeric: 0.0842 1.19 6.28 > numarray: 7.62 9.78 Floating point exception > > Pentium4/2.8GHz > --------------- > Matlab: 0.0143 1.00 3.08 > Numeric: 0.0653 1.19 6.26 > numarray: 3.46 8.30 Floating point exception > > DualXeon/3.06GHz > ---------------- > Matlab: 0.0102 0.886 2.71 > Numeric: 0.0272 10.2 2.46 > numarray: 2.23 3.43 Floating point exception > > Numarray performance is pitiful. Numeric ain't bad, except for that > matrixmultiply on the Xeon. As luck would have it, our > cpu-cycle-servers are all Xeons, and the main big computations I have > to do are matrix multiplies... Grrr... > > All three machines are Debian Sarge with atlas3-sse2 plus all the > python2.3 packages installed. I had to include /usr/lib/atlas/sse2 in > my LD_LIBRARY_PATH. Anyone have any clue why the Xeon would balk at > the Numeric matrixmultiply? Thinking it might be an atlas3-sse2 > issue, I tried atlas-sse: > > Xeon/atlas3-sse/Numeric: 0.0269 10.2 2.44 > Xeon/atlas3-sse/numarray: 2.24 3.41 2.48 > > Apparently, there's a bug in the sse2 libraries that numarry is > tripping... Still horrible Numeric/matrixmultiply > performance... Interesting that sse2 doesn't provide a performance > boost over sse. I tried it on another Xeon machine... same bad > Numeric/matrixmultiply performance. I tried atlas3-base (386 > instructions only): > > Xeon/atlas3-base/Numeric: 0.0269 10.2 2.60 > Xeon/atlas3-base/numarray: 2.23 3.41 2.54 > > Sheesh! No worse than the libraries w/ sse instructions... But > still, no improvement in the Numeric/matrixmultiply test. Next, > refblas3/lapack3: > > Xeon/Numeric: 0.0271 3.45 2.72 > Xeon/numarray: 2.24 3.42 2.62 > > Progress! Though, the Numeric/matrixmultiply is still four times > slower than Matlab... > > As far as I can tell, I'm out of (Debian Sarge) libraries to > try... Any ideas as to why the Numeric matrixmultiply would be so slow > on the Xeon? 
> Thanks,
>
> Jason
>
> P.S. I had to move the import statements to the top of the file to get
> benchmark.py to work. As a sanity check, I tried only importing sys,
> time, Numeric, and RandomArray, defining test10. I then called
> test10(). Same results as above.

From jrennie at csail.mit.edu Thu Jan 20 07:45:45 2005
From: jrennie at csail.mit.edu (Jason Rennie)
Date: Thu Jan 20 07:45:45 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: 
References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu>
Message-ID: <20050120154419.GF30466@csail.mit.edu>

On Thu, Jan 20, 2005 at 10:33:48AM -0500, Perry Greenfield wrote:

> Could you at least give enough information to understand what the
> benchmark is? (Size of arrays, what the 3 columns are, etc.) What is
> the benchmark code? I see references to benchmark.py but there doesn't
> appear to be any attachment.

It's Simon Burton's benchmark.py and bench.m code. Only modification I
made was to move the imports to the top. Matlab code is identical. See
attached for the exact code.

Jason

-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark.py
Type: text/x-python
Size: 1552 bytes
Desc: not available
URL:
-------------- next part --------------
a=randn(1000,1000);
b=randn(1000,1000);
N=100;
t0 = cputime;
for i=1:N
  c = a+b;
end
t = cputime-t0;
t = t/N

N=10;
t0 = cputime;
for i=1:N
  c = a*b;
end
t = cputime-t0;
t = t/N

a=randn(500,500);
N=10;
t0 = cputime;
for i=1:N
  c = eig(a);
end
t = cputime-t0;
t = t/N

From Peter.Chang at nottingham.ac.uk Thu Jan 20 08:14:09 2005
From: Peter.Chang at nottingham.ac.uk (Peter Chang)
Date: Thu Jan 20 08:14:09 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: <20050120154419.GF30466@csail.mit.edu>
Message-ID: 

On Thu, 20 Jan 2005, Jason Rennie wrote:
> It's Simon Burton's benchmark.py and bench.m code. Only modification I
> made was to move the imports to the top. Matlab code is identical. See
> attached for the exact code.

There are errors in benchmark.py, and the Matlab code isn't identical:
1) a missing division by count in test01()
2) a different default value for count in test11()
3) the Matlab code uses normally distributed random numbers whereas the
Numeric/numarray code uses uniformly distributed random numbers.

Peter

This message has been checked for viruses but the contents of an attachment may still contain software viruses, which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.

From ivilata at carabos.com Thu Jan 20 08:14:56 2005
From: ivilata at carabos.com (Ivan Vilata i Balaguer)
Date: Thu Jan 20 08:14:56 2005
Subject: [Numpy-discussion] 'copy' argument in records.array
Message-ID: <20050120161346.GC4102@tardis.terramar.selidor.net>

Hi all! I have seen that records.array() has a boolean 'copy' argument
which indicates whether or not to copy the 'sequence' object when it is
already an array. However, the written documentation does not mention it
anywhere.
Is this an officially supported argument?

Thank you,
Ivan

PS: Would you mind cc:'ing me? I am not subscribed to the list.

import disclaimer
--
Ivan Vilata i Balaguer >qo< http://www.carabos.com/
Cárabos Coop. V. V V Enjoy Data
""
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:

From jmiller at stsci.edu Thu Jan 20 08:27:00 2005
From: jmiller at stsci.edu (Todd Miller)
Date: Thu Jan 20 08:27:00 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: <20050120154419.GF30466@csail.mit.edu>
References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu> <20050120154419.GF30466@csail.mit.edu>
Message-ID: <1106238362.16482.87.camel@halloween.stsci.edu>

On Thu, 2005-01-20 at 10:44, Jason Rennie wrote:
> On Thu, Jan 20, 2005 at 10:33:48AM -0500, Perry Greenfield wrote:
> > Could you at least give enough information to understand what the
> > benchmark is? (Size of arrays, what the 3 columns are, etc). What is
> > the benchmark code? I see references to benchmark.py but there doesn't
> > appear to be any attachment.
>
> It's Simon Burton's benchmark.py and bench.m code. Only modification I
> made was to move the imports to the top. Matlab code is identical. See
> attached for the exact code.
>
> Jason

Sigh. We discussed this some last week and as a result I ported
Numeric's dotblas to numarray. Here's what I get running from numarray
CVS and Numeric-23.7, both built with the latest blas, LAPACK, and ATLAS
I could find, run on a 1.7 GHz P-IV:

t= 0.0697661995888
t= 0.910463786125
t= 9.6143862009
t= 6.44409584999
t= 0.939763069153
t= 9.36037609577

Note that there's a bug in the benchmark (which has already been
reported on this list) which explains the 100x difference in the first
test case.
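The bug Todd mentions here is the one Peter Chang identified: test01() reported the total elapsed time without dividing by the iteration count, so the first figure came out roughly 100x too large. Since the original benchmark.py attachment was scrubbed from the archive, the harness below is a hypothetical reconstruction of the pattern (the names `time_per_call` and `work` are stand-ins, not the original function names):

```python
import time

def time_per_call(fn, count=100):
    """Call fn() `count` times and return the average seconds per call."""
    t0 = time.time()
    for _ in range(count):
        fn()
    elapsed = time.time() - t0
    # Returning `elapsed` here instead of `elapsed / count` is the bug:
    # with count=100 it inflates the reported figure by exactly 100x.
    return elapsed / count

def work():
    # Cheap stand-in workload; the real test added two 1000x1000 arrays.
    return sum(range(1000))

print('per-call: %.6f s' % time_per_call(work))
```

With the division in place, the per-call figures from the different packages become directly comparable.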
Here are the results I get with a corrected version of the benchmark:

numarray + :              0.0632889986038
numarray matrixmultiply : 0.91903450489
numarray eigenvalues :    8.78720998764
Numeric + :               0.0704428911209
Numeric matrixmultiply :  0.912343025208
Numeric eigenvalues :     8.919506073

I think this is a closed issue, at least as far as the numarray/Numeric
comparison goes.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark.py
Type: text/x-python
Size: 2051 bytes
Desc: not available
URL:

From jrennie at csail.mit.edu Thu Jan 20 10:18:40 2005
From: jrennie at csail.mit.edu (Jason Rennie)
Date: Thu Jan 20 10:18:40 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: 
References: <20050120154419.GF30466@csail.mit.edu>
Message-ID: <20050120181752.GG30466@csail.mit.edu>

On Thu, Jan 20, 2005 at 04:12:56PM +0000, Peter Chang wrote:
> 1) a missing division by count in test01()
> 2) a different default value for count in test11()

My bad. Should have used Todd Miller's revised version.

> 3) the Matlab code uses normally distributed random numbers whereas the
> Numeric/numarray code uses uniformly distributed random numbers.

Good point. Revised numbers:

                                        +      mmult   eigen
Xeon/3.06GHz/refblas3/lapack3/numarray: .0224  3.14    2.63
Xeon/3.06GHz/refblas3/lapack3/Numeric:  .0268  3.45    2.73
Xeon/3.06GHz/atlas3-base/numarray:      .0225  3.40    2.52
Xeon/3.06GHz/atlas3-base/Numeric:       .0268  1.04    2.57
Xeon/3.06GHz/atlas3-sse/numarray:       .0224  3.42    2.54
Xeon/3.06GHz/atlas3-sse/Numeric:        .0269  1.05    2.58
Xeon/3.06GHz/atlas3-sse2/numarray:      .0225  3.41    FP Exception
Xeon/3.06GHz/atlas3-sse2/Numeric:       .0269  FP Exc  FP Exception
Celeron/2.8GHz/atlas-base/numarray:     .0814  11.3    6.53
Celeron/2.8GHz/atlas-base/Numeric:      .0918  1.70    6.50
P4/2.8GHz/atlas-base/numarray:          .0262  4.58    2.96
P4/2.8GHz/atlas-base/Numeric:           .0318  1.15    3.00
Xeon/3.06GHz/Matlab:                    .0102  .886    2.70
P4/2.8GHz/Matlab:                       .0143  1.00    3.07

Very comparable (Numeric vs.
numarray) except matrixmultiply, which I guess is explained by the
Debian sarge python2.3-numarray (v1.1.1) not using the dotblas
package/routine, as Todd Miller explained in an earlier post. I'll be
looking forward to the Debian numarray release that includes dotblas.
Looks like it will edge out Numeric across the board (on the Xeon) once
that's in place. For now, I'll be happy with Numeric/atlas3-base. The
Matlab numbers use uniform random matrices. All code attached.

Todd, Peter: sorry for the confusion I propagated.

Jason
-------------- next part --------------
A non-text attachment was scrubbed...
Name: miller-benchmark.py
Type: text/x-python
Size: 2055 bytes
Desc: not available
URL:
-------------- next part --------------
a=rand(1000,1000);
b=rand(1000,1000);
N=100;
t0 = cputime;
for i=1:N
  c = a+b;
end
t = cputime-t0;
t = t/N

N=10;
t0 = cputime;
for i=1:N
  c = a*b;
end
t = cputime-t0;
t = t/N

a=rand(500,500);
N=10;
t0 = cputime;
for i=1:N
  c = eig(a);
end
t = cputime-t0;
t = t/N

From simon at arrowtheory.com Thu Jan 20 16:17:49 2005
From: simon at arrowtheory.com (Simon Burton)
Date: Thu Jan 20 16:17:49 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: <20050120152427.GC30466@csail.mit.edu>
References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu>
Message-ID: <20050121111610.678e5f6b.simon@arrowtheory.com>

On Thu, 20 Jan 2005 10:24:27 -0500
Jason Rennie wrote:

> All three machines are Debian Sarge with atlas3-sse2 plus all the
> python2.3 packages installed. I had to include /usr/lib/atlas/sse2 in
> my LD_LIBRARY_PATH. Anyone have any clue why the Xeon would balk at
> the Numeric matrixmultiply? Thinking it might be an atlas3-sse2
> issue, I tried atlas3-sse:
>
> Xeon/atlas3-sse/Numeric:  0.0269  10.2  2.44
> Xeon/atlas3-sse/numarray: 2.24    3.41  2.48
>
> Apparently, there's a bug in the sse2 libraries that numarray is
> tripping...

Yes, we have the same problem here (Xeons with debian-sarge).
It all works fine on the base atlas but blows up on atlas3-sse2. I also
compiled ATLAS 3.6 for sse2 and the same floating point exception
happens. The next thing would be to write a simple C program that trips
this exception, because I'm not convinced it is ATLAS's fault. Doesn't
Numeric use the same library calls, and if so, why doesn't it also trip
this exception? One other thing I noticed was that atlas3-sse was not
noticeably faster than atlas3-base.

(And i'm sorry about that bad benchmark code..)

bye for now,

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com

From perry at stsci.edu Thu Jan 20 16:57:40 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Jan 20 16:57:40 2005
Subject: [Numpy-discussion] Numeric web page
In-Reply-To: <20050121111610.678e5f6b.simon@arrowtheory.com>
Message-ID: 

Enthought has agreed to host the Numeric home page and will make the
necessary changes to the content. On the other front, when I reviewed
the current numarray home page, it seemed pretty much fine as is. It
could stand some updating, and more detail on our development plans, but
it didn't appear misleading to me. If anyone disagrees let me know what
wording you consider improper or incorrect.

Perry

From yunmao at gmail.com Thu Jan 20 17:32:26 2005
From: yunmao at gmail.com (Yun Mao)
Date: Thu Jan 20 17:32:26 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
Message-ID: <7cffadfa05012017293f833a87@mail.gmail.com>

Hi everyone, I have two questions:
1. When I do v = u[:, :], it seems u and v still point to the same
memory. e.g. When I do v[1,1]=0, u[1,1] will be zeroed out as well.
What's the right way to duplicate an array? Now I have to do
v = dot(u, identity(N)), which is kind of silly.
2. Is there a way to do Matlab style slicing? e.g.
if I have
i = array([0, 2])
x = array([1.1, 2.2, 3.3, 4.4])
I wish y = x(i) would give me [1.1, 3.3].
Now I'm using map, but it gets a little annoying when there are two
dimensions. Any ideas? Thanks!!!

-Y

From simon at arrowtheory.com Thu Jan 20 17:45:42 2005
From: simon at arrowtheory.com (Simon Burton)
Date: Thu Jan 20 17:45:42 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <7cffadfa05012017293f833a87@mail.gmail.com>
References: <7cffadfa05012017293f833a87@mail.gmail.com>
Message-ID: <20050121124417.15da0438.simon@arrowtheory.com>

On Thu, 20 Jan 2005 20:29:26 -0500
Yun Mao wrote:

> Hi everyone,
> I have two questions:
> 1. When I do v = u[:, :], it seems u and v still point to the same
> memory. e.g. When I do v[1,1]=0, u[1,1] will be zeroed out as well.
> What's the right way to duplicate an array? Now I have to do
> v = dot(u, identity(N)), which is kind of silly.

v = na.array(u)

> 2. Is there a way to do Matlab style slicing? e.g. if I have
> i = array([0, 2])
> x = array([1.1, 2.2, 3.3, 4.4])
> I wish y = x(i) would give me [1.1, 3.3]
> Now I'm using map, but it gets a little annoying when there are two
> dimensions. Any ideas?

have a look at the "take" method.

Simon.

From jrennie at csail.mit.edu Thu Jan 20 19:10:39 2005
From: jrennie at csail.mit.edu (Jason Rennie)
Date: Thu Jan 20 19:10:39 2005
Subject: [Numpy-discussion] Matlab/Numeric/numarray benchmarks
In-Reply-To: <20050121111610.678e5f6b.simon@arrowtheory.com>
References: <41D9CA01.9040108@arrowtheory.com> <20050120152427.GC30466@csail.mit.edu> <20050121111610.678e5f6b.simon@arrowtheory.com>
Message-ID: <20050121030940.GA8687@csail.mit.edu>

On Fri, Jan 21, 2005 at 11:16:10AM +1100, Simon Burton wrote:
> The next thing would be to write a simple C program that trips this
> exception, because I'm not convinced it is ATLAS's fault. Doesn't
> Numeric use the same library calls, and if so, why doesn't it also
> trip this exception?
Turns out the FP exception is a known bug in libc. Numeric does trip it
(see my "revised numbers" post). For info on the bug, see, e.g.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294

I also found that it was discussed earlier on this list. Search for
"floating point exception weirdness".

Jason

From stephen.walton at csun.edu Thu Jan 20 19:11:16 2005
From: stephen.walton at csun.edu (Stephen Walton)
Date: Thu Jan 20 19:11:16 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <20050121124417.15da0438.simon@arrowtheory.com>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com>
Message-ID: <41F07291.7070507@csun.edu>

Simon Burton wrote:

> On Thu, 20 Jan 2005 20:29:26 -0500
> Yun Mao wrote:
>
>> What's the right way to duplicate an array?
>
> v = na.array(u)

v=u.copy()

From konrad.hinsen at laposte.net Fri Jan 21 00:50:14 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Fri Jan 21 00:50:14 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <7cffadfa05012017293f833a87@mail.gmail.com>
References: <7cffadfa05012017293f833a87@mail.gmail.com>
Message-ID: <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net>

On 21.01.2005, at 02:29, Yun Mao wrote:

> 1. When I do v = u[:, :], it seems u and v still point to the same
> memory. e.g. When I do v[1,1]=0, u[1,1] will be zeroed out as well.
> What's the right way to duplicate an array? Now I have to do v =
> dot(u, identity(N)), which is kind of silly.

There are several ways to make a copy of an array. My personal
preference is

import copy
v = copy(u)

because this is a general mechanism that works for all Python objects.

> 2. Is there a way to do Matlab style slicing? e.g. if I have
> i = array([0, 2])
> x = array([1.1, 2.2, 3.3, 4.4])
> I wish y = x(i) would give me [1.1, 3.3]
> Now I'm using map, but it gets a little annoying when there are two
> dimensions.
> Any ideas?

y = Numeric.take(x, i)

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
-------------------------------------------------------------------------------

From konrad.hinsen at laposte.net Fri Jan 21 02:20:55 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Fri Jan 21 02:20:55 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net>
Message-ID: <33B69E6E-6B96-11D9-AB0C-000A95999556@laposte.net>

On Jan 21, 2005, at 9:48, konrad.hinsen at laposte.net wrote:

> There are several ways to make a copy of an array. My personal
> preference is
>
> import copy
> v = copy(u)

That's of course

import copy
v = copy.copy(u)

or

from copy import copy
v = copy(u)

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
---------------------------------------------------------------------

From faltet at carabos.com Fri Jan 21 05:16:12 2005
From: faltet at carabos.com (Francesc Altet)
Date: Fri Jan 21 05:16:12 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
Message-ID: <200501211413.51663.faltet@carabos.com>

Hi List,

I would like to make a formal proposal regarding the subject of previous
discussions on this list. This message is a bit long, but I've tried my
best to expose my thoughts as clearly as possible.

Is Numarray a good replacement of Numeric?
==========================================

There has been some debate lately with regard to the convenience of
claiming numarray to be a replacement for Numeric. Perhaps the main
source for this claim has been the home page of the Numeric project [1]:

"""
If you are new to Numerical Python, please use Numarray. The older
module, Numeric, is unsupported. At this writing Numarray is slower for
very small arrays but faster for large ones. Numarray contains
facilities to help you convert older code to use it. Some parts of the
community have not made the switch yet but the Numarray libraries have
been carefully named differently so that Numeric and Numarray can
coexist in one application.
"""

So the paragraph is giving the impression that Numeric was going to be
deprecated. While I recognize that I was among those whom this statement
led to think of numarray as a kind of 'Next Generation of Numeric', it
seems now (from the previous discussions) that this was a somewhat
unfortunate/misleading observation. In fact, Perry Greenfield, one of
the main authors of numarray, will be taking some steps to correct that
observation in the near future [2].

However, I'd like to believe (and with me, quite a few more people for
sure) that the mentioned statement, apart from creating some confusion,
would eventually ease the long-term convergence of both packages. This
would be great not only to unify efforts, but also to allow the
inclusion of Numeric/Numarray in the Python Standard Library, which
would be a Good Thing.

Numarray vs Numeric: Pros and Cons
==================================

It's worth remembering that Numeric was a major breakthrough in
introducing the capability to deal with large (homogeneous) datasets in
Python in a very efficient manner. In my opinion Numarray is, generally
speaking, a very good package as well, with many interesting new
features that Numeric lacks.
Among the main advantages of Numarray vs Numeric I can list the
following (although I may be a bit misled here because of my own use
cases of both libraries):

- Memory-mapped objects: Allow working with on-disk numarray objects as
  if they were in-memory.

- RecArrays: Objects that allow dealing with heterogeneous datasets
  (tables) in an efficient manner. This ought to be very beneficial in
  many fields.

- CharArrays: Allow working with large amounts of fixed and variable
  length strings. I see this implementation as much more powerful than
  Numeric's.

- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2]) and
  x = 2*arange(6), x[ind] results in array([8, 8, 0, 4]).

- New design interface: We should not forget that numarray has been
  designed from the ground up with Python Library integration in mind
  (or at least, this is my impression). So, it should have more chances
  (if there is some hope) to enter the Standard Library than Numeric.

[See [3] for a more accurate description of the differences]

At this point, it is also fair to recognize the important effort that
has been made by the Numarray crew (and others) to create a fairly good
replacement for Numeric: the API is getting closer bit by bit, the
numerix module makes it easier for an application to support both
Numeric and numarray (see [5] for a concrete case of switching between
Numeric and Numarray in SciPy or [6] for matplotlib), the current effort
to support Numarray in SciPy, and last but not least, their good
responsiveness to enhancement requests.

The real problem for Numarray: Object Creation Time
===================================================

On the other hand, the main drawback of Numarray vs Numeric is, in my
opinion, its poor performance regarding object creation. This might look
like a banal thing at first glance, but it is not in many cases.
One example recently reported in this list is:

>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334

So, numarray performs 10 times slower than Numeric not because its
indexing access code is 10 times slower, but mainly because object
creation is between 5 and 10 times slower, and the loop above implies an
object creation on each iteration.

Another use case where object creation time is important can be seen in
[4].

Proposal for making of Numarray a real Numeric 'NG' (Next Generation)
=====================================================================

Provided that the most important reason (IMO) not to consider Numarray a
good replacement for Numeric is object creation time, I would like to
propose a coordinated effort to solve this precise issue.

First of all, it would be nice if the people most experienced with
Numarray (i.e. the Numarray crew) would give this a deep analysis, and
end up with a series of small, self-contained benchmark files that
clearly expose the possible bottlenecks. This may be hard to do, but it
is crucial.
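One way to structure the small, self-contained benchmark files proposed here is sketched below. This is a hypothetical harness, not an existing file: it uses the stdlib timeit module, and the list-based setup is a stand-in so the file runs even where Numeric/numarray are not installed (the commented-out entries show how the two packages from the example above would plug in):

```python
import timeit

def bench(setup, stmt, number=100):
    """Run stmt `number` times under timeit; return seconds per run."""
    return timeit.Timer(stmt, setup).timeit(number=number) / number

# The per-row indexing loop from the example above, which creates one
# array object per iteration and so exposes object-creation cost.
STMT = 'for i in range(len(a)): row = a[i]'

SETUPS = {
    'list (stand-in)': 'a = [[2 * i, 2 * i + 1] for i in range(1000)]',
    # 'Numeric':  'import Numeric; a = Numeric.arange(2000); a.shape = (1000, 2)',
    # 'numarray': 'import numarray; a = numarray.arange(2000); a.shape = (1000, 2)',
}

if __name__ == '__main__':
    for name, setup in sorted(SETUPS.items()):
        print('%-16s %.6f s/run' % (name, bench(setup, STMT)))
```

Each such file isolates one suspected bottleneck, so a contributor can profile and optimize it without touching the rest of the library.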
If after these efforts, there are issues that can't be solved yet, at least the problem would be much more centered, and much more people can think on that (hopefully, the solution may not depend on the intricacies of Numeric/Numarray), so it maybe possible to sent it to the general Python list and hope that some guru would be willing to help us on that. Well, this is my proposal. Uh, sorry for the length of the message. Perhaps you may think that I've smoked too much and maybe you are right. However, I'm so convinced that such a Numeric/Numarray unification is going to be a Very Good Thing that I unrecklessly spend some time making this proposal (and look forward contributing in some way or another if this is going to be done). Cheers, [1] http://www.pfdubois.com/numpy/ [2] http://sourceforge.net/mailarchive/message.php?msg_id=10608642 [3] http://stsdas.stsci.edu/numarray/numarray-1.1.html/node18.html [4] http://sourceforge.net/mailarchive/message.php?msg_id=10582525 [5] http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2299767 [6] http://matplotlib.sourceforge.net/matplotlib.numerix.html -- >qo< Francesc Altet ? ? http://www.carabos.com/ V ?V C?rabos Coop. V. ??Enjoy Data "" -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From aisaac at american.edu Fri Jan 21 06:29:00 2005 From: aisaac at american.edu (Alan G Isaac) Date: Fri Jan 21 06:29:00 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: <33B69E6E-6B96-11D9-AB0C-000A95999556@laposte.net> References: <7cffadfa05012017293f833a87@mail.gmail.com> <41561AA2-6B89-11D9-A2D4-000A95AB5F10@laposte.net><33B69E6E-6B96-11D9-AB0C-000A95999556@laposte.net> Message-ID: On Fri, 21 Jan 2005, konrad.hinsen at laposte.net apparently wrote: > There are several ways to make a copy of an array. Are there any other considerations in making this choice? 
Thank you, Alan Isaac From jrennie at csail.mit.edu Fri Jan 21 07:58:55 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Fri Jan 21 07:58:55 2005 Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG' In-Reply-To: <200501211413.51663.faltet@carabos.com> References: <200501211413.51663.faltet@carabos.com> Message-ID: <20050121155712.GD16747@csail.mit.edu> On Fri, Jan 21, 2005 at 02:13:45PM +0100, Francesc Altet wrote: > """ > If you are new to Numerical Python, please use Numarray. The older module, > Numeric, is unsupported. At this writing Numarray is slower for very small > arrays but faster for large ones. Numarray contains facilities to help you > convert older code to use it. Some parts of the community have not made the > switch yet but the Numarray libraries have been carefully named differently > so that Numeric and Numarray can coexist in one application. > """ Another problem is that Numeric is extremely poorly advertised/marketed. - There is no single keyword for Numeric: it is referred to as "Numerical", "Numeric" and "numpy". Both "Numerical" and "numpy" are also used to refer to numarray. - Numeric does not have a home page of its own. The Sourceforge "Numerical" page lists both numarray and Numeric (which, coincidentally, is referred to as "numpy"). - The #1 & #2 Google results for "numeric python" are the numpy.org page, which is out-of-date, and advertises numarray as being a replacement for Numeric. Plus, what appears to be the main link for Numeric, "Release 22.0" points to a page with both numarray and Numeric releases, numarray first, and Numeric releases named "numpy". Could you try to be more confusing? - None of the top 10 Google links for "numeric python" point to the Sourceforge page. - A "numeric python" search on sourceforge lists 24 projects before the Numerical Python page. 
Jason From klimek at grc.nasa.gov Fri Jan 21 08:22:01 2005 From: klimek at grc.nasa.gov (Bob Klimek) Date: Fri Jan 21 08:22:01 2005 Subject: [Numpy-discussion] position of objects? In-Reply-To: <82AC3A32-67C6-11D9-932A-000D932805AC@embl.de> References: <41E592EB.6090209@grc.nasa.gov> <39D8BD7A-65A4-11D9-B5CA-000D932 805AC@embl.de> <41E82498.3070101@grc.nasa.gov> <82AC3A32-67C6-11D9-932A-00 0D932805AC@embl.de> Message-ID: <41F12C47.3080404@grc.nasa.gov> Peter Verveer wrote: > The procedure you show below seems to be based on a normal watershed. > I am not completely sure how the Image-J implementation works, but one > way to do that would be to do a watershed on the distance transform of > that image (actually you would use the negative of the distance > transform, with the local minima of that as the seeds). You could do > that with watershed_ift, in this case it would give you two labeled > objects, that in contrast to your example would however touch each > other. To do the exact same as below a watershed is needed that also > gives watershed lines. Hi Peter, I thought I'd try your suggestion above but I'm falling short. Where I stall is at local minima (or local maxima if you don't invert the image). Currently there is no local minima (or maxima) function in nd_image is there (or am I missing it)? Bob From perry at stsci.edu Fri Jan 21 08:49:10 2005 From: perry at stsci.edu (Perry Greenfield) Date: Fri Jan 21 08:49:10 2005 Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG' In-Reply-To: <200501211413.51663.faltet@carabos.com> References: <200501211413.51663.faltet@carabos.com> Message-ID: <15F3A864-6BCC-11D9-B8A8-000A95B68E50@stsci.edu> On Jan 21, 2005, at 8:13 AM, Francesc Altet wrote: > Hi List, > > I would like to make a formal proposal regarding with the subject of > previous discussions in that list. This message is a bit long, but I've > tried my best to expose my thoughts as clearly as possible. > [...] 
I think Francesc has summarized things very well and offered up some good ideas for how to proceed in speeding up small array performance. Particularly key is understanding exactly where the the time is going in the processing. We (read Todd, really) has some suspicions about what the bottlenecks are and I'll include his conclusions about these below. I just caution that getting good benchmark information to determine this correctly can be more difficult that it would first seem for the reasons he mentions. But anyway I'm certainly supportive of getting an effort going to address this issue (again, we can give support as we described before, but if it is to be done in the near term, others will have to actually do most of the work). A wiki page sounds like a good idea, and it probably should be hosted on scipy.org. If we see any response to this I'll ask to have one set up. > The real problem for Numarray: Object Creation Time > =================================================== > > On the other hand, the main drawback of Numarray vs Numeric is, in my > opinion, its poor performance regarding object creation. This might > look > like a banal thing at first glance, but it is not in many cases. One > example > recently reported in this list is: > >>>> from timeit import Timer >>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)' >>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100) > 0.12782907485961914 >>>> setup = 'import numarray; a = >>>> numarray.arange(2000);a.shape=(1000,2)' >>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100) > 1.2013700008392334 > > So, numarray performs 10 times slower than Numeric not because its > indexing > access code would be 10 times slower, but mainly due to the fact that > object > creation is between 5 and 10 times slower, and the loop above implies > an > object creation on each iteration. 
> > Other case of use where object creation time is important can be seen > in > [4]. > It probably is perhaps too narrow to focus on just array creation. It likely is the biggest factor but there may be other issues as well. For the above case it's possible that the indexing mechanism itself can be speeded up, and that is likely part of the ratio of speeds being 5 to 10 times slower. Todd's comments: Here's a little input for how someone can continue looking at this. Here's the semi-solid info I have at the moment on ufunc execution time; included within it is a breakdown of some of the costs in the C-API function NA_NewAllFromBuffer() located in newarray.ch. I haven't been working on this; this is where I left off. My timing module, numarray.teacup, may be useful to someone else trying to measure timing; the accuracy of the measurements is questionable either due to bugs or the intrusiveness of the inline code disturbing the processor cache (it does dictionary operations for each timing measurement). I tried to design it so that timing measurements can be nested, with limited success. Nevertheless, as a rough guide that provides microsecond level measurements, I have found it useful. It only works on linux. Build numarray like this: % rm -rf build % python setup.py install --timing --force Then do this to see the cost of the generated output array in an add(): >>> import numarray as na >>> a = na.arange(10) >>> b = a.copy() >>> for i in range(101): ... jnk = na.add(a,b) ... 
>>> import numarray.teacup as tc >>> tc.report() Src/_ufuncmodule.c _cache_exec2 fast count: 101 avg_usec: 4.73 cycles: 0 Src/_ufuncmodule.c _cache_lookup2 broadcast count: 101 avg_usec: 4.46 cycles: 0 Src/_ufuncmodule.c _cache_lookup2 hit or miss count: 101 avg_usec: 27.50 cycles: 6 Src/_ufuncmodule.c _cache_lookup2 hit output count: 100 avg_usec: 25.22 cycles: 5 Src/_ufuncmodule.c _cache_lookup2 internal count: 101 avg_usec: 5.20 cycles: 0 Src/_ufuncmodule.c _cache_lookup2 miss count: 0 avg_usec: nan cycles: 0 Src/_ufuncmodule.c cached_dispatch2 exec count: 101 avg_usec: 13.65 cycles: 1 Src/_ufuncmodule.c cached_dispatch2 lookup count: 101 avg_usec: 37.35 cycles: 9 Src/_ufuncmodule.c cached_dispatch2 overall count: 101 avg_usec: 53.36 cycles: 12 Src/libnumarraymodule.c NewArray __new__ count: 304 avg_usec: 8.12 cycles: 0 Src/libnumarraymodule.c NewArray buffer count: 304 avg_usec: 5.37 cycles: 0 Src/libnumarraymodule.c NewArray misc count: 304 avg_usec: 0.25 cycles: 0 Src/libnumarraymodule.c NewArray type count: 304 avg_usec: 0.27 cycles: 0 Src/libnumarraymodule.c NewArray update count: 304 avg_usec: 1.16 cycles: 0 Src/libteacupmodule.c calibration nested count:999999 avg_usec: -0.00 cycles: 1 Src/libteacupmodule.c calibration top count:999999 avg_usec: -0.00 cycles: 0 I would caution anyone working on this that there are at least three locations in the code (some of it redundancy inserted for the purpose of performance optimization, some of it the consequences of having a class hierarchy) that need to be considered: _ndarraymodule.c, _numarraymodule.c, and newarray.ch. My suspicions: 1. Having an independent buffer/memory object rather than just mallocing the array storage. This is expensive in a lot of ways: it's an extra hidden object and also a Python function call. The ways I've thought of for Eliminating this add complexity and make numarray even more modal than it already is. 2. 
Being implemented as a new-style class; this is an unknown cost and
involves the creation of still other extra objects, like the object()
dictionary, but presumably that has been fairly well optimized already.
Calling up the object hierarchy to build the object (__new__) probably
has additional overheads.

Things to try:

1. Retain a free-list/cache of small objects and re-use them rather than
creating/destroying all the time. Use a constant storage size and fit any
small array into that space. I think this is the killer technique that
would solve the small array problem without kludging up everything else.
Do this first, and only then see if (2) or (3) need to be done.

2. Flatten the class hierarchy more (at least for construction) and
remove any redundancy by refactoring.

3. Build in a malloc/free mode for array storage which bypasses the
memorymodule completely and creates buffer objects when _data is
accessed. Use the OWN_DATA bit in arrayobject.flags.

>> The real problem for Numarray: Object Creation Time
>> ===================================================
>>
>> On the other hand, the main drawback of Numarray vs Numeric is, in my
>> opinion, its poor performance regarding object creation. This might
>> look like a banal thing at first glance, but it is not in many cases.
>> One example recently reported in this list is:
>>
>>>>> from timeit import Timer
>>>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
>> 0.12782907485961914
>>>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
>> 1.2013700008392334
>>
>> So, numarray performs 10 times slower than Numeric not because its
>> indexing access code would be 10 times slower, but mainly due to the
>> fact that object creation is between 5 and 10 times slower, and the
>> loop above implies an object creation on each iteration.
>>
>> Other case of use where object creation time is important can be seen
>> in [4].

One thing to note here is that NumArray() is really used to create
numarray arrays, while array() is used to create Numeric arrays. In
numarray, array() is a Python function which can be optimized to C in its
own right. That alone will not fix the problem though. NumArray itself
must be optimized.

>>
>> Proposal for making of Numarray a real Numeric 'NG' (Next Generation)
>> =====================================================================
>>
>> Provided that the most important reason (IMO) to not consider Numarray
>> to be a good replacement of Numeric is object creation time, I would
>> like to propose a coordinated effort to solve this precise issue.

I think that is one place to optimize, and the best I'm aware of, but
there's a lot of Python in numarray, and a single "." is enough to blow
performance out of the water. I think this problem is easily solvable for
small arrays with a modest effort. There are a lot of others though
(moving the NumArray number protocol to C is one that comes to mind.)

From paul at pfdubois.com Fri Jan 21 10:21:05 2005
From: paul at pfdubois.com (Paul F.
Dubois)
Date: Fri Jan 21 10:21:05 2005
Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'
In-Reply-To: <20050121155712.GD16747@csail.mit.edu>
References: <200501211413.51663.faltet@carabos.com> <20050121155712.GD16747@csail.mit.edu>
Message-ID: <41F1474D.1060801@pfdubois.com>

Somehow 349,000+ accesses to the Numeric home page have occurred despite
the fact that those searching for it did not get a good education at MIT.
I had it on my own site so as to be able to use better tools than SF lets
you use. We're in the middle of changing the ownership of the page due to
my impending retirement, so perhaps that caused some confusion.

The Numeric/numpy/Numerical thing has a long funny history. You had to be
there. It isn't right but it is what it is.

When I was leading the project there was a general feeling that a lot of
the things we wanted to do with Numeric were going to be very hard to do
with the existing implementation, some of which was generated by a code
generator that had gotten lost, and some of which was impenetrable
because it was written by a genius who went to (you guessed it) MIT.

My intention was to replace Numeric with a quickly-written better
implementation. That is why the Numeric page says what it says. I've left
it that way as a reminder of the goal, which I continue to believe is
important. Besides cleaning it up, the other motivation was to back off
the 'performance at all cost' design enough that we would be 'safe'
enough to qualify for the Python distribution and become a standard
module. Numeric was written without many safety checks *on purpose*. Over
time opinions about that philosophy changed.

In fact, the team that wrote numarray did not do what I asked for,
leading to the present confusion but also to, as noted by Altet, some
nice features.
I think it was unfortunate that this happened but as with most open
source projects the person doing the work does the work the way they want
and partly to satisfy their own needs. But they do the work, all credit
to them. I'm not complaining.

There are really only a couple of problems (object arrays and array
creation time) that can be fixed. What is wrong with the array creation
time is obvious. It is written in Python and has too much flexibility,
which costs time to decode. Make a raw C-level creator with less choice
and I bet it will be ok.

Somebody help these guys; this isn't a product, it is an open source
project. Let's get to the promised land and retire
Numeric/Numerical/numpy.

Jason Rennie wrote:
> On Fri, Jan 21, 2005 at 02:13:45PM +0100, Francesc Altet wrote:
>
>>"""
>>If you are new to Numerical Python, please use Numarray. The older module,
>>Numeric, is unsupported. At this writing Numarray is slower for very small
>>arrays but faster for large ones. Numarray contains facilities to help you
>>convert older code to use it. Some parts of the community have not made the
>>switch yet but the Numarray libraries have been carefully named differently
>>so that Numeric and Numarray can coexist in one application.
>>"""
>
> Another problem is that Numeric is extremely poorly advertised/marketed.
>
> - There is no single keyword for Numeric: it is referred to as "Numerical",
> "Numeric" and "numpy". Both "Numerical" and "numpy" are also used to
> refer to numarray.
>
> - Numeric does not have a home page of its own. The Sourceforge
> "Numerical" page lists both numarray and Numeric (which, coincidentally,
> is referred to as "numpy").
>
> - The #1 & #2 Google results for "numeric python" are the numpy.org
> page, which is out-of-date, and advertises numarray as being a replacement
> for Numeric.
Plus, what appears to be the main link for Numeric, > "Release 22.0" points to a page with both numarray and Numeric > releases, numarray first, and Numeric releases named "numpy". Could > you try to be more confusing? > > - None of the top 10 Google links for "numeric python" point to the > Sourceforge page. > > - A "numeric python" search on sourceforge lists 24 projects before the > Numerical Python page. > > Jason > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From jrennie at csail.mit.edu Fri Jan 21 11:32:58 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Fri Jan 21 11:32:58 2005 Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG' In-Reply-To: <41F1474D.1060801@pfdubois.com> References: <200501211413.51663.faltet@carabos.com> <20050121155712.GD16747@csail.mit.edu> <41F1474D.1060801@pfdubois.com> Message-ID: <20050121193009.GA19684@csail.mit.edu> On Fri, Jan 21, 2005 at 10:17:49AM -0800, Paul F. Dubois wrote: > Somehow 349,000+ accesses to the Numeric home page have occurred despite > the fact that those searching for it did not get a good education at > MIT. I had it on my own site so as to be able to use better tools than > SF lets you do. We're in the middle of changing the ownership of the > page due to my impending retirement, so perhaps that caused some confusion. Sorry if I came off "big headed." Was just trying to point out that, to an outsider, it's, well, confusing. 
And, there are some very simple things that could be done to alleviate the confusion: a Numeric (not Numerical, not numarray) home page, consistent nomenclature. I'm not asking you to take your page down. I agree, it's a cool snapshot of history. And, I agree with you: it's often easier to host a home page on your own server. I've gone through hell trying to host the ifile home page on Savannah. I just think there needs to be a "Numeric" page somewhere with updated release information, pointers to current documentation, short explanation of how Numeric is different from numarray and maybe a short synopsis of the history behind the project(s). :) I'm also not trying to belittle the great achievements that are Numeric and numarray. I think these are both awesome packages. I sure can't claim to have written anything as useful. Jason From cookedm at physics.mcmaster.ca Fri Jan 21 13:25:55 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Jan 21 13:25:55 2005 Subject: [Numpy-discussion] Speeding up Numeric Message-ID: Following up on the discussion of small-array performance, I decided to profile Numeric and numarray. I've been playing with Pyrex, implementing a vector class (for doubles) that runs as fast as I can make it, and it's beating Numeric and numarray by a good bit, so I figured those two were doing something. I can't say much about numarray right now (most of the runtime is in complicated Python code, with weird callbacks to and from C code), but for Numeric it was easy to find a problem. First, profiling Python extensions is not easy. I've played with using the -pg flag to the GCC C compiler, but I couldn't find a way to profile the extension itself (even with Python built with -pg). So I see about three ways to profile an extension: 1) Wrap each function in the extension in a pair of calls to something that keeps track of time in the function. This is what the numarray.teacup module does. 
This is unsatisfying, intrusive, labour-intensive, and it does manually
what -pg does automatically.

2) Compile the extension into the Python executable (where both the
extension and Python have been compiled with the -pg flag).
Unfortunately, as far as I can tell, this is not possible with distutils.
If you have another build framework, however, it's not that hard to do.
I've had some success with this approach with other extensions.

3) Use oprofile (http://oprofile.sourceforge.net/), which runs on Linux
on an x86 processor. This is the approach that I've used here. oprofile
is a combination of a kernel module for Linux, a daemon for collecting
sample data, and several tools to analyse the samples. It periodically
polls the processor performance counters, and records which code is
running. It's a system-level profiler: it profiles _everything_ that's
running on the system. One obstacle is that it does require root access.

Short tutorial on using oprofile
--------------------------------

Using oprofile on Debian with a 2.6.x kernel is easy (replace sudo with
your favourite "be-root" method):

$ sudo apt-get install oprofile
$ sudo modprobe oprofile   # make sure the kernel module is installed

Now, start the oprofile daemon. On my machine, --event=default just looks
at retired CPU instructions. Read the oprofile documentation for more
info.

$ sudo opcontrol --event=default --start
$ (run code)
$ opcontrol --dump    # dump the statistics to disk
                      # this is the only thing a non-root user can do
$ sudo opcontrol --stop    # we don't need the daemon anymore

To do another profile run, you need to reset the data:

$ sudo opcontrol --reset

You should be able to do the above when the daemon is running, but the
daemon crashes on me when I do that; I find I end up having to also clear
the old statistics manually:

$ sudo rm -rf /var/lib/oprofile/samples

Once you've collected samples, you can analyse the results.
Here, I'll be looking at adding two 1000-element arrays with the
following code:

import Numeric as NA
a = NA.arange(1000.0)
b = NA.arange(1000.0)
for i in xrange(10000000):
    a + b

This takes 1m14s on my machine (an AMD64 3200+ running in 64-bit mode).
So, where I have (run code) above, I'd do

$ python x.py

Once I've collected the samples, I can analyse them. Note that samples
are collected on a per-application basis; if you've got other processes
using python, they'll be included. You could copy the python binary to
another location, and use that for the analysis; then your program would
be the only one picked up by the following analysis.

$ opstack -t 1 /usr/bin/python
  self %  child %  image name  symbol name
132281  10.5031  0  0  python         (no symbols)
-------------------------------------------------------------------------------
704810  55.9618  0  0  _numpy.so      check_array
-------------------------------------------------------------------------------
309384  24.5650  0  0  umath.so       DOUBLE_add
-------------------------------------------------------------------------------
112974   8.9701  0  0  libc-2.3.2.so  (no symbols)
-------------------------------------------------------------------------------

The -t 1 limits the display to those routines taking more than 1% of the
runtime. 10% for python, and 10% for the C-library probably aren't so bad
(I'm thinking that's calls to malloc() and friends). However, the big
problem is that only 25% of the time is actually doing useful work.
What's check_array doing? We can delve deeper:

$ mkdir profile-Numeric
$ opannotate --source -o profile-Numeric \
    --search-dir= /usr/bin/python

Now profile-Numeric//Src has annotated copies of the source for the
Numeric extension modules.
The definition of check_array is in ufuncobject.c, which gives us

   386  0.0286 :void check_array(PyArrayObject *ap) { /* check_array total: 7046 */
               :    double *data;
               :    int i, n;
               :
   371  0.0275 :    if (ap->descr->type_num == PyArray_DOUBLE || ap->descr->type_num == PyArray_CDOUBLE) {
    89  0.0066 :        data = (double *)ap->data;
   758  0.0563 :        n = PyArray_Size((PyObject *)ap);
    46  0.0034 :        if (ap->descr->type_num == PyArray_CDOUBLE) n *= 2;
               :
700662 51.9988 :        for(i=0; i<n; i++) {
               :            ...

The loop range-checks each element of the result (against HUGE_VAL,
unless HAVE_FINITE is defined), so essentially all of check_array's time
goes to scanning the output. Note that by default it doesn't even catch
overflow to infinity:

>>> import Numeric
>>> a = Numeric.array([1e308])
>>> a + a
array([ inf])

It will catch NaN's though. It's obvious when you realize that HUGE_VAL
is inf; inf <= inf is true. With HAVE_FINITE defined, I get, for the same
a array,

>>> a + a
OverflowError: math range error

The Numeric documentation has this to say about the check_return
parameter to PyUFunc_FromFuncAndData (which determines whether
check_array is called):

    Usually best to set to 1. If this is non-zero then returned matrices
    will be cleaned up so that rank-0 arrays will be returned as python
    scalars. Also, if non-zero, then any math error that sets the errno
    global variable will cause an appropriate Python exception to be
    raised.

Note that the rank-0 array -> scalar conversion happens regardless.
check_return doesn't affect this at all.

Removing check_array
--------------------

Commenting out the body of check_array in ufuncobject.c speeds up the
script above by *two* times. On my iBook (a G4 800), it speeds it up by
*four* times.

Using timeit.py:

$ python /usr/lib/python2.3/timeit.py \
  -s 'import Numeric as NA; N=1e4; a=NA.arange(N)/N; b=NA.arange(N)/N' \
  'a+b'

I get for various values of N:

   N     Stock     Numeric       numarray   numarray
         Numeric   without       recent     1.1.1
         23.7      check_array   CVS
  1e1        1.13       1.08        10.5        9.9
  1e2        1.73       1.35        10.8       10.6
  1e3        6.91       3.2         13.3       12.9
  1e4       83.3       42.5         52.8       52.3
  1e5     4890       4420         4520       4510
  1e6    52700      47400        47100      47000
  1e7   532000     473000       476000     474000

Numeric is as fast as numarray now!
:-) The 10x change in per-element speed between 1e4 and 1e5 is due to
cache effects.

   N     Stock     Numeric       numarray   numarray
         Numeric   without       recent     1.1.1
         23.7      check_array   CVS
  1e1        1.31       1.28        8.49       7.64
  1e2        5.86       5.44       14.4       12.1
  1e3       51.8       48          70.4       54.5
  1e4      542        502         643        508
  1e5     7480       6880        7430       6850
  1e6    77500      70700       82700      69100
  1e7   775000     710000      860000     694000

Numeric is faster than numarray from CVS, but there seems to be a
regression. Without check_array, Numeric is almost as fast as numarray
1.1.1.

Remarks
-------

- I'd rather have my speed than checks for NaN's. Have that in a separate
  function (I'm willing to write one), or do numarray-style processor
  flag checks (tougher).

- General plea: *please*, *please*, when releasing a library for which
  speed is a selling point, profile it first!

- Doing the same profiling on numarray finds 15% of the time actually
  adding, 65% somewhere in python, and 15% in libc.

- I'm still fiddling. Using the three-argument form of Numeric.add (so no
  memory allocation needs to be done), 64% of the time is now spent
  adding; I think that could be better. The Pyrex vector class I'm
  working on does 80% adding (with memory allocation for the result).

Hope this helps :-)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From paul at pfdubois.com Fri Jan 21 13:43:53 2005
From: paul at pfdubois.com (Paul F. Dubois)
Date: Fri Jan 21 13:43:53 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: 
References: 
Message-ID: <41F176E5.4080806@pfdubois.com>

As I mentioned in a recent post, the original Numeric philosophy was damn
the torpedoes, full steam ahead; performance first, safety second. There
was a deliberate decision not to handle NaN, inf, or anything like it,
and if you overflowed you should die.
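The separate, after-the-fact checking function David volunteers above
could look roughly like this. This is a hypothetical sketch in plain
Python; the name check_valid and the exceptions it raises are
illustrative only, and a real version would be a C function scanning the
array's buffer directly:

```python
import math

def check_valid(values):
    """Scan a result once, after the ufunc has run at full speed,
    instead of paying for a per-element check in every inner loop."""
    for i, x in enumerate(values):
        if math.isnan(x):
            raise ValueError("NaN in result at index %d" % i)
        if math.isinf(x):
            raise OverflowError("math range error at index %d" % i)
    return values

check_valid([1.0, 2.0, 3.0])   # clean results pass through unchanged
```

Callers who want the old full-speed behavior simply never call it;
everyone else pays one extra pass over the output, and only when they ask
for it.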
Unfortunately the original high-performance community did not remain the only community, and there were lots of complaints about dying, it being considered unPythonic to die. High-performance people don't mind dying so much; to me it just means my algorithm is wrong and I need to hear about it. But for a calculator for a biologist or student of linear algebra that's not the right answer. While I haven't researched the source history, I doubt that checking was in there before. And putting it in should have been the result of a long public discussion first. Perhaps there was one and I missed it, since I haven't paid too much attention the last few years (my present project involves Numeric only a tiny amount). When I retire maybe I will write a high-performance one. But it will be in C++ and half the people will hate me. (:-> Speed kills. But some of us have to drive fast. From Chris.Barker at noaa.gov Fri Jan 21 14:25:04 2005 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jan 21 14:25:04 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F176E5.4080806@pfdubois.com> References: <41F176E5.4080806@pfdubois.com> Message-ID: <41F1718D.4010601@noaa.gov> Paul F. Dubois wrote: > But for a calculator for a biologist or student of linear > algebra that's not the right answer. I'm neither, but my needs are similar, and I really want Numeric to NOT stop, and just keep going with a NaN, inf or -inf. IMHO, that is the only intelligent way for an array package to behave, and you get a performance boost as well! However, my understanding is that the IEEE special values are not universally supported by compilers and/or math libraries, so this poses a problem. I wonder if it's still an issue or if all the compilers of interest have the requisite support? > When I retire maybe I will write a high-performance one. But it will be > in C++ and half the people will hate me. (:-> Maybe a python wrapper around Blitz++ ? I would love that! 
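The IEEE special-value semantics being discussed can be illustrated with
plain Python floats (a quick demonstration, not Numeric code; it assumes
the platform's doubles follow IEEE 754, which is true on the platforms
mentioned in this thread):

```python
inf = float("inf")
nan = float("nan")

assert 1.0 / inf == 0.0            # inf propagates sensibly
assert (inf - inf) != (inf - inf)  # inf - inf is NaN, and NaN != NaN
assert not (nan <= inf)            # ordered comparisons with NaN are False
assert inf <= inf                  # so an "x <= HUGE_VAL" test misses inf
assert -inf < 1e308 < inf          # finite values order between the infinities
```

If the compilers and libm in use honour these semantics, the array
package gets NaN/inf propagation for free in its inner loops.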
David, If you could apply your skills to profiling array creation in
numarray, you'd be doing a great service!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
(206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Fri Jan 21 14:57:48 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Fri Jan 21 14:57:48 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <20050121124417.15da0438.simon@arrowtheory.com>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com>
Message-ID: <41F1796E.7040303@noaa.gov>

Simon Burton wrote:
> On Thu, 20 Jan 2005 20:29:26 -0500
> Yun Mao wrote:
>>2. Is there a way to do Matlab style slicing? e.g. if I have
>>   i = array([0, 2])
>>   x = array([1.1, 2.2, 3.3, 4.4])
>>   I wish y = x(i) would give me [1.1, 3.3]
>
> have a look at the "take" method.

or use numarray:

>>> import numarray as N
>>> i = N.array([0, 2])
>>> x = N.array([1.1, 2.2, 3.3, 4.4])
>>> y = x[i]
>>> y
array([ 1.1,  3.3])
>>>

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT
(206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker at noaa.gov

From oliphant at ee.byu.edu Fri Jan 21 15:37:49 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Jan 21 15:37:49 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <41F1796E.7040303@noaa.gov>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov>
Message-ID: <41F191D7.9040906@ee.byu.edu>

Chris Barker wrote:
>
> Simon Burton wrote:
>
>> On Thu, 20 Jan 2005 20:29:26 -0500
>> Yun Mao wrote:
>
>>> 2. Is there a way to do Matlab style slicing? e.g.
if I have i =
>>> array([0, 2])
>>> x = array([1.1, 2.2, 3.3, 4.4])
>>> I wish y = x(i) would give me [1.1, 3.3]
>>
>> have a look at the "take" method.
>
> or use numarray:
> >>> import numarray as N
> >>> i = N.array([0, 2])
> >>> x = N.array([1.1, 2.2, 3.3, 4.4])
> >>> y = x[i]
> >>> y
> array([ 1.1,  3.3])
> >>>
>

Or use scipy:

from scipy import *
alter_numeric()
i = array([0,2])
x = array([1.1,2.2,3.3,4.4])
y = x[i]
print y
[1.1 3.3]

From oliphant at ee.byu.edu Fri Jan 21 15:38:49 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Jan 21 15:38:49 2005
Subject: [Fwd: Re: [Numpy-discussion] Speeding up Numeric]
Message-ID: <41F19247.4010100@ee.byu.edu>

Original message sent to wrong address. Forwarding to correct address.

From oliphant at ee.byu.edu Fri Jan 21 18:32:59 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri, 21 Jan 2005 16:32:59 -0700
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: 
References: 
Message-ID: <41F1912B.1050605@ee.byu.edu>

David M. Cooke wrote:

>$ opstack -t 1 /usr/bin/python
>  self %  child %  image name  symbol name
>132281  10.5031  0  0  python         (no symbols)
>-------------------------------------------------------------------------------
>704810  55.9618  0  0  _numpy.so      check_array
>-------------------------------------------------------------------------------
>309384  24.5650  0  0  umath.so       DOUBLE_add
>-------------------------------------------------------------------------------
>112974   8.9701  0  0  libc-2.3.2.so  (no symbols)
>-------------------------------------------------------------------------------
>
>The -t 1 limits the display to those routines taking more than 1% of
>the runtime. 10% for python, and 10% for the C-library probably aren't
>so bad (I'm thinking that's calls to malloc() and friends).
However,
>the big problem is that only 25% of the time is actually doing useful
>work. What's check_array doing? We can delve deeper:
>

Thanks for this *excellent* tutorial and analysis. I would love to see
more of this.

I've never liked the check_array concept. In fact, if you use SciPy,
which makes several changes to things that Numeric does, check_array
never runs, because self->check_return is 0 for all SciPy Ufuncs (e.g.
those in fastumath). So, perhaps some of these basic benchmarks should be
run by SciPy users. I forgot about this little speed-up that SciPy users
enjoy all the time. SciPy has also added inf and NaN.

I would be very willing to remove check_array from all Numeric ufuncs and
create a separate interface for checking results, after the fact. What is
the attitude of the community?

-Travis

From oliphant at ee.byu.edu Fri Jan 21 15:53:56 2005
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Jan 21 15:53:56 2005
Subject: [Numpy-discussion] updating Numeric
Message-ID: <41F195A0.6080207@ee.byu.edu>

I would like to try to extensively update Numeric. Because of the
changes, I would like to call it Numeric3.0

The goal is to create something that is an easier link between current
Numeric and future numarray. It will be largely based on the current
Numeric code base (same overall structure).

I'm looking for feedback and criticism so don't hesitate to tell me what
you think (positive or negative). I'll put on my thickest skin :-)

Changes: (in priority order)
================

1) Add support for multidimensional indexing
2) Change coercion rule to "scalars don't count rule"
3) Add support for bool, long long (__int64), unsigned long long, long
   double, complex long double, and unicode character arrays
4) Move to a new-style c-type (i.e.
support for array objects being sub-classable, which was added in
   Python 2.2)
5) Add support for relevant parts of numarray's C-API (to allow code
   written for numarray that just uses basic homogeneous data to work
   with Numeric)
6) Add full IEEE support
7) Add warning system much like numarray for reporting errors (eliminate
   check_array and friends).
8) Optimize the ufuncs wherever possible: I can see a couple of
   possibilities but would be interested in any help here.
9) Other things I'm forgetting....

Why is it not numarray? I think there is a need for the tight code base
of Numeric to continue with incremental improvements that keep the same
concept of an array of homogeneous data types.

If sub-classing in C works well, then perhaps someday, numarray could
subclass Numeric for an even improved link between the two.

I have not given up on numarray and Numeric merging someday, I just think
we need an update to Numeric that moves Numeric forward in directions
that numarray has paved without sacrificing the things that Numeric
already does well.

-Travis O.

From juenglin at cs.pdx.edu Fri Jan 21 17:11:06 2005
From: juenglin at cs.pdx.edu (Ralf Juengling)
Date: Fri Jan 21 17:11:06 2005
Subject: [Numpy-discussion] updating Numeric
In-Reply-To: <41F195A0.6080207@ee.byu.edu>
References: <41F195A0.6080207@ee.byu.edu>
Message-ID: <1106356067.18238.31.camel@alpspitze.cs.pdx.edu>

On Fri, 2005-01-21 at 15:52, Travis Oliphant wrote:
> I would like to try to extensively update Numeric. Because of the
> changes, I would like to call it Numeric3.0
>
> The goal is to create something that is an easier link between current
> Numeric and future numarray. It will be largely based on the current
> Numeric code base (same overall structure).
>
> I'm looking for feedback and criticism so don't hesitate to tell me
> what you think (positive or negative). I'll put on my thickest skin :-)

So yeah, I think you should drive to Baltimore, visit these guys at
STSCI and ...
get really drunk together! ralf > > Changes: (in priority order) > ================ > > 1) Add support for multidimensional indexing > 2) Change coercion rule to "scalars don't count rule" > 3) Add support for bool, long long (__int64), unsigned long long, long > double, complex long double, and unicode character arrays > 4) Move to a new-style c-type (i.e. support for array objects being > sub-classable which was added in Python 2.2) > 5) Add support for relevant parts of numarray's C-API (to allow code > written for numarray that just uses basic homogeneous data to work with > Numeric) > 6) Add full IEEE support > 7) Add warning system much like numarray for reporting errors (eliminate > check_array and friends). > 8) optimize the ufuncs where-ever possible: I can see a couple of > possibilities but would be interested in any help here. > 9) other things I'm forgetting.... > > Why it is not numarray? I think there is a need for the tight > code-base of Numeric to continue with incremental improvements that > keeps the same concept of an array of homogeneous data types. > > If sub-classing in c works well, then perhaps someday, numarray could > subclass Numeric for an even improved link between the two. > > I have not given up on the numarray and Numeric merging someday, I just > think we need an update to Numeric that moves Numeric forward in > directions that numarray has paved without sacrificing the things that > Numeric already does well. > > -Travis O. > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. 
> Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From stevech1097 at yahoo.com.au Fri Jan 21 19:15:14 2005 From: stevech1097 at yahoo.com.au (Steve Chaplin) Date: Fri Jan 21 19:15:14 2005 Subject: [Numpy-discussion] Re: Speeding up Numeric In-Reply-To: <20050121234055.8D6BC16054@sc8-sf-spam2.sourceforge.net> References: <20050121234055.8D6BC16054@sc8-sf-spam2.sourceforge.net> Message-ID: <1106363633.2903.7.camel@f1> On Fri, 2005-01-21 at 15:38 -0800, David M. Cooke wrote: > > First, profiling Python extensions is not easy. I've played with using Which version of Python are you using? In 2.4 the profile module has been updated so it can now profile C extension functions. Steve From konrad.hinsen at laposte.net Sat Jan 22 00:11:00 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Sat Jan 22 00:11:00 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F176E5.4080806@pfdubois.com> References: <41F176E5.4080806@pfdubois.com> Message-ID: On 21.01.2005, at 22:40, Paul F. Dubois wrote: > As I mentioned in a recent post, the original Numeric philosphy was > damn the torpedos full steam ahead; performance first, safety second. > There was a deliberate decision not to handle NaN, inf, or anything > like it, and if you overflowed you should die. There was also at some time the idea of having a "safe" version of the code (added checks as a compile-time option) and an installer that compiled both with different module names such that one could ultimately choose at run time which one to use. I really liked that idea, but it never got implemented (there was a "safe" version of ufunc in some versions but it was no different from the standard one). Konrad. 
From Fernando.Perez at colorado.edu Sat Jan 22 00:36:45 2005
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Sat Jan 22 00:36:45 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: 
References: <41F176E5.4080806@pfdubois.com>
Message-ID: <41F21022.8090904@colorado.edu>

konrad.hinsen at laposte.net wrote:
> On 21.01.2005, at 22:40, Paul F. Dubois wrote:
>
>>As I mentioned in a recent post, the original Numeric philosophy was
>>damn the torpedoes, full steam ahead; performance first, safety second.
>>There was a deliberate decision not to handle NaN, inf, or anything
>>like it, and if you overflowed you should die.
>
> There was also at some time the idea of having a "safe" version of the
> code (added checks as a compile-time option) and an installer that
> compiled both with different module names such that one could
> ultimately choose at run time which one to use. I really liked that
> idea, but it never got implemented (there was a "safe" version of ufunc
> in some versions but it was no different from the standard one).

I really like this approach. The Blitz++ library offers something
similar: if you build your code with -DBZ_DEBUG, it activates a ton of
safety checks which are normally off. The performance plummets, but it
can save you days of debugging, since most pointer/memory errors are
flagged instantly where they occur, instead of causing the usual
inscrutable segfaults. F2PY also has the debug_capi flag which provides
similar services, and I've found it to be tremendously useful on a few
occasions.

It would be great to be able to simply use:

#import Numeric
import Numeric_safe as Numeric

to have a safe, debug-enabled version active. The availability of such a
version would also free the developers from having to cater too much to
safety considerations in the default version.
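The compile-both, choose-at-run-time idea Konrad describes could be
sketched like this. This is a hypothetical illustration; the module names
Numeric and Numeric_safe are stand-ins, and both builds would have to
export the same API:

```python
import os

def select_flavor(fast_name, safe_name, env_var="NUMERIC_FLAVOR"):
    """Import the checked or unchecked build of the same extension.

    Both builds export identical interfaces under different module
    names, so the rest of the program never has to change."""
    if os.environ.get(env_var) == "safe":
        name = safe_name
    else:
        name = fast_name
    return __import__(name)

# Numeric = select_flavor("Numeric", "Numeric_safe")
```

Setting NUMERIC_FLAVOR=safe in the environment would then turn on the
checked build everywhere, with no source changes.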
The default could be advertised as 'fast car, no brakes, learn to jump out before going off a cliff', with the _debug 'family minivan' being there if safety were needed. Cheers, f From cookedm at physics.mcmaster.ca Sat Jan 22 08:18:15 2005 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Sat Jan 22 08:18:15 2005 Subject: [Numpy-discussion] Re: Speeding up Numeric In-Reply-To: <1106363633.2903.7.camel@f1> References: <20050121234055.8D6BC16054@sc8-sf-spam2.sourceforge.net> <1106363633.2903.7.camel@f1> Message-ID: <20050122161725.GA6968@arbutus.physics.mcmaster.ca> On Sat, Jan 22, 2005 at 11:13:52AM +0800, Steve Chaplin wrote: > On Fri, 2005-01-21 at 15:38 -0800, David M. Cooke wrote: > > > First, profiling Python extensions is not easy. I've played with using > Which version of Python are you using? In 2.4 the profile module has > been updated so it can now profile C extension functions. > > Steve Sorry, I should have mentioned: I'm using 2.3 (it's still the default under Debian). But the Python profiler can only keep track of how much time is spent in the C extension; it can't determine the hotspots in the extension itself. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From perry at stsci.edu Sat Jan 22 08:26:03 2005 From: perry at stsci.edu (Perry Greenfield) Date: Sat Jan 22 08:26:03 2005 Subject: [Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG' In-Reply-To: <41F1474D.1060801@pfdubois.com> Message-ID: Paul Dubois wrote: > My intention was to replace Numeric with a quickly-written better > implementation. That is why the Numeric page says what it says. I've > left it that way as a reminder of the goal, which I continue to believe > is important. 
Besides cleaning it up, the other motivation was to back > off the 'performance at all cost' design enough that we would be 'safe' > enough to qualify for the Python distribution and become a standard > module. Numeric was written without many safety checks *on purpose*. > Over time opinions about that philosophy changed. > > In fact, the team that wrote numarray did not do what I asked for, > leading to the present confusion but also to, as noted by Altet, some > nice features. I think it was unfortunate that this happened but as with > most open source projects the person doing the work does the work the > way they want and partly to satisfy their own needs. But they do the > work, all credit to them. I'm not complaining. > Just to clarify, if we could have found a way of doing a basic version and layering on the extra features we would have. To take a specific example, if you want to be able to access data in a buffer that is spaced by intervals not a multiple of the data element size (which is what recarray needs to do) then one needs to handle non-aligned data in the basic version (otherwise segfaults will happen). I couldn't see a way of handling such arrays without the mechanism for handling non-aligned data being built into the basic mechanism (if someone else can, I'd like to see it). So it's a good design approach, but sometimes things can't work that way. Perry From perry at stsci.edu Sat Jan 22 08:41:16 2005 From: perry at stsci.edu (Perry Greenfield) Date: Sat Jan 22 08:41:16 2005 Subject: [Numpy-discussion] updating Numeric In-Reply-To: <41F195A0.6080207@ee.byu.edu> Message-ID: Travis Oliphant wrote: I'm not as negative on this as one might guess. While not as nice as having one package that satisfies both extremes (small vs large), making the two have identical behavior for the capabilities they share would make life much easier for users and those that want to write libraries that support both.
(Though note that I'm not sure it would be possible to subclass one to get the other as I noted in my reply to Paul, though I would like to be proved wrong). So I consider this a positive step myself. Perry > I would like to try to extensively update Numeric. Because of the > changes, I would like to call it Numeric3.0 > > The goal is to create something that is an easier link between current > Numeric and future numarray. It will be largely based on the current > Numeric code base (same overall structure). > > I'm looking for feedback and criticism so don't hesitate to tell me what > you think (positive or negative). I'll put on my thickest skin :-) > > Changes: (in priority order) > ================ > > 1) Add support for multidimensional indexing > 2) Change coercion rule to "scalars don't count rule" > 3) Add support for bool, long long (__int64), unsigned long long, long > double, complex long double, and unicode character arrays > 4) Move to a new-style c-type (i.e. support for array objects being > sub-classable which was added in Python 2.2) > 5) Add support for relevant parts of numarray's C-API (to allow code > written for numarray that just uses basic homogeneous data to work with > Numeric) > 6) Add full IEEE support > 7) Add warning system much like numarray for reporting errors (eliminate > check_array and friends). > 8) optimize the ufuncs where-ever possible: I can see a couple of > possibilities but would be interested in any help here. > 9) other things I'm forgetting.... > > Why it is not numarray? I think there is a need for the tight > code-base of Numeric to continue with incremental improvements that > keeps the same concept of an array of homogeneous data types. > > If sub-classing in c works well, then perhaps someday, numarray could > subclass Numeric for an even improved link between the two. 
> > I have not given up on the numarray and Numeric merging someday, I just > think we need an update to Numeric that moves Numeric forward in > directions that numarray has paved without sacrificing the things that > Numeric already does well. > > -Travis O. > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From tim.hochberg at cox.net Sat Jan 22 11:17:10 2005 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sat Jan 22 11:17:10 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F21022.8090904@colorado.edu> References: <41F176E5.4080806@pfdubois.com> <41F21022.8090904@colorado.edu> Message-ID: <41F2A681.1080106@cox.net> Fernando Perez wrote: > konrad.hinsen at laposte.net wrote: > >> On 21.01.2005, at 22:40, Paul F. Dubois wrote: >> >> >>> As I mentioned in a recent post, the original Numeric philosophy was >>> damn the torpedoes full steam ahead; performance first, safety >>> second. There was a deliberate decision not to handle NaN, inf, or >>> anything like it, and if you overflowed you should die. >> >> >> >> There was also at some time the idea of having a "safe" version of >> the code (added checks as a compile-time option) and an installer >> that compiled both with different module names such that one could >> ultimately choose at run time which one to use. I really liked that >> idea, but it never got implemented (there was a "safe" version of >> ufunc in some versions but it was no different from the standard one). > > > I really like this approach.
The Blitz++ library offers something > similar: if you build your code with -DBZ_DEBUG, it activates a ton of > safety checks which are normally off. The performance plummets, but > it can save you days of debugging, since most pointer/memory errors > are flagged instantly where they occur, instead of causing the usual > inscrutable segfaults. > > F2PY also has the debug_capi flag which provides similar services, and > I've found it to be tremendously useful on a few occasions. > > It would be great to be able to simply use: > > #import Numeric > import Numeric_safe as Numeric > > to have a safe, debug-enabled version active. The availability of > such a version would also free the developers from having to cater too > much to safety considerations in the default version. The default > could be advertised as 'fast car, no brakes, learn to jump out before > going off a cliff', with the _debug 'family minivan' being there if > safety were needed. Before embarking on such a project, I'd urge that some careful profiling be done. My gut feeling is that, for most functions, no significant speedup would result from omitting the range checks that prevent segfaults. In the cases where removal of such checks would help in C (item access, very small arrays, etc) their execution time will be dwarfed by Python's overhead. Without care, one runs the risk of ending up with a minivan with no brakes; something no one needs. 'take' is a likely exception since it involves range checking at every element. But if only a few functions get changed, two versions of the library is a bad idea; two versions of the functions in question would be better. Particularly since, in my experience, speed is simply not critical for most of my numeric code; for the 5% or so where speed is critical I could use the unsafe functions and be more careful. This would be easier if the few differing functions were part of the main library. I don't have a good feel for the NaN/Inf checking.
If it's possible to hoist the checking to outside all of the loops, then the above arguments probably apply here as well. If not, this might be a candidate for an 'unsafe' library. That seems more reasonable to me as I'm much more tolerant of NaNs than segfaults. That's my two cents anyway. -tim From jrennie at csail.mit.edu Sat Jan 22 15:09:10 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Sat Jan 22 15:09:10 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F21022.8090904@colorado.edu> References: <41F176E5.4080806@pfdubois.com> <41F21022.8090904@colorado.edu> Message-ID: <20050122230836.GA7023@csail.mit.edu> On Sat, Jan 22, 2005 at 01:34:42AM -0700, Fernando Perez wrote: > the _debug 'family minivan' being there if safety were needed. Don't know about that analogy :) Minivans are more likely to roll-over than your typical car. Maybe the Volvo S80 would be better: http://www.safercar.gov/NCAP/Cars/3285.html But, I have to say that I love your "import Numeric_safe as Numeric" idea :) Jason From jrennie at csail.mit.edu Sat Jan 22 15:12:06 2005 From: jrennie at csail.mit.edu (Jason Rennie) Date: Sat Jan 22 15:12:06 2005 Subject: [Fwd: Re: [Numpy-discussion] Speeding up Numeric] In-Reply-To: <41F19247.4010100@ee.byu.edu> References: <41F19247.4010100@ee.byu.edu> Message-ID: <20050122231116.GA7058@csail.mit.edu> On Fri, Jan 21, 2005 at 04:37:43PM -0700, Travis Oliphant wrote: > I would be very willing to remove check_array from all Numeric ufuncs > and create a separate interface for checking results, after the fact. > What is the attitude of the community. Sounds great to me. 
Jason From Norbert.Nemec.list at gmx.de Sun Jan 23 03:21:14 2005 From: Norbert.Nemec.list at gmx.de (Norbert Nemec) Date: Sun Jan 23 03:21:14 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <41F2A681.1080106@cox.net> References: <41F21022.8090904@colorado.edu> <41F2A681.1080106@cox.net> Message-ID: <200501231220.33894.Norbert.Nemec.list@gmx.de> Am Samstag, 22. Januar 2005 20:16 schrieb Tim Hochberg: > I don't have a good feel for the NaN/Inf checking. If it's possible to > hoist the checking to outside all of the loops, then the above arguments > probably apply here as well. If not, this might be a candidate for an > 'unsafe' library. That seems more reasonable to me as I'm much more > tolerant of NaNs than segfaults. Why do we need NaN/Inf checking anyway? The whole point of IEEE754 is to give operations on NaNs and Infs clearly defined results, eliminating many unnecessary checks. I think, it was one of the fundamental flaws in the design of Python not to include IEEE754 from the very beginning. Leaving the details of floating point handling completely to the implementation calls for incompatibilities and results in a situation where you can only work by trial and error instead of relying on some defined standard. -- _________________________________________Norbert Nemec Bernhardstr. 2 ... D-93053 Regensburg Tel: 0941 - 2009638 ... 
Mobil: 0179 - 7475199 eMail: From konrad.hinsen at laposte.net Mon Jan 24 01:50:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Jan 24 01:50:28 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <200501231220.33894.Norbert.Nemec.list@gmx.de> References: <41F21022.8090904@colorado.edu> <41F2A681.1080106@cox.net> <200501231220.33894.Norbert.Nemec.list@gmx.de> Message-ID: <779B39A0-6DED-11D9-B904-000A95999556@laposte.net> On Jan 23, 2005, at 12:20, Norbert Nemec wrote: > I think, it was one of the fundamental flaws in the design of Python > not to > include IEEE754 from the very beginning. Leaving the details of > floating Python is written in C, so it couldn't make more promises about floats than the C standard does, at least not without an enormous effort. Not even the floating-point units of modern CPUs respect IEEE in all respects. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr --------------------------------------------------------------------- From jeanluc.menut at free.fr Mon Jan 24 04:49:07 2005 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Mon Jan 24 04:49:07 2005 Subject: [Numpy-discussion] Bug or feature ? Message-ID: <41F4EE4C.7000204@free.fr> Hello, I'm using numarray, and I'm wondering about the behaviour of some functions which seems odd to me: 1) When I want to sum all the elements of an array, I can do sum(array) or array.sum() : With the first method >>> a array([[1, 2], [3, 4]]) >>> numarray.sum(a) array([4, 6]) It seems to be impossible to sum all the elements with sum(array).
With the second, >>> a.sum() 10L In this case, it's ok but : >>> b array(1) >>> b.sum() 0L I know that it's stupid to sum only one element but it forces me to plan for that case in my program (and I don't want to check for each case) if I don't want an error. 2) When I want to replace one element in an array : >>> a[0,0]=1.1 >>> a array([[1, 2], [3, 4]]) I know that I created the array as an array of integers, but at least I can expect an error message. Does anybody know whether these behaviours are bugs or not? If not, I would be glad to have some explanation of why this behaviour was chosen. Thanks for your help, Jean-Luc From jmiller at stsci.edu Mon Jan 24 05:06:06 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 05:06:06 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <41F4EE4C.7000204@free.fr> References: <41F4EE4C.7000204@free.fr> Message-ID: <1106571937.5350.14.camel@jaytmiller.comcast.net> On Mon, 2005-01-24 at 13:47 +0100, Jean-Luc Menut wrote: > array([[1, 2], > [3, 4]]) > > >>> numarray.sum(a) > array([4, 6]) > > It seems to be impossible to sum all the elements with sum(array). > > With the second, > > >>> a.sum() > 10L > > In this case, it's ok but : > > >>> b > array(1) > > >>> b.sum() > 0L This is clearly a bug. In general, rank-0 and zero-length array handling is buggy in numarray because my awareness of these issues came after the fact and these issues were not priorities in Perry's initial design, which was after all to process huge astronomical images memory mapped and across platforms. We've considered ripping out rank-0 altogether several times. > I know that it's stupid to sum only one element but it forces me to > plan for that case in my program (and I don't want to check for each case) > if I don't want an error.
> > > 2) When I want to replace one element in an array : > > >>> a[0,0]=1.1 > >>> a > array([[1, 2], > [3, 4]]) > > I know that I created the array as an array of integer, but at least I > can expect an error message. numarray and Numeric, consciously, don't work that way. So, no, you can't expect that. > Anyboby knows if these behaviour are bugs or not ? rank-0 yes, silent truncation no. Regards, Todd From Norbert.Nemec.list at gmx.de Mon Jan 24 06:51:12 2005 From: Norbert.Nemec.list at gmx.de (Norbert Nemec) Date: Mon Jan 24 06:51:12 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <1106571937.5350.14.camel@jaytmiller.comcast.net> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> Message-ID: <200501241550.27643.Norbert.Nemec.list@gmx.de> Am Montag, 24. Januar 2005 14:05 schrieb Todd Miller: > We've considered ripping out rank-0 altogether several times. Glad you didn't do it! I use them all the time to simplify my code and avoid special case checking. So far, I haven't hit any serious bug, so I assume the code is mostly working for rank-0 arrays by not. Let's hunt down the remaining bugs... :-) -- _________________________________________Norbert Nemec Bernhardstr. 2 ... D-93053 Regensburg Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199 eMail: From jeanluc.menut at free.fr Mon Jan 24 07:20:18 2005 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Mon Jan 24 07:20:18 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <1106571937.5350.14.camel@jaytmiller.comcast.net> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> Message-ID: <41F51209.6010202@free.fr> Hello, > numarray and Numeric, consciously, don't work that way. So, no, you > can't expect that. > rank-0 yes, silent truncation no. I'm sorry, I don't understand very well what is a silent truncation. 
When I write : >> a = array([[1, 2],[3, 4]]) >> >>> a[0,0]=1.1 I cannot expect to have a = array([[1.1, 2],[3, 4]]) ? How can I solve this problem ? is it possible to force an array to be an array of float ? Thanks, Jean-Luc From faltet at carabos.com Mon Jan 24 07:23:17 2005 From: faltet at carabos.com (Francesc Altet) Date: Mon Jan 24 07:23:17 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <200501241550.27643.Norbert.Nemec.list@gmx.de> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> <200501241550.27643.Norbert.Nemec.list@gmx.de> Message-ID: <200501241622.45659.faltet@carabos.com> On Monday 24 January 2005 15:50, Norbert Nemec wrote: > On Monday, 24 January 2005 14:05, Todd Miller wrote: > > We've considered ripping out rank-0 altogether several times. > > Glad you didn't do it! I use them all the time to simplify my code and > avoid special case checking. So far, I haven't hit any serious bug, so I > assume the code is mostly working for rank-0 arrays by now. Let's hunt down > the remaining bugs... :-) This is also my case. rank-0 appears naturally in my code, and, as far as I can tell, they work quite well. -- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" From konrad.hinsen at laposte.net Mon Jan 24 07:30:20 2005 From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net) Date: Mon Jan 24 07:30:20 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <41F51209.6010202@free.fr> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> <41F51209.6010202@free.fr> Message-ID: On Jan 24, 2005, at 16:19, Jean-Luc Menut wrote: > When I write : >>> a = array([[1, 2],[3, 4]]) >>> >>> a[0,0]=1.1 > > I cannot expect to have a = array([[1.1, 2],[3, 4]]) ? No. All elements in an array are of the same type. > How can I solve this problem ? is it possible to force an array to be > an > array of float ?
Yes, at creation time: from Numeric import array, Float a = array([[1, 2], [3, 4]], Float) will create a float array. All the integers are then converted to floats. Konrad. -- --------------------------------------------------------------------- Konrad Hinsen Laboratoire Léon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: hinsen at llb.saclay.cea.fr --------------------------------------------------------------------- From jmiller at stsci.edu Mon Jan 24 07:31:24 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 07:31:24 2005 Subject: [Numpy-discussion] Bug or feature ? In-Reply-To: <41F5112D.2060305@free.fr> References: <41F4EE4C.7000204@free.fr> <1106571937.5350.14.camel@jaytmiller.comcast.net> <41F5112D.2060305@free.fr> Message-ID: <1106580666.5350.95.camel@jaytmiller.comcast.net> On Mon, 2005-01-24 at 16:15 +0100, Jean-Luc Menut wrote: > Hello, > > > numarray and Numeric, consciously, don't work that way. So, no, you > > can't expect that. > > > rank-0 yes, silent truncation no. > > > I'm sorry, I don't understand very well what is a silent truncation. By silent truncation, I mean the fact that 1.1 is floored to 1 without an exception or warning. > When I write : > >> a = array([[1, 2],[3, 4]]) > >> >>> a[0,0]=1.1 > > I cannot expect to have a = array([[1.1, 2],[3, 4]]) ? This works and the result will be a Float64 array containing 1.1, 2.0, ... > How can I solve this problem ? is it possible to force an array to be an > array of float ? Sure. For numarray or Numeric: a = array([[1, 2],[3, 4]], typecode=Float64) From jmiller at stsci.edu Mon Jan 24 08:04:15 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 08:04:15 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: References: Message-ID: <1106582644.5350.102.camel@jaytmiller.comcast.net> > 3) Use oprofile (http://oprofile.sourceforge.net/), which runs on > Linux on an x86 processor.
This is the approach that I've used here. > oprofile is a combination of a kernel module for Linux, a daemon > for collecting sample data, and several tools to analyse the > samples. It periodically polls the processor performance counters, > and records which code is running. It's a system-level profiler: it > profiles _everything_ that's running on the system. One obstacle is > that does require root access. This looked fantastic so I tried it over the weekend. On Fedora Core 3, I couldn't get any information about numarray runtime (in the shared libraries), only Python. Ditto with Numeric, although from your post you apparently got great results including information on Numeric .so's. I'm curious: has anyone else tried this for numarray (or Numeric) on Fedora Core 3? Does anyone have a working profile script? > Numeric is faster (with the check_array() feature deletion) > than numarray from CVS, but there seems to be regression. (in numarray performance) Don't take this the wrong way, but how confident are you that the speed differences are real? (With my own benchmarking numbers, there is always too much fuzz to split hairs like this.) > Without check_array, Numeric is almost as fast as as > numarray 1.1.1. > > Remarks > ------- > > - I'd rather have my speed than checks for NaN's. Have that in a > separate function (I'm willing to write one), or do numarray-style > processor flag checks (tougher). > > - General plea: *please*, *please*, when releasing a library for which > speed is a selling point, profile it first! > > - doing the same profiling on numarray finds 15% of the time actually > adding, 65% somewhere in python, and 15% in libc. Part of this is because the numarray number protocol is still in Python. > - I'm still fiddling. Using the three-argument form of Numeric.add (so add(a,b) and add(a,b,c) are what I've focused on for profiling numarray until the number protocol is moved to C. 
I've held off doing that because the numarray number protocol is complicated by subclassing issues I'm not sure are fully resolved. Regards, Todd From jh at oobleck.astro.cornell.edu Mon Jan 24 08:55:41 2005 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Mon Jan 24 08:55:41 2005 Subject: [Numpy-discussion] updating Numeric In-Reply-To: <20050122041343.8BAAF89CDB@sc8-sf-spam1.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20050122041343.8BAAF89CDB@sc8-sf-spam1.sourceforge.net> Message-ID: <200501241654.j0OGsTXU024858@oobleck.astro.cornell.edu> Hi Travis, Perry may not see a problem with updating numeric, but I do, at least in the short term. It takes resources and time away from the issue at hand, which I believe you yourself raised. Certainly they are your resources to allocate as you wish (this being open source), but please consider the following. This whole dispute arises from a single question to which we don't yet know the answer: WHY is numarray slower than numeric for small arrays? Why not just do the work to answer the question? Then we can have a discussion on the direction we want to go in that is based on actual information. At this point we know each others' opinions and why we hold them, but we don't have the key information to make any decisions. Let's say, hypothetically, that there is a way to fix numarray to be fast in small arrays without breaking it in other important ways. Would it really be worth perpetuating numeric rather than working on unifying the packages and the community? If the problems are not fundamental to our respective values, they can be fixed, and we can move forward with the great volume of work that's needed to make this a viable data analysis environment for the masses. If the problems *are* fundamental to our values, we can work on compromise solutions knowing *what* we are actually working around, and unifying elsewhere when possible. 
We wouldn't waste any more years wringing our hands about unification. Perry (and others) have summarized a few ideas on why numarray might be slower. One of those ideas, namely the use of new-style classes in numarray, might mean that all the code-bumming in the world won't fix the problem. Numeric fans would likely say that speed is worth the inconvenience of old-style coding. Numarray fans wouldn't, and that would be that: we'd be in the realm of co-existence solutions. We'd move forward implementing them, documenting them, etc. I would think it worthwhile to check at least that possibility before proceeding with other work. To check it, someone who is familiar with numeric needs to convert it (or an appropriate subset of it) to new-style classes, and profile both versions. If the array creation time jumps by the factor we've seen, we need look no further. Rather, we'd need to focus the discussion on whether to continue using new-style classes in numarray. Assuming the two packages *do* have irreconcilable differences, then a coexistence approach makes a lot of sense, and a numeric update would be an important first step. We've talked about two approaches: user chooses a package at runtime, or things start "light" and the software detects cases where "heavy" features get used and augments those arrays on the fly. We know what needs to be done to figure out where the problems lie. Why not work on that next, and put this argument to bed once and for all? --jh-- From ryorke at telkomsa.net Mon Jan 24 10:54:16 2005 From: ryorke at telkomsa.net (Rory Yorke) Date: Mon Jan 24 10:54:16 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <1106582644.5350.102.camel@jaytmiller.comcast.net> References: <1106582644.5350.102.camel@jaytmiller.comcast.net> Message-ID: <87k6q264w1.fsf@localhost.localdomain> Todd Miller writes: > This looked fantastic so I tried it over the weekend. 
On Fedora Core 3, > I couldn't get any information about numarray runtime (in the shared > libraries), only Python. Ditto with Numeric, although from your post > you apparently got great results including information on Numeric .so's. > I'm curious: has anyone else tried this for numarray (or Numeric) on > Fedora Core 3? Does anyone have a working profile script? I think you need to have --separate=lib when invoking opcontrol. (See later for an example.) Some comments on oprofile: - I think the oprofile tools (opcontrol, opreport etc.) are separate from the oprofile module, which is part of the kernel. I installed oprofile-0.8.1 from source, and it works with my standard Ubuntu kernel. It is easy to install it in a non-standard location ($HOME/usr on my system). - I think opstack is part of oprofile 0.8 (or maybe 0.8.1) -- it wasn't in the 0.7.1 package available for Ubuntu. Also, to actually get callgraphs (from opstack), you need a patched kernel; see here: http://oprofile.sf.net/patches/ - I think you probably *shouldn't* compile with -pg if you use oprofile, but you should use -g. To profile shared libraries, I also tried the following: - sprof. Some sort of dark art glibc tool. I couldn't get this to work with dlopen()'ed libraries (in which class I believe Python C extensions fall). - qprof (http://www.hpl.hp.com/research/linux/qprof/). Almost worked, but I couldn't get it to identify symbols in shared libraries. Their page has a list of other profilers. I also tried the Python 2.4 profile module; it does support C-extension functions as advertised, but it seemed to miss object instantiation calls (_numarray._numarray's instantiation, in this case). 
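Rory's point that the newer Python profiler records C-extension calls is easy to try without any special setup. The sketch below uses the modern stdlib profiler (cProfile, the C descendant of the 2.4 profile module he mentions); the function names are illustrative only, and sorted() stands in for any C-level call such as a ufunc.

```python
# Minimal sketch: the stdlib profiler records built-in/C functions by name,
# alongside Python-level frames.  Illustration only -- not Rory's script.
import cProfile
import io
import pstats

def work():
    data = list(range(1000))
    for _ in range(200):
        sorted(data)        # a C-level call; it appears in the report

prof = cProfile.Profile()
prof.enable()
work()
prof.disable()

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
# 'report' now lists both work() and the built-in sorted() with call counts.
```

The limitation Rory notes still applies: what this sees is call counts and time per C function, not hotspots *inside* a C function, which is where a system-level profiler like oprofile comes in.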
Sample oprofile usage on my Ubuntu box: rory at foo:~/hack/numarray/profile $ cat longadd.py import numarray as na a = na.arange(1000.0) b = na.arange(1000.0) for i in xrange(1000000): a + b rory at foo:~/hack/numarray/profile $ sudo modprobe oprofile Password: rory at foo:~/hack/numarray/profile $ sudo ~/usr/bin/opcontrol --start --separate=lib Using 2.6+ OProfile kernel interface. Using log file /var/lib/oprofile/oprofiled.log Daemon started. Profiler running. rory at foo:~/hack/numarray/profile $ sudo ~/usr/bin/opcontrol --reset Signalling daemon... done rory at foo:~/hack/numarray/profile $ python2.4 longadd.py rory at foo:~/hack/numarray/profile $ sudo ~/usr/bin/opcontrol --shutdown Stopping profiling. Killing daemon. rory at foo:~/hack/numarray/profile $ opreport -t 2 -l $(which python2.4) CPU: Athlon, speed 1836.45 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 47122 11.2430 _ufuncFloat64.so add_ddxd_vvxv 26731 6.3778 python2.4 PyEval_EvalFrame 24122 5.7553 libc-2.3.2.so memset 21228 5.0648 python2.4 lookdict_string 10583 2.5250 python2.4 PyObject_GenericGetAttr 9131 2.1786 libc-2.3.2.so mcount 9026 2.1535 python2.4 PyDict_GetItem 8968 2.1397 python2.4 PyType_IsSubtype (The idea wasn't really to discuss the results, but anyway: The prominence of memset is a little odd -- are destination arrays zeroed before being assigned the sum result?) To get the libc symbols you need a libc with debug symbols -- on Ubuntu this is the libc-dbg package; I don't know what it'll be on Fedora or other systems. Set the LD_LIBRARY_PATH variable to force these debug libraries to be loaded: export LD_LIBRARY_PATH=/usr/lib/debug This is probably not all that useful -- I suppose it might be interesting if one generates callgraphs. I don't (yet) have a modified kernel, so I haven't tried this. 
Have fun, Rory From jmiller at stsci.edu Mon Jan 24 13:13:13 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 24 13:13:13 2005 Subject: [Numpy-discussion] Speeding up Numeric In-Reply-To: <87k6q264w1.fsf@localhost.localdomain> References: <1106582644.5350.102.camel@jaytmiller.comcast.net> <87k6q264w1.fsf@localhost.localdomain> Message-ID: <1106601157.5361.42.camel@jaytmiller.comcast.net> On Mon, 2005-01-24 at 20:47 +0200, Rory Yorke wrote: > Todd Miller writes: > > > This looked fantastic so I tried it over the weekend. On Fedora Core 3, > > I couldn't get any information about numarray runtime (in the shared > > libraries), only Python. Ditto with Numeric, although from your post > > you apparently got great results including information on Numeric .so's. > > I'm curious: has anyone else tried this for numarray (or Numeric) on > > Fedora Core 3? Does anyone have a working profile script? > > I think you need to have --separate=lib when invoking opcontrol. (See > later for an example.) Thanks! That and using a more liberal "opreport -t" setting got it. > - I think opstack is part of oprofile 0.8 (or maybe 0.8.1) -- it > wasn't in the 0.7.1 package available for Ubuntu. Also, to actually > get callgraphs (from opstack), you need a patched kernel; see here: > > http://oprofile.sf.net/patches/ Ugh. Well, that won't happen today for me either. > - I think you probably *shouldn't* compile with -pg if you use > oprofile, but you should use -g. > > To profile shared libraries, I also tried the following: > > - sprof. Some sort of dark art glibc tool. I couldn't get this to work > with dlopen()'ed libraries (in which class I believe Python C > extensions fall). > > - qprof (http://www.hpl.hp.com/research/linux/qprof/). Almost worked, > but I couldn't get it to identify symbols in shared libraries. Their > page has a list of other profilers. I tried gprof too but couldn't get much out of it. As David noted, gprof is a pain to use with distutils too.
> I also tried the Python 2.4 profile module; it does support
> C-extension functions as advertised, but it seemed to miss object
> instantiation calls (_numarray._numarray's instantiation, in this
> case).

I think the thing to focus on is building an object cache for "almost-
new" small NumArrays; that could potentially short circuit memory
object allocation/deallocation costs, NumArray object hierarchical
allocation/deallocation costs, etc.

> rory at foo:~/hack/numarray/profile $ opreport -t 2 -l $(which python2.4)
> CPU: Athlon, speed 1836.45 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name        symbol name
> 47122    11.2430  _ufuncFloat64.so  add_ddxd_vvxv
> 26731     6.3778  python2.4         PyEval_EvalFrame
> 24122     5.7553  libc-2.3.2.so     memset
> 21228     5.0648  python2.4         lookdict_string
> 10583     2.5250  python2.4         PyObject_GenericGetAttr
> 9131      2.1786  libc-2.3.2.so     mcount
> 9026      2.1535  python2.4         PyDict_GetItem
> 8968      2.1397  python2.4         PyType_IsSubtype
>
> (The idea wasn't really to discuss the results, but anyway: The
> prominence of memset is a little odd -- are destination arrays zeroed
> before being assigned the sum result?)

Yes, the API routines which allocate the output array zero it. I've
tried to remove this in the past but at least one of the add-on
packages (linear_algebra or fft I think) wasn't stable w/o the zeroing.

> Have fun,

Better already. Thanks again!

Todd
From curzio.basso at unibas.ch  Tue Jan 25 05:12:03 2005
From: curzio.basso at unibas.ch (Curzio Basso)
Date: Tue Jan 25 05:12:03 2005
Subject: [Numpy-discussion] Matrix products
Message-ID: <41F6455C.9000003@unibas.ch>

Hi all,

assume that I have a matrix A with shape = (m,n), what I would like to
compute is a matrix B with shape = (m, n, n) such as

B[i] = NA.matrixmultiply(A[i, :, NA.NewAxis], A[i, NA.NewAxis])

e.g. if A is

array([[0, 1],
       [2, 3]])

then B would be

array([[[0, 0],
        [0, 1]],
       [[4, 6],
        [6, 9]]])

Does anyone know how to do this without using loops?

thanks

From konrad.hinsen at laposte.net  Tue Jan 25 05:33:01 2005
From: konrad.hinsen at laposte.net (konrad.hinsen at laposte.net)
Date: Tue Jan 25 05:33:01 2005
Subject: [Numpy-discussion] Matrix products
In-Reply-To: <41F6455C.9000003@unibas.ch>
References: <41F6455C.9000003@unibas.ch>
Message-ID: 

On Jan 25, 2005, at 14:10, Curzio Basso wrote:

> assume that I have a matrix A with shape = (m,n), what I would like to
> compute is a matrix B with shape = (m, n, n) such as
> B[i] = NA.matrixmultiply(A[i, :, NA.NewAxis], A[i, NA.NewAxis])
...
> Does anyone know how to do this without using loops?
How about

A[:, :, NewAxis]*A[:, NewAxis, :]

That works for your example at least. I am not quite sure why you use
matrixmultiply in your definition as there doesn't seem to be any
summation.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen at llb.saclay.cea.fr
---------------------------------------------------------------------

From curzio.basso at unibas.ch  Tue Jan 25 05:49:05 2005
From: curzio.basso at unibas.ch (Curzio Basso)
Date: Tue Jan 25 05:49:05 2005
Subject: [Numpy-discussion] Matrix products
In-Reply-To: 
References: <41F6455C.9000003@unibas.ch>
Message-ID: <41F64E44.8090500@unibas.ch>

konrad.hinsen at laposte.net wrote:

> How about
>
> A[:, :, NewAxis]*A[:, NewAxis, :]
>
> That works for your example at least. I am not quite sure why you use
> matrixmultiply in your definition as there doesn't seem to be any
> summation.

You're right, it was my mistake to use matrixmultiply.

Thanks a lot.

From focke at slac.stanford.edu  Tue Jan 25 08:22:02 2005
From: focke at slac.stanford.edu (Warren Focke)
Date: Tue Jan 25 08:22:02 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: <779B39A0-6DED-11D9-B904-000A95999556@laposte.net>
References: <41F21022.8090904@colorado.edu> <41F2A681.1080106@cox.net> <200501231220.33894.Norbert.Nemec.list@gmx.de> <779B39A0-6DED-11D9-B904-000A95999556@laposte.net>
Message-ID: 

On Mon, 24 Jan 2005 konrad.hinsen at laposte.net wrote:

> On Jan 23, 2005, at 12:20, Norbert Nemec wrote:
>
> > I think, it was one of the fundamental flaws in the design of Python
> > not to include IEEE754 from the very beginning. Leaving the details of
> > floating
>
> Python is written in C, so it couldn't make more promises about floats
> than the C standard does, at least not without an enormous effort.
Not > even the floating-point units of modern CPUs respect IEEE in all > respects. And even if that effort had been put into Pythno, Numeric probably would've sidestepped it for performance. Note that Python does give platform-independent behavior for integer division and mod, while Numeric just gives whatever your C platform does. Warren Focke From chrisperkins99 at gmail.com Tue Jan 25 12:31:02 2005 From: chrisperkins99 at gmail.com (Chris Perkins) Date: Tue Jan 25 12:31:02 2005 Subject: [Numpy-discussion] Missing mail In-Reply-To: <41EE990F.8050709@noaa.gov> References: <41EC2D14.7000203@colorado.edu> <41ED4AD0.6060204@noaa.gov> <6332FB22-69A4-11D9-B8A8-000A95B68E50@stsci.edu> <41EE990F.8050709@noaa.gov> Message-ID: <184a9f5a05012512301f054818@mail.gmail.com> Can anyone find the missing mail from the thread below? I have Chris's original question and his thanks for the reply, but not Perry's reply. I can't seem to conjure up the right incantations to get Google to find it, and I am also quite interested in the answer. Could someone forward me Perry's email or point me to somewhere that it's archived, please and thank you? Chris Perkins On Wed, 19 Jan 2005 09:29:51 -0800, Chris Barker wrote: > > > Perry Greenfield wrote: > > On Jan 18, 2005, at 12:43 PM, Chris Barker wrote: > >> Can anyone provide a one-paragraph description of what numarray does > >> that gives it better large-array performance than Numeric? > > > > It has two aspects: one is speed, but for us it was more about memory. > > Thanks for the summary, I have a better idea of the issues now. > > It doesn't look, to my untrained eyes, like any of these are contrary to > small array performance, so I'm hopeful that the grand convergence can > occur. > > -Chris > > -- > Christopher Barker, Ph.D. 
> Oceanographer

From perry at stsci.edu  Tue Jan 25 19:52:09 2005
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Jan 25 19:52:09 2005
Subject: FW: [Numpy-discussion] Speeding up numarray -- questions on its design
Message-ID: 

Hmmm, it looks like it was sent only to Chris. My mistake.

-- Perry

-----Original Message-----
From: Perry Greenfield [mailto:perry at stsci.edu]
Sent: Tuesday, January 18, 2005 5:58 PM
To: Chris Barker
Cc: Perry Greenfield
Subject: Re: [Numpy-discussion] Speeding up numarray -- questions on its design

On Jan 18, 2005, at 12:43 PM, Chris Barker wrote:

> Hi all,
>
> This discussion has brought up a question I have had for a while:
>
> Can anyone provide a one-paragraph description of what numarray does
> that gives it better large-array performance than Numeric?

It has two aspects: one is speed, but for us it was more about memory.

It is likely faster (for simpler cases, i.e., ones that don't involve
strides, byteswaps or type conversions) because the C code for the loop
is as simple as can be, resulting in better optimizations. But we
haven't done careful research on that.

It has a number of aspects that lessen memory demands:

1) fewer temporaries created, particularly for type conversions.
2) avoids the memory wasting scalar type coercions that Numeric has.
3) allows use of memory mapping. This one is at the moment not a strong
advantage due to the fact that the current limit is due to Python.
Interesting large array sizes are bumping into the Python limit, making
this less useful. But when this goes away (this year I hope) it is
again a useful tool for minimizing memory demands.

There are other advantages, but these are the primary ones that relate
to large array performance that I recall offhand (Todd may recall
others).
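Perry's point 3 can be seen concretely with a small sketch. This uses numpy.memmap purely as a modern stand-in for illustration (numarray had its own memmap module with a different API):

```python
import os
import tempfile

import numpy as np  # illustration only; numarray's memmap API differed

path = os.path.join(tempfile.mkdtemp(), "big.dat")

# Create a file-backed array: the elements live in the file and are
# paged in by the OS on demand rather than held in process memory.
m = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))
m[:] = np.arange(1000.0)
m.flush()

# Reopen read-only: data is served from the file via the page cache.
m2 = np.memmap(path, dtype=np.float64, mode="r", shape=(1000,))
print(float(m2[0]), float(m2[999]))  # 0.0 999.0
```

The appeal for large arrays is that the working set is bounded by what you actually touch, not by the array's total size.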
Perry

From lkemohawk at yahoo.com  Tue Jan 25 21:42:04 2005
From: lkemohawk at yahoo.com (kevin lester)
Date: Tue Jan 25 21:42:04 2005
Subject: [Numpy-discussion] array_str(arr,precision=4,suppress_small=1)
Message-ID: <20050126054129.27510.qmail@web53905.mail.yahoo.com>

Can someone please tell me why I can't control the output of my arrays.
Neither numarray.array_str(...) nor sys.float_output_suppress_small
works to keep the numerals out of exponential form when printed to my
stdout.

Thank you much,
Kevin

From Chris.Barker at noaa.gov  Tue Jan 25 22:20:03 2005
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Tue Jan 25 22:20:03 2005
Subject: [Numpy-discussion] Missing mail
Message-ID: 

----- Original Message -----
From: Chris Perkins

> Can anyone find the missing mail from the thread below?

Here it is. Perry might have sent it only to me, which I imagine was an
oversight.

-Chris

On Jan 18, 2005, at 12:43 PM, Chris Barker wrote:

> Hi all,
>
> This discussion has brought up a question I have had for a while:
>
> Can anyone provide a one-paragraph description of what numarray does
> that gives it better large-array performance than Numeric?

It has two aspects: one is speed, but for us it was more about memory.

It is likely faster (for simpler cases, i.e., ones that don't involve
strides, byteswaps or type conversions) because the C code for the loop
is as simple as can be, resulting in better optimizations. But we
haven't done careful research on that.

It has a number of aspects that lessen memory demands:

1) fewer temporaries created, particularly for type conversions.
2) avoids the memory wasting scalar type coercions that Numeric has.
3) allows use of memory mapping. This one is at the moment not a strong
advantage due to the fact that the current limit is due to Python.
Interesting large array sizes are bumping into the Python limit, making
this less useful. But when this goes away (this year I hope) it is
again a useful tool for minimizing memory demands.

There are other advantages, but these are the primary ones that relate
to large array performance that I recall offhand (Todd may recall
others).

Perry

From jh at oobleck.astro.cornell.edu  Wed Jan 26 10:31:03 2005
From: jh at oobleck.astro.cornell.edu (Joe Harrington)
Date: Wed Jan 26 10:31:03 2005
Subject: [Numpy-discussion] wiki to resolve numeric vs. numarray
Message-ID: <200501261830.j0QIU5Ug010130@oobleck.astro.cornell.edu>

Since the arguments in the numeric vs. numarray debate have been spread
over many months, they have been hard for me and others to follow.
There is now a wiki on scipy.org for summaries of the main points and
requests for tasks to be done in resolving the issue:

http://www.scipy.org/wikis/featurerequests/ArrayMathCore

The wiki states facts about each package that are relevant to the
debate, and the desires people have for an eventual array package.
Under "Desires", please feel free to pose questions that need to be
resolved and tasks that need to be done. If you are interested in
helping to resolve this split, consider taking on one of those tasks.
Do post to the mailing list to see if anyone else is doing it and to
get feedback on how to do it well.
--jh--

From aisaac at american.edu  Thu Jan 27 07:41:01 2005
From: aisaac at american.edu (Alan G Isaac)
Date: Thu Jan 27 07:41:01 2005
Subject: [Numpy-discussion] problems with duplicating and slicing an array
In-Reply-To: <41F191D7.9040906@ee.byu.edu>
References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov> <41F191D7.9040906@ee.byu.edu>
Message-ID: 

On Fri, 21 Jan 2005, Travis Oliphant apparently wrote:

> from scipy import *
> alter_numeric()
> i = array([0,2])
> x = array([1.1,2.2,3.3,4.4])
> y = x[i]

This ^ gives me an invalid index error.
scipy version 0.3.0_266.4242

Alan Isaac

From faltet at carabos.com  Thu Jan 27 12:48:40 2005
From: faltet at carabos.com (Francesc Altet)
Date: Thu Jan 27 12:48:40 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: <1106601157.5361.42.camel@jaytmiller.comcast.net>
References: <87k6q264w1.fsf@localhost.localdomain> <1106601157.5361.42.camel@jaytmiller.comcast.net>
Message-ID: <200501272136.07558.faltet@carabos.com>

Hi,

After a while of waiting for some free time, I'm playing with the
excellent oprofile and trying to help in reducing numarray creation
time. For that goal, I selected the next small benchmark:

import numarray
a = numarray.arange(2000)
a.shape = (1000, 2)
for j in xrange(1000):
    for i in range(len(a)):
        row = a[i]

I know that it mixes creation with indexing cost, but as the indexing
cost of numarray is only a bit slower (perhaps a 40%) than Numeric,
while array creation time is 5 to 10 times slower, I think this
benchmark may provide a good starting point to see what's going on.
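A large part of what this loop pays for is view creation: each `row=a[i]` builds a brand-new array object whose data is shared with the parent rather than copied. The sketch below demonstrates that shared-buffer behaviour with modern NumPy, used here only as an illustration (numarray's indexing worked the same way):

```python
import numpy as np  # illustration only; the benchmark above used numarray

a = np.zeros((1000, 2))
row = a[5]            # partial subscript -> a fresh array object each time...
row[1] = -1.0         # ...but one sharing the parent's data buffer
print(a[5, 1])        # -1.0: writing through the view changed a
print(row.base is a)  # True: row is a view onto a, not a copy
```

So the per-iteration cost under discussion is array-object construction, not data movement.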
For numarray, I've got the next results:

samples  %        image name          symbol name
902       7.3238  python              PyEval_EvalFrame
835       6.7798  python              lookdict_string
408       3.3128  python              PyObject_GenericGetAttr
384       3.1179  python              PyDict_GetItem
383       3.1098  libc-2.3.2.so       memcpy
358       2.9068  libpthread-0.10.so  __pthread_alt_unlock
293       2.3790  python              _PyString_Eq
273       2.2166  libnumarray.so      NA_updateStatus
273       2.2166  python              PyType_IsSubtype
271       2.2004  python              countformat
252       2.0461  libc-2.3.2.so       memset
249       2.0218  python              string_hash
248       2.0136  _ndarray.so         _universalIndexing

while for Numeric I've got this:

samples  %        image name          symbol name
279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
216      12.1144  libc-2.3.2.so       memmove
187      10.4879  python              lookdict_string
162       9.0858  python              PyEval_EvalFrame
144       8.0763  libpthread-0.10.so  __pthread_alt_lock
126       7.0667  libpthread-0.10.so  __pthread_alt_trylock
56        3.1408  python              PyDict_SetItem
53        2.9725  libpthread-0.10.so  __GI___pthread_mutex_unlock
45        2.5238  _numpy.so           PyArray_FromDimsAndDataAndDescr
39        2.1873  libc-2.3.2.so       __malloc
36        2.0191  libc-2.3.2.so       __cfree

one preliminary result is that numarray spends a lot more time in
Python space than Numeric does, as Todd already said here. The problem
is that, as I have not yet patched my kernel, I can't get the call
tree, and I can't look for the ultimate responsible for that.

So, I've tried to run the profile module included in the standard
library in order to see which are the hot spots in python:

$ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
         1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   19.220   19.220   25.290   25.290 create-numarray.py:1(?)
   999999    5.530    0.000    5.530    0.000 numarraycore.py:514(__del__)
     1753    0.160    0.000    0.160    0.000 :0(eval)
        1    0.060    0.060    0.340    0.340 numarraycore.py:3(?)
        1    0.050    0.050    0.390    0.390 generic.py:8(?)
        1    0.040    0.040    0.490    0.490 numarrayall.py:1(?)
     3455    0.040    0.000    0.040    0.000 :0(len)
        1    0.030    0.030    0.190    0.190 ufunc.py:1504(_makeCUFuncDict)
       51    0.030    0.001    0.070    0.001 ufunc.py:184(_nIOArgs)
     3572    0.030    0.000    0.030    0.000 :0(has_key)
     2582    0.020    0.000    0.020    0.000 :0(append)
     1000    0.020    0.000    0.020    0.000 :0(range)
        1    0.010    0.010    0.010    0.010 generic.py:510(_stridesFromShape)
     42/1    0.010    0.000   25.290   25.290 :1(?)

but, to say the truth, I can't really see where the time is exactly
consumed. Perhaps somebody with more experience can put more light on
this?

Another thing that I find intriguing has to do with Numeric and
oprofile output. Let me remember:

samples  %        image name          symbol name
279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
216      12.1144  libc-2.3.2.so       memmove
187      10.4879  python              lookdict_string
162       9.0858  python              PyEval_EvalFrame
144       8.0763  libpthread-0.10.so  __pthread_alt_lock
126       7.0667  libpthread-0.10.so  __pthread_alt_trylock
56        3.1408  python              PyDict_SetItem
53        2.9725  libpthread-0.10.so  __GI___pthread_mutex_unlock
45        2.5238  _numpy.so           PyArray_FromDimsAndDataAndDescr
39        2.1873  libc-2.3.2.so       __malloc
36        2.0191  libc-2.3.2.so       __cfree

we can see that a lot of the time in the benchmark using Numeric is
consumed in libc space (a 37% or so). However, only a 16% is used in
memory-related tasks (memmove, malloc and free) while the rest seems
to be used in thread issues (??). Again, can anyone explain why the
pthread* routines take so much time, or why they appear here at all?
Perhaps getting rid of these calls might improve the Numeric
performance even further.

Cheers,

--
>qo<  Francesc Altet     http://www.carabos.com/
V  V  Cárabos Coop. V.
??Enjoy Data "" From stephen.walton at csun.edu Thu Jan 27 13:51:07 2005 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jan 27 13:51:07 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov><41F191D7.9040906@ee.byu.edu> Message-ID: <41F9621F.5040300@csun.edu> Alan G Isaac wrote: >On Fri, 21 Jan 2005, Travis Oliphant apparently wrote: > > >>from scipy import * >>alter_numeric() >>i = array([0,2]) >>x = array([1.1,2.2,3.3,4.4]) >>y = x[i] >> >> > >This ^ gives me an invalid index error. >scipy version 0.3.0_266.4242 > > Travis's example works for me at scipy 0.3.2_302.4549 (from CVS), Numeric 23.6, numarray 1.1.1, all on FC3. From oliphant at ee.byu.edu Thu Jan 27 14:31:06 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Jan 27 14:31:06 2005 Subject: [Numpy-discussion] problems with duplicating and slicing an array In-Reply-To: References: <7cffadfa05012017293f833a87@mail.gmail.com> <20050121124417.15da0438.simon@arrowtheory.com> <41F1796E.7040303@noaa.gov><41F191D7.9040906@ee.byu.edu> Message-ID: <41F96A4A.4080506@ee.byu.edu> Alan G Isaac wrote: >On Fri, 21 Jan 2005, Travis Oliphant apparently wrote: > > >>from scipy import * >>alter_numeric() >>i = array([0,2]) >>x = array([1.1,2.2,3.3,4.4]) >>y = x[i] >> >> > >This ^ gives me an invalid index error. >scipy version 0.3.0_266.4242 > > Your version of scipy is apparently too low. 
Mine is 0.3.2_299.4506

-Travis

From jmiller at stsci.edu  Fri Jan 28 03:48:05 2005
From: jmiller at stsci.edu (Todd Miller)
Date: Fri Jan 28 03:48:05 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: <200501272136.07558.faltet@carabos.com>
References: <87k6q264w1.fsf@localhost.localdomain> <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com>
Message-ID: <1106912892.6118.25.camel@jaytmiller.comcast.net>

I got some insight into what I think is the tall pole in the profile:
sub-array creation is implemented using views. The generic indexing
code does a view() Python callback because object arrays override
view(). Faster view() creation for numerical arrays can be achieved
like this by avoiding the callback:

Index: Src/_ndarraymodule.c
===================================================================
RCS file: /cvsroot/numpy/numarray/Src/_ndarraymodule.c,v
retrieving revision 1.75
diff -c -r1.75 _ndarraymodule.c
*** Src/_ndarraymodule.c	14 Jan 2005 14:13:22 -0000	1.75
--- Src/_ndarraymodule.c	28 Jan 2005 11:15:50 -0000
***************
*** 453,460 ****
  		}
  	} else {  /* partially subscripted --> subarray */
  		long i;
! 		result = (PyArrayObject *)
! 			PyObject_CallMethod((PyObject *) self,"view",NULL);
  		if (!result) goto _exit;
  
  		result->nd = result->nstrides = self->nd - nindices;
--- 453,463 ----
  		}
  	} else {  /* partially subscripted --> subarray */
  		long i;
! 		if (NA_NumArrayCheck((PyObject *)self))
! 			result = _view(self);
! 		else
! 			result = (PyArrayObject *) PyObject_CallMethod(
! 				(PyObject *) self,"view",NULL);
  		if (!result) goto _exit;
  
  		result->nd = result->nstrides = self->nd - nindices;

I committed the patch above to CVS for now. This optimization makes
view() "non-overridable" for NumArray subclasses so there is probably a
better way of doing this.

One other thing that struck me looking at your profile, and it has been
discussed before, is that NumArray.__del__() needs to be pushed (back)
down into C.
Getting rid of __del__ would also synergize well with making an object
freelist, one aspect of which is capturing unneeded objects rather than
destroying them.

Thanks for the profile.

Regards,
Todd

On Thu, 2005-01-27 at 21:36 +0100, Francesc Altet wrote:
> Hi,
>
> After a while of waiting for some free time, I'm playing myself with
> the excellent oprofile, and try to help in reducing numarray creation.
>
> For that goal, I selected the next small benchmark:
>
> import numarray
> a = numarray.arange(2000)
> a.shape=(1000,2)
> for j in xrange(1000):
>     for i in range(len(a)):
>         row=a[i]
>
> I know that it mixes creation with indexing cost, but as the indexing
> cost of numarray is only a bit slower (perhaps a 40%) than Numeric,
> while array creation time is 5 to 10 times slower, I think this
> benchmark may provide a good starting point to see what's going on.
>
> For numarray, I've got the next results:
>
> samples  %        image name          symbol name
> 902       7.3238  python              PyEval_EvalFrame
> 835       6.7798  python              lookdict_string
> 408       3.3128  python              PyObject_GenericGetAttr
> 384       3.1179  python              PyDict_GetItem
> 383       3.1098  libc-2.3.2.so       memcpy
> 358       2.9068  libpthread-0.10.so  __pthread_alt_unlock
> 293       2.3790  python              _PyString_Eq
> 273       2.2166  libnumarray.so      NA_updateStatus
> 273       2.2166  python              PyType_IsSubtype
> 271       2.2004  python              countformat
> 252       2.0461  libc-2.3.2.so       memset
> 249       2.0218  python              string_hash
> 248       2.0136  _ndarray.so         _universalIndexing
>
> while for Numeric I've got this:
>
> samples  %        image name          symbol name
> 279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
> 216      12.1144  libc-2.3.2.so       memmove
> 187      10.4879  python              lookdict_string
> 162       9.0858  python              PyEval_EvalFrame
> 144       8.0763  libpthread-0.10.so  __pthread_alt_lock
> 126       7.0667  libpthread-0.10.so  __pthread_alt_trylock
> 56        3.1408  python              PyDict_SetItem
> 53        2.9725  libpthread-0.10.so  __GI___pthread_mutex_unlock
> 45        2.5238  _numpy.so           PyArray_FromDimsAndDataAndDescr
> 39        2.1873  libc-2.3.2.so       __malloc
> 36        2.0191  libc-2.3.2.so
__cfree > > one preliminary result is that numarray spends a lot more time in > Python space than do Numeric, as Todd already said here. The problem > is that, as I have not yet patched my kernel, I can't get the call > tree, and I can't look for the ultimate responsible for that. > > So, I've tried to run the profile module included in the standard > library in order to see which are the hot spots in python: > > $ time ~/python.nobackup/Python-2.4/python -m profile -s time > create-numarray.py > 1016105 function calls (1016064 primitive calls) in 25.290 CPU > seconds > > Ordered by: internal time > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 19.220 19.220 25.290 25.290 create-numarray.py:1(?) > 999999 5.530 0.000 5.530 0.000 numarraycore.py:514(__del__) > 1753 0.160 0.000 0.160 0.000 :0(eval) > 1 0.060 0.060 0.340 0.340 numarraycore.py:3(?) > 1 0.050 0.050 0.390 0.390 generic.py:8(?) > 1 0.040 0.040 0.490 0.490 numarrayall.py:1(?) > 3455 0.040 0.000 0.040 0.000 :0(len) > 1 0.030 0.030 0.190 0.190 ufunc.py:1504(_makeCUFuncDict) > 51 0.030 0.001 0.070 0.001 ufunc.py:184(_nIOArgs) > 3572 0.030 0.000 0.030 0.000 :0(has_key) > 2582 0.020 0.000 0.020 0.000 :0(append) > 1000 0.020 0.000 0.020 0.000 :0(range) > 1 0.010 0.010 0.010 0.010 generic.py:510 > (_stridesFromShape) > 42/1 0.010 0.000 25.290 25.290 :1(?) > > but, to say the truth, I can't really see where the time is exactly > consumed. Perhaps somebody with more experience can put more light on > this? > > Another thing that I find intriguing has to do with Numeric and > oprofile output. 
> Let me remember:
>
> samples  %        image name          symbol name
> 279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
> 216      12.1144  libc-2.3.2.so       memmove
> 187      10.4879  python              lookdict_string
> 162       9.0858  python              PyEval_EvalFrame
> 144       8.0763  libpthread-0.10.so  __pthread_alt_lock
> 126       7.0667  libpthread-0.10.so  __pthread_alt_trylock
> 56        3.1408  python              PyDict_SetItem
> 53        2.9725  libpthread-0.10.so  __GI___pthread_mutex_unlock
> 45        2.5238  _numpy.so           PyArray_FromDimsAndDataAndDescr
> 39        2.1873  libc-2.3.2.so       __malloc
> 36        2.0191  libc-2.3.2.so       __cfree
>
> we can see that a lot of the time in the benchmark using Numeric is
> consumed in libc space (a 37% or so). However, only a 16% is used in
> memory-related tasks (memmove, malloc and free) while the rest seems
> to be used in thread issues (??). Again, anyone can explain why the
> pthread* routines take so many time, or why they appear here at all?.
> Perhaps getting rid of these calls might improve the Numeric
> performance even further.
>
> Cheers,

From Norbert.Nemec.list at gmx.de  Fri Jan 28 12:34:37 2005
From: Norbert.Nemec.list at gmx.de (Norbert Nemec)
Date: Fri Jan 28 12:34:37 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: <200501272136.07558.faltet@carabos.com>
References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com>
Message-ID: <200501282116.55184.Norbert.Nemec.list@gmx.de>

On Thursday 27 January 2005 21:36, Francesc Altet wrote:

> So, I've tried to run the profile module included in the standard
> library in order to see which are the hot spots in python:
>
> $ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
> 1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds
>
> Ordered by: internal time
>
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1   19.220   19.220   25.290   25.290 create-numarray.py:1(?)
> 999999    5.530    0.000    5.530    0.000 numarraycore.py:514(__del__)

Might it actually be that (at least part of) the speed problem lies in
__del__? I don't have any tools for benchmarking at hand, so I can only
ask others to experiment, but I recall that it already struck me as odd
a little while ago that hitting Ctrl-C in the middle of numarray
calculations nearly always gave me a backtrace ending inside a __del__
function.

Should be trivial to test: deactivate __del__ completely for a test
run.

Ciao,
Norbert

--
_________________________________________Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail:

From Norbert.Nemec.list at gmx.de  Fri Jan 28 13:39:12 2005
From: Norbert.Nemec.list at gmx.de (Norbert Nemec)
Date: Fri Jan 28 13:39:12 2005
Subject: [Numpy-discussion] 30% speedup when deactivating NumArray.__del__ !!!
In-Reply-To: <200501272136.07558.faltet@carabos.com>
References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com>
Message-ID: <200501282223.56850.Norbert.Nemec.list@gmx.de>

Hi there,

indeed my suspicion has proven correct:

On Thursday 27 January 2005 21:36, Francesc Altet wrote:
[...]
> For that goal, I selected the next small benchmark:
>
> import numarray
> a = numarray.arange(2000)
> a.shape=(1000,2)
> for j in xrange(1000):
>     for i in range(len(a)):
>         row=a[i]
[...]
> So, I've tried to run the profile module included in the standard
> library in order to see which are the hot spots in python:
>
> $ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
> 1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds
>
> Ordered by: internal time
>
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1   19.220   19.220   25.290   25.290 create-numarray.py:1(?)
> 999999    5.530    0.000    5.530    0.000 numarraycore.py:514(__del__)
>   1753    0.160    0.000    0.160    0.000 :0(eval)
[...]
This benchmark made me suspicious, since I had already found it odd
before that killing a numarray calculation with Ctrl-C nearly always
gives a backtrace starting in __del__.

I did the simple thing: simply comment out the NumArray.__del__ routine
(numarraycore.py, line 514, current CVS version). The result is
astonishing:

Vanilla numarray:
nobbi at Marvin:~/tmp $ time python create-array.py
real    0m9.457s
user    0m8.851s
sys     0m0.038s

NumArray.__del__ commented out:
nobbi at Marvin:~/tmp $ time python create-array.py
real    0m6.512s
user    0m6.065s
sys     0m0.021s

30% speedup !!!!!!

Doing a detailed benchmarking shows similar results. I don't think I
have to go on about this at this point. It seems clear that __del__ has
to be avoided in such a central position.

Ciao,
Norbert

--
_________________________________________Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail:

From faltet at carabos.com  Fri Jan 28 14:30:54 2005
From: faltet at carabos.com (Francesc Altet)
Date: Fri Jan 28 14:30:54 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: <1106912892.6118.25.camel@jaytmiller.comcast.net>
References: <200501272136.07558.faltet@carabos.com> <1106912892.6118.25.camel@jaytmiller.comcast.net>
Message-ID: <200501282327.21093.faltet@carabos.com>

Hi Todd,

Nice to see that you achieved a good speed-up with your optimization
patch. With the next code:

import numarray
a = numarray.arange(2000)
a.shape = (1000, 2)
for j in xrange(1000):
    for i in range(len(a)):
        row = a[i]

and original numarray-1.1.1 it took 11.254s (pentium4 at 2GHz). With
your patch, this time has been reduced to 7.816s. Now, following your
suggestion to push NumArray.__del__ down into C, I've got a good
speed-up as well: 5.332s. This is more than twice as fast as the
unpatched numarray 1.1.1. There is still a long way until we can catch
Numeric (1.123s), but it is a first step :)

The patch.
Please revise it, as I'm not very used to dealing with pure C
extensions (just a Pyrex user):

Index: Lib/numarraycore.py
===================================================================
RCS file: /cvsroot/numpy/numarray/Lib/numarraycore.py,v
retrieving revision 1.101
diff -r1.101 numarraycore.py
696,699c696,699
<     def __del__(self):
<         if self._shadows != None:
<             self._shadows._copyFrom(self)
<             self._shadows = None
---
>     def __del__(self):
>         if self._shadows != None:
>             self._shadows._copyFrom(self)
>             self._shadows = None
Index: Src/_numarraymodule.c
===================================================================
RCS file: /cvsroot/numpy/numarray/Src/_numarraymodule.c,v
retrieving revision 1.65
diff -r1.65 _numarraymodule.c
399a400,411
> static void
> _numarray_dealloc(PyObject *self)
> {
>     PyArrayObject *selfa = (PyArrayObject *) self;
>
>     if (selfa->_shadows != NULL) {
>         _copyFrom(selfa->_shadows, self);
>         selfa->_shadows = NULL;
>     }
>     self->ob_type->tp_free(self);
> }
>
421c433
< 	0,                  /* tp_dealloc */
---
> 	_numarray_dealloc,  /* tp_dealloc */

The profile with the new optimizations now looks like:

samples  %        image name      symbol name
453       8.6319  python          PyEval_EvalFrame
372       7.0884  python          lookdict_string
349       6.6502  python          string_hash
271       5.1639  libc-2.3.2.so   _wordcopy_bwd_aligned
210       4.0015  libnumarray.so  NA_updateStatus
194       3.6966  python          _PyString_Eq
185       3.5252  libc-2.3.2.so   __GI___strcasecmp
162       3.0869  python          subtype_dealloc
158       3.0107  libc-2.3.2.so   _int_malloc
147       2.8011  libnumarray.so  isBufferWriteable
145       2.7630  python          PyDict_SetItem
135       2.5724  _ndarray.so     _view
131       2.4962  python          PyObject_GenericGetAttr
122       2.3247  python          PyDict_GetItem
100       1.9055  python          PyString_InternInPlace
94        1.7912  libnumarray.so  getReadBufferDataPtr
77        1.4672  _ndarray.so     _simpleIndexingCore

i.e. time spent in libc and libnumarray is going up in the list, as it
should. Now, we have to concentrate on other points of optimization.
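One of those other points, suggested by Todd earlier in the thread, is an object free-list for "almost-new" small arrays: capture unneeded objects instead of destroying them, and recycle them instead of allocating. A toy pure-Python sketch of the idea (all names here are hypothetical; nothing like this shipped in numarray at the time):

```python
class Buf:
    """Hypothetical stand-in for a small array object with costly setup."""
    def __init__(self):
        self.data = bytearray(64)

class FreeList:
    """Recycle container objects instead of destroying and reallocating."""
    def __init__(self, maxsize=32):
        self._pool = []
        self._maxsize = maxsize
        self.hits = 0

    def get(self):
        if self._pool:
            self.hits += 1
            return self._pool.pop()   # recycled: skips allocation/setup
        return Buf()

    def put(self, obj):
        if len(self._pool) < self._maxsize:
            self._pool.append(obj)    # captured instead of destroyed

fl = FreeList()
a = fl.get()            # pool empty: a real allocation
fl.put(a)               # would have been destroyed; capture it instead
b = fl.get()            # served from the pool, no new allocation
print(a is b, fl.hits)  # True 1
```

The real win in C would be exactly the costs profiled above: object allocation/deallocation and the hierarchical setup that every `a[i]` pays for.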
Perhaps it is a good time to try recompiling the kernel and getting the call tree...

Cheers,

On Friday 28 January 2005 12:48, Todd Miller wrote:
> I got some insight into what I think is the tall pole in the profile:
> sub-array creation is implemented using views. The generic indexing
> code does a view() Python callback because object arrays override
> view(). Faster view() creation for numerical arrays can be achieved
> like this by avoiding the callback:
>
> Index: Src/_ndarraymodule.c
> ===================================================================
> RCS file: /cvsroot/numpy/numarray/Src/_ndarraymodule.c,v
> retrieving revision 1.75
> diff -c -r1.75 _ndarraymodule.c
> *** Src/_ndarraymodule.c 14 Jan 2005 14:13:22 -0000 1.75
> --- Src/_ndarraymodule.c 28 Jan 2005 11:15:50 -0000
> ***************
> *** 453,460 ****
>   }
>   } else { /* partially subscripted --> subarray */
>   long i;
> ! result = (PyArrayObject *)
> !     PyObject_CallMethod((PyObject *) self,"view",NULL);
>   if (!result) goto _exit;
>
>   result->nd = result->nstrides = self->nd - nindices;
> --- 453,463 ----
>   }
>   } else { /* partially subscripted --> subarray */
>   long i;
> ! if (NA_NumArrayCheck((PyObject *)self))
> !     result = _view(self);
> ! else
> !     result = (PyArrayObject *) PyObject_CallMethod(
> !         (PyObject *) self,"view",NULL);
>   if (!result) goto _exit;
>
>   result->nd = result->nstrides = self->nd - nindices;
>
> I committed the patch above to CVS for now. This optimization makes
> view() "non-overridable" for NumArray subclasses, so there is probably
> a better way of doing this.
>
> One other thing that struck me looking at your profile, and it has been
> discussed before, is that NumArray.__del__() needs to be pushed (back)
> down into C. Getting rid of __del__ would also synergize well with
> making an object freelist, one aspect of which is capturing unneeded
> objects rather than destroying them.
>
> Thanks for the profile.
> > Regards, > Todd > > On Thu, 2005-01-27 at 21:36 +0100, Francesc Altet wrote: > > Hi, > > > > After a while of waiting for some free time, I'm playing myself with > > the excellent oprofile, and try to help in reducing numarray creation. > > > > For that goal, I selected the next small benchmark: > > > > import numarray > > a = numarray.arange(2000) > > a.shape=(1000,2) > > for j in xrange(1000): > > for i in range(len(a)): > > row=a[i] > > > > I know that it mixes creation with indexing cost, but as the indexing > > cost of numarray is only a bit slower (perhaps a 40%) than Numeric, > > while array creation time is 5 to 10 times slower, I think this > > benchmark may provide a good starting point to see what's going on. > > > > For numarray, I've got the next results: > > > > samples % image name symbol name > > 902 7.3238 python PyEval_EvalFrame > > 835 6.7798 python lookdict_string > > 408 3.3128 python PyObject_GenericGetAttr > > 384 3.1179 python PyDict_GetItem > > 383 3.1098 libc-2.3.2.so memcpy > > 358 2.9068 libpthread-0.10.so __pthread_alt_unlock > > 293 2.3790 python _PyString_Eq > > 273 2.2166 libnumarray.so NA_updateStatus > > 273 2.2166 python PyType_IsSubtype > > 271 2.2004 python countformat > > 252 2.0461 libc-2.3.2.so memset > > 249 2.0218 python string_hash > > 248 2.0136 _ndarray.so _universalIndexing > > > > while for Numeric I've got this: > > > > samples % image name symbol name > > 279 15.6478 libpthread-0.10.so __pthread_alt_unlock > > 216 12.1144 libc-2.3.2.so memmove > > 187 10.4879 python lookdict_string > > 162 9.0858 python PyEval_EvalFrame > > 144 8.0763 libpthread-0.10.so __pthread_alt_lock > > 126 7.0667 libpthread-0.10.so __pthread_alt_trylock > > 56 3.1408 python PyDict_SetItem > > 53 2.9725 libpthread-0.10.so __GI___pthread_mutex_unlock > > 45 2.5238 _numpy.so > > PyArray_FromDimsAndDataAndDescr 39 2.1873 libc-2.3.2.so > > __malloc > > 36 2.0191 libc-2.3.2.so __cfree > > > > one preliminary result is that numarray spends a lot 
more time in > > Python space than do Numeric, as Todd already said here. The problem > > is that, as I have not yet patched my kernel, I can't get the call > > tree, and I can't look for the ultimate responsible for that. > > > > So, I've tried to run the profile module included in the standard > > library in order to see which are the hot spots in python: > > > > $ time ~/python.nobackup/Python-2.4/python -m profile -s time > > create-numarray.py > > 1016105 function calls (1016064 primitive calls) in 25.290 CPU > > seconds > > > > Ordered by: internal time > > > > ncalls tottime percall cumtime percall filename:lineno(function) > > 1 19.220 19.220 25.290 25.290 create-numarray.py:1(?) > > 999999 5.530 0.000 5.530 0.000 > > numarraycore.py:514(__del__) 1753 0.160 0.000 0.160 0.000 > > :0(eval) > > 1 0.060 0.060 0.340 0.340 numarraycore.py:3(?) > > 1 0.050 0.050 0.390 0.390 generic.py:8(?) > > 1 0.040 0.040 0.490 0.490 numarrayall.py:1(?) > > 3455 0.040 0.000 0.040 0.000 :0(len) > > 1 0.030 0.030 0.190 0.190 > > ufunc.py:1504(_makeCUFuncDict) 51 0.030 0.001 0.070 0.001 > > ufunc.py:184(_nIOArgs) 3572 0.030 0.000 0.030 0.000 > > :0(has_key) > > 2582 0.020 0.000 0.020 0.000 :0(append) > > 1000 0.020 0.000 0.020 0.000 :0(range) > > 1 0.010 0.010 0.010 0.010 generic.py:510 > > (_stridesFromShape) > > 42/1 0.010 0.000 25.290 25.290 :1(?) > > > > but, to say the truth, I can't really see where the time is exactly > > consumed. Perhaps somebody with more experience can put more light on > > this? > > > > Another thing that I find intriguing has to do with Numeric and > > oprofile output. 
> > Let me remember:
> >
> > samples  %        image name           symbol name
> > 279      15.6478  libpthread-0.10.so   __pthread_alt_unlock
> > 216      12.1144  libc-2.3.2.so        memmove
> > 187      10.4879  python               lookdict_string
> > 162      9.0858   python               PyEval_EvalFrame
> > 144      8.0763   libpthread-0.10.so   __pthread_alt_lock
> > 126      7.0667   libpthread-0.10.so   __pthread_alt_trylock
> > 56       3.1408   python               PyDict_SetItem
> > 53       2.9725   libpthread-0.10.so   __GI___pthread_mutex_unlock
> > 45       2.5238   _numpy.so            PyArray_FromDimsAndDataAndDescr
> > 39       2.1873   libc-2.3.2.so        __malloc
> > 36       2.0191   libc-2.3.2.so        __cfree
> >
> > we can see that a lot of the time in the benchmark using Numeric is
> > consumed in libc space (a 37% or so). However, only a 16% is used in
> > memory-related tasks (memmove, malloc and free) while the rest seems
> > to be used in thread issues (??). Again, anyone can explain why the
> > pthread* routines take so many time, or why they appear here at all?.
> > Perhaps getting rid of these calls might improve the Numeric
> > performance even further.
> >
> > Cheers,
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
> Tool for open source databases. Create drag-&-drop reports. Save time
> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
> Download a FREE copy at http://www.intelliview.com/go/osdn_nl
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

-- 
>qo<  Francesc Altet     http://www.carabos.com/
V  V  Cárabos Coop. V.
      Enjoy Data
 ""

From jmiller at stsci.edu Fri Jan 28 15:02:18 2005
From: jmiller at stsci.edu (Todd Miller)
Date: Fri Jan 28 15:02:18 2005
Subject: [Numpy-discussion] Speeding up Numeric
In-Reply-To: <200501282327.21093.faltet@carabos.com>
References: <200501272136.07558.faltet@carabos.com> <1106912892.6118.25.camel@jaytmiller.comcast.net> <200501282327.21093.faltet@carabos.com>
Message-ID: <1106953001.7990.35.camel@halloween.stsci.edu>

Nice work! But... IIRC, there's a problem with moving __del__ down to C, possibly only for a --with-pydebug Python, I can't remember. It's a serious problem though... it dumps core. I'll try to see if I can come up with something conditionally compiled.

Related note to Make Todd's Life Easy: use "cvs diff -c" to make context diffs, which "patch" applies effortlessly.

Thanks for getting the ball rolling. 2x is nothing to sneeze at.

Todd

On Fri, 2005-01-28 at 17:27, Francesc Altet wrote:
> Hi Todd,
>
> Nice to see that you achieved a good speed-up with your
> optimization patch. With the following code:
>
> import numarray
> a = numarray.arange(2000)
> a.shape = (1000, 2)
> for j in xrange(1000):
>     for i in range(len(a)):
>         row = a[i]
>
> and the original numarray-1.1.1 it took 11.254s (pentium4 at 2GHz). With your
> patch, this time has been reduced to 7.816s. Now, following your
> suggestion to push NumArray.__del__ down into C, I've got a good
> speed-up as well: 5.332s. This is more than twice as fast as the
> unpatched numarray 1.1.1. There is still a long way until we can catch
> Numeric (1.123s), but it is a first step :)
>
> The patch.
Please, revise it as I'm not very used with dealing with > pure C extensions (just a Pyrex user): > > Index: Lib/numarraycore.py > =================================================================== > RCS file: /cvsroot/numpy/numarray/Lib/numarraycore.py,v > retrieving revision 1.101 > diff -r1.101 numarraycore.py > 696,699c696,699 > < def __del__(self): > < if self._shadows != None: > < self._shadows._copyFrom(self) > < self._shadows = None > --- > > def __del__(self): > > if self._shadows != None: > > self._shadows._copyFrom(self) > > self._shadows = None > Index: Src/_numarraymodule.c > =================================================================== > RCS file: /cvsroot/numpy/numarray/Src/_numarraymodule.c,v > retrieving revision 1.65 > diff -r1.65 _numarraymodule.c > 399a400,411 > > static void > > _numarray_dealloc(PyObject *self) > > { > > PyArrayObject *selfa = (PyArrayObject *) self; > > > > if (selfa->_shadows != NULL) { > > _copyFrom(selfa->_shadows, self); > > selfa->_shadows = NULL; > > } > > self->ob_type->tp_free(self); > > } > > > 421c433 > < 0, /* tp_dealloc */ > --- > > _numarray_dealloc, /* tp_dealloc */ > > > The profile with the new optimizations looks now like: > > samples % image name symbol name > 453 8.6319 python PyEval_EvalFrame > 372 7.0884 python lookdict_string > 349 6.6502 python string_hash > 271 5.1639 libc-2.3.2.so _wordcopy_bwd_aligned > 210 4.0015 libnumarray.so NA_updateStatus > 194 3.6966 python _PyString_Eq > 185 3.5252 libc-2.3.2.so __GI___strcasecmp > 162 3.0869 python subtype_dealloc > 158 3.0107 libc-2.3.2.so _int_malloc > 147 2.8011 libnumarray.so isBufferWriteable > 145 2.7630 python PyDict_SetItem > 135 2.5724 _ndarray.so _view > 131 2.4962 python PyObject_GenericGetAttr > 122 2.3247 python PyDict_GetItem > 100 1.9055 python PyString_InternInPlace > 94 1.7912 libnumarray.so getReadBufferDataPtr > 77 1.4672 _ndarray.so _simpleIndexingCore > > i.e. 
time spent in libc and libnumarray is going up in the list, as it > should. Now, we have to concentrate in other points of optimization. > Perhaps is a good time to have a try on recompiling the kernel and > getting the call tree... > > Cheers, > > A Divendres 28 Gener 2005 12:48, Todd Miller va escriure: > > I got some insight into what I think is the tall pole in the profile: > > sub-array creation is implemented using views. The generic indexing > > code does a view() Python callback because object arrays override view > > (). Faster view() creation for numerical arrays can be achieved like > > this by avoiding the callback: > > > > Index: Src/_ndarraymodule.c > > =================================================================== > > RCS file: /cvsroot/numpy/numarray/Src/_ndarraymodule.c,v > > retrieving revision 1.75 > > diff -c -r1.75 _ndarraymodule.c > > *** Src/_ndarraymodule.c 14 Jan 2005 14:13:22 -0000 1.75 > > --- Src/_ndarraymodule.c 28 Jan 2005 11:15:50 -0000 > > *************** > > *** 453,460 **** > > } > > } else { /* partially subscripted --> subarray */ > > long i; > > ! result = (PyArrayObject *) > > ! PyObject_CallMethod((PyObject *) > > self,"view",NULL); > > if (!result) goto _exit; > > > > result->nd = result->nstrides = self->nd - nindices; > > --- 453,463 ---- > > } > > } else { /* partially subscripted --> subarray */ > > long i; > > ! if (NA_NumArrayCheck((PyObject *)self)) > > ! result = _view(self); > > ! else > > ! result = (PyArrayObject *) PyObject_CallMethod( > > ! (PyObject *) self,"view",NULL); > > if (!result) goto _exit; > > > > result->nd = result->nstrides = self->nd - nindices; > > > > I committed the patch above to CVS for now. This optimization makes > > view() "non-overridable" for NumArray subclasses so there is probably a > > better way of doing this. > > > > One other thing that struck me looking at your profile, and it has been > > discussed before, is that NumArray.__del__() needs to be pushed (back) > > down into C. 
Getting rid of __del__ would also synergyze well with > > making an object freelist, one aspect of which is capturing unneeded > > objects rather than destroying them. > > > > Thanks for the profile. > > > > Regards, > > Todd > > > > On Thu, 2005-01-27 at 21:36 +0100, Francesc Altet wrote: > > > Hi, > > > > > > After a while of waiting for some free time, I'm playing myself with > > > the excellent oprofile, and try to help in reducing numarray creation. > > > > > > For that goal, I selected the next small benchmark: > > > > > > import numarray > > > a = numarray.arange(2000) > > > a.shape=(1000,2) > > > for j in xrange(1000): > > > for i in range(len(a)): > > > row=a[i] > > > > > > I know that it mixes creation with indexing cost, but as the indexing > > > cost of numarray is only a bit slower (perhaps a 40%) than Numeric, > > > while array creation time is 5 to 10 times slower, I think this > > > benchmark may provide a good starting point to see what's going on. > > > > > > For numarray, I've got the next results: > > > > > > samples % image name symbol name > > > 902 7.3238 python PyEval_EvalFrame > > > 835 6.7798 python lookdict_string > > > 408 3.3128 python PyObject_GenericGetAttr > > > 384 3.1179 python PyDict_GetItem > > > 383 3.1098 libc-2.3.2.so memcpy > > > 358 2.9068 libpthread-0.10.so __pthread_alt_unlock > > > 293 2.3790 python _PyString_Eq > > > 273 2.2166 libnumarray.so NA_updateStatus > > > 273 2.2166 python PyType_IsSubtype > > > 271 2.2004 python countformat > > > 252 2.0461 libc-2.3.2.so memset > > > 249 2.0218 python string_hash > > > 248 2.0136 _ndarray.so _universalIndexing > > > > > > while for Numeric I've got this: > > > > > > samples % image name symbol name > > > 279 15.6478 libpthread-0.10.so __pthread_alt_unlock > > > 216 12.1144 libc-2.3.2.so memmove > > > 187 10.4879 python lookdict_string > > > 162 9.0858 python PyEval_EvalFrame > > > 144 8.0763 libpthread-0.10.so __pthread_alt_lock > > > 126 7.0667 libpthread-0.10.so 
__pthread_alt_trylock > > > 56 3.1408 python PyDict_SetItem > > > 53 2.9725 libpthread-0.10.so __GI___pthread_mutex_unlock > > > 45 2.5238 _numpy.so > > > PyArray_FromDimsAndDataAndDescr 39 2.1873 libc-2.3.2.so > > > __malloc > > > 36 2.0191 libc-2.3.2.so __cfree > > > > > > one preliminary result is that numarray spends a lot more time in > > > Python space than do Numeric, as Todd already said here. The problem > > > is that, as I have not yet patched my kernel, I can't get the call > > > tree, and I can't look for the ultimate responsible for that. > > > > > > So, I've tried to run the profile module included in the standard > > > library in order to see which are the hot spots in python: > > > > > > $ time ~/python.nobackup/Python-2.4/python -m profile -s time > > > create-numarray.py > > > 1016105 function calls (1016064 primitive calls) in 25.290 CPU > > > seconds > > > > > > Ordered by: internal time > > > > > > ncalls tottime percall cumtime percall filename:lineno(function) > > > 1 19.220 19.220 25.290 25.290 create-numarray.py:1(?) > > > 999999 5.530 0.000 5.530 0.000 > > > numarraycore.py:514(__del__) 1753 0.160 0.000 0.160 0.000 > > > :0(eval) > > > 1 0.060 0.060 0.340 0.340 numarraycore.py:3(?) > > > 1 0.050 0.050 0.390 0.390 generic.py:8(?) > > > 1 0.040 0.040 0.490 0.490 numarrayall.py:1(?) > > > 3455 0.040 0.000 0.040 0.000 :0(len) > > > 1 0.030 0.030 0.190 0.190 > > > ufunc.py:1504(_makeCUFuncDict) 51 0.030 0.001 0.070 0.001 > > > ufunc.py:184(_nIOArgs) 3572 0.030 0.000 0.030 0.000 > > > :0(has_key) > > > 2582 0.020 0.000 0.020 0.000 :0(append) > > > 1000 0.020 0.000 0.020 0.000 :0(range) > > > 1 0.010 0.010 0.010 0.010 generic.py:510 > > > (_stridesFromShape) > > > 42/1 0.010 0.000 25.290 25.290 :1(?) > > > > > > but, to say the truth, I can't really see where the time is exactly > > > consumed. Perhaps somebody with more experience can put more light on > > > this? 
> > > > > > Another thing that I find intriguing has to do with Numeric and > > > oprofile output. Let me remember: > > > > > > samples % image name symbol name > > > 279 15.6478 libpthread-0.10.so __pthread_alt_unlock > > > 216 12.1144 libc-2.3.2.so memmove > > > 187 10.4879 python lookdict_string > > > 162 9.0858 python PyEval_EvalFrame > > > 144 8.0763 libpthread-0.10.so __pthread_alt_lock > > > 126 7.0667 libpthread-0.10.so __pthread_alt_trylock > > > 56 3.1408 python PyDict_SetItem > > > 53 2.9725 libpthread-0.10.so __GI___pthread_mutex_unlock > > > 45 2.5238 _numpy.so > > > PyArray_FromDimsAndDataAndDescr 39 2.1873 libc-2.3.2.so > > > __malloc > > > 36 2.0191 libc-2.3.2.so __cfree > > > > > > we can see that a lot of the time in the benchmark using Numeric is > > > consumed in libc space (a 37% or so). However, only a 16% is used in > > > memory-related tasks (memmove, malloc and free) while the rest seems > > > to be used in thread issues (??). Again, anyone can explain why the > > > pthread* routines take so many time, or why they appear here at all?. > > > Perhaps getting rid of these calls might improve the Numeric > > > performance even further. > > > > > > Cheers, > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > > Tool for open source databases. Create drag-&-drop reports. Save time > > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. 
> > Download a FREE copy at http://www.intelliview.com/go/osdn_nl
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--

From pearu at cens.ioc.ee Sun Jan 30 11:28:57 2005
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Sun Jan 30 11:28:57 2005
Subject: [Numpy-discussion] ANN: F2PY - Fortran to Python Interface Generator
Message-ID:

F2PY - Fortran to Python Interface Generator
--------------------------------------------

I am pleased to announce the ninth public release of F2PY, version 2.45.241_1926.

The purpose of the F2PY project is to provide the connection between the Python and Fortran programming languages. For more information, see

  http://cens.ioc.ee/projects/f2py2e/

Download:

  http://cens.ioc.ee/projects/f2py2e/2.x/F2PY-2-latest.tar.gz
  http://cens.ioc.ee/projects/f2py2e/2.x/F2PY-2-latest.win32.exe
  http://cens.ioc.ee/projects/f2py2e/2.x/scipy_distutils-latest.tar.gz
  http://cens.ioc.ee/projects/f2py2e/2.x/scipy_distutils-latest.win32.exe

What's new?
------------

* Added support for wrapping signed integers and for processing .pyf.src template files.

* F2PY Fortran objects have a _cpointer attribute holding a C pointer to the wrapped function or variable. When _cpointer is used as a callback argument, the overhead of the Python C/API is avoided, giving callback arguments the same performance as calling a Fortran or C function from Fortran or C, while retaining the flexibility of Python.

* Callback arguments can be built-in functions, Fortran objects, and CObjects (held by the _cpointer attribute, for instance).

* New attribute: ``intent(aux)`` to save parameter values.

* New command line switches: --help-link and --link-

* Numerous bug fixes. Support for the ``usercode`` statement has been improved.

* Documentation updates.

Enjoy,
	Pearu Peterson

---------------

F2PY 2.45.241_1926 - The Fortran to Python Interface Generator (30-Jan-05)

From andrewm at object-craft.com.au Sun Jan 30 16:57:34 2005
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Sun Jan 30 16:57:34 2005
Subject: [Numpy-discussion] 30% speedup when deactivating NumArray.__del__ !!!
In-Reply-To: <200501282223.56850.Norbert.Nemec.list@gmx.de>
References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> <200501282223.56850.Norbert.Nemec.list@gmx.de>
Message-ID: <20050131005420.D5C563C889@coffee.object-craft.com.au>

>This benchmark made me suspicious since I had already found it odd before that
>killing a numarray calculation with Ctrl-C nearly always gives a backtrace
>starting in __del__

Much of the Python machinery may have been torn down by the time your __del__ method is called while the interpreter is exiting (I'm assuming you're talking about a script, rather than interactive mode). Code should be prepared for anything to fail - it's quite common for parts of __builtins__ to have been disassembled, etc.

The language reference has this to say:

http://python.org/doc/2.3.4/ref/customization.html#l2h-174

    Warning: Due to the precarious circumstances under which __del__()
    methods are invoked, exceptions that occur during their execution are
    ignored, and a warning is printed to sys.stderr instead. Also, when
    __del__() is invoked in response to a module being deleted (e.g., when
    execution of the program is done), other globals referenced by the
    __del__() method may already have been deleted. For this reason,
    __del__() methods should do the absolute minimum needed to maintain
    external invariants. Starting with version 1.5, Python guarantees that
    globals whose name begins with a single underscore are deleted from
    their module before other globals are deleted; if no other references
    to such globals exist, this may help in assuring that imported modules
    are still available at the time when the __del__() method is called.
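Both __del__ caveats in play here -- exceptions raised inside a finalizer being swallowed, and reference cycles involving finalizers -- can be demonstrated in a few lines. Note that this sketch uses modern Python semantics, which differ from the 2.3 docs quoted in this thread: since PEP 442 (Python 3.4) cycles whose members define __del__ *are* collected rather than parked in gc.garbage, and the swallowed exception is reported through sys.unraisablehook rather than printed straight to stderr:

```python
import gc
import sys

events = []

# Caveat 1: an exception raised inside __del__ never propagates to the
# caller; Python 3 hands it to sys.unraisablehook instead.
sys.unraisablehook = lambda unraisable: events.append(str(unraisable.exc_value))

class Noisy:
    def __del__(self):
        events.append("finalized")
        raise RuntimeError("boom")   # swallowed, reported via the hook

n = Noisy()
del n                                # __del__ runs; no exception escapes

# Caveat 2: a reference cycle whose members define __del__.  Under the
# Python 2.3 rules quoted in this thread the whole cycle would have been
# uncollectable; since Python 3.4 the collector frees it, although the
# order in which the two finalizers run is still unspecified.
a, b = Noisy(), Noisy()
a.other, b.other = b, a              # build the cycle
del a, b
gc.collect()

print(events.count("finalized"), len(gc.garbage))   # -> 3 0
```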
Another important caveat of classes with __del__ methods is mentioned in the library reference for the "gc" module: http://python.org/doc/2.3.4/lib/module-gc.html Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn't collect such cycles automatically because, in general, it isn't possible for Python to guess a safe order in which to run the __del__() methods. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From jmiller at stsci.edu Mon Jan 31 10:30:21 2005 From: jmiller at stsci.edu (Todd Miller) Date: Mon Jan 31 10:30:21 2005 Subject: [Numpy-discussion] 30% speedup when deactivating NumArray.__del__ !!! In-Reply-To: <20050131005420.D5C563C889@coffee.object-craft.com.au> References: <1106601157.5361.42.camel@jaytmiller.comcast.net> <200501272136.07558.faltet@carabos.com> <200501282223.56850.Norbert.Nemec.list@gmx.de> <20050131005420.D5C563C889@coffee.object-craft.com.au> Message-ID: <1107196116.8508.154.camel@halloween.stsci.edu> Thanks Andrew, that was a useful summary. I wish I had more time to work on optimizing numarray personally, but I don't. Instead I'll try to share what I know of the state of __del__/tp_dealloc so that people who want to work on it can come up with something better: 1. We need __del__/tp_dealloc. (This may be controversial but I hope not). Using the destructor makes the high level C-API cleaner. Getting rid of it means changing the C-API. __del__/tp_dealloc is used to transparently copy the contents of a working array back onto an ill-behaved (byteswapped, etc...) source array at extension function exit time. 2. There's a problem with the tp_dealloc I originally implemented which causes it to segfault for a ./configure'ed --with-pydebug Python. Looking at it today, it looks like it may be an exit-time garbage collection problem. 
There is no explicit garbage collection support in _numarray or _ndarray, so that may be the problem. 3. We're definitely not exploiting the "single underscore rule" yet. We use single underscores mostly to hide globals from module export. I don't think this is really critical, but that's the state of things. 4. Circular references should only be a problem for numerical arrays with "user introduced" cycles. numarray ObjectArrays have no __del__. I attached a patch against CVS that reinstates the old tp_dealloc; this shows where I left off in case someone has insight on how to fix it. I haven't tested it recently for a non-debug Python; I think it works. The patch segfaults after the C-API examples/selftest for debug Pythons: % python setup.py install --selftest Using EXTRA_COMPILE_ARGS = [] running install running build running build_py copying Lib/numinclude.py -> build/lib.linux-i686-2.4/numarray running build_ext running install_lib copying build/lib.linux-i686-2.4/numarray/numinclude.py -> /home/jmiller/work/lib/python2.4/site-packages/numarray byte-compiling /home/jmiller/work/lib/python2.4/site-packages/numarray/numinclude.py to numinclude.pyc running install_headers copying Include/numarray/numconfig.h -> /home/jmiller/work/include/python2.4/numarray running install_data Testing numarray 1.2a on Python (2, 4, 0, 'final', 0) numarray.numtest: ((0, 1231), (0, 1231)) numarray.ieeespecial: (0, 20) numarray.records: (0, 48) numarray.strings: (0, 186) numarray.memmap: (0, 82) numarray.objects: (0, 105) numarray.memorytest: (0, 16) numarray.examples.convolve: ((0, 20), (0, 20), (0, 20), (0, 20)) Segmentation fault (core dumped) That's all I can add for now. 
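For anyone unfamiliar with the copy-back contract in point 1, it can be sketched in pure Python (the classes below are hypothetical stand-ins for illustration, not numarray's actual implementation):

```python
class FakeArray:
    """Hypothetical stand-in for a numarray buffer with shadow semantics."""
    def __init__(self, data, shadows=None):
        self.data = list(data)
        self._shadows = shadows      # the ill-behaved source array, if any

    def _copyFrom(self, other):
        self.data[:] = other.data

    def __del__(self):
        # The copy-back NumArray.__del__ performs: push the working
        # copy's contents back onto the byteswapped/misaligned source.
        if self._shadows is not None:
            self._shadows._copyFrom(self)
            self._shadows = None

source = FakeArray([1, 2, 3])                  # stands in for a byteswapped array
work = FakeArray([7, 8, 9], shadows=source)    # well-behaved working copy
del work                                       # destructor copies back
print(source.data)                             # -> [7, 8, 9]
```

Whether this runs in a Python-level __del__ or a C-level tp_dealloc, the extension-exit-time copy-back is the invariant the destructor must preserve.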
Regards, Todd On Mon, 2005-01-31 at 11:54 +1100, Andrew McNamara wrote: > >This benchmark made me suspicious since I had already found it odd before that > >killing a numarray calculation with Ctrl-C nearly always gives a backtrace > >starting in __del__ > > Much of the python machinery may have been torn down when your __del__ > method is called while the interpreter is exiting (I'm asuming you're > talking about a script, rather than interactive mode). Code should > be prepared for anything to fail - it's quite common for parts of > __builtins__ to have been disassembled, etc. > > The language reference has this to say: > > http://python.org/doc/2.3.4/ref/customization.html#l2h-174 > > Warning: Due to the precarious circumstances under which __del__() > methods are invoked, exceptions that occur during their execution > are ignored, and a warning is printed to sys.stderr instead. Also, > when __del__() is invoked in response to a module being deleted (e.g., > when execution of the program is done), other globals referenced by > the __del__() method may already have been deleted. For this reason, > __del__() methods should do the absolute minimum needed to maintain > external invariants. Starting with version 1.5, Python guarantees that > globals whose name begins with a single underscore are deleted from > their module before other globals are deleted; if no other references > to such globals exist, this may help in assuring that imported modules > are still available at the time when the __del__() method is called. > > Another important caveat of classes with __del__ methods is mentioned in > the library reference for the "gc" module: > > http://python.org/doc/2.3.4/lib/module-gc.html > > Objects that have __del__() methods and are part of a reference > cycle cause the entire reference cycle to be uncollectable, > including objects not necessarily in the cycle but reachable only > from it. 
Python doesn't collect such cycles automatically because, > in general, it isn't possible for Python to guess a safe order in > which to run the __del__() methods. -------------- next part -------------- ? Lib/numinclude.py ? Lib/ufunc.warnings ? Lib/codegenerator/basecode.pyc ? Lib/codegenerator/bytescode.pyc ? Lib/codegenerator/convcode.pyc ? Lib/codegenerator/sortcode.pyc ? Lib/codegenerator/template.pyc ? Lib/codegenerator/ufunccode.pyc ? Src/_ufuncmodule.new Index: Lib/numarraycore.py =================================================================== RCS file: /cvsroot/numpy/numarray/Lib/numarraycore.py,v retrieving revision 1.101 diff -c -r1.101 numarraycore.py *** Lib/numarraycore.py 25 Jan 2005 11:25:09 -0000 1.101 --- Lib/numarraycore.py 31 Jan 2005 16:36:47 -0000 *************** *** 693,703 **** v._byteorder = self._byteorder return v - def __del__(self): - if self._shadows != None: - self._shadows._copyFrom(self) - self._shadows = None - def __getstate__(self): """returns state of NumArray for pickling.""" # assert not hasattr(self, "_shadows") # Not a good idea for pickling. --- 693,698 ---- Index: Src/_numarraymodule.c =================================================================== RCS file: /cvsroot/numpy/numarray/Src/_numarraymodule.c,v retrieving revision 1.65 diff -c -r1.65 _numarraymodule.c *** Src/_numarraymodule.c 5 Jan 2005 19:49:02 -0000 1.65 --- Src/_numarraymodule.c 31 Jan 2005 16:36:47 -0000 *************** *** 105,128 **** } static PyObject * ! _numarray_shadows_get(PyArrayObject *self) { ! if (self->_shadows) { ! Py_INCREF(self->_shadows); ! return self->_shadows; ! } else { ! Py_INCREF(Py_None); ! return Py_None; } } ! static int ! _numarray_shadows_set(PyArrayObject *self, PyObject *s) { ! Py_XDECREF(self->_shadows); ! if (s) Py_INCREF(s); ! self->_shadows = s; ! return 0; } static PyObject * --- 105,138 ---- } static PyObject * ! _numarray_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { ! PyArrayObject *self; ! 
self = (PyArrayObject *) ! _numarray_type.tp_base->tp_new(type, args, kwds); ! if (!self) return NULL; ! if (!(self->descr = PyArray_DescrFromType( tAny))) { ! PyErr_Format(PyExc_RuntimeError, ! "_numarray_new: bad type number"); ! return NULL; } + return (PyObject *) self; } ! static void ! _numarray_dealloc(PyObject *self) { ! PyArrayObject *me = (PyArrayObject *) self; ! Py_INCREF(self); ! if (me->_shadows) { ! PyObject *result = PyObject_CallMethod(me->_shadows, ! "_copyFrom", "(O)", self); ! Py_XDECREF(result); /* Should be None. */ ! Py_DECREF(me->_shadows); ! me->_shadows = NULL; ! } ! self->ob_refcnt = 0; ! _numarray_type.tp_base->tp_dealloc(self); } static PyObject * *************** *** 218,226 **** } static PyGetSetDef _numarray_getsets[] = { - {"_shadows", - (getter)_numarray_shadows_get, - (setter) _numarray_shadows_set, "numeric shadows object"}, {"_type", (getter)_numarray_type_get, (setter) _numarray_type_set, "numeric type object"}, --- 228,233 ---- *************** *** 418,424 **** "numarray._numarray._numarray", sizeof(PyArrayObject), 0, ! 0, /* tp_dealloc */ 0, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ --- 425,431 ---- "numarray._numarray._numarray", sizeof(PyArrayObject), 0, ! _numarray_dealloc, /* tp_dealloc */ 0, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ *************** *** 452,458 **** 0, /* tp_dictoffset */ (initproc)_numarray_init, /* tp_init */ 0, /* tp_alloc */ ! 0, /* tp_new */ }; typedef void Sigfunc(int); --- 459,465 ---- 0, /* tp_dictoffset */ (initproc)_numarray_init, /* tp_init */ 0, /* tp_alloc */ ! _numarray_new, /* tp_new */ }; typedef void Sigfunc(int); From rays at blue-cove.com Mon Jan 31 16:58:29 2005 From: rays at blue-cove.com (Ray S) Date: Mon Jan 31 16:58:29 2005 Subject: [Numpy-discussion] Numeric array's actual data address? 
Message-ID: <5.2.0.4.2.20050131165659.12d54640@blue-cove.com> If I have an array N: >>> N = Numeric.zeros((1000,), Numeric.Float) >>> repr(N.__copy__) '' What is the actual address of the first element? Or, as an offset from the object? numarray gives us that: >>> N = numarray.zeros((1000,), numarray.Float) >>> N.info() class: shape: (1000,) strides: (8,) byteoffset: 0 bytestride: 8 itemsize: 8 aligned: 1 contiguous: 1 data: byteorder: little byteswap: 0 type: Float64 In numarray, the offset appears to be 20. If I try to use memmove() to fill a Numeric array it faults when using an offset of 20... Ray From simon at arrowtheory.com Mon Jan 31 17:21:23 2005 From: simon at arrowtheory.com (Simon Burton) Date: Mon Jan 31 17:21:23 2005 Subject: [Numpy-discussion] pyrex numarray Message-ID: <20050201121626.6150f9ff.simon@arrowtheory.com> Has anyone considered using pyrex to implement numarray ? I don't know a lot of the details but it seems to me that pyrex could unify numarray's python/c source code mixture and smooth the transition from python (==untyped pyrex) code to c (==typed pyrex) code. It would also help clueless users like me understand, and perhaps contribute to, the codebase. Also, the pyrex people have talked about applying this technique to the standard python libraries, both for readability and speed. ciao, Simon.
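As a point of comparison for the data-address question above, the stdlib array module exposes exactly this via buffer_info(); the sketch below uses ctypes (which postdates this thread) to show the returned address really is the first element's location. This illustrates the idea only -- it is not Numeric's or numarray's API:

```python
import array
import ctypes

N = array.array("d", [0.0] * 1000)   # stand-in for Numeric.zeros((1000,), Float)
addr, nitems = N.buffer_info()       # address of item 0, number of items
print(hex(addr), nitems)

# Poke element 0 through the raw address and observe the change via the array.
ctypes.c_double.from_address(addr).value = 3.5
print(N[0])                          # -> 3.5
```

Anything copying bytes to such an address (memmove-style) must target the data pointer itself, not the Python object's base address plus a guessed header offset.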