Numeric to numarray experiences
Hi there,

I've just translated a package for molecular modelling, which makes extensive use of Numeric, from Numeric to numarray. The outcome is somewhat negative: for now we are basically going to postpone the transition. The reasons might be interesting for the list and for the numarray developers out there (who are doing a brave job!).

Speed:
A typical task in our package is the least-squares fitting of a large array of coordinate frames (N1 x N2 x 3) onto a set of reference or average coordinates (using a subset of coordinates for the matching). The example I looked at (500 x 876 x 3 items) took 1.3 s with Numeric and 4.7 s with numarray. The main culprits for the slow-down were:

* compress() - factor 10
* average() - factor 7 (average() is missing from numarray, so I had to write a little function myself)
* LinearAlgebra.singular_value_decomposition() - factor 10

but a lot of extra time is also spent in ufunc.py and various numarraycore.py routines.

Memory efficiency:
I hoped numarray would solve some of the out-of-memory problems that I get with Numeric, but it turns out that it is rather less memory efficient for my kind of application. Slicing an array that takes up 800 MB on disc just about runs through with Numeric (with heavy swapping) but gives an out-of-memory error with numarray.

Suggestions:
OK, it's easy to make clever comments without contributing any real work...

- compress(), take(), etc. really need some optimization
- a C-coded average() routine would be helpful
- faster LinearAlgebra routines are necessary

Our sysadmin noted that unlike Numeric, numarray is not using any external math libraries (like LAPACK) that have been speed-optimized for decades and are available in CPU-optimized variants (e.g. ATLAS). It's probably difficult to match this efficiency with any new code ...

Greetings
Raik

PS: I didn't find any useful HowTo for the translation from Numeric to numarray. The practical issues were the different nonzero() return value, the more restrictive boolean comparison, the fact that take() no longer supports 'O' arrays, and the missing average().

--
-----------------------------------------------------
Raik Grünberg | Bioinformatique Structurale | Institut Pasteur | Paris, France
-----------------------------------------------------
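For readers unfamiliar with the task described above, here is a minimal sketch of a least-squares fit of one coordinate frame onto a reference, using a subset of atoms for the matching. It is not the code from the package discussed in this thread. It assumes modern NumPy in place of Numeric or numarray, and the names average(), fit_frame() and mask are hypothetical. It uses the three operations named in the message: compress(), an average written by hand, and a singular value decomposition.

```python
import numpy as np

def average(a, axis=0):
    """Mean along one axis, written with sum() as a stand-in for a missing average()."""
    a = np.asarray(a, dtype=float)
    return a.sum(axis=axis) / a.shape[axis]

def fit_frame(frame, ref, mask):
    """Least-squares superposition of one (N2 x 3) frame onto ref.

    Only the atoms selected by the boolean mask are used for the matching;
    the resulting rotation and translation are applied to all atoms.
    """
    x = np.compress(mask, frame, axis=0)   # matching subset of the frame
    y = np.compress(mask, ref, axis=0)     # matching subset of the reference
    cx, cy = average(x), average(y)        # centroids of the two subsets
    # SVD of the 3 x 3 covariance matrix of the centred subsets
    u, s, vt = np.linalg.svd((x - cx).T @ (y - cy))
    # flip the last axis if needed so that the result is a proper rotation
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return (frame - cx) @ rot + cy
```

For an N1 x N2 x 3 array of frames, fit_frame() would be called once per frame, so the per-call cost of compress(), the average and the SVD is multiplied by N1.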
On Tue, 2004-10-05 at 10:41, Raik Grünberg wrote:
Our sysadmin noted that unlike Numeric, numarray is not using any external math libraries (like LAPACK) that have been speed-optimized for decades and are available in CPU-optimized variants (e.g. ATLAS). It's probably difficult to match this efficiency with any new code ...
This is a key point. Have a look at addons.py in numarray, some
previous comments on this list, and build numarray with the line
env USE_LAPACK=1 python setup.py build
after editing addons.py appropriately. You should see a major speed
improvement.
--
Stephen Walton
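One way to check whether a rebuild actually changes the speed is to time the same call before and after. The sketch below is only an illustration and is not taken from this thread: best_time() is a hypothetical helper, and np.linalg.svd from modern NumPy stands in for numarray's LinearAlgebra.singular_value_decomposition(), which would be passed in its place. The matrix size is an arbitrary choice.

```python
import time
import numpy as np

def best_time(func, *args, repeat=5):
    """Return the shortest wall-clock time of `repeat` calls to func(*args)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Arbitrary test matrix; substitute the data from the real workload.
m = np.random.default_rng(0).normal(size=(200, 200))
print("best of 5: %.6f s" % best_time(np.linalg.svd, m))
```

Taking the minimum of several runs reduces the influence of other processes on the measurement.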
On Tuesday 05 October 2004 03:17 pm, Stephen Walton wrote:
This is a key point. Have a look at addons.py in numarray, some previous comments on this list, and build numarray with the line
env USE_LAPACK=1 python setup.py build
after editing addons.py appropriately. You should see a major speed improvement.
I would kindly suggest updating the numarray documentation. In the section on installation, it is easy to overlook the option to compile against existing libraries. That is explained in section 16, which appears to be out of date. The code listed in Packages/LinearAlgebra2/setup.py has been moved to addons.py, correct?

--
Darren
On Tue, 2004-10-05 at 16:00, Darren Dale wrote:
I would kindly suggest updating the numarray documentation.
Thanks, will do.
In the section on installation, it is easy to overlook the option to compile against existing libraries. That is explained in section 16, which appears to be out of date. The code listed in Packages/LinearAlgebra2/setup.py has been moved to addons.py, correct?
That's correct.

Regards,
Todd
I hadn't seen this until now. It's hard for us to understand exactly the reasons for the slower performance with such large arrays. Could you send us the code and an indication of what inputs and parameters were used, so we can try to figure out why some of these problems exist? (We can check the specific functions you mention, but I want to make sure you aren't iterating over array slices or some such.) It's not obvious to me why you are having out-of-memory errors, and this may help.

Perry Greenfield
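The concern about iterating over array slices can be illustrated with a small sketch. It assumes modern NumPy rather than numarray, and the function names are hypothetical. Both functions compute the centroid of a selected subset of atoms for every frame of an N1 x N2 x 3 array. The first performs one compress() and one mean per frame, so any fixed per-call overhead is paid N1 times. The second does the same work with one call of each on the whole array.

```python
import numpy as np

def centroids_loop(frames, mask):
    """One compress() and one mean per frame: N1 small array operations."""
    return np.array([np.compress(mask, f, axis=0).mean(axis=0) for f in frames])

def centroids_whole(frames, mask):
    """A single compress() along the atom axis followed by a single mean."""
    return np.compress(mask, frames, axis=1).mean(axis=1)
```

Both return an N1 x 3 array. If a library has a high fixed cost per array operation, the loop version is the one that suffers most.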
participants (5)
- Darren Dale
- Perry Greenfield
- Raik Grünberg
- Stephen Walton
- Todd Miller