
3) Use oprofile (http://oprofile.sourceforge.net/), which runs on Linux on an x86 processor. This is the approach that I've used here. oprofile is a combination of a kernel module for Linux, a daemon for collecting sample data, and several tools to analyse the samples. It periodically polls the processor performance counters, and records which code is running. It's a system-level profiler: it profiles _everything_ that's running on the system. One obstacle is that it does require root access.
This looked fantastic so I tried it over the weekend. On Fedora Core 3, I couldn't get any information about numarray runtime (in the shared libraries), only Python. Ditto with Numeric, although from your post you apparently got great results including information on Numeric .so's. I'm curious: has anyone else tried this for numarray (or Numeric) on Fedora Core 3? Does anyone have a working profile script?
Numeric is faster (with the check_array() feature deletion) than numarray from CVS, but there seems to be a regression (in numarray performance).
Don't take this the wrong way, but how confident are you that the speed differences are real? (With my own benchmarking numbers, there is always too much fuzz to split hairs like this.)
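One rough way to put a number on that fuzz is to repeat the timing several times and compare the run-to-run spread with the difference being claimed. The sketch below uses only the standard library; the helper name `timing_spread` and the `workload` function are hypothetical stand-ins, not part of the benchmark discussed in this thread.

```python
import timeit


def timing_spread(func, repeat=7, number=1000):
    """Time func repeatedly; return (best, worst, relative_spread)."""
    # Each entry is the total wall time for `number` calls of func.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    best, worst = min(totals), max(totals)
    # Relative spread: how far the slowest run sits above the fastest one.
    spread = (worst - best) / best
    # Report per-call times alongside the spread.
    return best / number, worst / number, spread


def workload():
    # Hypothetical stand-in for the array addition being benchmarked.
    return [x + y for x, y in zip(range(100), range(100))]


if __name__ == "__main__":
    best, worst, spread = timing_spread(workload)
    print("best %.3g s, worst %.3g s, spread %.1f%%"
          % (best, worst, 100 * spread))
```

If the spread comes out comparable to the gap between two libraries, that gap is probably within the noise; the minimum over the repeats is usually the more stable figure to compare.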
Without check_array, Numeric is almost as fast as numarray 1.1.1.
Remarks
-------
- I'd rather have my speed than checks for NaN's. Have that in a separate function (I'm willing to write one), or do numarray-style processor flag checks (tougher).
- General plea: *please*, *please*, when releasing a library for which speed is a selling point, profile it first!
- Doing the same profiling on numarray finds 15% of the time actually adding, 65% somewhere in Python, and 15% in libc.
Part of this is because the numarray number protocol is still in Python.
- I'm still fiddling. Using the three-argument form of Numeric.add (so
add(a,b) and add(a,b,c) are what I've focused on for profiling numarray until the number protocol is moved to C. I've held off doing that because the numarray number protocol is complicated by subclassing issues I'm not sure are fully resolved.

Regards,
Todd