On Wed, Jul 2, 2008 at 23:56, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
On Wed, 2008-07-02 at 23:36 -0500, Robert Kern wrote:
Neither one has participated in this thread. At least, no such email has made it to my inbox.
This was in the '"import numpy" is slow' thread; I mixed the two up, sorry.
Ah yes. It's probably not worth tracking down.
I think it's worth moving these imports into the functions, then.
Ok, will do it, then.
I've checked in your changes with some modifications to the comments.
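For the archives, the change is just the standard deferred-import idiom; ctypes below is only a stand-in for whichever heavy module is being deferred:

    # Module-level form: the import runs as soon as the package is imported.
    #     import ctypes
    #
    # Deferred form: the first call pays the import cost; later calls find
    # the module already in sys.modules, so the overhead is a dict lookup.
    def wrap(value):
        import ctypes  # deferred import
        return ctypes.c_double(value)

    print(wrap(1.5))  # the import happens here, not at package-import time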
Then, something which takes an awful lot of time is finfo, used to get the floating point limits; it takes around 30-40 ms. I wonder if there is some way to make it faster. After that, there is no other obvious spot I remember, but I can look for more tonight when I go back to my lab.
They can all be turned into properties that look up in a cache first. iinfo already does this.
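The cache-first property pattern is roughly this shape (a from-scratch sketch, not the actual iinfo source; the class and the placeholder computation are made up):

    class iinfo_like(object):
        # Class-level cache shared by all instances, keyed by the kind code.
        _max_cache = {}

        def __init__(self, kind):
            self.kind = kind

        @property
        def max(self):
            # Look in the cache first; only compute on a miss.
            try:
                return self._max_cache[self.kind]
            except KeyError:
                val = self._compute_max()  # stands in for the expensive part
                self._max_cache[self.kind] = val
                return val

        def _compute_max(self):
            # e.g. 'i4' -> 2**31 - 1
            return 2 ** (8 * int(self.kind[1:]) - 1) - 1

    print(iinfo_like('i4').max)  # computed once; later lookups are dict hits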
Yes, it is cached, but the first run is slow. As it is used as a default argument in numpy.ma.extras, it runs when you import numpy. Just to check, I set the default argument to None, and import numpy now takes ~85 ms instead of 180 ms. 40 ms to get the tiny attribute of float sounds slow, but maybe there is no way around it (MachAr could perhaps be sped up a bit, but that looks like quite sensitive code).
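i.e. the usual lazy-default idiom, sketched with a made-up function (clip_small is not the real numpy.ma.extras code):

    import numpy as np

    # The old form computed the default in the signature, at import time:
    #     def clip_small(a, tol=np.finfo(float).tiny): ...
    # A None default moves that work to the first call:
    def clip_small(a, tol=None):
        if tol is None:
            tol = np.finfo(float).tiny  # finfo caches this after the first call
        a = np.asarray(a)
        return np.where(np.abs(a) < tol, 0.0, a)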
So here are all of the places that use the computed finfo values at the module level:

- numpy.lib.polynomial: makes _single_eps and _double_eps global but only uses them inside functions.
- numpy.ma.extras: just imports _single_eps and _double_eps from numpy.lib.polynomial but uses them inside functions.
- numpy.ma.core: makes a global divide_tolerance that is used as a default in a constructor. This class is then instantiated at the module level.

I think the first two are easily replaced by actual calls to finfo() inside their functions. Because of the _finfo_cache, this should be fast enough, and it makes the code cleaner; currently numpy.lib.polynomial and numpy.ma.extras go through if-tests to determine which of _single_eps and _double_eps to use.

The last one is a bit tricky. I've pushed the computation down into the actual __call__ where it is used, and it caches the result. It's not ideal, but it works. I hope this is acceptable, Pierre.

Here are my timings (Intel OS X 10.5.3 with numpy.linalg linked to Accelerate.framework; warm disk caches; taking the consistent user and system times from repeated executions; I'd ignore the wall-clock time):

Before:

    $ time python -c "import numpy"
    python -c "import numpy"  0.30s user 0.82s system 91% cpu 1.232 total

Removal of finfo:

    $ time python -c "import numpy"
    python -c "import numpy"  0.27s user 0.82s system 94% cpu 1.156 total

Removal of finfo and delayed imports:

    $ time python -c "import numpy"
    python -c "import numpy"  0.19s user 0.56s system 93% cpu 0.811 total

Not too shabby. Anyways, I've checked it all in.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco