[Numpy-discussion] slow import of numpy modules

Robert Kern robert.kern at gmail.com
Thu Jul 3 02:25:10 EDT 2008


On Wed, Jul 2, 2008 at 23:56, David Cournapeau
<cournapeau at cslab.kecl.ntt.co.jp> wrote:
> On Wed, 2008-07-02 at 23:36 -0500, Robert Kern wrote:
>>
>> Neither one has participated in this thread. At least, no such email
>> has made it to my inbox.
>
> This was in the '"import numpy" is slow' thread; I mixed the two up, sorry.

Ah yes. It's probably not worth tracking down.

>> I think it's worth moving these imports into the functions, then.
>
> Ok, will do it, then.

I've checked in your changes with some modifications to the comments.
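
For anyone following along, the change is simply to defer each import
into the function that needs it. Roughly like this, with illustrative
stand-in names rather than the actual numpy code:

    # Module-level import: the cost is paid at "import numpy" time even
    # if describe() is never called.
    #
    #   import inspect
    #   def describe(obj):
    #       return inspect.getdoc(obj)

    # Deferred import: the cost is paid only on the first call.
    def describe(obj):
        import inspect   # stand-in for whichever heavy module is deferred
        return inspect.getdoc(obj)

After the first call, sys.modules makes the repeated import statement
essentially a dictionary lookup, so there is no per-call penalty worth
worrying about.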

>> > Then, something which takes an awful lot of time is finfo to get
>> > floating-point limits. This takes like 30-40 ms. I wonder if there are
>> > some ways to make it faster. After that, there is no obvious spot I
>> > remember, but I can get them tonight when I go back to my lab.
>>
>> They can all be turned into properties that look up in a cache first.
>> iinfo already does this.
>
> Yes, it is cached, but the first run is slow. As it is used as a default
> argument in numpy.ma.extras, it is run when you import numpy. Just to
> check, I set the default argument to None, and now import numpy takes
> ~85ms instead of 180ms. 40ms to get the tiny attribute of float sounds
> slow, but maybe there is no way around it (perhaps MachAr can be sped up
> a bit, but that looks like quite sensitive code).
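
The caching itself is cheap once the first computation has happened;
conceptually it is just a dtype-keyed dictionary in front of the
expensive work, something like this sketch (illustrative only, not the
actual finfo implementation):

    import numpy as np

    _cache = {}

    def machine_limits(dtype):
        # The expensive MachAr-style computation runs only on the first
        # lookup for a given dtype; later calls are dictionary hits.
        dtype = np.dtype(dtype)
        if dtype not in _cache:
            _cache[dtype] = np.finfo(dtype)   # slow the first time only
        return _cache[dtype]

So the problem isn't repeated calls; it's that the very first call
currently happens at import time.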

So here are all of the places that use the computed finfo values on
the module level:

  numpy.lib.polynomial:
    Makes _single_eps and _double_eps global but only uses them inside
    functions.
  numpy.ma.extras:
    Just imports _single_eps and _double_eps from numpy.lib.polynomial
    but uses them inside functions.
  numpy.ma.core:
    Makes a global divide_tolerance that is used as a default in a
    constructor. This class is then instantiated at module level.

I think the first two are easily replaced by actual calls to finfo()
inside their functions. Because of the _finfo_cache, this should be
fast enough, and it makes the code cleaner. Currently,
numpy.lib.polynomial and numpy.ma.extras go through if tests to
determine which of _single_eps and _double_eps to use.
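
Roughly, the replacement looks like this (a sketch with an illustrative
function name, not the committed code):

    import numpy as np

    def _choose_rcond(x):
        # x is assumed to be a float ndarray.
        # Old: pick between module-level _single_eps / _double_eps with
        # an if test on the dtype.
        # New: ask finfo directly; _finfo_cache makes repeat calls cheap.
        return len(x) * np.finfo(x.dtype).eps

The if test disappears because finfo already knows the right eps for
the array's dtype.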

The last one is a bit tricky. I've pushed the computation down into
the actual __call__ where it is used, and the result is cached there.
It's not ideal, but it works. I hope this is acceptable, Pierre.
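
The shape of the change is something like this sketch (the pattern,
not the exact numpy.ma.core code):

    import numpy as np

    class DomainSafeDivide(object):
        """Mark where a/b is unsafe because b is too close to zero."""

        def __init__(self, tolerance=None):
            # Don't compute finfo here: an instance is created at module
            # level, so that would put the cost back into "import numpy".
            self.tolerance = tolerance

        def __call__(self, a, b):
            if self.tolerance is None:
                # The first call pays for finfo; the result is kept on
                # the instance for later calls.
                self.tolerance = np.finfo(float).tiny
            return np.absolute(a) * self.tolerance >= np.absolute(b)

Because the module-level instance no longer touches finfo in its
constructor, the cost moves from import time to the first division.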

Here are my timings (OS X 10.5.3 on Intel with numpy.linalg linked to
Accelerate.framework; warm disk caches; taking the consistent user and
system times from repeated executions; I'd ignore the wall-clock
time):

Before:
$ time python -c "import numpy"
python -c "import numpy"  0.30s user 0.82s system 91% cpu 1.232 total

Removal of finfo:
$ time python -c "import numpy"
python -c "import numpy"  0.27s user 0.82s system 94% cpu 1.156 total

Removal of finfo and delayed imports:
$ time python -c "import numpy"
python -c "import numpy"  0.19s user 0.56s system 93% cpu 0.811 total


Not too shabby. Anyway, I've checked it all in.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


