Hi, I was wondering if there was any way to speed up the global import of numpy modules. A simple import numpy takes ~250 ms; in comparison, importing Numeric takes only ~40 ms. It appears that even if you only import a numpy submodule, it loads all the libraries, resulting in a painful performance hit. Are there plans to speed up the importing of numpy, or at least to have it not load libraries that aren't requested? Nathan
On Wed, Jul 2, 2008 at 17:43, Nathan Jensen <Nathan_Jensen@raytheon.com> wrote:
Hi,
I was wondering if there was any way to speed up the global import of numpy modules. A simple import numpy takes ~250 ms; in comparison, importing Numeric takes only ~40 ms. It appears that even if you only import a numpy submodule, it loads all the libraries, resulting in a painful performance hit. Are there plans to speed up the importing of numpy,
I am not sure how much is possible.
or at least have it not load libraries that aren't requested?
At this point in time, it is too late to make such sweeping changes to the API. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 2 Jul 2008, at 3:59 PM, Robert Kern wrote:
On Wed, Jul 2, 2008 at 17:43, Nathan Jensen <Nathan_Jensen@raytheon.com> wrote:
Hi,
I was wondering if there was any way to speed up the global import of numpy modules. A simple import numpy takes ~250 ms; in comparison, importing Numeric takes only ~40 ms. It appears that even if you only import a numpy submodule, it loads all the libraries, resulting in a painful performance hit. Are there plans to speed up the importing of numpy,
I am not sure how much is possible.
or at least have it not load libraries that aren't requested?
At this point in time, it is too late to make such sweeping changes to the API.
One could use an environment variable such as NUMPY_SUPPRESS_TOP_LEVEL_IMPORTS that, if defined, suppresses the importing of unneeded packages. This would only affect systems that define this variable, thus not breaking the API but providing the flexibility for those that need it. (This or a similar variable could also contain a list of the numpy components to import automatically.) If you want to try this, just modify numpy/__init__.py with something like the following:

    import os
    fast_import = 'NUMPY_SUPPRESS_TOP_LEVEL_IMPORTS' in os.environ
    del os

    if fast_import:
        <customised imports>
    else:
        <standard imports etc.>

    del fast_import

Michael.
On Wed, 2008-07-02 at 17:00 -0700, Michael McNeil Forbes wrote:
One could use an environment variable such as NUMPY_SUPPRESS_TOP_LEVEL_IMPORTS that, if defined, suppresses the importing of unneeded packages. This would only affect systems that define this variable, thus not breaking the API but providing the flexibility for those that need it. (This or a similar variable could also contain a list of the numpy components to import automatically.)
This does not sound like a good idea to me. It would mean that you effectively have two code paths depending on an environment variable, with more support problems: people would use this option, and then other software would break without them knowing why; typically, scipy would not work anymore. I think that import numpy.core being slower than import numpy is a bug which can be solved without breaking anything, though. cheers, David
On Wed, Jul 2, 2008 at 20:23, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
I think that import numpy.core being slower than import numpy is a bug which can be solved without breaking anything, though.
It does not appear to be slower to me. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Wed, 2008-07-02 at 21:21 -0500, Robert Kern wrote:
On Wed, Jul 2, 2008 at 20:23, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
I think that import numpy.core being slower than import numpy is a bug which can be solved without breaking anything, though.
It does not appear to be slower to me.
It isn't either on my computer. While we are talking about import timings, there was a system for lazy imports at some point, right? (This is from when I first tried python and numpy a few years ago, so I may be mixing it up with something else.) Because we could win between 20 and 40% of import time by lazily importing a few modules (namely urllib, which I guess is not often used, and already takes around 20-30 ms; inspect and compiler are taking a long time too, but maybe those are always needed, I have not checked carefully). Maybe this would be complicated to implement for numpy, though. cheers, David
On Wed, Jul 2, 2008 at 21:38, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
On Wed, 2008-07-02 at 21:21 -0500, Robert Kern wrote:
On Wed, Jul 2, 2008 at 20:23, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
I think that import numpy.core being slower than import numpy is a bug which can be solved without breaking anything, though.
It does not appear to be slower to me.
It isn't either on my computer.
So ... what were you referring to?
While we are talking about import timings, there was a system for lazy import at some point, right (this is when I first tried python and numpy a few years ago, so I may mix with something else) ?
There is special purpose code, yes. We used to use it to load proxy objects for scipy subpackages such that "import scipy" would have scipy.stats semi-immediately available. We have stopped using it because of fragility, confusing behavior at the interpreter, py2exe problems, and my general abhorrence of things which mess too deeply with imports. It is not a general-purpose solution for lazily loading stdlib modules, I don't think.
Because we could win between 20 and 40% of import time by lazily importing a few modules (namely urllib, which I guess is not often used, and already takes around 20-30 ms; inspect and compiler are taking a long time too, but maybe those are always needed, I have not checked carefully). Maybe this would be complicated to implement for numpy, though.
These imports could easily be pushed down into the handful of functions that need them (with an appropriate comment about why they are down there). There is no need to have complicated machinery involved. Do you have a breakdown of the import costs? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
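A minimal sketch of the deferred-import pattern Robert is suggesting here. The function name and body are illustrative, not actual numpy code, and it assumes the Python 2 stdlib of the era:

    def open_remote(path):
        # Deferred import: urllib2 costs ~30 ms to import and is only
        # needed when a remote data source is actually opened, so the
        # import lives inside the function rather than at module level.
        import urllib2
        return urllib2.urlopen(path)

The first call pays the import cost once; after that, sys.modules makes the import statement essentially free on every later call.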
On Wed, 2008-07-02 at 21:50 -0500, Robert Kern wrote:
So ... what were you referring to?
To an earlier email from Matthieu in this thread (or Stefan?).
There is special purpose code, yes. We used to use it to load proxy objects for scipy subpackages such that "import scipy" would have scipy.stats semi-immediately available. We have stopped using it because of fragility, confusing behavior at the interpreter, py2exe problems, and my general abhorrence of things which mess too deeply with imports. It is not a general-purpose solution for lazily-loading stdlib modules, I don't think.
I was afraid of something like this.
Because we could win between 20 and 40% of import time by lazily importing a few modules (namely urllib, which I guess is not often used, and already takes around 20-30 ms; inspect and compiler are taking a long time too, but maybe those are always needed, I have not checked carefully). Maybe this would be complicated to implement for numpy, though.
These imports could easily be pushed down into the handful of functions that need them (with an appropriate comment about why they are down there). There is no need to have complicated machinery involved.
Do you have a breakdown of the import costs?
I don't have the precise timings/scripts at the moment, but even using a really crude method:

- urllib2 (in numpy.lib._datasource) by itself takes 30 ms out of 180 ms. That's an easy 20% win, since it is not often called.
- inspect in numpy.lib.utils: this costs around 25 ms.

If I just comment out the above imports, I go from 180 to 120 ms. Then, something which takes an awful lot of time is finfo to get floating point limits. This takes like 30-40 ms. I wonder if there are some ways to make it faster. After that, there is no obvious spot I remember, but I can get them tonight when I go back to my lab. cheers, David
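One crude way to obtain numbers like these, in the spirit of what David describes (his actual script is not shown, so the method below is an assumption; Python 2 of the era): time import numpy with and without a suspect module pre-imported, and take the difference.

    import time
    import urllib2          # warm the suspect module first
    t0 = time.time()
    import numpy            # numpy now finds urllib2 already in sys.modules
    print "import numpy: %.1f ms" % ((time.time() - t0) * 1000)

Run once with the urllib2 line and once without; the gap between the two timings approximates urllib2's share of the total import cost.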
On Wed, Jul 2, 2008 at 23:14, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
On Wed, 2008-07-02 at 21:50 -0500, Robert Kern wrote:
So ... what were you referring to?
To an earlier email from Matthieu in this thread (or Stefan?).
Neither one has participated in this thread. At least, no such email has made it to my inbox.
There is special purpose code, yes. We used to use it to load proxy objects for scipy subpackages such that "import scipy" would have scipy.stats semi-immediately available. We have stopped using it because of fragility, confusing behavior at the interpreter, py2exe problems, and my general abhorrence of things which mess too deeply with imports. It is not a general-purpose solution for lazily loading stdlib modules, I don't think.
I was afraid of something like this.
Because we could win between 20 and 40% of import time by lazily importing a few modules (namely urllib, which I guess is not often used, and already takes around 20-30 ms; inspect and compiler are taking a long time too, but maybe those are always needed, I have not checked carefully). Maybe this would be complicated to implement for numpy, though.
These imports could easily be pushed down into the handful of functions that need them (with an appropriate comment about why they are down there). There is no need to have complicated machinery involved.
Do you have a breakdown of the import costs?
I don't have the precise timings/scripts at the moment, but even using a really crude method:
- urllib2 (in numpy.lib._datasource) by itself takes 30 ms out of 180 ms. That's an easy 20% win, since it is not often called.
- inspect in numpy.lib.utils: this costs around 25 ms.
If I just comment out the above imports, I go from 180 to 120 ms.
I think it's worth moving these imports into the functions, then.
Then, something which takes an awful lot of time is finfo to get floating point limits. This takes like 30-40 ms. I wonder if there are some ways to make it faster. After that, there is no obvious spot I remember, but I can get them tonight when I go back to my lab.
They can all be turned into properties that look up in a cache first. iinfo already does this. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
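A rough sketch of the cache-first property pattern Robert means, modeled loosely on what iinfo does (the class, cache, and method names here are illustrative, not the actual numpy code):

    class float_info(object):
        # Shared cache: the expensive computation runs at most once per dtype.
        _cache = {}

        def __init__(self, dtype):
            self.dtype = dtype

        @property
        def eps(self):
            # Look in the cache first; only fall back to the expensive
            # machine-parameter discovery on a miss.
            try:
                return self._cache[self.dtype]
            except KeyError:
                value = self._cache[self.dtype] = self._compute_eps()
                return value

        def _compute_eps(self):
            # Stand-in for the real MachAr-style computation: halve x
            # until 1.0 + x/2 is no longer distinguishable from 1.0.
            x = 1.0
            while 1.0 + x / 2.0 != 1.0:
                x /= 2.0
            return x

With this, float_info(float).eps pays the computation cost on the first access only; every later access, from any instance, is a dictionary lookup.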
On Wed, 2008-07-02 at 23:36 -0500, Robert Kern wrote:
Neither one has participated in this thread. At least, no such email has made it to my inbox.
This was in the thread '"import numpy" is slow'; I mixed the two up, sorry.
I think it's worth moving these imports into the functions, then.
Ok, will do it, then.
Then, something which takes an awful lot of time is finfo to get floating point limits. This takes like 30-40 ms. I wonder if there are some ways to make it faster. After that, there is no obvious spot I remember, but I can get them tonight when I go back to my lab.
They can all be turned into properties that look up in a cache first. iinfo already does this.
Yes, it is cached, but the first run is slow. As it is used as a default argument in numpy.ma.extras, it is run when you import numpy. Just to check, I set the default argument to None, and now import numpy takes ~85 ms instead of 180 ms. 40 ms to get the tiny attribute of float sounds slow, but maybe there is no way around it (maybe MachAr can be sped up a bit, but this looks like quite sensitive code). cheers, David
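The change David describes is the standard None-sentinel default; a minimal sketch (the function below is illustrative, not the actual numpy.ma.extras code):

    import numpy

    def safe_divide(a, b, tiny=None):
        # A default of numpy.finfo(float).tiny here would be evaluated at
        # import time, costing tens of milliseconds; with None, the finfo
        # computation only runs the first time the default is actually used.
        if tiny is None:
            tiny = numpy.finfo(float).tiny
        return a / max(b, tiny)

Because default argument expressions are evaluated when the def statement executes, moving the finfo() call into the body is what takes it off the import path.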
On Thu, 2008-07-03 at 13:56 +0900, David Cournapeau wrote:
Ok, will do it, then.
I put the patches in ticket 838. I tried to commit the changes directly, but it looks like they have disabled some proxy settings necessary to commit to svn at my company. On my computer, the changes cut a third off the total numpy import time, which is not bad since the changes are trivial. cheers, David
On Wed, Jul 2, 2008 at 23:56, David Cournapeau <cournapeau@cslab.kecl.ntt.co.jp> wrote:
On Wed, 2008-07-02 at 23:36 -0500, Robert Kern wrote:
Neither one has participated in this thread. At least, no such email has made it to my inbox.
This was in the thread '"import numpy" is slow'; I mixed the two up, sorry.
Ah yes. It's probably not worth tracking down.
I think it's worth moving these imports into the functions, then.
Ok, will do it, then.
I've checked in your changes with some modifications to the comments.
Then, something which takes an awful lot of time is finfo to get floating point limits. This takes like 30-40 ms. I wonder if there are some ways to make it faster. After that, there is no obvious spot I remember, but I can get them tonight when I go back to my lab.
They can all be turned into properties that look up in a cache first. iinfo already does this.
Yes, it is cached, but the first run is slow. As it is used as a default argument in numpy.ma.extras, it is run when you import numpy. Just to check, I set the default argument to None, and now import numpy takes ~85 ms instead of 180 ms. 40 ms to get the tiny attribute of float sounds slow, but maybe there is no way around it (maybe MachAr can be sped up a bit, but this looks like quite sensitive code).
So here are all of the places that use the computed finfo values at the module level:

- numpy.lib.polynomial: makes _single_eps and _double_eps global but only uses them inside functions.
- numpy.ma.extras: just imports _single_eps and _double_eps from numpy.lib.polynomial but uses them inside functions.
- numpy.ma.core: makes a global divide_tolerance that is used as a default in a constructor. This class is then instantiated at module level.

I think the first two are easily replaced by actual calls to finfo() inside their functions. Because of the _finfo_cache, this should be fast enough, and it makes the code cleaner. Currently numpy.lib.polynomial and numpy.ma.extras go through if tests to determine which of _single_eps and _double_eps to use.

The last one is a bit tricky. I've pushed the computation down into the actual __call__ where it is used. It caches the result. It's not ideal, but it works. I hope this is acceptable, Pierre.

Here are my timings (Intel OS X 10.5.3 with numpy.linalg linked to Accelerate.framework; warm disk caches; taking the consistent user and system times from repeated executions; I'd ignore the wall-clock time):

Before:
$ time python -c "import numpy"
python -c "import numpy"  0.30s user 0.82s system 91% cpu 1.232 total

Removal of finfo:
$ time python -c "import numpy"
python -c "import numpy"  0.27s user 0.82s system 94% cpu 1.156 total

Removal of finfo and delayed imports:
$ time python -c "import numpy"
python -c "import numpy"  0.19s user 0.56s system 93% cpu 0.811 total

Not too shabby. Anyways, I've checked it all in.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
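A rough sketch of the lazy __call__ computation Robert describes for numpy.ma.core's divide_tolerance; the class and attribute names are simplified, so treat this as a sketch of the pattern rather than the checked-in code:

    class DomainSafeDivide(object):
        # Flags the domain where division would be unsafe, using a
        # lazily computed tolerance.

        def __init__(self, tolerance=None):
            # Deliberately avoid computing finfo here: instances are
            # created at module import time, and finfo is expensive.
            self.tolerance = tolerance

        def __call__(self, a, b):
            # Compute and cache the tolerance on first use instead.
            if self.tolerance is None:
                import numpy
                self.tolerance = numpy.finfo(float).tiny
            return abs(a) * self.tolerance >= abs(b)

Each instance pays the finfo cost at most once, and only if it is actually called, so instantiating it at module level no longer slows down import.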
On Thu, 2008-07-03 at 01:25 -0500, Robert Kern wrote:
Before:
$ time python -c "import numpy"
python -c "import numpy"  0.30s user 0.82s system 91% cpu 1.232 total

Removal of finfo:
$ time python -c "import numpy"
python -c "import numpy"  0.27s user 0.82s system 94% cpu 1.156 total

Removal of finfo and delayed imports:
$ time python -c "import numpy"
python -c "import numpy"  0.19s user 0.56s system 93% cpu 0.811 total
I don't know how much is due to the hardware and how much is due to OS differences, but in my case (Linux with a Core 2 Duo), with your changes, it went from:

real    0m0.184s
user    0m0.146s
sys     0m0.034s

to:

real    0m0.081s
user    0m0.056s
sys     0m0.022s

Definitely worthwhile (now, importing numpy has no noticeable latency on a fast computer, which feels nice). Thanks for committing the changes, David
Hardy, Core 2 Duo laptop, picking a typical score, warm disk caches.

Before:
maqroll[research]> time python -c 'import numpy'
0.180u 0.032s 0:00.20 105.0%  0+0k 0+0io 0pf+0w

After:
maqroll[research]> time python -c 'import numpy'
0.100u 0.032s 0:00.12 108.3%  0+0k 0+0io 0pf+0w

Definitely a worthwhile improvement. Many thanks to all responsible! Cheers, f
participants (5)
- David Cournapeau
- Fernando Perez
- Michael McNeil Forbes
- Nathan Jensen
- Robert Kern