
On Aug 15, 2008, at 4:38 PM, Pauli Virtanen wrote:
> I think you can still do something evil, like this:
>
>     import os
>     if os.environ.get('NUMPY_VIA_API', '0') != '0':
>         from numpy.lib.fromnumeric import *
>         ...
>
> But I'm not sure how many milliseconds must be gained to justify this...
I don't think it's enough. I don't like environment variable tricks like that. My tests suggest:
    current SVN: 0.12 seconds
    my patch: 0.10 seconds
    removing some top-level imports: 0.09 seconds
    my patch and removing some additional top-level imports: 0.08 seconds (this is a guess)
First, I reverted my patch, so my import time went from 0.10 seconds to 0.12 seconds.
Second, I commented out the pure module imports from numpy/__init__.py:
    import linalg
    import fft
    import random
    import ctypeslib
    import ma
    import doc
The import time went to 0.089 seconds. Note that my patch also gets rid of "import doc" and "import ctypeslib", which take up a good chunk of time. The fft, linalg, and random libraries take 0.002 seconds each, and ma takes 0.007.
Not doing these imports makes the import about 0.01 seconds faster than my patch, which shaved off 0.02 seconds. That 0.01 seconds comes from not importing the fft, linalg, and ma modules.
My patch does improve things in a few other places, so perhaps those other places add another 0.01 seconds of performance.
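For anyone who wants to reproduce timings of this kind, here is a minimal sketch and not necessarily how the numbers above were obtained. It assumes a modern Python 3 interpreter, and the helper name time_statement is made up for illustration. Each sample starts a fresh interpreter so that nothing is already cached in sys.modules, and the interpreter startup cost can be estimated by timing "pass".

```python
import statistics
import subprocess
import sys
import time


def time_statement(statement, repeats=5):
    """Median wall-clock seconds to start a fresh Python and run `statement`.

    Hypothetical helper for illustration; a new process is used for every
    sample so that previously imported modules cannot skew the result.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # "json" is only a stand-in here; substitute "import numpy" if it is installed.
    startup = time_statement("pass")
    total = time_statement("import json")
    print("interpreter startup: %.3f s" % startup)
    print("import overhead:     %.3f s" % (total - startup))
```

Differences of 0.01 seconds are close to the noise level of such a measurement, so several repeats and a median (or minimum) are worth the extra run time.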
Why can't things be better? Take a look at the slowest imports. (Note, times are inclusive of the children)
== Slowest (including children) ==
0.089 numpy (None)
0.085 add_newdocs (numpy)
0.079 lib (add_newdocs)
0.041 type_check (lib)
0.040 numpy.core.numeric (type_check)
0.015 _internal (numpy.core.numeric)
0.014 numpy.testing (lib)
0.014 re (_internal)
0.010 unittest (numpy.testing)
0.010 numeric (numpy.core.numeric)
0.009 io (lib)
Most of the time is spent importing 'lib'.
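A listing like the one above can be approximated by temporarily wrapping the built-in __import__ and recording, for each module's first import, the elapsed time (children included) and the module that triggered it. The following is only a sketch under assumptions: it targets Python 3, the name profile_imports is hypothetical, it ignores relative imports, and it is not the tool that produced the table above.

```python
import builtins
import sys
import time


def profile_imports(statement):
    """Run `statement` and return (seconds, module, parent) rows, slowest first.

    Times are inclusive: a module's time contains the time of everything
    it imports while it is being loaded.
    """
    records = {}  # module name -> (inclusive seconds, importing module name)
    original_import = builtins.__import__

    def timed_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Only first-time absolute imports are recorded; cached modules
        # and relative imports (level > 0) are passed through untimed.
        already_loaded = name in sys.modules
        start = time.perf_counter()
        try:
            return original_import(name, globals, locals, fromlist, level)
        finally:
            elapsed = time.perf_counter() - start
            if not already_loaded and level == 0 and name not in records:
                parent = globals.get("__name__") if globals else None
                records[name] = (elapsed, parent)

    builtins.__import__ = timed_import
    try:
        exec(statement, {"__name__": "__main__"})
    finally:
        # Always restore the real import function, even if the import fails.
        builtins.__import__ = original_import
    return sorted(((t, n, p) for n, (t, p) in records.items()), reverse=True)


if __name__ == "__main__":
    # "colorsys" is a small stand-in; use "import numpy" to profile numpy itself.
    for seconds, module, parent in profile_imports("import colorsys"):
        print("%.3f %s (%s)" % (seconds, module, parent))
```

Imports performed directly by C code do not go through the built-in __import__, so they would not show up in the rows.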
Can that be made quicker? Not easily. "lib" is first imported in "add_newdocs". Personally, I want to get rid of add_newdocs and move the docstrings into the correct locations.
Stubbing the function out by adding
    def add_newdoc(*args): pass
to the top of add_newdocs.py saves 0.005 seconds, but if you try it out and remove the "import lib" from add_newdocs.py then you'll have to fix a cyclical dependency:
    numpy/__init__.py: import core
    numpy/core/__init__.py: from defmatrix import *
    numpy/core/defmatrix.py: from numpy.lib.utils import issubdtype
    numpy/lib/__init__.py: from type_check import *
    numpy/lib/type_check.py: import numpy.core.numeric as _nx
    AttributeError: 'module' object has no attribute 'core'
The only way out of the loop is to have numpy/__init__.py import lib before importing core.
It's possible to clean up the code so this loop doesn't exist, and fix things so that fewer things are imported when some environment variable is set, but it doesn't look easy. Modules depend on other modules a bit too much to make me happy.
    Andrew
    dalke@dalkescientific.com