[Numpy-discussion] "import numpy" performance

Nathaniel Smith njs at pobox.com
Mon Jul 2 16:34:48 EDT 2012


On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> In this email I propose a few changes which I think are minor
> and which don't really affect the external NumPy API but which
> I think could improve the "import numpy" performance by at
> least 40%. This affects me because I and my clients use a
> chemistry toolkit which uses only NumPy arrays, and where
> we run short programs often on the command-line.
>
>
> In July of 2008 I started a thread about how "import numpy"
> was noticeably slow for one of my customers. They had
> chemical analysis software, often even run on a single
> molecular structure using command-line tools, and the
> several invocations, each with 0.1 seconds of overhead, were one of
> the dominant costs even when numpy wasn't needed.
>
> I fixed most of their problems by deferring numpy imports
> until needed. I remember well the Steve Jobs anecdote at
>   http://folklore.org/StoryView.py?project=Macintosh&story=Saving_Lives.txt
> and spent another day of my time in 2008 to identify the
> parts of the numpy import sequence which seemed excessive.
> I managed to get the import time down from 0.21 seconds to
> 0.08 seconds.
>
> Very little of that made it into NumPy.
>
>
> The three biggest changes I would like are:
>
> 1) remove "add_newdocs" and put the docstrings in the C code
>  ('add_newdocs' itself might still need to exist; see the proposal below.)
>
> The code says:
>
> # This is only meant to add docs to objects defined in C-extension modules.
> # The purpose is to allow easier editing of the docstrings without
> # requiring a re-compile.
>
> However, the change log shows that there are relatively few commits
> to this module
>
>   Year    Number of commits
>   ====    =================
>   2012       8
>   2011      62
>   2010       9
>   2009      18
>   2008      17
>
> so I propose moving the docstrings to the C code, and perhaps
> leaving 'add_newdocs' there, but only used when testing new
> docstrings.

I don't have any opinion on how acceptable this would be, but I also
don't see a benchmark showing how much this would help?
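
One rough way to put a number on it (a sketch, assuming the only thing
numpy needs from add_newdocs is its docstring side effects) would be to
stub the module out before importing numpy, and compare against a normal
import timed in a separate process:

import sys, time, types

# Hypothetical measurement: pre-register an empty stub so numpy's own
# "import add_newdocs" finds it in sys.modules and skips the real work.
# The unstubbed baseline has to run in a separate process, since numpy
# can only be imported fresh once per interpreter. If anything in numpy
# does depend on add_newdocs beyond docstrings, this import could break.
sys.modules['numpy.add_newdocs'] = types.ModuleType('numpy.add_newdocs')

t0 = time.time()
import numpy
print("import numpy with add_newdocs stubbed: %.3fs" % (time.time() - t0))

That gives an upper bound on the saving, since the docstrings would still
have to come from somewhere (the C code) in the real proposal.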

> 2) Don't optimistically assume that all submodules are
> needed. For example, some current code uses
>
> >>> import numpy
> >>> numpy.fft.ifft
> <function ifft at 0x10199f578>
>
> (See a real-world example at
>   http://stackoverflow.com/questions/10222812/python-numpy-fft-and-inverse-fft
> )
>
> IMO, this optimizes for the needs of the interactive-shell
> NumPy author over the needs of the far more numerous people
> who don't spend their time in the REPL and/or don't need
> those extra features added to every NumPy startup. Please
> bear in mind that NumPy users of the first category will
> be active on the mailing list, go to SciPy conferences,
> etc. while members of the second category are less visible.
>
> I recognize that this is backwards incompatible, and will
> not change. However, I understand that "NumPy 2.0" is a
> glimmer in the future, which might be a natural place for
> a transition to the more standard Python style of
>
>   from numpy import fft
>
> Personally, I think the documentation should (if it doesn't
> already) transition now to using this form.

I think this ship has sailed, but it'd be worth looking into lazy
importing, where 'numpy.fft' isn't actually imported until someone
starts using it. There are a bunch of libraries that do this. One
would have to fiddle a bit to stay compatible with all the different
Python versions and to make sure the indirection doesn't kill
performance (it might have to be done in C), but the idea is
something along the lines of

import importlib

class _FFTModule(object):
  def __getattribute__(self, name):
    # On first access, import the real module, then rebind
    # __getattribute__ so later lookups go straight to it.
    mod = importlib.import_module("numpy.fft")
    _FFTModule.__getattribute__ = lambda self, name: getattr(mod, name)
    return getattr(mod, name)
fft = _FFTModule()
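
With something like that in numpy/__init__.py, the first attribute access
pays the import cost and later accesses go straight to the real module, so
interactive-style code like the example quoted above keeps working:

import numpy
numpy.fft.ifft   # first access: triggers importlib.import_module("numpy.fft")
numpy.fft.fft    # later accesses: a plain getattr on the real module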

> 3) Especially: don't always import 'numpy.testing'
>
> As far as I can tell, automatic import of this module
> is not needed, so is pure overhead for the vast majority
> of NumPy users. Unfortunately, there's a large number
> of user-facing 'test' and 'bench' bound methods acting
> as functions.
>
> from numpy.testing import Tester
> test = Tester().test
> bench = Tester().bench
>
> They seem rather pointless to me but could be replaced
> with per-module functions like
>
> def test(*args, **kwargs):
>    from numpy.testing import Tester
>    return Tester().test(*args, **kwargs)
>
>
>
>
> I have not worried about numpy import performance for
> 4 years. While I have been developing scientific software
> for 20 years, and in Python for 15 years, it has been
> in areas of biology and chemistry which don't use arrays.
> I use numpy for a day about once every two years, and
> so far I have had no reason to use scipy.
>
>
> This has changed.
>
> I talked with one of my clients last week. They (and I)
> use a chemistry toolkit called "RDKit". RDKit uses
> numpy as a way to store coordinate data for molecules.
> I checked with the package author and he confirms:
>
>   yeah, it's just using the homogeneous array most of the time.
>
> My client complained about RDKit's high startup cost,
> due to the NumPy dependency. On my laptop, with a warm
> disk cache, it takes 0.119s to "import rdkit". On a cold
> cache it can take 3 seconds. On their cluster filesystem,
> with a cold cache, it can take over 10 seconds.
>
> (I told them about zipimport. They will be looking into
> that as a solution. However, it doesn't easily help the other
> people who use the RDKit toolkit.)
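
For what it's worth, zipimport only helps with the pure-Python files;
extension modules such as multiarray.so can't be loaded from a zip, so
they still hit the filesystem. A minimal sketch, with a hypothetical
bundle path, is just:

import sys

# Hypothetical: a zip built from the pure-Python parts of the packages,
# put ahead of site-packages so one archive read replaces many small
# per-module stat/open calls on the cluster filesystem.
sys.path.insert(0, "/path/to/bundle.zip")

import numpy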
>
>
> With instrumentation I found that 0.083s of the 0.119s
> is spent loading numpy.core.multiarray.

Sounds like this would be useful to profile. I have no idea why
importing this would need so much time, since the actual import just
requires filling in a few C structs. Could be the linker, could be
lots of things, profiling is useful.
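
A crude per-module timer is enough to reproduce the kind of table below;
a sketch (times are cumulative, so a parent's time includes its children,
and it assumes nothing else has already replaced __import__):

import time
import __builtin__   # Python 2; on Python 3 this module is named "builtins"

_real_import = __builtin__.__import__
_times = {}

def _timed_import(name, *args, **kwargs):
    # Wrap the builtin __import__ and accumulate wall-clock time per name.
    t0 = time.time()
    try:
        return _real_import(name, *args, **kwargs)
    finally:
        _times[name] = _times.get(name, 0.0) + (time.time() - t0)

__builtin__.__import__ = _timed_import
import numpy
__builtin__.__import__ = _real_import

# Cumulative time per module name, slowest first.
for name, t in sorted(_times.items(), key=lambda item: -item[1])[:15]:
    print("%.3f  %s" % (t, name))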

> The slowest module
> import times are listed here, with the cumulative time for
> each module and the name of the (first) importing parent
> in parentheses:
>
> 0.119 rdkit
> 0.089 rdchem (pyPgSQL)
> 0.083 numpy.core.multiarray (rdchem)
> 0.038 add_newdocs (numpy.core.multiarray)
> 0.032 numpy.lib (add_newdocs)
> 0.023 type_check (numpy.lib)
> 0.023 numpy.core.numeric (type_check)
> 0.012 numpy.testing (numpy.core.numeric)
> 0.010 unittest (numpy.testing)
> 0.008 cDataStructs (pyPgSQL)
> 0.007 random (numpy.core.multiarray)
> 0.007 mtrand (random)
> 0.006 case (unittest)
> 0.005 rdmolfiles (pyPgSQL)
> 0.005 rdmolops (pyPgSQL)
> 0.005 difflib (case)
> 0.005 chebyshev (numpy.core.multiarray)
> 0.004 hermite (numpy.core.multiarray)
> 0.004 hermite_e (numpy.core.multiarray)
> 0.004 laguerre (numpy.core.multiarray)
>
> These timings were on a MacBook Pro I bought this year, using
> <module 'numpy' from '/Library/Python/2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.7-intel.egg/numpy/__init__.pyc'>
>
>
>
> The minimal cheminformatics program is
>
> % time python -c 'from rdkit import Chem; print Chem.MolToSmiles(Chem.MolFromSmiles("OCC"))'
> CCO
> 0.126u 0.035s 0:00.16 93.7%     0+0k 0+0io 0pf+0w
>
> (This chemical structure doesn't contain coordinates,
> so numpy is pure overhead. However, other formats do
> contain 2D or 3D coordinates. None need hermite or
> other polynomials, which together are 10% of the
> wall-clock time.)
>
> With a hot disk cache, 1/2 of the time is spent importing.
> With a cold cache, this is worse. (E.g., trying again now, it
> takes 1.955 seconds to "import rdkit". Trying again much later, it
> takes 1.17 seconds to "import numpy". My web browser's windows and
> tabs have filled most of my memory.)

If you want reproducible numbers for cold cache load times you should
drop caches explicitly -- I think on OS X you use 'purge' to do this.
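
Something along these lines would automate that, assuming 'purge' is on
the PATH and doesn't need elevated privileges on your OS X version (the
number includes interpreter startup, so treat it as an upper bound):

import subprocess, sys, time

# Drop the filesystem caches (OS X 'purge'; some versions want sudo),
# then time a fresh interpreter importing numpy.
subprocess.check_call(["purge"])
t0 = time.time()
subprocess.check_call([sys.executable, "-c", "import numpy"])
print("cold-cache import numpy: %.3fs" % (time.time() - t0))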

> Real code is of course more complex than this trivial
> bit of code. But on the other hand, the typical
> development cycle is to write the code to work for one
> compound, get that working, and then run it on thousands
> of compounds. The typical algorithm takes less than 1
> second to run, so during early development it's obvious that
> much of the run-time is dominated by the import-time
> startup overhead.
>
> My hope is to get the single-compound time down to
> under 0.1 seconds, or about 40% faster than it is now.
> Below the 0.1 second threshold, human factors studies
> show that people consider the time to be "instantaneous."
>
> I do not think I'll be able to shave off the full 0.06s
> which I want, but I can get close. If I can figure out
> how to get rid of add_newdocs (0.038s) and numpy.testing
> (0.012s) then I'll end up removing 0.05s. If I can remove
> automatic inclusion of the polynomial modules then I'm
> well into my goal.
>
> What can be done to make these changes to NumPy? What
> are the objections to my providing an updated set of
> patches removing add_newdocs and numpy.testing?
>
> Cheers,
>
>
>                                 Andrew
>                                 dalke at dalkescientific.com
>
>


