On Thu, Jul 31, 2008 at 10:02 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Thu, Jul 31, 2008 at 05:43, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
But you still can't remove them since they are being used inside numerictypes. That's why I labeled them "internal utility functions" instead of leaving them with minimal docstrings such that you would have to guess.
My proposal is to replace that code with a table mapping the type name to the uppercase/lowercase/capitalized forms, thus eliminating the (small) amount of time needed to import string.
It makes adding new types slightly more difficult.
I know it's a tradeoff.
Probably not a bad one. Write up the patch, and then we'll see how much it affects the import time.
I would much rather that we discuss concrete changes like this rather than rehash the justifications of old decisions. Regardless of the merits about the old decisions (and I agreed with your position at the time), it's a pointless and irrelevant conversation. The decisions were made, and now we have a user base to whom we have promised not to break their code so egregiously again. The relevant conversation is what changes we can make now.
Some general guidelines:
1) Everything exposed by "from numpy import *" still needs to work.
   a) The layout of everything under numpy.core is an implementation detail.
   b) _underscored functions and explicitly labeled internal functions can probably be modified.
   c) Ask about specific functions when in doubt.
2) The improvement in import times should be substantial. Feel free to bundle up the optimizations for consideration.
3) Moving imports from module-level down into the functions where they are used is generally okay if we get a reasonable win from it. The local imports should be commented, explaining that they are made local in order to improve the import times.
4) __import__ hacks are off the table.
5) Proxy objects ... I would really like to avoid proxy objects; they have caused fragility in the past.
6) I'm not a fan of having environment variables control the way numpy gets imported, but I'm willing to consider it. For example, I might go for having proxy objects for linalg et al. *only* if a particular environment variable were set. But there had better be a very large improvement in import times.
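Guideline 3 in practice might look like the following sketch (the module and function names are invented for illustration, not taken from numpy):

```python
# The eager version would put "import xml.dom.minidom" at module
# level, paying its cost on every import of this module.

def parse_settings(text):
    # Local import: deferred until the function is first called,
    # in order to improve the package's import time (guideline 3).
    import xml.dom.minidom
    doc = xml.dom.minidom.parseString(text)
    return doc.documentElement.tagName
```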
I just want to say that I agree with Andrew that slow imports just suck. But it's not really that bad; for example, on my system:

In [1]: %time import numpy
CPU times: user 0.11 s, sys: 0.01 s, total: 0.12 s
Wall time: 0.12 s

so that's ok. For comparison:

In [1]: %time import sympy
CPU times: user 0.12 s, sys: 0.02 s, total: 0.14 s
Wall time: 0.14 s

But I am still unhappy about it. I'd like the package to import much faster, because it adds up: when you need to import 7 packages like that, it's suddenly 1 s, and that's just too much. But of course everything within the constraints that Robert has outlined.

From a theoretical point of view, I don't understand why Python cannot just import numpy (or any other package) immediately, and only perform the real import at the moment the user actually accesses something. Mercurial uses a lazy import module that does exactly this. Maybe that's an option? Look into mercurial/demandimport.py. Use it like this:

In [1]: import demandimport
In [2]: demandimport.enable()
In [3]: %time import numpy
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s

That's pretty good, huh? :)

Unfortunately, numpy cannot work with lazy import (yet):

In [5]: %time from numpy import array
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (17, 0))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[skip]
/usr/lib/python2.5/site-packages/numpy/lib/index_tricks.py in <module>()
     14 import function_base
     15 import numpy.core.defmatrix as matrix
---> 16 makemat = matrix.matrix
     17
     18 # contributed by Stefan van der Walt
/home/ondra/ext/sympy/demandimport.pyc in __getattribute__(self, attr)
     73             return object.__getattribute__(self, attr)
     74         self._load()
---> 75         return getattr(self._module, attr)
     76     def __setattr__(self, attr, val):
     77         self._load()
AttributeError: 'module' object has no attribute 'matrix'

BTW, neither can SymPy. However, maybe it shows some possibilities, and maybe it's possible to fix numpy to work with such a lazy import. On the other hand, I can imagine it could bring a lot more trouble, so it should probably only be optional.

Ondrej
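For reference, the mechanism behind such a lazy import can be sketched in a few lines. This is a simplified stand-in for the idea, not Mercurial's actual demandimport (which hooks __import__ rather than wrapping modules by hand); the class name and the use of importlib.import_module are my own choices for the example:

```python
import importlib

class LazyModule(object):
    """Simplified stand-in for a demandimport-style proxy: the real
    module is imported only when an attribute is first accessed."""
    def __init__(self, name):
        # object.__setattr__ avoids triggering any lazy machinery.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_module", None)

    def _load(self):
        if self._module is None:
            object.__setattr__(
                self, "_module", importlib.import_module(self._name))
        return self._module

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself,
        # i.e. the ones that belong to the wrapped module.
        return getattr(self._load(), attr)

lazy_json = LazyModule("json")       # costs (almost) nothing yet
print(lazy_json.dumps([1, 2]))       # first access triggers the real import
```

The numpy failure above shows the weak spot of this approach: code that runs at import time, like "makemat = matrix.matrix" in index_tricks.py, asks the proxy for an attribute before the lazy machinery is in a state to answer, so the whole package has to be written with lazy loading in mind for it to work.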