(Trying again now that I'm subscribed. BTW, there's no link to the subscription page from numpy.scipy.org.)

The initial 'import numpy' loads a huge number of modules, even when I don't need them.

Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> len(sys.modules)
28
>>> import numpy
>>> len(sys.modules)
256
>>> len([s for s in sorted(sys.modules) if 'numpy' in s])
127
>>> numpy.__version__
'1.1.0'
I assume that's the reason my program's startup cost is quite high.

[josiah:~/src/fp] dalke% time python -c 'a=4'
0.014u 0.038s 0:00.05 80.0% 0+0k 0+1io 0pf+0w
[josiah:~/src/fp] dalke% time python -c 'import numpy'
0.161u 0.279s 0:00.44 97.7% 0+0k 0+9io 0pf+0w

My total runtime is something like 1.4 seconds, and the only thing I'm using NumPy for is to make an array of doubles that I can pass to a C extension. (I could use the array module or ctypes, but figured numpy is more useful for downstream code.)

Why does numpy/__init__.py need to import all of these other modules and submodules? Any chance of cutting down on the number, in order to improve startup costs?

Andrew
dalke@dalkescientific.com
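For a sense of that use case, the array-of-doubles handoff might look roughly like this (a sketch, not Andrew's actual code; the library name and C function are hypothetical):

    import ctypes
    import numpy

    # Build the array of doubles...
    data = numpy.array([1.0, 2.0, 3.0], dtype=numpy.float64)

    # ...and hand its buffer to a C function expecting (double *, int).
    libfoo = ctypes.CDLL("libfoo.so")  # hypothetical extension library
    ptr = data.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    libfoo.process(ptr, len(data))     # hypothetical C function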
On Mon, Jun 30, 2008 at 18:32, Andrew Dalke <dalke@dalkescientific.com> wrote:
Why does numpy/__init__.py need to import all of these other modules and submodules?
Strictly speaking, there is no *need* for any of it. It was a judgment call trading off import time for the convenience in fairly typical use cases which do use functions across the breadth of the library. Your use case isn't so typical and so suffers on the import time end of the balance.
Any chance of cutting down on the number, in order to improve startup costs?
Not at this point in time, no. That would break too much code.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
On Jul 1, 2008, at 2:22 AM, Robert Kern wrote:
Your use case isn't so typical and so suffers on the import time end of the balance.
I'm working on my presentation for EuroSciPy. "Isn't so typical" seems to be a good summary of my first slide. :)
Any chance of cutting down on the number, in order to improve startup costs?
Not at this point in time, no. That would break too much code.
Understood. Thanks for the response, Andrew dalke@dalkescientific.com
Would it not be possible to import just the necessary module of numpy to meet the necessary functionality of your application? i.e.

    import numpy.core

or whatever you're using. You could even do:

    import numpy.core as numpy

to simplify your code, I think. I'm no expert though.

Hanni
Hi,

IIRC, if you do "import numpy.core as numpy", it starts by importing numpy, so it will be even slower.

Matthieu

2008/7/1 Hanni Ali <hanni.ali@gmail.com>:
Would it not be possible to import just the necessary module of numpy to meet the necessary functionality of your application?
i.e.
import numpy.core
or whatever you're using
you could even do:
import numpy.core as numpy
--
French PhD student
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
You are correct, it appears to take slightly longer to import numpy.core, and longer again to import numpy.core as numpy. I should obviously check first in future.

Hanni

2008/7/1 Matthieu Brucher <matthieu.brucher@gmail.com>:
Hi,
IIRC, if you do "import numpy.core as numpy", it starts by importing numpy, so it will be even slower.
Matthieu
2008/7/1 Hanni Ali <hanni.ali@gmail.com>:
Would it not be possible to import just the necessary module of numpy to meet the necessary functionality of your application?
Matthieu Brucher responded:
IIRC, if you do "import numpy.core as numpy", it starts by importing numpy, so it will be even slower.
which you can see if you start python with the "-v" option to display imports.
>>> import numpy.core
import numpy # directory /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
# /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.pyc matches /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py
import numpy # precompiled from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.pyc
# /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__config__.pyc matches /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__config__.py
import numpy.__config__ # precompiled from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__config__.pyc
... and many more

Andrew
dalke@dalkescientific.com
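What the -v trace shows can also be checked directly (a quick sketch): importing a submodule first executes the parent package's __init__.py, so numpy lands in sys.modules either way.

    >>> import sys
    >>> import numpy.core
    >>> 'numpy' in sys.modules
    True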
On Mon, Jun 30, 2008 at 18:32, Andrew Dalke <dalke@dalkescientific.com> wrote:
Why does numpy/__init__.py need to import all of these other modules and submodules? Any chance of cutting down on the number, in order to improve startup costs?
Can you try the SVN trunk? In another thread (it must be "numpy imports slowly!" week), David Cournapeau found some optimizations that could be done that don't affect the API. They seem to cut down my import times (on OS X) by about 1/3; on his Linux machine, it seems to be more. I would be interested to know how significantly it improves your use case.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
On Jul 3, 2008, at 9:06 AM, Robert Kern wrote:
Can you try the SVN trunk?
Sure. Though did you know it's not easy to find how to get numpy from SVN? I had to go to the second page of Google, which linked to someone's talk. I expected to find a link to it at http://numpy.scipy.org/ . Just like I expected to find a link to the numpy mailing list.

Okay, compiled.

[josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'pass'
0.015u 0.042s 0:00.06 83.3% 0+0k 0+0io 0pf+0w
[josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'import numpy'
0.084u 0.231s 0:00.33 93.9% 0+0k 0+8io 0pf+0w
[josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke%

Previously it took 0.44 seconds, so it's now about 25% faster.
I would be interested to know how significantly it improves your use case.
For one of my clients I wrote a tool to analyze import times. I don't have it, but here's something similar I just now whipped up:

import time

seen = set()
import_order = []
elapsed_times = {}
level = 0
parent = None
children = {}

def new_import(name, globals, locals, fromlist):
    global level, parent
    if name in seen:
        return old_import(name, globals, locals, fromlist)
    seen.add(name)
    import_order.append((name, level, parent))
    t1 = time.time()
    old_parent = parent
    parent = name
    level += 1
    module = old_import(name, globals, locals, fromlist)
    level -= 1
    parent = old_parent
    t2 = time.time()
    elapsed_times[name] = t2-t1
    return module

old_import = __builtins__.__import__
__builtins__.__import__ = new_import

import numpy

parents = {}
for name, level, parent in import_order:
    parents[name] = parent

print "== Tree =="
for name, level, parent in import_order:
    print "%s%s: %.3f (%s)" % (" "*level, name, elapsed_times[name], parent)
print "\n"
print "== Slowest (including children) =="
slowest = sorted((t, name) for (name, t) in elapsed_times.items())[-20:]
for elapsed_time, name in slowest[::-1]:
    print "%.3f %s (%s)" % (elapsed_time, name, parents[name])

The result using the version out of subversion is:

== Tree ==
numpy: 0.237 (None)
 numpy.__config__: 0.000 (numpy)
 version: 0.000 (numpy)
  os: 0.000 (version)
  imp: 0.000 (version)
 _import_tools: 0.024 (numpy)
  sys: 0.000 (_import_tools)
  glob: 0.024 (_import_tools)
   fnmatch: 0.020 (glob)
    re: 0.018 (fnmatch)
     sre_compile: 0.009 (re)
      _sre: 0.000 (sre_compile)
      sre_constants: 0.004 (sre_compile)
     sre_parse: 0.006 (re)
     copy_reg: 0.000 (re)
 add_newdocs: 0.156 (numpy)
  lib: 0.150 (add_newdocs)
   info: 0.000 (lib)
   numpy.version: 0.000 (lib)
   type_check: 0.091 (lib)
... many lines removed ...
 mtrand: 0.021 (numpy)
 ctypeslib: 0.024 (numpy)
  ctypes: 0.023 (ctypeslib)
   _ctypes: 0.003 (ctypes)
   gestalt: 0.013 (ctypes)
   ctypes._endian: 0.001 (ctypes)
  numpy.core._internal: 0.000 (ctypeslib)
 ma: 0.005 (numpy)
  extras: 0.001 (ma)
   numpy.lib.index_tricks: 0.000 (extras)
   numpy.lib.polynomial: 0.000 (extras)

== Slowest (including children) ==
0.237 numpy (None)
0.156 add_newdocs (numpy)
0.150 lib (add_newdocs)
0.091 type_check (lib)
0.090 numpy.core.numeric (type_check)
0.049 io (lib)
0.048 numpy.testing (numpy.core.numeric)
0.024 _import_tools (numpy)
0.024 ctypeslib (numpy)
0.024 glob (_import_tools)
0.023 ctypes (ctypeslib)
0.022 utils (numpy.testing)
0.022 difflib (utils)
0.021 mtrand (numpy)
0.020 fnmatch (glob)
0.020 _datasource (io)
0.020 tempfile (io)
0.018 re (fnmatch)
0.018 heapq (difflib)
0.013 gestalt (ctypes)

This only reports the first time a module is imported, so fixing, say, the 'glob' in _import_tools doesn't mean it won't appear elsewhere.

Andrew
dalke@dalkescientific.com
On Jul 4, 2008, at 2:22 PM, Andrew Dalke wrote:
[josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'pass'
0.015u 0.042s 0:00.06 83.3% 0+0k 0+0io 0pf+0w
[josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'import numpy'
0.084u 0.231s 0:00.33 93.9% 0+0k 0+8io 0pf+0w
[josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke%
For one of my clients I wrote a tool to analyze import times. I don't have it, but here's something similar I just now whipped up:
Based on those results I've been digging into the code trying to figure out why numpy imports so many files, and at the same time I've been trying to guess at the use case Robert Kern regards as typical when he wrote:

    Your use case isn't so typical and so suffers on the import
    time end of the balance

and trying to figure out what code would break if those modules weren't all eagerly imported and were instead written as most other Python modules are written.

I have two thoughts for why mega-importing might be useful:

- interactive users get to do tab complete and see everything (eg, "import numpy" means "numpy.fft.ifft" works, without having to do "import numpy.fft" manually)

- class inspectors don't need to do directory checks to find possible modules (This is a stretch, since every general purpose inspector I know of has to know how to frob the directories to find directories.)

Are these the reasons numpy imports everything, or are there other reasons?

The first guess comes from the comment in numpy/__init__.py

    "The following sub-packages must be explicitly imported:"

meaning, I take it, that the other modules (core, lib, random, linalg, fft, testing) do not need to be explicitly imported. Is the numpy recommendation that people should do:

    import numpy
    numpy.fft.ifft(data)

? If so, the documentation should be updated to say that "random", "ma", "ctypeslib" and several other libraries are included in that list. Why is the last so important that it should be in the top-level namespace?

In my opinion, this assistance is counter to standard practice in effectively every other Python package. I don't see the benefit.

You may ask if there are possible improvements. There's no obvious place taking up a bunch of time, but there are plenty of small places which add up. For examples:

1) I wondered why 'cPickle' needed to be imported. One of the places it's used is numpy.lib.format, which is only imported by numpy.lib.io. It's easy to defer the 'import format' to be inside the functions which need it. Note that io.py already defers the import of zipfile, so function-local imports are not inappropriate.

'io' imports 'tempfile', needing 0.016 seconds. This can be a deferred cost only incurred by those who use io.savez, which already has some function-local imports. The reason for the high import costs? Here's what tempfile itself imports:

tempfile: 0.016 (io)
 errno: 0.000 (tempfile)
 random: 0.010 (tempfile)
  binascii: 0.003 (random)
  _random: 0.003 (random)
 fcntl: 0.003 (tempfile)
 thread: 0.000 (tempfile)

(This is read as: 'tempfile' is imported by 'io' and takes 0.016 seconds total, including all children, and the directly imported children of 'tempfile' are 'errno', 'random', 'fcntl' and 'thread'. 'random' imports 'binascii' and '_random'.)

BTW, the load and save commands in io do an incorrect check:

    if isinstance(file, type("")):
        fid = _file(file,"rb")
    else:
        fid = file

Filenames can be unicode strings. This test should either be isinstance(file, basestring) or not hasattr(file, 'read').
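For illustration, the function-local import pattern being proposed looks roughly like this (a generic sketch, not numpy's actual code; the module and function names are only examples):

    # Module import stays cheap: gzip is only imported the first time
    # this function is called, so "import mypackage" doesn't pay for it.
    def save_compressed(filename, text):
        import gzip
        f = gzip.open(filename, "wb")
        try:
            f.write(text)
        finally:
            f.close()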
2) What's the point of "add_newdocs"? According to the top of the module:

    # This is only meant to add docs to objects defined in C-extension modules.
    # The purpose is to allow easier editing of the docstrings without
    # requiring a re-compile.

which implies this aids development, but not deployment. The import takes a minuscule 0.006 seconds of the 0.225 ("import lib" and its subimports takes 0.141 seconds) but seems to add no direct end-user benefit. Shouldn't this documentation be pushed into the C code, at least for each release?

3) I see that numpy/core/numerictypes.py imports 'string', which takes 0.008 seconds. I wondered why. It's part of "english_lower", "english_upper", and "english_capitalize", which are functions defined in that module. The implementation can't be improved, and using string.translate is the right approach. However,

3a) the two functions have no leading underscore and have docstrings, implying that this is part of the public API (although they are not included in __all__). Are they meant for general use? Note that english_capitalize is over-engineered for the use-case in that file. There are no empty type names, so the test "if s" is never false.

3b) there are only 33 types in that module, so a hand-written lookup table mapping the name to the appropriate name/alias would work. Yes, it makes adding new types less than completely automatic, but that's done rarely.

Getting rid of these functions, and thus getting rid of the import speeds numpy startup time by 3.5%.

4) numpy.testing takes 0.041 seconds to import. The text I quoted above says that it's a numpy requirement that 'testing' always be imported, even though I'm hard pressed to figure out why that's important. Assuming it is important, 0.020 seconds is spent importing 'difflib':

difflib: 0.020 (utils)
 heapq: 0.016 (difflib)
  itertools: 0.003 (heapq)
  operator: 0.003 (heapq)
  bisect: 0.005 (heapq)
   _bisect: 0.003 (bisect)
  _heapq: 0.003 (heapq)

which is only used in numpy.testing.utils:assert_string. That can be deferred.

Similarly,

numpytest: 0.012 (numpy.testing)
 glob: 0.005 (numpytest)
  fnmatch: 0.002 (glob)
 shlex: 0.006 (numpytest)
  collections: 0.003 (shlex)
 numpy.testing.utils: 0.000 (numpytest)

but notice that 'glob', while imported, is never used in 'numpytest', and that 'shlex' can easily be a deferred import. This saves (for the common case) 0.01 seconds.

5) There are some additional savings in _datasource:

_datasource: 0.016 (io)
 shutil: 0.003 (_datasource)
  stat: 0.000 (shutil)
 urlparse: 0.003 (_datasource)
 bz2: 0.003 (_datasource)
 gzip: 0.006 (_datasource)
  zlib: 0.003 (gzip)

This module provides the "Datasource" class, which is accessed through "numpy.lib.io.Datasource". Deferring the 'bz2' and 'gzip' imports until needed saves 0.01 seconds. This will require some modification to the code, more than shifting the import statement.

These together add up to about 0.08 seconds, which is about 30% of the 'import numpy' cost. I could probably get another 0.05 seconds if I dug around more, but I can't without knowing what use case numpy is trying to achieve. Why are all those ancillary modules (testing, ctypeslib) eagerly loaded when there seems no need for that feature?

Andrew
dalke@dalkescientific.com
On Wed, Jul 30, 2008 at 4:12 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
4) numpy.testing takes 0.041 seconds to import. The text I quoted above says that it's a numpy requirement that 'testing' always be imported, even though I'm hard pressed to figure out why that's important.
I suppose it's necessary for providing the test() and bench() functions in subpackages, but that isn't a good reason to impose upon all users the time required to set up numpy.testing.
Assuming it is important, 0.020 seconds is spent importing 'difflib'
difflib: 0.020 (utils)
 heapq: 0.016 (difflib)
  itertools: 0.003 (heapq)
  operator: 0.003 (heapq)
  bisect: 0.005 (heapq)
   _bisect: 0.003 (bisect)
  _heapq: 0.003 (heapq)
which is only used in numpy.testing.utils:assert_string. That can be deferred.
Similarly,
numpytest: 0.012 (numpy.testing)
 glob: 0.005 (numpytest)
  fnmatch: 0.002 (glob)
 shlex: 0.006 (numpytest)
  collections: 0.003 (shlex)
 numpy.testing.utils: 0.000 (numpytest)
but notice that 'glob', while imported, is never used in 'numpytest', and that 'shlex' can easily be a deferred import. This saves (for the common case) 0.01 seconds.
Thanks for taking the time to find those; I just removed the unused glob and delayed the import of shlex, difflib, and inspect in numpy.testing.
On Jul 30, 2008, at 10:51 PM, Alan McIntyre wrote:
I suppose it's necessary for providing the test() and bench() functions in subpackages, but that isn't a good reason to impose upon all users the time required to set up numpy.testing.
I just posted this in my reply to Stéfan, but I'll say it again here. numpy defines

    numpy.test
    numpy.bench

and

    numpy.testing.test

The two 'test's use the same implementation. This is a likely unneeded duplication and one should be removed. The choice depends on if people think the name should be 'numpy.test' or 'numpy.testing.test'.

BTW, where's the on-line documentation for these functions? They are actually bound methods, and I wondered if the doc programs handle them okay.

If they should be top-level functions then I would prefer they be actual functions to hide an import. In that case, replace

    from testing import Tester
    test = Tester().test

with

    def test(label='fast', verbose=1, extra_argv=None, doctests=False,
             coverage=False, **kwargs):
        from testing import Tester
        Tester().test(label, verbose, extra_argv, doctests,
                      coverage, **kwargs)

or something similar. This would keep the API unchanged (assuming those are important in the top-level) and reduce the number of imports. Else I would keep/move them in 'numpy.testing' and require that anyone who wants to use 'test' or 'bench' get them after a 'from numpy import testing'.
Thanks for taking the time to find those; I just removed the unused glob and delayed the import of shlex, difflib, and inspect in numpy.testing.
Thanks! Andrew dalke@dalkescientific.com
On Wed, Jul 30, 2008 at 8:19 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
numpy defines
numpy.test
numpy.bench
and
numpy.testing.test
The two 'test's use the same implementation. This is a likely unneeded duplication and one should be removed. The choice depends on if people think the name should be 'numpy.test' or 'numpy.testing.test'.
They actually do two different things; numpy.test() runs tests for all of numpy, and numpy.testing.test() runs tests for numpy.testing only. There are similar functions in numpy.lib, numpy.core, etc.
On Jul 31, 2008, at 4:21 AM, Alan McIntyre wrote:
They actually do two different things; numpy.test() runs tests for all of numpy, and numpy.testing.test() runs tests for numpy.testing only. There are similar functions in numpy.lib, numpy.core, etc.
Really? This is the code from numpy/__init__.py:

    from testing import Tester
    test = Tester().test
    bench = Tester().bench

This is the code from numpy/testing/__init__.py:

    test = Tester().test

... ahhh, here's the magic, from testing/nosetester.py:NoseTester

    if package is None:
        f = sys._getframe(1)
        package = f.f_locals.get('__file__', None)
        assert package is not None
        package = os.path.dirname(package)

Why are 'test' and 'bench' part of the general API instead of something only used during testing?

Andrew
dalke@dalkescientific.com
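To spell out that trick, here is a standalone sketch (not nosetester's exact code; this variant reads the caller's globals, along the lines Andrew suggests later in the thread):

    import os
    import sys

    def calling_package_dir():
        # Look one frame up the stack; a module body normally has
        # __file__ bound in its namespace.
        caller = sys._getframe(1)
        path = caller.f_globals.get('__file__')
        if path is None:  # e.g. called from the interactive prompt
            return None
        return os.path.dirname(os.path.abspath(path))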
I'm working on the patches for reducing the import overhead. I want to make sure I don't break anything. I'm trying to figure out how to run all of the tests. I expected, based on the following Alan McIntyre wrote:
They actually do two different things; numpy.test() runs tests for all of numpy, and numpy.testing.test() runs tests for numpy.testing only. There are similar functions in numpy.lib, numpy.core, etc.
Robert Kern wrote:
By now, we have most of the denizens here trained to do numpy.test() when testing their new installations.
README:
After installation, tests can be run (from outside the source directory) with:
python -c 'import numpy; numpy.test()'
that 'numpy.test()' runs everything.

When I run numpy.test() I don't seem to run all of the tests. That is, I don't see the output I get when I run numpy.lib.test(). Here's a copy of my output, to show you what I mean. Also, I can't figure out what when I run a test I get a new Python prompt.

Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.test()
Running unit tests for numpy
NumPy version 1.2.0.dev5595
NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
Python version 2.5 (r25:51918, Sep 19 2006, 08:49:13) [GCC 4.0.1 (Apple Computer, Inc. build 5341)]
nose version 0.10.3
Not implemented: Defined_Binary_Op
Not implemented: Defined_Binary_Op
Defined_Operator not defined used by Generic_Spec
Needs match implementation: Allocate_Stmt
Needs match implementation: Associate_Construct
.... many lines removed ....
Needs match implementation: Target_Stmt
Needs match implementation: Type_Bound_Procedure_Part
Needs match implementation: Where_Construct
Nof match implementation needs: 51 out of 224
Nof tests needs: 224 out of 224
Total number of classes: 529
-----
No module named test_derived_scalar_ext, recompiling test_derived_scalar_ext.
Parsing '/tmp/tmpyQVvVI.f90'..
Generating interface for test_derived_scalar_ext Subroutine: f2py_test_derived_scalar_ext_foo
Generating interface for test_derived_scalar_ext.myt: f2py_type_myt_32
Generating interface for Integer: npy_int32
Generating interface for test_derived_scalar_ext Subroutine: f2py_test_derived_scalar_ext_f2pywrap_foo2
setup arguments: ' build_ext --build-temp tmp/ext_temp --build-lib tmp build_clib --build-temp tmp/clib_temp --build-clib tmp/clib_clib'
running build_ext
running build_src
building library "test_derived_scalar_ext_fortran_f2py" sources
building library "test_derived_scalar_ext_f_wrappers_f2py" sources
building extension "test_derived_scalar_ext" sources
running build_clib
customize UnixCCompiler
customize UnixCCompiler using build_clib
customize NAGFCompiler
... 12 lines removed as it tries to find a compiler ...
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_clib
building 'test_derived_scalar_ext_fortran_f2py' library
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -Wall -fno-second-underscore -fPIC -O3 -funroll-loops
creating tmp/clib_temp
creating tmp/clib_temp/tmp
compile options: '-c'
gfortran:f90: /tmp/tmpyQVvVI.f90
creating tmp/clib_clib
ar: adding 1 object files to tmp/clib_clib/libtest_derived_scalar_ext_fortran_f2py.a
ranlib:@ tmp/clib_clib/libtest_derived_scalar_ext_fortran_f2py.a
building 'test_derived_scalar_ext_f_wrappers_f2py' library
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -Wall -fno-second-underscore -fPIC -O3 -funroll-loops
compile options: '-c'
gfortran:f90: tmp/test_derived_scalar_ext_f_wrappers_f2py.f90
ar: adding 1 object files to tmp/clib_clib/libtest_derived_scalar_ext_f_wrappers_f2py.a
ranlib:@ tmp/clib_clib/libtest_derived_scalar_ext_f_wrappers_f2py.a
customize UnixCCompiler
customize UnixCCompiler using build_ext
... about 15 lines removed ...
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'test_derived_scalar_ext' extension
compiling C sources
C compiler: gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3
creating tmp/ext_temp
creating tmp/ext_temp/tmp
compile options: '-I/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c'
gcc: tmp/test_derived_scalar_extmodule.c
/usr/local/bin/gfortran -Wall -Wall -undefined dynamic_lookup -bundle tmp/ext_temp/tmp/test_derived_scalar_extmodule.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.1 -Ltmp/clib_clib -ltest_derived_scalar_ext_f_wrappers_f2py -ltest_derived_scalar_ext_fortran_f2py -lgfortran -o tmp/test_derived_scalar_ext.so
Removing build directory tmp/ext_temp
Removing build directory tmp/clib_temp
Removing build directory tmp/clib_clib
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy.lib
>>> numpy.lib.test()
Running unit tests for numpy.lib
NumPy version 1.2.0.dev5595
NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
Python version 2.5 (r25:51918, Sep 19 2006, 08:49:13) [GCC 4.0.1 (Apple Computer, Inc. build 5341)]
nose version 0.10.3
........................................................................
... many lines of dots removed ...
Ran 1001 tests in 4.497s

OK
<nose.result.TextTestResult run=1001 errors=0 failures=0>

Also, when I do the suggested

    python -c 'import numpy; numpy.test()'

I get output very much like when I did things interactively, except the last lines of the output are:

...
gcc: tmp/test_derived_scalar_extmodule.c
/usr/local/bin/gfortran -Wall -Wall -undefined dynamic_lookup -bundle tmp/ext_temp/tmp/test_derived_scalar_extmodule.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.1 -Ltmp/clib_clib -ltest_derived_scalar_ext_f_wrappers_f2py -ltest_derived_scalar_ext_fortran_f2py -lgfortran -o tmp/test_derived_scalar_ext.so
Removing build directory tmp/ext_temp
Removing build directory tmp/clib_temp
Removing build directory tmp/clib_clib
Argument expected for the -c option
usage: /Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Try `python -h' for more information.

Andrew
dalke@dalkescientific.com
On Mon, Aug 4, 2008 at 18:15, Andrew Dalke <dalke@dalkescientific.com> wrote:
When I run numpy.test() I don't seem to run all of the tests. That is, I don't see the output I get when I run numpy.lib.test() . Here's a copy of my output, to show you what I mean.
You have old stuff in your checkout/installation. Make sure you have deleted all of the *.pycs and directories which have been deleted in SVN.
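One way to do that cleanup (a sketch; point the path at your checkout and install directories as appropriate):

    import os

    # Remove stale compiled files so old modules can't shadow new ones.
    for dirpath, dirnames, filenames in os.walk('numpy'):
        for name in filenames:
            if name.endswith('.pyc'):
                os.remove(os.path.join(dirpath, name))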
Also, I can't figure out what when I run a test I get a new Python prompt.
I can't parse that sentence.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
On Aug 5, 2008, at 2:00 AM, Robert Kern wrote:
You have old stuff in your checkout/installation. Make sure you have deleted all of the *.pycs and directories which have been deleted in SVN.
I removed all .pyc files, wiped my installation directory, and it works now as I expect it to work.
Also, I can't figure out what when I run a test I get a new Python prompt.
I can't parse that sentence.
Mmm, I wrote "what" when I meant "why". In my earlier output from the interactive shell you could see at the end that a new Python session started. I was in the new shell, and couldn't go backwards to get to the 'numpy.test()' I had just executed. With the removal of old stuff, I no longer have that problem.

Thanks for the fix,

Andrew
dalke@dalkescientific.com
On Aug 5, 2008, at 2:00 AM, Robert Kern wrote:
You have old stuff in your checkout/installation. Make sure you have deleted all of the *.pycs and directories which have been deleted in SVN.
Now that I've fixed that, I can tell that I made a mistake related to the self-test code. I can't figure it out.

Many of the modules have code which looks like:

    from testing import Tester
    test = Tester().test
    bench = Tester().bench

As I understand it, this code is migrating to use nosetests. Because people expect 'import numpy; numpy.test()' to work, there will be a compatibility period where this API is unchanged.

I found that importing 'testing' costs 0.013 seconds, which is 10% of my current import time. I would like to defer the import until needed, so I rewrote the 'test' code as:

    def test(label='fast', verbose=1, extra_argv=None, doctests=False,
             coverage=False, **kwargs):
        from testing import Tester
        import numpy
        Tester(numpy).test(label, verbose, extra_argv, doctests,
                           coverage, **kwargs)

In my view there's no difference between them, but Tester().test does introspection to figure out the module location. (In fact, if I don't pass it the module explicitly then it expects that locals()["__file__"] for sys._getframe(-1) will exist, which is not the case with my function. The underlying code should instead check for that variable in globals().)

I ended up with recursion errors, and I don't know why. Any idea of what to do?

[josiah:~] dalke% python -c 'import numpy; numpy.test()'
Running unit tests for numpy
NumPy version 1.2.0.dev5607
NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
Python version 2.5 (r25:51918, Sep 19 2006, 08:49:13) [GCC 4.0.1 (Apple Computer, Inc. build 5341)]
nose version 0.10.3
Running unit tests for numpy
NumPy version 1.2.0.dev5607
NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
Python version 2.5 (r25:51918, Sep 19 2006, 08:49:13) [GCC 4.0.1 (Apple Computer, Inc. build 5341)]
nose version 0.10.3
Running unit tests for numpy
NumPy version 1.2.0.dev5607
NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
....
Python version 2.5 (r25:51918, Sep 19 2006, 08:49:13) [GCC 4.0.1 (Apple Computer, Inc. build 5341)]
nose version 0.10.3
EEEEEEEEEEE
======================================================================
ERROR: numpy.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py", line 107, in test
    coverage, **kwargs)
  File "/Users/dalke/cvses/numpy/build/lib.macosx-10.3-fat-2.5/numpy/testing/nosetester.py", line 270, in test
    t = NumpyTestProgram(argv=argv, exit=False, plugins=plugins)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/core.py", line 219, in __init__
    argv=argv, testRunner=testRunner, testLoader=testLoader)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 758, in __init__
    self.parseArgs(argv)
...
    return self.call(*arg, **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/plugins/manager.py", line 145, in simple
    result = meth(*arg, **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/plugins/attrib.py", line 214, in wantMethod
    return self.validateAttrib(attribs)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/plugins/attrib.py", line 164, in validateAttrib
    if not value(key, attribs):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/plugins/attrib.py", line 118, in eval_in_context
    return eval(expr, None, ContextHelper(attribs))
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/plugins/attrib.py", line 50, in __getitem__
    return self.obj.get(name, False)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg/nose/plugins/attrib.py", line 66, in get
    log.debug('Get %s from %s.%s', name, self.cls, self.method)
RuntimeError: maximum recursion depth exceeded
Andrew dalke@dalkescientific.com
On Mon, Aug 4, 2008 at 20:41, Andrew Dalke <dalke@dalkescientific.com> wrote:
In my view there's no difference between them, but Tester().test does introspection to figure out the module location. (In fact, if I don't pass it the module explicitly then it expects that locals()["__file__"] for sys._getframe(-1) will exist, which is not the case with my function. The underlying code should instead check for that variable in globals().)
Probably. Or take a level= argument to tell _getframe() to go up an extra level. Or both.
I ended up with recursion errors, and I don't know why. Any idea of what to do?
Ah. My guess is that the test collector sees numpy.test() as a function that matches its regex for a unit test. It used to be a bound method, so I think nose ignored it, then. You should be able to tell nose not to collect it like so:

    def test(...):
        ...
    test.__test__ = False

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
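Combining Robert's fix with the deferred-import wrapper under discussion gives roughly this for numpy/__init__.py (a sketch of the approach, not the exact committed code):

    def test(label='fast', verbose=1, extra_argv=None, doctests=False,
             coverage=False, **kwargs):
        from testing import Tester
        import numpy
        return Tester(numpy).test(label, verbose, extra_argv, doctests,
                                  coverage, **kwargs)
    test.__test__ = False  # keep nose's collector from treating test() as a test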
On Aug 5, 2008, at 4:19 AM, Robert Kern wrote:
def test(...):
    ...
test.__test__ = False
That did it - thanks!

Does "import numpy; numpy.bench()" work for anyone? When I try it I get:

[josiah:~] dalke% python -c 'import numpy; numpy.bench()'

----------------------------------------------------------------------
Ran 0 tests in 0.003s

OK

I can go ahead and remove those if they don't work for anyone.

Andrew
dalke@dalkescientific.com
At the moment, bench() doesn't work. That's something I'll try to look at this week, but from Friday until the 15th I'm going to be driving most of the time and may not get as much done as I'd like.
On Aug 5, 2008, at 3:53 PM, Alan McIntyre wrote:
At the moment, bench() doesn't work. That's something I'll try to look at this week, but from Friday until the 15th I'm going to be driving most of the time and may not get as much done as I'd like.
Thanks for the confirmation. The import speedup patch I just submitted keeps the 'bench' definitions there (including one that was missing). But instead of defining it as a bound method, I used functions that import the testing submodule and construct/call the right objects. It should behave the same. I think. Andrew dalke@dalkescientific.com
2008/7/30 Andrew Dalke <dalke@dalkescientific.com>:
Based on those results I've been digging into the code trying to figure out why numpy imports so many files, and at the same time I've been trying to guess at the use case Robert Kern regards as typical when he wrote:
Your use case isn't so typical and so suffers on the import time end of the balance
I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time.
and trying to figure out what code would break if those modules weren't all eagerly imported and were instead written as most other Python modules are written.
For a benefit of 0.03s, I don't think it's worth it.
I have two thoughts for why mega-importing might be useful:
- interactive users get to do tab complete and see everything (eg, "import numpy" means "numpy.fft.ifft" works, without having to do "import numpy.fft" manually)
Numpy has a very flat namespace, for better or worse, which implies many imports. This can't be easily changed without modifying the API.
Is the numpy recommendation that people should do:
import numpy
numpy.fft.ifft(data)
That's the way many people use it.
? If so, the documentation should be updated to say that "random", "ma", "ctypeslib" and several other libraries are included in that list.
Thanks for pointing that out, I'll edit the documentation wiki.
Why is the last so important that it should be in the top-level namespace?
It's a single Python file -- does it make much of a difference?
In my opinion, this assistance is counter to standard practice in effectively every other Python package. I don't see the benefit.
How do you propose we change this?
BTW, the load and save commands in io do an incorrect check.
if isinstance(file, type("")):
    fid = _file(file,"rb")
else:
    fid = file
Thanks, fixed. [snip lots of suggestions]
Getting rid of these functions, and thus getting rid of the import speeds numpy startup time by 3.5%.
While I appreciate you taking the time to find these niggles, we are short on developer time as it is. Asking them to spend their precious time on making a 3.5% improvement in startup time does not make much sense. If you provide a patch, on the other hand, it would only take a matter of seconds to decide whether to apply it or not. You've already done most of the sleuth work.
I could probably get another 0.05 seconds if I dug around more, but I can't without knowing what use case numpy is trying to achieve. Why are all those ancillary modules (testing, ctypeslib) eagerly loaded when there seems no need for that feature?
Need is relative. You need fast startup time, but most of our users need quick access to whichever functions they want (and often use from an interactive terminal). I agree that "testing" and "ctypeslib" do not belong in that category, but they don't seem to do much harm either.

Regards
Stéfan
On Jul 30, 2008, at 10:59 PM, Stéfan van der Walt wrote:
I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time.
Is that interactively, or is that through programs?
For a benefit of 0.03s, I don't think it's worth it.
The final number with all the hundredths of a second added up to 0.08 seconds, which was about 30% of the 'import numpy' cost.
Numpy has a very flat namespace, for better or worse, which implies many imports.
I don't get the feeling that numpy is flat. Python's stdlib is flat. Numpy has many 2- and 3-level modules.
Is the numpy recommendation that people should do:
import numpy
numpy.fft.ifft(data)
That's the way many people use it.
The normal Python way is:

    from numpy import fft
    fft.ifft(data)

because in most packages, parent modules don't import all of their children. I acknowledge that existing numpy code will break with my desired change, as in this example from the tutorial:

    import numpy
    import pylab

    # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
    mu, sigma = 2, 0.5
    v = numpy.random.normal(mu, sigma, 10000)

and I am not saying to change this code. Instead, I am asking for limits on the eagerness, with a long-term goal of minimizing its use.
Why is [ctypeslib] so important that it should be in the top-level namespace?
It's a single Python file -- does it make much of a difference?
The file imports other files. Here's the import chain:

ctypeslib: 0.047 (numpy)
 ctypes: -1.000 (ctypeslib)
  _ctypes: 0.003 (ctypes)
  gestalt: -1.000 (ctypes)
ma: 0.005 (numpy)
 extras: 0.001 (ma)
  numpy.lib.index_tricks: 0.000 (extras)
  numpy.lib.polynomial: 0.000 (extras)

(The "-1.000" indicates a bug in my instrumentation script, which I worked around with a -1.0 value.)

Every numpy program, because it eagerly imports 'ctypeslib' to make it accessible as a top-level variable, ends up importing ctypes.
>>> if 1:
...     t1 = time.time()
...     import ctypes
...     t2 = time.time()
...     t2-t1
...
0.032159090042114258
That's 10% of the import time.
In my opinion, this assistance is counter to standard practice in effectively every other Python package. I don't see the benefit.
How do you propose we change this?
If I had my way, remove things like (in numpy/__init__.py):

    import linalg
    import fft
    import random
    import ctypeslib
    import ma

but leave the list of submodules in "__all__" so that "from numpy import *" works. Perhaps add a top-level function 'import_all()' which mimics the current behavior, and have IPython know about it so interactive users get it automatically (see the sketch below). Or something like that.

Yes, I know the numpy team won't change this behavior. I want to know why you all will consider changing.

Something more concrete: change the top-level definitions in 'numpy' from

    from testing import Tester
    test = Tester().test
    bench = Tester().bench

to

    def test(label='fast', verbose=1, extra_argv=None, doctests=False,
             coverage=False, **kwargs):
        from testing import Tester
        Tester().test(label, verbose, extra_argv, doctests,
                      coverage, **kwargs)

and do something similar for 'bench'. Note that numpy currently implements

    numpy.test          <-- this is a Tester().test
    numpy.testing.test  <-- another Tester().test bound method

so there's some needless and distracting, but extremely minor, duplication.
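Here is a sketch of that hypothetical import_all() (illustrative only; no such function exists in numpy):

    def import_all():
        # Eagerly import the optional subpackages, mimicking what
        # numpy/__init__.py does today. Importing a submodule binds it
        # as an attribute of the already-imported numpy package.
        for name in ('linalg', 'fft', 'random', 'ctypeslib', 'ma'):
            __import__('numpy.' + name)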
Getting rid of these functions, and thus getting rid of the import speeds numpy startup time by 3.5%.
While I appreciate you taking the time to find these niggles, we are short on developer time as it is. Asking them to spend their precious time on making a 3.5% improvement in startup time does not make much sense. If you provide a patch, on the other hand, it would only take a matter of seconds to decide whether to apply it or not. You've already done most of the sleuth work.
I wrote that I don't know the reasons for why the design is as it is. Are those functions ("english_upper", "english_lower", "english_capitalize") expected as part of the public interface for the module? The lack of a "_" prefix and their verbose docstrings imply that they are for general use. In that case, they can't easily be gotten rid of. Yet it doesn't make sense for them to be part of 'numerictypes'. Why would I submit a patch if there's no way those definitions will disappear, for reasons I am not aware of?

I am not asking you all to make these changes. I'm asking about how much change is acceptable, what are the restrictions, and why are they there?

I also haven't yet figured out how to get the regression tests to run, and I'm not going to contribute patches without at least passing that bare minimum. BTW, how do I do that? In the top-level there's a 'test.sh' command but when I run it I get:

% mkdir tmp
% bash test.sh
Running from numpy source directory.
Traceback (most recent call last):
  File "setupscons.py", line 56, in <module>
    raise DistutilsError('\n'.join(msg))
distutils.errors.DistutilsError: You cannot build numpy with scons without the numscons package (Failure was: No module named numscons)
test.sh: line 11: cd: /Users/dalke/cvses/numpy/tmp: No such file or directory

and when I run 'nosetests' in the top-level directory I get:

ImportError: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.

I couldn't find (in a cursory search) instructions for running self-tests or regression tests.
I could probably get another 0.05 seconds if I dug around more, but I can't without knowing what use case numpy is trying to achieve. Why are all those ancillary modules (testing, ctypeslib) eagerly loaded when there seems no need for that feature?
Need is relative. You need fast startup time, but most of our users need quick access to whichever functions they want (and often use from an interactive terminal). I agree that "testing" and "ctypeslib" do not belong in that category, but they don't seem to do much harm either.
If there is no need for those features then I'll submit a patch to remove them. There is some need, and there are many ways to handle that need. The current solution in numpy is to import everything. Again I ask, does *everything* (like 'testing' and 'ctypeslib') need to be imported eagerly?

In your use case of user-driven exploratory development the answer is no - the users described above rarely desire access to those packages, because those packages are best used in automated environments. Eg, why write tests which are only used once?

Andrew
dalke@dalkescientific.com
On Thu, 2008-07-31 at 02:07 +0200, Andrew Dalke wrote:
On Jul 30, 2008, at 10:59 PM, Stéfan van der Walt wrote:
I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time.
Is that interactively, or is that through programs?
Most people use it interactively, or for long-running programs. Import times only matter for interactive commands depending on numpy.
and I am not saying to change this code. Instead, I am asking for limits on the eagerness, with a long-term goal of minimizing its use.
For a new API, this is never done, and it is a bug if it is. In scipy, typically, 'import scipy' does not import the whole list of subpackages.
I also haven't yet figured out how to get the regression tests to run, and I'm not going to contribute patches without at least passing that bare minimum. BTW, how do I do that? In the top-level there's a 'test.sh' command but when I run it I get:
Argh, this file should never have ended up here; that's entirely my fault. It was a merge from a (at the time) experimental branch. I can't remove it now because my company does not allow subversion access, but I will fix this tonight. Sorry for the confusion.
and when I run 'nosetests' in the top-level directory I get:
ImportError: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.
I couldn't find (in a cursory search) instructions for running self- tests or regression tests.
You are supposed to run the tests on an installed numpy, not in the sources:

    import numpy
    numpy.test(verbose=10)

You can't really run numpy without installing it first (which is what the message is about).

cheers,

David
On Jul 31, 2008, at 3:53 AM, David Cournapeau wrote:
You are supposed to run the tests on an installed numpy, not in the sources:
import numpy
numpy.test(verbose=10)
Doesn't that make things more cumbersome to test? That is, if I were to make a change I would need to:

- python setup.py build (to put the code into the build/* subdirectory)
- cd to the build directory, or switch to a terminal which was already there
- manually do the import/test code you wrote, or write a two-line program for it

I would rather do 'nosetests' in the source tree, if at all feasible, although that might only be possible for the Python source.

Hmm. And it looks like testing/nosetester.py (which implements the 'test' function above) is meant to make it easier to run nose, except my feeling is the extra level of wrapping makes things more complicated. The nosetest command-line appears to be more flexible, with support for, for example, dropping into the debugger on errors, and resetting the coverage test files.

I'm speaking out of ignorance, btw.

Cheers,

Andrew
dalke@dalkescientific.com
On Thu, 2008-07-31 at 08:12 +0200, Andrew Dalke wrote:
Doesn't that make things more cumbersome to test? That is, if I were to make a change I would need to:

- python setup.py build (to put the code into the build/* subdirectory)
- cd to the build directory, or switch to a terminal which was already there
- manually do the import/test code you wrote, or write a two-line program for it
Yes. Nothing that an easy makefile cannot solve, nonetheless (I am sure I am not the only one with a makefile/script which automates the above, to test a newly svn-updated numpy in one command). The problem is that it is difficult to support running uninstalled packages, in particular because of compiled code (distutils/setuptools have a develop mode to make this possible, though). Distutils puts the built code in the build directory, and the correct tree is built at install time.
I would rather do 'nosetests' in the source tree, if at all feasible, although that might only be possible for the Python source.
Yes, but how do you do that? You would do import scipy in an svn checkout, and the C extensions would be the ones installed? That sounds like a nightmare from a reliability POV. There was a related discussion (using scipy without installing it) on the scipy ML, BTW: http://projects.scipy.org/pipermail/scipy-user/2008-July/017678.html

cheers,

David
On Thu, Jul 31, 2008 at 03:41:15PM +0900, David Cournapeau wrote:
Yes. Nothing that an easy make file cannot solve, nonetheless (I am sure I am not the only one with a makefile/script which automates the above, to test a new svn updated numpy in one command).
That's why distutils has a test target. You can do "python setup.py test", and if you have set up your setup.py properly it should work (obviously it is easy to make this statement, and harder to get the thing working). Gaël
Gael Varoquaux wrote:
That's why distutils has a test target. You can do "python setup.py test", and if you have set up your setup.py properly it should work (obviously it is easy to make this statement, and harder to get the thing working).
I have already seen some discussion about distutils like this, if you mean something like this: http://blog.ianbicking.org/pythons-makefile.html but I would take rake and make over this anytime. I just don't understand why something like rake does not exist in python, but well, let's not go there. David
On Thu, Jul 31, 2008 at 11:05:33PM +0900, David Cournapeau wrote:
Gael Varoquaux wrote:
That's why distutils has a test target. You can do "python setup.py test", and if you have set up your setup.py properly it should work (obviously it is easy to make this statement, and harder to get the thing working).
I have already seen some discussion about distutils like this, if you mean something like this:
but I would take rake and make over this anytime. I just don't understand why something like rake does not exist in python, but well, let's not go there.
Well, actually, in the Enthought tool suite we use setuptools for packaging (I don't want to start a controversy, I am not advocating the use of setuptools, just stating a fact) and nose for testing, and getting "setup.py test" to work, including doing the build, testing, and downloading nose if it's not there, is a matter of adding these two lines to the setup.py:

tests_require = ['nose >= 0.10.3'],
test_suite = 'nose.collector',

Obviously, the build part has to be well-tuned for the machinery to work, but there is a lot of value here.

Gaël
On Thu, Jul 31, 2008 at 11:16:12PM +0900, David Cournapeau wrote:
Gael Varoquaux wrote:
Obviously, the build part has to be well-tuned for the machinery to work, but there is a lot of value here.
Ah yes, setuptools does have this. But this is specific to setuptools; bare distutils does not have this test command, right?
Dunno, sorry. The scale of my ignorance of distutils and related subjects would probably impress you :). Gaël, looking forward to your tutorial on scons.
On Thu, Jul 31, 2008 at 01:12, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Jul 31, 2008, at 3:53 AM, David Cournapeau wrote:
You are supposed to run the tests on an installed numpy, not in the sources:
import numpy numpy.test(verbose = 10)
Doesn't that make things more cumbersome to test? That is, if I were to make a change I would need to:
- python setup.py build (to put the code into the build/* subdirectory)
- cd to the build directory, or switch to a terminal which was already there
- manually do the import/test code you wrote, or write a two-line program for it
Developers can build_ext --inplace and frequently use nosetests anyways. numpy.test() is now primarily for users who are trying to see if their installation worked (or who are gathering requested information for the people on this list to help them troubleshoot), and who therefore need to test the installed numpy.

Note that we are *just* now transitioning to using nosetests for the development version of numpy. It used to be (through the 1.1.x releases) that we had our own test collection code inside numpy. numpy.test() was *necessary* in those releases. By now, we have most of the denizens here trained to do numpy.test() when testing their new installations. Maybe in 1.3, we'll remove it in favor of just having people use the nosetests command.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jul 31, 2008 at 2:12 AM, Andrew Dalke <dalke@dalkescientific.com> wrote:
Hmm. And it looks like testing/nosetester.py (which implements the 'test' function above) is meant to make it easier to run nose, except my feeling is the extra level of wrapping makes things more complicated. The nosetests command line appears to be more flexible, with support for, for example, dropping into the debugger on errors and resetting the coverage test files.
You can actually pass those sorts of options to nose through the extra_argv parameter in test(). That might be a little cumbersome, but (as far as I know) it's something I'm going to do so infrequently it's not a big deal.
2008/7/31 Andrew Dalke <dalke@dalkescientific.com>:
Numpy has a very flat namespace, for better or worse, which implies many imports.
I don't get the feeling that numpy is flat. Python's stdlib is flat. Numpy has many 2- and 3-level modules.
With 500+ functions in the root namespace, I'd call numpy flat.
If I had my way, remove things like (in numpy/__init__.py)
import linalg
import fft
import random
import ctypeslib
import ma
but leave the list of submodules in "__all__" so that "from numpy import *" still works. Perhaps add a top-level function like 'import_all()' which mimics the current behavior, and have IPython know about it so interactive users get it automatically. Or something like that; a sketch follows below.
Yes, I know the numpy team won't change this behavior. I want to know when you all would consider changing.
Maybe when we're convinced that there is a lot to be gained from making such a change. From my perspective, it doesn't look good:

I) Major code breakage
II) Confused users
III) More difficult function discovery for beginners

vs.

I) Slight improvement in startup speed.
Getting rid of these functions, and thus getting rid of the import, speeds up numpy's startup by 3.5%.
While I appreciate you taking the time to find these niggles, we are short on developer time as it is. Asking developers to spend their precious time on making a 3.5% improvement in startup time does not make much sense. If you provide a patch, on the other hand, it would only take a matter of seconds to decide whether to apply it or not. You've already done most of the sleuth work.
I wrote that I don't know the reasons why the design is as it is. Are those functions ("english_upper", "english_lower", "english_capitalize") expected to be part of the public interface for the module? The lack of a "_" prefix and their verbose docstrings imply that they are for general use. In that case, they can't easily be gotten rid of. Yet it doesn't make sense for them to be part of 'numerictypes'.
Anything underneath numpy.core that is not exposed as numpy.something is not for public consumption. Stéfan
On Thu, Jul 31, 2008 at 04:42, Stéfan van der Walt <stefan@sun.ac.za> wrote:
2008/7/31 Andrew Dalke <dalke@dalkescientific.com>:
I wrote that I don't know the reasons why the design is as it is. Are those functions ("english_upper", "english_lower", "english_capitalize") expected to be part of the public interface for the module? The lack of a "_" prefix and their verbose docstrings imply that they are for general use. In that case, they can't easily be gotten rid of. Yet it doesn't make sense for them to be part of 'numerictypes'.
Anything underneath numpy.core that is not exposed as numpy.something is not for public consumption.
That said, the reason those particular docstrings are verbose is because I wanted people to know why those functions exist there (e.g. "This is an internal utility function...."). But you still can't remove them since they are being used inside numerictypes. That's why I labeled them "internal utility functions" instead of leaving them with minimal docstrings such that you would have to guess. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
That said, the reason those particular docstrings are verbose is because I wanted people to know why those functions exist there (e.g. "This is an internal utility function....").
Err, umm, you mean that first line of the second paragraph in the docstring? *blush*
But you still can't remove them since they are being used inside numerictypes. That's why I labeled them "internal utility functions" instead of leaving them with minimal docstrings such that you would have to guess.
My proposal is to replace that code with a table mapping the type name to the uppercase/lowercase/capitalized forms, thus eliminating the (small) amount of time needed to import string. It makes adding new types slightly more difficult. I know it's a tradeoff. In this case it's somewhat like the proverbial New Jersey approach vs. the MIT one. The code that's there is the right way to solve the problem in the general case, but solving the specific problem can be punted, and as a result the code is (slightly) faster. Other parts of the code are like that, which is why I pointed out so many examples. Startup performance has not been a numpy concern. It is a concern for me, and it has been (for other packages) a concern for some of my clients. Andrew dalke@dalkescientific.com
On Thu, Jul 31, 2008 at 12:43:17PM +0200, Andrew Dalke wrote:
Startup performance has not been a numpy concern. It is a concern for me, and it has been (for other packages) a concern for some of my clients.
I am curious: if startup performance is a problem, I guess it is because you are running lots of little scripts where startup time is big compared to run time. Did you think of forking them from an already-started process? I had this same problem (with libraries way slower than numpy to load) and used os.fork to great success. Gaël
On Thu, Jul 31, 2008 at 10:14 AM, Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
On Thu, Jul 31, 2008 at 12:43:17PM +0200, Andrew Dalke wrote:
Startup performance has not been a numpy concern. It is a concern for me, and it has been (for other packages) a concern for some of my clients.
I am curious: if startup performance is a problem, I guess it is because you are running lots of little scripts where startup time is big compared to run time. Did you think of forking them from an already-started process? I had this same problem (with libraries way slower than numpy to load) and used os.fork to great success.
Start-up time is an issue for me, but in a larger sense than just numpy. I run many scripts, some that are ephemeral and some that take significant amounts of time. However, numpy is just one of many, many libraries that I must import, so improvements, even minor ones, are appreciated. The moral of this discussion, for me, is that just because _you_ don't care about a particular aspect or feature doesn't mean that others don't or shouldn't. Your workarounds may not be viable for me and vice versa. So let's just go with the spirit of open source and encourage those motivated to contribute to do so, provided their suggestions are sensible and do not break code. -Kevin
On Thu, Jul 31, 2008 at 10:34:04AM -0400, Kevin Jacobs <jacobs@bioinformed.com> wrote:
The moral of this discussion, for me, is that just because _you_ don't care about a particular aspect or feature doesn't mean that others don't or shouldn't. Your workarounds may not be viable for me and vice versa. So let's just go with the spirit of open source and encourage those motivated to contribute to do so, provided their suggestions are sensible and do not break code.
I fully agree here. And if people improve numpy's startup time without breaking or obfuscating stuff, I am very happy. I was just trying to help :). Yes, the value of open source is that different people improve the same tools to meet different goals, thus we should always keep an open ear to other people's requirements, especially if they come up with high-quality code. Gaël
On Thu, Jul 31, 2008 at 05:43, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
But you still can't remove them since they are being used inside numerictypes. That's why I labeled them "internal utility functions" instead of leaving them with minimal docstrings such that you would have to guess.
My proposal is to replace that code with a table mapping the type name to the uppercase/lowercase/capitalized forms, thus eliminating the (small) amount of time needed to import string.
It makes adding new types slightly more difficult.
I know it's a tradeoff.
Probably not a bad one. Write up the patch, and then we'll see how much it affects the import time.

I would much rather that we discuss concrete changes like this rather than rehash the justifications of old decisions. Regardless of the merits of the old decisions (and I agreed with your position at the time), it's a pointless and irrelevant conversation. The decisions were made, and now we have a user base to whom we have promised not to break their code so egregiously again. The relevant conversation is what changes we can make now.

Some general guidelines:

1) Everything exposed by "from numpy import *" still needs to work.
   a) The layout of everything under numpy.core is an implementation detail.
   b) _underscored functions and explicitly labeled internal functions can probably be modified.
   c) Ask about specific functions when in doubt.
2) The improvement in import times should be substantial. Feel free to bundle up the optimizations for consideration.
3) Moving imports from module-level down into the functions where they are used is generally okay if we get a reasonable win from it. The local imports should be commented, explaining that they are made local in order to improve the import times.
4) __import__ hacks are off the table.
5) Proxy objects ... I would really like to avoid proxy objects. They have caused fragility in the past.
6) I'm not a fan of having environment variables control the way numpy gets imported, but I'm willing to consider it. For example, I might go for having proxy objects for linalg et al. *only* if a particular environment variable were set. But there had better be a very large improvement in import times.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Fri, Aug 1, 2008 at 5:02 AM, Robert Kern <robert.kern@gmail.com> wrote:
5) Proxy objects ... I would really like to avoid proxy objects. They have caused fragility in the past.
One recurrent problem with import-time optimization is that it takes some work to improve it, but only one line to destroy it all. For example, the inspect import came back, and this alone is ~10-15% of my import time on Mac OS X with a recent SVN checkout (from ~180 ms down to ~160 ms without it). This would be the main advantage of lazy imports; but is it really worth the trouble, since it brings some complexity, as you mentioned last time we had this discussion? Maybe a simple test script to check for known costly imports would be enough (run from time to time?). Maybe ctypes can be loaded on the fly, too. Those are the two obvious hotspots (~25% altogether).
6) I'm not a fan of having environment variables control the way numpy gets imported, but I'm willing to consider it. For example, I might go for having proxy objects for linalg et al. *only* if a particular environment variable were set. But there had better be a very large improvement in import times.
linalg does not seem to have a huge impact. It is typically much faster to load than ctypeslib or inspect. cheers, David
On Jul 31, 2008, at 10:02 PM, Robert Kern wrote:
1) Everything exposed by "from numpy import *" still needs to work.
Does that include numpy.Tester? I don't mean numpy.test() nor numpy.bench(). Does that include numpy.PackageLoader? I don't mean numpy.pkgload.
2) The improvement in import times should be substantial. Feel free to bundle up the optimizations for consideration.
Okay, I wasn't sure whether a bundle or small independent ones were best. I tried using Trac to submit a small patch. It's a big hassle to do for a two-line patch.
5) Proxy objects ... I would really like to avoid proxy objects. They have caused fragility in the past.
Understood and agreed. Andrew dalke@dalkescientific.com
On Thu, Jul 31, 2008 at 16:09, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Jul 31, 2008, at 10:02 PM, Robert Kern wrote:
1) Everything exposed by "from numpy import *" still needs to work.
Does that include numpy.Tester? I don't mean numpy.test() nor numpy.bench().
Does that include numpy.PackageLoader? I don't mean numpy.pkgload.
Probably not. I would consider those to be implementation details that got left in rather than a deliberate API exposure. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
I've got a proof of concept that takes the time on my machine to "import numpy" from 0.21 seconds down to 0.08 seconds. Doing that required some somewhat awkward things, like deferring all 'import re' statements. I don't think that's stable in the long run because people will blithely import re in the future and not care that it takes 0.02 seconds to import. I don't blame them for complaining; I was curious how fast I could get things. Note that when I started complaining about this a month ago the import time on my machine was about 0.3 seconds.

I'll work on patches within the next couple of days. Here's an outline of what I did, along with some questions about what's feasible.

1) don't import 'numpy.testing'. Savings = 0.012s. Doing so required patches like

-from numpy.testing import Tester
-test = Tester().test
-bench = Tester().bench
+def test(label='fast', verbose=1, extra_argv=None, doctests=False,
+         coverage=False, **kwargs):
+    from testing import Tester
+    import numpy
+    Tester(numpy).test(label, verbose, extra_argv, doctests,
+                       coverage, **kwargs)
+def bench(label='fast', verbose=1, extra_argv=None):
+    from testing import Tester
+    import numpy
+    Tester(numpy).bench(label, verbose, extra_argv)

QUESTION: since numpy is moving to nose, and the documentation only describes doing 'import numpy; numpy.test()', can I remove all other definitions of "test" and "bench"?

2) removing 'import ctypeslib' in the top-level -> 0.023 seconds. QUESTION: is this considered part of the API that must be preserved? The primary use case is supposed to be to help interactive users. I don't think interactive users spend much time using ctypes, and those that do are also those that aren't confused about needing an extra import statement.

3) removing 'import string' in numerictypes.py -> 0.008 seconds. This requires some ugly but simple changes to the code.

4) remove the 'import re' in _internal, numpy/lib/, function_base, and other places. This reduced my overall startup cost by 0.013s.

5) defer the bzip2 and gzip imports in _datasource: 0.009s. This will require non-trivial code changes.

6) defer 'format' from io.py: 0.007s

7) _datasource imports shutil in order to use shutil.rmdir in a __del__. I don't think this can be deferred, because I don't want to do an import during system shutdown, which is when the __del__ might be called. It would save 0.004s.

8) If I can remove 'import doc' from the top-level numpy (is that part of the required API?) then I can save 0.004s.

9) defer urlparse in _datasource: about 0.003s

10) If I get rid of the cPickle import in the top-level numeric.py then I can save 0.006 seconds.

11) not importing add_newdocs saves 0.005s. This might be possible by moving all of the docstrings to the actual functions. I haven't looked into this much and it might not be possible.

Those millisecond improvements add up! When I do an interactive 'import numpy' on my system I don't notice the import time like I did before.

Andrew
dalke@dalkescientific.com
On Thu, Jul 31, 2008 at 10:02 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Thu, Jul 31, 2008 at 05:43, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
But you still can't remove them since they are being used inside numerictypes. That's why I labeled them "internal utility functions" instead of leaving them with minimal docstrings such that you would have to guess.
My proposal is to replace that code with a table mapping the type name to the uppercase/lowercase/capitalized forms, thus eliminating the (small) amount of time needed to import string.
It makes adding new types slightly more difficult.
I know it's a tradeoff.
Probably not a bad one. Write up the patch, and then we'll see how much it affects the import time.
I would much rather that we discuss concrete changes like this rather than rehash the justifications of old decisions. Regardless of the merits about the old decisions (and I agreed with your position at the time), it's a pointless and irrelevant conversation. The decisions were made, and now we have a user base to whom we have promised not to break their code so egregiously again. The relevant conversation is what changes we can make now.
Some general guidelines:
1) Everything exposed by "from numpy import *" still needs to work.
   a) The layout of everything under numpy.core is an implementation detail.
   b) _underscored functions and explicitly labeled internal functions can probably be modified.
   c) Ask about specific functions when in doubt.
2) The improvement in import times should be substantial. Feel free to bundle up the optimizations for consideration.
3) Moving imports from module-level down into the functions where they are used is generally okay if we get a reasonable win from it. The local imports should be commented, explaining that they are made local in order to improve the import times.
4) __import__ hacks are off the table.
5) Proxy objects ... I would really like to avoid proxy objects. They have caused fragility in the past.
6) I'm not a fan of having environment variables control the way numpy gets imported, but I'm willing to consider it. For example, I might go for having proxy objects for linalg et al. *only* if a particular environment variable were set. But there had better be a very large improvement in import times.
I just want to say that I agree with Andrew that slow imports just suck. But it's not really that bad; for example on my system:

In [1]: %time import numpy
CPU times: user 0.11 s, sys: 0.01 s, total: 0.12 s
Wall time: 0.12 s

so that's ok. For comparison:

In [1]: %time import sympy
CPU times: user 0.12 s, sys: 0.02 s, total: 0.14 s
Wall time: 0.14 s

But I am still unhappy about it. I'd like it if the package could import much faster, because it adds up: when you need to import 7 packages like that, it's suddenly 1s, and that's just too much. But of course everything within the constraints that Robert has outlined.

From the theoretical point of view, I don't understand why python cannot just import numpy (or any other package) immediately, and only at the moment the user actually accesses something, import it for real. Mercurial uses a lazy import module that does exactly this. Maybe that's an option? Look into mercurial/demandimport.py. Use it like this:

In [1]: import demandimport
In [2]: demandimport.enable()
In [3]: %time import numpy
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s

That's pretty good, huh? :) Unfortunately, numpy cannot work with lazy import (yet):

In [5]: %time from numpy import array
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (17, 0))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[skip]
/usr/lib/python2.5/site-packages/numpy/lib/index_tricks.py in <module>()
     14 import function_base
     15 import numpy.core.defmatrix as matrix
---> 16 makemat = matrix.matrix
     17
     18 # contributed by Stefan van der Walt
/home/ondra/ext/sympy/demandimport.pyc in __getattribute__(self, attr)
     73             return object.__getattribute__(self, attr)
     74         self._load()
---> 75         return getattr(self._module, attr)
     76     def __setattr__(self, attr, val):
     77         self._load()
AttributeError: 'module' object has no attribute 'matrix'

BTW, neither can SymPy. However, maybe it shows some possibilities and maybe it's possible to fix numpy to work with such a lazy import. On the other hand, I can imagine it can bring a lot more trouble, so it should probably only be optional.

Ondrej
On Sat, Aug 2, 2008 at 5:33 AM, Ondrej Certik <ondrej@certik.cz> wrote:
But I am still unhappy about it, I'd like if the package could import much faster, because it adds up, when you need to import 7 packages like that, it's suddenly 1s and that's just too much.
Too much for what? We need more information on the kind of things people who complain about numpy startup cost are doing. I suggested lazy imports a few weeks ago when this discussion started (with the example of bzr instead of hg), but I am less convinced that it would be that useful, because numpy is fundamentally different from bzr/hg. As Robert said, it would bring some complexity, and in an area where python is already "fishy". When you import numpy, you expect some core things to be available, and they are the ones which take the most time. In bzr/hg, you use a *program*, and you can relatively easily change the API because not many people use it. But numpy is essentially an API, not a tool, so we don't have this freedom. Also, it means it is relatively easy for bzr/hg developers to control lazy imports, because they are the users, and users of bzr/hg don't deal with python directly. If our own lazy import has some bugs, it will impact many people who will not be able to trace it. The main advantage I see with lazy imports is that they keep someone else from breaking the speed-up work by globally re-importing a costly package.
But of course everything within the constraints that Robert has outlined. From the theoretical point of view, I don't understand why python cannot just import numpy (or any other package) immediately, and only at the moment the user actually accesses something, import it for real.
I guess because it would be complex to do everywhere while keeping all the semantics of python imports. Also, like everything "lazy", it means it is more complicated to follow what's happening. Your examples show that it would be complex to do. As I see it, there are some things in numpy we could do a bit differently to cut import times significantly (a few tens of ms), without changing much. Let's try that first.
Mercurial uses a lazy import module, that does exactly this. Maybe that's an option?
Note that mercurial is under the GPL :) cheers, David
On Jul 31, 2008, at 11:42 AM, Stéfan van der Walt wrote:
Maybe when we're convinced that there is a lot to be gained from making such a change. From my perspective, it doesn't look good:
I) Major code breakage
II) Confused users
III) More difficult function discovery for beginners
I'm not asking for a change. I fully realize this. I happen to think it's a mistake, and there were other ways to have addressed the underlying requirement, but I know that's not going to change. (For example, follow the matplotlib approach, where there's a special library designed to be imported in interactive use. But I am *not* proposing this change.) I point out that this makes numpy different from most other Python packages. Had this not been done then I) would not be a problem, and II) is, I think, a wash, because people starting with numpy will still wonder why
>>> import PIL
>>> PIL.Image
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Image'
>>> import PIL.Image
>>> PIL.Image
<module 'PIL.Image' from '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.pyc'>
and
>>> import xml
>>> xml.etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'etree'
>>> from xml import etree
>>> xml.etree
<module 'xml.etree' from '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/etree/__init__.pyc'>
occur. III) assumes there couldn't have been other solutions. And it assumes that the difficulties are large, which I haven't seen in my experience.
I) Slight improvement in startup speed.
The user base for numpy might be .. 10,000 people? 100,000 people? Let's go with the latter, and assume that with command-line scripts, CGI scripts, and the other programs that people write in order to help do research means that numpy is started on average 10 times a day. 100,000 people * 10 times / day * 0.1 seconds per startup = almost 28 people-hours spent each day waiting for numpy to start. I'm willing to spend a few days to achieve that. Perhaps there's fewer people than I'm estimating. OTOH, perhaps there are more imports of numpy per day. An order of magnitude less time is still a couple of hours each day as the world waits to import all of the numpy libraries. If on average people import numpy 10 times a day and it could be made 0.1 seconds faster then that's 1 second per person per day. If it takes on average 5 minutes to learn to import the module directly and the onus is all on numpy, then after 1 year of use the efficiency has made up for it, and the benefits continue to grow. Slight improvements add up when multiplied by everyone. The goals of numpy when it started aren't going to be the same as when it's a mature, widely used and deployed package.
Andrew dalke@dalkescientific.com
2008/7/31 Andrew Dalke <dalke@dalkescientific.com>:
The user base for numpy might be .. 10,000 people? 100,000 people? Let's go with the latter, and assume that with command-line scripts, CGI scripts, and the other programs that people write in order to help do research means that numpy is started on average 10 times a day.
100,000 people * 10 times / day * 0.1 seconds per startup = almost 28 people-hours spent each day waiting for numpy to start.
I don't buy that argument. No single person is agile enough to do anything useful in the half a second or so it takes to start up NumPy. No one is *waiting* for NumPy to start. Just by answering this e-mail I could have (and maybe should have) started NumPy three hundred and sixty times. I don't want to argue about this, though. Write the patches, file a ticket, and hopefully someone will deem them important enough to apply them. Stéfan
Hi All, I've been reading this discussion with interest. I would just like to highlight an alternate use of numpy to interactive use. We have a cluster of machines which process tasks on an individual basis, where a master task may spawn 600 slave tasks to be processed. These tasks are spread across the cluster and processed as scripts in individual python threads. Although reducing the process time by 300 seconds for the master task is only about a 1.5% speedup (total time can be in excess of 24000s), we process a large number of these tasks in any given year, and every little helps! Hanni 2008/7/31 Stéfan van der Walt <stefan@sun.ac.za>
2008/7/31 Andrew Dalke <dalke@dalkescientific.com>:
The user base for numpy might be .. 10,000 people? 100,000 people? Let's go with the latter, and assume that with command-line scripts, CGI scripts, and the other programs that people write in order to help do research means that numpy is started on average 10 times a day.
100,000 people * 10 times / day * 0.1 seconds per startup = almost 28 people-hours spent each day waiting for numpy to start.
I don't buy that argument. No single person is agile enough to do anything useful in the half a second or so it takes to start up NumPy. No one is *waiting* for NumPy to start. Just by answering this e-mail I could have (and maybe should have) started NumPy three hundred and sixty times.
I don't want to argue about this, though. Write the patches, file a ticket, and hopefully someone will deem them important enough to apply them.
Stéfan
On Thu, Jul 31, 2008 at 7:31 AM, Hanni Ali <hanni.ali@gmail.com> wrote:
I would just like to highlight an alternate use of numpy to interactive use. We have a cluster of machines which process tasks on an individual basis, where a master task may spawn 600 slave tasks to be processed. These tasks are spread across the cluster and processed as scripts in individual python threads. Although reducing the process time by 300 seconds for the master task is only about a 1.5% speedup (total time can be in excess of 24000s), we process a large number of these tasks in any given year, and every little helps!
There are other components of NumPy/SciPy that are more worthy of optimization. Given that programmer time is a scarce resource, it's more sensible to direct our efforts towards making the other 98.5% of the computation faster. /law of diminishing returns -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
Nathan Bell wrote:
There are other components of NumPy/SciPy that are more worthy of optimization. Given that programmer time is a scarce resource, it's more sensible to direct our efforts towards making the other 98.5% of the computation faster.
To be fair, when I took a look at the problem last month, it took a few of us (Robert and me, IIRC) at most 2 man-hours altogether to halve numpy's import time on linux, without altering the API at all. Maybe there are more things which can be done to get to a 'flatter' profile. cheers, David
On Thu, Jul 31, 2008 at 07:46:20AM -0500, Nathan Bell wrote:
On Thu, Jul 31, 2008 at 7:31 AM, Hanni Ali <hanni.ali@gmail.com> wrote:
I would just like to highlight an alternate use of numpy to interactive use. We have a cluster of machines which process tasks on an individual basis, where a master task may spawn 600 slave tasks to be processed. These tasks are spread across the cluster and processed as scripts in individual python threads. Although reducing the process time by 300 seconds for the master task is only about a 1.5% speedup (total time can be in excess of 24000s), we process a large number of these tasks in any given year, and every little helps!
There are other components of NumPy/SciPy that are more worthy of optimization. Given that programmer time is a scarce resource, it's more sensible to direct our efforts towards making the other 98.5% of the computation faster.
This is true in general, but I have a different use case for one of my programs that uses numpy on a cluster. Basically, the program gets called thousands of times per day and the runtime for each is only a second or two. In this case I am much more dominated by numpy's import time. Scott PS: Yes, I could change the way that the routine works so that it is called many fewer times, however, that would be very difficult (although not impossible). A "free" speedup due to faster numpy import would be very nice. -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom@nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Stéfan van der Walt wrote:
No one is *waiting* for NumPy to start.
I am, and probably 10 times a day, yes. And it's a major issue for CGI, though maybe no one's using that anymore anyway.
Just by answering this e-mail I could have (and maybe should have) started NumPy three hundred and sixty times.
sure, but I like wasting my time on mailing lists.... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Thu, Jul 31, 2008 at 5:36 AM, Andrew Dalke <dalke@dalkescientific.com> wrote:
The user base for numpy might be .. 10,000 people? 100,000 people? Let's go with the latter, and assume that with command-line scripts, CGI scripts, and the other programs that people write in order to help do research means that numpy is started on average 10 times a day.
100,000 people * 10 times / day * 0.1 seconds per startup = almost 28 people-hours spent each day waiting for numpy to start.
I'm willing to spend a few days to achieve that.
Perhaps there's fewer people than I'm estimating. OTOH, perhaps there are more imports of numpy per day. An order of magnitude less time is still a couple of hours each day as the world waits to import all of the numpy libraries.
If on average people import numpy 10 times a day and it could be made 0.1 seconds faster then that's 1 second per person per day. If it takes on average 5 minutes to learn to import the module directly and the onus is all on numpy, then after 1 year of use the efficiency has made up for it, and the benefits continue to grow.
Just think of the savings that could be achieved if all 2.1 million Walmart employees were outfitted with colostomy bags. 0.5 hours / day for bathroom breaks * 2,100,000 employees * 365 days/year * $7/hour = $2,682,750,000/year Granted, I'm probably not the first to run these numbers. -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
Andrew Dalke wrote:
If I had my way, remove things like (in numpy/__init__.py)
import linalg
import fft
import random
import ctypeslib
import ma
as a side benefit, this might help folks using py2exe, py2app and friends -- as it stands all those sub-modules need to be included in your app bundle regardless of whether they are used. I recall having to explicitly add them by hand, too, though that may have been a matplotlib.numerix issue.
but leave the list of submodules in "__all__" so that "from numpy import *" works.
Of course, no one should be doing that anyway.... ;-) And for what it's worth, I've found myself very frustrated by how long it takes to start up python and import numpy. I often do whip out the interpreter to do something fast, and I didn't used to have to wait for it. On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Christopher Barker wrote:
On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy!
Hot or cold? If hot, there is something horribly wrong with your setup. On my macbook, it takes ~180 ms to do python -c "import numpy", and ~100 ms on linux (same machine). cheers, David
David Cournapeau wrote:
Christopher Barker wrote:
On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy!
Hot or cold ? If hot, there is something horribly wrong with your setup.
hot -- it takes about 10 cold. I've been wondering about that.

time python -c "import numpy"

real 0m8.383s
user 0m0.320s
sys 0m7.805s

and similar results if run multiple times in a row. Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean-out and re-install of python comes to mind. Oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook).

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
On Thu, 31 Jul 2008 10:12:22 -0700 Christopher Barker <Chris.Barker@noaa.gov> wrote:
David Cournapeau wrote:
Christopher Barker wrote:
On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy!
Hot or cold ? If hot, there is something horribly wrong with your setup.
hot -- it takes about 10 cold.
I've been wondering about that.
time python -c "import numpy"
real 0m8.383s
user 0m0.320s
sys 0m7.805s
and similar results if run multiple times in a row.
Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind.
oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook)
-Chris
No idea, but for comparison:

time /usr/bin/python -c "import numpy"

real 0m0.295s
user 0m0.236s
sys 0m0.050s

nwagner@linux:~/svn/matplotlib> cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 10
model name : mobile AMD Athlon (tm) 2500+
stepping : 0
cpu MHz : 662.592
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mp mmxext 3dnowext 3dnow
bogomips : 1316.57

Nils
On Thu, Jul 31, 2008 at 1:12 PM, Christopher Barker <Chris.Barker@noaa.gov>wrote:
David Cournapeau wrote:
Christopher Barker wrote:
On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy!
Hot or cold ? If hot, there is something horribly wrong with your setup.
hot -- it takes about 10 cold.
I've been wondering about that.
time python -c "import numpy"
real 0m8.383s
user 0m0.320s
sys 0m7.805s
and similar results if run multiple times in a row.
Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind.
Is only 'import numpy' slow, or do other packages import slowly too? Are there remote directories in your pythonpath? Do you have old `eggs` in the site-packages directory that point to remote directories (installed with setuptools develop)? Try cleaning the site-packages directory. That did the trick for me once. David
oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook)
-Chris
-- Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
hot -- it takes about 10 cold.
I've been wondering about that.
time python -c "import numpy"
real 0m8.383s
user 0m0.320s
sys 0m7.805s
and similar results if run multiple times in a row.
What does python -c "import sys; print sys.path" say?
Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind.
oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook)
disk should not matter. If hot, everything should be in the IO buffer, and opening a file is on the order of a few microseconds (that's certainly the order on Linux; the VM on Mac OS X is likely not as good, but still). cheers, David
David Cournapeau wrote:
time python -c "import numpy"
real 0m8.383s
user 0m0.320s
sys 0m7.805s
What does python -c "import sys; print sys.path" say ?
A lot! 41 entries, and lots of eggs -- are eggs an issue? I'm also wondering how the order is determined -- if it looked in site-packages first, it would find numpy a whole lot faster.

I also tried:

python -v -v -c "import numpy" &>junk2.txt

which results in:

# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# trying /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site.so
# trying /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/sitemodule.so
# trying /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site.py
...

and a LOT more:

$ grep "# trying" junk2.txt | wc -l
7446

For comparison:

$ python -v -v -c "import sys" &>junk3.txt
$ grep "# trying" junk3.txt | wc -l
618

which still seems like a lot. So I think I've found the problem: it's looking in 7446 places! But why?

I suspect the thing to do is to re-install from scratch, and only add in packages I'm really using now. I wonder if making sure all eggs are unzipped would help, too. Thanks for all your help on what is really OT at this point.

-Chris

$ python -c "import sys; [sys.stdout.write(p+'\n') for p in sys.path]; print len(sys.path)"
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/altgraph-0.6.7-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Pyrex-0.9.5.1a-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Shapely-1.0-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/ipython-0.8.2-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/setuptools-0.6c8-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/py2app-0.4.2-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/modulegraph-0.7.2.dev_r21-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PyPubSub-3.0a5-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/macholib-1.2.1.dev_r23-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib-0.98.0-py2.5-macosx-10.3-fat.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-0.10.3-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Pylons-0.9.7beta5-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Tempita-0.2-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/WebError-0.8-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/WebOb-0.9.2-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Mako-0.2.0-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/decorator-2.2.0-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/simplejson-1.8.1-py2.5-macosx-10.3-ppc.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/FormEncode-1.0.1-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PasteScript-1.6.3-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PasteDeploy-1.3.2-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Paste-1.6-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Beaker-0.9.5-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/WebHelpers-0.6dev_20080613-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Routes-1.9-py2.5.egg
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/bdist_mpkg-0.4.3-py2.5.egg
/Users/cbarker/HAZMAT/Hazpy/trunk/lib
/usr/local/lib/wxPython-unicode-2.8.8.1/lib/python2.5/site-packages
/usr/local/lib/wxPython-unicode-2.8.8.1/lib/python2.5/site-packages/wx-2.8-mac-unicode
/Library/Frameworks/Python.framework/Versions/2.5/lib/python25.zip
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-darwin
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac/lib-scriptpackages
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-tk
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL
/usr/local/lib/wxPython-unicode-2.8.8.1/lib/python2.5
41
Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind.
oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook)
disk should not matter. If hot, everything should be in the IO buffer, and opening a file is on the order of a few microseconds (that's certainly the order on Linux; the VM on Mac OS X is likely not as good, but still).
cheers,
David
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Fri, Aug 01, 2008 at 09:18:48AM -0700, Christopher Barker wrote:
What does python -c "import sys; print sys.path" say ?
A lot! 41 entries, and lots of eggs -- are eggs an issue? I'm also wondering how the order is determined -- if it looked in site-packages first, it would find numpy a whole lot faster.
AFAIK this is a setuptools issue. From what I hear, it might be fixed in the svn version of setuptools, but they still have to make a release that has this feature. The two issues I can see are import path priority, which shouldn't be screwed up like it is, and speed. Speed is obviously a hard problem.
I suspect the thing to do is to re-install from scratch, and only add in packages I'm really using now.
Avoid eggs if you can. This has been my policy. I am not sure how much of this is just superstition or a real problem, though. I realize that you are on a mac, and that the mac, unlike some distributions of linux, does not have a good dependency-tracking system. Thus setuptools and eggs are a great temptation. They come at a cost, but it can probably be improved. If you care about this problem, you could try to work with the setuptools developers to improve the situation. I must say that I am under Ubuntu, and I don't have the dependency problem at all, so setuptools does not answer an important need for me. I however realize that not everybody wants to use Ubuntu, and I thus care about the problem, maybe not enough to invest much time in setuptools, but at least enough to try to report problems and track solutions. Do not underestimate how difficult it is to get a package manager that works well.

If you ever do verify that it is indeed eggs that are slowing down your import, I'd be interested in having the confirmation, just so that I am sure I am not blaming them for nothing.

Cheers, Gaël
On Fri, Aug 1, 2008 at 11:53, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Fri, Aug 01, 2008 at 09:18:48AM -0700, Christopher Barker wrote:
What does python -c "import sys; print sys.path" say ?
A lot! 41 entries, and lots of eggs -- are eggs an issue? I'm also wondering how the order is determined -- if it looked in site-packages first, it would find numpy a whole lot faster.
AFAIK this is a setuptools issue. From what I hear, it might be fixed in the svn version of setuptools, but they still have to make a release that has this feature.
The two issues I can see are import path priority, which shouldn't be screwed up like it is, and speed. Speed is obviously a hard problem.
I suspect the thing to do is to re-install from scratch, and only add in packages I'm really using now.
Avoid eggs if you can. This has been my policy. I am not sure how much this is just superstition or a real problem, though.
Superstition.

[~]$ python -c "import sys; print len(sys.path)"
269
[~]$ python -v -v -c "import numpy" 2> foo.txt
[~]$ wc -l foo.txt
42500 foo.txt
[~]$ time python -c "import numpy"
python -c "import numpy"  0.18s user 0.46s system 88% cpu 0.716 total

So cut it out. Chris, please profile your import so we actually have some real information to work with instead of prejudices.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
So cut it out.
Chris, please profile your import so we actually have some real information to work with instead of prejudices.
OK, while we're working on that, I've tried just renaming my entire python install, then starting from scratch. So far, I've re-installed python and installed just numpy ('1.1.1rc2'), and the import seems downright blazing compared to what I was getting:

$ time python -c "import numpy"

real 0m0.973s
user 0m0.290s
sys 0m0.682s

much more in line with what others are getting. I'll start installing stuff that I actually am using now, and maybe I'll see when (if) it breaks down.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
Christopher Barker wrote:
$ time python -c "import numpy"
real 0m0.973s
user 0m0.290s
sys 0m0.682s
much more in line with what others are getting.
I'll start installing stuff that I actually am using now, and maybe I'll see when (if) it breaks down.
OK, I just installed wxPython, and whoa!

time python -c "import numpy"

real 0m2.793s
user 0m0.294s
sys 0m2.494s

so it's taking almost two seconds more to import numpy, now that wxPython is installed. I haven't even imported it yet. Importing wx isn't as bad:

$ time python -c "import wx"

real 0m1.589s
user 0m0.274s
sys 0m1.000s

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
Christopher Barker wrote:
OK, I just installed wxPython, and whoa!
time python -c "import numpy"
real 0m2.793s
user 0m0.294s
sys 0m2.494s
so it's taking almost two seconds more to import numpy, now that wxPython is installed. I haven't even imported it yet. importing wx isn't as bad:
$ time python -c "import wx"
real 0m1.589s
user 0m0.274s
sys 0m1.000s
Since the import time of numpy without wx, plus the import time of wx itself, adds up to your new numpy import time, this suggests that numpy may import wx. Which it shouldn't, obviously. There is something strange happening here. Please check whether wx really is imported when you do import numpy:

python -c "import numpy; import sys; print sys.modules"

And if it is, we have to know why it is imported at all when doing import numpy.

cheers,

David
On Sat, Aug 2, 2008 at 00:06, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Christopher Barker wrote:
OK, I just installed wxPython, and whoa!
time python -c "import numpy"
real 0m2.793s
user 0m0.294s
sys 0m2.494s
so it's taking almost two seconds more to import numpy, now that wxPython is installed. I haven't even imported it yet. importing wx isn't as bad:
$ time python -c "import wx"
real 0m1.589s
user 0m0.274s
sys 0m1.000s
Since the import time of numpy without wx, plus the import time of wx itself, adds up to the new numpy import time, this suggests that numpy may import wx. Which it shouldn't, obviously. There is something strange happening here. Please check whether wx really is imported when you do import numpy:
python -c "import numpy; import sys; print sys.modules"
And if it is, we have to know why it is imported at all when doing import numpy.
It isn't. The problem is on Chris's file system. Whatever is wrong with his file system (Bill Spotz's identical problem suggests too many temporary but unused inodes) increases the traversal of the file system. wx has a .pth file which adds entries to sys.path. Every time one tries to import something, the entries on sys.path are examined for the module. So increasing the number of entries on sys.path exacerbates the problem. But the problem really is his disk; it's not a problem with numpy or Python or anything else. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
It isn't. The problem is on Chris's file system. Whatever is wrong with his file system (Bill Spotz's identical problem suggests too many temporary but unused inodes) increases the traversal of the file system.
Ah, I did not think it could indeed affect the whole fs. This seems much more likely, then. I guess I was confused because wx caused me some problems a long time ago, with scipy, and I thought maybe there were some leftovers in Chris's system. It would also explain why import numpy is still kind of slow on his machine. I don't remember the numbers, but I think it was quicker on my PPC Mac mini (under Mac OS X) than on his computer.
wx has a .pth file which adds entries to sys.path. Every time one tries to import something, the entries on sys.path are examined for the module. So increasing the number of entries on sys.path exacerbates the problem. But the problem really is his disk; it's not a problem with numpy or Python or anything else.
So it was an fs problem after all. I am a bit surprised it can get this bad, though. cheers, David
Robert Kern wrote:
It isn't. The problem is on Chris's file system.
Thanks for all your help, Robert. Interestingly, I haven't noticed any problems anywhere else, but who knows? I guess this is what Linus Torvalds meant when he said that OS-X's file system was "brain dead".
Whatever is wrong with his file system (Bill Spotz's identical problem suggests too many temporary but unused inodes)
I didn't see anything about Bill having similar issues -- was it on this list?
But the problem really is his disk; it's not a problem with numpy or Python or anything else.
So the question is: what can I do about it? Do I have any other choice than wiping the disk and re-installing? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Mon, Aug 4, 2008 at 14:24, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Robert Kern wrote:
It isn't. The problem is on Chris's file system.
Thanks for all your help, Robert. Interestingly, I haven't noticed any problems anywhere else, but who knows?
I guess this is what Linus Torvalds meant when he said that OS-X's file system was "brain dead".
Whatever is wrong with his file system (Bill Spotz's identical problem suggests too many temporary but unused inodes)
I didn't see anything about Bill having similar issues -- was it on this list?
From my earlier message in this thread:
""" Looking at the Shark results you sent me, it looks like all of your time is getting sucked up by the system call getdirentries(). Googling for some of the function names in that stack brings me to the message "Slow python initialization" on the Pythonmac-SIG: http://mail.python.org/pipermail/pythonmac-sig/2005-December/015542.html The ultimate resolution was that Bill Spotz, the original poster, ran Disk Utility and used the Disk Repair option to clean up a large number of unused inodes. This solved the problem for him: http://mail.python.org/pipermail/pythonmac-sig/2005-December/015548.html """ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
OK, so I'm an idiot. After reading this, I thought "I haven't rebooted for a while". It turns out it's been 35 days. I think I've been having slow startups for longer than that, but nonetheless, I rebooted (which took a long time), and presto:

$ time python -c "import numpy"

real 0m0.686s
user 0m0.322s
sys 0m0.363s

much better! I suspect OS-X did some disk cleaning on reboot. Frankly, 35 days is pretty pathetic for an uptime, but as I said, I think this issue has been going on longer. Perhaps OS-X runs a disk check every n reboots, like some Linux distros do. Sorry about the noise, and thanks, particularly to Robert, for taking an interest in this.

-Chris

Robert Kern wrote:
On Mon, Aug 4, 2008 at 14:24, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Robert Kern wrote:
It isn't. The problem is on Chris's file system.
Thanks for all your help, Robert. Interestingly, I haven't noticed any problems anywhere else, but who knows?
I guess this is what Linus Torvalds meant when he said that OS-X's file system was "brain dead".
Whatever is wrong with his file system (Bill Spotz's identical problem suggests too many temporary but unused inodes)
I didn't see anything about Bill having similar issues -- was it on this list?
From my earlier message in this thread:
""" Looking at the Shark results you sent me, it looks like all of your time is getting sucked up by the system call getdirentries(). Googling for some of the function names in that stack brings me to the message "Slow python initialization" on the Pythonmac-SIG:
http://mail.python.org/pipermail/pythonmac-sig/2005-December/015542.html
The ultimate resolution was that Bill Spotz, the original poster, ran Disk Utility and used the Disk Repair option to clean up a large number of unused inodes. This solved the problem for him:
http://mail.python.org/pipermail/pythonmac-sig/2005-December/015548.html """
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Mon, Aug 4, 2008 at 18:01, Christopher Barker <Chris.Barker@noaa.gov> wrote:
OK,
So I'm an idiot. After reading this, I thought "I haven't rebooted for a while". It turns out it's been 35 days. I think I've been having slow startups for longer than that, but nonetheless, I rebooted (which took a long time), and presto:
$ time python -c "import numpy"
real 0m0.686s
user 0m0.322s
sys 0m0.363s
much better!
It's still pretty bad, though. I do recommend running Disk Repair like Bill did. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
I just added ticket 874

http://scipy.org/scipy/numpy/ticket/874

which on my machine takes the import time from 0.15 seconds down to 0.093 seconds. A bit over a month ago it was about 0.33 seconds. :)

The biggest trick I didn't apply was to defer importing the re module, and only compile the patterns when they are needed. That would save about 0.013 seconds, but the result would be rather fragile, as people rarely consider deferring pattern definitions. I'll submit a patch for it, expecting it to be rejected. ;)

I noticed that two places import the standard Python math library, which doesn't seem necessary. In numpy/lib/__init__.py:

import math
__all__ = ['emath', 'math']

'math' is never used in that module. It does mean that 'numpy.math' is the same as Python's math module, which I didn't expect. I didn't understand why it was explicitly in the __all__ list, so I didn't change it. All tests pass without it. In numpy/lib/index_tricks.py:

import math
...
size.append(math.ceil((key[k].stop - start)/(step*1.0)))

Wouldn't numpy.ceil be okay here? All tests pass when I use numpy.ceil. This would save about 0.002 seconds, so it's a small improvement.

I think there's a subtle and minor bug in _datasource:

def __del__(self):
    # Remove temp directories
    if self._istmpdest:
        rmtree(self._destpath)

Is 'rmtree' guaranteed to be present in the module scope if the object is garbage collected during Python shutdown?

Andrew

dalke@dalkescientific.com
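Sketches of the two tricks above, for the record. Lazy pattern compilation (the names here are made up for illustration; this is not the actual numpy patch):

_name_pat = None

def match_name(s):
    # Compile on first use, so importing this module does not pay
    # for 'import re' or for compiling the pattern.
    global _name_pat
    if _name_pat is None:
        import re
        _name_pat = re.compile(r'[A-Za-z_]\w*')
    return _name_pat.match(s)

And the answer to the __del__ question is no: during interpreter shutdown, module globals (including a module-level rmtree) may already have been cleared by the time __del__ runs. The standard workaround is to bind the function as a default argument, which is evaluated once at definition time and kept alive with the method itself:

from shutil import rmtree

class TmpDestHolder(object):   # hypothetical stand-in for numpy's _datasource class
    def __del__(self, _rmtree=rmtree):
        # Remove temp directories; _rmtree stays reachable even
        # after this module's globals have been torn down.
        if self._istmpdest:
            _rmtree(self._destpath)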
Robert Kern wrote:
It's still pretty bad, though. I do recommend running Disk Repair like Bill did.
I did that, and it found nothing and did nothing -- I suspect it ran when I rebooted -- it did take a while to reboot. However, this is pretty consistently what I'm getting now:

$ time python -c "import numpy"

real 0m0.728s
user 0m0.327s
sys 0m0.398s

Which is apparently pretty slow. Robert gets:

$ time python -c "import numpy"
python -c "import numpy"  0.18s user 0.46s system 88% cpu 0.716 total

Is that on a similar machine??? Are you running Universal binaries? Would that make any difference? I wouldn't think so; I'm just grasping at straws here. This is a dual 1.8GHz G5 desktop, running OS-X 10.4.11, Python 2.5.2 (python.org build), numpy 1.1.1 (from the binary on SourceForge). I just tried this on a colleague's machine that is identical, and got about 0.4 seconds "real" -- so faster than mine, but still slow. This still feels blazingly fast to me, as I was getting something like 7+ seconds!

thanks for all the help,

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Sat, Aug 2, 2008 at 1:18 AM, Christopher Barker <Chris.Barker@noaa.gov> wrote:
A lot! 41 entries, and lots of eggs -- are eggs an issue? I'm also wondering how the order is determined -- if it looked in site-packages first, it would find numpy a whole lot faster.
I don't think the number itself is an issue. Putting eggs first is the way it has to be, I think; that's just how eggs are supposed to work.
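One way to see exactly what the eggs and .pth files have done to the search path is just to print it -- a minimal one-liner, not from the thread:

python -c "import sys; print '\n'.join(sys.path)"

Entries are tried in order, so everything the eggs put in front gets checked before site-packages is reached.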
I also tried:
python -v -v -c "import numpy" &>junk2.txt
which results in:
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# trying /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site.so
# trying /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/sitemodule.so
# trying /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site.py
... ...
And a LOT more:
$ grep "# trying" junk2.txt | wc -l
7446
For comparison:

$ python -v -v -c "import sys" &>junk3.txt
$ grep "# trying" junk3.txt | wc -l
618
which still seems like a lot.
So I think I've found the problem: it's looking in 7446 places! But why?
Part of it is how Python looks for modules. Again, I don't think the number itself is the issue: non-existent files should not have much impact, because a Python import is basically doing a stat, and a stat on a non-existent file, once the cache is hot, costs almost nothing. IOW, I don't think the problem is the numbers themselves. It has to be something else. A simple profiling like

python -m cProfile -o foo.stats foo.py

and then:

python -c "import pstats; p = pstats.Stats('foo.stats'); p.sort_stats('cumulative').print_stats(50)"

may give useful information. This, and using Shark as Robert suggested, should point in some direction. cheers, David
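For anyone who would rather sidestep the nested shell quoting, the same pstats call works as a tiny script (assuming the stats file is named foo.stats, as above):

import pstats
p = pstats.Stats('foo.stats')
p.sort_stats('cumulative').print_stats(50)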
David Cournapeau wrote:
IOW, I don't think the problem is the numbers themselves. It has to be something else. A simple profiling like
python -m cProfile -o foo.stats foo.py
and then:
python -c "import pstats; p = pstats.Stats('foo.stats'); p.sort_stats('cumulative').print_stats(50)"
OK, see the results -- I think (though I may be wrong) this means that the problem isn't in finding the numpy package. As for Shark, I'm sorry I missed that message, but I'm trying to see if I can do that now -- I don't seem to have Shark installed, and the ADC site doesn't seem to be working, but I'll keep looking. Thanks for all your help with this...

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Fri Aug 1 15:14:10 2008    ImportNumpy.stats

26987 function calls (26098 primitive calls) in 5.150 CPU seconds

Ordered by: cumulative time
List reduced from 631 to 50 due to restriction <50>

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 5.151 5.151 {execfile}
1 0.036 0.036 5.151 5.151 ImportNumpy.py:1(<module>)
1 0.146 0.146 5.115 5.115 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py:63(<module>)
1 0.026 0.026 3.941 3.941 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/add_newdocs.py:9(<module>)
1 0.064 0.064 3.903 3.903 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/__init__.py:1(<module>)
1 0.179 0.179 2.077 2.077 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py:1(<module>)
1 0.483 0.483 1.735 1.735 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/_datasource.py:33(<module>)
1 0.035 0.035 1.582 1.582 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/type_check.py:3(<module>)
1 0.112 0.112 1.547 1.547 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/__init__.py:2(<module>)
1 0.010 0.010 1.348 1.348 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/defmatrix.py:1(<module>)
1 0.302 0.302 1.338 1.338 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/utils.py:1(<module>)
1 0.518 0.518 1.236 1.236 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py:74(<module>)
1 0.012 0.012 0.696 0.696 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/testing/__init__.py:2(<module>)
1 0.327 0.327 0.683 0.683 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/testing/numpytest.py:1(<module>)
1 0.011 0.011 0.681 0.681 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/__init__.py:22(<module>)
1 0.447 0.447 0.650 0.650 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py:67(<module>)
1 0.351 0.351 0.356 0.356 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/transformer.py:9(<module>)
1 0.012 0.012 0.314 0.314 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/pycodegen.py:1(<module>)
1 0.181 0.181 0.300 0.300 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/pyassem.py:1(<module>)
1 0.162 0.162 0.205 0.205 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/inspect.py:24(<module>)
1 0.061 0.061 0.194 0.194 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/testing/utils.py:3(<module>)
1 0.163 0.163 0.163 0.163 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/mimetools.py:1(<module>)
1 0.131 0.131 0.163 0.163 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tempfile.py:18(<module>)
1 0.161 0.161 0.162 0.162 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py:45(<module>)
1 0.131 0.131 0.149 0.149 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pydoc.py:35(<module>)
1 0.117 0.117 0.132 0.132 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/difflib.py:29(<module>)
1 0.061 0.061 0.122 0.122 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/_import_tools.py:2(<module>)
1 0.067 0.067 0.121 0.121 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/ctypeslib.py:1(<module>)
1 0.118 0.118 0.119 0.119 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/dis.py:1(<module>)
1 0.009 0.009 0.100 0.100 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/polynomial.py:3(<module>)
3 0.000 0.000 0.080 0.027 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/getlimits.py:36(__new__)
2 0.000 0.000 0.080 0.040 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/getlimits.py:63(_init)
2 0.000 0.000 0.080 0.040 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/machar.py:51(__init__)
2 0.051 0.026 0.080 0.040 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/machar.py:70(_do_init)
1 0.026 0.026 0.079 0.079 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/index_tricks.py:1(<module>)
1 0.052 0.052 0.063 0.063 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/numeric.py:1(<module>)
1 0.060 0.060 0.061 0.061 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/glob.py:1(<module>)
On Fri, Aug 1, 2008 at 17:22, Christopher Barker <Chris.Barker@noaa.gov> wrote:
David Cournapeau wrote:
IOW, I don't think the problem is the numbers themselves. It has to be something else. A simple profiling like
python -m cProfile -o foo.stats foo.py
and then:
python -c "import pstats; p = pstats.Stats('foo.stats'); p.sort_stats('cumulative').print_stats(50)"
OK, see the results -- I think (though I may be wrong) this means that the problem isn't in finding the numpy package.
Can you send foo.stats, too? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
Can you send foo.stats, too?
sure can. Also, I've got Shark up and running, and have run it on a script that does nothing but import numpy, but really have no idea what I'm looking at, or how to send it to someone that does (you?). I'll keep poking at it some more. Thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Fri, Aug 1, 2008 at 17:50, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Robert Kern wrote:
Can you send foo.stats, too?
sure can.
Also, I've got Shark up and running, and have run it on a script that does nothing but import numpy, but really have no idea what I'm looking at, or how to send it to someone that does (you?). I'll keep poking at it some more.
File/Save As..., pick a file name. When asked about whether to embed source files or strip them out, choose Strip. Then email the resulting .mshark file to me. It looks like your Python just takes a truly inordinate amount of time to execute any code. Some of the problematic modules like httplib have been moved to local imports, but the time it takes for your Python to execute the code in that module is still ridiculously large. Can you profile just importing httplib instead of numpy? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
File/Save As..., pick a file name. When asked about whether to embed source files or strip them out, choose Strip. Then email the resulting .mshark file to me.
I've done that, and sent it to you directly -- it's too big to put in the mailing list.
It looks like your Python just takes a truly inordinate amount of time to execute any code. Some of the problematic modules like httplib have been moved to local imports, but the time it takes for your Python to execute the code in that module is still ridiculously large. Can you profile just importing httplib instead of numpy?
I've got to go catch a bus now, and I don't have a Mac at home, so this will have to wait 'till next Monday -- thanks for all your time on this. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Fri, Aug 1, 2008 at 17:50, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Robert Kern wrote:
Can you send foo.stats, too?
sure can.
Also, I've got Shark up and running, and have run it on a script that does nothing but import numpy, but really have no idea what I'm looking at, or how to send it to someone that does (you?). I'll keep poking at it some more.
Looking at the Shark results you sent me, it looks like all of your time is getting sucked up by the system call getdirentries(). Googling for some of the function names in that stack brings me to the message "Slow python initialization" on the Pythonmac-SIG: http://mail.python.org/pipermail/pythonmac-sig/2005-December/015542.html The ultimate resolution was that Bill Spotz, the original poster, ran Disk Utility and used the Disk Repair option to clean up a large number of unused inodes. This solved the problem for him: http://mail.python.org/pipermail/pythonmac-sig/2005-December/015548.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jul 31, 2008 at 10:12:22AM -0700, Christopher Barker wrote:
I've been wondering about that.
time python -c "import numpy"
real 0m8.383s
user 0m0.320s
sys 0m7.805s
I don't know what is wrong, but this is plain wrong, unless you are on a remote file system or something unusual. On the box I am currently on, I get:

python -c "import numpy"  0.10s user 0.03s system 101% cpu 0.122 total

And this matches my overall experience.

Gaël
On Thu, Jul 31, 2008 at 11:45, Christopher Barker <Chris.Barker@noaa.gov> wrote:
On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy!
Can you try running a Python process that just imports numpy under Shark.app? http://developer.apple.com/tools/shark_optimize.html This will help us see what's eating up the time at the C level, at least. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
participants (17)
- Alan McIntyre
- Andrew Dalke
- Christopher Barker
- David Cournapeau
- David Cournapeau
- David Cournapeau
- David Huard
- Gael Varoquaux
- Hanni Ali
- Kevin Jacobs <jacobs@bioinformed.com>
- Matthieu Brucher
- Nathan Bell
- Nils Wagner
- Ondrej Certik
- Robert Kern
- Scott Ransom
- Stéfan van der Walt