Hi all,

I mostly develop software related to cheminformatics. There isn't much direct overlap between the tools of that field and those NumPy and SciPy provide, but it's increasing with the use of scikit-learn and pandas.

I tend to write command-line tools which indirectly import numpy. I've noticed that 25% of the "import numpy" cost of about 0.081 seconds is due to the chebyshev, laguerre, legendre, hermite, and hermite_e modules. Each module takes about 0.004 seconds to import. This is because each of them does a:

    exec(polytemplate.substitute(name='Chebyshev', nick='cheb', domain='[-1,1]'))

during import. It appears that *everyone* takes a 0.02 second overhead during "import numpy" in order to simplify maintenance. This balance doesn't seem correct, given the number of people who use numpy vs. how rarely the polytemplate changes.

Last year I submitted a patch which pre-computed all of those templates, so they would only be byte-compiled once. I knew (and still know) almost nothing about git/github, so Scott Sinclair kindly took it up and made it a pull request at:

    https://github.com/numpy/numpy/pull/334

I see that there's been no activity for at least 10 months. Is there anything more I can do to encourage that this patch be accepted?

Cheers,
Andrew
dalke@dalkescientific.com
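[Editor's note: the best-of-5 "import numpy" numbers quoted in this thread can be reproduced with a small script along these lines. This is a sketch, not the exact method used in the email; the stdlib json module stands in for numpy so the example runs anywhere.]

```python
import subprocess
import sys
import timeit

def import_time(module, repeats=5):
    # Best-of-N wall-clock time for "import <module>" in a fresh
    # interpreter. Note this includes interpreter startup, so compare
    # against a baseline such as import_time("sys") rather than
    # reading the number directly.
    cmd = [sys.executable, "-c", "import %s" % module]
    return min(timeit.timeit(lambda: subprocess.check_call(cmd), number=1)
               for _ in range(repeats))

print("import json: %.4f s" % import_time("json"))
```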
On Tue, Aug 6, 2013 at 8:32 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
Hi all,
I mostly develop software related to cheminformatics. There isn't much direct overlap between the tools of that field and those NumPy and SciPy provide, but it's increasing with the use of scikit-learn and pandas.
I tend to write command-line tools which indirectly import numpy. I've noticed that 25% of the "import numpy" cost of about 0.081 seconds is due to the chebyshev, laguerre, legendre, hermite, and hermite_e modules. Each module takes about 0.004 seconds to import.
This is because each of them does a:
exec(polytemplate.substitute(name='Chebyshev', nick='cheb', domain='[-1,1]'))
during import. It appears that *everyone* takes a 0.02 second overhead during "import numpy" in order to simplify maintenance. This balance doesn't seem correct, given the number of people who use numpy vs. how rarely the polytemplate changes.
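[Editor's note: for readers unfamiliar with the mechanism being described, here is a much-simplified stand-in. The template body is hypothetical, but the substitute-then-exec shape matches what the email describes.]

```python
from string import Template

# Hypothetical, much-simplified stand-in for numpy's polytemplate:
# a class body is built by string substitution and exec()'d at import time.
polytemplate = Template("""
class $name:
    nickname = '$nick'
    domain = $domain
""")

namespace = {}
exec(polytemplate.substitute(name='Chebyshev', nick='cheb', domain='[-1, 1]'),
     namespace)
print(namespace['Chebyshev'].nickname)  # -> cheb
```

Every such exec() re-parses and byte-compiles the generated source on every import, which is the cost being discussed.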
Last year I submitted a patch which pre-computed all of those templates, so they would only be byte-compiled once.
I knew (and still know) almost nothing about git/github, so Scott Sinclair kindly took it up and made it a pull request at:
https://github.com/numpy/numpy/pull/334
I see that there's been no activity for at least 10 months.
Is there anything more I can do to encourage that this patch be accepted?
Hi Andrew, I haven't forgotten and intend to look at it before the next release. Chuck
On Aug 7, 2013, at 4:37 AM, Charles R Harris wrote:
I haven't forgotten and intend to look at it before the next release.
Thanks!

On a related topic, last night I looked into deferring the import for numpy.testing. This is the only other big place where numpy's import overhead might be reduced without breaking backwards compatibility.

I made a _DeferredTester [1] and replaced the 10 __init__.py uses of:

    from .testing import Tester
    test = Tester().test
    bench = Tester().bench

to use the _DeferredTester instead.

With that in place the "import numpy" time (best of 5) goes from 0.0796 seconds to 0.0741 seconds, or 7%. That 0.0796 includes the 0.02 seconds for the exec() of the polynomial templates. Removing that 0.02 seconds from the baseline would give a 10% speedup. [2]

Would this sort of approach be acceptable to NumPy? If so, I could improve the patch to make it acceptable. The outstanding code issues to be resolved before making a pull request are:

1) The current wrapper uses *args and **kwargs to forward any test() and bench() calls to the actual function. As a result, parameter introspection doesn't work.

2) The current wrapper doesn't have a __doc__.

3) The only way to fix 1) and 2) is to copy the signatures and docstring from the actual Tester() implementation, which causes code/docstring duplication.

4) I don't know if it's appropriate to add my _DeferredTester to numpy.core vs. some other place in the code base.

If you want to see the patch, I followed the NumPy instructions at http://docs.scipy.org/doc/numpy/dev/gitwash/git_development.html and made an experimental fork at https://github.com/adalke/numpy/tree/no-default-tester-import

I have no git/github experience beyond what I did for this patch, so let me know if there are problems in what I did.
Cheers,
Andrew
dalke@dalkescientific.com

[1] Inside of numpy/core/__init__.py I added:

    class _DeferredTester(object):
        def __init__(self, package_filename):
            import os
            self._package_path = os.path.dirname(package_filename)

        def test(self, *args, **kwargs):
            from ..testing import Tester
            return Tester(self._package_path).test(*args, **kwargs)

        def bench(self, *args, **kwargs):
            from ..testing import Tester
            return Tester(self._package_path).bench(*args, **kwargs)

        def get_test_and_bench(self):
            return self.test, self.bench

It's used like this:

    from ..core import _DeferredTester
    test, bench = _DeferredTester(__file__).get_test_and_bench()

That's admittedly ugly. It could also be:

    test = _DeferredTester(__file__).test
    bench = _DeferredTester(__file__).bench

[2] Is an import speedup (on my laptop) of 0.0055 seconds important? I obviously think so. This time affects everyone who uses NumPy, even if incidentally, as in my case. I don't actually use NumPy, but I use a chemistry toolkit with a C++ core that imports NumPy in order to have access to its array data structure, even though I don't make use of that ability. If there are 1,000,000 "import numpy"s per day, then that's about 90 minutes of savings per day.

Yes, I could also switch to an SSD and the overhead will decrease. But on the other hand, I've also worked on a networked file system for a cluster where "python -c pass" took over a second to run, because Lustre is lousy with lots of metadata requests. (See http://www.nas.nasa.gov/hecc/support/kb/Lustre-Best-Practices_226.html ) In that case we switched to a zip importer, but you get my point that the 0.0055 seconds is also a function of the filesystem time, and that performance varies.
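[Editor's note: the same deferral idea can be sketched more generally in modern Python. The class name is hypothetical, and the stdlib json module stands in for numpy.testing so the example is self-contained.]

```python
import importlib

class DeferredCaller:
    """Forward attribute access to a module that is imported on first use."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not set on the instance, i.e. the
        # forwarded ones; the real import happens here, on first use.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)

# Nothing is imported until the first attribute access:
testing = DeferredCaller("json")
print(testing.dumps([1, 2]))  # -> [1, 2]
```

The trade-off Andrew lists below still applies: the proxy has no useful signature or docstring until the real module is loaded.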
On Wed, Aug 7, 2013 at 3:26 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
On Aug 7, 2013, at 4:37 AM, Charles R Harris wrote:
I haven't forgotten and intend to look at it before the next release.
Thanks!
On a related topic, last night I looked into deferring the import for numpy.testing. This is the only other big place where numpy's import overhead might be reduced without breaking backwards compatibility.
It does break backwards compatibility though, because now you can do:

    import numpy as np
    np.testing.assert_equal(x, y)
I made a _DeferredTester [1] and replaced the 10 __init__.py uses of:
    from .testing import Tester
    test = Tester().test
    bench = Tester().bench
to use the _DeferredTester instead.
With that in place the "import numpy" time (best of 5) goes from 0.0796 seconds to 0.0741 seconds, or 7%.
That 0.0796 includes the 0.02 seconds for the exec() of the polynomial templates. Without that 0.02 seconds in the baseline would give a 10% speedup. [2]
Would this sort of approach be acceptable to NumPy? If so, I could improve the patch to make it acceptable.
I think if it's tested well, implementing np.test() with a deferred tester is OK imho. However, I would not be happy with breaking backwards compatibility for the numpy.testing module. Do you have more detailed timings? I'm guessing the bottleneck is importing nose. If so, you can still import numpy.testing into the numpy namespace without losing your gain in import time (nose is an optional dependency, so not imported by default inside numpy.testing).
The outstanding code issues to be resolved before making a pull request are:
1) The current wrapper uses *args and **kwargs to forward any test() and bench() calls to the actual function. As a result, parameter introspection doesn't work.
2) The current wrapper doesn't have a __doc__
3) The only way to fix 1) and 2) is to copy the signatures and docstring from the actual Tester() implementation, which causes code/docstring duplication.
That all sounds fixable.
4) I don't know if it's appropriate to add my _DeferredTester to numpy.core vs. some other place in the code base.
numpy.lib I'd think.
If you want to see the patch, I followed the NumPy instructions at http://docs.scipy.org/doc/numpy/dev/gitwash/git_development.html and made an experimental fork at https://github.com/adalke/numpy/tree/no-default-tester-import
I have no git/github experience beyond what I did for this patch, so let me know if there are problems in what I did.
You did everything correctly. Cheers, Ralf P.S. I also see some unused imports and files. Will send a cleanup PR.
[Short version: It doesn't look like my proposal or any simple alternative is tenable.] On Aug 10, 2013, at 10:28 AM, Ralf Gommers wrote:
It does break backwards compatibility though, because now you can do:
    import numpy as np
    np.testing.assert_equal(x, y)
Yes, it does.

I realize that a design goal in numpy was that (most?) submodules are available without any additional imports. This is the main reason for the "import numpy" overhead. The tension between ease-of-use for some and overhead for others is well known. For example, Sage tickets 3577, 6494, and 11714 relate to deferring the numpy import during startup.

The three relevant questions are:

1) Is numpy.testing part of that promise? This can be split in multiple ways.

  o The design goal could be that only the numerics that people use for interactive/REPL computing are accessible without additional explicit imports, which implies that the import of numpy.testing is an implementation side-effect of providing submodule-level "test()" and "bench()" APIs.

  o All NumPy modules with user-facing APIs should be accessible from numpy without additional imports.

While I would like to believe that the import of numpy.testing is an implementation side-effect of providing test() and bench(), I believe that I'm unlikely to convince the majority. For justifiable reasons, the numpy project is loath to break backwards compatibility, and I don't think there's an existing bright-line policy which would say that "import numpy; numpy.testing" should be avoided.

2) If it isn't a promise that "numpy.testing" is usable after an "import numpy", then how many people will be affected by an implementation change, and at what level of severity?

I looked to see which packages might fail. A Debian code search of "numpy.testing" showed no problems, and no one uses "np.testing". I did a code search at http://code.ohloh.net .
Of the first 200 or so hits for "numpy.testing", nearly all of them fell into uses like:

    from numpy.testing import Tester
    from numpy.testing import assert_equal, TestCase
    from numpy.testing.utils import *
    from numpy.testing import *

There were, however, several packages which would fail:

    test_io.py and test_image.py and test_array_bridge.py in MediPy
      (Interestingly, test_circle.py has an "import numpy.testing", so it's
      not universal practice in that package)
    calculators_test.py in OpenQuake Engine
    ForcePlatformsExtractorTest.py in b-tk

Note that these failures are in the test programs, and not in the main body code, so are unlikely to break end-user programs.

HOWEVER!

The real test is for people who do "import numpy as np" then refer to "np.testing". There are "about 454" such matches in Ohloh. One example is 'test_polygon.py' from scikit-image. Others are:

    test_anova.py in statsmodels
    test_graph.py in scikit-learn
    test_rmagic.py in IPython
    test_mlab.py in matplotlib

Nearly all the cases I looked at were in files starting with "test", or a handful which ended in "test.py" or "Test.py". Others use np.testing only as part of a unit test, such as:

    affine_grid.py and others in pyMor (as part of in-file unit tests)
    technical_indicators.py in QuantPy (as part of unittest.TestCase)
    coord_tools.py in NiPy-OLD (as part of in-file unit tests)
    predstd.py and others in statsmodels (as a main-line unit test)
    galsim_test_helpers.py in GalSim

These would likely not break end-user code.

Sadly, not all are that safe. For examples:

    simple_contrast.py example program for nippy
    try_signal_lti.py in joePython
    run.py in python-seminar
    verify.py in bell_d_project (a final project for a CS class)
    ex_shrink_pickle.py in statsmodels (as an example?)
    parametric_design.py in nippy (uses assert_almost_equal to verify an example)
    model.py in pymc-devs's pymc
    model.py in duffy
    zipline in olmar
    utils.py in MNE
    .... and I gave up at result 320 of 454.
Based on this, about 1% of the programs which use numpy.testing would break. This tells me that there are enough user programs which would fail that I don't think numpy will decide to make this change.

And the third question is:

3) Are there other alternatives?

Or as Ralf Gommers wrote:
Do you have more detailed timings? I'm guessing the bottleneck is importing nose.
I do have more detailed timings. "nose" is not imported during an "import numpy". (For one, "import nose" takes a full 0.11 seconds on my laptop and adds 199 modules to sys.modules!)

The hit is the "import unittest" in numpy.testing, which exists only to place "TestCase" in the numpy.testing namespace. "numpy.testing.TestCase" is only used by the unit tests, and not by any direct end-user code.

Here's the full hierarchical timing breakdown, showing module name, cumulative time to load, and parent module:

    testing: 0.0065 (numpy.core.numeric)
      unittest: 0.0055 (testing)
        result: 0.0011 (unittest)
          traceback: 0.0004 (result)
            linecache: 0.0000 (traceback)
          StringIO: 0.0004 (result)
            errno: 0.0000 (StringIO)
        case: 0.0021 (unittest)
          difflib: 0.0011 (case)
          pprint: 0.0004 (case)
          util: 0.0000 (case)
        suite: 0.0002 (unittest)
        loader: 0.0006 (unittest)
          fnmatch: 0.0002 (loader)
        main: 0.0010 (unittest)
          time: 0.0000 (main)
          signals: 0.0006 (main)
            signal: 0.0000 (signals)
            weakref: 0.0005 (signals)
              UserDict: 0.0000 (weakref)
              _weakref: 0.0000 (weakref)
              _weakrefset: 0.0000 (weakref)
              exceptions: 0.0000 (weakref)
        runner: 0.0000 (unittest)
      utils: 0.0005 (testing)
        nosetester: 0.0002 (utils)
          numpy.testing.utils: 0.0000 (nosetester)
      numpytest: 0.0001 (testing)

As you can see, "unittest" imports a large number of modules. I see no good way to get rid of this unittest import. Even if all of the tests were rewritten to use unittest.TestCase, numpy.testing.TestCase would still need to be present so third-party packages could derive from it, and there's no (easy?) way to make that TestCase some sort of deferred object which gets the correct TestCase when needed.

In conclusion, it looks like my proposal is not tenable and there's no easy way to squeeze out that ~5% of startup overhead.

Cheers,
Andrew
dalke@dalkescientific.com
On Sat, Aug 10, 2013 at 5:21 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
[Short version: It doesn't look like my proposal or any simple alternative is tenable.]
On Aug 10, 2013, at 10:28 AM, Ralf Gommers wrote:
It does break backwards compatibility though, because now you can do:
import numpy as np np.testing.assert_equal(x, y)
Yes, it does.
I realize that a design goal in numpy was that (most?) submodules are available without any additional imports. This is the main reason for the "import numpy" overhead. The tension between ease-of-use for some and overhead for others is well known. For example, Sage tickets 3577, 6494, and 11714 relate to deferring numpy import during startup.
The three relevant questions are:
1) Is numpy.testing part of that promise? This can be split in multiple ways.
o The design goal could be that only the numerics that people use for interactive/REPL computing are accessible without additional explicit imports, which implies that the import of numpy.testing is an implementation side-effect of providing submodule-level "test()" and "bench()" APIs
o all NumPy modules with user-facing APIs should be accessible from numpy without additional imports
While I would like to believe that the import of numpy.testing is an implementation side-effect of providing test() and bench(), I believe that I'm unlikely to convince the majority.
It likely is a side-effect rather than intentional design, but at this point that doesn't matter much anymore. There never was a clear distinction between private and public modules and now, as your investigation shows, the cost of removing the import is quite high.

For justifiable reasons, the numpy project is loath to break backwards compatibility, and I don't think there's an existing bright-line policy which would say that "import numpy; numpy.testing" should be avoided.
2) If it isn't a promise that "numpy.testing" is usable after an "import numpy" then how many people will be affected by an implementation change, and at what level of severity?
I looked to see which packages might fail. A Debian code search of "numpy.testing" showed no problems, and no one uses "np.testing".
I did a code search at http://code.ohloh.net . Of the first 200 or so hits for "numpy.testing", nearly all of them fell into uses like:
    from numpy.testing import Tester
    from numpy.testing import assert_equal, TestCase
    from numpy.testing.utils import *
    from numpy.testing import *
There were, however, several packages which would fail:
    test_io.py and test_image.py and test_array_bridge.py in MediPy
      (Interestingly, test_circle.py has an "import numpy.testing", so it's
      not universal practice in that package)
    calculators_test.py in OpenQuake Engine
    ForcePlatformsExtractorTest.py in b-tk
Note that these failures are in the test programs, and not in the main body code, so are unlikely to break end-user programs.
HOWEVER!
The real test is for people who do "import numpy as np" then refer to "np.testing". There are "about 454" such matches in Ohloh.
One example is 'test_polygon.py' from scikit-image. Others are:

    test_anova.py in statsmodels
    test_graph.py in scikit-learn
    test_rmagic.py in IPython
    test_mlab.py in matplotlib
Nearly all the cases I looked at were in files starting "test", or a handful which ended in "test.py" or "Test.py". Others use np.test only as part of a unit test, such as:
    affine_grid.py and others in pyMor (as part of in-file unit tests)
    technical_indicators.py in QuantPy (as part of unittest.TestCase)
    coord_tools.py in NiPy-OLD (as part of in-file unit tests)
    predstd.py and others in statsmodels (as a main-line unit test)
    galsim_test_helpers.py in GalSim
These would likely not break end-user code.
Sadly, not all are that safe. For examples:

    simple_contrast.py example program for nippy
    try_signal_lti.py in joePython
    run.py in python-seminar
    verify.py in bell_d_project (a final project for a CS class)
    ex_shrink_pickle.py in statsmodels (as an example?)
    parametric_design.py in nippy (uses assert_almost_equal to verify an example)
    model.py in pymc-devs's pymc
    model.py in duffy
    zipline in olmar
    utils.py in MNE
    .... and I gave up at result 320 of 454.
Based on this, about 1% of the programs which use numpy.testing would break. This tells me that there are enough user programs which would fail that I don't think numpy will decide to make this change.
And the third question is
3) Are there other alternatives?
Or as Ralf Gommers wrote:
Do you have more detailed timings? I'm guessing the bottleneck is importing nose.
I do have more detailed timings. "nose" is not imported during an "import numpy". (For one, "import nose" takes a full 0.11 seconds on my laptop and adds 199 modules to sys.modules!)
The hit is the "import unittest" in numpy.testing, which exists only to place "TestCase" in the numpy.testing namespace. "numpy.testing.TestCase" is only used by the unit tests, and not by any direct end-user code.
Here's the full hierarchical timing breakdown showing - module name - cumulative time to load - parent module
    testing: 0.0065 (numpy.core.numeric)
      unittest: 0.0055 (testing)
        result: 0.0011 (unittest)
          traceback: 0.0004 (result)
            linecache: 0.0000 (traceback)
          StringIO: 0.0004 (result)
            errno: 0.0000 (StringIO)
        case: 0.0021 (unittest)
          difflib: 0.0011 (case)
          pprint: 0.0004 (case)
          util: 0.0000 (case)
        suite: 0.0002 (unittest)
        loader: 0.0006 (unittest)
          fnmatch: 0.0002 (loader)
        main: 0.0010 (unittest)
          time: 0.0000 (main)
          signals: 0.0006 (main)
            signal: 0.0000 (signals)
            weakref: 0.0005 (signals)
              UserDict: 0.0000 (weakref)
              _weakref: 0.0000 (weakref)
              _weakrefset: 0.0000 (weakref)
              exceptions: 0.0000 (weakref)
        runner: 0.0000 (unittest)
      utils: 0.0005 (testing)
        nosetester: 0.0002 (utils)
          numpy.testing.utils: 0.0000 (nosetester)
      numpytest: 0.0001 (testing)
As you can see, "unittest" imports a large number of modules.
I see no good way to get rid of this unittest import.
Indeed. I had a quick look at the benefit of copying TestCase into numpy.testing so the import of unittest can be removed, but more than half the time is spent inside case.py and result.py, which would still be needed.
Even if all of the tests were rewritten to use unittest.TestCase, numpy.testing.TestCase would still need to be present so third-party packages could derive from it, and there's no (easy?) way to make that TestCase some sort of deferred object which gets the correct TestCase when needed.
In conclusion, it looks like my proposal is not tenable and there's no easy way to squeeze out that ~5% of startup overhead.
It does look that way. Ralf
On Aug 10, 2013 12:50 PM, "Ralf Gommers" <ralf.gommers@gmail.com> wrote:
On Sat, Aug 10, 2013 at 5:21 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
[Short version: It doesn't look like my proposal or any simple alternative is tenable.]
On Aug 10, 2013, at 10:28 AM, Ralf Gommers wrote:
It does break backwards compatibility though, because now you can do:
    import numpy as np
    np.testing.assert_equal(x, y)
Yes, it does.
I realize that a design goal in numpy was that (most?) submodules are available without any additional imports. This is the main reason for the "import numpy" overhead. The tension between ease-of-use for some and overhead for others is well known. For example, Sage tickets 3577, 6494, and 11714 relate to deferring numpy import during startup.
The three relevant questions are:
1) Is numpy.testing part of that promise? This can be split in multiple ways.
o The design goal could be that only the numerics that people use for interactive/REPL computing are accessible without additional explicit imports, which implies that the import of numpy.testing is an implementation side-effect of providing submodule-level "test()" and "bench()" APIs
o all NumPy modules with user-facing APIs should be accessible from numpy without additional imports
While I would like to believe that the import of numpy.testing is an implementation side-effect of providing test() and bench(), I believe that I'm unlikely to convince the majority.
It likely is a side-effect rather than intentional design, but at this point that doesn't matter much anymore. There never was a clear distinction between private and public modules and now, as your investigation shows, the cost of removing the import is quite high.
For justifiable reasons, the numpy project is loath to break backwards compatibility, and I don't think there's an existing bright-line policy which would say that "import numpy; numpy.testing" should be avoided.
2) If it isn't a promise that "numpy.testing" is usable after an "import numpy" then how many people will be affected by an implementation change, and at what level of severity?
I looked to see which packages might fail. A Debian code search of "numpy.testing" showed no problems, and no one uses "np.testing".
I did a code search at http://code.ohloh.net . Of the first 200 or so hits for "numpy.testing", nearly all of them fell into uses like:
    from numpy.testing import Tester
    from numpy.testing import assert_equal, TestCase
    from numpy.testing.utils import *
    from numpy.testing import *
There were, however, several packages which would fail:
    test_io.py and test_image.py and test_array_bridge.py in MediPy
      (Interestingly, test_circle.py has an "import numpy.testing", so it's
      not universal practice in that package)
    calculators_test.py in OpenQuake Engine
    ForcePlatformsExtractorTest.py in b-tk
Note that these failures are in the test programs, and not in the main body code, so are unlikely to break end-user programs.
HOWEVER!
The real test is for people who do "import numpy as np" then refer to "np.testing". There are "about 454" such matches in Ohloh.
One example is 'test_polygon.py' from scikit-image. Others are:

    test_anova.py in statsmodels
    test_graph.py in scikit-learn
    test_rmagic.py in IPython
    test_mlab.py in matplotlib
Nearly all the cases I looked at were in files starting "test", or a handful which ended in "test.py" or "Test.py". Others use np.test only as part of a unit test, such as:
    affine_grid.py and others in pyMor (as part of in-file unit tests)
    technical_indicators.py in QuantPy (as part of unittest.TestCase)
    coord_tools.py in NiPy-OLD (as part of in-file unit tests)
    predstd.py and others in statsmodels (as a main-line unit test)
    galsim_test_helpers.py in GalSim
These would likely not break end-user code.
Sadly, not all are that safe. For examples:

    simple_contrast.py example program for nippy
    try_signal_lti.py in joePython
    run.py in python-seminar
    verify.py in bell_d_project (a final project for a CS class)
    ex_shrink_pickle.py in statsmodels (as an example?)
    parametric_design.py in nippy (uses assert_almost_equal to verify an example)
    model.py in pymc-devs's pymc
    model.py in duffy
    zipline in olmar
    utils.py in MNE
    .... and I gave up at result 320 of 454.
Based on this, about 1% of the programs which use numpy.testing would break. This tells me that there are enough user programs which would fail that I don't think numpy will decide to make this change.
And the third question is
3) Are there other alternatives?
Or as Ralf Gommers wrote:
Do you have more detailed timings? I'm guessing the bottleneck is importing nose.
I do have more detailed timings. "nose" is not imported during an "import numpy". (For one, "import nose" takes a full 0.11 seconds on my laptop and adds 199 modules to sys.modules!)
The hit is the "import unittest" in numpy.testing, which exists only to place "TestCase" in the numpy.testing namespace. "numpy.testing.TestCase" is only used by the unit tests, and not by any direct end-user code.
Here's the full hierarchical timing breakdown showing - module name - cumulative time to load - parent module
    testing: 0.0065 (numpy.core.numeric)
      unittest: 0.0055 (testing)
        result: 0.0011 (unittest)
          traceback: 0.0004 (result)
            linecache: 0.0000 (traceback)
          StringIO: 0.0004 (result)
            errno: 0.0000 (StringIO)
        case: 0.0021 (unittest)
          difflib: 0.0011 (case)
          pprint: 0.0004 (case)
          util: 0.0000 (case)
        suite: 0.0002 (unittest)
        loader: 0.0006 (unittest)
          fnmatch: 0.0002 (loader)
        main: 0.0010 (unittest)
          time: 0.0000 (main)
          signals: 0.0006 (main)
            signal: 0.0000 (signals)
            weakref: 0.0005 (signals)
              UserDict: 0.0000 (weakref)
              _weakref: 0.0000 (weakref)
              _weakrefset: 0.0000 (weakref)
              exceptions: 0.0000 (weakref)
        runner: 0.0000 (unittest)
      utils: 0.0005 (testing)
        nosetester: 0.0002 (utils)
          numpy.testing.utils: 0.0000 (nosetester)
      numpytest: 0.0001 (testing)
As you can see, "unittest" imports a large number of modules.
I see no good way to get rid of this unittest import.
Indeed. I had a quick look at the benefit of copying TestCase into numpy.testing so the import of unittest can be removed, but more than half the time is spent inside case.py and result.py, which would still be needed.
Even if all of the tests were rewritten to use unittest.TestCase, numpy.testing.TestCase would still need to be present so third-party packages could derive from it, and there's no (easy?) way to make that TestCase some sort of deferred object which gets the correct TestCase when needed.
In conclusion, it looks like my proposal is not tenable and there's no easy way to squeeze out that ~5% of startup overhead.
It does look that way.
Ralf
Would there be some sort of way to detect that numpy.testing wasn't explicitly imported and issue a deprecation warning? Say, move the code into numpy._testing, import it into the namespace as testing, but then have the testing.py file set a flag in _testing to indicate an explicit import has occurred? Eventually, even _testing would no longer get imported by default and all will be well.

Of course, that might be too convoluted?

Ben Root
On Sun, Aug 11, 2013 at 3:35 AM, Benjamin Root <ben.root@ou.edu> wrote:
Would there be some sort of way to detect that numpy.testing wasn't explicitly imported and issue a deprecation warning? Say, move the code into numpy._testing, import it into the namespace as testing, but then have the testing.py file set a flag in _testing to indicate an explicit import has occurred?
Eventually, even _testing would no longer get imported by default and all will be well.
Of course, that might be too convoluted?
I'm not sure how that would work (you didn't describe how to decide that the import was explicit), but imho the impact would be too high. Ralf
On Aug 11, 2013 5:02 AM, "Ralf Gommers" <ralf.gommers@gmail.com> wrote:
On Sun, Aug 11, 2013 at 3:35 AM, Benjamin Root <ben.root@ou.edu> wrote:
Would there be some sort of way to detect that numpy.testing wasn't explicitly imported and issue a deprecation warning? Say, move the code into numpy._testing, import it into the namespace as testing, but then have the testing.py file set a flag in _testing to indicate an explicit import has occurred?

Eventually, even _testing would no longer get imported by default and all will be well.

Of course, that might be too convoluted?

I'm not sure how that would work (you didn't describe how to decide that the import was explicit), but imho the impact would be too high.

Ralf
The idea would be that within numpy (and we should fix SciPy as well), we would always import numpy._testing as testing, and not import testing.py ourselves. Then, there would be a flag in _testing.py that would be set to emit, by default, warnings about using np.testing without an explicit import, and stating by which version all code will have to be switched (perhaps 2.0?). testing.py would do a "from _testing import *", but also set the flag in _testing to not emit warnings, because only a non-numpy (and non-SciPy) module would have imported it.

It isn't foolproof. If a project has multiple dependencies that use np.testing, and only one of them explicitly imports np.testing, then the warning becomes hidden for the non-compliant parts. However, if we make sure that the core SciPy projects use np._testing, it would go a long way to get the word out.

Again, I am just throwing it out there as an idea. The speedups we are getting right now are nice, so it is entirely possible that this kludge is just not worth the last remaining bits of extra time.

Cheers!
Ben Root
On Aug 11, 2013, at 10:24 PM, Benjamin Root wrote:
The idea would be that within numpy (and we should fix SciPy as well), we would always import numpy._testing as testing, and not import testing.py ourselves.
The problem is the existing code out there which does:

    import numpy as np
    ...
    np.testing.utils.assert_almost_equal(x, y)

(That is, without an additional import), and other code which does:

    from numpy.testing import *

There's no way to make these both give a warning without some horrible hack, like interposing our own __import__ or sticking some non-module object into sys.modules. Down that path lies madness, and the corpses of many earlier attempts. Perhaps the latest Python import hooks might have a solution, but it's not something I want to spend time solving.

Andrew
dalke@dalkescientific.com
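[Editor's note: the "non-module object into sys.modules" hack Andrew alludes to can be sketched as below. Names are hypothetical and json stands in for numpy.testing; it works, but as he says it is fragile, and later Python versions (3.7+, PEP 562) added a cleaner module-level __getattr__ hook for exactly this.]

```python
import importlib
import sys
import types

class _LazyModule(types.ModuleType):
    """Placeholder put into sys.modules; loads the real module on first use."""

    def __init__(self, name, real_name):
        super().__init__(name)
        self.__dict__["_real_name"] = real_name

    def __getattr__(self, attr):
        # importlib.import_module caches in sys.modules, so repeated
        # lookups after the first one are cheap.
        real = importlib.import_module(self.__dict__["_real_name"])
        return getattr(real, attr)

# Nothing is actually loaded until the first attribute access:
lazy = _LazyModule("fake_testing", "json")
sys.modules["fake_testing"] = lazy
print(lazy.dumps([1, 2]))  # -> [1, 2]
```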
On Sun, Aug 11, 2013 at 2:24 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Aug 11, 2013 5:02 AM, "Ralf Gommers" <ralf.gommers@gmail.com> wrote:
On Sun, Aug 11, 2013 at 3:35 AM, Benjamin Root <ben.root@ou.edu> wrote:
Would there be some sort of way to detect that numpy.testing wasn't explicitly imported and issue a deprecation warning? Say, move the code into numpy._testing, import it into the namespace as testing, but then have the testing.py file set a flag in _testing to indicate an explicit import has occurred?

Eventually, even _testing would no longer get imported by default and all will be well.

Of course, that might be too convoluted?

I'm not sure how that would work (you didn't describe how to decide that the import was explicit), but imho the impact would be too high.

Ralf
The idea would be that within numpy (and we should fix SciPy as well), we would always import numpy._testing as testing, and not import testing.py ourselves. Then, there would be a flag in _testing.py that would be set to emit, by default, warnings about using np.testing without an explicit import, and stating by which version all code will have to be switched (perhaps 2.0?).
testing.py would do a from _testing import *, but also set the flag in _testing to not emit warnings, because only a non-numpy (and SciPy) module would have imported it.
It isn't foolproof. If a project has multiple dependencies that use np.testing, and only one of them explicitly imports np.testing, then the warning becomes hidden for the non-compliant parts. However, if we make sure that the core SciPy projects use np._testing, it would go a long way to get the word out.
Again, I am just throwing it out there as an idea. The speedups we are getting right now so far are nice, so it is entirely possible that this kludge is just not worth the last remaining bits of extra time.
OT: Benjamin, would you take a look at PR #3534? It is the continuation of your nanmean, nanvar, and nanstd work.

Chuck
On Aug 11, 2013 4:37 PM, "Andrew Dalke" <dalke@dalkescientific.com> wrote:
On Aug 11, 2013, at 10:24 PM, Benjamin Root wrote:
The idea would be that within numpy (and we should fix SciPy as well),
we would always import numpy._testing as testing, and not import testing.py ourselves.
The problem is the existing code out there which does:
    import numpy as np
    ...
    np.testing.utils.assert_almost_equal(x, y)
(That is, without an additional import), and other code which does
from numpy.testing import *
I wouldn't consider having them both emit a warning. The latter one is an explicit import (albeit horrible). Iirc, that should import the testing.py, and deactivate the warnings.

However, "from numpy import testing" would be a problem... Drat... Forget I said anything. The idea wouldn't work.

Ben
participants (4)
- Andrew Dalke
- Benjamin Root
- Charles R Harris
- Ralf Gommers