On Sat, Aug 10, 2013 at 5:21 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
[Short version: It doesn't look like my proposal or any simple alternative is tenable.]
On Aug 10, 2013, at 10:28 AM, Ralf Gommers wrote:
It does break backwards compatibility though, because now you can do:
  import numpy as np
  np.testing.assert_equal(x, y)
Yes, it does.
I realize that a design goal in numpy was that (most?) submodules are available without any additional imports. This is the main reason for the "import numpy" overhead. The tension between ease-of-use for some and overhead for others is well known. For example, Sage tickets 3577, 6494, and 11714 relate to deferring numpy import during startup.
The three relevant questions are:
1) Is numpy.testing part of that promise? This question can be read in at least two ways.
o The design goal could be that only the numerics that people use for interactive/REPL computing are accessible without additional explicit imports, which implies that the import of numpy.testing is an implementation side-effect of providing submodule-level "test()" and "bench()" APIs
o all NumPy modules with user-facing APIs should be accessible from numpy without additional imports
While I would like to believe that the import of numpy.testing is an implementation side-effect of providing test() and bench(), I believe that I'm unlikely to convince the majority.
It likely is a side-effect rather than intentional design, but at this point that doesn't matter much anymore. There never was a clear distinction between private and public modules and now, as your investigation shows, the cost of removing the import is quite high.
For justifiable reasons, the numpy project is loath to break backwards compatibility, and I don't think there's an existing bright-line policy which would say that "import numpy; numpy.testing" should be avoided.
2) If it isn't a promise that "numpy.testing" is usable after an "import numpy" then how many people will be affected by an implementation change, and at what level of severity?
I looked to see which packages might fail. A Debian code search of "numpy.testing" showed no problems, and no one uses "np.testing".
I did a code search at http://code.ohloh.net . Of the first 200 or so hits for "numpy.testing", nearly all of them fell into uses like:
  from numpy.testing import Tester
  from numpy.testing import assert_equal, TestCase
  from numpy.testing.utils import *
  from numpy.testing import *
There were, however, several packages which would fail:
  test_io.py and test_image.py and test_array_bridge.py in MediPy
     (Interestingly, test_circle.py has an "import numpy.testing", so it's not universal practice in that package)
  calculators_test.py in OpenQuake Engine
  ForcePlatformsExtractorTest.py in b-tk
Note that these failures are in the test programs, and not in the main body code, so are unlikely to break end-user programs.
HOWEVER!
The real test is for people who do "import numpy as np" then refer to "np.testing". There are "about 454" such matches in Ohloh.
One example is 'test_polygon.py' from scikit-image. Others are:
  test_anova.py in statsmodels
  test_graph.py in scikit-learn
  test_rmagic.py in IPython
  test_mlab.py in matplotlib
Nearly all the cases I looked at were in files whose names start with "test", plus a handful which end in "test.py" or "Test.py". Others use np.testing only as part of a unit test, such as:
  affine_grid.py and others in pyMor (as part of in-file unit tests)
  technical_indicators.py in QuantPy (as part of unittest.TestCase)
  coord_tools.py in NiPy-OLD (as part of in-file unit tests)
  predstd.py and others in statsmodels (as a main-line unit test)
  galsim_test_helpers.py in GalSim
These would likely not break end-user code.
Sadly, not all are that safe. For example:
  simple_contrast.py example program for nippy
  try_signal_lti.py in joePython
  run.py in python-seminar
  verify.py in bell_d_project (a final project for a CS class)
  ex_shrink_pickle.py in statsmodels (as an example?)
  parametric_design.py in nippy (uses assert_almost_equal to verify an example)
  model.py in pymc-devs's pymc
  model.py in duffy
  zipline in olmar
  utils.py in MNE
  .... and I gave up at result 320 of 454.
Based on this, about 1% of the programs which use numpy.testing would break. This tells me that there are enough user programs which would fail that I don't think numpy will decide to make this change.
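The kind of check described above (does a file touch "np.testing" without ever importing numpy.testing itself?) can be approximated mechanically. The following is only a minimal sketch, not the search that was actually run on Ohloh; the function name uses_testing_without_import is made up for illustration, and it only recognizes the common import spellings.

```python
import ast


def uses_testing_without_import(source):
    """Return True if `source` reads <numpy alias>.testing but never imports it.

    Hypothetical helper for illustration only; it inspects the syntax tree
    and never imports numpy itself.
    """
    tree = ast.parse(source)
    numpy_names = set()      # local names bound to the top-level numpy package
    explicit_import = False  # saw an explicit import of numpy.testing

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "numpy":
                    numpy_names.add(alias.asname or "numpy")
                elif alias.name.split(".")[:2] == ["numpy", "testing"]:
                    explicit_import = True
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            module = node.module or ""
            if module.split(".")[:2] == ["numpy", "testing"]:
                explicit_import = True
            elif module == "numpy" and any(a.name == "testing" for a in node.names):
                explicit_import = True

    if explicit_import:
        return False

    # Look for attribute access such as np.testing or numpy.testing.
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute) and node.attr == "testing"
                and isinstance(node.value, ast.Name)
                and node.value.id in numpy_names):
            return True
    return False
```

A file for which this returns True is one that would stop working if "import numpy" no longer pulled in numpy.testing as a side-effect.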
And the third question is
3) Are there other alternatives?
Or as Ralf Gommers wrote:
Do you have more detailed timings? I'm guessing the bottleneck is importing nose.
I do have more detailed timings. "nose" is not imported during an "import numpy". (For one, "import nose" takes a full 0.11 seconds on my laptop and adds 199 modules to sys.modules!)
The hit is the "import unittest" in numpy.testing, which exists only to place "TestCase" in the numpy.testing namespace. "numpy.testing.TestCase" is only used by the unit tests, and not by any direct end-user code.
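For anyone who wants to reproduce numbers like these, one way is to time the import in a fresh interpreter, so that nothing is already cached in sys.modules. This is a rough sketch rather than the script behind the figures above; the helper name import_cost is an assumption, and the results will vary by machine and Python version.

```python
import subprocess
import sys

# Code run in a child interpreter so that the import starts from a clean slate.
_PROBE = """
import sys, time
before = set(sys.modules)
start = time.perf_counter()
import {name}
elapsed = time.perf_counter() - start
print(elapsed, len(set(sys.modules) - before))
"""


def import_cost(name):
    """Return (seconds, number of new sys.modules entries) for importing `name`.

    Hypothetical helper; `name` is pasted into the probe as-is, so pass
    only plain module names.
    """
    output = subprocess.check_output(
        [sys.executable, "-c", _PROBE.format(name=name)]
    )
    seconds, added = output.decode().split()
    return float(seconds), int(added)
```

Calling import_cost("unittest") gives the wall-clock cost of that one import and how many modules it dragged in along the way.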
Here's the full hierarchical timing breakdown, showing:
  - module name
  - cumulative time to load
  - parent module
testing: 0.0065 (numpy.core.numeric)
  unittest: 0.0055 (testing)
    result: 0.0011 (unittest)
      traceback: 0.0004 (result)
        linecache: 0.0000 (traceback)
      StringIO: 0.0004 (result)
        errno: 0.0000 (StringIO)
    case: 0.0021 (unittest)
      difflib: 0.0011 (case)
      pprint: 0.0004 (case)
      util: 0.0000 (case)
    suite: 0.0002 (unittest)
    loader: 0.0006 (unittest)
      fnmatch: 0.0002 (loader)
    main: 0.0010 (unittest)
      time: 0.0000 (main)
      signals: 0.0006 (main)
        signal: 0.0000 (signals)
        weakref: 0.0005 (signals)
          UserDict: 0.0000 (weakref)
          _weakref: 0.0000 (weakref)
          _weakrefset: 0.0000 (weakref)
          exceptions: 0.0000 (weakref)
    runner: 0.0000 (unittest)
  utils: 0.0005 (testing)
    nosetester: 0.0002 (utils)
      numpy.testing.utils: 0.0000 (nosetester)
  numpytest: 0.0001 (testing)
As you can see, "unittest" imports a large number of modules.
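A breakdown of this shape (module, cumulative time, parent) can be collected by temporarily wrapping the built-in import hook. The sketch below is an assumption about how one might do it, not the tool used for the table above; trace_imports and format_trace are made-up names. It only sees imports that go through import statements, and it only records modules loaded for the first time.

```python
import builtins
import importlib.util
import sys
import time


def trace_imports(statement):
    """Run `statement` and return [(depth, module, seconds, parent), ...].

    `seconds` is cumulative: it includes everything the module imported.
    """
    original_import = builtins.__import__
    records = []
    stack = []  # names of the modules currently being imported

    def timed_import(name, globals=None, locals=None, fromlist=(), level=0):
        full_name = name
        if level:
            # Turn a relative import such as ".case" into "unittest.case".
            package = (globals or {}).get("__package__") or ""
            try:
                full_name = importlib.util.resolve_name("." * level + name, package)
            except (ImportError, ValueError):
                full_name = name
        if not full_name or full_name in sys.modules:
            # Already loaded (or being loaded): nothing new to time.
            return original_import(name, globals, locals, fromlist, level)
        parent = stack[-1] if stack else None
        index = len(records)
        records.append(None)  # reserve the slot so parents precede children
        stack.append(full_name)
        start = time.perf_counter()
        try:
            return original_import(name, globals, locals, fromlist, level)
        finally:
            elapsed = time.perf_counter() - start
            stack.pop()
            records[index] = (len(stack), full_name, elapsed, parent)

    builtins.__import__ = timed_import
    try:
        exec(statement, {})
    finally:
        builtins.__import__ = original_import  # always restore the real hook
    return records


def format_trace(records):
    """Render the records as an indented 'name: seconds (parent)' listing."""
    return "\n".join(
        "%s%s: %.4f (%s)" % ("  " * depth, name, seconds, parent)
        for depth, name, seconds, parent in records
    )
```

For example, print(format_trace(trace_imports("import unittest"))) in a fresh interpreter gives an indented listing in the same spirit as the one above. On Python 3.7 and later, running "python -X importtime -c 'import unittest'" reports similar information without any wrapper.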
I see no good way to get rid of this unittest import.
Indeed. I had a quick look at the benefit of copying TestCase into numpy.testing so the import of unittest can be removed, but more than half the time is spent inside case.py and result.py, which would still be needed.
Even if all of the tests were rewritten to use unittest.TestCase, numpy.testing.TestCase would still need to be present so third-party packages could derive from it, and there's no (easy?) way to make that TestCase some sort of deferred object which gets the correct TestCase when needed.
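To make the "deferred object" idea concrete: later Python versions (3.7 and up, via PEP 562) let a module define a module-level __getattr__, which can resolve an attribute such as TestCase on first access. That feature was not available when this thread was written, so the following is purely an illustrative sketch with a stand-in module named lazy_testing; it says nothing about what numpy itself does.

```python
import importlib
import types

# Stand-in for a package's testing module; the name is hypothetical.
lazy_testing = types.ModuleType("lazy_testing")
load_count = 0  # how many times the deferred lookup actually ran


def _lazy_getattr(name):
    """Resolve TestCase on first access instead of at import time."""
    global load_count
    if name == "TestCase":
        load_count += 1
        test_case = importlib.import_module("unittest").TestCase
        # Cache on the module so later lookups bypass __getattr__ entirely.
        setattr(lazy_testing, "TestCase", test_case)
        return test_case
    raise AttributeError("module 'lazy_testing' has no attribute %r" % name)


# PEP 562: a module-level __getattr__ is consulted when normal lookup fails.
lazy_testing.__getattr__ = _lazy_getattr


class ExampleTest(lazy_testing.TestCase):  # first access triggers the import
    def test_truth(self):
        self.assertTrue(True)
```

Third-party code that derives from the attribute still gets the real unittest.TestCase, but the cost of importing unittest is only paid by programs that actually touch it.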
In conclusion, it looks like my proposal is not tenable and there's no easy way to squeeze out that ~5% of startup overhead.
It does look that way.
Ralf