NumPy 1.2.0b2 released
Hey,

NumPy 1.2.0b2 is now available. Please test this so that we can uncover any problems ASAP.

SVN tag: http://svn.scipy.org/svn/numpy/tags/1.2.0b2
Mac binary: https://cirl.berkeley.edu/numpy/numpy-1.2.0b2-py2.5-macosx10.5.dmg
Windows binary: http://www.enthought.com/~gvaroquaux/numpy-1.2.0b2-win32.zip
Source tarball: https://cirl.berkeley.edu/numpy/numpy-1.2.0b2.tar.gz

Thanks,

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Two odd failures in test_print.py. Platform: Win XP SP3 on Intel T2600. Alan Isaac
np.test()
Running unit tests for numpy
NumPy version 1.2.0b2
NumPy is installed in C:\Python25\lib\site-packages\numpy
Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]
nose version 0.11.0
[... progress dots trimmed; two F's and one S among them ...]
Ignoring "Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py." (one should fix me in fcompiler/compaq.py)

======================================================================
FAIL: Check formatting.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\core\tests\test_print.py", line 28, in test_complex_types
    assert_equal(str(t(x)), str(complex(x)))
  File "C:\Python25\Lib\site-packages\numpy\testing\utils.py", line 180, in assert_equal
    assert desired == actual, msg
AssertionError:
Items are not equal:
 ACTUAL: '(0+5.9287877500949585e-323j)'
 DESIRED: '(1+0j)'

======================================================================
FAIL: Check formatting.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\core\tests\test_print.py", line 16, in test_float_types
    assert_equal(str(t(x)), str(float(x)))
  File "C:\Python25\Lib\site-packages\numpy\testing\utils.py", line 180, in assert_equal
    assert desired == actual, msg
AssertionError:
Items are not equal:
 ACTUAL: '0.0'
 DESIRED: '1.0'

----------------------------------------------------------------------
Ran 1567 tests in 8.234s

FAILED (SKIP=1, failures=2)
<nose.result.TextTestResult run=1567 errors=0 failures=2>
On Aug 14, 2008, at 11:07 PM, Alan G Isaac wrote:
Btw, numpy loads noticeably faster.
Any chance of someone reviewing my suggestions for making the import somewhat faster still?

    http://scipy.org/scipy/numpy/ticket/874

Andrew
dalke@dalkescientific.com
Andrew Dalke wrote:
On Aug 14, 2008, at 11:07 PM, Alan G Isaac wrote:
Btw, numpy loads noticeably faster.
Any chance of someone reviewing my suggestions for making the import somewhat faster still?
So, what is the attitude of people here? Here's my take:

1) Removing the ctypeslib import
   * Can break code if somebody has been doing "import numpy" and then using numpy.ctypeslib.
   * I'm fine with people needing to import numpy.ctypeslib explicitly to use the capability, as long as we clearly indicate these breakages.

2 & 3) I think deferring imports for _datasource.py is a great idea (a sketch follows this message).

4) Removing "import doc"
   * This was added recently, I think. I'm not sure why it's there, but it might be part of the documentation effort. It should probably be imported for IPython, but not by default.

5) The testing code seems like a lot of duplication to save 0.01 seconds.

6) Removing the unused glob import --- fine.

7-9) These seem fine.

In sum: I think 2, 3, 6, 7, 8, and 9 can be done immediately. 1) and 4) could be O.K., but 1) does break code and 4) I'm not sure about. 5) seems like it's too much code duplication for too little savings for my taste.

We need to push off the release of 1.2, I think. We are rushing to get it out by SciPy, and the rush is cutting collaboration short, so that people who would like to comment feel that they can't or that their comments are not desired or valued. I'm sorry for anything I've done that might have left that impression.

-Travis O.
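As a concrete illustration of the deferred-import idea in items 2 & 3, here is a minimal sketch with invented names (not the actual _datasource.py code):

    # Before: module-level imports make every "import numpy" pay for
    # gzip and bz2, even though most sessions never open a compressed file.
    #
    #     import gzip
    #     import bz2

    def _open_compressed(path, mode='r'):
        # After: the import cost is paid only on the first call that needs it.
        if path.endswith('.gz'):
            import gzip
            return gzip.open(path, mode)
        elif path.endswith('.bz2'):
            import bz2
            return bz2.BZ2File(path, mode)
        return open(path, mode)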
On Aug 15, 2008, at 9:00 AM, Travis E. Oliphant wrote:
5) The testing code seems like a lot of duplication to save .01 seconds
Personally I want to get rid of all in-body test code and use nosetests or something similar. I know that's not going to happen, at the very least because I was told that people have been told for years to test numpy using:

    import numpy; numpy.test()

Therefore my second choice is to implement only that top-level function (using deferred imports) and to get rid of all of the other test() and bench() functions. This patch is actually my third choice - full compatibility - but it's easier to trim code from a patch than it is to add code to a patch, so I submitted it that way.

Andrew
dalke@dalkescientific.com
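A sketch of what that second choice could look like, assuming a Tester class along the lines of the nose-based tester in numpy.testing (the exact class name and signature here are assumptions):

    # numpy/__init__.py (sketch): the one test entry point kept, with the
    # import of the test machinery deferred until it is actually called.
    def test(label='fast', verbose=1):
        # nose and numpy.testing load here, not at "import numpy" time
        from numpy.testing import Tester
        return Tester().test(label=label, verbose=verbose)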
2008/8/15 Travis E. Oliphant <oliphant@enthought.com>:
So, what is the attitude of people here? Here's my take:
I wonder if we are going about this process the right way. We will forever be adjusting imports to improve load times. Why don't we provide an alternate API, something like numpy.api, from which you can import exactly what you want? We can disable populating the full numpy namespace by exporting an environment variable or setting some flag. This way, importing numpy is instantaneous, whilst our interactive users still get the full benefit of having everything available.

Stéfan
On Fri, Aug 15, 2008 at 02:30, Stéfan van der Walt <stefan@sun.ac.za> wrote:
2008/8/15 Travis E. Oliphant <oliphant@enthought.com>:
So, what is the attitude of people here? Here's my take:
I wonder if we are going about this process the right way. We will forever be adjusting imports to improve load times. Why don't we provide an alternate API, something like numpy.api from which you can import exactly what you want? We can disable populating the full numpy namespace by exporting an environment variable or setting some flag. This way, importing numpy is instantaneous, whilst our interactive users still get the full benefit of having everything available?
The devil is in the details. What exactly do you propose? When we discussed this last time, the participants more or less agreed that environment variables could cause more fragility than they're worth. It also breaks the first time you try to import a numpy-using library that was not written with this in mind. Basically, you're stuck with only code that you've written.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2008/8/15 Robert Kern <robert.kern@gmail.com>:
The devil is in the details. What exactly do you propose? When we discussed this last time, the participants more or less agreed that environment variables could cause more fragility than they're worth. It also breaks the first time you try to import a numpy-using library that was not written with this in mind. Basically, you're stuck with only code that you've written.
First, I propose that I write some code. Second, I do not suggest the behaviour above, but:

1) Expose a new interface to numpy, called numpy.api.
2) If a certain environment variable is set, the numpy namespace is not populated, and numpy.api becomes instantaneous to load.

Even if the user forgets to set the variable, everything works as planned. If the user is aware of the variable, he won't be using numpy the normal way, so the fact that numpy.* is not available won't matter.

Cheers
Stéfan
On Fri, Aug 15, 2008 at 02:59, Stéfan van der Walt <stefan@sun.ac.za> wrote:
2008/8/15 Robert Kern <robert.kern@gmail.com>:
The devil is in the details. What exactly do you propose? When we discussed this last time, the participants more or less agreed that environment variables could cause more fragility than they're worth. It also breaks the first time you try to import a numpy-using library that was not written with this in mind. Basically, you're stuck with only code that you've written.
First, I propose that I write some code. Second, I do not suggest the behaviour above, but:
1) Expose a new interface to numpy, called numpy.api.
2) If a certain environment variable is set, the numpy namespace is not populated, and numpy.api becomes instantaneous to load.
Even if the user forgets to set the variable, everything works as planned. If the user is aware of the variable, he won't be using numpy the normal way, so the fact that numpy.* is not available won't matter.
I'm afraid that I still don't understand. Please expand on the following four cases (let's call the environment variable NUMPY_FAST_IMPORT):

1) NUMPY_FAST_IMPORT=0 (or simply absent)
   import numpy
   print dir(numpy)

2) NUMPY_FAST_IMPORT=0
   import numpy.api
   print dir(numpy.api)

3) NUMPY_FAST_IMPORT=1
   import numpy
   print dir(numpy)

4) NUMPY_FAST_IMPORT=1
   import numpy.api
   print dir(numpy.api)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2008/8/15 Robert Kern <robert.kern@gmail.com>:
I'm afraid that I still don't understand. Please expand on the
Sorry, it's late. My explanation is probably not too lucid. The variable should rather read something like NUMPY_VIA_API, but here goes.
1) NUMPY_FAST_IMPORT=0 (or simply absent)
   import numpy
   print dir(numpy)
Full numpy import, exactly as it is now.
2) NUMPY_FAST_IMPORT=0
   import numpy.api
   print dir(numpy.api)
numpy.* is exactly as it is now, and numpy.api provides a more nested API to NumPy. Import time is the same as the current NumPy import.
3) NUMPY_FAST_IMPORT=1
   import numpy
   print dir(numpy)

4) NUMPY_FAST_IMPORT=1
   import numpy.api
   print dir(numpy.api)
numpy.* is now probably close to empty. numpy.api is accessible as before. Import time for numpy.api is now super snappy, since numpy.* is not populated.

If this is not clear, then I need to sleep and implement a proof of concept before I try to explain further.

Cheers
Stéfan
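One possible proof of concept for the scheme Stéfan describes (the file layout, names, and variable below are assumptions for illustration, not actual NumPy code):

    # numpy/__init__.py (hypothetical sketch)
    import os

    if os.environ.get('NUMPY_VIA_API', '0') == '0':
        # Default behaviour: populate numpy.* exactly as today.
        from numpy.core import *
        from numpy.lib import *
    # else: numpy.* stays nearly empty, so "import numpy" is almost free
    # and users reach everything through the explicit numpy.api namespace.

    # numpy/api.py (hypothetical sketch): a thin, explicit namespace that
    # imports only what it names, e.g.:
    #     from numpy.core import multiarray, umath

With the variable set, "import numpy.api" still executes numpy/__init__.py first (Gaël's point below), but that __init__ now does almost nothing.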
On Fri, Aug 15, 2008 at 02:59:43AM -0500, Stéfan van der Walt wrote:
2008/8/15 Robert Kern <robert.kern@gmail.com>:
The devil is in the details. What exactly do you propose? When we discussed this last time, the participants more or less agreed that environment variables could cause more fragility than they're worth. It also breaks the first time you try to import a numpy-using library that was not written with this in mind. Basically, you're stuck with only code that you've written.
First, I propose that I write some code. Second, I do not suggest the behaviour above, but:
1) Expose a new interface to numpy, called numpy.api.
2) If a certain environment variable is set, the numpy namespace is not populated, and numpy.api becomes instantaneous to load.
That doesn't work because of a "feature" in Python's import: when loading foo.bar, Python loads foo.__init__ first. This is why we have "api" modules all over ETS. Gaël
Fri, 15 Aug 2008 15:30:20 +0200, Gael Varoquaux wrote:
On Fri, Aug 15, 2008 at 02:59:43AM -0500, Stéfan van der Walt wrote: [clip]
1) Expose a new interface to numpy, called numpy.api.
2) If a certain environment variable is set, the numpy namespace is not populated, and numpy.api becomes instantaneous to load.
That doesn't work because of a "feature" in Python's import: when loading foo.bar, Python loads foo.__init__ first. This is why we have "api" modules all over ETS.
I think you can still do something evil, like this:

    import os
    if os.environ.get('NUMPY_VIA_API', '0') != '0':
        from numpy.lib.fromnumeric import *
        ...

But I'm not sure how many milliseconds must be gained to justify this...

--
Pauli Virtanen
On Aug 15, 2008, at 4:38 PM, Pauli Virtanen wrote:
I think you can still do something evil, like this:
    import os
    if os.environ.get('NUMPY_VIA_API', '0') != '0':
        from numpy.lib.fromnumeric import *
        ...
But I'm not sure how many milliseconds must be gained to justify this...
I don't think it's enough. I don't like environmental variable tricks like that. My tests suggest:

    current SVN: 0.12 seconds
    my patch: 0.10 seconds
    removing some top-level imports: 0.09 seconds
    my patch and removing some additional top-level imports: 0.08 seconds (this is a guess)

First, I reverted my patch, so my import times went from 0.10 seconds to 0.12 seconds.

Second, I commented out the pure module imports from numpy/__init__.py:

    import linalg
    import fft
    import random
    import ctypeslib
    import ma
    import doc

The import time went to 0.089 seconds. Note that my patch also gets rid of "import doc" and "import ctypeslib", which take up a good chunk of time. The fft, linalg, and random libraries take 0.002 seconds each, and ma takes 0.007.

Not doing these imports makes the code about 0.01 seconds faster than my patches, which shaved off 0.02 seconds. That 0.01 seconds comes from not importing the fft, linalg, and ma modules. My patch does improve things in a few other places, so perhaps those other places add another 0.01 seconds of performance.

Why can't things be better? Take a look at the slowest imports. (Note: times are inclusive of the children.)

    == Slowest (including children) ==
    0.089 numpy (None)
    0.085 add_newdocs (numpy)
    0.079 lib (add_newdocs)
    0.041 type_check (lib)
    0.040 numpy.core.numeric (type_check)
    0.015 _internal (numpy.core.numeric)
    0.014 numpy.testing (lib)
    0.014 re (_internal)
    0.010 unittest (numpy.testing)
    0.010 numeric (numpy.core.numeric)
    0.009 io (lib)

Most of the time is spent importing 'lib'. Can that be made quicker? Not easily. "lib" is first imported in "add_newdocs". Personally, I want to get rid of add_newdocs and move the docstrings into the correct locations. Stubbing the function out by adding

    def add_newdoc(*args):
        pass

to the top of add_newdocs.py saves 0.005 seconds, but if you try it out and remove the "import lib" from add_newdocs.py then you'll have to fix a cyclical dependency:

    numpy/__init__.py:       import core
    numpy/core/__init__.py:  from defmatrix import *
    numpy/core/defmatrix.py: from numpy.lib.utils import issubdtype
    numpy/lib/__init__.py:   from type_check import *
    numpy/lib/type_check.py: import numpy.core.numeric as _nx

    AttributeError: 'module' object has no attribute 'core'

The only way out of the loop is to have numpy/__init__.py import lib before importing core. It's possible to clean up the code so this loop doesn't exist, and to fix things so that fewer things are imported when some environment variable is set, but it doesn't look easy. Modules depend on other modules a bit too much to make me happy.

Andrew
dalke@dalkescientific.com
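For reference, per-module numbers like the table above can be collected with a small __import__ wrapper along these lines (a sketch, not Andrew's actual time_import.py; Python 2.5-era code):

    import __builtin__
    import time

    _times = {}
    _real_import = __builtin__.__import__

    def _timed_import(name, *args, **kwds):
        # Nested imports run inside this call, so each recorded time is
        # inclusive of children, matching the table above.
        t0 = time.time()
        try:
            return _real_import(name, *args, **kwds)
        finally:
            elapsed = time.time() - t0
            # Keep the slowest (first, uncached) import; repeats are ~free.
            if elapsed > _times.get(name, 0.0):
                _times[name] = elapsed

    __builtin__.__import__ = _timed_import

    import numpy

    for elapsed, name in sorted(((t, n) for (n, t) in _times.items()), reverse=True)[:10]:
        print '%.3f %s' % (elapsed, name)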
On Fri, Aug 15, 2008 at 10:41 AM, Andrew Dalke <dalke@dalkescientific.com>wrote:
[clip]
Can that be made quicker? Not easily. "lib" is first imported in "add_newdocs". Personally, I want to get rid of add_newdocs and move the docstrings into the correct locations.
And those would be? I hope you aren't thinking of moving them into the C code. Chuck
2008/8/15 Andrew Dalke <dalke@dalkescientific.com>:
I don't think it's enough. I don't like environmental variable tricks like that. My tests suggest: [clip]
There are two different concerns being addressed here: ease of use and load time. I am not sure the two can be optimised simultaneously. On the other hand, if we had two public APIs for importing (similar to matplotlib's pylab vs. pyplot), we could satisfy both parties, without placing too much of a burden upon developers. Stéfan
On Fri, Aug 15, 2008 at 11:41 AM, Andrew Dalke <dalke@dalkescientific.com> wrote:
It's possible to clean up the code so this loop doesn't exist, and fix things so that fewer things are imported when some environment variable is set, but it doesn't look easy. Modules depend on other modules a bit too much to make me happy.
Yes. numpy.core should not depend on anything else. That would be the easy thing to do: there is only one function used from numpy.lib, IIRC. As you said, the hairy stuff (from an import-dependency POV) is in numpy.lib.

cheers,
David
Andrew Dalke wrote:
Can that be made quicker? Not easily. "lib" is first imported in "add_newdocs". Personally, I want to get rid of add_newdocs and move the docstrings into the correct locations.
Where would that be, in the C code? The reason for add_newdocs is to avoid writing docstrings in C code, which is a pain.
It's possible to clean up the code so this loop doesn't exist, and fix things so that fewer things are imported when some environment variable is set, but it doesn't look easy. Modules depend on other modules a bit too much to make me happy.
I've removed this loop. Are there other places in numpy.core that depend on numpy.lib? Thanks for the very helpful analysis. -Travis
On Aug 15, 2008, at 11:18 PM, Travis E. Oliphant wrote:
I've removed this loop. Are there other places in numpy.core that depend on numpy.lib?
That fixed the loop I identified. I removed the "import lib" in add_newdocs.py and things imported fine. I then commented out the following lines in numpy/__init__.py:

    #import lib
    #from lib import *

This identified a loop in fft:

    [josiah:~/src] dalke% python time_import.py
    Traceback (most recent call last):
      File "time_import.py", line 31, in <module>
        import numpy
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py", line 146, in <module>
        import fft
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/fft/__init__.py", line 38, in <module>
        from fftpack import *
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/fft/fftpack.py", line 541, in <module>
        from numpy import deprecate
    ImportError: cannot import name deprecate

Removing the "import fft" gives another loop for deprecate:

        import numpy
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py", line 148, in <module>
        import ctypeslib
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/ctypeslib.py", line 56, in <module>
        from numpy import integer, ndarray, dtype as _dtype, deprecate, array
    ImportError: cannot import name deprecate

Removing the "import ctypeslib" gives the following loop:

    Traceback (most recent call last):
      File "time_import.py", line 31, in <module>
        import numpy
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py", line 149, in <module>
        import ma
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/ma/__init__.py", line 44, in <module>
        import core
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/ma/core.py", line 66, in <module>
        from numpy import ndarray, typecodes, amax, amin, iscomplexobj,\
    ImportError: cannot import name iscomplexobj

After removing the "import ma" I ended up with no ImportErrors. The code still ends up importing numpy.lib because of:

      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/linalg/linalg.py", line 28, in <module>
        from numpy.lib import triu

Take that out and "import numpy" does not imply "import numpy.lib".
Andrew dalke@dalkescientific.com
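Each of those fixes follows the same pattern, sketched generically here (the function is illustrative, not actual numpy code): replace a module-level "from numpy import X" with a late import inside the function that needs it, so the name is looked up only after numpy/__init__.py has finished executing.

    def is_complex_masked(x):
        # A module-level "from numpy import iscomplexobj" fails while numpy
        # is still initialising; this late import cannot, because by the
        # time anyone calls the function, numpy has finished loading.
        from numpy import iscomplexobj
        return iscomplexobj(x)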
On Aug 15, 2008, at 11:18 PM, Travis E. Oliphant wrote:
Where would that be, in the C code? The reason for add_newdocs is to avoid writing docstrings in C code, which is a pain.
That was my thought. I could see that the code might be useful during module development, where you don't want text changes to incur a recompile hit. But come release time, if someone volunteers to migrate the docstrings to C in order to get a small bit of a performance increase, then I don't see why not.

Andrew
dalke@dalkescientific.com
On Aug 15, 2008, at 6:41 PM, Andrew Dalke wrote:
I don't think it's enough. I don't like environmental variable tricks like that. My tests suggest:

    current SVN: 0.12 seconds
    my patch: 0.10 seconds
    removing some top-level imports: 0.09 seconds
    my patch and removing some additional top-level imports: 0.08 seconds (this is a guess)
First, I reverted my patch, so my import times went from 0.10 second to 0.12 seconds.
Turns out I didn't revert everything. As of the SVN version from 10 minutes ago, "import numpy" on my machine takes 0.18 seconds, not 0.12 seconds. My patch should cut the import time by about 30-40% from what it is now. On some machines. Your mileage may vary :)

In my issue report I said the import time was 0.15 seconds. Adding up the times I saved doesn't match up with my final value. So take my numbers as guidelines.

For those curious, the top cumulative import times from SVN are:

    0.184 numpy (None)
    0.103 add_newdocs (numpy)
    0.097 lib (add_newdocs)
    0.049 type_check (lib)
    0.048 numpy.core.numeric (type_check)
    0.028 io (lib)
    0.022 ctypeslib (numpy)
    0.022 ctypes (ctypeslib)
    0.021 random (numpy)
    0.021 mtrand (random)
    0.019 _import_tools (numpy)
    0.019 glob (_import_tools)
    0.018 _datasource (io)
    0.016 fnmatch (glob)
    0.015 numpy.testing (numpy.core.numeric)
    0.014 re (fnmatch)

Andrew
dalke@dalkescientific.com
2008/8/15 Andrew Dalke <dalke@dalkescientific.com>:
On Aug 15, 2008, at 6:41 PM, Andrew Dalke wrote:
[clip]
First, I reverted my patch, so my import times went from 0.10 seconds to 0.12 seconds.
Turns out I didn't revert everything.
As of the SVN version from 10 minutes ago, "import numpy" on my machine takes 0.18 seconds, not 0.12 seconds. My patch should cut the import time by about 30-40% more from what it is. On some machines. Your milage may vary :)
I realize this is already a very complicated issue, but it's worth pointing out that the times you measure are not necessarily the times users care about. These numbers are once everything is loaded into disk cache. They don't reflect, say, interactive startup time, or time it takes in a script that uses substantial disk access (i.e. which fills the cache with something else). I realize this is the only available basis for comparison, but do keep in mind that improvements of a few milliseconds here may make a much larger difference in practice - or a much smaller difference. Anne
I forgot to mention...

On Aug 15, 2008, at 9:00 AM, Travis E. Oliphant wrote:
1) Removing ctypeslib import
* Can break code if somebody has been doing "import numpy" and then using numpy.ctypeslib.
* I'm fine with people needing to import numpy.ctypeslib explicitly to use the capability, as long as we clearly indicate these breakages.
You were the one who had numpy/__init__.py always import ctypeslib:

    r3027 | oliphant | 2006-08-15 11:53:49 +0200 (Tue, 15 Aug 2006) | 1 line

    import ctypeslib on numpy load and change name from ctypes_load_library to load_library

Was there a driving reason for that other than decreased user burden?

There will be breakage in the wild. I found:

    http://mail.python.org/pipermail/python-list/2007-December/469132.html
    http://www.scipy.org/Cookbook/Ctypes

and a Google Code search found a couple of hits too:

    http://www.google.com/codesearch?q=numpy+ctypeslib&hl=en&btnG=Search+Code

It doesn't look like there will be a big impact. This is not a widely used package (in public code), and many examples seem to prefer this form:

    from numpy.ctypeslib import ndpointer, load_library

Andrew
dalke@dalkescientific.com
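For readers who haven't touched numpy.ctypeslib, that explicit-import form looks like this in practice (the shared library 'foo' and the function 'foo_sum' are invented for illustration):

    import ctypes
    import numpy as np
    from numpy.ctypeslib import ndpointer, load_library

    # load_library finds foo.dll / libfoo.so relative to the given path.
    lib = load_library('foo', '.')
    lib.foo_sum.restype = ctypes.c_double
    lib.foo_sum.argtypes = [ndpointer(dtype=np.float64, ndim=1,
                                      flags='C_CONTIGUOUS'),
                            ctypes.c_int]

    x = np.arange(10.0)
    print lib.foo_sum(x, x.size)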
Andrew Dalke wrote:
I forgot to mention..
On Aug 15, 2008, at 9:00 AM, Travis E. Oliphant wrote:
1) Removing ctypeslib import
* Can break code if somebody has been doing "import numpy" and then using numpy.ctypeslib.
* I'm fine with people needing to import numpy.ctypeslib explicitly to use the capability, as long as we clearly indicate these breakages.
You were the one who had numpy/__init__.py always import ctypeslib
r3027 | oliphant | 2006-08-15 11:53:49 +0200 (Tue, 15 Aug 2006) | 1 line
import ctypeslib on numpy load and change name from ctypes_load_library to load_library
Was there a driving reason for that other than decreased user burden?
Not that I can recall. -Travis
Andrew Dalke:
Any chance of someone reviewing my suggestions for making the import somewhat faster still?
Travis E. Oliphant:
In sum: I think 2, 3, 6, 7, 8, and 9 can be done immediately. 1) and 4) could be O.K. but 1) does break code and 4 I'm not sure about. 5 seems like it's too much code duplication for too little savings for my taste.
Since no one else has said yea or nay, and the 1.2 release draws nigh[*], the simplest solution is to do 2, 3, 6, 7, 8, and 9.

I showed that 1 will break existing code. As for #4 - as far as I can tell the code in 'doc' is recent, so no user code depends on it. Plus, the documentation that's there is effectively unusable, with files like:

    """
    ======
    Jargon
    ======

    Placeholder for computer science, engineering and other jargon.
    """

so I still want to remove the "import doc" in numpy/__init__.py.

As for #5, that should probably be tied to the nosetests migration, so it will be done "soon"ish, but not for this release.

Is there a lack of disagreement on this? Should I construct a patch accordingly? Or wait longer?

Andrew
dalke@dalkescientific.com

[*] nigh: a word I don't use often except after "draw". http://en.wiktionary.org/wiki/nigh Interesting: English once used "nigh, near, next" instead of "near", "nearer", "nearest".
Andrew Dalke wrote:
Andrew Dalke:
Any chance of someone reviewing my suggestions for making the import somewhat faster still?
Travis E. Oliphant:
In sum: I think 2, 3, 6, 7, 8, and 9 can be done immediately. 1) and 4) could be O.K. but 1) does break code and 4 I'm not sure about. 5 seems like it's too much code duplication for too little savings for my taste.
Since no one else has said yea or nay, and the 1.2 release draws nigh[*], the simplest solution is to do 2, 3, 6, 7, 8, and 9. I showed that 1 will break existing code. As for #4 - as far as I can tell the code in 'doc' is recent, so no user code depends on it. Plus, the documentation that's there is effectively unusable, with files like: [clip]
I say go ahead, including changing #1 and #4. Let's leave #5 for the moment.

-Travis
On Mon, Aug 18, 2008 at 15:04, Travis E. Oliphant <oliphant@enthought.com> wrote:
I say go ahead including changing #1 and #4. Let's leave 5 for the moment.
I think we can just delete all of the test() and bench() functions except for numpy.{bench,test}(). That way, there is no code duplication.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2008/8/18 Travis E. Oliphant <oliphant@enthought.com>:
I say go ahead including changing #1 and #4. Let's leave 5 for the moment.
I ran several benchmarks and made sure that these imports take a minimal amount of time. Wouldn't we want users to have access to the doc framework without doing anything special? And, yes, some of the documents are empty, but a number of them have already been written.

I still think we are going about this the wrong way. We have two different sets of expectations, and we can't satisfy both by ripping everything apart. I'd much prefer two entry points into NumPy: one for people who need speed, and one for those who need the convenience of everything being at hand.

Stéfan
Stéfan van der Walt wrote:
2008/8/18 Travis E. Oliphant <oliphant@enthought.com>:
I say go ahead including changing #1 and #4. Let's leave 5 for the moment.
I ran several benchmarks and made sure that these imports take a minimal amount of time. Wouldn't we want users to have access to the doc framework without doing anything special? And, yes, some of the documents are empty, but a number of them have already been written.
I still think we are going about this the wrong way. We have two different sets of expectations, and we can't satisfy both by ripping everything apart. I'd much prefer two entry points into NumPy: one for people who need speed, and one for those who need the convenience of everything being at hand.
I think you are right, Stefan. It would be great to have another, lighter namespace from which numpy imports. But there is no reason to hold up these useful speed increases waiting for a better solution.

-Travis
2008/8/18 Travis E. Oliphant <oliphant@enthought.com>:
I still think we are going about this the wrong way. We have two different sets of expectations, and we can't satisfy both by ripping everything apart. I'd much prefer two entry points into NumPy: one for people who need speed, and one for those who need the convenience of everything being at hand.
I think you are right, Stefan. It would be great to have another, lighter namespace from which numpy imports. But there is no reason to hold up these useful speed increases waiting for a better solution.
Just to be clear: I am very happy about the speed improvements. I'm just urging the same caution in changing the user-visible API that has been shown for the C-level API. While the C-level changes require a recompile, the user-visible changes require a rewrite. This does not pertain to numpy.doc, which has never been exposed to the world before.

There is a bigger issue that needs to be considered here, though: whether NumPy will move away from its historic idiom of exposing everything to the user upon import, or whether we'll require more imports to be made manually.

Regards
Stéfan
On Aug 19, 2008, at 1:48 AM, Stéfan van der Walt wrote:
Wouldn't we want users to have access to the doc framework without doing anything special? And, yes, some of the documents are empty, but a number of them have already been written.
How do users know that those are present? How do users view those docs? You're the one who added that directory, yes? So you've probably got the most experience with it. I couldn't figure it out, and the README in the doc/ directory wasn't helpful.

    [josiah:numpy/numpy/doc] dalke% svn log __init__.py
    ------------------------------------------------------------------------
    r5371 | stefan | 2008-07-09 10:13:18 +0200 (Wed, 09 Jul 2008) | 2 lines

    Add numpy.doc topical documentation framework.

The files with 1K bytes or less are undocumented:

    -rw-r--r--  1 dalke  staff    307 Aug  3 00:59 __init__.py
    -rw-r--r--  1 dalke  staff   5203 Aug 15 17:44 basics.py
    -rw-r--r--  1 dalke  staff   5413 Aug 15 17:44 broadcasting.py
    -rw-r--r--  1 dalke  staff   5078 Aug 15 17:44 creation.py
    -rw-r--r--  1 dalke  staff   9854 Aug 15 17:44 glossary.py
    -rw-r--r--  1 dalke  staff     94 Aug  3 00:59 howtofind.py
    -rw-r--r--  1 dalke  staff  14286 Aug 15 17:44 indexing.py
    -rw-r--r--  1 dalke  staff   9608 Aug 15 17:44 internals.py
    -rw-r--r--  1 dalke  staff     82 Aug  3 00:59 io.py
    -rw-r--r--  1 dalke  staff     96 Aug  3 00:59 jargon.py
    -rw-r--r--  1 dalke  staff    130 Aug  3 00:59 methods_vs_functions.py
    -rw-r--r--  1 dalke  staff     81 Aug  3 00:59 misc.py
    -rw-r--r--  1 dalke  staff    100 Aug  3 00:59 performance.py
    -rw-r--r--  1 dalke  staff   7256 Aug 15 17:44 structured_arrays.py
    -rw-r--r--  1 dalke  staff   5520 Aug 15 17:44 ufuncs.py

That's 8 documentation files and 6 placeholder files.

I agree, the load time is very small. But with all my patches in place the import time goes down from about 0.18 seconds to about 0.10 seconds. Times add up.
I still think we are going about this the wrong way. We have two different sets of expectations, and we can't satisfy both by ripping everything apart. I'd much prefer two entry points into NumPy: one for people who need speed, and one for those who need the convenience of everything being at hand.
I thought I was very careful not to rip things apart. :(

Everything I did was API compatible except for the proposed removals of numpy.ctypeslib and numpy.doc. I chose ctypeslib because importing ctypes takes 10% of the total load time on my box. I chose numpy.doc because I couldn't figure out how it's used.

Now, with Robert's go-ahead, I'll also remove the "test" and "bench" entry points from everywhere except numpy.test and numpy.bench, so I will break some more compatibility. But note that "bench" doesn't currently work.

I agree about two entry points, but that's not going to happen by the next release. Actually, here's my quote from elsewhere in this discussion:

    I happen to think it's a mistake and there are other ways to have
    addressed the underlying requirement, but I know that's not going to
    change. (For example, follow the matplotlib approach where there's a
    special library designed to be imported in interactive use. But I am
    *not* proposing this change.)

I stressed the *not* because so far I've gone through:

    import Numeric
    import numarray
    import numpy

and there's probably also a "from matplotlib import numerix" somewhere in there. It seems like every time I use num* (which isn't often) I need to learn a new library. I don't want to switch again for a few years.

Andrew
dalke@dalkescientific.com
2008/8/18 Andrew Dalke <dalke@dalkescientific.com>:
How do users know that those are present? How do users view those docs? You're the one who added that directory, yes? So you've probably got the most experience with it. I couldn't figure it out, and the README in the doc/ directory wasn't helpful.
The numpy/doc directory existed before I implemented this, which may explain some "odd" design decisions. Usage is meant to happen via "help" or IPython's "?":

    In [2]: np.doc?
    Type:        module
    Base Class:  <type 'module'>
    String Form: <module 'numpy.doc' from '/Users/stefan/lib/python2.5/site-packages/numpy/doc/__init__.pyc'>
    Namespace:   Interactive
    File:        /Users/stefan/lib/python2.5/site-packages/numpy/doc/__init__.py
    Docstring:
        The following topics are available:
        - basics
        - broadcasting
        - creation
        - glossary
        - howtofind
        - indexing
        - internals
        - io
        - jargon
        - methods_vs_functions
        - misc
        - performance
        - structured_arrays
        - ufuncs

    In [3]: np.doc.broadcasting?
    Type:        module
    Base Class:  <type 'module'>
    String Form: <module 'numpy.doc.reference.broadcasting' from '/Users/stefan/lib/python2.5/site-packages/numpy/doc/reference/broadcasting.pyc'>
    Namespace:   Interactive
    File:        /Users/stefan/lib/python2.5/site-packages/numpy/doc/reference/broadcasting.py
    Docstring:
        ========================
        Broadcasting over arrays
        ========================
        [...]
I agree, the load time is very small. But with all my patches in place the import time goes down from about 0.18 seconds to about 0.10 seconds. Times add up.
Here are some of the timings I did, for interest's sake. For each trial I included N copies of the NumPy documentation guide as topics under "numpy.doc" and took the best of 3 trials (the topic count is currently 14), measuring with:

    stefan@appel:/tmp$ time python -c 'import numpy'

    Without numpy.doc:
    real    0m0.259s
    user    0m0.082s
    sys     0m0.169s
    --------------------
    200 files
    real    0m0.341s
    user    0m0.095s
    sys     0m0.232s
    --------------------
    100 files
    real    0m0.282s
    user    0m0.087s
    sys     0m0.190s
    --------------------
    50 files
    real    0m0.273s
    user    0m0.085s
    sys     0m0.179s
    --------------------
    20 files
    real    0m0.262s
    user    0m0.083s
    sys     0m0.173s
I still think we are going about this the wrong way. We have two different sets of expectations, and we can't satisfy both by ripping everything apart. I'd much prefer two entry points into NumPy: one for people who need speed, and one for those who need the convenience of everything being at hand.
I thought I was very careful to not rip things apart. :(
Everything I did was API compatible except for the proposed removals of numpy.ctypeslib and numpy.doc. I chose ctypeslib because importing ctypes takes 10% of the total load time on my box. I chose numpy.doc because I couldn't figure out how it's used.
Sorry, I did not mean to make you sound like a back-yard surgeon! Maybe hyperbole is best avoided. I am quite happy with the non-API changing modifications you propose, and probably with the others too: I just want us to get our heads together and decide on a policy before we proceed (see my reply to Travis).
It seems like every time I use num* (which isn't often) I need to learn a new library. I don't want to switch again for a few years.
Sure, we all need to get work done. But in "we" I include those who already wrote apps using numpy.ctypeslib. Cheers Stéfan
On Thu, Aug 14, 2008 at 3:58 PM, Alan G Isaac <aisaac@american.edu> wrote:
Two odd failures in test_print.py. Platform: Win XP SP3 on Intel T2600. Alan Isaac
I got the fixes to make numpy buildable again with VS 2003, and the errors are mingw-specific. It is either a compiler bug or, more likely, a configuration problem (mingw and VS not using the same code path somewhere). At least now I can compare the two, and it should not take too much time to sort that out.

cheers,
David
Jarrod Millman wrote:
Hey,
NumPy 1.2.0b2 is now available. Please test this so that we can uncover any problems ASAP.
Windows binary: http://www.enthought.com/~gvaroquaux/numpy-1.2.0b2-win32.zip
As well as the ones from Alan, if you add the "-O" (optimise) flag to your python, there is still the interpreter crash, as well as some extra failures.

-Jon

----

c:\python25\python -O -c "import numpy; numpy.test()"
Running unit tests for numpy
NumPy version 1.2.0b2
NumPy is installed in c:\python25\lib\site-packages\numpy
Python version 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
nose version 0.10.3
[... progress dots trimmed; one early F, one S ...]
Ignoring "Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py." (one should fix me in fcompiler/compaq.py)
[... more progress dots trimmed ...]
.....................................................................F.F.F.F.FFFFF.........

======================================================================
FAIL: Convolve should raise an error for empty input array.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\core\tests\test_regression.py", line 626, in test_convolve_empty
    self.failUnlessRaises(AssertionError,np.convolve,[],[1])
AssertionError: AssertionError not raised

======================================================================
FAIL: Test two arrays with different shapes are found not equal.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 46, in test_array_diffshape
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test two different array of rank 1 are found not equal.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 32, in test_array_rank1_noteq
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test two arrays with different shapes are found not equal.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 46, in test_array_diffshape
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test two different array of rank 1 are found not equal.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 32, in test_array_rank1_noteq
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test rank 1 array for all dtypes.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 65, in test_generic_rank1
    foo(t)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 61, in foo
    self._test_not_equal(c, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test rank 3 array for all dtypes.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 84, in test_generic_rank3
    foo(t)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 80, in foo
    self._test_not_equal(c, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test arrays with nan values in them.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 98, in test_nan_array
    self._test_not_equal(c, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test record arrays.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 124, in test_recarrays
    self._test_not_equal(c, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

======================================================================
FAIL: Test two arrays with different shapes are found not equal.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 109, in test_string_arrays
    self._test_not_equal(c, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not

----------------------------------------------------------------------
Ran 1567 tests in 8.503s

FAILED (SKIP=1, failures=10)

C:\>
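One observation worth adding here (an editorial aside, not from the thread): Python's -O switch strips bare assert statements, and the assert_equal in the earlier tracebacks is implemented with one ("assert desired == actual, msg"), so under -O it silently stops checking anything. That would explain why all of these "found equal but are not" tests fail only with -O. An explicit raise is immune:

    # Under "python -O" this helper passes everything, failing or not:
    def assert_equal_fragile(actual, desired, msg=''):
        assert desired == actual, msg          # stripped by -O

    # This version keeps working with or without -O:
    def assert_equal_robust(actual, desired, msg=''):
        if not desired == actual:
            raise AssertionError(msg or '%r != %r' % (actual, desired))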
Jarrod Millman wrote:
Mac binary: https://cirl.berkeley.edu/numpy/numpy-1.2.0b2-py2.5-macosx10.5.dmg
Is it really necessary to label these dmg's for 10.5 only? I assume more people than just me run 10.4 but have Python 2.5.x installed on their machines. Will this dmg install on 10.4 if py2.5 is available?

thanks

Les
On Thu, Aug 14, 2008 at 6:45 PM, Les Schaffer <schaffer@optonline.net> wrote:
is it really necessary to label these dmg's for 10.5 only?
No. This is done automatically by the tool used to build the mpkg. I'll look at changing this to 10.4, thanks for the reminder.
will this dmg install on 10.4 if py2.5 is available?
It should. Let us know otherwise.

--
Christopher Burns
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
If the dmg name is generated from the distribution name that the python distutils makes (e.g. macosx-10.5-i386-2.5), then the following may be of note: it appears that the MACOSX_DEPLOYMENT_TARGET environment variable controls (among other things) the distutils name. I generally set mine to 10.4, or even 10.3, depending on whether anything that I'm building requires later features. (I'm pretty sure that numpy builds don't.) There is a quick way to check this, sketched after the quoted message below.

Zach

On Aug 14, 2008, at 11:41 PM, Christopher Burns wrote:
On Thu, Aug 14, 2008 at 6:45 PM, Les Schaffer <schaffer@optonline.net> wrote:
is it really necessary to label these dmg's for 10.5 only?
No. This is done automatically by the tool used to build the mpkg. I'll look at changing this to 10.4, thanks for the reminder.
will this dmg install on 10.4 if py2.5 is available?
It should. Let us know otherwise.
--
Christopher Burns
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
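Zach's MACOSX_DEPLOYMENT_TARGET point can be sanity-checked with a couple of lines. (A sketch; exactly which Python/distutils versions honour the variable when building the platform name is an assumption to verify locally. check_platform.py is a made-up name for this snippet.)

    # Run as: MACOSX_DEPLOYMENT_TARGET=10.4 python check_platform.py
    import os
    import distutils.util

    print os.environ.get('MACOSX_DEPLOYMENT_TARGET', '(not set)')
    print distutils.util.get_platform()   # e.g. 'macosx-10.4-i386'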
Jarrod Millman wrote:
Hey,
NumPy 1.2.0b2 is now available. Please test this so that we can uncover any problems ASAP.
Windows binary: http://www.enthought.com/~gvaroquaux/numpy-1.2.0b2-win32.zip
Hello Again,

It seems the new release breaks matplotlib, at least for those poor souls using pre-compiled binaries. If this means all C modules compiled against numpy have to be recompiled, then this will make me very unhappy.

-Jon

----

H:\>c:\python25\python -c "import matplotlib; print matplotlib.__version__; import matplotlib.pylab"
0.98.3
RuntimeError: module compiled against version 1000009 of C-API but this version of numpy is 100000a
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\matplotlib\pylab.py", line 206, in <module>
    from matplotlib import mpl # pulls in most modules
  File "C:\Python25\Lib\site-packages\matplotlib\mpl.py", line 1, in <module>
    from matplotlib import artist
  File "C:\Python25\Lib\site-packages\matplotlib\artist.py", line 4, in <module>
    from transforms import Bbox, IdentityTransform, TransformedBbox, TransformedPath
  File "C:\Python25\Lib\site-packages\matplotlib\transforms.py", line 34, in <module>
    from matplotlib._path import affine_transform
ImportError: numpy.core.multiarray failed to import
On Fri, Aug 15, 2008 at 4:21 AM, Jon Wright <wright@esrf.fr> wrote:
It seems the new release breaks matplotlib, at least for those poor souls using pre-compiled binaries. If this means all C modules compiled against numpy have to be recompiled, then this will make me very unhappy.
Yes, the new release requires a recompile.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Jarrod Millman wrote:
NumPy 1.2.0b2 is now available. Please test this so that we can uncover any problems ASAP.
Mac binary: https://cirl.berkeley.edu/numpy/numpy-1.2.0b2-py2.5-macosx10.5.dmg
Ran 1715 tests in 12.671s

OK (SKIP=1)

OS-X 10.4.11, Dual G5 PPC
Python version 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) [GCC 4.0.1]

It also seems to work so far with my code...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
participants (16)

- Alan G Isaac
- Andrew Dalke
- Anne Archibald
- Charles R Harris
- Christopher Barker
- Christopher Burns
- David Cournapeau
- Gael Varoquaux
- Jarrod Millman
- Jon Wright
- Les Schaffer
- Pauli Virtanen
- Robert Kern
- Stéfan van der Walt
- Travis E. Oliphant
- Zachary Pincus