I think it's time for the first beta release of NumPy 1.0. I'd like to put it out within 2 weeks. Please make any comments or voice major concerns so that the 1.0 release series can be as stable as possible. -Travis
On Thu, 29 Jun 2006, Travis Oliphant apparently wrote:
Please make any comments or voice major concerns
A rather minor issue, but I would just like to make sure that a policy decision was made not to move to a float default for identity(), ones(), zeros(), and empty(). (I leave aside arange().) I see the argument for a change to be 3-fold:
1. It is easier to introduce people to numpy if default data types are all float. (I teach, and I want my students to use numpy.)
2. It is a better match to languages from which users are likely to migrate (e.g., GAUSS or Matlab).
3. In the uses I am most familiar with, float is the most frequently desired data type. (I guess this may be field specific, especially for empty().)
Cheers, Alan Isaac
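For readers following along, a quick sketch of the behavior under discussion (illustrative only; run against an integer-default NumPy of this era, with the printed dtypes being platform-dependent):

    import numpy

    a = numpy.zeros(3)                # an int array under the old default
    b = numpy.zeros(3, dtype=float)   # the explicit spelling the proposal would make unnecessary
    print(a.dtype, b.dtype)           # e.g. int32 float64

    # The int default also bites silently:
    a[0] = 0.5
    print(a[0])                       # 0 -- truncated on assignment into an int array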
On 6/30/06, Keith Goodman <kwgoodman@gmail.com> wrote:
I vote float.
+1 float Tim
I also find the int behavior of these functions strange. +1 float default (or double) --bb
Rand at least returns doubles:
>>> num.rand(3,3).dtype.name
'float64'
--bb On 6/30/06, Keith Goodman <kwgoodman@gmail.com> wrote:
On 6/29/06, Bill Baxter <wbaxter@gmail.com> wrote:
I also find the int behavior of these functions strange.
+1 float default (or double)
Oh, wait. Which do I want, float or double? What does rand, eigh, lstsq, etc return?
I vote for no change. It will be a major backward compatibility headache, with applications that rely on integer arrays breaking in mysterious ways. If float wins, I hope there will be a script to update old code. Detecting single-argument calls to these functions is probably not very hard.
I guess this is a change which would just break too much code. And if the default type should be changed for these functions, why not also for array constructors? On the other hand, many people probably use Numpy almost exclusively with Float64's. A convenient way to change the default type could make their code easier to read. How much effort would it be to provide a convenience module that, after importing, replaces the relevant functions with wrappers that make Float64's the default? Regards, Stephan
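Something along the lines Stephan describes could be only a few lines; a hypothetical sketch (the module name and the set of wrapped functions are illustrative, not an existing NumPy module):

    # defaultfloat.py -- hypothetical: same names as numpy, float default.
    import numpy

    def zeros(shape, dtype=float, **kwargs):
        return numpy.zeros(shape, dtype, **kwargs)

    def ones(shape, dtype=float, **kwargs):
        return numpy.ones(shape, dtype, **kwargs)

    def empty(shape, dtype=float, **kwargs):
        return numpy.empty(shape, dtype, **kwargs)

    def identity(n, dtype=float):
        return numpy.identity(n, dtype)

    # Usage: "from defaultfloat import zeros, ones, empty, identity"
    # shadows the integer-default versions in the importing namespace;
    # an explicit dtype argument still behaves exactly as before.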
Alan G Isaac wrote:
On Thu, 29 Jun 2006, Travis Oliphant apparently wrote:
Please make any comments or voice major concerns
A rather minor issue, but I would just like to make sure that a policy decision was made not to move to a float default for identity(), ones(), zeros(), and empty(). (I leave aside arange().)
This was a policy decision made many months ago after discussion on this list, and it would need overwhelming pressure to change.
I see the argument for a change to be 3-fold:
I am, however, sympathetic to the arguments for wanting floating-point defaults. I wanted to change this originally but was convinced not to make such a major change for backward compatibility (more on that later). Nonetheless, I would support the creation of a module called something like defaultfloat or some other equally impressive name ;-) which contained floating-point defaults of these functions (with the same names). Feel free to contribute (or at least find a better name).

Regarding the problem of backward compatibility: I am very enthused about the future of both NumPy and SciPy. There have been a large number of newcomers to the community who have contributed impressively, and I see very impressive things going on. This is "a good thing" because these projects need many collaborators and contributors to be successful. However, I have not lost sight of the fact that we still have a major adoption campaign to win before declaring NumPy a success. There are a lot of people who still haven't come over from Numeric and numarray. Consider these download numbers:

Numeric-24.2 (released Nov. 11, 2005):  14275 py24.exe,  2905 py23.exe,  9144 tar.gz
Numarray 1.5.1 (released Feb. 7, 2006): 10272 py24.exe, 11883 py23.exe, 12779 tar.gz
NumPy 0.9.8 (released May 17, 2006):     3713 py24.exe,   558 py23.exe,  4111 tar.gz

While it is hard to read too much into numbers, this tells me that there are about 10,000 current users of Numeric/Numarray who have not even *tried* NumPy. In fact, Numarray downloads of 1.5.1 went up significantly from its earlier releases. Why is that? It could be that many of the downloads are "casual" users who need it for some other application (in which case they wouldn't feel inclined to try NumPy). On the other hand, it is also possible that many are still scared away by the pre-1.0 development cycle --- it has been a bit bumpy for the stalwarts who've braved the rapids as NumPy has matured. Changes like the proposal to move common functions from default integer to default float are exactly the kind of thing that leads people to wait on getting NumPy.

One thing I've learned about Open Source development is that it can be hard to figure out exactly what is bothering people and get good critical feedback: people are more likely to just walk away with their complaints than to try and verbalize and/or post them. So, looking at adoption patterns can be a reasonable way to pick up on attitudes. It would appear that there is still a remarkable number of people who are either waiting for NumPy 1.0 or waiting for something else. I'm not sure; I think we have to wait until 1.0 to find out. Therefore, bug fixes and stabilizing the NumPy API are my #1 priority right now.

The other day I read a post by Alex Martelli (an influential Googler) to the Python list where he was basically suggesting that people stick with Numeric until things "stabilize". I can hope he meant "until NumPy 1.0 comes out", but he didn't say that, and maybe he meant "until the array in Python stabilizes." I hope he doesn't mean the rumors about an array object in Python itself. Let me be the first to assure everyone that rumors of a "capable" array object in Python have been greatly exaggerated. I would be thrilled if we could just get the "infrastructure" into Python so that different extension modules could at least agree on an array interface. That is a far cry from fulfilling the needs of any current Num user, however.
I say all this only to point out why destabilizing changes are difficult to do at this point, and to encourage anyone with an interest to continue to promote NumPy. If you are at all grateful for its creation, then please try to encourage those whom you know to push for NumPy adoption (or at least a plan for its adoption) in the near future. Best regards, -Travis
Travis Oliphant wrote:
Nonetheless, I would support the creation of a module called something like defaultfloat or some other equally impressive name ;-) which contained floating-point defaults of these functions (with the same names).
I'd also like to see a way to make the constructors create floating-point arrays by default.
While it is hard to read too much into numbers, this tells me that there are about 10,000 current users of Numeric/Numarray who have not even *tried* NumPy. In fact, Numarray downloads of 1.5.1 went up significantly from its earlier releases. Why is that? It could be that many of the downloads are "casual" users who need it for some other application (in which case they wouldn't feel inclined to try NumPy).
On the other hand, it is also possible that many are still scared away by the pre-1.0 development cycle --- it has been a bit bumpy for the stalwarts who've braved the rapids as NumPy has matured. Changes like the proposal to move common functions from default integer to default float are exactly the kind of thing that leads people to wait on getting NumPy.
(Just as an aside, a further possibility is the relative availability of documentation for numpy and the other array packages. I entirely understand the reasoning behind the Guide to NumPy being a for-money offering, but it does present a significant barrier to adoption, particularly in an environment where the alternatives all offer free documentation above and beyond what is available in the docstrings.) -- "You see stars that clear have been dead for years But the idea just lives on..." -- Bright Eyes
Hi, You should be encouraged by the trend from Numeric to numarray, because the tar users clearly are prepared to upgrade. In terms of the education program, the 1.0 release is the best starting point, as there is a general phobia about pre-1.0 releases (and dot-zero releases). Also, Python 2.5 is coming, so it is probably a good time to attempt to educate the exe users on numpy. One way is to provide numpy first (it may be a little too harsh to say only) so people see it when they upgrade. There are two key aspects, probably very much related, that need to happen with the 1.0 release:
1) Identify those "[s]econdary dependency" projects as Louis states (BioPython also comes to mind) and get them to convert.
2) Get the major distros (e.g. openSUSE) to include numpy and not Numeric. In turn this should also make people who build packages (like RPMs) use numpy. This may mean having to support both Numeric and numpy in the initial phase.
Regards Bruce
On Fri, 30 Jun 2006 03:33:56 -0600 Travis Oliphant <oliphant.travis@ieee.org> wrote:
One thing I've learned about Open Source development is that it can be hard to figure out exactly what is bothering people and get good critical feedback: people are more likely to just walk away with their complaints than to try and verbalize and/or post them. So, looking at adoption patterns can be a reasonable way to pick up on attitudes.
General confusion in the community. The whole numeric -> numarray -> numpy story is a little strange for people to believe. Or at least the source of many jokes. Also, there is no mention of numpy on the numarray page. The whole thing smells a little fishy :) Most of the (more casual) users of Python for science that I talk to are quite confused about what is going on. It also "looks" like numpy is only a few months old. Personally, I am ready to evangelise numpy wherever I can (e.g. EuroPython in 4 days' time :)). Simon.
On 6/30/06, Simon Burton <simon@arrowtheory.com> wrote:
General confusion in the community. The whole numeric -> numarray -> numpy story is a little strange for people to believe. Or at least the source of many jokes. Also, there is no mention of numpy on the numarray page. The whole thing smells a little fishy :)
I can say that coming to numpy early this year I was confused by this, and in fact I began by using numarray because the documentation was available and clearly written. I now support Travis on his book, since none of this would be happening so rapidly without him, but as I was looking for relief from my IDL license woes this turned me off a bit.
From Googling, it just wasn't clear which was the future, especially since, as I dug deeper, I saw old references to numpy that were not referring to the current project. I do think that this is more clear now, but the pages
http://numeric.scipy.org/ -- looks antiquated
http://www.numpy.org/ -- is empty
are not helping. numeric.scipy.org needs to be converted to the wiki look and feel of the rest of scipy.org, or at least made to look modern. numpy.org should perhaps point to the new page. And the numarray page should at least discuss the move to numpy and have links. Erin
On 6/30/06, Erin Sheldon <erin.sheldon@gmail.com> wrote:
http://www.numpy.org/ -- is empty
I see this is now pointing to the sourceforge site. Must have been a glitch there earlier, as it was returning an empty page.
On Friday 30 June 2006 16:29, Erin Sheldon wrote:
[ES]: <snip> the pages
[ES]: http://numeric.scipy.org/ -- Looks antiquated
[ES]: are not helping.
My opinion too. If that page is the first page you learn about NumPy, you won't have a good impression. Travis, would you accept concrete suggestions or 'help' to improve that page? Cheers, Joris
Joris De Ridder wrote:
On Friday 30 June 2006 16:29, Erin Sheldon wrote:
[ES]: <snip> the pages
[ES]: http://numeric.scipy.org/ -- Looks antiquated
[ES]: are not helping.
My opinion too. If that page is the first page you learn about NumPy, you won't have a good impression.
Travis, would you accept concrete suggestions or 'help' to improve that page?
Cheers, Joris
Speaking for the other Travis... I think he's open to suggestions (he hasn't yelled at me yet for suggesting the same sort of things). There was an earlier conversation on this list about the numpy page, in which we proposed redirecting all numeric/numpy links to numpy.scipy.org. I'll ask Jeff to do these redirects if:
- everyone agrees that address is a good one
- we have the content shaped up on that page.
For now, I've copied the content with some basic cleanup (and adding a style sheet) here: http://numpy.scipy.org If anyone with a modicum of web design experience wants access to edit this site... please (please) speak up and it will be so. Other suggestions are welcome. Travis (Vaught)
On Fri, 30 Jun 2006, Travis Oliphant wrote:
I am, however, sympathetic to the arguments for wanting floating-point defaults. I wanted to change this originally but was convinced not to make such a major change for backward compatibility (more on that later).
Before 1.0, it seems right to go with the best design and take some short-run grief for it if necessary. If the right default is float, but extant code will be hurt, then let float be the default and put the legacy-code fix (function redefinition) in the compatibility module. One view ... Alan Isaac
Before 1.0, it seems right to go with the best design and take some short-run grief for it if necessary.
If the right default is float, but extant code will be hurt, then let float be the default and put the legacy-code fix (function redefinition) in the compatibility module
+1 on this very idea. (sorry for sending this directly to you @ first, Alan)
Travis Oliphant wrote:
I hope he doesn't mean the rumors about an array object in Python itself. Let me be the first to assure everyone that rumors of a "capable" array object in Python have been greatly exaggerated. I would be thrilled if we could just get the "infrastructure" into Python so that different extension modules could at least agree on an array interface. That is a far cry from fulfilling the needs of any current Num user, however.
Having {pointer + dimensions + strides + type} in the python core would be an incredible step forward - this is far more important than changing my python code to do functionally the same thing with numpy instead of Numeric. If the new array object supports most of the interface of the current "array" module, then it is already very capable for many tasks. It would be great if it also works with Jython (etc). Bruce Southey wrote:
1) Identify those "[s]econdary dependency" projects as Louis states (BioPython also comes to mind) and get them to convert.
As author of a (fairly obscure) secondary-dependency package, it is not clear to me that this is the right time to convert. I very much admire the matplotlib approach of using Numerix and see this as a better solution than switching (or indeed re-writing in java/c++ etc). However, looking into the matplotlib SVN I see:

_image.cpp                         2420  4 weeks  cmoad       applied Andrew Straw's numpy patch
numerix/_sp_imports.py             2478  2 weeks  teoliphant  Make recent changes backward compatible with numpy 0.9.8
numerix/linearalgebra/__init__.py  2474  2 weeks  teoliphant  Fix import error for new numpy

While I didn't look at either the code or the diff, the comments clearly read as: "DON'T SWITCH YET". Get the basearray into the python core and for sure I will be using that, whatever it is called. I was tempted to switch to numarray in the past because of the nd_image, but I don't see that in numpy just yet? Seeing this on the mailing list:
So far the vote is 8 for float, 1 for int.
... is yet another hint that I can remain with Numeric as a library, at least until numpy has a frozen interface/behaviour. I am very supportive of the work going on but have some technical concerns about switching.

To pick some examples, it appears that numpy.lib.function_base.median makes a copy, sorts and picks the middle element. Some reading at http://ndevilla.free.fr/median/median/index.html or even (eek!) Numerical Recipes indicates this is not good news. Not to single one routine out, I was also saddened to find both Numeric and numpy use double precision lapack routines for single precision arguments. A diff of numpy's linalg.py with Numeric's LinearAlgebra.py goes a long way to explaining why there is resistance to change from Numeric to numpy. The boilerplate changes and you only get "norm" (which I am suspicious about - vector 2-norms are in BLAS, some matrix 2-norms are in lapack/*lange.f, and computing all singular values when you only want the biggest or smallest one is a surprising algorithmic choice).

I realise it might sound like harsh criticism - but I don't see what numpy adds for number crunching over and above Numeric. Clearly there *is* a lot more in terms of python integration, but I really don't want to do number crunching with python itself ;-) For numpy to really be better than Numeric I would like to find algorithm selections according to the array dimensions and type.

Getting the basearray type into the python core is the key - then it makes sense to get the best of breed algorithms working as you can rely on the basearray being around for many years to come. Please please please get basearray into the python core! How can we help with that? Jon
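On the median point: the selection approach Jon's links describe avoids the full sort. A pure-Python sketch of the idea (expected O(n) quickselect; illustrative only, not NumPy's actual implementation):

    import random

    def select(seq, k):
        """Return the k-th smallest element of seq (0-based), expected O(n)."""
        pivot = random.choice(seq)
        lows   = [x for x in seq if x < pivot]
        pivots = [x for x in seq if x == pivot]
        highs  = [x for x in seq if x > pivot]
        if k < len(lows):
            return select(lows, k)
        if k < len(lows) + len(pivots):
            return pivot
        return select(highs, k - len(lows) - len(pivots))

    def median(seq):
        n = len(seq)
        if n % 2:
            return select(seq, n // 2)
        return 0.5 * (select(seq, n // 2 - 1) + select(seq, n // 2))

    print(median([3, 1, 4, 1, 5, 9, 2, 6]))   # -> 3.5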
Jon, Thanks for the great feedback. You make some really good points.
Having {pointer + dimensions + strides + type} in the python core would be an incredible step forward - this is far more important than changing my python code to do functionally the same thing with numpy instead of Numeric.
Guido has always wanted consensus before putting things into Python. We need to rally behind NumPy if we are going to get something of its infrastructure into Python itself.
As author of a (fairly obscure) secondary dependency package it is not clear that this is right time to convert. I very much admire the matplotlib approach of using Numerix and see this as a better solution than switching (or indeed re-writing in java/c++ etc).
I disagree with this approach. It's fine for testing and for transition, but it is a headache long term. You are basically supporting three packages. The community is not large enough to do that. I also think it leads people to consider adopting that approach instead of just switching. I'm not particularly thrilled with strategies that essentially promote the existence of three different packages.
However, looking into the matplotlib SVN I see:
_image.cpp                         2420  4 weeks  cmoad       applied Andrew Straw's numpy patch
numerix/_sp_imports.py             2478  2 weeks  teoliphant  Make recent changes backward compatible with numpy 0.9.8
numerix/linearalgebra/__init__.py  2474  2 weeks  teoliphant  Fix import error for new numpy
While I didn't look at either the code or the diff, the comments clearly read as: "DON'T SWITCH YET".
I don't understand why you interpret it that way? When I moved old-style names to numpy.oldnumeric for SVN numpy, I needed to make sure that matplotlib still works with numpy 0.9.8 (which has the old-style names in the main location). Why does this say "DON'T SWITCH"? If anything it should tell you that we are conscious of trying to keep things working together and compatible with current releases of NumPy.
Get the basearray into the python core and for sure I will be using that, whatever it is called. I was tempted to switch to numarray in the past because of the nd_image, but I don't see that in numpy just yet?
It is in SciPy where it belongs (you can also install it as a separate package). It builds and runs on top of NumPy just fine. In fact it was the predecessor to the now fully-capable-but-in-need-of-more-testing numarray C-API that is now in NumPy.
I am very supportive of the work going on but have some technical concerns about switching. To pick some examples, it appears that numpy.lib.function_base.median makes a copy, sorts and picks the middle element.
I'm sure we need lots of improvements in the code-base. This has always been true. We rely on the ability of contributors which doesn't work well unless we have a lot of contributors which are hard to get unless we consolidate around a single array package. Please contribute a fix.
Not to single one routine out, I was also saddened to find both Numeric and numpy use double precision lapack routines for single precision arguments.
The point of numpy.linalg is to provide the functionality of Numeric, not extend it. This is because SciPy provides a much more capable linalg sub-package that works with single and double precision. It sounds like you want SciPy.
For numpy to really be better than Numeric I would like to find algorithm selections according to the array dimensions and type.
These are good suggestions but for SciPy. The linear algebra in NumPy is just for getting your feet wet and having access to basic functionality.
Getting the basearray type into the python core is the key - then it makes sense to get the best of breed algorithms working as you can rely on the basearray being around for many years to come.
Please please please get basearray into the python core! How can we help with that?
There is a PEP in SVN (see the array interface link at http://numeric.scipy.org) Karol Langner is a Google summer-of-code student working on it this summer. I'm not sure how far he'll get, but I'm hopeful. I could spend more time on it, if I had funding to do it, but right now I'm up against a wall. Again, thanks for the feedback. Best, -Travis
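For a concrete picture of what the PEP is after, here is a minimal sketch of a foreign object exposing its memory through the array interface (the attribute layout follows the published spec; the '<f8' typestr assumes a little-endian machine):

    import array
    import numpy

    class RawBuffer:
        """A non-NumPy object publishing {pointer + shape + type} to consumers."""
        def __init__(self, values):
            self._data = array.array('d', values)     # owns the memory
            ptr, n = self._data.buffer_info()
            self.__array_interface__ = {
                'version': 3,
                'shape': (n,),
                'typestr': '<f8',        # little-endian float64 (platform assumption)
                'data': (ptr, False),    # (address, read-only flag)
            }

    buf = RawBuffer([1.0, 2.0, 3.0])
    print(numpy.asarray(buf))            # numpy wraps the same memory, no copy needed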
Get the basearray into the python core and for sure I will be using that, whatever it is called. I was tempted to switch to numarray in the past because of the nd_image, but I don't see that in numpy just yet?
It is in SciPy where it belongs (you can also install it as a separate package). It builds and runs on top of NumPy just fine. In fact it was the predecessor to the now fully-capable-but-in-need-of-more-testing numarray C-API that is now in NumPy.
Hi Travis, Where can one find and download nd_image separate from the rest of scipy? As for the the numarray C-API, we are currently doing testing here at STScI. Chris
+1 for some sort of float. I am a little confused as to why Float64 is a particularly good choice. Can someone explain in more detail? Presumably this is the most sensible ctype and translates to a Python float well? In general, though, I agree that this is a now-or-never change. I suspect we will change a lot of matlab -> Numeric/numarray transitions into matlab -> numpy transitions with this change. I guess it will take a little longer for 1.0 to get out though :( Ah well. Cheers, Jon.
Just one more vote for float, on the basis that Travis mentioned: all those first-timers downloading, trying it, finding something they didn't expect that was rather confusing, and giving up.
Jonathan Taylor wrote:
+1 for some sort of float. I am a little confused as to why Float64 is a particularly good choice. Can someone explain in more detail? Presumably this is the most sensible ctype and translates to a python float well?
O.K. I'm convinced that we should change to float as the default, but *everywhere*, as Sasha says. We will provide two tools to make the transition easier.
1) The numpy.oldnumeric sub-package will contain definitions of changed functions that keep the old defaults (integer). This is what convertcode replaces for import Numeric calls, so future users who make the transition won't really notice.
2) A function/script that can be run to convert all type-less uses of the changed functions to explicitly insert dtype=int.
Yes, it will be a bit painful (I made the change and count 6 failures in NumPy tests and 34 in SciPy). But it sounds like there is support for doing it. And yes, we must do it prior to 1.0 if we do it at all. Comments? -Travis
Travis Oliphant wrote:
Comments?
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Just to make sure we're clear: because one should use linspace() for that? If so, this would be the time to raise an error (or at least a deprecation warning) when arange() is called with floats. I have a LOT of code that does that! In fact, I posted a question here recently and got a lot of answers and suggested code, and not one person suggested that I shouldn't use arange() with floats. Did Numeric have linspace()? It doesn't look like it to me. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Christopher Barker wrote:
Robert Kern wrote:
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Just to make sure we're clear: Because one should use linspace() for that?
More or less. Depending on the step and endpoint that you choose, it can be nearly impossible for the programmer to predict how many elements are going to be generated.
If so, this would be the time to raise an error (or at least a deprecation warning) when arange() is called with Floats.
I have a LOT of code that does that! In fact, I posted a question here recently and got a lot of answers and suggested code, and not one person suggested that I shouldn't use arange() with floats.
I should have been more specific, but I did express disapproval in the code sample I gave: x = arange(minx, maxx+step, step) # oy. Since your question wasn't about that specifically, I used the technique that your original sample did.
Did Numeric have linspace()? It doesn't look like it to me.
It doesn't. It was originally contributed to Scipy by Fernando, IIRC. It's small, so it is easy to copy if you need to maintain support for Numeric, still.
Robert Kern wrote:
Travis Oliphant wrote:
Comments?
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Actually, Robert makes a good point. arange with floats is problematic. We should direct people to linspace instead of changing the default of arange. Most new users will probably expect arange to return a type similar to Python's range, which is int. Also: keeping arange as ints reduces the number of errors from the change in the unit tests to 2 in NumPy and 3 in SciPy. So, I think from both a pragmatic and an idealized standpoint, arange should stay with the default of ints. People who want arange to return floats should be directed to linspace. -Travis
On Fri, Jun 30, 2006 at 01:25:23PM -0600, Travis Oliphant wrote:
Robert Kern wrote:
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Actually, Robert makes a good point. arange with floats is problematic. We should direct people to linspace instead of changing the default of arange. Most new users will probably expect arange to return a type similar to Python's range, which is int. ... So, I think from both a pragmatic and an idealized standpoint, arange should stay with the default of ints. People who want arange to return floats should be directed to linspace.
I agree that arange with floats is problematic. However, if you want, for example, arange(10.0) (as I often do), you have to do: linspace(0.0, 9.0, 10), which is very un-pythonic and not at all what a new user would expect... I think of linspace as a convenience function, not as a replacement for arange with floats. Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom@nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Scott Ransom wrote:
On Fri, Jun 30, 2006 at 01:25:23PM -0600, Travis Oliphant wrote:
Robert Kern wrote:
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Actually, Robert makes a good point. arange with floats is problematic. We should direct people to linspace instead of changing the default of arange. Most new users will probably expect arange to return a type similar to Python's range, which is int. ... So, I think from both a pragmatic and an idealized standpoint, arange should stay with the default of ints. People who want arange to return floats should be directed to linspace.
I agree that arange with floats is problematic. However, if you want, for example, arange(10.0) (as I often do), you have to do: linspace(0.0, 9.0, 10), which is very un-pythonic and not at all what a new user would expect...
I think of linspace as a convenience function, not as a replacement for arange with floats.
I don't mind arange(10.0) so much, now that it exists. I would mind, a lot, about arange(10) returning a float64 array. Similarity to the builtin range() is much more important in my mind than an arbitrary "consistency" with ones() and zeros(). It's arange(0.0, 1.0, 0.1) that I think causes the most problems with arange and floats.
Robert Kern wrote:
It's arange(0.0, 1.0, 0.1) that I think causes the most problems with arange and floats.
actually, much to my surprise:
>>> import numpy as N
>>> N.arange(0.0, 1.0, 0.1)
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])
But I'm sure there are other examples that don't work out. -Chris
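One case that does go wrong, for the record (a sketch; the exact repr may vary, but the off-by-one length is the classic failure):

>>> N.arange(1.0, 1.3, 0.1)        # three elements expected ...
array([ 1. ,  1.1,  1.2,  1.3])
>>> len(N.arange(1.0, 1.3, 0.1))
4
>>> N.linspace(1.0, 1.2, 3)        # a count instead of a step is predictable
array([ 1. ,  1.1,  1.2])

In double precision, (1.3 - 1.0)/0.1 lands slightly above 3, so the length rounds up and the "excluded" endpoint sneaks in.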
Scott Ransom wrote:
On Fri, Jun 30, 2006 at 01:25:23PM -0600, Travis Oliphant wrote:
Robert Kern wrote:
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Actually, Robert makes a good point. arange with floats is problematic. We should direct people to linspace instead of changing the default of arange. Most new users will probably expect arange to return a type similar to Python's range, which is int.
...
So, I think from both a pragmatic and an idealized standpoint, arange should stay with the default of ints. People who want arange to return floats should be directed to linspace.
I should have worded this as:
"People who want arange to return floats *as a default* should be directed to linspace" So, basically, arange is not going to change. Because of this, shifting over was a cinch. I still need to write the convert-script code that inserts dtype=int in routines that use old defaults: *plea* anybody want to write that?? -Travis
On 6/30/06, Travis Oliphant <oliphant@ee.byu.edu> wrote:
... I still need to write the convert-script code that inserts dtype=int in routines that use old defaults: *plea* anybody want to write that??
I will try to do it at some time over the long weekend. I was bitten by the fact that the current convert-script changes anything that resembles an old typecode such as 'b' regardless of context. (I was unlucky to have database columns called 'b'!) Fixing that is very similar to the problem at hand.
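A context-aware rewrite could start along these lines (a rough sketch only: the function list and the one-level parenthesis handling are illustrative, and a real converter would also have to cope with strings, comments, and deeper nesting):

    import re

    # Match zeros/ones/empty/identity calls whose argument list nests at
    # most one level of parentheses (enough for shape tuples like (3, 4)).
    CALL = re.compile(r'\b(zeros|ones|empty|identity)\s*\(((?:[^()]|\([^()]*\))*)\)')

    def _fix(match):
        name, args = match.group(1), match.group(2)
        # Blank out nested parens so a remaining comma means a second
        # (dtype) argument was already given explicitly.
        top_level = re.sub(r'\([^()]*\)', '', args)
        if ',' in top_level or 'dtype' in top_level:
            return match.group(0)                    # leave explicit calls alone
        return '%s(%s, dtype=int)' % (name, args)    # pin the old default

    def convert(source):
        return CALL.sub(_fix, source)

    print(convert("a = zeros((3, 4))"))          # -> a = zeros((3, 4), dtype=int)
    print(convert("b = ones(5, dtype=float)"))   # unchanged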
On 30/06/2006, at 10:11 PM, Travis Oliphant wrote:
I should have worded this as:
"People who want arange to return floats *as a default* should be directed to linspace"
So, basically, arange is not going to change.
Because of this, shifting over was a cinch. I still need to write the convert-script code that inserts dtype=int in routines that use old defaults: *plea* anybody want to write that??
Hmm ... couldn't we make the transition easier and more robust by writing compatibility interfaces for zeros, ones, empty, called e.g. intzeros, intones, intempty? These functions could of course live in oldnumeric.py. Then we can get convertcode.py to do a simple search and replace -- and, more importantly, it's easy for users to do the same manually should they choose not to use convertcode.py. I could work on this this weekend.

I'm pleased we're changing the defaults to float. The combination of the int defaults and silent downcasting with in-place operators trips me up once every few months when I forget to specify dtype=float explicitly. Another wart gone from NumPy! I'm surprised and impressed that the community's willing to make this change after 10+ years with int defaults. I think it's a small but important improvement in usability. -- Ed
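For concreteness, the shims Ed describes could be as small as this sketch (names taken from his message; note that an explicit dtype still wins, so intzeros(n, float) stays legal -- a point that comes up just below):

    # Hypothetical oldnumeric-style compatibility shims:
    import numpy

    def intzeros(shape, dtype=int, **kwargs):
        return numpy.zeros(shape, dtype, **kwargs)

    def intones(shape, dtype=int, **kwargs):
        return numpy.ones(shape, dtype, **kwargs)

    def intempty(shape, dtype=int, **kwargs):
        return numpy.empty(shape, dtype, **kwargs)

    a = intzeros((2, 3))        # int array: the old default, preserved
    b = intzeros(4, float)      # explicit dtype still overrides, as before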
On Sat, 1 Jul 2006, Ed Schofield apparently wrote:
couldn't we make the transition easier and more robust by writing compatibility interfaces for zeros, ones, empty, called e.g. intzeros, intones, intempty
I think Robert or Tim suggested int.zeros() etc. fwiw, Alan Isaac
I don't see how that will simplify the transition. Convertcode will still need to detect use of the dtype argument (keyword or positional). A simple s/zeros/int.zeros/ will not work. I read Ed's suggestion as retaining the current default in intzeros, so that intzeros(n, float) is valid. On the other hand, Tim's int.zeros would not take a dtype argument because dtype is already bound as self. The bottom line: int.zeros will not work, and intzeros(n, float) is ugly. I would not mind oldnumeric.zeros, but a context-aware convertcode is still worth the effort. Let's see how far I will get with that ...
Sasha wrote:
I don't see how that will simplify the transition. Convertcode will still need to detect use of the dtype argument (keyword or positional). A simple s/zeros/int.zeros/ will not work. I read Ed's suggestion as retaining the current default in intzeros, so that intzeros(n, float) is valid. On the other hand, Tim's int.zeros would not take a dtype argument because dtype is already bound as self.
It's just like a game of telephone! That was Robert's suggestion, not mine. What I said was: personally, given no other constraints, I would probably just get rid of the defaults altogether and make the user choose.

Since I've been dragged back into this again, let me make a quick comment. If we are choosing a floating-point default, there are at least two other choices that make as much sense as using float64. The first possibility is to use the same thing that Python uses, that is, 'float'. On my box, and probably most current boxes, that turns out to be float64 anyway, but choosing 'float' as the default rather than 'float64' will change the way numpy is expected to behave as hardware and/or Python evolves. The second choice is to use the longest floating-point type available on a given platform, that is, 'longfloat'. Again, on my box that is the same as using float64, but on other boxes I suspect it gives somewhat different results.

The advantage of using 'float64' as the default is that we can expect programs to run consistently across platforms. The advantage of choosing 'float' is that interactions with Python proper may be less surprising when Python's float is not 'float64'. The advantage of using 'longfloat' is that it is the safest type to use when interacting with other unknown types. I don't care much which gets chosen, but I think we should know which of these we intend and why. Since these are often the same thing at present, I have a suspicion that the three cases may be conflated in some people's heads. -tim
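The three candidates Tim distinguishes can be inspected directly; a small sketch (on most current machines the first two coincide, which is exactly the conflation he warns about; 'longfloat' is spelled longdouble below):

    import numpy

    print(numpy.dtype(float))            # whatever Python's float maps to (float64 today)
    print(numpy.dtype(numpy.float64))    # pinned at 64 bits on every platform
    print(numpy.dtype(numpy.longdouble)) # the 'longfloat' option: longest available,
                                         # float64/float96/float128 depending on platform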
On 6/30/06, Robert Kern <robert.kern@gmail.com> wrote:
Travis Oliphant wrote:
Comments?
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Hear, hear. Using floats in arange is a lousy temptation that must be avoided. Apart from that, I think that making float64 the default for most things is the right way to go. Numpy is primarily for numeric computation, and numeric computation is primarily in float64. Specialist areas like imaging can be dealt with as special cases. BTW, can someone suggest the best way to put new code into Numpy at this point? Is there a test branch of some sort? I have some free time coming up in a few weeks and would like to do the following: 1) add a left/right option to searchsorted, 2) add faster normals to random, 3) add the MWC8222 generator to random, 4) add the kind keyword to the functional forms of sort (sort, argsort) as in numarray. Chuck
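The left/right distinction Chuck proposes presumably mirrors Python's bisect module, which makes a handy reference for the semantics:

    import bisect

    a = [1, 2, 2, 2, 5]                 # sorted input with a run of equal keys
    print(bisect.bisect_left(a, 2))     # 1 -- insertion point before the run
    print(bisect.bisect_right(a, 2))    # 4 -- insertion point after the run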
All, This is a bit off topic, but a while ago there were some complaints about the usefulness of distutils. I note that KDE has gone over to using CMake after trying SCons. I am not familiar with CMake, but perhaps someone here knows more and can comment on its suitability. Chuck
Charles R Harris wrote:
All,
This is a bit off topic, but a while ago there were some complaints about the usefulness of distutils. I note that KDE has gone over to using CMake after trying SCons. I am not familiar with CMake, but perhaps someone here knows more and can comment on its suitability.
None at all. I have nightmares about it every time I need to rebuild VTK.
Hey Chuck
All,
This is a bit off topic, but a while ago there were some complaints about the usefulness of distutils. I note that KDE has gone over to using CMake after trying SCons. I am not familiar with CMake, but perhaps someone here knows more and can comment on its suitability.
CMake definitely warrants investigation, but I think SCons might be a better way to go. I think it would make it easier to reuse large parts of the existing build code (e.g. conv_template.py could be converted into a SCons builder without too much effort). Reusing parts of distutils and setuptools would also be easier if the new tool is somehow Python-aware. I think the main problem with distutils in the NumPy context is that it was never designed to build C/Fortran code over so many platforms with so many possible build configurations. python setup.py install works pretty well, but any kind of non-default configuration can become a bit hairy, despite the excellent work on NumPy extensions to distutils. I'd like to take a stab at doing something with SCons in a few weeks' time. Does anybody want to brainstorm on some ideas for what is needed from a better build system for NumPy? Maybe a wiki page? Regards, Albert
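A sketch of the conv_template idea in SCons terms (hypothetical: it assumes conv_template.py can be driven as 'python conv_template.py input output', which would need checking against the real script):

    # SConstruct -- Builder() and Environment() are provided by SCons itself,
    # so no imports are needed in this file.
    conv = Builder(action='python conv_template.py $SOURCE $TARGET',
                   suffix='.c', src_suffix='.c.src')
    env = Environment(BUILDERS={'ConvTemplate': conv})

    # With the suffix pair above, the target name is inferred:
    env.ConvTemplate('umathmodule.c.src')   # would emit umathmodule.c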
Albert Strasheim wrote:
I'd like to take a stab at doing something with SCons in a few weeks' time. Does anybody want to brainstorm on some ideas for what is needed from a better build system for NumPy? Maybe a wiki page?
I strongly believe that we need to be using whatever build system is the standard for Python packages. I'm happy to see distutils go away in favor of something better, but that "something better" needs to be actively promoted as *the* replacement for distutils for *all* Python packages, not just numpy.
On 7/1/06, Robert Kern <robert.kern@gmail.com> wrote:
... I strongly believe that we need to be using whatever build system is the standard for Python packages. I'm happy to see distutils go away in favor of something better, but that "something better" needs to be actively promoted as *the* replacement for distutils for *all* Python packages, not just numpy.
I strongly agree. Distutils is in such bad shape partially because extension packages either do not use distutils or extend distutils in non-standard ways. Python-dev is not an easy crowd to deal with, but in the long run investing effort in improving core distutils will pay off.
Hi, Linux Weekly News (http://lwn.net) had a very interesting article on KDE's switch on June 19, 2006 by Alexander Neundorf: http://lwn.net/Articles/187923/ The full article is at: http://lwn.net/Articles/188693/ This should be freely available to all. Also, the current US Linux Magazine (June or July 2006) has a small feature on cmake as well. Regards Bruce On 7/1/06, Albert Strasheim <fullung@gmail.com> wrote:
<snip>
Albert Strasheim wrote:
<snip>
I'd like to take a stab at doing something with SCons in a few weeks' time. Does anybody want to brainstorm on some ideas for what is needed from a better build system for NumPy? Maybe a wiki page?
I have a small experience with scons, as a replacement for the auto* tools for small packages of my own (requirements for cross-building, library and header dependency, build of libraries, etc...). So I am willing to share my somewhat limited experience with scons (the code I am building with scons uses cblas/clapack, and has libraries + some unit testing, so we would not start from scratch). Also, I have access to x86 and ppc linux + mac os x + windows easily, which makes it easy to test on some common platforms,

David

P.S: Some comments on scons: I don't know distutils, so I can only compare to autotools. From *my* experience, you should think about scons as a Makefile replacement, and as a build framework to build onto.

The main pros of scons:
- having a real language for build rules programming is a real plus. It makes it much easier to extend than autoconf, for example (debugging m4 macros is not something I can enjoy much, and I am sure I am not alone).
- the dependency checking works great
- parallel build is explicitly handled
- scons knows how to build libraries (static and shared) on the platforms it supports
- it can be included in the project, so scons does not need to be installed if needed (I have never used this feature myself).

The main cons:
- configuration part: there are some tools to test library/header a la autoconf, but this is far from great in the present situation, mainly because of the next point
- option handling from the command line: there is some support, but nothing is automatic. In the long run, this is painful.
- no support for library versioning; I am not sure about rpath support, which is useful for non-standard path installation. I don't know how difficult it would be to implement for all platforms.
- can be slow for big projects? I have seen quite big projects (eg ardour: several hundred .c and .h files) using scons, and it was not really slow, and I don't think it would be a problem for something like numpy, whose size is nothing compared to kde.

To sum it up: as a make replacement, from a developer POV, it works great. As a tool for *distribution*, I am less convinced. For people familiar with autotools, scons is a great automake replacement. Everything else has to be implemented: autoconf, libtool, etc... My understanding is that those two tools (autoconf and libtool) are the ones most needed for numpy, so there is a lot of work to do if we want to use scons.
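As a rough sketch of the autoconf-style configuration checks discussed above, SCons' Configure contexts look like this (library and source names are placeholders, not NumPy build code):

# SConstruct -- minimal sketch of a configure-style check in SCons
from SCons.Script import Environment, Configure

env = Environment()
conf = Configure(env)
# Check for a usable cblas library and header, a la AC_CHECK_LIB
if not conf.CheckLibWithHeader('cblas', 'cblas.h', 'c'):
    print('cblas not found; a fallback would be configured here')
env = conf.Finish()

# scons knows how to build shared/static libraries portably
env.SharedLibrary('example', ['example.c'])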
Charles R Harris wrote:
On 6/30/06, *Robert Kern* <robert.kern@gmail.com <mailto:robert.kern@gmail.com>> wrote:
Travis Oliphant wrote:
> Comments?
Whatever else you do, leave arange() alone. It should never have accepted floats in the first place.
Hear, hear. Using floats in arange is a lousy temptation that must be avoided. Apart from that I think that making float64 the default for most things is the right way to go. Numpy is primarily for numeric computation, and numeric computation is primarily in float64. Specialist areas like imaging can be dealt with as special cases.
BTW, can someone suggest the best way to put new code into Numpy at this point? Is there a test branch of some sort?
My favorite is to make changes in piecemeal steps and just commit them to the trunk as they get created. I think your projects 2 and 4 could be done that way. If a change requires a more elaborate re-write, then I usually construct a branch, switch over to the branch and make changes there. When I'm happy with the result, the branch is merged back into the trunk.

Be careful with branches though. It is easy to get too far away from main-line trunk development (although at this point the trunk should be stabilizing toward release 1.0).

1) To construct a branch (just a copy of the trunk):

(Make note of the revision number when you create the branch -- you can get it later but it's easier to just record it at copy.)

svn cp http://svn.scipy.org/svn/numpy/trunk http://svn.scipy.org/svn/numpy/branches/<somename>

2) To switch to using the branch:

svn switch http://svn.scipy.org/svn/numpy/branches/<somename>

You can also just have another local directory where you work on the branch so that you still have a local directory with the main trunk. Just check out the branch:

svn co http://svn.scipy.org/svn/numpy/branches/<somename> mybranch

3) To merge back:

a) Get back to the trunk repository:

svn switch http://svn.scipy.org/svn/numpy/trunk

or go to your local copy of the trunk and do an svn update

b) Merge the changes from the branch back in to your local copy of the trunk:

svn merge -r <branch#>:HEAD http://svn.scipy.org/svn/numpy/branches/<somename>

This assumes that <branch#> is the revision number when the branch is created.

c) You have to now commit your local copy of the trunk (after you've dealt with and resolved any potential conflicts).

If your branch is continuing a while, you may need to update your branch with changes that have happened in the main-line trunk. This will make it easier to merge back when you are done. To update your branch with changes from the main trunk do:

svn merge -r <lastmerge#>:<end#> http://svn.scipy.org/svn/numpy/trunk

where <lastmerge#> is the last revision number you used to update your branch (or the revision number at which you made your branch) and <end#> is the ending revision number for changes in the trunk you'd like to merge.

Here is a good link explaining the process more.

http://svnbook.red-bean.com/en/1.1/ch04s03.html

-Travis
Thanks Travis, Your directions are very helpful and much appreciated. Chuck On 7/1/06, Travis Oliphant <oliphant.travis@ieee.org> wrote:
<snip>
Charles R Harris wrote:
Thanks Travis,
Your directions are very helpful and much appreciated. I placed these instructions at
http://projects.scipy.org/scipy/numpy/wiki/MakingBranches

Please make any changes needed to that wiki page.

-Travis
Travis Oliphant wrote:
Charles R Harris wrote:
Thanks Travis,
Your directions are very helpful and much appreciated. I placed these instructions at
http://projects.scipy.org/scipy/numpy/wiki/MakingBranches
Please make any changes needed to that wiki page.
I will add (here as well as on the wiki) that I have found the svnmerge tool to be enormously helpful in maintaining branches. http://www.dellroad.org/svnmerge/index Among other things, it makes merge commit messages with the contents of the individual commit messages, so history isn't lost when changes are merged back into the trunk.

Here is how I tend to set things up for bidirectional merging (untested with this specific example, though):

$ cd ~/svn/scipy
$ svn cp http://svn.scipy.org/svn/scipy/trunk http://svn.scipy.org/svn/scipy/branches/mine
$ svnmerge init http://svn.scipy.org/svn/scipy/branches/mine
$ svn commit -F svnmerge-commit-message.txt
$ svn switch http://svn.scipy.org/svn/scipy/branches/mine
$ svnmerge init http://svn.scipy.org/svn/scipy/trunk
$ svn commit -F svnmerge-commit-message.txt

Then, when you need to pull in changes from the trunk, view them with

$ svnmerge avail

and pull them in with

$ svnmerge merge
$ svn ci -F svnmerge-commit-message.txt

When you're finally done with the branch, the same procedure on the trunk pulls in all of the (real, not merged in from the trunk) changes you've made to the branch.

Also, if you're only going to be making changes in one directory, I've found that it's much easier to simply branch that directory and svn switch just that directory over. That way, you don't have to worry about pulling in everyone else's changes to the rest of the package into the branch. You can just svn up.

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Fri, 30 Jun 2006 14:42:33 -0400 "Jonathan Taylor" <jonathan.taylor@utoronto.ca> wrote:
+1 for some sort of float. I am a little confused as to why Float64 is a particularly good choice. Can someone explain in more detail? Presumably this is the most sensible ctype and translates to a python float well?
It's "float64", btw. Float64 is the old Numeric name. Python's "float" type is a C "double" (just like Python's "int" is a C "long"). In practice, C doubles are 64-bit. In NumPy, these are the same type: float32 == single (32-bit float, which is a C float) float64 == double (64-bit float, which is a C double) Also, some Python types have equivalent NumPy types (as in, they can be used interchangably as dtype arguments): int == long (C long, could be int32 or int64) float == double complex == cdouble (also complex128) Personally, I'd suggest using "single", "float", and "longdouble" in numpy code. [While we're on the subject, for portable code don't use float96 or float128: one or other or both probably won't exist; use longdouble]. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm@physics.mcmaster.ca
On Fri, 30 Jun 2006, Jonathan Taylor apparently wrote:
In general though I agree that this is a now or never change.
Sasha has also made that argument. I see one possible additional strategy. I think everyone agrees that the long view is important. Now even Sasha agrees that float64 is the best default. Suppose:

1. float64 is the ideal default (I agree with this)
2. there is substantial concern about the effect of the changed default on extant code for the unwary

One approach proposed is to include a different function definition in a compatibility module. This seems acceptable to me, but as Sasha notes it is not without drawbacks. Here is another possibility: transition by requiring an explicit data type for some period of time (say, 6-12 months). After that time, provide the default of float64. This would require some short-term pain, but for the long-term gain of the desired outcome. Just a thought, Alan Isaac

PS I agree with Sasha's following observations: "arrays other than float64 are more of the hard-hat area and their properties may be surprising to the novices. Exposing novices to non-float64 arrays through default constructors is a bad thing. ... No one expects that their Numeric or numarray code will work in numpy 1.0 without changes, but I don't think people will tolerate major breaks in backward compatibility in the future releases. ... If we decide to change the default, let's do it everywhere including array constructors and arange."
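Alan's transition could be prototyped with a thin wrapper that warns when the dtype is omitted; this is a hypothetical sketch of the idea, not code anyone proposed for NumPy itself:

import warnings
import numpy as np

def zeros(shape, dtype=None):
    # Hypothetical transition shim: demand an explicit dtype for a
    # release cycle or two, then flip the default to float64.
    if dtype is None:
        warnings.warn("pass an explicit dtype; the default will become "
                      "float64 in a future release", DeprecationWarning,
                      stacklevel=2)
        dtype = int  # preserve the old behavior during the transition
    return np.zeros(shape, dtype=dtype)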
Numeric-24.2 (released Nov. 11, 2005)
14275 py24.exe 2905 py23.exe 9144 tar.gz
Numarray 1.5.1 (released Feb, 7, 2006)
10272 py24.exe 11883 py23.exe 12779 tar.gz
NumPy 0.9.8 (May 17, 2006)
3713 py24.exe 558 py23.exe 4111 tar.gz
Here are some trends with a pretty picture. http://www.google.com/trends?q=numarray%2C+NumPy%2C+Numeric+Python Unfortunately, Numeric alone is too general a term to use. But I would say NumPy is looking good. ;) -- Louis Cordier <lcordier@point45.com> cell: +27721472305 Point45 Entertainment (Pty) Ltd. http://www.point45.org
+1 for float64. I'll teach Introduction to Numerical Linear Algebra next term and I will use numpy! Best, Paulo -- Paulo José da Silva e Silva Professor Assistente do Dep. de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil e-mail: pjssilva at ime.usp.br Web: http://www.ime.usp.br/~pjssilva Teoria é o que não entendemos o suficiente para chamar de prática. (Theory is something we don't understand well enough to call practice.)
On 6/29/06, Alan G Isaac <aisaac@american.edu> wrote:
<snip>
So far the vote is 8 for float, 1 for int.
Regarding choice of float or int for default: The number one priority for numpy should be to unify the three disparate Python numeric packages. Whatever choice of defaults facilitates that is what I support. Personally, given no other constraints, I would probably just get rid of the defaults altogether and make the user choose. -tim
Tim Hochberg wrote:
Regarding choice of float or int for default:
The number one priority for numpy should be to unify the three disparate Python numeric packages. Whatever choice of defaults facilitates that is what I support.
+10
Personally, given no other constraints, I would probably just get rid of the defaults altogether and make the user choose.
My preferred solution is to add class methods to the scalar types rather than screw up compatibility. In [1]: float64.ones(10) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
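The scalar-type classmethod idea was never adopted, but a sketch of what Robert is describing might look like this (purely hypothetical; np.float64 carries no such methods):

import numpy as np

class float64(np.float64):
    """Hypothetical scalar type carrying its own constructors."""
    @classmethod
    def ones(cls, shape):
        return np.ones(shape, dtype=np.float64)

    @classmethod
    def zeros(cls, shape):
        return np.zeros(shape, dtype=np.float64)

float64.ones(10)   # an array of ten 1.0 values, no default-dtype ambiguity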
On 6/30/06, Robert Kern <robert.kern@gmail.com> wrote:
Tim Hochberg wrote:
Regarding choice of float or int for default:
The number one priority for numpy should be to unify the three disparate Python numeric packages. Whatever choice of defaults facilitates that is what I support.
+10
Personally, given no other constraints, I would probably just get rid of the defaults altogether and make the user choose.
My preferred solution is to add class methods to the scalar types rather than screw up compatibility.
In [1]: float64.ones(10)
I don't think an int will be able to hold the number of votes for float64.
Tim Hochberg wrote:
The number one priority for numpy should be to unify the three disparate Python numeric packages.
I think the number one priority should be making numpy the best it can be. As someone said, two (or ten) years from now, there will be more new users than users migrating from the older packages.
Personally, given no other constraints, I would probably just get rid of the defaults altogether and make the user choose.
I like that too, and it would keep the incompatibility from causing silent errors. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
+1 for float64 for me as well. Scott On Fri, Jun 30, 2006 at 10:29:42AM -0400, Darren Dale wrote:
+1 for float64
Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
-- -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom@nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
On 6/30/06, Scott Ransom <sransom@nrao.edu> wrote:
+1 for float64 for me as well.
+1 for float64. I have lots of code overriding the int defaults by hand which were giving me grief with hand-written extensions (which were written double-only for speed reasons). I'll be happy to clean this up. I completely understand Travis' concerns about backwards compatibility, but frankly, I think that right now the quality and community momentum of numpy are already enough to carry things forward. People will suffer a little during the porting days, but they'll be better off in the long run. I don't think we should underestimate the value of eternal happiness :) Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone. Cheers, f
On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
... Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
Is this a joke? Did anyone ever measure the coverage of numpy unit tests? I would be surprised if it was more than 10%.
Sasha wrote:
On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
... Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
Is this a joke? Did anyone ever measure the coverage of numpy unit tests? I would be surprised if it was more than 10%.
Very obviously a joke...uh...with the exception of enthought-written scientific code, of course ;-)
On 6/30/06, Sasha <ndarray@mac.com> wrote:
On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
... Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
Is this a joke? Did anyone ever measure the coverage of numpy unit tests? I would be surprised if it was more than 10%.
That's a conundrum. A joke is no longer a joke once you point out that, yes, it is a joke.
On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
On Fri, 30 Jun 2006, Sasha apparently wrote:
Is this a joke?
It had me chuckling. ;-) The dangers of email ... Cheers, Alan Isaac
On 6/30/06, Sasha <ndarray@mac.com> wrote:
On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
... Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
Is this a joke? Did anyone ever measure the coverage of numpy unit tests? I would be surprised if it was more than 10%.
Of course it's a joke. So obvious to anyone who knows the field that the smiley shouldn't be needed (and yes, I despise background laughs on television, too). Maybe a sad joke, given the realities of scientific computing, and maybe a poor joke, but at least an attempt at humor. Cheers, f
"In the good old days physicists repeated each other's experiments, just to be sure. Today they stick to FORTRAN, so that they can share each other's programs, bugs included." --- Edsger W.Dijkstra, "How do we tell truths that might hurt?" 18 June 1975 I just miss the good old days ... On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
<snip>
On Fri, 2006-06-30 at 12:35 -0400, Sasha wrote:
Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
Is this a joke? Did anyone ever measure the coverage of numpy unit tests? I would be surprised if it was more than 10%.
Given the coverage is so low, how can people help by contributing unit tests? Are there obvious areas with poor coverage? Travis, do you have any opinions on this? ...Eric
On 7/1/06, Eric Jonas <jonas@mit.edu> wrote:
<snip>
A handy tool for finding these things out is coverage.py. I've found it quite helpful in checking unittest coverage in the past. http://www.nedbatchelder.com/code/modules/coverage.html I don't think I'll have a chance in the immediate future to try it out with numpy, but if someone does, I'm sure it will give some answers to your questions Eric. Cheers, Tim Leslie
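A minimal driver for what Tim suggests might look like the sketch below, written against the current coverage API (the 2001-era module exposed similar start/stop calls); the include pattern is an assumption about where numpy is installed:

import coverage

cov = coverage.Coverage()
cov.start()                # start tracing *before* importing numpy, so
import numpy               # module-level code is counted as well
numpy.test()
cov.stop()
cov.report(include='*/numpy/*')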
On Fri, 30 Jun 2006 12:35:35 -0400 Sasha <ndarray@mac.com> wrote:
On 6/30/06, Fernando Perez <fperez.net@gmail.com> wrote:
... Besides, decent unit tests will catch these problems. We all know that every scientific code in existence is unit tested to the smallest routine, so this shouldn't be a problem for anyone.
Is this a joke? Did anyone ever measure the coverage of numpy unit tests? I would be surprised if it was more than 10%.
A very quick application of the coverage module, available at http://www.garethrees.org/2001/12/04/python-coverage/ gives me 41%:

Name                             Stmts   Exec  Cover
----------------------------------------------------
numpy                               25     20    80%
numpy._import_tools                235    175    74%
numpy.add_newdocs                    2      2   100%
numpy.core                          28     26    92%
numpy.core.__svn_version__           1      1   100%
numpy.core._internal                99     48    48%
numpy.core.arrayprint              251     92    36%
numpy.core.defchararray            221     58    26%
numpy.core.defmatrix               259    186    71%
numpy.core.fromnumeric             319    153    47%
numpy.core.info                      3      3   100%
numpy.core.ma                     1612   1145    71%
numpy.core.memmap                   64     14    21%
numpy.core.numeric                 323    138    42%
numpy.core.numerictypes           236    204    86%
numpy.core.records                 272     32    11%
numpy.dft                            6      4    66%
numpy.dft.fftpack                  128     31    24%
numpy.dft.helper                    35     32    91%
numpy.dft.info                       3      3   100%
numpy.distutils                     13      9    69%
numpy.distutils.__version__          4      4   100%
numpy.distutils.ccompiler          296     49    16%
numpy.distutils.exec_command       409     27     6%
numpy.distutils.info                 2      2   100%
numpy.distutils.log                 37     18    48%
numpy.distutils.misc_util          945    174    18%
numpy.distutils.unixccompiler       34     11    32%
numpy.dual                          41     27    65%
numpy.f2py.info                      2      2   100%
numpy.lib                           30     28    93%
numpy.lib.arraysetops              121     59    48%
numpy.lib.function_base            501     70    13%
numpy.lib.getlimits                 76     61    80%
numpy.lib.index_tricks             223     56    25%
numpy.lib.info                       4      4   100%
numpy.lib.machar                   174    154    88%
numpy.lib.polynomial               357     52    14%
numpy.lib.scimath                   51     19    37%
numpy.lib.shape_base               220     24    10%
numpy.lib.twodim_base               77     51    66%
numpy.lib.type_check               110     75    68%
numpy.lib.ufunclike                 37     24    64%
numpy.lib.utils                     42     23    54%
numpy.linalg                         5      3    60%
numpy.linalg.info                    2      2   100%
numpy.linalg.linalg                440     71    16%
numpy.random                        10      6    60%
numpy.random.info                    4      4   100%
numpy.testing                        3      3   100%
numpy.testing.info                   2      2   100%
numpy.testing.numpytest            430    214    49%
numpy.testing.utils                151     62    41%
numpy.version                        7      7   100%
----------------------------------------------------
TOTAL                             8982   3764    41%

(I filtered out all the *.tests.* modules). Note that you have to import numpy after starting the coverage, because we use a lot of module-level code that wouldn't be caught otherwise.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm@physics.mcmaster.ca
As soon as I sent out my 10% estimate, I realized that someone would challenge it with Python-level coverage statistics. My main concern is not what fraction of numpy functions is called by unit tests, but what fraction of special cases in the C code is exercised. I am not sure that David's statistics even answers the first question - I would guess it only counts statements in the pure Python methods and ignores methods implemented in C. Can someone post C-level statistics from gcov <http://gcc.gnu.org/onlinedocs/gcc/Gcov.html> or a similar tool? On 6/30/06, David M. Cooke <cookedm@physics.mcmaster.ca> wrote:
<snip>
It is not as bad as I thought, but there is certainly room for improvement.

File `numpy/core/src/multiarraymodule.c'
Lines executed:63.56% of 3290

File `numpy/core/src/arrayobject.c'
Lines executed:59.70% of 5280

File `numpy/core/src/scalartypes.inc.src'
Lines executed:31.67% of 963

File `numpy/core/src/arraytypes.inc.src'
Lines executed:47.35% of 868

File `numpy/core/src/arraymethods.c'
Lines executed:57.65% of 739

On 6/30/06, Sasha <ndarray@mac.com> wrote:
<snip>
Sasha wrote:
It is not as bad as I thought, but there is certainly room for improvement.
File `numpy/core/src/multiarraymodule.c' Lines executed:63.56% of 3290
File `numpy/core/src/arrayobject.c' Lines executed:59.70% of 5280
File `numpy/core/src/scalartypes.inc.src' Lines executed:31.67% of 963
File `numpy/core/src/arraytypes.inc.src' Lines executed:47.35% of 868
File `numpy/core/src/arraymethods.c' Lines executed:57.65% of 739
This is great. How did you generate that? This is exactly the kind of thing we need to be doing for the beta release cycle. I would like these numbers to be very close to 100% by the time 1.0 final comes out at the end of August / first of September. But, we need help to write the unit tests. What happens if you run the scipy test suite? -Travis
On 6/30/06, Travis Oliphant <oliphant@ee.byu.edu> wrote:
This is great. How did you generate [the coverage statistic]?
It was really a hack. I've configured python using

$ ./configure --enable-debug CC="gcc -fprofile-arcs -ftest-coverage" CXX="c++ gcc -fprofile-arcs -ftest-coverage"

(I hate distutils!) Then I installed numpy and ran numpy.test(). Some linalg related tests failed which should be fixed by figuring out how to pass -fprofile-arcs -ftest-coverage options to the fortran compiler. The only non-obvious step in using gcov was that I had to tell it where to find object files:

$ gcov -o build/temp.linux-x86_64-2.4/numpy/core/src numpy/core/src/*.c
... What happens if you run the scipy test suite?
I don't know because I don't use scipy. Sorry.
"Software developers also use coverage testing in concert with testsuites, to make sure software is actually good enough for a release. " -- Gcov Manual I think if we can improve the test coverage, it will speak volumes about the quality of numpy. Does anyone know if it is possible to instrument numpy libraries without having to instrument python itself? It would be nice to make the coverage reports easily available either by including a generating script with the source distribution or by publishing the reports for the releases. On 6/30/06, Sasha <ndarray@mac.com> wrote:
<snip>
On 6/30/06, Sasha <ndarray@mac.com> wrote:
File `numpy/core/src/arraytypes.inc.src' Lines executed:47.35% of 868
This was an overly optimistic number. More relevant is the following, obtained by disabling the #line directives: File `build/src.linux-x86_64-2.4/numpy/core/src/arraytypes.inc' Lines executed:26.71% of 5010
Alexander Belopolsky wrote:
On 6/30/06, Sasha <ndarray@mac.com> wrote:
File `numpy/core/src/arraytypes.inc.src' Lines executed:47.35% of 868
This was an overly optimistic number. More relevant is the following, obtained by disabling the #line directives:
File `build/src.linux-x86_64-2.4/numpy/core/src/arraytypes.inc' Lines executed:26.71% of 5010
Yes, this is true, but the auto-generation means that success for one instantiation increases the likelihood for success in the others. So, the 26.7% is probably too pessimistic. -Travis
On 6/30/06, Travis Oliphant <oliphant@ee.byu.edu> wrote:
... Yes, this is true, but the auto-generation means that success for one instantiation increases the likelihood for success in the others. So, the 26.7% is probably too pessimistic.
Agree, but "increases the likelihood" != "guarantees". For example, relying on nan propagation is a fine strategy for the floating-point case, but will not work for integer types. Similarly, code relying on wrap-on-overflow will fail when type=float. The best solution would be to autogenerate test cases so that all types are tested where appropriate.
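A minimal sketch of the kind of per-dtype test generation Sasha means (the dtype list and assertions are illustrative, not existing NumPy tests):

import numpy as np

# Exercise the same code path for every dtype instead of relying on
# float-only properties such as nan propagation.
for dtype in (np.int8, np.int32, np.int64, np.float32, np.float64):
    a = np.zeros(3, dtype=dtype)
    assert a.sum() == 0                 # must hold for every dtype
    if np.issubdtype(dtype, np.floating):
        b = np.array([np.nan], dtype=dtype)
        assert np.isnan(b).all()        # nan exists only for floats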
Sasha wrote:
On 6/30/06, Travis Oliphant <oliphant@ee.byu.edu> wrote:
... Yes, this is true, but the auto-generation means that success for one instantiation increases the likelihood for success in the others. So, the 26.7% is probably too pessimistic.
Agree, but "increases the likelihood" != "guarantees".
Definitely...
The best solution would be to autogenerate test cases so that all types are tested where appropriate.
Right on again... Here's a chance for all the Python-only coders to jump in and make a splash.... -Travis
I've got to say +1 for Float64 too. I write a lot of numpy code, and this bites me at least once a week. You'd think I'd learn better, but it's just so easy to screw this up when you have to switch back and forth between matlab (which I'm forced to TA) and numpy (which I use for Real Work). ...Eric
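For the record, the trap Eric describes looks like this; the explicit dtype=int below stands in for the old default:

import numpy as np

a = np.zeros(3, dtype=int)   # what the pre-1.0 default handed you
a[0] = 0.5                   # silently truncated: a is now [0, 0, 0]
assert a[0] == 0             # the fractional part is gone, no warning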
Since I was almost alone with my negative vote on the float64 default, I decided to give some more thought to the issue. I agree there are strong reasons to make the change. In addition to the points in the original post, the float64 type is much more closely related to the well-known Python float than int32 is to Python long. For example, no one would be surprised by either
>>> float64(0)/float64(0)
nan
or
>>> float(0)/float(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ZeroDivisionError: float division
but
>>> int32(0)/int32(0)
0
is much more difficult to explain. As is
>>> int32(2)**32
0
compared to
>>> int(2)**32
4294967296L
In short, arrays other than float64 are more of the hard-hat area and their properties may be surprising to novices. Exposing novices to non-float64 arrays through default constructors is a bad thing. Another argument that I find compelling is that we are in a now-or-never situation. No one expects that their Numeric or numarray code will work in numpy 1.0 without changes, but I don't think people will tolerate major breaks in backward compatibility in future releases. If we decide to change the default, let's do it everywhere, including array constructors and arange. The latter is more controversial, but I still think it is worth doing (I will give reasons in future posts). Changing the defaults only in some functions or providing overrides to functions will only lead to more confusion. My revised vote is -0. On 6/30/06, Eric Jonas <jonas@mwl.mit.edu> wrote:
I've got to say +1 for Float64 too. I write a lot of numpy code, and this bites me at least once a week. You'd think I'd learn better, but it's just so easy to screw this up when you have to switch back and forth between matlab (which I'm forced to TA) and numpy (which I use for Real Work).
...Eric
On Thu, 29 Jun 2006, Travis Oliphant wrote:
I think it's time for the first beta-release of NumPy 1.0
I'd like to put it out within 2 weeks. Please make any comments or voice major concerns so that the 1.0 release series can be as stable as possible.
One issue I ran across that I have not seen addressed is the namespace of arrayobject.h. I'm not referring to C++ namespaces but to prefixing symbols to avoid clashes with users' code. The externals start with PyArray. But I had symbol redefinition errors for byte, MAX_DIMS, and ERR. That is, I already had defines for MAX_DIMS and ERR and a typedef for byte in my code. When adding a numpy interface to my library I had to undef these symbols before including arrayobject.h. Is there a way to move implementation defines, like ERR, into a separate header? Or, if they're part of the API, to prefix the symbols? Lee Taylor
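A sketch of the workaround Lee describes, in C; mylib.h is a hypothetical header that defines MAX_DIMS and ERR as macros (a typedef such as byte cannot be #undef'ed and has to be renamed or isolated in its own translation unit instead):

/* mymodule.c -- drop our clashing macros before pulling in NumPy */
#include "mylib.h"             /* hypothetical: defines MAX_DIMS and ERR */

#undef MAX_DIMS
#undef ERR
#include <numpy/arrayobject.h> /* defines its own MAX_DIMS and ERR */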
participants (38)

- Alan G Isaac
- Alan Isaac
- Albert Strasheim
- Alexander Belopolsky
- Bill Baxter
- Bruce Southey
- Charles R Harris
- Christopher Barker
- Christopher Hanley
- Darren Dale
- David Cournapeau
- David M. Cooke
- Ed Schofield
- Eric Jonas
- Eric Jonas
- Erin Sheldon
- Fernando Perez
- Glen W. Mabey
- James Graham
- Jon Wright
- Jonathan Taylor
- Joris De Ridder
- Keith Goodman
- Lee Taylor
- Louis Cordier
- Matthew Brett
- Paulo J. S. Silva
- Robert Kern
- Sasha
- Scott Ransom
- Simon Burton
- Stephan Tolksdorf
- Steve Lianoglou
- Tim Hochberg
- Tim Leslie
- Travis N. Vaught
- Travis Oliphant
- Travis Oliphant