
Hi List,

I would like to make a formal proposal on the subject of the recent
discussions on this list. This message is a bit long, but I've tried my
best to lay out my thoughts as clearly as possible.

Is Numarray a good replacement for Numeric?
===========================================

There has been some debate lately about whether it is appropriate to
present numarray as a replacement for Numeric. Perhaps the main source
of this claim has been the home page of the Numeric project [1]:

"""
If you are new to Numerical Python, please use Numarray. The older
module, Numeric, is unsupported. At this writing Numarray is slower for
very small arrays but faster for large ones. Numarray contains
facilities to help you convert older code to use it. Some parts of the
community have not made the switch yet but the Numarray libraries have
been carefully named differently so that Numeric and Numarray can
coexist in one application.
"""

So the paragraph gives the impression that Numeric was going to be
deprecated. I admit that I was among those whom this statement led to
think of numarray as a kind of 'Next Generation of Numeric', but it now
seems (from the previous discussions) that the statement was
unfortunate and misleading. In fact, Perry Greenfield, one of the main
authors of numarray, will be taking some steps to correct it in the
near future [2].

However, I'd like to believe (and quite a few more people with me, for
sure) that the statement, apart from creating some confusion, will
eventually ease the long-term convergence of both packages. This would
be great not only to unify efforts, but also to allow the inclusion of
Numeric/Numarray in the Python Standard Library, which would be a Good
Thing.

Numarray vs Numeric: Pros and Cons
==================================

It's worth remembering that Numeric was a major breakthrough: it
introduced the ability to deal with large (homogeneous) datasets in
Python very efficiently. In my opinion Numarray is, generally speaking,
a very good package as well, with many interesting new features that
Numeric lacks. Among the main advantages of Numarray over Numeric I can
list the following (although I may be somewhat biased here by my own
use cases for both libraries):

- Memory-mapped objects: Allow working with on-disk numarray objects as
  if they were in memory.

- RecArrays: Objects that make it possible to deal with heterogeneous
  datasets (tables) efficiently. This ought to be very beneficial in
  many fields.

- CharArrays: Allow working with large amounts of fixed and variable
  length strings. I find this implementation much more powerful than
  Numeric's.

- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2]) and
  x = 2*arange(6), then x[ind] results in array([8, 8, 0, 4]).

- New design interface: We should not forget that numarray has been
  designed from the ground up with Python Library integration in mind
  (or at least, this is my impression). So it should have a better
  chance than Numeric (if there is any hope at all) of entering the
  Standard Library.

[See [3] for a more accurate description of the differences]

At this point, it is also fair to recognize the considerable effort
made by the Numarray crew (and others) to create a fairly good
replacement for Numeric: the API is getting closer bit by bit; the
numerix module makes it easier for an application to support both
Numeric and numarray (see [5] for a concrete case of switching between
Numeric and Numarray in SciPy, or [6] for matplotlib); there is the
current effort to support Numarray in SciPy; and, last but not least,
they have been very responsive to requests for enhancements in this
area.

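To make the index-array item in the list above concrete, here is what
such a subscript computes, written with plain Python lists so that it
runs without either package. This is only a sketch of the semantics:
the helper name take_by_index is made up for the illustration, and it
says nothing about how numarray implements the feature.

```python
def take_by_index(x, ind):
    """Return the elements of x selected by each index in ind, in order."""
    return [x[i] for i in ind]


x = [2 * i for i in range(6)]   # plain-list stand-in for 2*arange(6)
ind = [4, 4, 0, 2]              # plain-list stand-in for array([4, 4, 0, 2])
print(take_by_index(x, ind))    # prints [8, 8, 0, 4]
```

Note that an index may be repeated (4 appears twice) and that the
result follows the order of the indices, not the order of x.
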
The real problem for Numarray: Object Creation Time
===================================================

On the other hand, the main drawback of Numarray compared with Numeric
is, in my opinion, its poor performance in object creation. This might
look like a trivial matter at first glance, but in many cases it is
not. One example recently reported on this list is:

>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334

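The measurement above can also be wrapped in a small self-contained
script that runs against whichever of the two packages is installed.
This is only a sketch: the function name time_row_loop is hypothetical,
and it assumes nothing about the packages beyond the arange() and shape
usage already shown in the session above.

```python
from timeit import Timer


def time_row_loop(module_name, number=100):
    """Time the row-access loop for one array package.

    Returns the elapsed seconds for `number` runs of the loop, or None
    if the package cannot be imported.
    """
    # Same setup and statement as in the interactive session above.
    setup = ('import %s as m; a = m.arange(2000); a.shape = (1000, 2)'
             % module_name)
    stmt = 'for i in range(len(a)): row = a[i]'
    try:
        return Timer(stmt, setup).timeit(number=number)
    except ImportError:
        # The setup code failed to import the package.
        return None


if __name__ == '__main__':
    for name in ('Numeric', 'numarray'):
        seconds = time_row_loop(name)
        if seconds is None:
            print('%s: not installed' % name)
        else:
            print('%s: %.4f s' % (name, seconds))
```

Keeping the setup and the timed statement in one function means the
same loop is measured for every package, so the numbers stay
comparable.
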
So numarray is about 10 times slower than Numeric here, not because its
indexing code is 10 times slower, but mainly because object creation is
between 5 and 10 times slower, and the loop above creates a new object
on each iteration.

Another use case where object creation time matters can be seen in [4].

Proposal for making Numarray a real Numeric 'NG' (Next Generation)
==================================================================

Given that the most important reason (IMO) not to consider Numarray a
good replacement for Numeric is object creation time, I would like to
propose a coordinated effort to solve this precise issue.

First of all, it would be nice if the people most experienced with
Numarray (i.e. the Numarray crew) could analyze the issue in depth and
come up with a series of small, self-contained benchmark files that
clearly expose the possible bottlenecks. This may be hard to do, but it
is crucial.

Once the problem has been reduced to optimizing these small,
self-contained benchmarks, they can be made publicly accessible,
together with an explanation of what the problem is and what the
benchmarks are intended for.

After this, I suggest a call for contributions (on this list and the
scipy list, for example) to optimize this code and to spark discussion
about it (a Wiki can work great here). I'm pretty sure that there are
enough bright, challenge-hungry people on these lists to help solve the
problem.

If, after these efforts, there are issues that still can't be solved,
at least the problem will be much better focused, and many more people
will be able to think about it (hopefully, the solution will not depend
on the intricacies of Numeric/Numarray). It may then be possible to
send it to the general Python list in the hope that some guru will be
willing to help us with it.

Well, this is my proposal. Uh, sorry for the length of the message.
Perhaps you may think that I've smoked too much, and maybe you are
right.
However, I'm so convinced that such a Numeric/Numarray unification is
going to be a Very Good Thing that I didn't hesitate to spend some time
writing this proposal (and I look forward to contributing in one way or
another if this is going to be done).

Cheers,

[1] http://www.pfdubois.com/numpy/
[2] http://sourceforge.net/mailarchive/message.php?msg_id=10608642
[3] http://stsdas.stsci.edu/numarray/numarray-1.1.html/node18.html
[4] http://sourceforge.net/mailarchive/message.php?msg_id=10582525
[5] http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2299767
[6] http://matplotlib.sourceforge.net/matplotlib.numerix.html

--
qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""