Proposal for making of Numarray a real Numeric 'NG'
Hi List,

I would like to make a formal proposal regarding the subject of previous discussions on this list. This message is a bit long, but I've tried my best to lay out my thoughts as clearly as possible.

Is Numarray a good replacement of Numeric?
==========================================

There has been a debate lately about the convenience of claiming numarray to be a replacement of Numeric. Perhaps the main source for this claim has been the home page of the Numeric project [1]:

"""
If you are new to Numerical Python, please use Numarray. The older module, Numeric, is unsupported. At this writing Numarray is slower for very small arrays but faster for large ones. Numarray contains facilities to help you convert older code to use it. Some parts of the community have not made the switch yet but the Numarray libraries have been carefully named differently so that Numeric and Numarray can coexist in one application.
"""

So the paragraph gives the impression that Numeric was going to be deprecated. While I recognize that I was among those whom this statement led to think of numarray as a kind of 'Next Generation of Numeric', it seems now (from the previous discussions) that this was a rather unfortunate/misleading observation. In fact, Perry Greenfield, one of the main authors of numarray, will be taking some steps to correct that observation in the near future [2].

However, I'd like to believe (and with me, quite a few more people for sure) that the mentioned statement, apart from creating some confusion, would eventually ease the long-term convergence of both packages. This would be great not only to unify efforts, but also to allow the inclusion of Numeric/Numarray in the Python Standard Library, which would be a Good Thing.
Numarray vs Numeric: Pros and Cons
==================================

It's worth remembering that Numeric has been a major breakthrough in introducing the capability to deal with large (homogeneous) datasets in Python in a very efficient manner. In my opinion Numarray is, generally speaking, a very good package as well, with many interesting new features that Numeric lacks. Among the main advantages of Numarray over Numeric I can list the following (although I may be a bit biased here because of my own use cases for both libraries):

- Memory-mapped objects: Allow working with on-disk numarray objects as if they were in memory.
- RecArrays: Objects that allow dealing with heterogeneous datasets (tables) in an efficient manner. This ought to be very beneficial in many fields.
- CharArrays: Allow working with large amounts of fixed and variable length strings. I see this implementation as much more powerful than Numeric's.
- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2]) and x = 2*arange(6), x[ind] results in array([8, 8, 0, 4])
- New design interface: We should not forget that numarray has been designed from the ground up with Python Library integration in mind (or at least, this is my impression). So, it should have a better chance (if there is some hope) of entering the Standard Library than Numeric.

[See [3] for a more accurate description of differences]

At this point, it would also be fair to recognize the important effort that has been made by the Numarray crew (and others) to create a fairly good replacement for Numeric: the API is getting closer bit by bit, the numerix module makes it easier for an application to support both Numeric and numarray (see [5] for a concrete case of switching between Numeric and Numarray in SciPy or [6] for matplotlib), the current effort to support Numarray in SciPy, and last but not least, their good responsiveness to enhancements in that respect.
The real problem for Numarray: Object Creation Time =================================================== On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a banal thing at first glance, but it is not in many cases. One example recently reported in this list is:
>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334
So, numarray performs 10 times slower than Numeric not because its indexing access code is 10 times slower, but mainly because object creation is between 5 and 10 times slower, and the loop above implies an object creation on each iteration.

Another use case where object creation time is important can be seen in [4].

Proposal for making of Numarray a real Numeric 'NG' (Next Generation)
=====================================================================

Given that the most important reason (IMO) not to consider Numarray a good replacement of Numeric is object creation time, I would like to propose a coordinated effort to solve this precise issue.

First of all, it would be nice if the people most experienced with Numarray (i.e. the Numarray crew) would analyze this in depth and end up with a series of small, self-contained benchmark files that clearly expose the possible bottlenecks. This may be hard to do, but it is crucial.

Once the problem has been reduced to optimizing these small, self-contained benchmarks, they can be made publicly accessible together with an explanation of what the problem is and what the benchmarks are intended for.

After this, I suggest a call for contributions (on this list and the scipy list, for example) on optimizing this code, to spark discussions on that (a Wiki can work great here). I'm pretty sure that there are enough brains and challenge-hungry people on these lists to contribute to solving the problem.

If after these efforts there are issues that still can't be solved, at least the problem would be much better focused, and many more people could think about it (hopefully, the solution may not depend on the intricacies of Numeric/Numarray), so it may be possible to send it to the general Python list and hope that some guru would be willing to help us with it.

Well, this is my proposal. Uh, sorry for the length of the message. Perhaps you may think that I've smoked too much and maybe you are right. However, I'm so convinced that such a Numeric/Numarray unification is going to be a Very Good Thing that I have spent some time making this proposal (and look forward to contributing in some way or another if this is going to be done).

Cheers,

[1] http://www.pfdubois.com/numpy/
[2] http://sourceforge.net/mailarchive/message.php?msg_id=10608642
[3] http://stsdas.stsci.edu/numarray/numarray-1.1.html/node18.html
[4] http://sourceforge.net/mailarchive/message.php?msg_id=10582525
[5] http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2299767
[6] http://matplotlib.sourceforge.net/matplotlib.numerix.html

--
qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""
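A minimal sketch of the kind of small, self-contained benchmark the proposal asks for, assuming only the standard library (neither Numeric nor numarray is imported). The `RowView` class and `make_row` helper are hypothetical stand-ins for the object that `a[i]` has to build on each iteration; the point is only to show how the per-object creation cost can be separated from the bare loop overhead.

```python
from timeit import Timer


class RowView:
    """Hypothetical stand-in for the array object created by a[i]."""
    __slots__ = ("data", "offset", "length")

    def __init__(self, data, offset, length):
        self.data = data
        self.offset = offset
        self.length = length


def make_row(data, i, ncols=2):
    """Create a fresh view object for row i (the cost being measured)."""
    return RowView(data, i * ncols, ncols)


def bench(stmt, namespace, number=100):
    """Return the best of three timings, in seconds, for `stmt`."""
    timer = Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=3, number=number))


NROWS = 1000
NUMBER = 100
data = list(range(2 * NROWS))
ns = {"data": data, "make_row": make_row, "nrows": NROWS}

# Baseline: the bare loop, with no object created per iteration.
loop_only = bench("for i in range(nrows): pass", ns, NUMBER)
# Same loop, plus one object creation per iteration.
with_creation = bench("for i in range(nrows): row = make_row(data, i)", ns, NUMBER)

# Rough per-object cost: the difference spread over all iterations.
per_object_usec = (with_creation - loop_only) / (NUMBER * NROWS) * 1e6
print("loop only:     %.6f s" % loop_only)
print("with creation: %.6f s" % with_creation)
print("approx. cost per object: %.3f usec" % per_object_usec)
```

The same harness could be pointed at real array packages by swapping the statement and namespace; the numbers it prints depend entirely on the machine and interpreter.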
On Fri, Jan 21, 2005 at 02:13:45PM +0100, Francesc Altet wrote:
""" If you are new to Numerical Python, please use Numarray. The older module, Numeric, is unsupported. At this writing Numarray is slower for very small arrays but faster for large ones. Numarray contains facilities to help you convert older code to use it. Some parts of the community have not made the switch yet but the Numarray libraries have been carefully named differently so that Numeric and Numarray can coexist in one application. """
Another problem is that Numeric is extremely poorly advertised/marketed.

- There is no single keyword for Numeric: it is referred to as "Numerical", "Numeric" and "numpy". Both "Numerical" and "numpy" are also used to refer to numarray.
- Numeric does not have a home page of its own. The Sourceforge "Numerical" page lists both numarray and Numeric (which, coincidentally, is referred to as "numpy").
- The #1 & #2 Google results for "numeric python" are the numpy.org page, which is out-of-date, and advertises numarray as being a replacement for Numeric. Plus, what appears to be the main link for Numeric, "Release 22.0" points to a page with both numarray and Numeric releases, numarray first, and Numeric releases named "numpy". Could you try to be more confusing?
- None of the top 10 Google links for "numeric python" point to the Sourceforge page.
- A "numeric python" search on sourceforge lists 24 projects before the Numerical Python page.

Jason
Somehow 349,000+ accesses to the Numeric home page have occurred despite the fact that those searching for it did not get a good education at MIT. I had it on my own site so as to be able to use better tools than SF lets you use. We're in the middle of changing the ownership of the page due to my impending retirement, so perhaps that caused some confusion.

The Numeric/numpy/Numerical thing has a long funny history. You had to be there. It isn't right but it is what it is.

When I was leading the project there was a general feeling that a lot of the things we wanted to do with Numeric were going to be very hard to do with the existing implementation, some of which was generated by a code generator that had gotten lost, and some of which was impenetrable because it was written by a genius who went to (you guessed it) MIT.

My intention was to replace Numeric with a quickly-written better implementation. That is why the Numeric page says what it says. I've left it that way as a reminder of the goal, which I continue to believe is important. Besides cleaning it up, the other motivation was to back off the 'performance at all cost' design enough that we would be 'safe' enough to qualify for the Python distribution and become a standard module. Numeric was written without many safety checks *on purpose*. Over time opinions about that philosophy changed.

In fact, the team that wrote numarray did not do what I asked for, leading to the present confusion but also to, as noted by Altet, some nice features. I think it was unfortunate that this happened, but as with most open source projects the person doing the work does the work the way they want and partly to satisfy their own needs. But they do the work, all credit to them. I'm not complaining.

There are really only a couple of problems (object arrays and array creation time) that can be fixed. What is wrong with the array creation time is obvious. It is written in Python and has too much flexibility, which costs time to decode. Make a raw C-level creator with less choice and I bet it will be ok.

Somebody help these guys; this isn't a product, it is an open source project. Let's get to the promised land and retire Numeric/Numerical/numpy.

Jason Rennie wrote:
On Fri, Jan 21, 2005 at 02:13:45PM +0100, Francesc Altet wrote:
""" If you are new to Numerical Python, please use Numarray. The older module, Numeric, is unsupported. At this writing Numarray is slower for very small arrays but faster for large ones. Numarray contains facilities to help you convert older code to use it. Some parts of the community have not made the switch yet but the Numarray libraries have been carefully named differently so that Numeric and Numarray can coexist in one application. """
Another problem is that Numeric is extremely poorly advertised/marketed.
- There is no single keyword for Numeric: it is referred to as "Numerical", "Numeric" and "numpy". Both "Numerical" and "numpy" are also used to refer to numarray.
- Numeric does not have a home page of its own. The Sourceforge "Numerical" page lists both numarray and Numeric (which, coincidentally, is referred to as "numpy").
- The #1 & #2 Google results for "numeric python" are the numpy.org page, which is out-of-date, and advertises numarray as being a replacement for Numeric. Plus, what appears to be the main link for Numeric, "Release 22.0" points to a page with both numarray and Numeric releases, numarray first, and Numeric releases named "numpy". Could you try to be more confusing?
- None of the top 10 Google links for "numeric python" point to the Sourceforge page.
- A "numeric python" search on sourceforge lists 24 projects before the Numerical Python page.
Jason
On Fri, Jan 21, 2005 at 10:17:49AM -0800, Paul F. Dubois wrote:
Somehow 349,000+ accesses to the Numeric home page have occurred despite the fact that those searching for it did not get a good education at MIT. I had it on my own site so as to be able to use better tools than SF lets you do. We're in the middle of changing the ownership of the page due to my impending retirement, so perhaps that caused some confusion.
Sorry if I came off "big headed." Was just trying to point out that, to an outsider, it's, well, confusing. And, there are some very simple things that could be done to alleviate the confusion: a Numeric (not Numerical, not numarray) home page, consistent nomenclature. I'm not asking you to take your page down. I agree, it's a cool snapshot of history. And, I agree with you: it's often easier to host a home page on your own server. I've gone through hell trying to host the ifile home page on Savannah. I just think there needs to be a "Numeric" page somewhere with updated release information, pointers to current documentation, short explanation of how Numeric is different from numarray and maybe a short synopsis of the history behind the project(s). :) I'm also not trying to belittle the great achievements that are Numeric and numarray. I think these are both awesome packages. I sure can't claim to have written anything as useful. Jason
Paul Dubois wrote:
My intention was to replace Numeric with a quickly-written better implementation. That is why the Numeric page says what it says. I've left it that way as a reminder of the goal, which I continue to believe is important. Besides cleaning it up, the other motivation was to back off the 'performance at all cost' design enough that we would be 'safe' enough to qualify for the Python distribution and become a standard module. Numeric was written without many safety checks *on purpose*. Over time opinions about that philosophy changed.
In fact, the team that wrote numarray did not do what I asked for, leading to the present confusion but also to, as noted by Altet, some nice features. I think it was unfortunate that this happened but as with most open source projects the person doing the work does the work the way they want and partly to satisfy their own needs. But they do the work, all credit to them. I'm not complaining.
Just to clarify, if we could have found a way of doing a basic version and layering on the extra features we would have. To take a specific example, if you want to be able to access data in a buffer that is spaced by intervals not a multiple of the data element size (which is what recarray needs to do) then one needs to handle non-aligned data in the basic version (otherwise segfaults will happen). I couldn't see a way of handling such arrays without the mechanism for handling non-aligned data being built into the basic mechanism (if someone else can, I'd like to see it). So it's a good design approach, but sometimes things can't work that way. Perry
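A small sketch of the layout problem described here, using only the standard `struct` module rather than numarray itself. The record layout (a 1-byte flag followed by an 8-byte double, with no padding) is a hypothetical example chosen so that the doubles are spaced by an interval that is not a multiple of the element size.

```python
import struct

# Hypothetical packed record: 1-byte flag + 8-byte double, no padding.
# "<" selects standard sizes with no alignment, so each record is 9 bytes.
RECORD = struct.Struct("<bd")
DOUBLE = struct.Struct("<d")

values = [1.5, -2.25, 3.0]
buf = bytearray(RECORD.size * len(values))
for i, v in enumerate(values):
    RECORD.pack_into(buf, i * RECORD.size, i, v)

# Byte offsets of the double field in each record: 1, 10, 19.
# None of them is a multiple of 8, so the doubles are not aligned.
offsets = [i * RECORD.size + 1 for i in range(len(values))]

# struct copies the bytes out before interpreting them, so reading at
# these offsets is safe. Code that casts such an address directly to a
# double pointer in C would need explicit handling of the misalignment.
column = [DOUBLE.unpack_from(buf, off)[0] for off in offsets]

print("record size:", RECORD.size)
print("field offsets:", offsets)
print("column:", column)
```

Reading one field across all records is exactly the "strided by a non-multiple of the element size" access pattern, which is why the copy-then-interpret step (or an equivalent) has to exist somewhere in the basic machinery.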
On Jan 21, 2005, at 8:13 AM, Francesc Altet wrote:
Hi List,
I would like to make a formal proposal regarding the subject of previous discussions on this list. This message is a bit long, but I've tried my best to lay out my thoughts as clearly as possible.
[...]

I think Francesc has summarized things very well and offered up some good ideas for how to proceed in speeding up small array performance. Particularly key is understanding exactly where the time is going in the processing. We (read Todd, really) have some suspicions about what the bottlenecks are, and I'll include his conclusions about these below. I just caution that getting good benchmark information to determine this correctly can be more difficult than it would first seem, for the reasons he mentions.

But anyway, I'm certainly supportive of getting an effort going to address this issue (again, we can give support as we described before, but if it is to be done in the near term, others will have to actually do most of the work). A wiki page sounds like a good idea, and it probably should be hosted on scipy.org. If we see any response to this I'll ask to have one set up.
The real problem for Numarray: Object Creation Time ===================================================
On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a banal thing at first glance, but it is not in many cases. One example recently reported in this list is:
>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334
So, numarray performs 10 times slower than Numeric not because its indexing access code would be 10 times slower, but mainly due to the fact that object creation is between 5 and 10 times slower, and the loop above implies an object creation on each iteration.
Other case of use where object creation time is important can be seen in [4].
It is perhaps too narrow to focus on just array creation. It is likely the biggest factor, but there may be other issues as well. For the above case it's possible that the indexing mechanism itself can be sped up, and that is likely part of the reason for the 5 to 10 times difference in speed.

Todd's comments:

Here's a little input for how someone can continue looking at this. Here's the semi-solid info I have at the moment on ufunc execution time; included within it is a breakdown of some of the costs in the C-API function NA_NewAllFromBuffer() located in newarray.ch. I haven't been working on this; this is where I left off.

My timing module, numarray.teacup, may be useful to someone else trying to measure timing; the accuracy of the measurements is questionable, either due to bugs or to the intrusiveness of the inline code disturbing the processor cache (it does dictionary operations for each timing measurement). I tried to design it so that timing measurements can be nested, with limited success. Nevertheless, as a rough guide that provides microsecond level measurements, I have found it useful. It only works on linux.

Build numarray like this:

% rm -rf build
% python setup.py install --timing --force

Then do this to see the cost of the generated output array in an add():
>>> import numarray as na
>>> a = na.arange(10)
>>> b = a.copy()
>>> for i in range(101):
...     jnk = na.add(a,b)
...
>>> import numarray.teacup as tc
>>> tc.report()
Src/_ufuncmodule.c _cache_exec2 fast count: 101 avg_usec: 4.73 cycles: 0
Src/_ufuncmodule.c _cache_lookup2 broadcast count: 101 avg_usec: 4.46 cycles: 0
Src/_ufuncmodule.c _cache_lookup2 hit or miss count: 101 avg_usec: 27.50 cycles: 6
Src/_ufuncmodule.c _cache_lookup2 hit output count: 100 avg_usec: 25.22 cycles: 5
Src/_ufuncmodule.c _cache_lookup2 internal count: 101 avg_usec: 5.20 cycles: 0
Src/_ufuncmodule.c _cache_lookup2 miss count: 0 avg_usec: nan cycles: 0
Src/_ufuncmodule.c cached_dispatch2 exec count: 101 avg_usec: 13.65 cycles: 1
Src/_ufuncmodule.c cached_dispatch2 lookup count: 101 avg_usec: 37.35 cycles: 9
Src/_ufuncmodule.c cached_dispatch2 overall count: 101 avg_usec: 53.36 cycles: 12
Src/libnumarraymodule.c NewArray __new__ count: 304 avg_usec: 8.12 cycles: 0
Src/libnumarraymodule.c NewArray buffer count: 304 avg_usec: 5.37 cycles: 0
Src/libnumarraymodule.c NewArray misc count: 304 avg_usec: 0.25 cycles: 0
Src/libnumarraymodule.c NewArray type count: 304 avg_usec: 0.27 cycles: 0
Src/libnumarraymodule.c NewArray update count: 304 avg_usec: 1.16 cycles: 0
Src/libteacupmodule.c calibration nested count:999999 avg_usec: -0.00 cycles: 1
Src/libteacupmodule.c calibration top count:999999 avg_usec: -0.00 cycles: 0
I would caution anyone working on this that there are at least three locations in the code (some of it redundancy inserted for the purpose of performance optimization, some of it the consequence of having a class hierarchy) that need to be considered: _ndarraymodule.c, _numarraymodule.c, and newarray.ch.

My suspicions:

1. Having an independent buffer/memory object rather than just mallocing the array storage. This is expensive in a lot of ways: it's an extra hidden object and also a Python function call. The ways I've thought of for eliminating this add complexity and make numarray even more modal than it already is.

2. Being implemented as a new style class; this is an unknown cost and involves the creation of still other extra objects, like the object() dictionary, but presumably that has been fairly well optimized already. Calling up the object hierarchy to build the object (__new__) probably has additional overheads.

Things to try:

1. Retain a free-list/cache of small objects and re-use them rather than creating/destroying all the time. Use a constant storage size and fit any small array into that space. I think this is the killer technique that would solve the small array problem without kludging up everything else. Do this first, and only then see if (2) or (3) need to be done.

2. Flatten the class hierarchy more (at least for construction) and remove any redundancy by refactoring.

3. Build in a malloc/free mode for array storage which bypasses the memorymodule completely and creates buffer objects when _data is accessed. Use the OWN_DATA bit in arrayobject.flags.
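The free-list idea in item 1 of "Things to try" can be illustrated in pure Python. This is only a sketch of the technique, not numarray code: the `SmallBufferPool` class, its parameter names, and the 64-byte slot size are all assumptions made for illustration, and a real implementation would live at the C level.

```python
class SmallBufferPool:
    """Hypothetical free-list of constant-size buffers for small arrays."""

    def __init__(self, slot_bytes=64, max_free=128):
        self.slot_bytes = slot_bytes   # constant storage size per slot
        self.max_free = max_free       # cap on how many idle slots to keep
        self._free = []

    def acquire(self, nbytes):
        """Return a buffer with room for at least nbytes."""
        if nbytes > self.slot_bytes:
            # Too large for a slot: fall back to a normal allocation.
            return bytearray(nbytes)
        if self._free:
            # Re-use an idle slot instead of allocating a new one.
            # Note: the slot still holds whatever the last user wrote.
            return self._free.pop()
        return bytearray(self.slot_bytes)

    def release(self, buf):
        """Hand a buffer back; only slot-sized buffers are kept."""
        if len(buf) == self.slot_bytes and len(self._free) < self.max_free:
            self._free.append(buf)


pool = SmallBufferPool()
first = pool.acquire(16)      # small request: gets a 64-byte slot
pool.release(first)
second = pool.acquire(24)     # served from the free-list
print(second is first)        # the same storage was re-used

big = pool.acquire(1000)      # bypasses the pool entirely
pool.release(big)             # not slot-sized, so it is simply dropped
```

Fitting every small array into one constant slot size wastes some memory per array but keeps acquire and release trivial, which is the trade-off the suggestion relies on.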
The real problem for Numarray: Object Creation Time
===================================================
On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a banal thing at first glance, but it is not in many cases. One example recently reported in this list is:
>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334
So, numarray performs 10 times slower than Numeric not because its indexing access code would be 10 times slower, but mainly due to the fact that object creation is between 5 and 10 times slower, and the loop above implies an object creation on each iteration.
Other case of use where object creation time is important can be seen in [4].
One thing to note here is that NumArray() is really used to create numarray arrays, while array() is used to create Numeric arrays. In numarray, array() is a Python function which can be optimized to C in its own right. That alone will not fix the problem though. NumArray itself must be optimized.
Proposal for making of Numarray a real Numeric 'NG' (Next Generation) =====================================================================
Provided that the most important reason (IMO) to not consider Numarray to be a good replacement of Numeric is object creation time, I would like to propose a coordinated effort to solve this precise issue.
I think that is one place to optimize, and the best I'm aware of, but there's a lot of Python in numarray, and a single "." is enough to blow performance out of the water. I think this problem is easily solvable for small arrays with a modest effort. There are a lot of others though (moving the NumArray number protocol to C is one that comes to mind).
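The remark about a single "." refers to the cost of attribute lookups in Python-level hot paths. A rough, standard-library-only illustration follows; the function names are made up for this sketch, and the size of the effect varies a lot between Python versions (recent interpreters have narrowed the gap considerably).

```python
import math
from timeit import timeit


def norms_dotted(points):
    """Look up math.sqrt and out.append on every iteration."""
    out = []
    for x, y in points:
        out.append(math.sqrt(x * x + y * y))
    return out


def norms_local(points):
    """Bind the attributes to local names once, outside the loop."""
    sqrt = math.sqrt
    out = []
    append = out.append
    for x, y in points:
        append(sqrt(x * x + y * y))
    return out


points = [(float(i), float(i + 1)) for i in range(1000)]

t_dotted = timeit(lambda: norms_dotted(points), number=200)
t_local = timeit(lambda: norms_local(points), number=200)
print("dotted lookups: %.4f s" % t_dotted)
print("local names:    %.4f s" % t_local)
```

Both functions compute the same result; only the number of attribute lookups per iteration differs, which is the kind of overhead that moving code to C removes altogether.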
Francesc Altet wrote:
Hi List,
I would like to make a formal proposal regarding the subject of previous discussions on this list. This message is a bit long, but I've tried my best to lay out my thoughts as clearly as possible.
I did not have time to respond to this mail, but it is very good. I will be placing some of its comments in the scipy site.
It's worth remembering that Numeric has been a major breakthrough in introducing the capability to deal with large (homogeneous) datasets in Python in a very efficient manner. In my opinion Numarray is, generally speaking, a very good package as well, with many interesting new features that Numeric lacks. Among the main advantages of Numarray over Numeric I can list the following (although I may be a bit biased here because of my own use cases for both libraries):
I think numarray has made some incredible strides in showing what the numeric object needs to be and in implementing some really neat functionality. I just think its combination of Python and C code must be redone to overcome the speed issues that have arisen. My opinion after perusing the numarray code is that it would be easier (for me anyway) to adjust Numeric to support the features of numarray than to re-write and re-organize the relevant sections of numarray code.

One of the advantages of Numeric is its tight implementation that added only two fundamental types, both written entirely in C. I was hoping that the Python dependencies for the fundamental types would fade as numarray matured, but it appears to me that this is not going to happen.

I did not have the time in the past to deal with this. I wish I had looked at it more closely two years ago. If I had done this I would have seen how to support the features that Perry wanted without completely re-writing everything. But, then again, Python 2.2 changed what is possible on the C level and that has had an impact on the discussion.
- Memory-mapped objects: Allow working with on-disk numarray objects like if they were in-memory.
Numeric3 supports this cleanly and old Numeric did too (there was a memory-mapped module), it's just that byteswapping, and alignment had to be done manually.
- RecArrays: Objects that allow to deal with heterogeneous datasets (tables) in an efficient manner. This ought to be very beneficial in many fields.
Heterogeneous arrays is the big one for old Numeric. It is a good idea. In Numeric3 it has required far fewer changes than I had at first imagined.
- CharArrays: Allow to work with large amounts of fixed and variable length strings. I see this implementation much more powerful that Numeric.
Also a good idea, and it comes along for the ride in Numeric3. Numeric had CHAR arrays but a vision was never specified for how to make them more useful. This change would have been a good step towards heterogeneous arrays.
- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2]) and x = 2*arange(6), x[ind] results in array([8, 8, 0, 4])
For scipy this was implemented on top of Numeric (so it is in Numeric3 too), the multidimensional version needs to be worked on, still.
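For readers without either package at hand, the one-dimensional index-array semantics quoted above can be emulated with the standard `array` module. The `take` helper below is a hypothetical name for this sketch, not the Numeric or numarray function; it only reproduces the example from the quoted text.

```python
from array import array


def take(values, indices):
    """Return a new array holding values[i] for each i in indices."""
    return array(values.typecode, [values[i] for i in indices])


x = array("l", [2 * i for i in range(6)])   # 0, 2, 4, 6, 8, 10
ind = array("l", [4, 4, 0, 2])

result = take(x, ind)
print(result.tolist())   # [8, 8, 0, 4]
```

Repeated indices are allowed and the result has the shape of the index array, which is what distinguishes this from ordinary slicing.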
- New design interface: We should not forget that numarray has been designed from the ground with Python Library integration in mind (or at least, this is my impression). So, it should have more chances (if there is some hope) to enter in the Standard Library than Numeric.
Numeric has had this in mind for some time. In fact the early Numeric developers were quite instrumental in getting significant changes into Python itself, including Complex Objects, Ellipses, and Extended Slicing. Guido was quite keen on the idea of including Numeric at one point. Our infighting made him lose interest, I think. So claiming this as an advantage of numarray over Numeric is simply inaccurate.
The real problem for Numarray: Object Creation Time ===================================================
On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a banal thing at first glance, but it is not in many cases. One example recently reported in this list is:
Ah, and there's the rub. I don't think this object creation time will go away until Numarray's infrastructure becomes essentially like that of Numeric. One tight object all in C. Getting it there seems harder than fixing Numeric, with the additional features of Numarray. Thanks for these comments. It is very good to hear what the most important features for users are. -Travis
On Sunday 06 February 2005 08:23, Travis Oliphant wrote:
The real problem for Numarray: Object Creation Time ===================================================
On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a banal thing at first glance, but it is not in many cases. One example recently reported in this list is:
Ah, and there's the rub. I don't think this object creation time will go away until Numarray's infrastructure becomes essentially like that of Numeric. One tight object all in C. Getting it there seems harder than fixing Numeric, with the additional features of Numarray.
Well, I guess I have to believe you, given that I'm not an expert on these issues and you are. However, I think Todd Miller has some hopes that Numarray can eventually get rid of this object creation latency. That would be very nice as well.

In any case, and as someone already said during the message storm that took place on this list this weekend, I was rather sceptical when you first announced the project for Numeric 3.0. But now, after reading this bunch of messages, I've changed my mind and think (hope) that your approach can be very beneficial for unifying the Numeric and numarray split.

I wish you the best of luck in your attempt, and I'll keep an eye on the evolution of Numeric 3.0. It would be wonderful if you finally succeed and we can at last have a standard numerical package for Python, regardless of whether it ends up in the Standard Library or not.

Cheers!,

--
qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""
I thought I would clarify some historical issues (at least as far as I recall them) in helping understand how things got the way they are today. Travis Oliphant wrote:
Francesc Altet wrote:
I think numarray has made some incredible strides in showing what the numeric object needs to be and in implementing some really neat functionality. I just think its combination of Python and C code must be redone to overcome the speed issues that have arisen. My opinion after perusing the numarray code is that it would be easier (for me anyway) to adjust Numeric to support the features of numarray than to re-write and re-organize the relevant sections of numarray code. One of the advantages of Numeric is its tight implementation that added only two fundamental types, both written entirely in C. I was hoping that the Python dependencies for the fundamental types would fade as numarray matured, but it appears to me that this is not going to happen.
When we started this, one of those who suggested putting much of the code in Python was Guido himself, if I recall correctly (I'd have to dig up the message he wrote on the subject to see what exactly he said). That was one of the factors influencing us to go that route (as well as others mentioned below).
I did not have the time in the past to deal with this. I wish I had looked at it more closely two years ago. If I had done this I would have seen how to support the features that Perry wanted without completely re-writing everything. But, then again, Python 2.2 changed what is possible on the C level and that has had an impact on the discussion.
Indeed, numarray was started well before Python 2.2, and that was another reason it wasn't done in C. Knowing what would have been available in 2.2 would likely have changed the approach used.
- Memory-mapped objects: Allow working with on-disk numarray objects like if they were in-memory.
Numeric3 supports this cleanly and old Numeric did too (there was a memory-mapped module), it's just that byteswapping, and alignment had to be done manually.
Just to clarify (it may not be immediately apparent to those who haven't had to deal with it), many memory mapped files do not use the machine representation (as is often the case for astronomical data files). Memory mapping isn't nearly as useful if one has to create temporaries to handle these cases.
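A sketch of the situation described here, using only the standard library: a file of big-endian doubles is memory-mapped and elements are decoded one at a time, so no byteswapped temporary copy of the whole file is ever built. The file contents and the `read_item` helper are made up for this illustration; on a little-endian machine the ">d" format is what performs the byte swap.

```python
import mmap
import os
import struct
import tempfile

# Big-endian IEEE double, as often found in portable data files.
BIG_DOUBLE = struct.Struct(">d")
values = [1.0, 2.5, -4.0]

# Write a small sample file in big-endian representation.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    for v in values:
        f.write(BIG_DOUBLE.pack(v))


def read_item(mm, index):
    """Decode element `index` straight from the mapping, swapping bytes
    on the fly instead of converting the whole file first."""
    return BIG_DOUBLE.unpack_from(mm, index * BIG_DOUBLE.size)[0]


with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        decoded = [read_item(mm, i) for i in range(len(values))]
    finally:
        mm.close()

os.remove(path)
print(decoded)
```

An array type that records the byte order of its buffer can do the equivalent of `read_item` internally, which is what makes memory mapping of non-native files practical.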
- RecArrays: Objects that allow to deal with heterogeneous datasets (tables) in an efficient manner. This ought to be very beneficial in many fields.
Heterogeneous arrays is the big one for old Numeric. It is a good idea. In Numeric3 it has required far fewer changes than I had at first imagined.
Numeric has had this in mind for some time. In fact the early Numeric developers were quite instrumental in getting significant changes into Python itself, including Complex Objects, Ellipses, and Extended Slicing. Guido was quite keen on the idea of including Numeric at one point. Our infighting made him lose interest, I think. So claiming this as an advantage of numarray over Numeric is simply inaccurate.
Actually, as I understand it, Guido had ruled out including Numeric well before numarray even got started. I can't claim to know exactly his reasons (I've heard various things such as he looked at the code and didn't like it, or Paul Dubois and others advised him against it; I can't say exactly why), but I am sure that that decision was made before numarray. No doubt the split has prevented any further consideration of inclusion. Perry
On Feb 9, 2005, at 17:48, Perry Greenfield wrote:
When we started this, one of those that suggested putting much of the code in Python was Guido himself if I recall correctly (I'd have to dig up the message he wrote on the subject to see what exactly he
My memory tells me that the NumPy community asked if there was any chance to get arrays into the core if the code was decent, and Guido replied positively.
Actually, as I understand it, Guido had ruled out including Numeric well before numarray even got started. I can't claim to know exactly his reasons (I've heard various things such as he looked at the code and didn't like it, or Paul Dubois and others advised him against it; I
There was pretty much a consensus at the time that we shouldn't make fools of ourselves by proposing the existing codebase for inclusion in the core.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: hinsen@llb.saclay.cea.fr
---------------------------------------------------------------------
participants (6)
- Francesc Altet
- Jason Rennie
- konrad.hinsen@laposte.net
- Paul F. Dubois
- Perry Greenfield
- Travis Oliphant