
Francesc Altet wrote:
Hi List,
I would like to make a formal proposal regarding the subject of previous discussions on this list. This message is a bit long, but I've tried my best to lay out my thoughts as clearly as possible.
I did not have time to respond to this mail earlier, but it is very good. I will be placing some of its comments on the scipy site.
It's worth remembering that Numeric has been a major breakthrough in introducing the capability to deal with large (homogeneous) datasets in Python in a very efficient manner. In my opinion Numarray is, generally speaking, a very good package as well, with many interesting new features that Numeric lacks. Among the main advantages of Numarray over Numeric I can list the following (although I may be a bit misled here by my own use cases of both libraries):
I think numarray has made some incredible strides in showing what the numeric object needs to be and in implementing some really neat functionality. I just think its combination of Python and C code must be redone to overcome the speed issues that have arisen. My opinion after perusing the numarray code is that it would be easier (for me anyway) to adjust Numeric to support the features of numarray than to re-write and re-organize the relevant sections of numarray code. One of the advantages of Numeric is its tight implementation that added only two fundamental types, both written entirely in C. I was hoping that the Python dependencies for the fundamental types would fade as numarray matured, but it appears to me that this is not going to happen. I did not have the time in the past to deal with this. I wish I had looked at it more closely two years ago. If I had done this I would have seen how to support the features that Perry wanted without completely re-writing everything. But, then again, Python 2.2 changed what is possible on the C level and that has had an impact on the discussion.
- Memory-mapped objects: Allow working with on-disk numarray objects as if they were in memory.
Numeric3 supports this cleanly, and old Numeric did too (there was a memory-mapped module); it's just that byteswapping and alignment had to be done manually.
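To give a rough idea of what "done manually" means, here is a minimal sketch using only Python's standard mmap and struct modules. It is not the old Numeric memory-mapped module; the file layout (four big-endian 32-bit integers) and the helper name read_be_int32s are assumptions made up for illustration.

```python
import mmap
import os
import struct
import tempfile

def read_be_int32s(path, count):
    """Read `count` big-endian 32-bit ints from a file through a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The ">" prefix tells struct to convert from big-endian,
            # i.e. the byteswap is spelled out explicitly by the caller.
            return list(struct.unpack_from(">%di" % count, mm, 0))

# Hypothetical data file written in big-endian byte order.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack(">4i", 1, 2, 3, 4))

values = read_be_int32s(path, 4)
os.remove(path)
```

The point is only that the caller has to know the byte order of the file; an array type that records byte order and alignment itself can hide this step.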
- RecArrays: Objects that make it possible to deal with heterogeneous datasets (tables) in an efficient manner. This ought to be very beneficial in many fields.
Heterogeneous arrays are the big one for old Numeric. It is a good idea. In Numeric3 it has required far fewer changes than I had at first imagined.
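For readers unfamiliar with the idea, a table of heterogeneous records can be pictured as fixed-size packed rows in one contiguous buffer. The sketch below uses only the standard struct module, not numarray's RecArray or anything in Numeric3; the record layout (int32 id, float64 value, 8-byte name) is a hypothetical one chosen for illustration.

```python
import struct

# Hypothetical record layout: int32 id, float64 value, 8-byte name.
# "<" means little-endian with no padding, so each record is 20 bytes.
record = struct.Struct("<id8s")

rows = [(1, 2.5, b"alpha"), (2, 7.25, b"beta")]

# Pack all rows into one contiguous buffer, the way a table would be stored.
buf = b"".join(record.pack(*row) for row in rows)

# Pull out one "column" (the float field) by walking the packed records.
values = [value for (_id, value, _name) in record.iter_unpack(buf)]
```

A record array type does essentially this bookkeeping for you and exposes each field as an array in its own right.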
- CharArrays: Allow working with large amounts of fixed and variable length strings. I see this implementation as much more powerful than Numeric's.
Also a good idea, and it comes along for the ride in Numeric3. Numeric had CHAR arrays but a vision was never specified for how to make them more useful. This change would have been a good step towards heterogeneous arrays.
- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2]) and x = 2*arange(6), x[ind] results in array([8, 8, 0, 4])
For scipy this was implemented on top of Numeric (so it is in Numeric3 too); the multidimensional version still needs to be worked on.
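For the one-dimensional case, the semantics in the example above can be spelled out in plain Python. This is only a sketch of the behaviour, not how Numeric, scipy or numarray implement it, and take_by_index is a hypothetical helper name.

```python
def take_by_index(x, ind):
    """Return the elements of x selected by ind; repeats and any order are allowed."""
    return [x[i] for i in ind]

x = [2 * i for i in range(6)]   # stands in for 2*arange(6)
ind = [4, 4, 0, 2]              # stands in for array([4, 4, 0, 2])
result = take_by_index(x, ind)  # [8, 8, 0, 4]
```

Note that the result has the shape of the index array rather than of x, and that an index may appear more than once.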
- New design interface: We should not forget that numarray has been designed from the ground up with Python Library integration in mind (or at least, this is my impression). So it should have a better chance (if there is any hope) of entering the Standard Library than Numeric.
Numeric has had this in mind for some time. In fact the early Numeric developers were quite instrumental in getting significant changes into Python itself, including Complex Objects, Ellipses, and Extended Slicing. Guido was quite keen on the idea of including Numeric at one point. Our infighting made him lose interest, I think. So claiming this as an advantage of numarray over Numeric is simply inaccurate.
The real problem for Numarray: Object Creation Time
===================================================
On the other hand, the main drawback of Numarray vs Numeric is, in my opinion, its poor performance regarding object creation. This might look like a trivial thing at first glance, but in many cases it is not. One example recently reported on this list is:
Ah, and there's the rub. I don't think this object creation time will go away until Numarray's infrastructure becomes essentially like that of Numeric: one tight object, all in C. Getting it there seems harder than extending Numeric with the additional features of Numarray.
Thanks for these comments. It is very good to hear what the most important features for users are.
-Travis