From: Chris Barker email@example.com
To: Perry Greenfield firstname.lastname@example.org, email@example.com
Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download
I used poor wording. When I wrote "datatypes", I meant data types in a much higher-order sense. Perhaps structures or classes would be a better term. What I mean is that it should be easy to use and manipulate the same multidimensional arrays from both Python and C/C++. In the current Numeric, most folks generate a contiguous array, and then just use the array->data pointer to get what is essentially a C array. That's fine if you are using it in a traditional C way, with fixed dimensions, one datatype, etc. What I'm imagining is having an object in C or C++ that could be easily used as a multidimensional array. I'm thinking C++ would probably be necessary, and probably templates as well, which is why blitz++ looked promising. Of course, blitz++ only compiles with a few up-to-date compilers, so you'd never get it into the standard library that way!
Yes, that was an important issue (C++ and the Python Standard Library). And yes, it is not terribly convenient to access multi-dimensional arrays in C (of varying sizes). We don't solve that problem in the way a C++ library could. But I suppose that some might say that C++ libraries may introduce their own, new problems. But coming up with the one solution to all scientific computing appears well beyond our grasp at the moment. If someone does see that solution, let us know!
I agree, but from the newsgroup, it is clear that a lot of folks are very reluctant to use something that is not part of the standard library.
We agree that getting into the standard library is important.
We estimate that numarray is probably another order of magnitude worse, i.e., that 20K element arrays are at half the asymptotic speed. How much should this be improved?
A lot. I use arrays smaller than that most of the time!
What is good enough? As fast as current Numeric?
As fast as current Numeric would be "good enough" for me. It would be a shame to go backwards in performance!
(IDL does much better than that for example).
My personal benchmark is MATLAB, which I imagine is similar to IDL in performance.
We'll see if we can match current performance (or at least present usable alternative approaches that are faster).
10 element arrays will never be close to C speed in any array based language embedded in an interpreted environment.
Well, sure, I'm not expecting that.
100, maybe, but will be very hard. 1000 should be possible with some work.
I suppose MATLAB has it easier, as all arrays are doubles, and (until recently anyway) all variables were arrays, and all arrays were 2-d. NumPy is a lot more flexible than that. Is it the type and size checking that takes the time?
Probably, but we haven't started serious benchmarking yet so I wouldn't put much stock in what I say now.
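To make "half the asymptotic speed" concrete, here is a minimal sketch of the kind of cost model that phrase implies. It assumes, purely for illustration, that one array operation costs a fixed per-call overhead (type and size checking, setup) plus a constant cost per element. The function names and the numbers are hypothetical, chosen only to reproduce the 20K figure quoted above; they are not measurements of Numeric or numarray.

```python
def speed_fraction(n, overhead, per_element):
    """Fraction of the asymptotic speed reached for an n-element array.

    Assumed (illustrative) cost model: time(n) = overhead + n * per_element.
    Throughput is n / time(n), and its large-n limit is 1 / per_element.
    """
    total_time = overhead + n * per_element
    return (n * per_element) / total_time


def half_speed_size(overhead, per_element):
    """Array size at which throughput is half of its asymptotic value.

    Solving speed_fraction(n) == 0.5 gives n = overhead / per_element.
    """
    return overhead / per_element


# Hypothetical units: a per-call overhead worth 20,000 element-operations.
OVERHEAD = 20000.0
PER_ELEMENT = 1.0

for n in (10, 100, 1000, 20000, 1000000):
    print(n, speed_fraction(n, OVERHEAD, PER_ELEMENT))
```

Under this model the half-speed size is just the ratio of fixed overhead to per-element cost, so cutting the per-call overhead by a factor of ten moves the half-speed point from 20K elements to 2K.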
One of the things I work with a lot is coordinates of points and polygons. Sets of points I can handle easily as an Nx2 array, but polygons don't work so well, as each polygon has a different number of points, so I use a list of arrays, which I have to loop over. Each polygon can have from about 10 to thousands of points (mostly 10-20, however). One way I have dealt with this is to store a polygon set as a large array of all the points, and another array with the indexes of the start and end of each polygon. That way I can transform the coordinates of all the polygons in one operation. It works OK, but sometimes it is more useful to have them in a sequence.
This is a good example of an ensemble of variable sized arrays.
As mentioned, we tend to deal with large data sets and so I don't think we have a lot of such examples ourselves.
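The polygon-set layout described above (one large array of all the points, plus arrays of start and end indexes) can be sketched as follows. This is only an illustration: it uses the modern NumPy package rather than the Numeric or numarray modules under discussion, and the polygon data and the variable names (points, starts, ends) are made up for the example.

```python
import numpy as np

# Three hypothetical polygons with different numbers of vertices.
polygons = [
    np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]),
    np.array([[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 3.0]]),
    np.array([[5.0, 5.0], [6.0, 5.0], [5.5, 6.0]]),
]

# Pack all vertices into a single (total_points, 2) array.
points = np.concatenate(polygons)

# points[starts[i]:ends[i]] holds the vertices of polygon i.
lengths = np.array([len(p) for p in polygons])
ends = np.cumsum(lengths)
starts = ends - lengths

# Transform every polygon in one operation: scale by 2, then shift by (10, -1).
transformed = points * 2.0 + np.array([10.0, -1.0])

# Recover a sequence of per-polygon arrays when that form is more convenient.
# Each element is a view into `transformed`, not a copy.
as_sequence = [transformed[s:e] for s, e in zip(starts, ends)]
```

The transformation runs once over the packed array, and the Python-level loop is only needed when the per-polygon sequence form is wanted.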
I know large datasets were one of your driving factors, but I really don't want to make performance on smaller datasets secondary.
-- Christopher Barker,
That's why we are asking, and it seems so far that enough people care about small arrays to justify spending the effort to significantly improve the performance.