Re: Re-implementation of Python Numerical arrays (Numeric) available for download

Perry Greenfield wrote:
This sounds great! The code-generating machinery sounds very promising, and examples are, of course, key. I found digging through the NumPy source to figure out how to do things very treacherous. Making Ufuncs easy to write will encourage a lot more C Ufuncs to be written, which should help performance.
The wheels I'm talking about are multi-dimensional array objects...
I know; I just think using an existing set of C++ classes for multidimensional arrays of multiple types would make sense, although I imagine it is too late now!
If the issue is why we are redoing Numeric:
Actually, I think I had a pretty good idea why you were working on this.
I'm particularly excited about 1) and 4).
I used poor wording. When I wrote "datatypes", I meant data types in a much higher-order sense; perhaps structures or classes would be a better term. What I mean is that it should be easy to use and manipulate the same multidimensional arrays from both Python and C/C++. In the current Numeric, most folks generate a contiguous array, and then just use the array->data pointer to get what is essentially a C array. That's fine if you are using it in a traditional C way, with fixed dimensions, one datatype, etc. What I'm imagining is having an object in C or C++ that could be easily used as a multidimensional array. I'm thinking C++ would probably be necessary, and probably templates as well, which is why blitz++ looked promising. Of course, blitz++ only compiles with a few up-to-date compilers, so you'd never get it into the standard library that way! This could also lead the way to being able to compile NumPy code....<end fantasy>
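To make that concrete, here is a minimal sketch of the current pattern (spelled with today's NumPy ascontiguousarray purely for illustration; the C fragment in the comments is schematic):

    import numpy as np

    a = np.arange(12.0).reshape(3, 4)   # a 2-D array on the Python side

    # Guarantee one flat, contiguous buffer, so the raw data pointer
    # (Numeric's array->data) can be treated as a plain C array:
    b = np.ascontiguousarray(a)

    # On the C side the extension then does, schematically,
    #     double *data = (double *)array->data;
    # and has to carry shape and strides around by hand -- a flat,
    # single-typed C array, not a real multidimensional object.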
I think it is pretty easy to install, since it uses distutils.
I agree, but from the newsgroup, it is clear that a lot of folks are very reluctant to use something that is not part of the standard library.
Being as fast as the current Numeric would be "good enough" for me. It would be a shame to go backwards in performance!
(IDL does much better than that for example).
My personal benchmark is MATLAB, which I imagine is similar to IDL in performance.
Well, sure, I'm not expecting that.
100, maybe, but that will be very hard. 1000 should be possible with some work.
I suppose MATLAB has it easier, as all arrays are doubles and (until recently, anyway) all variables were arrays, and all arrays were 2-d. NumPy is a lot more flexible than that. Is it the type and size checking that takes the time?
You are probably right about that.
I do that when possible, but it's not always possible.
One of the things I work with a lot is coordinates of points and polygons. Sets of points I can handle easily as an NX2 array, but polygons don't work so well, as each polygon has a different number of points, so I use a list of arrays, which I have to loop over. Each polygon can have from about 10 to thousands of points (mostly 10-20, however). One way I have dealt with this is to store a polygon set as one large array of all the points, plus another array with the indices of the start and end of each polygon (see the sketch below). That way I can transform the coordinates of all the polygons in one operation. It works OK, but sometimes it is more useful to have them in a sequence.
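Here is a minimal sketch of that scheme (again in today's NumPy spelling; the coordinates and the scale factor are made up for illustration):

    import numpy as np

    # All the polygons' vertices concatenated into one NX2 array:
    points = np.array([[0., 0.], [1., 0.], [1., 1.],             # polygon 0
                       [2., 0.], [3., 0.], [3., 2.], [2., 2.]])  # polygon 1

    # Where each polygon starts; polygon i is points[starts[i]:starts[i+1]]:
    starts = [0, 3, 7]

    # One vectorized operation transforms every polygon at once:
    points_m = points * 1852.0   # made-up scale, e.g. nautical miles -> meters

    # When a sequence is more useful, slice the flat array back apart:
    polygons = [points_m[starts[i]:starts[i + 1]]
                for i in range(len(starts) - 1)]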
I know large datasets were one of your driving factors, but I really don't want to make performance on smaller datasets secondary. I hope I'll get a chance to play with it soon....

-Chris

--
Christopher Barker, Ph.D.
ChrisHBarker@home.net
http://members.home.net/barkerlohmann

Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics