[IronPython] RE: Numpy port

Paul Barrett pebarrett at gmail.com
Wed Aug 31 14:59:45 CEST 2005


On 8/31/05, Jeffrey Sax <jeffrey at extremeoptimization.com> wrote:
> 
> I had a look at this a few months ago. Both numarray and Numeric use an
> amorphous char array for storage. A simple wrapper would mean a lot of
> unmanaged memory moving around... not a desirable situation.


True.

> It seems to me that a rewrite using generics is most appropriate. The CLR
> generics should take a lot of the hard work out of the code, since the
> current C code spends a lot of time bookkeeping and converting to and from
> the element type. There are some special cases currently handled by
> numarray, like misaligned data and byte-swapped data. IMO these should be
> handled at I/O time, if possible.


No. The conversion of misaligned data and byte-swapping should be done as in 
numarray, i.e. just before and/or after the operation in a pipelined 
sequence. Otherwise, the array module would not be useful with 
memory-mapping, where the data may be stored misaligned and/or 
byte-swapped. A requirement for numarray is to handle large (> 2 GB) 
arrays or images, which are often best handled using memory-mapped files 
and are becoming more common in the physical sciences. The primary reason 
for developing numarray was that its precursor, Numeric, did not handle 
this case. Much astronomical data is stored as a one-dimensional array of 
records or structures; hence the misaligned data. These files are also 
shared between little- and big-endian machines; hence the byte-swapping.
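To make the pipelined approach concrete, here is a minimal sketch (my own illustration, not numarray's actual code) using only the standard-library `array` module: data written on an opposite-endian machine is byte-swapped one small chunk at a time, immediately before the arithmetic, rather than converting the whole buffer up front at I/O time.

```python
from array import array

ITEM = array('i').itemsize  # platform size of a C int (typically 4 bytes)

def byteswapped_sum(raw, chunk_elems=2):
    # Sum opposite-endian integers from a raw byte buffer, swapping
    # each chunk just before the add (pipelined), instead of
    # converting the entire buffer in a separate I/O-time pass.
    total = 0
    step = chunk_elems * ITEM
    for off in range(0, len(raw), step):
        chunk = array('i')
        chunk.frombytes(raw[off:off + step])
        chunk.byteswap()          # swap only this small chunk, in place
        total += sum(chunk)
    return total

# Simulate a file written on an opposite-endian machine.
src = array('i', [1, 2, 3, 4, 5, 6])
src.byteswap()
print(byteswapped_sum(src.tobytes()))  # 21
```

The same structure extends to memory-mapped files: each chunk is touched once, swapped, and consumed, with no full-size temporary.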

In fact, doing these operations at I/O time decreases performance and 
efficiency, since two separate operations are done instead of one 
pipelined operation, and a large temporary array must be allocated. During 
the design phase of numarray, we ran some tests and found that the best 
performance for large arrays occurs when the input and output buffers can 
both reside in the L2 memory cache. So for a 256 kB L2 cache, the best 
performance is when the input and output buffers are each 128 kB. This 
implies that it is faster to divide a 1 MB array into eight 128 kB chunks 
than to process it all at once, simply because the memory system cannot 
keep up with the processor. When not in use, these optional operations 
decrease performance only slightly; when needed, they provide large 
performance gains.
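The chunking scheme itself is simple; a toy sketch of the idea (names and sizes are illustrative, not numarray's internals), where blocks are sized so the input and output chunks together fit in L2:

```python
CHUNK_BYTES = 128 * 1024   # half of a 256 kB L2 cache

def chunked_apply(src, dst, op, itemsize=8, chunk_bytes=CHUNK_BYTES):
    # Apply `op` block by block: with 128 kB input and 128 kB output
    # chunks, both buffers stay resident in a 256 kB L2 cache.
    n = max(1, chunk_bytes // itemsize)
    for i in range(0, len(src), n):
        dst[i:i + n] = op(src[i:i + n])
    return dst

data = list(range(8))
out = [0] * 8
chunked_apply(data, out, lambda xs: [2 * x for x in xs], chunk_bytes=16)
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

A 1 MB array of 8-byte elements would be processed here as eight 128 kB blocks, matching the measurement described above.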

Trust me on this one. ;-)

> The biggest part of numpy is the large library of legacy code (mostly in
> FORTRAN) for which numpy provides interfaces. I don't know enough about
> Python's interop mechanisms to know the best way to port these.


True. So, a solution needs to be found to interface the C and C# code.
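For what it's worth, on the CPython side one interop mechanism is ctypes, which loads a shared library and calls its C functions directly; a minimal sketch (illustrative only, and not what the .NET side would use):

```python
import ctypes
import ctypes.util

# Load the C math library and declare sqrt's signature so ctypes
# marshals the double correctly in both directions.
libm = ctypes.CDLL(ctypes.util.find_library('m'))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```

The C# side would need its own binding layer (e.g. P/Invoke) to reach the same FORTRAN/C libraries, so the real question is how to share one set of wrappers between the two runtimes.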

Given the interest in this topic, it appears that I should post a high level 
design document for a multidimensional array module. I'll see what I can do 
over the next week or so. My current schedule is really full, so if it 
doesn't happen you'll know why.

-- Paul