[IronPython] RE: Numpy port
Paul Barrett
pebarrett at gmail.com
Wed Aug 31 14:59:45 CEST 2005
On 8/31/05, Jeffrey Sax <jeffrey at extremeoptimization.com> wrote:
>
> I had a look at this a few months ago. Both numarray and Numeric use an
> amorphous char array for storage. A simple wrapper would mean a lot of
> unmanaged memory moving around... not a desirable situation.
True.
> It seems to me that a rewrite using generics is most appropriate. The CLR
> generics should take a lot of the hard work out of the code, since the
> current C code spends a lot of time bookkeeping and converting to and from
> the element type. There are some special cases currently handled by
> numarray, like misaligned data and byte-swapped data. IMO these should be
> handled at I/O time, if possible.
No. The conversion of misaligned and byte-swapped data should be done as in
numarray, i.e. just before and/or after the operation in a pipelined
sequence. Otherwise, the array module would not be usable with
memory-mapping, where the data may be stored misaligned and/or
byte-swapped. A requirement for numarray is to handle large (>2 GB)
arrays or images, which are often best handled using memory-mapped files and
are becoming more common in the physical sciences. The primary reason for
developing numarray was that its precursor, Numeric, did not handle this
case. Much astronomical data is stored as a one-dimensional array of
records or structures; hence the misaligned data. These files are also shared
between little- and big-endian machines; hence the byte-swapping.
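The pipelined approach can be sketched in Python. This is only an illustration of the idea, not numarray's actual API: each chunk is byte-swapped (decoded from its big-endian on-disk layout) immediately before the arithmetic, so no separate whole-array conversion pass or large temporary is needed. All names and the chunk size are made up for the example.

```python
import struct

# Hypothetical on-disk buffer: big-endian 32-bit integers, as they might
# appear inside a memory-mapped record file.
raw = struct.pack(">5i", 1, 2, 3, 4, 5)

def add_scalar_pipelined(buf, scalar, chunk_items=2):
    """Byte-swap each chunk just before operating on it, rather than
    converting the whole buffer in a separate I/O-time pass."""
    itemsize = 4
    n = len(buf) // itemsize
    out = []
    for start in range(0, n, chunk_items):
        count = min(chunk_items, n - start)
        # Decode (byte-swap) only this chunk, then operate on it.
        chunk = struct.unpack_from(">%di" % count, buf, start * itemsize)
        out.extend(v + scalar for v in chunk)
    return out

print(add_scalar_pipelined(raw, 10))  # [11, 12, 13, 14, 15]
```

A real implementation would do the decode and the arithmetic in C over fixed-size buffers, but the structure is the same: conversion and computation are interleaved per chunk.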
In fact, doing these operations at I/O time decreases performance and
efficiency, since two separate operations are done instead of one
pipelined operation, and a large temporary array must be allocated. During
the design phase of numarray, we ran some tests and found that the best
performance for large arrays occurs when the input and output buffers can
both reside in the L2 memory cache. So for a 256 kB L2 cache, the best
performance is when the input and output buffers are each 128 kB. This implies
that it is faster to divide a 1 MB array into eight 128 kB chunks than to
process it all at once. This is simply a result of the memory system
not keeping up with the processor. When not in use, these optional
operations decrease performance only slightly; when needed, they provide
large performance gains.
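The cache-blocking strategy above can be sketched as follows. This is a toy Python sketch (pure-Python looping will not actually show the cache effect, which only matters in compiled code); the function name, the default chunk size, and the scaling operation are all invented for illustration.

```python
import array

def scale_chunked(data, factor, chunk_bytes=128 * 1024):
    """Process a large array in chunks sized so that the input and output
    buffers together fit in a 256 kB L2 cache (128 kB each)."""
    chunk_items = max(1, chunk_bytes // data.itemsize)
    out = array.array(data.typecode)
    for start in range(0, len(data), chunk_items):
        # Operate on one cache-sized block at a time.
        block = data[start:start + chunk_items]
        out.extend(array.array(data.typecode, (x * factor for x in block)))
    return out

data = array.array("d", range(1000))
result = scale_chunked(data, 2.0)
```

In a C implementation the chunk loop keeps both working buffers hot in L2, which is where the measured speedup for large arrays comes from.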
Trust me on this one. ;-)
The biggest part of numpy is the large library of legacy code (mostly in
> FORTRAN) for which numpy provides interfaces. I don't know enough about
> Python's interop mechanisms to know the best way to port these.
True. So a solution needs to be found for interfacing the C (and FORTRAN) code with C#.
Given the interest in this topic, it appears that I should post a high level
design document for a multidimensional array module. I'll see what I can do
over the next week or so. My current schedule is really full, so if it
doesn't happen you'll know why.
-- Paul