[Tutor] Handling large arrays/lists
duncan at thermal.esa.int
Mon Oct 9 12:58:47 CEST 2006
One part of the application I'm dealing with handles a conceptual
'cube' of data, but is in fact implemented as a single Python
list. Items are stored and retrieved (no manipulation at the
moment) by the classical approach of scaling the x, y and z
indices by the appropriate extents and summing the results to
give the unique index into the single list.
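
For concreteness, the indexing scheme described above might look
something like the sketch below. The name flat_index, the extents
and the choice of x as the fastest-varying index are assumptions
made for illustration only; the actual tool may order things
differently.

```python
def flat_index(x, y, z, nx, ny):
    """Map (x, y, z) to a position in a flat list.

    Assumed layout: x varies fastest, then y, then z, so each
    x,y slice occupies one contiguous run of nx * ny items.
    """
    return x + nx * (y + ny * z)


# Hypothetical extents, chosen only for illustration.
nx, ny, nz = 4, 3, 2

# The whole 'cube' lives in one ordinary Python list.
cube = [None] * (nx * ny * nz)

# Store and retrieve an item through the computed index.
cube[flat_index(1, 2, 1, nx, ny)] = "value"
item = cube[flat_index(1, 2, 1, nx, ny)]
```

With this layout, adding a new x,y slice only ever appends
nx * ny items to the end of the list.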
However, now that requirements have moved to include much larger
cubes of data than originally envisaged, this single list has
just become far too slow.
Unfortunately the single list concept is currently a requirement
in the tool, but how it is implemented is open to discussion.
I've been mulling over the documentation for Numeric/numpy to
see whether it makes sense to replace the single standard Python
list with an array or multiarray. The first thing to strike me
is that, in Numeric, the size of an array is fixed. To grow the
'cube' as each x,y slice is added, I can create a new array in
one of three ways, but as far as I can see each one requires
copying all of the data from the old array to the new one. I'm
concerned that any speed benefit gained from replacing a standard
list will be lost to repeated copying.
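
The copying concern can be made concrete with a small sketch. It
uses numpy rather than Numeric, which is an assumption on top of
the discussion above, and make_slice is a hypothetical stand-in
for real slice data. The alternatives shown are common ways to
limit the copying and are not necessarily the three ways referred
to above.

```python
import numpy as np

# Hypothetical extents of one x,y slice and the number of slices.
nx, ny, nz = 4, 3, 5


def make_slice(z):
    # Stand-in for one x,y slice of real data (illustration only).
    return np.full(nx * ny, float(z))


# Growing step by step: every concatenate call allocates a new
# array and copies everything accumulated so far into it.
grown = np.empty(0)
for z in range(nz):
    grown = np.concatenate((grown, make_slice(z)))

# Alternative: collect the slices in a plain list and join them
# once at the end, so the accumulated data is copied only once.
slices = [make_slice(z) for z in range(nz)]
joined = np.concatenate(slices)

# Alternative when the final extent is known in advance:
# preallocate the full array and fill each slice in place.
prealloc = np.empty(nx * ny * nz)
for z in range(nz):
    start = z * nx * ny
    prealloc[start:start + nx * ny] = make_slice(z)
```

All three arrays end up with the same contents; the difference is
in how often the existing data has to be copied along the way.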
Have I correctly understood the Numeric array handling?
Does anyone have any suggestions for a more efficient way of
handling a large list of data? Other modules perhaps?
And yes, I know that Numeric has been replaced by numpy, but I
understand that they are very similar, and it's been easier to
find tutorial documentation for Numeric than for numpy.