[PYTHON MATRIX-SIG] Efficient data reading
James Hugunin
jjh@Goldilocks.LCS.MIT.EDU
Wed, 6 Mar 96 08:29:41 EST
From: Earl Spillar <me@galaxies.plk.af.mil>
I'm going to implement some astronomical image processing stuff
using the Matrix extensions. I think it's a perfect application for
Python: I need to perform arithmetic on 2 MB numerical arrays quickly and
efficiently, while dealing with some simple variables associated with
each array. I can post more details if there is interest.
BTW - has anybody heard anything more about the image module discussed
a while back on this list? It sounds like it might be very helpful
here.
Anyway, the question is how to most efficiently read in data already
written to disk in a standard format (FITS). I know how to parse
the headers; it's the 2 MB data arrays I'm worried about. I'm thinking about
reading the data using the array module, and feeding the result of that to the
Numerical module. Is that reasonably efficient? It seems like I'll have 2
copies floating around for a moment. Is there another mechanism
in the Numerical module that I'm missing?
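The array-module route Earl describes can be sketched in modern Python. This assumes NumPy stands in for the 1996 Numerical module and that the data are 32-bit big-endian floats (FITS BITPIX = -32); the helper name `read_via_array_module` is hypothetical. `np.array(buf)` makes the second copy he is worried about, while `np.frombuffer(buf, ...)` wraps the `array.array` storage without copying.

```python
import array
import sys

import numpy as np


def read_via_array_module(fp, n_items, copy=True):
    """Hypothetical helper: read n_items 32-bit floats using the array module."""
    buf = array.array("f")           # machine-native 4-byte C floats
    buf.fromfile(fp, n_items)        # first copy: the array.array storage
    if sys.byteorder == "little":
        buf.byteswap()               # FITS data are big-endian; convert to native order
    if copy:
        return np.array(buf)         # second copy: NumPy allocates its own buffer
    # No second copy: the NumPy array shares buf's memory.
    return np.frombuffer(buf, dtype=np.float32)
```

While a `frombuffer` view exists, the underlying `array.array` cannot be resized, so that variant suits a read-once workflow.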
I explicitly designed the numeric module not to access file objects
(well, there are a couple of hidden methods lying around, but those
will be going away very soon). This makes safe code (things like
Grail) much easier to reason about. It should also strongly encourage
people to use pickling as their "native" format for storing arrays,
which I think is almost always the best choice. Also, in every case
I've had to deal with, reading the data into a string and then converting
the string to an array works just fine, i.e.:
data = Numeric.fromString(fp.read(n_bytes), type_of_data)
It is true that this temporarily requires twice as much memory as the data
"should" require, and it memcpys all of that data once. However,
compared to the time required to read 2 MB of data off even a
fast local disk, the memory copies and allocations are negligible.
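For readers on current NumPy, two rough equivalents are sketched below; they are assumptions on my part, not Numeric's API. `read_block` mirrors the `fromString` line: in NumPy, `np.fromstring` is deprecated for binary data and `np.frombuffer` replaces it, returning a read-only view of the bytes, so the `astype` call is where the single copy happens. `read_direct` uses `np.fromfile`, which, unlike the 1996 module, reads straight from a file on disk and skips the temporary string altogether (it needs a real file, not `io.BytesIO`). Both helper names are hypothetical, and the default dtype `">f4"` assumes big-endian 32-bit floats as in a FITS image with BITPIX = -32.

```python
import numpy as np


def read_block(fp, n_bytes, dtype=">f4"):
    """Hypothetical helper: read n_bytes from fp and convert them to an array."""
    raw = fp.read(n_bytes)                   # one bytes object holding the file data
    if len(raw) != n_bytes:
        raise EOFError(f"expected {n_bytes} bytes, got {len(raw)}")
    view = np.frombuffer(raw, dtype=dtype)   # read-only view of raw, no copy yet
    # astype copies into a writable array in native byte order.
    return view.astype(view.dtype.newbyteorder("="))


def read_direct(path, header_bytes, n_items, dtype=">f4"):
    """Hypothetical helper: read n_items values located after header_bytes."""
    # np.fromfile fills the array straight from disk; no intermediate bytes object.
    data = np.fromfile(path, dtype=dtype, count=n_items, offset=header_bytes)
    if data.size != n_items:
        raise EOFError(f"expected {n_items} items, got {data.size}")
    return data
```

A FITS header occupies a multiple of 2880 bytes, so `header_bytes` would be that size, and the flat result can be reshaped with `arr.reshape(naxis2, naxis1)` because NAXIS1 is the fastest-varying axis.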
Almost anything that you do with arrays (like a = a + 2) will
temporarily require this double memory.
(Note to experts: this can actually be written as add(a, 2, a), which
stores the result back into a and so needs no extra memory. But if you
want to write a lot of code like this, you should probably be working
in C anyway.)
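The note about `add(a, 2, a)` carries over to NumPy, where the third positional argument of a ufunc is the output array. The sketch below, assuming NumPy in place of Numeric, contrasts the form that allocates a temporary with the in-place forms; `np.shares_memory` is used only to show which result has its own storage.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
original = a

b = a + 2            # allocates a new array for the result (the "double memory" case)
np.add(a, 2, a)      # third argument is the output: the sum is written back into a
a += 2               # augmented assignment is also performed in place

print(np.shares_memory(b, original))  # False: b has its own buffer
print(a is original)                  # True: a was never rebound to a new array
print(a.tolist())                     # [4.0, 5.0, 6.0, 7.0, 8.0]
```

`a += 2` is usually the more readable spelling; the explicit `out` argument matters when the destination is a different, preallocated array.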
-Jim
=================
MATRIX-SIG - SIG on Matrix Math for Python
send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================