[PYTHON MATRIX-SIG] Efficient data reading

James Hugunin jjh@Goldilocks.LCS.MIT.EDU
Wed, 6 Mar 96 08:29:41 EST


   From: Earl Spillar <me@galaxies.plk.af.mil>

   I'm going to implement some astronomical image processing stuff 
   using the Matrix extensions.  I think it's a perfect application for 
   Python - I need to perform arithmetic on 2MB numerical arrays quickly and 
   efficiently, while dealing with some simple variables associated with
   each array.  I could post more details if there is interest.

BTW - has anybody heard anything more about the image module discussed
a while back on this list?  It sounds like it might be very helpful
here.

   Anyway, the question is how to most efficiently read in data already
   written to disk in a standard format (FITS).  I know how to parse
   the headers; it's the 2MB data arrays I'm worried about.  I'm thinking about 
   reading the data using the array module, and feeding the result of that to the
   Numerical module.  Is that reasonably efficient?  It seems like I'll have 2
   copies floating around for a moment. Is there another mechanism
   in the Numerical module that I'm missing?  

I explicitly designed the Numeric module not to access file objects
(well, there are a couple of hidden methods lying around, but those
will be going away very soon).  This makes safe code (things like
Grail) much easier to reason about.  It should also strongly encourage
people to use pickling as their "native" format for storing arrays,
which I think is almost always the best choice.  Also, in every case
I've had to deal with, reading the data into a string and then converting
the string to an array works just fine, i.e.:

data = Numeric.fromString(fp.read(n_bytes), type_of_data)

It is true that this temporarily requires twice as much memory as the data
"should" require, and it memcpy's all of that data once.  However, compared
with the time required to read 2MB of data off even a fast local disk,
the memory copies and allocations are negligible.
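For readers without Numeric at hand, the same read-into-a-string-then-convert pattern can be sketched with only the standard library's array module (the route Earl mentions). This is a minimal sketch, not Jim's code: the name read_block is hypothetical, and it assumes the header has already been parsed so the file sits at the start of the data, that the pixels are 16-bit signed integers (array typecode 'h', which is 2 bytes on common platforms), and that they are stored big-endian, as FITS requires. The two commented copies are the temporary doubling described above.

```python
import array
import io
import struct
import sys


def read_block(fp, n_items, typecode="h", big_endian=True):
    """Read n_items values of array typecode `typecode` from file object fp.

    Hypothetical helper. Assumes fp is already positioned at the start of
    the data block (i.e. the header has been parsed and skipped).
    """
    data = array.array(typecode)
    n_bytes = n_items * data.itemsize
    raw = fp.read(n_bytes)  # copy 1: the raw bytes read from the file
    if len(raw) != n_bytes:
        raise EOFError(f"expected {n_bytes} bytes, got {len(raw)}")
    data.frombytes(raw)  # copy 2: the typed array; raw can be freed on return
    if big_endian and sys.byteorder == "little":
        data.byteswap()  # big-endian file data -> native order, in place
    return data


if __name__ == "__main__":
    # Four big-endian 16-bit integers standing in for a FITS data block.
    fake_file = io.BytesIO(struct.pack(">4h", 1, -2, 300, 7))
    print(read_block(fake_file, 4).tolist())  # [1, -2, 300, 7]
```

The byteswap() call works in place, so it does not add a third copy. In NumPy, Numeric's successor, np.frombuffer plays the role of the string-to-array conversion.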

Almost anything that you do with arrays (like a = a + 2) will
temporarily require this double memory, since the result is built in a
new array before the old one is released.

(Note to experts: this can actually be written as add(a, 2, a), in
which case you don't need that extra memory; but if you want to write
a lot of code like this, you should probably be working in C
anyway.)
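The difference between a = a + 2 and the three-argument add(a, 2, a) is where the result lands. The sketch below mimics that calling convention in pure Python on the standard array module; the add function here is a hypothetical stand-in written for illustration, not Numeric's compiled add, and it is far slower. It only shows that passing an output array avoids allocating a second one.

```python
import array


def add(a, b, out=None):
    """Hypothetical stand-in for elementwise a + b, where b is a scalar.

    If out is given, the sums are written into it and no new array is
    allocated; out may be a itself.
    """
    if out is None:
        # No output given: allocate a fresh result array (the "double memory" case).
        out = array.array(a.typecode, [0]) * len(a)
    for i, x in enumerate(a):
        out[i] = x + b
    return out


if __name__ == "__main__":
    a = array.array("d", [1.0, 2.0, 3.0])
    b = add(a, 2)   # like b = a + 2: new array, a unchanged
    add(a, 2, a)    # like add(a, 2, a): a overwritten in place
    print(b.tolist(), a.tolist())  # [3.0, 4.0, 5.0] [3.0, 4.0, 5.0]
```

In present-day NumPy the same idea is spelled np.add(a, 2, out=a), or simply a += 2.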

-Jim

=================
MATRIX-SIG  - SIG on Matrix Math for Python

send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================