[Numpy-discussion] "fromstring()" in place?

Chris Barker cbarker at jps.net
Thu Nov 2 16:46:05 EST 2000


I have a large array of type "1" (single bytes). I need to convert it to
Int32, in the manner that fromstring() would. Right now, I am doing:

Array = fromstring(Array.tostring(),Int32)

This works fine, but I need to do this on potentially HUGE arrays. If I
understand this right, tostring() creates one copy and fromstring()
creates another, which then gets bound to Array. At that point the
original array is de-referenced and should be deleted, and the temporary
string gets deleted somewhere along the way. I don't know when objects
created in the middle of a statement get deleted, so I will have at
least two copies of the data around at the same time, and potentially
three. Since the underlying C array is exactly the same, I'd like to do
this without making any copies at all. It seems like it should be a
simple matter of changing the typecode and shape, but is this possible?
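
For illustration only, here is a minimal sketch of the "change the typecode without copying" idea. It assumes the modern numpy package, which postdates this message and is not the Numeric API used in the rest of this post; the array contents are made up.

```python
import numpy as np

# Stand-in for the large single-byte array (8 bytes = two 32-bit integers).
raw = np.arange(8, dtype=np.uint8)

# Reinterpret the same buffer as 32-bit integers; no data is copied.
as_int32 = raw.view(np.int32)

# True when both arrays are backed by the same memory.
shares_memory = np.shares_memory(raw, as_int32)
```

Because as_int32 is a view, writing to raw also changes what as_int32 shows. The byte array has to be contiguous and its length a multiple of 4 for this reinterpretation to work.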

While I'm asking questions: can I byteswap in place as well?
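
As a hedged sketch of what an in-place byteswap could look like, again assuming modern numpy rather than the Numeric module used in this post:

```python
import numpy as np

values = np.array([1, 256], dtype=np.int32)

# Reverse the byte order of every element inside the existing buffer.
# With inplace=True no second array of the data is allocated.
values.byteswap(inplace=True)

swapped = values.tolist()
```

Here 1 (0x00000001) becomes 0x01000000 and 256 (0x00000100) becomes 0x00010000, whatever the byte order of the machine.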


The greater problem:

To give a little background, and to see if anyone has a better idea of
how to do what I am doing, I thought I'd run through the task that I
really need to do.

I am reading a binary file full of a lot of data. I have some control
over the form of the file, but it needs to be compact, so I can't just
make everything the same large type. The file is essentially a whole
bunch of records, each of which is a collection of a couple of different
types, and which I would eventually like to get into a couple of NumPy
arrays. My first cut at the problem was to read each record one at a
time in a loop, and use the struct module to convert everything. This
worked fine, but was pretty darn slow, so I am now doing it all with
NumPy like this (one example, I have more complex ones):


num_bytes = 9 # number of bytes in a record: two longs and a char

# read all the data into a single byte array
data = fromstring(file.read(num_bytes*num_timesteps*num_LEs),'1')

# rearrange into 3-d array
data.shape = (num_timesteps,num_LEs,num_bytes)

# extract LE data:
LEs = data[:,:,:8]
# extract flag data
flags = data[:,:,8]

# convert LE data to longs
LEs = fromstring(LEs.tostring(),Int32)

if endian == 'L': # byteswap if required
	LEs = LEs.byteswapped()

# convert to 3-d array
LEs.shape = (num_timesteps,num_LEs,2)


Anyone have any better ideas on how to do this?
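
One possible alternative, sketched under assumptions: modern numpy (not the Numeric module used above) can describe the whole 9-byte record as a structured dtype, so the buffer is read once and the fields are pulled out as views. The field names "le" and "flag", the big-endian byte order, and the sample data standing in for file.read() are all hypothetical.

```python
import numpy as np

# Hypothetical record layout: two 32-bit integers followed by one flag byte.
# '>' (big-endian) is an assumption; use '<' for little-endian files.
record = np.dtype([("le", ">i4", (2,)), ("flag", "i1")])

num_timesteps, num_LEs = 2, 3

# Build sample bytes in place of file.read(...).
sample = np.zeros((num_timesteps, num_LEs), dtype=record)
sample["le"][..., 0] = np.arange(6).reshape(num_timesteps, num_LEs)
sample["le"][..., 1] = 100
sample["flag"] = 7
raw_bytes = sample.tobytes()

# Interpret the bytes as records, then reshape to (timesteps, LEs).
data = np.frombuffer(raw_bytes, dtype=record).reshape(num_timesteps, num_LEs)

LEs = data["le"]      # shape (num_timesteps, num_LEs, 2), a view into data
flags = data["flag"]  # shape (num_timesteps, num_LEs), also a view
```

The byte order is part of the dtype, so no separate byteswap step is needed to read the values correctly. If native-order integers are wanted later, LEs.astype(np.int32) makes one converted copy.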


Thanks,

-Chris


-- 
Christopher Barker,
Ph.D.                                                           
cbarker at jps.net                      ---           ---           ---
http://www.jps.net/cbarker          -----@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Water Resources Engineering       ------   @    ------   @   ------   @
Coastal and Fluvial Hydrodynamics -------      ---------     --------    
------------------------------------------------------------------------
------------------------------------------------------------------------


