[Tutor] Re: Fastest way to write/read numerical data into/from a file?

Andrei project5 at redrival.net
Wed Jun 30 14:02:49 EDT 2004


Karthikesh Raju wrote on Wed, 30 Jun 2004 19:16:40 +0300:

> i have written a module that saves matrices in a file. Basically, a
> matrix as
> A = numarray.array([1,2]) will be saved as 1 2 in the file, and
> A = numarray.array([[1,2],[3,4]]) as
> 1 2
> 3 4

I've never used numarray, so I'll just assume there's no built-in method
which does this for you :).
 
> i.e each row is a line and there are spaces between each column.
> Presently, when writing i do the following:
> 
> try:
>    i,j = data.shape
>    for ii in range(0,i):
>        for jj in range(0,j):
>               value = "%s" %data[ii,jj]
>               file.write(value)
>               file.write(' ')

Don't use "file" as a variable name - it shadows the built-in file type.

It would be better to write value = "%s " and drop the write(' '). It
probably also wouldn't hurt to skip the value assignment and do the
formatting inside the write() method. Otherwise the solution is OK and
doesn't seem wasteful in any way. Any idea what the bottleneck is? I/O, the
loops or data lookup? If it's the first or the last, I don't see how you
can save any time. For the second, you could put it in a list
comprehension, but I'm not sure how much of a speed boost that would
provide:

for ii in range(0, i):
    [afile.write("%s " % data[ii, jj]) for jj in range(0, j)]
    afile.write("\n")

You could also try iterating directly over the array if numarray supports
it - again, not sure how much of a boost it would give:

for row in data:
    [afile.write("%s " % row[jj]) for jj in range(j)]
    afile.write("\n")

Or you could go really to the extreme and try this (I've split the list
comprehension up over a couple of lines in an attempt to make it slightly
more readable - didn't really work :) ):

i, j = data.shape
ilist = range(i)
jlist = range(j)
afile.write("\n".join([ 
                       " ".join([ str(data[ii, jj]) 
                                  for jj in jlist ]) 
                       for ii in ilist
                      ]))

This might consume a lot of memory though - no idea how large your data is.
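
If memory is a concern, a middle ground is to join one row at a time, so
only a single line of text is ever held in memory. Below is a rough sketch
of that idea in current Python 3 syntax. The function names and the test
data are made up for illustration, and a plain nested list stands in for
the array, on the assumption that rows can be iterated over directly. The
timing loop writes to an in-memory buffer, so it only compares the loop and
formatting cost, not disk I/O:

```python
import io
import timeit


def write_per_element(data, out):
    """Original approach: one write() call per matrix element."""
    for row in data:
        for value in row:
            out.write("%s " % value)
        out.write("\n")


def write_per_row(data, out):
    """Middle ground: build one line per row, so only a single
    row's text is held in memory at any time."""
    for row in data:
        out.write(" ".join([str(value) for value in row]))
        out.write("\n")


# Hypothetical test data: a nested list standing in for the array.
data = [[ii * 100 + jj for jj in range(100)] for ii in range(100)]

for func in (write_per_element, write_per_row):
    # An in-memory buffer keeps disk I/O out of the measurement.
    seconds = timeit.timeit(lambda: func(data, io.StringIO()), number=20)
    print("%-18s %.4f s" % (func.__name__, seconds))
```

Note that the per-row version leaves no trailing space at the end of each
line, which line.split() won't care about when reading the file back.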

> While reading, i do the following:
> 
> while(line):

Use "for line in myfile:" instead of the while loop combined with an
explicit readline. 

>     temp = []
>     list = string.split(line)

Don't use the string module functions; string methods have replaced them.
Use line.split() instead, and float(e) rather than string.atof(e).

>     for e in list: temp.append(string.atof(e))
>     x = numarray.concatenate(x,numarray.array(temp))
>     line = open1.readline()
> open1.close()


How about this:

nc = numarray.concatenate
na = numarray.array
for line in file('location.txt', 'r'):
    x = nc(x, na([ float(nr) for nr in line.split() ]))

If numarray.array can handle more complex (nested) lists, you could also
try to build up the entire list in-memory first (the opposite of what I
propose for writing) and then convert the whole of it to array at once.
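
A minimal sketch of that in-memory approach, in current Python 3 syntax.
The read_matrix name is made up for illustration, and the final conversion
is left as a comment because it rests on the assumption that the array
constructor accepts nested lists:

```python
import io


def read_matrix(lines):
    """Parse whitespace-separated numbers, one matrix row per line."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(field) for field in fields])
    return rows


# Any iterable of lines works, e.g. an open file; StringIO stands in here.
sample = io.StringIO("1 2\n3 4\n")
rows = read_matrix(sample)
print(rows)  # [[1.0, 2.0], [3.0, 4.0]]

# A single conversion would then follow, e.g. numarray.array(rows),
# assuming the constructor accepts a nested list.
```

This replaces one concatenate call per line with a single conversion at the
end, at the cost of holding the whole nested list in memory.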

-- 
Yours,

Andrei




