On 2/13/07, Mark Janikas <mjanikas@esri.com> wrote:
Good call Stefan,

I decoupled the timing from the application (duh!) and got better results:

from numpy import *
import numpy.random as RAND
import time as TIME

x = RAND.random(1000)
xl = x.tolist()

# Time writing the values as a space-separated text string.
t1 = TIME.clock()
xStringOut = " ".join([str(i) for i in xl])
f = file('blah.dat', 'w')
f.write(xStringOut)
f.close()
t2 = TIME.clock()
total = t2 - t1

# Time writing the raw binary representation of the array.
t1 = TIME.clock()
xBinaryOut = x.tostring()
f = file('blah.bwt', 'wb')
f.write(xBinaryOut)
f.close()
t2 = TIME.clock()
total1 = t2 - t1

>>> total
0.00661
>>> total1
0.00229

Writing x directly via its string representation took far longer: f.write(str(x)) clocked in at 0.0258.
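For comparison, numpy's ndarray.tofile can write either text or binary output without building the whole intermediate string in Python. A minimal sketch, with placeholder filenames:

import numpy.random as RAND

x = RAND.random(1000)

# Text output: with a separator, tofile writes "%s"-formatted values.
f = file('blah_text.dat', 'w')
x.tofile(f, sep=' ')
f.close()

# Binary output: with no separator, tofile dumps the raw bytes,
# equivalent to f.write(x.tostring()).
f = file('blah_binary.dat', 'wb')
x.tofile(f)
f.close()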

The problem, therefore, must be in the way I am appending values to the empty arrays.  I am currently using the append method:

myArray = append(myArray, newValue)

Or would it be faster to use concatenate, or to append to a list and then convert?

I am going to guess that a list would be faster for appending. concatenate and, I suspect, append create a new array on each call, rather like string concatenation in Python, so growing an array element by element does quadratic work. A list, on the other hand, is optimized for appending new values. Another option might be PyTables with extensible arrays. In any case, a bit of timing, like the sketch below, should show the way if performance is that crucial to your application.
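A minimal sketch of that timing comparison (names and sizes here are illustrative; absolute numbers will vary by machine):

import numpy
import time

N = 10000

# Growing a numpy array one element at a time: each append
# allocates a fresh array and copies the old contents.
t1 = time.clock()
a = numpy.array([])
for i in xrange(N):
    a = numpy.append(a, float(i))
t2 = time.clock()
print 'numpy.append loop:  ', t2 - t1

# Appending to a Python list is amortized constant time per
# element; convert to an array once at the end.
t1 = time.clock()
lst = []
for i in xrange(N):
    lst.append(float(i))
b = numpy.array(lst)
t2 = time.clock()
print 'list append + array:', t2 - t1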

Chuck