[Numpy-discussion] Timing array construction

Mark Janikas mjanikas at esri.com
Thu Apr 30 18:36:12 EDT 2009


Thanks Chris and Bruce for the further input.  I kind of like the "c_" method because it is still relatively speedy and easy to implement.  But the empty method seems to be closest to what is actually done no matter which direction you go in, i.e. preallocate space and insert.  I am in the process of ripping all of my zip calls out.  The profile of my first set of techniques is already significantly better.  This whole exercise has been very enlightening: I spend so much time working on speeding up my algorithms, and simple things like this should be tackled first.  Thanks again!

MJ 

-----Original Message-----
From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher Barker
Sent: Thursday, April 30, 2009 12:16 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Timing array construction

Mark Janikas wrote:
> I have a lot of array constructions in my code that use
> NUM.array([list of values])... I am going to replace it with the
> empty allocation and insertion.

It may not be worth it, depending on where list_of_values comes from.
A rule of thumb may be: it's going to be slow going from a numpy array
to a regular old python list or tuple and back to a numpy array. If your
data is a python list already, then np.array(list) is a fine choice.
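
To make that rule of thumb concrete, here is a minimal sketch. The variable names and sample values are made up for illustration and are not from the original code.

```python
import numpy as np

# Case 1: the data starts life as a plain Python list.
# np.array(list) is a reasonable choice here.
values = [1.5, 2.5, 3.5]
from_list = np.array(values)

# Case 2: the data is already in a numpy array.
# Going array -> list -> array works, but it takes a detour
# through individual Python objects.
x = np.array([0.0, 1.0, 2.0])
round_trip = np.array(x.tolist())

# Staying in numpy avoids that detour.
direct = x.copy()
```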


>> def useAsArray(xCoords, yCoords):
>>
>>     return NUM.asarray(zip(xCoords, yCoords))

Here are some of the issues with this one:

zip unpacks two generic python sequences, puts each pair of items into a
tuple, and then puts those tuples in a list. Essentially this:

new_list = []
for i in range(len(xCoords)):
    new_list.append((xCoords[i], yCoords[i]))


In each iteration of that loop, it's indexing into the numpy arrays,
making a python object out of each element, putting those objects into a
tuple, and appending that tuple to the list, which may have to
re-allocate memory a few times.

Then the np.array() call loops through that list, unpacks each tuple,
examines each python object, decides what it is, and turns it into a raw
c-type to put into the array.
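
The two stages described above can be traced by hand. This is only a sketch with made-up sample arrays (x_coords and y_coords are illustrative names). Note that the thread predates Python 3: there zip returns a lazy iterator, so it has to be wrapped in list() before being handed to numpy.

```python
import numpy as np

x_coords = np.array([0.0, 1.0, 2.0])
y_coords = np.array([10.0, 11.0, 12.0])

# Stage 1: zip pulls elements out of both arrays one at a time.
# On Python 3, list() is needed to materialize the pairs.
pairs = list(zip(x_coords, y_coords))

# Each pair is a Python tuple holding two numpy scalar objects.
first = pairs[0]
print(type(first), type(first[0]))

# Stage 2: np.array walks the list, inspects each object, and copies
# the values into a freshly allocated (n, 2) array.
result = np.array(pairs)
print(result.shape)
```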

whereas:

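import numpy as np

def useEmpty(xCoords, yCoords):
    # Preallocate the (n, 2) result; xCoords must be a numpy array (it needs .dtype)
    out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
    out[:, 0] = xCoords  # copy the x values into column 0
    out[:, 1] = yCoords  # copy the y values into column 1
    return out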

allocates an array of the right size, and
directly copies the data from xCoords and yCoords into it.

That's it.

You can see why it's so much faster!
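
For anyone who wants to measure the difference on their own machine, here is a rough timing sketch using the standard timeit module. The function names (use_zip, use_empty, use_c), the array size, and the repeat count are arbitrary choices for illustration. use_c is one reading of the "c_" method mentioned in this thread. No timings are quoted here because they vary by machine and numpy version.

```python
import timeit
import numpy as np

def use_zip(x, y):
    # list() is needed on Python 3, where zip returns an iterator
    return np.asarray(list(zip(x, y)))

def use_empty(x, y):
    # Preallocate, then copy each input into its column
    out = np.empty((len(x), 2), dtype=x.dtype)
    out[:, 0] = x
    out[:, 1] = y
    return out

def use_c(x, y):
    # np.c_ stacks 1-D arrays as columns
    return np.c_[x, y]

n = 10_000  # arbitrary size chosen for illustration
x = np.random.rand(n)
y = np.random.rand(n)

# All three approaches should build the same (n, 2) array.
assert np.array_equal(use_zip(x, y), use_empty(x, y))
assert np.array_equal(use_c(x, y), use_empty(x, y))

for func in (use_zip, use_empty, use_c):
    seconds = timeit.timeit(lambda: func(x, y), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 calls")
```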

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov