[Numpy-discussion] Timing array construction
Mark Janikas
mjanikas at esri.com
Thu Apr 30 18:36:12 EDT 2009
Thanks Chris and Bruce for the further input. I kind of like the "c_" method because it is still relatively speedy and easy to implement. But the empty method seems to be closest to what is actually done no matter which direction you go in, i.e. preallocate space and insert. I am in the process of ripping all of my zip calls out. The profile of my first set of techniques is already significantly better. This whole exercise has been very enlightening, as I spend so much time working on speeding up my algorithms when simple things like this should be tackled first. Thanks again!
MJ
-----Original Message-----
From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher Barker
Sent: Thursday, April 30, 2009 12:16 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Timing array construction
Mark Janikas wrote:
> I have a lot of array constructions in my code that use
> NUM.array([list of values])... I am going to replace it with the
> empty allocation and insertion.
It may not be worth it, depending on where list_of_values comes from/is.
A rule of thumb may be: it's going to be slow going from a numpy array
to a regular old python list or tuple, back to a numpy array. If your
data is a python list already, then np.array(list) is a fine choice.
>> def useAsArray(xCoords, yCoords):
>>
>> return NUM.asarray(zip(xCoords, yCoords))
Here are some of the issues with this one:
zip unpacks two generic python sequences, puts each pair of items into a
tuple, and then puts the tuples in a list. Essentially this:
new_list = []
for i in range(len(xCoords)):
    new_list.append((xCoords[i], yCoords[i]))
In each iteration of that loop, it's indexing into the numpy arrays,
making a python object out of them, putting them into a tuple, and
appending that tuple to the list, which may have to re-allocate memory a
few times.
Then the np.array() call loops through that list, unpacks each tuple,
examines the python object, decides what it is, and turns it into a raw
c-type to put into the array.
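The round trip described above can be observed directly. The following is only a rough sketch, not code from the thread: the sample arrays x and y are made up for illustration, and list() is wrapped around zip because on Python 3 zip returns an iterator rather than the list it built on Python 2.

```python
import numpy as np

# Hypothetical sample data standing in for xCoords and yCoords.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Step 1: zip indexes into both arrays and boxes every element as a
# Python-level object (a NumPy scalar), pairing them up in tuples.
pairs = list(zip(x, y))
first = pairs[0]
print(type(pairs), type(first), type(first[0]))

# Step 2: np.asarray walks the list, unpacks each tuple, inspects each
# object, and converts it back into a raw value in a new array.
via_zip = np.asarray(pairs)
print(via_zip.shape, via_zip.dtype)
```

Every value leaves NumPy, becomes a Python object, and is converted back again, which is the per-element overhead the preallocated version avoids.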
whereas:
def useEmpty(xCoords, yCoords):
    out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
    out[:,0] = xCoords
    out[:,1] = yCoords
    return out
allocates an array the right size.
directly copies the data from xCoords and yCoords to it.
that's it.
You can see why it's so much faster!
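For anyone who wants to check this on their own machine, here is a minimal timing sketch. It is an illustration under assumptions, not the benchmark used in the thread: the function names use_zip, use_c and use_empty, the helper time_call, the array size and the repeat count are all made up here, and the numbers will vary with hardware and NumPy version. The np.c_ variant is included because it was mentioned earlier in the thread.

```python
import timeit
import numpy as np

def use_zip(x, y):
    # Goes through Python tuples and a list before building the array.
    return np.asarray(list(zip(x, y)))

def use_c(x, y):
    # np.c_ stacks 1-D arrays as columns of a 2-D array.
    return np.c_[x, y]

def use_empty(x, y):
    # Preallocate the result, then copy each input into its column.
    out = np.empty((len(x), 2), dtype=x.dtype)
    out[:, 0] = x
    out[:, 1] = y
    return out

def time_call(func, x, y, repeats=20):
    # Total wall-clock seconds for `repeats` calls of func(x, y).
    return timeit.timeit(lambda: func(x, y), number=repeats)

n = 10000  # arbitrary size chosen for illustration
x = np.arange(n, dtype=float)
y = np.arange(n, dtype=float) * 2.0

results = {}
timings = {}
for func in (use_zip, use_c, use_empty):
    results[func.__name__] = func(x, y)
    timings[func.__name__] = time_call(func, x, y)

for name, seconds in timings.items():
    print(f"{name:10s} {seconds:.4f} s for 20 calls")
```

All three functions should return the same (n, 2) array, so the comparison is only about how the result gets built.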
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov