Wed, 21 Jul 2010 15:12:14 -0400, wheres pythonmonks wrote:
I have a recarray -- the first column is date.
I have the following function to compute the number of unique dates in my data set:
def byName(): return(len(list(set(d['Date'])) ))
What this code does is:

1. d['Date']

   Extract an array slice containing the dates. This is fast.

2. set(d['Date'])

   Make copies of each array item, and box them into Python objects.
   This is slow.

   Insert each of the objects into the set. This is also somewhat slow.

3. list(set(d['Date']))

   Get each item in the set, and insert it into a new list.
   This is somewhat slow, and unnecessary if you only want to count.

4. len(list(set(d['Date'])))

So the slowness arises because the code is copying data around and
boxing it into Python objects.

You should try using NumPy functions (these don't re-box the data) to
do this.

http://docs.scipy.org/doc/numpy/reference/routines.set.html

-- 
Pauli Virtanen
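As an illustration of the NumPy set routines suggested above, here is a minimal sketch that counts the distinct dates with numpy.unique instead of a Python set. The array contents, the 'Value' field, and the function name count_unique_dates are made up for the example; only the 'Date' field name comes from the original post, and the poster's actual dtype may differ.

```python
import numpy as np

# Hypothetical stand-in for the poster's recarray `d`; the real data is not shown.
d = np.array(
    [("2010-07-19", 1.0), ("2010-07-20", 2.5),
     ("2010-07-20", 3.0), ("2010-07-21", 4.2)],
    dtype=[("Date", "U10"), ("Value", "f8")],
).view(np.recarray)

def count_unique_dates(arr):
    """Count distinct values in the 'Date' field using numpy.unique."""
    # np.unique sorts and de-duplicates inside NumPy, so the items are
    # not boxed one by one into Python objects.
    return len(np.unique(arr["Date"]))

n_dates = count_unique_dates(d)
```

Whether this is actually faster depends on the dtype and size of the data, so it is worth timing both versions on the real array.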