[Numpy-discussion] recarray slow?

wheres pythonmonks wherespythonmonks at gmail.com
Wed Jul 21 16:22:37 EDT 2010


However: is there an automatic way to convert a named index to a position?

What about looping over tuples of my recarray:

for t in d:
    date = t['Date']
    ....

I guess the above has to look up 'Date' for every tuple, whereas
the following does not need the hash lookup for each tuple:

for t in d:
    date = t[0]
    ....

Should I create a map from dtype.names, and use that to look up the
index based on the name in advance?  (That is, if I really, really want
to factor out the lookup of 'Date'.)
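
A minimal sketch of that idea, for illustration only. The sample records
and the names date_pos and pos are hypothetical; dtype.names is a plain
tuple of field names (an attribute, not a method), so its index() method
gives the position directly. Whether positional access is measurably
faster than access by name is not established here.

```python
import numpy as np

# Hypothetical sample data standing in for the recarray `d` in this thread.
d = np.rec.fromrecords(
    [("2010-07-19", 1.0), ("2010-07-20", 2.5), ("2010-07-20", 3.0)],
    names=["Date", "Value"],
)

# dtype.names is a tuple of field names, so tuple.index gives the position.
date_pos = d.dtype.names.index("Date")

# Alternatively, build the whole name -> position map once, up front.
pos = {name: i for i, name in enumerate(d.dtype.names)}

# Loop over the records using the precomputed position.
dates = [t[date_pos] for t in d]
```

If only one field is needed, taking the column once with d['Date'] and
looping over that avoids any per-record field lookup at all.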



On Wed, Jul 21, 2010 at 3:47 PM, wheres pythonmonks
<wherespythonmonks at gmail.com> wrote:
> Thank you very much....  better crack open a numpy reference manual
> instead of relying on my python "intuition".
>
> On Wed, Jul 21, 2010 at 3:44 PM, Pauli Virtanen <pav at iki.fi> wrote:
>> Wed, 21 Jul 2010 15:12:14 -0400, wheres pythonmonks wrote:
>>
>>> I have a recarray -- the first column is date.
>>>
>>> I have the following function to compute the number of unique dates in
>>> my data set:
>>>
>>>
>>> def byName(): return(len(list(set(d['Date'])) ))
>>
>> What this code does is:
>>
>> 1. d['Date']
>>
>>   Extract an array slice containing the dates. This is fast.
>>
>> 2. set(d['Date'])
>>
>>   Make copies of each array item, and box them into Python objects.
>>   This is slow.
>>
>>   Insert each of the objects into the set. This is also somewhat slow.
>>
>> 3. list(set(d['Date']))
>>
>>   Get each item in the set, and insert them into a new list.
>>   This is somewhat slow, and unnecessary if you only want to
>>   count.
>>
>> 4. len(list(set(d['Date'])))
>>
>>
>> So the slowness arises because the code is copying data around, and
>> boxing it into Python objects.
>>
>> You should try using Numpy functions (these don't re-box the data) to do
>> this. http://docs.scipy.org/doc/numpy/reference/routines.set.html
>>
>> --
>> Pauli Virtanen
>>
>
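
The advice quoted above, to count with NumPy's set routines rather than
a Python set, might look like the following sketch. The sample records
and the name by_name_numpy are hypothetical; np.unique is one of the
routines documented on the linked page.

```python
import numpy as np

# Hypothetical sample data standing in for the recarray `d` in this thread.
d = np.rec.fromrecords(
    [("2010-07-19", 1.0), ("2010-07-20", 2.5), ("2010-07-20", 3.0)],
    names=["Date", "Value"],
)

def by_name_numpy(arr):
    # np.unique deduplicates the column inside NumPy and returns an array,
    # so no intermediate Python set or list is built just to count.
    return len(np.unique(arr["Date"]))

n_unique = by_name_numpy(d)
```

This returns the same count as len(set(d['Date'])) on the sample data.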


