[Numpy-discussion] Numpy array performance issue

Bruno Santos bacmsantos at gmail.com
Wed Feb 24 13:38:46 EST 2010


This is probably just me being stupid, but why is this piece of code not
working?
index_nSize=numpy.arange(0,length,nSize)
lsPhasedValues = set([aLoci[i] for i in xrange(length) if (i%nSize==0 and aLoci[i]>0)])
lsPhasedValues1 = numpy.where(aLoci[index_nSize]>0)
print aLoci[index_nSize]
print lsPhasedValues==lsPhasedValues1,lsPhasedValues,lsPhasedValues1
[0 0 6 0 0 3]
False set([3, 6]) (array([2, 5]),)
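
The mismatch above looks like it comes from numpy.where returning the positions at which the condition holds (here 2 and 5) rather than the values stored there (6 and 3). Below is a minimal sketch of one way to get the unique positive values instead, assuming aLoci is a one-dimensional integer array. The sample data and nSize = 2 are made up so that aLoci[index_nSize] reproduces the printed array.

```python
import numpy

# Hypothetical sample data: every nSize-th entry reproduces the printed
# array [0 0 6 0 0 3]; the other entries are arbitrary filler.
nSize = 2
aLoci = numpy.array([0, 7, 0, 1, 6, 2, 0, 9, 0, 4, 3, 5])
length = len(aLoci)

index_nSize = numpy.arange(0, length, nSize)
phased = aLoci[index_nSize]              # values at the phased positions

# numpy.where returns positions within `phased` (2 and 5), not values
positions = numpy.where(phased > 0)[0]

# Boolean indexing keeps the values; numpy.unique drops duplicates and sorts
lsPhasedValues1 = numpy.unique(phased[phased > 0])

# Reference result, same logic as the original list comprehension
lsPhasedValues = set(int(aLoci[i]) for i in range(length)
                     if i % nSize == 0 and aLoci[i] > 0)

print(positions)
print(lsPhasedValues1)
# Convert the array to a set before comparing it with a set
print(set(lsPhasedValues1.tolist()) == lsPhasedValues)
```

Note that comparing a set directly with an array (or with the tuple that numpy.where returns) will not yield a single True even when the contents agree, so one side has to be converted first.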


2010/2/24 Bruno Santos <bacmsantos at gmail.com>

>
>
> 2010/2/24 Chris Colbert <sccolbert at gmail.com>
>
>> In [4]: %timeit a = np.random.randint(0, 20, 100)
>> 100000 loops, best of 3: 4.32 us per loop
>>
>> In [5]: %timeit (a>=10).sum()
>> 100000 loops, best of 3: 7.32 us per loop
>>
>> In [8]: %timeit np.where(a>=10)
>> 100000 loops, best of 3: 5.36 us per loop
>>
>>
>> Am I missing something?
>>
>
> I guess you are.
> In [23]: a = np.random.randint(0, 20, 1000)
>
> In [24]: %timeit np.where(a>=10)
> 10000 loops, best of 3: 22.4 us per loop
>
> In [25]: %timeit (a>=10).sum()
> 100000 loops, best of 3: 11.7 us per loop
>
> np.where doesn't scale very well.
>
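
The two timed expressions do not do the same job, which may account for part of the gap: np.where(a >= 10) builds an array of matching indices, whereas (a >= 10).sum() only reduces the boolean mask to a count. Below is a small sketch assuming only the count is needed. np.count_nonzero is a swapped-in alternative that does not appear in the thread, the seed is arbitrary, and timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)      # fixed seed, arbitrary choice
a = rng.randint(0, 20, 1000)

mask = a >= 10
count_sum = int(mask.sum())                # count via boolean reduction
count_nz = int(np.count_nonzero(mask))     # count without an index array
count_where = len(np.where(mask)[0])       # count via the index array

# Rough timings only; the numbers are not expected to match those quoted
candidates = [
    ("where", lambda: np.where(a >= 10)),
    ("sum", lambda: (a >= 10).sum()),
    ("count_nonzero", lambda: np.count_nonzero(a >= 10)),
]
loops = 1000
for label, stmt in candidates:
    total = timeit.timeit(stmt, number=loops)
    print("%-14s %.2f us per loop" % (label, total * 1e6 / loops))
```

All three approaches agree on the count; they differ only in how much intermediate data they allocate.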
>>
>> On Wed, Feb 24, 2010 at 12:50 PM, Bruno Santos <bacmsantos at gmail.com> wrote:
>>
>>> In both versions your lsPhasedValues contains the number of positions in
>>> the array that match a certain criterion. What I need in that step is the
>>> unique values and not their positions.
>>>
>>> 2010/2/24 Robert Kern <robert.kern at gmail.com>
>>>
>>>> On Wed, Feb 24, 2010 at 11:19, Bruno Santos <bacmsantos at gmail.com>
>>>> wrote:
>>>>
>>>> > It seems that Python 2.6.4 has a more efficient implementation of
>>>> > lists. It runs faster on this version and slower on 2.5.4 on the same
>>>> > machine with Debian. A lot faster, in fact.
>>>> > I have been trying to get rid of this headache for the last couple of
>>>> > weeks. But you might give me a lot more optimizations that I can pick.
>>>> > I am trying to optimize the following function:
>>>> > def hypergeometric(self,lindex,rindex):
>>>> >         """
>>>> >         loc.hypergeometric(lindex,rindex)
>>>> >         Performs the hypergeometric test for the loci between lindex and
>>>> >         rindex.
>>>> >         Returns the minimum p-Value
>>>> >         """
>>>> >         aASense = self.aASCounts[lindex*nSize:(rindex+1)*nSize]
>>>> >         #Create the subarray to test
>>>> >         aLoci = numpy.hstack([self.aSCounts[lindex*nSize:(rindex+1)*nSize],aASense[::-1]])
>>>> >         #Get the values to test
>>>> >         length = len(aLoci)
>>>> >         lsPhasedValues = set([aLoci[i] for i in xrange(length) if i%nSize==0 and aLoci[i]>0])
>>>> >         m = length/nSize
>>>> >         n = (length-1)-(length/nSize-1)
>>>> >         #Create an array to store the Pvalues
>>>> >         lsPvalues = []
>>>> >         append = lsPvalues.append
>>>> >         #Calculate matches in Phased and non Phased position
>>>> >         for r in lsPhasedValues:
>>>> >             #Initiate number of matches to 0
>>>> >             q = sum([1 for j in xrange(length) if j%nSize==0 and aLoci[j]>=r])
>>>> >             k = sum([1 for j in xrange(length) if aLoci[j]>=r])
>>>> >             key = '%i,%i,%i,%i'%(q-1,m,n,k)
>>>> >             try:append(dtPhyper[key])
>>>> >             except KeyError:
>>>> >                 value = self.lphyper(q-1, m, n, k)
>>>> >                 append(value)
>>>> >                 dtPhyper[key]=value
>>>> >         return min(lsPvalues)
>>>> > Is there any efficient way to test the array simultaneously for two
>>>> > different conditions?
>>>>
>>>> j = np.arange(length)
>>>> j_nSize_mask = ((j % nSize) == 0)
>>>> lsPhasedValues = (j_nSize_mask & (aLoci >= 0)).sum()
>>>> ...
>>>>    bigALoci = (aLoci >= r)
>>>>    q = (j_nSize_mask & bigALoci).sum()
>>>>    k = bigALoci.sum()
>>>>
>>>>
>>>> Another way to do it:
>>>>
>>>> j_nSize = np.arange(0, length, nSize)
>>>> lsPhasedValues = (aLoci[j_nSize] >= 0).sum()
>>>> ...
>>>>    q = (aLoci[j_nSize] >= r).sum()
>>>>    k = (aLoci >= r).sum()
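
Putting the second variant together, here is a hedged sketch of how the q and k counts could be produced for every candidate r. It assumes aLoci is one-dimensional, uses > 0 to match the condition in the original function (the quoted suggestion uses >= 0), and collects the unique values rather than a count. The function name phased_counts and the sample data are hypothetical.

```python
import numpy as np

def phased_counts(aLoci, nSize):
    """Return (r, q, k) for each unique positive value r at a phased position.

    Hypothetical helper: q counts phased positions with value >= r,
    k counts all positions with value >= r.
    """
    aLoci = np.asarray(aLoci)
    phased = aLoci[::nSize]                  # every nSize-th element
    values = np.unique(phased[phased > 0])   # unique values, sorted
    results = []
    for r in values:
        q = int((phased >= r).sum())         # matches at phased positions
        k = int((aLoci >= r).sum())          # matches anywhere in the array
        results.append((int(r), q, k))
    return results

# Made-up data: the phased positions hold [0, 0, 6, 0, 0, 3]
sample = [0, 7, 0, 1, 6, 2, 0, 9, 0, 4, 3, 5]
counts = phased_counts(sample, 2)
print(counts)
```

The loop now runs only over the unique phased values, and each iteration is two vectorized comparisons instead of two Python-level passes over the array.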
>>>>
>>>>
>>>> --
>>>> Robert Kern
>>>>
>>>> "I have come to believe that the whole world is an enigma, a harmless
>>>> enigma that is made terrible by our own mad attempt to interpret it as
>>>> though it had an underlying truth."
>>>>  -- Umberto Eco
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>