[Numpy-discussion] Numpy array performance issue
Chris Colbert
sccolbert at gmail.com
Wed Feb 24 12:53:06 EST 2010
In [4]: %timeit a = np.random.randint(0, 20, 100)
100000 loops, best of 3: 4.32 us per loop
In [5]: %timeit (a>=10).sum()
100000 loops, best of 3: 7.32 us per loop
In [8]: %timeit np.where(a>=10)
100000 loops, best of 3: 5.36 us per loop
Am I missing something?
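
Note that the three timings above do not measure the same thing: the first builds the array, the second counts the matches, and the third returns their positions. Below is a minimal sketch for comparing the count and the positions side by side with the standard-library timeit module. np.count_nonzero is my own addition and does not appear anywhere in this thread, and the numbers it prints will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

# Same kind of test data as in the session above: 100 integers in [0, 20).
a = np.random.randint(0, 20, 100)

mask = a >= 10

# Three ways to look at the matches.
count_sum = int(mask.sum())             # number of matching elements
count_nz = int(np.count_nonzero(mask))  # same count, dedicated function
positions = np.where(mask)[0]           # indices of the matches, not a count

assert count_sum == count_nz == len(positions)

# Time each expression; absolute numbers vary by machine and NumPy version.
for label, stmt in [
    ("sum", "(a >= 10).sum()"),
    ("count_nonzero", "np.count_nonzero(a >= 10)"),
    ("where", "np.where(a >= 10)"),
]:
    seconds = timeit.timeit(stmt, globals={"a": a, "np": np}, number=10000)
    print("%-14s %.2f us per loop" % (label, seconds / 10000 * 1e6))
```
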
On Wed, Feb 24, 2010 at 12:50 PM, Bruno Santos <bacmsantos at gmail.com> wrote:
> In both versions your lsPhasedValues contains the number of positions in
> the array that match a certain criterion. What I need in that step is the
> unique values, not a count of their positions.
>
> 2010/2/24 Robert Kern <robert.kern at gmail.com>
>
>> On Wed, Feb 24, 2010 at 11:19, Bruno Santos <bacmsantos at gmail.com> wrote:
>>
>> > It seems that Python 2.6.4 has a more efficient implementation of
>> > lists. It runs faster on this version and slower on 2.5.4 on the same
>> > machine with Debian. A lot faster, in fact.
>> > I have been trying to get rid of this headache for the last couple of
>> > weeks, but you might be able to suggest more optimizations that I can
>> > pick from. I am trying to optimize the following function:
>> > def hypergeometric(self,lindex,rindex):
>> >     """
>> >     loc.hypergeometric(lindex,rindex)
>> >     Performs the hypergeometric test for the loci between lindex and
>> >     rindex.
>> >     Returns the minimum p-Value
>> >     """
>> >     aASense = self.aASCounts[lindex*nSize:(rindex+1)*nSize]
>> >     #Create the subarray to test
>> >     aLoci = numpy.hstack([self.aSCounts[lindex*nSize:(rindex+1)*nSize],
>> >                           aASense[::-1]])
>> >     #Get the values to test
>> >     length = len(aLoci)
>> >     lsPhasedValues = set([aLoci[i] for i in xrange(length)
>> >                           if i%nSize==0 and aLoci[i]>0])
>> >     m = length/nSize
>> >     n = (length-1)-(length/nSize-1)
>> >     #Create an array to store the Pvalues
>> >     lsPvalues = []
>> >     append = lsPvalues.append
>> >     #Calculate matches in Phased and non Phased position
>> >     for r in lsPhasedValues:
>> >         #Initiate number of matches to 0
>> >         q = sum([1 for j in xrange(length)
>> >                  if j%nSize==0 and aLoci[j]>=r])
>> >         k = sum([1 for j in xrange(length) if aLoci[j]>=r])
>> >         key = '%i,%i,%i,%i'%(q-1,m,n,k)
>> >         try:
>> >             append(dtPhyper[key])
>> >         except KeyError:
>> >             value = self.lphyper(q-1, m, n, k)
>> >             append(value)
>> >             dtPhyper[key]=value
>> >     return min(lsPvalues)
>> > Is there an efficient way to test the array for two different
>> > conditions simultaneously?
>>
>> j = np.arange(length)
>> j_nSize_mask = ((j % nSize) == 0)
>> lsPhasedValues = (j_nSize_mask & (aLoci >= 0)).sum()
>> ...
>> bigALoci = (aLoci >= r)
>> q = (j_nSize_mask & bigALoci).sum()
>> k = bigALoci.sum()
>>
>>
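
To check that the mask version above counts the same things as the list comprehensions in the original function, here is a small sketch. The array contents, nSize = 4 and the threshold r = 5 are made up for illustration; they are not taken from Bruno's data. The parentheses around each comparison matter, because & binds more tightly than == and >=.

```python
import numpy as np

nSize = 4  # hypothetical block size, for illustration only
aLoci = np.array([5, 0, 2, 7, 0, 3, 9, 1, 6, 6, 0, 2])  # made-up counts
length = len(aLoci)
r = 5  # one made-up threshold to test

# Original approach: one Python-level pass over the array per count.
q_loop = sum(1 for j in range(length) if j % nSize == 0 and aLoci[j] >= r)
k_loop = sum(1 for j in range(length) if aLoci[j] >= r)

# Mask approach: build each condition once, then combine the boolean
# arrays with & (element-wise AND).
j = np.arange(length)
j_nSize_mask = (j % nSize) == 0
bigALoci = aLoci >= r
q = int((j_nSize_mask & bigALoci).sum())
k = int(bigALoci.sum())

assert (q, k) == (q_loop, k_loop)
```
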
>> Another way to do it:
>>
>> j_nSize = np.arange(0, length, nSize)
>> lsPhasedValues = (aLoci[j_nSize] >= 0).sum()
>> ...
>> q = (aLoci[j_nSize] >= r).sum()
>> k = (aLoci >= r).sum()
>>
>>
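
As Bruno points out in his reply, the loop needs the distinct positive values at the phased positions, not a count of them. Here is a sketch of one way to get those, assuming aLoci is a 1-D integer array as in the original function. The data and nSize = 4 are made up, and the broadcasting step that handles all thresholds at once is my own suggestion rather than anything proposed in this thread.

```python
import numpy as np

nSize = 4  # hypothetical block size, for illustration only
aLoci = np.array([5, 0, 2, 7, 0, 3, 9, 1, 6, 6, 0, 2, 5, 1, 1, 1])  # made-up

# Every nSize-th element; a basic slice is a view, so no index array is built.
phased = aLoci[::nSize]

# Distinct positive values at the phased positions (sorted ascending).
lsPhasedValues = np.unique(phased[phased > 0])

# Counts for every threshold r at once, via broadcasting:
# rows are thresholds, columns are array elements.
thresholds = lsPhasedValues[:, np.newaxis]
q = (phased >= thresholds).sum(axis=1)
k = (aLoci >= thresholds).sum(axis=1)

# Cross-check against the set built by the original list comprehension.
expected = set(int(aLoci[i]) for i in range(len(aLoci))
               if i % nSize == 0 and aLoci[i] > 0)
assert set(lsPhasedValues.tolist()) == expected
```

Here q[i] and k[i] belong to the threshold lsPhasedValues[i], so the remaining loop only has to do the p-value lookup for each (q, k) pair. The broadcast comparison allocates an array of size (number of thresholds) x (array length), which may not be worth it for very long arrays.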
>> --
>> Robert Kern
>>
>> "I have come to believe that the whole world is an enigma, a harmless
>> enigma that is made terrible by our own mad attempt to interpret it as
>> though it had an underlying truth."
>> -- Umberto Eco
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>