[Numpy-discussion] Distance Matrix speed

Tim Hochberg tim.hochberg at cox.net
Mon Jun 19 16:28:53 EDT 2006


Sebastian Beca wrote:

>I just ran Alan's script and I don't get consistent results for 100
>repetitions. I boosted it to 1000 and ran it several times. Which one
>was faster varied a lot, but both came within about +-1.5% of each other.
>
>When it comes to scaling, for my problem (fuzzy clustering), N is the
>size of the dataset, which should span from thousands to millions. C
>is the number of clusters, usually less than 10, and K, the number of
>features (the dimension I want to sum over), is also usually less than
>100. So mainly I'm concerned with scaling across N. I tried C=3, K=4,
>N=1000, 2500, 5000, 7500, 10000. Also using 1000 runs, the results
>were:
>dist_beca: 1.1, 4.5, 16, 28, 37
>dist_loehner1: 1.7, 6.5, 22, 35, 47
>
>I also tried scaling across K, with C=3, N=2500, and K=5-50. I
>couldn't get consistent results for small K, but both perform about
>equally well (+-2%) for large K (K>15).
>
>I'm not sure how these work in the backend, so I can't argue as to
>why one should scale better than the other.
>  
>
The reason I suspect that dist_beca should scale better is that 
dist_loehner1 generates an intermediate array of size NxCxK, while 
dist_beca produces intermediate matrices that are only NxK or CxK.  For 
large problems, allocating that extra memory and fetching it into and 
out of the cache can be a bottleneck.
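
To put rough numbers on that (my arithmetic, using Sebastian's largest
case of N=10000, C=3, K=4 with 8-byte float64 elements):

    N, C, K = 10000, 3, 4
    # dist_loehner1 broadcasts A and B into a full N x C x K intermediate:
    print N * C * K * 8        # 960000 bytes per call
    # dist_beca's largest intermediate is only N x K:
    print N * K * 8            # 320000 bytes per call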

Here's another version that allocates even less in the way of
temporaries, at the expense of being borderline incomprehensible. It
still allocates an NxK temporary array, but it allocates it once ahead
of time and then reuses it for all subsequent calculations. You're
welcome to use it, but I'm not sure I'd recommend it unless this
function is really a speed bottleneck, as it could end up being hard to
read later (I left implementing the N<C case as an exercise for the
reader...).

I have another idea that might reduce the memory overhead still
further; if I get a chance I'll try it out and let you know whether it
results in a further speedup.
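
One classic trick along these lines (offered only as a guess at the
kind of idea meant here, not necessarily Tim's) is to expand
||a - b||**2 = ||a||**2 - 2*a.b + ||b||**2 so that a single matrix
product does the bulk of the work (again assuming "from numpy import *"
as in the script below; note the clamp against small negative values
from rounding):

    def dist_dot(A, B):
        # Sketch only: d[i,j]**2 == ||A[i]||**2 - 2*A[i].B[j] + ||B[j]||**2.
        # The largest temporaries are N x C (the size of the result itself)
        # and a transient N x K from A*A; the heavy work is one dot() call.
        aa = (A*A).sum(axis=1)                  # shape (N,)
        bb = (B*B).sum(axis=1)                  # shape (C,)
        d2 = aa[:, newaxis] + bb[newaxis, :] - 2*dot(A, B.transpose())
        maximum(d2, 0, d2)                      # clamp rounding-induced negatives
        return sqrt(d2)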

-tim


    # Assumes "from numpy import *", as in the script below.
    def dist2(A, B):
        N, K = A.shape
        C = B.shape[0]
        d = zeros([N, C], dtype=float)
        if N < C:
            raise NotImplementedError
        else:
            # One reusable N x K scratch buffer; tmp0 is a view of its
            # first column, used to accumulate the row sums of squares.
            tmp = empty([N, K], float)
            tmp0 = tmp[:,0]
            rangek = range(1,K)
            for j in range(C):
                subtract(A, B[j], tmp)   # tmp = A - B[j], no new allocation
                tmp *= tmp               # square in place
                for k in rangek:
                    tmp0 += tmp[:,k]     # fold columns 1..K-1 into column 0
                sqrt(tmp0, d[:,j])       # write result straight into d's column
        return d
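
For instance (sizes picked arbitrarily, again assuming "from numpy
import *"):

    A = random.random([1000, 4])    # N=1000, K=4
    B = random.random([3, 4])       # C=3, same K
    print dist2(A, B).shape         # (1000, 3)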


>Regards,
>
>Sebastian.
>
>On 6/19/06, Alan G Isaac <aisaac at american.edu> wrote:
>  
>
>>On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:
>>
>>    
>>
>>>Alan G Isaac wrote:
>>>      
>>>
>>>>On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
>>>>        
>>>>
>>>>>def dist():
>>>>>    d = zeros([N, C], dtype=float)
>>>>>    if N < C:
>>>>>        for i in range(N):
>>>>>            xy = A[i] - B
>>>>>            d[i,:] = sqrt(sum(xy**2, axis=1))
>>>>>        return d
>>>>>    else:
>>>>>        for j in range(C):
>>>>>            xy = A - B[j]
>>>>>            d[:,j] = sqrt(sum(xy**2, axis=1))
>>>>>        return d
>>>>>
>>>>But that is 50% slower than Johannes's version:
>>>>        
>>>>
>>>>def dist_loehner1():
>>>>    d = A[:, newaxis, :] - B[newaxis, :, :]
>>>>    d = sqrt((d**2).sum(axis=2))
>>>>    return d
>>>>        
>>>>
>>>Are you sure about that? I just ran it through timeit, using Sebastian's
>>>array sizes, and I get Sebastian's version being 150% faster. This
>>>could well be cache-size dependent, so it may vary from box to box, but I'd
>>>expect Sebastian's current version to scale better in general.
>>>      
>>>
>>No, I'm not sure.
>>Script attached at bottom.
>>Most recent output follows;
>>for reasons I have not determined,
>>it doesn't match my previous runs ...
>>Alan
>>
>>    
>>
>>>>>execfile(r'c:\temp\temp.py')
>>>>>          
>>>>>
>>dist_beca :       3.042277
>>dist_loehner1:    3.170026
>>
>>
>>#################################
>>#THE SCRIPT
>>import sys
>>sys.path.append("c:\\temp")
>>import numpy
>>from numpy import *
>>import timeit
>>
>>
>>K = 10
>>C = 2500
>>N = 3 # One could switch around C and N now.
>>A = numpy.random.random( [N, K] )
>>B = numpy.random.random( [C, K] )
>>
>># beca
>>def dist_beca():
>>    d = zeros([N, C], dtype=float)
>>    if N < C:
>>        for i in range(N):
>>            xy = A[i] - B
>>            d[i,:] = sqrt(sum(xy**2, axis=1))
>>        return d
>>    else:
>>        for j in range(C):
>>            xy = A - B[j]
>>            d[:,j] = sqrt(sum(xy**2, axis=1))
>>    return d
>>
>>#loehnert
>>def dist_loehner1():
>>        # drawback: memory usage temporarily doubled
>>        # solution see below
>>        d = A[:, newaxis, :] - B[newaxis, :, :]
>>        # written as 3 expressions for more clarity
>>        d = sqrt((d**2).sum(axis=2))
>>        return d
>>
>>
>>if __name__ == "__main__":
>>        t1 = timeit.Timer('dist_beca()', 'from temp import dist_beca').timeit(100)
>>        t8 = timeit.Timer('dist_loehner1()', 'from temp import dist_loehner1').timeit(100)
>>        fmt="%-10s:\t"+"%10.6f"
>>        print fmt%('dist_beca', t1)
>>        print fmt%('dist_loehner1', t8)