# [Numpy-discussion] Efficient square distance computation

Ke Sun sunk.cs at gmail.com
Tue Oct 8 04:06:25 EDT 2013

```Dear all,

I have written the following function to compute the square distances of a large
matrix (each sample a row). It compute row by row and print the overall progress.
The progress output is important and I didn't use matrix multiplication.

I give as input a 70,000x800 matrix. The output should be a 70,000x70,000
matrix. The program runs really slow (16 hours for 1/3 progress). And it eats
36G memory (fortunately I have enough).

Could you give some insights on how to modify the code to be efficient and
to eat less memory?

thanks,
Ke Sun

def dist2_large( data ):
import time
if data.ndim != 2: raise RuntimeError( "data should be a matrix" )
N,D = data.shape

print 'using the sample-wise implementation'
print '%d samples, %d dimensions' % (N,D)

start_t = time.time()
d2 = np.zeros( [N,N] )
for i in range( N ):
print "\r%5d/%d" % (i+1, N),
for j in range( N ):
d2[i,j] = ((data[i] - data[j])**2).sum()

total_t = time.time() - start_t
hours = (total_t / 3600)
minutes = (total_t % 3600) / 60
print "\nfinished in %2dh%2dm" % (hours, minutes)

return d2

```