[Numpy-discussion] Efficient square distance computation

Matthew Brett matthew.brett at gmail.com
Tue Oct 8 04:49:14 EDT 2013


On Tue, Oct 8, 2013 at 1:06 AM, Ke Sun <sunk.cs at gmail.com> wrote:
> Dear all,
> I have written the following function to compute the square distances of a large
> matrix (each sample a row). It computes row by row and prints the overall progress.
> The progress output is important and I didn't use matrix multiplication.
> I give as input a 70,000x800 matrix. The output should be a 70,000x70,000
> matrix. The program runs really slowly (16 hours for 1/3 progress). And it eats
> 36G memory (fortunately I have enough).

That is very slow.

As a matter of interest - why didn't you use matrix multiplication?
On a machine I had access to, the matrix multiplication version took
about 20 minutes.
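
For illustration, here is a minimal sketch of how matrix multiplication can
be combined with progress output. It is not the code that was timed above.
It relies on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y and works
on a block of rows at a time, so a progress line can be printed after each
block. The function name `sq_dists_blockwise` and the parameters
`block_rows`, `out` and `report` are hypothetical, and the input is assumed
to be a 2D float array with one sample per row.

```python
import numpy as np


def sq_dists_blockwise(X, block_rows=1000, out=None, report=print):
    """Squared Euclidean distances between all pairs of rows of X.

    Hypothetical helper: works on `block_rows` rows at a time so that
    progress can be reported while np.dot does the heavy lifting.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared norm of every row, computed once.
    sq_norms = np.einsum('ij,ij->i', X, X)
    if out is None:
        out = np.empty((n, n), dtype=X.dtype)
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Inner products of this block of rows with all rows.
        block = np.dot(X[start:stop], X.T)
        block *= -2
        block += sq_norms[start:stop, None]
        block += sq_norms[None, :]
        # Rounding error can leave tiny negative values; clip them to zero.
        np.maximum(block, 0, out=block)
        out[start:stop] = block
        report("%d/%d rows done" % (stop, n))
    return out
```

The full output is still n by n, so this does not reduce the size of the
result itself. Passing a preallocated array as `out` (for example a disk-backed
`np.memmap`, or a float32 array) is one way to experiment with the memory
footprint, at the cost of speed or precision.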

You've got a 70000 by 70000 element output matrix so I think that's
37G already (if the matrix is double precision float).
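
The arithmetic behind that estimate can be checked in a few lines. This is
only a sketch of the size calculation, assuming 8 bytes per element
(float64) and ignoring any temporary arrays. It gives about 39.2 GB
counted in units of 10**9 bytes, or about 36.5 GiB in units of 2**30 bytes.

```python
n = 70000                # rows in the input, so the output is n by n
bytes_per_element = 8    # double precision (float64)

total_bytes = n * n * bytes_per_element
size_gb = total_bytes / 1e9       # gigabytes, 10**9 bytes each
size_gib = total_bytes / 2**30    # gibibytes, 2**30 bytes each

print("%.1f GB, %.1f GiB" % (size_gb, size_gib))
```

With single precision (4 bytes per element) the same calculation gives half
that size.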

> Could you give some insights on how to modify the code to be efficient and
> to eat less memory?

You could try using Cython - but I'm guessing that the BLAS routines
numpy uses for matrix multiplication will already do this in the most
efficient way.


