[Numpy-discussion] Efficient square distance computation
Ke Sun
sunk.cs at gmail.com
Tue Oct 8 07:38:28 EDT 2013
On Tue, Oct 08, 2013 at 01:49:14AM -0700, Matthew Brett wrote:
> Hi,
>
> On Tue, Oct 8, 2013 at 1:06 AM, Ke Sun <sunk.cs at gmail.com> wrote:
> > Dear all,
> >
> > I have written the following function to compute the square distances of a large
> > matrix (each sample a row). It compute row by row and print the overall progress.
> > The progress output is important and I didn't use matrix multiplication.
> >
> > I give as input a 70,000x800 matrix. The output should be a 70,000x70,000
> > matrix. The program runs really slow (16 hours for 1/3 progress). And it eats
> > 36G memory (fortunately I have enough).
>
> That is very slow.
>
> As a matter of interest - why didn't you use matrix multiplication?
Because it will cost hours and I want to see the progress and
know how far it goes. Another concern is to save memory and
compute one sample at a time.
> On a machine I had access to it took about 20 minutes.
How? I am using matrix multiplication (the same code as
http://stackoverflow.com/a/4856692) and it runs for around 18 hours.
Best,
Ke Sun
>
> You've got a 70000 by 70000 element output matrix so I think that's
> 37G already (if the matrix is double precision float).
>
> > Could you give some insights on how to modify the code to be efficient and
> > to eat less memory?
>
> You could try using Cython - but I'm guessing that the BLAS routines
> in numpy will already do this the most efficient way.
More information about the NumPy-Discussion
mailing list