2008/12/4 Charles R Harris <charlesr.harris@gmail.com>:
On Thu, Dec 4, 2008 at 8:26 AM, Olivier Grisel <olivier.grisel@ensta.org> wrote:
Hi list,
Suppose I have array a with dimensions (d1, d3) and array b with dimensions (d2, d3). I want to compute array c with dimensions (d1, d2) holding the squared euclidian norms of vectors in a and b with size d3.
Just to clarify the problem a bit, it looks like you want to compute the squared euclidean distance between every vector in a and every vector in b, i.e., a distance matrix. Is that correct? Also, how big are d1,d2,d3?
I would target d1 >> d2 ~ d3 with d1 as large as possible to fit in memory and d2 and d3 in the order of a couple hundreds or thousands for a start.
If you *are* looking to compute the distance matrix I suspect your end goal is something beyond that. Could you describe what you are trying to do?
My end goal it to compute the activation of an array of Radial Basis Function units where the activation of unit with center b_j for data vector a_i is given by: f(a_i, b_j) = exp(-||a_i - bj|| ** 2 / (2 * sigma)) The end goal is to have building blocks of various parameterized array of homogeneous units (linear, sigmoid and RBF) along with their gradient in parameter space so as too build various machine learning algorithms such as multi layer perceptrons with various training strategies such as Stochastic Gradient Descent. That code might be integrated into the Modular Data Processing (MPD toolkit) project [1] at some point. The current stat of the python code is here: http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdk... You can find an SSE optimized C implementation wrapped with ctypes here: http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdk... http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdk...
It could be that scipy.spatial or scipy.cluster are what you should look at.
I'll have a look at those, thanks for the pointer. [1] http://mdp-toolkit.sourceforge.net/ -- Olivier