[SciPy-user] k means

eric jones eric at enthought.com
Sun Sep 22 15:58:08 EDT 2002


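The expensive part of kmeans is the underlying vq (vector quantization)
algorithm, which assigns each observation to its nearest code book entry.
It is very parallelizable, since each observation is handled independently.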
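A rough operation count makes the first claim concrete. Assume N observations, K codes and d features (the Nobs, Ncodes and Nfeatures of the template below) and squared Euclidean distance. The symbols and constants here are illustrative and are not taken from vq.py. One k-means iteration then costs roughly:

```latex
\begin{aligned}
C_{\text{vq}}     &\approx 3\,N K d   && \text{($NK$ distances, each $d$ subtract/multiply/add)} \\
C_{\text{update}} &\approx N d + K d  && \text{(per-cluster sums, then one divide per coordinate)} \\
\frac{C_{\text{vq}}}{C_{\text{update}}} &\approx \frac{3NK}{N + K} \approx 3K && (N \gg K)
\end{aligned}
```

So when N is much larger than K, the vq step dominates by a factor of about 3K, and each of its N·K distance evaluations is independent of the others.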

The kmeans algorithm lives in scipy/cluster/vq.py.  The C++ version of
the vq algorithm lives in scipy/cluster/src/vq.h.  In that file there is
a template that looks like:

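template<class T>
void tvq(T* obs, T* code_book, int Nobs, int Ncodes, int Nfeatures,
         int* codes, T* lowest_dist)
{
    // Find the nearest code for each observation in turn.  The iterations
    // are independent, so this loop is the natural place to parallelize.
    for (int i = 0; i < Nobs; i++)
    {
        // Observation i starts at obs[i*Nfeatures] (row-major layout).
        tvq_obs<T>(&(obs[i*Nfeatures]), code_book, Ncodes, Nfeatures,
                   codes[i], lowest_dist[i]);
    }
}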
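To experiment with the loop outside SciPy, the per-observation helper can be filled in as below. This is a hypothetical stand-in for tvq_obs, written to match the call above and assuming that codes[i] and lowest_dist[i] are passed by reference. It uses squared Euclidean distance, whereas the real routine in vq.h may report the plain Euclidean distance.

```cpp
#include <limits>

// Hypothetical stand-in for tvq_obs from vq.h: find the code book entry
// nearest to one observation. Squared Euclidean distance is an assumption;
// the real routine may return the (unsquared) Euclidean distance instead.
template<class T>
void tvq_obs(T* obs, T* code_book, int Ncodes, int Nfeatures,
             int& code, T& lowest_dist)
{
    lowest_dist = std::numeric_limits<T>::max();
    code = 0;
    for (int c = 0; c < Ncodes; c++) {
        const T* center = code_book + c * Nfeatures;  // row c of the code book
        T dist = 0;
        for (int f = 0; f < Nfeatures; f++) {
            T diff = obs[f] - center[f];
            dist += diff * diff;
        }
        if (dist < lowest_dist) {  // strict '<' keeps the first code on ties
            lowest_dist = dist;
            code = c;
        }
    }
}
```

Declared ahead of the tvq template, this matches its call: codes[i] and lowest_dist[i] are lvalues, so they bind to the reference parameters.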

Parallelizing the loop over observations in tvq with MPI (or another
message-passing library) is probably a good first cut.
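A minimal shared-memory sketch of that first cut follows. It uses std::thread on one machine as a stand-in for MPI, and the names block_range and parallel_vq are hypothetical. Each worker gets a contiguous block of observations, which is the same split an MPI version could hand to each of 5 ranks (for example with MPI_Scatterv) before gathering the codes back.

```cpp
#include <algorithm>
#include <limits>
#include <thread>
#include <vector>

// Half-open range [begin, end) of observations handled by worker `rank`
// when Nobs observations are split as evenly as possible over P workers.
void block_range(int Nobs, int P, int rank, int& begin, int& end)
{
    int base = Nobs / P, extra = Nobs % P;
    // The first `extra` workers take one additional observation each.
    begin = rank * base + std::min(rank, extra);
    end = begin + base + (rank < extra ? 1 : 0);
}

// Assign every observation to its nearest code (squared Euclidean distance),
// splitting the outer loop of tvq across P threads.
void parallel_vq(const double* obs, const double* code_book, int Nobs,
                 int Ncodes, int Nfeatures, int* codes, double* lowest_dist,
                 int P)
{
    std::vector<std::thread> workers;
    for (int rank = 0; rank < P; rank++) {
        int begin, end;
        block_range(Nobs, P, rank, begin, end);
        workers.emplace_back([=] {
            // Each thread writes only its own slice of codes and lowest_dist,
            // so no locking is needed.
            for (int i = begin; i < end; i++) {
                const double* x = obs + i * Nfeatures;
                codes[i] = 0;
                lowest_dist[i] = std::numeric_limits<double>::max();
                for (int c = 0; c < Ncodes; c++) {
                    double d = 0.0;
                    for (int f = 0; f < Nfeatures; f++) {
                        double diff = x[f] - code_book[c * Nfeatures + f];
                        d += diff * diff;
                    }
                    if (d < lowest_dist[i]) { lowest_dist[i] = d; codes[i] = c; }
                }
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every block to finish
}
```

For 12 observations over 5 workers, block_range gives blocks of 3, 3, 2, 2 and 2. In a full k-means run the centroid update would also need the per-cluster sums and counts combined across workers, which in an MPI version is the natural job of MPI_Allreduce.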

Good luck with your project,
eric

------
Dear sir,
I am a student doing a Master's in Computer Science, working on a project
on parallel computing. My instructor wants me to run the K-Means
algorithm on a cluster of 5 nodes and do some sort of performance
analysis of this distributed architecture. Since you were looking for
the source code, could you help me by providing the algorithm in C or
C++ so that it can be run in parallel here? I would be extremely
grateful for your help.
M farhan ul haq
