Re: [Numpy-discussion] (K-Mean) Clustering
![](https://secure.gravatar.com/avatar/edec3a100ce356c1974687b42b16c4e3.jpg?s=120&d=mm&r=g)
Maybe this helps (old code, may contain some suboptimal or otherwise weird things): from Numeric import * from RandomArray import randint import sys def squared_distances(X,Y): return add.outer(sum(X*X,-1),sum(Y*Y,-1))- 2*dot(X,transpose(Y)) def kmeans(data,M, wegstein=0.2, r_convergence=0.001, epsilon=0.001, debug=0, minit=20): """Computes kmeans for DATA with M centers until convergence in the sense that relative change of the quantization error is less than the optional RCONV (3rd param). WEGSTEIN (2nd param), by default .2 but always between 0 and 1, stabilizes the convergence process. EPSILON is used to quarantee centers are initially all different. DEBUG causes some intermediate output to appear to stderr. Returns centers and the average (squared) quantization error. """ N,D=data.shape # Selecting the initial centers has to be done carefully. # We have to ensure all of them are different, otherwise the # algorithm below will produce empty classes. centers=[] if debug: sys.stderr.write("kmeans: Picking centers.\n") while len(centers)<M: # Pick one data item randomly candidate=data[randint(N)] if len(centers)>0: d=minimum.reduce(squared_distances(array(centers), candidate)) else: d=2*epsilon if d>epsilon: centers.append(candidate) if debug: sys.stderr.write("kmeans: Iterating.\n") centers=array(centers) qerror,old_qerror,counter=None,None,0 while (counter<minit or old_qerror==None or (old_qerror-qerror)/qerror>r_convergence): # Initialize # Not like this, you get doubles: centers=take(data,randint(0,N,(M,))) # Iterate: # Squared distances from data to centers (all pairs) distances=squared_distances(data,centers) # Matrix telling which data item is closest to which center x=equal.outer(argmin(distances), arange(centers.shape[0])).astype(Float32) # Compute new centers centers=( ( wegstein)*(dot(transpose(x),data)/sum(x)[...,NewAxis]) + (1.0-wegstein)*centers) # Quantization error old_qerror=qerror qerror=sum(minimum.reduce(distances,1))/N counter=counter+1 if debug: try: sys.stderr.write("%f %f %i\n" %(qerror,old_qerror,counter)) except TypeError: sys.stderr.write("%f None %i\n" %(qerror,counter)) return centers, qerror -- Janne
![](https://secure.gravatar.com/avatar/86776c6c595af5117de5ba7b41bc33b5.jpg?s=120&d=mm&r=g)
Janne Sinkkonen <Janne.Sinkkonen@hut.fi>:
[snip]
Maybe this helps (old code, may contain some suboptimal or otherwise weird things):
Thanks :) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org
![](https://secure.gravatar.com/avatar/86776c6c595af5117de5ba7b41bc33b5.jpg?s=120&d=mm&r=g)
Janne Sinkkonen <Janne.Sinkkonen@hut.fi>:
[snip]
Maybe this helps (old code, may contain some suboptimal or otherwise weird things):
Thanks :) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org
participants (2)
-
Janne Sinkkonen
-
Magnus Lie Hetland