Hi James,

Usually, we run the optimisation several times and keep the solution with the smallest inertia. The technique you use doesn't guarantee that you keep the best solution. There's a full implementation in scikit-learn that performs several runs. You can have a look at the code to see how it works.

Cheers,
N

On 8 Aug 2012 20:53, "James Abel" <j@abel.co> wrote:
Thanks Gael.

BTW, I modified my code to loop until it gets the same clustering twice in a row. This yields more consistent results. I don't know if this is a general solution but it worked for my simple test case. Code below.

James

import sys
import warnings

import numpy as np
from scipy.cluster.vq import kmeans2, whiten

print(sys.version)
vals = np.array((0.0, 0.1, 0.5, 0.6, 1.0, 1.1))
print(vals)
white_vals = whiten(vals)
print(white_vals.shape, white_vals)

# Check for same clustering (compares sorted labels, so this only
# checks that each label occurs equally often in both runs)
def clustering_test(a, b):
    # have to create copies, then sort so we don't modify the original
    ea = a.copy()
    eb = b.copy()
    ea.sort()
    eb.sort()
    r = (ea == eb).all()
    print(a, b, ea, eb, r)
    return r

# try it until we get the same clustering twice in a row
found = False
prior_idx = None
while not found:
    with warnings.catch_warnings():
        # suppress the warning message (happens if it doesn't find the right number of clusters)
        warnings.simplefilter("ignore")
        res, idx = kmeans2(white_vals, 3)  # changing iter doesn't seem to matter
    # print(res, idx)
    if prior_idx is not None:
        eq = clustering_test(idx, prior_idx)
        if eq:
            found = True
    prior_idx = idx
print("result", res, idx)
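
For comparison, the approach suggested in the reply at the top of this thread (run the optimisation several times and keep the solution with the smallest inertia) might look roughly like the sketch below. It is only an illustration, not the scikit-learn implementation: the helper name best_of_n_kmeans, the n_runs parameter, the choice of minit="points", and the reshape of the data into a single column are all assumptions made here. Inertia is computed by hand as the sum of squared distances from each point to its assigned centroid.

```python
import warnings

import numpy as np
from scipy.cluster.vq import kmeans2, whiten


def best_of_n_kmeans(data, k, n_runs=20):
    """Run kmeans2 n_runs times and return (inertia, centroids, labels)
    for the run with the smallest inertia. Hypothetical helper."""
    # Treat 1-D input as M observations of a single feature.
    data = np.asarray(data, dtype=float).reshape(len(data), -1)
    best = None
    for _ in range(n_runs):
        with warnings.catch_warnings():
            # kmeans2 warns when a cluster ends up empty; ignore that here.
            warnings.simplefilter("ignore")
            centroids, labels = kmeans2(data, k, minit="points")
        # Sum of squared distances from each point to its assigned centroid.
        inertia = float(((data - centroids[labels]) ** 2).sum())
        if best is None or inertia < best[0]:
            best = (inertia, centroids, labels)
    return best


vals = np.array([0.0, 0.1, 0.5, 0.6, 1.0, 1.1])
inertia, centroids, labels = best_of_n_kmeans(whiten(vals), 3)
print("inertia:", inertia)
print("labels:", labels)
```

Unlike stopping at the first repeated result, this keeps the lowest-inertia run even if that run only shows up once.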
_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user