Thanks Gael.

BTW, I modified my code to loop until it gets the same clustering twice in a row. This yields more consistent results. I don't know if this is a general solution but it worked for my simple test case. Code below.

James

import sys
import scipy
import warnings
from scipy.cluster.vq import *

print sys.version
vals = scipy.array((0.0,0.1,0.5,0.6,1.0,1.1))
print vals
white_vals = whiten(vals)
print white_vals.shape, white_vals

# Check for same clustering
def clustering_test(a,b):
    # have to create copies, then sort so we don't modify the original
    ea = a.copy()
    eb = b.copy()
    ea.sort()
    eb.sort()
    r = (ea == eb).all()
    print a,b,ea,eb,r
    return r

# try it until we get the same clustering twice in a row
found = False
prior_idx = None
while not found:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore") # suppress the warning message (happens if it doesn't find the right number of clusters)
        res, idx = kmeans2(white_vals, 3) # changing iter doesn't seem to matter
    #print res, idx
    if prior_idx is not None:
        eq = clustering_test(idx, prior_idx)
        #print eq.all()
        if eq:
            found = True
    prior_idx = idx
print "result", res, idx
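One caveat about clustering_test above: two sorted label arrays are equal whenever each label is used the same number of times, even if different points carry it, so the check can accept two runs that grouped the points differently. A minimal sketch of a stricter comparison follows. The helper names canonical_labels and same_clustering are made up for illustration and are not part of scipy. Each label array is renumbered by order of first appearance, so two runs compare equal exactly when they group the points the same way, whatever label numbers they used.

```python
def canonical_labels(labels):
    """Renumber labels by order of first appearance: the first label seen
    becomes 0, the next new one becomes 1, and so on."""
    mapping = {}
    out = []
    for lab in labels:
        lab = int(lab)  # also accepts NumPy integer scalars
        if lab not in mapping:
            mapping[lab] = len(mapping)
        out.append(mapping[lab])
    return out


def same_clustering(a, b):
    """True if a and b put the same points together, ignoring label names."""
    return len(a) == len(b) and canonical_labels(a) == canonical_labels(b)


# Same grouping, labels permuted.
print(same_clustering([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
# Same cluster sizes, different grouping (sorting both arrays would not
# tell these two apart).
print(same_clustering([0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 2, 2]))
```

The functions take any sequence of integer labels, so the idx array returned by kmeans2 can be passed in directly.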
Hi James,

Usually, we run the optimisation several times and take the solution with the smallest inertia. The technique you use doesn't guarantee that you keep the best solution.

There's a full implementation in scikit-learn with several runs. You can have a look at the code to see how it works.

Cheers,
N

On 8 Aug 2012 20:53, "James Abel" <j@abel.co> wrote:
Thanks Gael.

BTW, I modified my code to loop until it gets the same clustering twice in a row. This yields more consistent results. I don’t know if this is a general solution but it worked for my simple test case. Code below.

James

import sys
import scipy
import warnings
from scipy.cluster.vq import *

print sys.version
vals = scipy.array((0.0,0.1,0.5,0.6,1.0,1.1))
print vals
white_vals = whiten(vals)
print white_vals.shape, white_vals

# Check for same clustering
def clustering_test(a,b):
    # have to create copies, then sort so we don't modify the original
    ea = a.copy()
    eb = b.copy()
    ea.sort()
    eb.sort()
    r = (ea == eb).all()
    print a,b,ea,eb,r
    return r

# try it until we get the same clustering twice in a row
found = False
prior_idx = None
while not found:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore") # suppress the warning message (happens if it doesn't find the right number of clusters)
        res, idx = kmeans2(white_vals, 3) # changing iter doesn't seem to matter
    #print res, idx
    if prior_idx is not None:
        eq = clustering_test(idx, prior_idx)
        #print eq.all()
        if eq:
            found = True
    prior_idx = idx
print "result", res, idx
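The approach suggested in the reply, running the optimisation several times and keeping the solution with the smallest inertia, might look roughly like the sketch below. It is written for Python 3 with NumPy and SciPy and is not scikit-learn's implementation. Inertia is taken here to mean the sum of squared distances from each point to its assigned centroid. The helper names inertia and best_of_n_kmeans, the run count and the seed are all assumptions made for illustration. Each run starts from k randomly chosen observations, passed to kmeans2 with minit="matrix".

```python
import numpy as np
from scipy.cluster.vq import kmeans2, whiten


def inertia(data, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(((data - centroids[labels]) ** 2).sum())


def best_of_n_kmeans(data, k, n_runs=50, seed=0):
    """Run kmeans2 n_runs times from random starts; keep the lowest inertia."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_runs):
        # Start from k distinct observations chosen at random.
        start = data[rng.choice(len(data), size=k, replace=False)]
        centroids, labels = kmeans2(data, start, minit="matrix")
        score = inertia(data, centroids, labels)
        if best is None or score < best[2]:
            best = (centroids, labels, score)
    return best


vals = np.array([0.0, 0.1, 0.5, 0.6, 1.0, 1.1])
data = whiten(vals.reshape(-1, 1))  # one column: six 1-D observations
centroids, labels, score = best_of_n_kmeans(data, 3)
print("result", centroids.ravel(), labels, score)
```

Unlike stopping at the first repeated clustering, this compares runs by a quality score, so a poor local optimum that happens to occur twice in a row is not returned unless every run lands on it.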
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
Participants (2): James Abel, Nelle Varoquaux