kmeans2 question/issue
Hi,

I'm trying to use scipy.cluster.vq.kmeans2() but I'm getting inconsistent output. With a simple test input that should have 3 clusters, I'm getting good results most of the time but every so often the output creates the wrong clustering. If anyone could point to what I'm doing wrong I'd appreciate it! Code and sample output below.

Thanks!
James

Code:

    import sys
    import scipy
    from scipy.cluster.vq import *

    print sys.version
    vals = scipy.array((0.0,0.1,0.5,0.6,1.0,1.1))
    print vals
    white_vals = whiten(vals)
    print white_vals.shape, white_vals

    # try it several times to see if we get similar answers
    count = 0
    while count < 5:
        res, idx = kmeans2(white_vals, 3) # changing iter doesn't seem to matter
        print res, idx
        count += 1

Output:

    2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
    [ 0. 0.1 0.5 0.6 1. 1.1]
    (6,) [ 0. 0.24313227 1.21566135 1.45879362 2.4313227 2.67445496]
    [ 0.12156613 2.55288883 1.33722748] [0 0 2 2 1 1]
    [ 0.12156613 2.55288883 1.33722748] [0 0 2 2 1 1]
    [ 1.33722748 2.55288883 0.12156613] [2 2 0 0 1 1]
    [ 2.18819043 0.48626454 -0.97292963] [1 1 1 0 0 0] <-- unexpected result
    [ 0.12156613 2.55288883 1.33722748] [0 0 2 2 1 1]
    C:\PYTHON27\lib\site-packages\scipy\cluster\vq.py:588: UserWarning: One of the clusters is empty. Re-run kmean with a different initialization.
      warnings.warn("One of the clusters is empty. "
On Sun, Aug 05, 2012 at 04:09:06PM -0700, James Abel wrote:
> I'm trying to use scipy.cluster.vq.kmeans2() but I'm getting inconsistent output. With a simple test input that should have 3 clusters, I'm getting good results most of the time but every so often the output creates the wrong clustering.
K-means is a non-convex problem: the algorithm converges to a local minimum that depends on the (random) initialization. In addition, it is not guaranteed to find the 'true' clusters, because quite often that is not possible from the data. You are not doing anything wrong; you are just asking for something that is not possible.

HTH,
Gael
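Since the result depends on the random initialization, one common workaround is to run kmeans2 several times and keep the run with the lowest within-cluster sum of squares. The following is only a minimal sketch of that idea, assuming Python 3 and a reasonably recent SciPy rather than the Python 2.7 setup from the original post. The helper name `best_of_n_kmeans2` and its `n_restarts` parameter are hypothetical and not part of SciPy; `minit="points"` is an existing kmeans2 option that picks the initial centroids from the data points. This does not guarantee the 'true' clusters either; it only makes a poor local minimum less likely.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, whiten


def best_of_n_kmeans2(obs, k, n_restarts=30):
    """Run kmeans2 n_restarts times; return (sse, centroids, labels) of the best run.

    Hypothetical helper for illustration, not a SciPy function.
    """
    best = None
    for _ in range(n_restarts):
        # Each call draws a fresh random initialization from the data points.
        centroids, labels = kmeans2(obs, k, minit="points")
        # Within-cluster sum of squared distances for this run.
        sse = float(np.sum((obs - centroids[labels]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, centroids, labels)
    return best


# Same data as in the original post, reshaped to (n_samples, 1).
vals = np.array([0.0, 0.1, 0.5, 0.6, 1.0, 1.1]).reshape(-1, 1)
white_vals = whiten(vals)

sse, centroids, labels = best_of_n_kmeans2(white_vals, 3)
print(centroids.ravel(), labels, sse)
```

The cluster numbering can still differ from run to run (as in the original output, where `[0 0 2 2 1 1]` and `[2 2 0 0 1 1]` describe the same grouping), so compare groupings rather than raw label values.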
Participants (2): Gael Varoquaux, James Abel