[scikit-learn] Question about Kmeans implementation in sklearn

serafim loukas seralouk at hotmail.com
Mon Aug 5 13:57:01 EDT 2019


Dear Sklearn community,


I have a simple question concerning the implementation of the KMeans clustering algorithm.
Two of its input arguments are “n_init” and “random_state”.

Consider a case where “n_init=10” and “random_state=0”.

By looking at the source code (https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/cluster/k_means_.py#L187), we have the following:

    for it in range(n_init):
        # run a k-means once
        labels, inertia, centers, n_iter_ = kmeans_single(
            X, sample_weight, n_clusters, max_iter=max_iter, init=init,
            verbose=verbose, precompute_distances=precompute_distances,
            tol=tol, x_squared_norms=x_squared_norms,
            random_state=random_state)


My question is: why are the results not the same for all `n_init` iterations, given that `random_state` is fixed?
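
To make the question concrete, here is a minimal sketch of the two behaviours I can imagine. It assumes (this is not visible in the snippet above) that the integer `random_state` may be converted once, before the loop, into a single `numpy.random.RandomState` object, for example via `sklearn.utils.check_random_state`. The function `pick_initial_centers` is a hypothetical stand-in for the initialization step of one k-means run, not the actual sklearn code.

```python
import numpy as np


def pick_initial_centers(X, n_clusters, random_state):
    """Hypothetical stand-in for the init step of a single k-means run:
    draw n_clusters distinct row indices of X using the given generator."""
    return random_state.choice(X.shape[0], size=n_clusters, replace=False)


X = np.arange(200, dtype=float).reshape(100, 2)
n_init = 10

# Case 1 (assumed to mirror the loop above): one generator object is
# created once from the seed and reused, so its internal state advances
# with every draw and each iteration sees different random numbers.
shared_rng = np.random.RandomState(0)
shared_draws = [tuple(pick_initial_centers(X, 3, shared_rng))
                for _ in range(n_init)]

# Case 2 (what I naively expected): the generator is re-created from the
# same integer seed on every iteration, so every draw is identical.
reseeded_draws = [tuple(pick_initial_centers(X, 3, np.random.RandomState(0)))
                  for _ in range(n_init)]

print("distinct initializations, shared generator:", len(set(shared_draws)))
print("distinct initializations, reseeded each time:", len(set(reseeded_draws)))
```

If the first case is what happens, the `n_init` runs would differ from each other, while the overall result would still be reproducible from one call to the next because the whole sequence is determined by the seed.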


Best,
Makis