[scikit-learn] fit before partial_fit ?

Sun Jun 9 22:10:53 EDT 2019

federico vaggi <vaggi.federico at gmail.com> wrote on Fri, Jun 7, 2019 at 1:08 AM:

> k-means isn't a convex problem; unless you freeze the initialization, you
> are going to get very different solutions (depending on the dataset) with
> different initializations.
>
>
Nope, I set random_state=0. You can try it yourself.
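
For reference, here is a minimal sketch of the point made in the quote. It assumes synthetic data from a seeded NumPy generator, which does not come from the thread. With `n_init=1`, each `random_state` gives exactly one initialization, so different seeds can converge to different local optima (compare `inertia_`). Repeating the same seed reproduces the same fit.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): 200 points in 2-D.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))

# One initialization per seed, so the seed alone decides the starting centers.
inertias = {}
for seed in range(5):
    km = KMeans(n_clusters=4, n_init=1, random_state=seed).fit(X)
    inertias[seed] = km.inertia_
    print(f"random_state={seed}: inertia={km.inertia_:.3f}")

# Refitting with the same seed on the same data reproduces the same solution.
a = KMeans(n_clusters=4, n_init=1, random_state=0).fit(X)
b = KMeans(n_clusters=4, n_init=1, random_state=0).fit(X)
print("same centers with same seed:", np.allclose(a.cluster_centers_, b.cluster_centers_))
```

Fixing `random_state` removes run-to-run randomness, but it does not make the solution independent of the data the model is initialized on.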

>>> import numpy as np
>>> from sklearn.cluster import MiniBatchKMeans as MBK
>>> x = np.array([[1, 2], [2, 3]])
>>> y = np.array([[3, 4], [4, 5], [5, 6]])
>>> z = np.append(x, y, axis=0)
>>> m = MBK(random_state=0, n_clusters=2)
>>> m.fit(x).labels_
array([1, 0], dtype=int32)  <-- (1-a)
>>> m.partial_fit(y).labels_
array([0, 0, 0], dtype=int32)  <-- (1-b)

>>> m = MBK(random_state=0, n_clusters=2)
>>> m.partial_fit(x).labels_
array([0, 1], dtype=int32)  <-- (2-a)
>>> m.partial_fit(y).labels_
array([1, 1, 1], dtype=int32)  <-- (2-b)
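
The second run can also be written as a self-contained script. The sketch below makes one scope detail visible. After `partial_fit(y)`, `labels_` has one entry per row of `y`, because it describes only the batch passed to the most recent call. To get a label for every point seen so far, including `x`, you can call `predict(z)`. `n_init=3` is an addition not in the original. It is set explicitly so the script does not depend on a default that has changed across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

x = np.array([[1, 2], [2, 3]])
y = np.array([[3, 4], [4, 5], [5, 6]])
z = np.append(x, y, axis=0)  # all five points seen so far

m = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)
m.partial_fit(x)  # first call: initializes the centers from this batch
m.partial_fit(y)  # later calls: update the existing centers

# labels_ only covers the batch passed to the most recent call (y here)...
print(m.labels_.shape)

# ...so to label every point seen so far, assign them to the current centers.
labels_all = m.predict(z)
print(labels_all)
```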

The results (1-a)/(1-b) and (2-a)/(2-b) all differ, especially in which
points end up in each cluster.
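
k-means cluster IDs are arbitrary integers. Two runs can therefore group the points identically while assigning swapped label numbers. One way to compare membership rather than label values is `sklearn.metrics.adjusted_rand_score`, which returns 1.0 when two labelings induce the same partition. The sketch below applies it to the arrays printed in the session. For those outputs, both pairs score 1.0, so they differ only in which integer each group received.

```python
from sklearn.metrics import adjusted_rand_score

# Labels exactly as printed in the session.
run1_a, run2_a = [1, 0], [0, 1]        # (1-a) vs (2-a): labels for x
run1_b, run2_b = [0, 0, 0], [1, 1, 1]  # (1-b) vs (2-b): labels for y

# ARI compares which points are grouped together, ignoring label IDs.
ari_a = adjusted_rand_score(run1_a, run2_a)
ari_b = adjusted_rand_score(run1_b, run2_b)
print(ari_a, ari_b)
```
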
I'm just confused: what is the suitable (reasonable?) way to use fit and
partial_fit to cluster data incrementally?
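
One common pattern is sketched here under several assumptions: the data are synthetic, and `iter_batches` is a hypothetical helper, not a scikit-learn function. The pattern feeds every batch, including the first, to `partial_fit` rather than `fit`, because `fit` starts over from a fresh initialization. It then calls `predict` on whichever points need labels.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def iter_batches(X, batch_size):
    """Hypothetical helper: yield consecutive row chunks of X."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]


# Synthetic stream (an assumption): three Gaussian blobs, rows shuffled.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
rng.shuffle(X)

model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)
for batch in iter_batches(X, batch_size=50):
    # Every batch goes through partial_fit; calling fit() here instead
    # would discard the centers learned from the earlier batches.
    model.partial_fit(batch)

# Label all points against the final centers.
labels = model.predict(X)
print(model.cluster_centers_)
print(np.bincount(labels))
```

The centers keep moving as new batches arrive, so a label assigned to a point earlier can change later. Calling `predict` after the last batch gives labels that are consistent with the final centers.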

thx