fit before partial_fit?
I tried MiniBatchKMeans with two orderings of calls:
fit -> partial_fit
partial_fit -> partial_fit
The clustering results are different. What is the difference between them?
k-means isn't a convex problem: unless you freeze the initialization, you are going to get very different solutions (depending on the dataset) with different initializations.
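[Editor's note: as an aside on "freezing the initialization", one way to make runs comparable is to pass an explicit init array so every run starts from the same centroids. A minimal sketch; the data and starting centers below are made up for illustration:]

import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], dtype=float)
init_centers = np.array([[1.0, 2.0], [5.0, 6.0]])  # made-up fixed starting centroids

# With a fixed init (and a fixed random_state, since mini-batch sampling
# is also randomized), repeated runs converge to the same centers.
m1 = MiniBatchKMeans(n_clusters=2, init=init_centers, n_init=1, random_state=0).fit(X)
m2 = MiniBatchKMeans(n_clusters=2, init=init_centers, n_init=1, random_state=0).fit(X)
print(np.allclose(m1.cluster_centers_, m2.cluster_centers_))  # True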
federico vaggi <vaggi.federico@gmail.com> wrote on Friday, June 7, 2019 at 1:08 AM:
k-means isn't a convex problem: unless you freeze the initialization, you are going to get very different solutions (depending on the dataset) with different initializations.
Nope, I specified random_state=0. You can try it:
x = np.array([[1, 2], [2, 3]])
y = np.array([[3, 4], [4, 5], [5, 6]])
z = np.append(x, y, axis=0)

from sklearn.cluster import MiniBatchKMeans as MBK

m = MBK(random_state=0, n_clusters=2)
m.fit(x); m.labels_
array([1, 0], dtype=int32)        <-- (1-a)
m.partial_fit(y); m.labels_
array([0, 0, 0], dtype=int32)     <-- (1-b)
m = MBK(random_state=0, n_clusters=2)
m.partial_fit(x); m.labels_
array([0, 1], dtype=int32)        <-- (2-a)
m.partial_fit(y); m.labels_
array([1, 1, 1], dtype=int32)     <-- (2-b)
(1-a), (1-b) and (2-a), (2-b) are all different, especially the members of each cluster. I'm just confused about which usage of fit and partial_fit is the suitable (reasonable?) way to cluster incrementally. Thanks
The clusters produced by your examples are actually the same, despite the different labels. I'd guess that fit and partial_fit draw a different number of random numbers before actually assigning a label to the first (randomly drawn) sample from x in your code. This is why the labeling is permuted.

Best regards
Christian
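[Editor's note: Christian's point can be checked mechanically with a label-permutation-invariant score such as the adjusted Rand index, which treats relabeled versions of the same partition as identical. A minimal sketch using the labels reported above, concatenating the (a) and (b) outputs of each order purely for illustration:]

import numpy as np
from sklearn.metrics import adjusted_rand_score

labels_order1 = np.array([1, 0, 0, 0, 0])  # (1-a) + (1-b): fit then partial_fit
labels_order2 = np.array([0, 1, 1, 1, 1])  # (2-a) + (2-b): partial_fit twice

# ARI is invariant to relabeling: 1.0 means the two partitions are identical.
print(adjusted_rand_score(labels_order1, labels_order2))  # 1.0 -> same clusters, labels swapped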
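[Editor's note: on the recurring question of how to cluster incrementally, the usual pattern is to construct the estimator once and call partial_fit on each incoming batch; fit, by contrast, runs the full mini-batch optimization from a fresh initialization, so mixing the two restarts the model. A minimal sketch with made-up streaming batches:]

import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Made-up stream: two batches drawn around different centers.
batches = [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 5.0)]

m = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=1)
for batch in batches:
    m.partial_fit(batch)  # one incremental update per batch; no re-initialization

print(m.cluster_centers_)          # centers after seeing all batches
print(m.predict(batches[-1][:3]))  # assign labels to new points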
participants (3)
- Christian Braune
- federico vaggi
- lampahome