Hello Sema,
as far as I can tell, your dataset has n_samples=65909 and n_features=539. Clustering high-dimensional data is problematic for a number of reasons, see https://en.wikipedia.org/wiki/Clustering_high-dimensional_data#Problems
Besides, the BIRCH implementation doesn't scale well for n_features >> 50 (see for instance the discussion in the second part of https://github.com/scikit-learn/scikit-learn/pull/8808#issuecomment-300776216).
As a workaround for the memory error, you could try the out-of-core version of Birch (calling `partial_fit` on chunks of the dataset instead of `fit`), but in any case it might be better to reduce dimensionality beforehand (e.g. with PCA), if that's acceptable. The threshold parameter may also need to be increased, since in your dataset the Euclidean distances look to be more in the 1-10 range.
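A minimal sketch of that workaround, assuming synthetic data as a stand-in for data.dat. The array shape, `n_components=20`, `threshold=1.0` and `chunk_size=500` are illustrative guesses, not values tuned for your dataset:

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Synthetic stand-in for the real data (assumption: far smaller than 65909 x 539)
X = rng.rand(2000, 100)

# Reduce dimensionality first; 20 components is an arbitrary illustrative choice
X_reduced = PCA(n_components=20, random_state=0).fit_transform(X)

# n_clusters=None skips the final global clustering step and keeps the raw
# subclusters; threshold=1.0 is an assumption and should be tuned to the
# scale of the distances in the real data
birch = Birch(threshold=1.0, n_clusters=None)

# Feed the data chunk by chunk instead of calling fit() on everything at once
chunk_size = 500
for start in range(0, X_reduced.shape[0], chunk_size):
    birch.partial_fit(X_reduced[start:start + chunk_size])

# Assign every sample to its nearest subcluster
labels = birch.predict(X_reduced)
```

Here the whole array still sits in memory; for a dataset that really does not fit, the chunks would be read from disk one at a time, and `IncrementalPCA` (which also has `partial_fit`) could replace `PCA`.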
--
Roman
On 03/07/17 17:09, Sema Atasever wrote:
Dear Roman,
When I try the code with the original data (*data.dat*) as you
suggested, I get the following error: *Memory Error* --> (*error.png*).
How can I overcome this problem? Thank you so much in advance.
data.dat
<https://drive.google.com/file/d/0B4rY6f4kvHeCYlpZOURKNnR0Q1k/view?usp=drive_web>
On Fri, Jun 30, 2017 at 5:42 PM, Roman Yurchak <rth.yurchak@gmail.com> wrote:
Hello Sema,
On 30/06/17 17:14, Sema Atasever wrote:
I want to cluster them using Birch clustering algorithm.
Does this method have a 'precomputed' option?
No it doesn't, see
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html
so you would need to provide it with the original features matrix
(not the precomputed distance matrix). Since your dataset is fairly
small, there is no reason to precompute it anyway.
I need to train an SVM on the centroids of the microclusters, so
*how can I get the centroids of the microclusters?*
By "microclusters" do you mean sub-clusters? If you are interested
in the leaf subclusters, see the Birch.subcluster_centers_ attribute.
Otherwise if you want all the centroids in the hierarchy of
subclusters, you can browse the hierarchical tree via the
Birch.root_ attribute then look at _CFSubcluster.centroid_ for each
subcluster.
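A rough sketch of both options, assuming synthetic data and illustrative parameter values. Note that `subclusters_`, `child_` and `centroid_` on the tree nodes are private implementation details of scikit-learn and may change between versions; `all_centroids` is a hypothetical helper written for this example:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
X = rng.rand(500, 5)  # synthetic data, for illustration only

# threshold and branching_factor are arbitrary illustrative values
birch = Birch(threshold=0.3, branching_factor=10, n_clusters=None).fit(X)

# Option 1: centroids of the leaf subclusters (public attribute)
leaf_centroids = birch.subcluster_centers_


def all_centroids(node):
    """Collect the centroid of every subcluster below `node`, depth-first."""
    out = []
    for sub in node.subclusters_:
        out.append(sub.centroid_)
        # Subclusters of non-leaf nodes point to a child node; leaves have None
        if sub.child_ is not None:
            out.extend(all_centroids(sub.child_))
    return out


# Option 2: centroids at every level of the hierarchy (relies on private API)
hierarchy_centroids = np.array(all_centroids(birch.root_))
```

Either array can then be passed as the feature matrix to whatever estimator is trained on the centroids.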
Hope this helps,
--
Roman
_______________________________________________
scikit-learn mailing list
https://mail.python.org/mailman/listinfo/scikit-learn