Agglomerative Clustering without knowing number of clusters
I want to perform agglomerative clustering, but I do not know the number of clusters beforehand. I do want every cluster to contain at least 40 data points. How can I do this with sklearn's AgglomerativeClustering? Should I build a dendrogram and cut it somehow? I do not know how to relate the dendrogram to this requirement, or how to cut it. Any help will be appreciated!
(In reply to Ariani A's message of 06/30.)
This sounds like a problem that may be more amenable to either DBSCAN or OPTICS. Neither algorithm requires a priori knowledge of the number of clusters, and both let you specify a minimum number of points for cluster membership. The OPTICS algorithm will also produce a dendrogram that you can cut for sub-clusters if need be.
DBSCAN is part of the stable release and has been for some time; OPTICS is pending as a pull request, but it's stable and you can try it if you like:
https://github.com/scikit-learn/scikit-learn/pull/1984
Cheers, Shane
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
-- *PhD candidate & Research Assistant* *Cooperative Institute for Research in Environmental Sciences (CIRES)* *University of Colorado at Boulder*
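As a rough illustration of the DBSCAN suggestion above, the sketch below runs `sklearn.cluster.DBSCAN` with `min_samples=40` on synthetic data. The data set and the `eps` value are assumptions made up for this example, not values from the thread. Note that `min_samples` is a density threshold for core points rather than a hard lower bound on cluster size, so it is worth checking the sizes afterwards.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data: three well-separated blobs (assumption).
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.8, random_state=0)

# min_samples: neighbours (the point itself included) needed for a core point.
# eps: neighbourhood radius; 1.5 is picked for this toy data and must be tuned.
db = DBSCAN(eps=1.5, min_samples=40).fit(X)
labels = db.labels_                      # label -1 marks noise points

cluster_ids = sorted(set(labels) - {-1})
sizes = {int(k): int(np.sum(labels == k)) for k in cluster_ids}
print("clusters found:", len(cluster_ids))
print("cluster sizes:", sizes)
print("noise points:", int(np.sum(labels == -1)))
```

The number of clusters is never passed in; it falls out of `eps` and `min_samples`.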
(In reply to Shane Grigsby's message of Thu, Jul 6, 2017 at 12:32 PM.)
Dear Shane,
Thanks for your time. However, I have to implement this with agglomerative clustering and cut the tree at the point where each cluster has at least 40 data points, and I am not sure how to make that cut. My guess is that it can be done by cutting the dendrogram. Is that correct? If so, I do not know how to apply it. Could you give me a pointer?
Best, Ariani
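One possible way to cut a dendrogram under a minimum-size constraint is sketched below. It uses SciPy's `scipy.cluster.hierarchy` (`linkage`, `fcluster`) rather than `sklearn.cluster.AgglomerativeClustering`, and the helper `cut_with_min_size`, the synthetic data, and the choice of average linkage are all assumptions for illustration. The heuristic tries horizontal cuts from the finest feasible one downwards and keeps the first cut in which every cluster has at least 40 members.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cut_with_min_size(Z, n_points, min_size=40):
    """Hypothetical helper: labels of the finest horizontal cut of linkage Z
    whose clusters all contain at least `min_size` points."""
    if n_points < min_size:
        raise ValueError("fewer points than min_size")
    # More than n_points // min_size clusters cannot all be large enough.
    for k in range(n_points // min_size, 0, -1):
        labels = fcluster(Z, t=k, criterion="maxclust")
        _, counts = np.unique(labels, return_counts=True)
        if counts.min() >= min_size:
            return labels
    return np.ones(n_points, dtype=int)  # safety net; k=1 already qualifies


# Synthetic stand-in data: three blobs of 150 points each (assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(150, 2)) for c in centers])

# linkage expects condensed distances; for a precomputed square matrix D,
# scipy.spatial.distance.squareform(D) gives the same condensed form.
condensed = pdist(X)
Z = linkage(condensed, method="average")

labels = cut_with_min_size(Z, n_points=len(X), min_size=40)
_, counts = np.unique(labels, return_counts=True)
print("clusters:", len(counts), "sizes:", counts.tolist())
```

A single outlier that merges late can force a very coarse cut with this heuristic, so it may help to inspect the resulting sizes, or to treat tiny late-merging branches as noise before cutting.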
(In reply to Shane Grigsby's message of Thu, Jul 6, 2017 at 12:32 PM.)
Dear Shane,
Thanks for your answer. Does DBSCAN work with a distance matrix? I have a symmetric matrix that contains the pairwise distances. Can you help me? I did not find the DBSCAN code at that link.
Best, -Ariani
(In reply to Ariani A's message of 07/13.)
Hi Ariani,
Yes, you can use a distance matrix. I think that what you want is metric='precomputed', and then X would be your N by N distance matrix.
Hope that helps, ~Shane
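A minimal sketch of the `metric='precomputed'` usage described above follows. The points, the `eps` value, and the use of `sklearn.metrics.pairwise_distances` to build the matrix are assumptions for the example; in practice `D` would be the existing N by N distance matrix. DBSCAN still needs `eps` and `min_samples`, and with a precomputed matrix `eps` is expressed in the same units as the entries of `D`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Synthetic points, used here only to build an example distance matrix.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.8, size=(200, 2)) for c in centers])

# N x N symmetric matrix of pairwise distances with a zero diagonal.
D = pairwise_distances(points)

# metric="precomputed" tells DBSCAN that the input is already a distance matrix.
db = DBSCAN(eps=1.5, min_samples=40, metric="precomputed")
labels = db.fit_predict(D)               # label -1 marks noise points

n_clusters = len(set(labels) - {-1})
print("distance matrix shape:", D.shape)
print("clusters found:", n_clusters)
print("noise points:", int(np.sum(labels == -1)))
```

No modification of the library code is needed; the class is imported from `sklearn.cluster` and called as shown.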
(In reply to Shane Grigsby's message of Thu, Jul 13, 2017 at 5:38 PM.)
Dear Shane,
Thanks for your prompt answer. Do you mean that DBSCAN needs no other parameters? Do I just call the function, or do I have to modify the code?
P.S. I was not able to find the DBSCAN code on GitHub.
Looking forward to hearing from you.
Best, -Noushin
(Follow-up to Ariani A's message of Thu, Jul 13, 2017 at 7:03 PM.)
Dear Shane,
Sorry to bother you! Do the "precomputed" option and the "distance matrix" you mentioned apply to DBSCAN?
Thanks, Best.
participants (2): Ariani A, Shane Grigsby