Can you use nearest neighbors with a KD tree to build a sparse distance matrix, in which the distances to all but the nearest neighbors of a point are treated as (near-)infinite? Yes, this again introduces an additional parameter (the neighborhood size), just as BIRCH has its threshold. I suspect you cannot do better than accepting another, approximating parameter.

You do not need to set n_clusters to a fixed value for BIRCH. You only need to provide another clusterer, which has its own parameters, although you should be able to experiment with different "global clusterers".

On 4 January 2018 at 11:04, Shiheng Duan <shiduan@ucdavis.edu> wrote:
Yes, it is an efficient method, but we still need to specify the number of clusters or the threshold. Is there another way to run hierarchical clustering on a big dataset? The main problem is the distance matrix. Thanks.
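For reference, the sparse nearest-neighbor idea raised in the reply at the top of this thread could look roughly like the sketch below. It is only an illustration under assumptions: the toy data, n_neighbors=10, Ward linkage and n_clusters=2 are all made up here, and nothing in the thread says this is the recommended setup. The point is that only n_samples * n_neighbors distances are ever stored, rather than the full n_samples x n_samples matrix.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import AgglomerativeClustering

# Toy data (assumption): two well-separated 2-D blobs of 200 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(10.0, 0.5, size=(200, 2)),
])

n_neighbors = 10  # the extra "neighborhood size" parameter (assumed value)

# KD-tree backed neighbor search. Calling kneighbors_graph() without X
# queries the training points and does not count a point as its own neighbor.
nn = NearestNeighbors(n_neighbors=n_neighbors, algorithm="kd_tree").fit(X)
knn_distances = nn.kneighbors_graph(mode="distance")          # sparse CSR matrix
knn_connectivity = nn.kneighbors_graph(mode="connectivity")   # sparse 0/1 graph

# Hierarchical clustering restricted to merges along the kNN graph.
# If the graph has several connected components, scikit-learn warns and
# adds edges to connect them before building the tree.
model = AgglomerativeClustering(
    n_clusters=2, linkage="ward", connectivity=knn_connectivity
)
labels = model.fit_predict(X)

print(knn_distances.shape, knn_distances.nnz)
```

Pairs missing from the sparse graph are simply never considered for merging, which plays the role of the "(near-)infinite" distances.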
On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <olivier.grisel@ensta.org> wrote:
Have you had a look at BIRCH?
http://scikit-learn.org/stable/modules/clustering.html#birch
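A minimal sketch of what using BIRCH might look like, both without a fixed number of clusters and with a separate "global clusterer" passed in. The toy data, threshold=0.5, and the choice of AgglomerativeClustering with average linkage are assumptions for illustration only, not settings suggested anywhere in this thread.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch

# Toy data (assumption): two well-separated 2-D blobs of 200 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(10.0, 0.5, size=(200, 2)),
])

# n_clusters=None skips the final global clustering step, so the labels
# refer directly to the subclusters found with the given threshold.
birch_raw = Birch(threshold=0.5, n_clusters=None).fit(X)
n_subclusters = len(birch_raw.subcluster_centers_)

# Alternatively, pass another clusterer. It is fitted on the subcluster
# centroids only, so it never sees a full pairwise distance matrix of X.
global_clusterer = AgglomerativeClustering(n_clusters=2, linkage="average")
birch = Birch(threshold=0.5, n_clusters=global_clusterer).fit(X)
labels = birch.labels_

print(n_subclusters, np.unique(labels))
```

The global clusterer still has its own parameters (here n_clusters=2), so this shifts the choice rather than removing it.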
-- Olivier
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn