[scikit-learn] How to get centroids from SciPy's hierarchical agglomerative clustering?

Sebastian Raschka se.raschka at gmail.com
Fri Oct 20 13:08:40 EDT 2017


Independent of the implementation, and unless you use the 'centroid' or 'average' linkage method, cluster centroids don't need to be computed when performing agglomerative hierarchical clustering. But you can always compute them manually by simply averaging all samples in a cluster (for each feature).
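
As a rough sketch of that manual averaging (not something SciPy returns from linkage or fcluster), one could do something like the following. The random matrix D is only a stand-in for the 1000 x 22 data set, and the names labels, centroids and closest are made up for illustration:

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import cdist

# Assumption: synthetic stand-in for the 1000 x 22 feature matrix; the real
# data would be loaded from the file as in the quoted message below.
rng = np.random.default_rng(0)
D = rng.normal(size=(1000, 22))

Y = hierarchy.linkage(D, "ward")
assignments = hierarchy.fcluster(Y, 5, criterion="maxclust")

# One row per flat cluster: the per-feature mean of that cluster's samples.
labels = np.unique(assignments)
centroids = np.vstack([D[assignments == k].mean(axis=0) for k in labels])

# Row index (into D) of the sample closest to each centroid (Euclidean).
closest = {}
for k, c in zip(labels, centroids):
    members = np.flatnonzero(assignments == k)
    dists = cdist(D[members], c[None, :]).ravel()
    closest[k] = members[np.argmin(dists)]
```

Each row of centroids is then the 1 x 22 centroid of one flat cluster. To pick a few representatives per cluster instead of one, members[np.argsort(dists)[:n]] should give the n nearest samples.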

Best.
Sebastian

> On Oct 20, 2017, at 9:13 AM, Sema Atasever <s.atasever at gmail.com> wrote:
> 
> Dear scikit-learn members,
> 
> I am using SciPy's hierarchical agglomerative clustering methods to cluster a 
> 1000 x 22 matrix of features. After clustering my data set with scipy.cluster.hierarchy.linkage and assigning each sample to a cluster,
> I can't seem to figure out how to get the centroid from the resulting clusters. 
> I would like to extract one or a few elements from each cluster that are closest to that cluster's centroid.
> 
> Below follows my code:
> 
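> import numpy as np
> from scipy.cluster import hierarchy
> 
> # Raw string so the backslash in the Windows path is not read as an escape
> D = np.loadtxt(open(r"C:\dataset.txt", "rb"), delimiter=";")
> Y = hierarchy.linkage(D, 'ward')
> assignments = hierarchy.fcluster(Y, 5, criterion="maxclust")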
> 
> I am taking my matrix of features, computing the Euclidean distance between them, and then passing them on to the hierarchical clustering method. From there, I am creating flat clusters, with a maximum of 5 clusters.
> 
> Now, based on the flat cluster assignments, how do I get the 1 x 22 centroid that represents each flat cluster?
> 
> Best.
> <SciPy_python_codes.py> <dataset.txt> <assignments.out>


