[scikit-learn] hierarchical clustering

Roman Yurchak rth.yurchak at gmail.com
Fri Nov 4 05:28:13 EDT 2016


Hi Jaime,

Alternatively, in scikit-learn I think you could use
   hac = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
   hac.fit(data)
   clusters = hac.labels_
There is an example of how to plot a dendrogram from this in
   https://github.com/scikit-learn/scikit-learn/pull/3464
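For reference, here is a minimal sketch of that dendrogram approach. It assumes a scikit-learn version that supports compute_distances=True (0.24 or later); the variable names (hac, Z, counts) are just illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
data = rng.normal(size=(30, 4))  # placeholder data

# distance_threshold=0 with n_clusters=None makes the model build the
# full tree; compute_distances=True stores the merge distances
hac = AgglomerativeClustering(n_clusters=None, distance_threshold=0,
                              linkage="ward", compute_distances=True)
hac.fit(data)

# Rebuild a scipy-style linkage matrix: [child_a, child_b, distance, count]
n_samples = len(hac.labels_)
counts = np.zeros(hac.children_.shape[0])
for i, (a, b) in enumerate(hac.children_):
    counts[i] = sum(1 if c < n_samples else counts[c - n_samples]
                    for c in (a, b))

Z = np.column_stack([hac.children_, hac.distances_, counts]).astype(float)
dendrogram(Z, no_plot=True)  # drop no_plot=True (with matplotlib) to draw
```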

AgglomerativeClustering internally calls scikit-learn's version of
cut_tree. I would be curious to know whether this is equivalent to
scipy's fcluster.
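A quick empirical check of that equivalence (a sketch, not a proof: it only compares the two partitions, up to label permutation, on well-separated toy data where ties should not matter):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# three well-separated 2-D blobs of 20 points each
data = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0, 5, 10)])

# scipy: full linkage matrix, then cut into 3 clusters
Z = linkage(data, 'ward')
scipy_labels = fcluster(Z, 3, criterion='maxclust')

# scikit-learn: ask for 3 clusters directly
sklearn_labels = AgglomerativeClustering(
    n_clusters=3, linkage='ward').fit_predict(data)

# compare partitions as sets of index sets (label values may differ)
def partition(labels):
    return {frozenset(np.flatnonzero(labels == l))
            for l in np.unique(labels)}

print(partition(scipy_labels) == partition(sklearn_labels))
```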

Roman

On 03/11/16 23:12, Jaime Lopez Carvajal wrote:
> Hi Juan,
> 
> The fcluster function was what I needed. I can now proceed from here to
> classify images.
> Thank you very much, 
> 
> Jaime
> 
> On Thu, Nov 3, 2016 at 5:00 PM, Juan Nunez-Iglesias
> <jni.soma at gmail.com> wrote:
> 
>     Hi Jaime,
> 
>     From /Elegant SciPy/:
> 
>     """
>     The *fcluster* function takes a linkage matrix, as returned by
>     linkage, and a threshold, and returns cluster identities. It's
>     difficult to know a-priori what the threshold should be, but we can
>     obtain the appropriate threshold for a fixed number of clusters by
>     checking the distances in the linkage matrix.
> 
>     from scipy.cluster.hierarchy import fcluster
>     n_clusters = 3
>     threshold_distance = (Z[-n_clusters, 2] + Z[-n_clusters+1, 2]) / 2
>     clusters = fcluster(Z, threshold_distance, 'distance')
> 
>     """
> 
>     As an aside, I imagine this question is better placed in the SciPy
>     mailing list than scikit-learn (which has its own hierarchical
>     clustering API).
> 
>     Juan.
> 
>     On Fri, Nov 4, 2016 at 2:16 AM, Jaime Lopez Carvajal
>     <jalopcar at gmail.com> wrote:
> 
>         Hi there,
> 
>         I am trying to do image classification using hierarchical
>         clustering.
>         So, I have my data, and apply these steps:
> 
>         import numpy as np
>         import matplotlib.pyplot as plt
>         from scipy.cluster.hierarchy import dendrogram, linkage
> 
>         data1 = np.array(data)
>         Z = linkage(data1, 'ward')
>         dendrogram(Z, truncate_mode='lastp', p=12,
>                    show_leaf_counts=False, leaf_rotation=90.,
>                    leaf_font_size=12., show_contracted=True)
>         plt.show()
> 
>         So, I can see the dendrogram with 12 clusters as I want, but I
>         don't know how to use this to classify the image.
>         I also understand that the function cluster.hierarchy.cut_tree(Z,
>         n_clusters) cuts the tree at that number of clusters, but
>         again I don't know how to proceed from there. I would like to
>         have something like: cluster = predict(Z, instance)
> 
>         Any advice or direction would be really appreciated,
> 
>         Thanks in advance, Jaime
> 
> 
>         -- 
>         Jaime Lopez Carvajal
> 
>         _______________________________________________
>         scikit-learn mailing list
>         scikit-learn at python.org
>         https://mail.python.org/mailman/listinfo/scikit-learn
> 
> 
> 
> 
> 
> 
> 
> -- 
> Jaime Lopez Carvajal
> 
> 
> 


