[scikit-learn] Inconsistencies in clustering documentations

Andreas Mueller t3kcit at gmail.com
Wed May 23 12:09:41 EDT 2018


+1 for a PR on fit_predict docs. This is probably due to the inheritance 
structure.
Though it's weird that DBSCAN has the correct docs.
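
A quick way to check the inheritance explanation is to ask which class actually defines fit_predict for each estimator. The sketch below is only illustrative: defining_class is a hypothetical helper written for this example (not part of scikit-learn), and what it prints depends on the installed scikit-learn version.

```python
from sklearn.base import ClusterMixin
from sklearn.cluster import AffinityPropagation, AgglomerativeClustering, DBSCAN


def defining_class(estimator_cls, method_name="fit_predict"):
    """Return the first class in the MRO whose own namespace defines the method.

    Hypothetical helper for this example; returns None if no class defines it.
    """
    for klass in estimator_cls.__mro__:
        if method_name in vars(klass):
            return klass
    return None


# A method defined on ClusterMixin carries the generic mixin docstring;
# a method defined on the estimator itself can document X specifically.
for cls in (DBSCAN, AffinityPropagation, AgglomerativeClustering):
    owner = defining_class(cls)
    print(f"{cls.__name__}.fit_predict is defined on {owner.__name__}")
```

An estimator reported as using the ClusterMixin version shows the generic "Input data" description for X, which would explain why DBSCAN, with its own fit_predict, is documented differently.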

I'm not sure about renaming affinity, but we can discuss that. I agree 
it's misleading.
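
To make the distance/similarity distinction concrete, here is a minimal sketch of the two precomputed conventions. The toy data and the eps / min_samples values are made up for illustration, and random_state assumes a scikit-learn version where AffinityPropagation accepts it. AgglomerativeClustering is left out because the name of its distance parameter differs across scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, DBSCAN
from sklearn.metrics.pairwise import euclidean_distances

# Toy data (made up for illustration): two tight, well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])

# DBSCAN expects a square matrix of *distances* when metric="precomputed".
D = euclidean_distances(X)
db_feat = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
db_pre = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)

# AffinityPropagation expects *similarities* when affinity="precomputed".
# The "euclidean" affinity is the negative squared euclidean distance,
# so passing that matrix explicitly should reproduce the same clustering.
S = -euclidean_distances(X, squared=True)
ap_feat = AffinityPropagation(affinity="euclidean", random_state=0).fit_predict(X)
ap_pre = AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(S)

print("DBSCAN, features vs. precomputed distances agree:",
      np.array_equal(db_feat, db_pre))
print("AffinityPropagation, features vs. precomputed similarities agree:",
      np.array_equal(ap_feat, ap_pre))
```

Passing D where S is expected (or the reverse) does not raise an error, which is why the docs for X need to say which kind of matrix each estimator wants.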


On 5/23/18 8:01 AM, Tom DLT wrote:
> Hi Anaël,
>
> Thanks for spotting these inconsistencies.
> You are very welcome to open pull-requests and/or issues on the GitHub 
> tracker (cf. 
> http://scikit-learn.org/stable/developers/contributing.html#contributing-code)
> The documentation issue should be straightforward.
> The parameter renaming would need a proper deprecation cycle (cf 
> http://scikit-learn.org/stable/developers/contributing.html#deprecation).
>
> See you on GitHub,
>
> Tom
>
> 2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaugnon at ssi.gouv.fr>:
>
>     Dear all,
>
>     Three clustering algorithms can take as input distance or
>     similarity matrices instead of the observations
>     (AgglomerativeClustering
>     <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>,
>     AffinityPropagation
>     <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>,
>     and DBSCAN
>     <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>),
>     but their documentation is inconsistent.
>
>
>     *DBSCAN:*
>        The documentation explains clearly how to run DBSCAN with a
>        precomputed distance matrix.
>        Constructor:
>            metric: If metric is “precomputed”, X is assumed to be a
>            distance matrix and must be square.
>        fit / fit_predict:
>            X: A feature array, or array of distances between samples
>            if |metric='precomputed'|.
>
>
>     *AffinityPropagation:*
>        Constructor:
>            affinity: Which affinity to use. At the moment
>            |precomputed| and |euclidean| are supported. |euclidean| uses
>            the negative squared euclidean distance between points.
>        fit:
>            X: Data matrix or, if affinity is |precomputed|, matrix
>            of similarities / affinities.
>        fit_predict:
>            X: Input data.
>        Can X also be a matrix of similarities here? Shouldn't fit and
>        fit_predict share the same documentation for the input X?
>
>
>     *AgglomerativeClustering:*
>        Constructor:
>            affinity: Metric used to compute the linkage. Can be
>            “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or
>            ‘precomputed’. If linkage is “ward”, only “euclidean” is
>            accepted.
>        The name of the parameter 'affinity' seems misleading, since it
>        does not correspond to similarity functions, but to distance
>        functions.
>        fit:
>            X: The samples a.k.a. observations.
>        fit_predict:
>            X: Input data.
>        The documentation of fit and fit_predict does not specify that X
>        can also be a matrix of distances.
>
>     Users may be unsure whether they should provide a distance
>     or a similarity matrix to AgglomerativeClustering.
>     The documentation of fit and fit_predict can easily be updated.
>     Renaming the 'affinity' parameter is more difficult,
>     since it involves an API change.
>
>
>     What do you think of these potential updates to the documentation?
>
>     Cheers,
>
>     Anaël Beaugnon
>
>     _______________________________________________
>     scikit-learn mailing list
>     scikit-learn at python.org <mailto:scikit-learn at python.org>
>     https://mail.python.org/mailman/listinfo/scikit-learn
>     <https://mail.python.org/mailman/listinfo/scikit-learn>