[scikit-learn] Inconsistencies in clustering documentations
Andreas Mueller
t3kcit at gmail.com
Wed May 23 12:09:41 EDT 2018
+1 for a PR on fit_predict docs. This is probably due to the inheritance
structure.
Though it's weird that DBSCAN has the correct docs.
I'm not sure about renaming affinity, but we can discuss that. I agree
it's misleading.
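
To illustrate the inheritance point, here is a minimal sketch with toy classes. The names ClusterMixinSketch, InheritsDocs and OverridesDocs are hypothetical and this is not scikit-learn's actual source. It only shows how an estimator that does not override fit_predict ends up with the generic docstring of the shared mixin, while one that overrides it can document X properly.

```python
class ClusterMixinSketch:
    """Hypothetical stand-in for a shared clustering mixin."""

    def fit_predict(self, X):
        """Perform clustering on X and return cluster labels.

        X: Input data.
        """
        self.fit(X)
        return self.labels_


class InheritsDocs(ClusterMixinSketch):
    """Toy estimator that documents X in fit but inherits fit_predict."""

    def fit(self, X):
        """X: Data matrix or, if precomputed, matrix of similarities."""
        self.labels_ = [0] * len(X)  # placeholder labels, no real clustering
        return self


class OverridesDocs(ClusterMixinSketch):
    """Toy estimator that overrides fit_predict with its own docstring."""

    def fit(self, X):
        """X: A feature array, or array of distances if precomputed."""
        self.labels_ = [0] * len(X)  # placeholder labels, no real clustering
        return self

    def fit_predict(self, X):
        """X: A feature array, or array of distances if precomputed."""
        return super().fit_predict(X)


# The inherited method still carries the generic mixin docstring.
print(InheritsDocs.fit_predict.__doc__)
# The overriding method carries the estimator-specific docstring.
print(OverridesDocs.fit_predict.__doc__)
```

Under that assumption, a docs-only PR would mostly consist of overriding fit_predict (or its docstring) in the affected estimators.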
On 5/23/18 8:01 AM, Tom DLT wrote:
> Hi Anaël,
>
> Thanks for spotting these inconsistencies.
> You are very welcome to open pull-requests and/or issues on the GitHub
> tracker (cf.
> http://scikit-learn.org/stable/developers/contributing.html#contributing-code)
> The documentation issue should be straightforward.
> The parameter renaming would need a proper deprecation cycle (cf
> http://scikit-learn.org/stable/developers/contributing.html#deprecation).
>
> See you on GitHub,
>
> Tom
>
> 2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaugnon at ssi.gouv.fr>:
>
> Dear all,
>
> Three clustering algorithms can take distance or similarity
> matrices as input instead of the observations
> (AgglomerativeClustering
> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>,
> AffinityPropagation
> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>,
> and DBSCAN
> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>),
> but their documentation is inconsistent.
>
>
> *DBSCAN:*
> The documentation explains clearly how to run DBSCAN with a
> precomputed distance matrix.
> Constructor:
>     metric: If metric is “precomputed”, X is assumed to be a
>     distance matrix and must be square.
> fit / fit_predict:
>     X: A feature array, or array of distances between samples
>     if metric='precomputed'.
>
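
A minimal sketch of the DBSCAN usage described above, assuming a toy dataset of two well-separated groups. The values of eps and min_samples are arbitrary choices for this illustration. It passes a square distance matrix with metric="precomputed" and compares the result with clustering the raw features.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data (an assumption for this sketch): two tight groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Square matrix of pairwise Euclidean distances, shape (n_samples, n_samples).
D = pairwise_distances(X, metric="euclidean")

# With metric="precomputed", X is interpreted as a distance matrix.
labels_pre = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)

# Same parameters on the raw features, for comparison.
labels_raw = DBSCAN(eps=0.5, min_samples=2, metric="euclidean").fit_predict(X)

print(labels_pre, labels_raw)
```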
> *AffinityPropagation:*
> Constructor:
>     affinity: Which affinity to use. At the moment
>     precomputed and euclidean are supported. euclidean uses the
>     negative squared euclidean distance between points.
> fit:
>     X: Data matrix or, if affinity is precomputed, matrix
>     of similarities / affinities.
> fit_predict:
>     X: Input data.
> Can X also be a matrix of similarities here? Shouldn't fit and
> fit_predict share the same documentation for the input X?
>
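
A small sketch of the AffinityPropagation question above, assuming toy data and a recent scikit-learn release in which the constructor accepts random_state. It builds a similarity matrix as the negative squared Euclidean distance (the same quantity the quoted docs give for the euclidean affinity) and checks whether fit and fit_predict treat that matrix the same way.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

# Toy data (an assumption for this sketch): two tight groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Similarities: larger means more similar, so negate the squared distances.
S = -euclidean_distances(X, squared=True)

# fit on the precomputed similarity matrix.
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)

# fit_predict on the same matrix, with the same seed.
labels = AffinityPropagation(affinity="precomputed",
                             random_state=0).fit_predict(S)

# If both methods interpret X identically, the labels should match.
print(np.array_equal(ap.labels_, labels))
```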
> *AgglomerativeClustering:*
> Constructor:
>     affinity: Metric used to compute the linkage. Can be “euclidean”,
>     “l1”, “l2”, “manhattan”, “cosine”, or ‘precomputed’. If linkage is
>     “ward”, only “euclidean” is accepted.
> The name of the parameter 'affinity' seems misleading, since it
> does not correspond to similarity functions, but to distance
> functions.
> fit:
>     X: The samples a.k.a. observations.
> fit_predict:
>     X: Input data.
> The documentation of fit and fit_predict does not specify that X
> can also be a matrix of distances.
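
A hedged sketch of the AgglomerativeClustering case, assuming toy data and a non-ward linkage. The helper precomputed_agglomerative is hypothetical and exists only for this illustration. As far as I know, later scikit-learn releases renamed this parameter to metric, so the helper tries that keyword first and falls back to affinity. Either way, the matrix passed in holds distances, not similarities.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Toy data (an assumption for this sketch): two tight groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Distances (smaller means closer), not similarities.
D = pairwise_distances(X, metric="euclidean")


def precomputed_agglomerative(n_clusters, linkage="average"):
    """Hypothetical helper: use whichever keyword this version accepts."""
    try:
        # Keyword used by newer releases (assumption about the installed version).
        return AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage,
                                       metric="precomputed")
    except TypeError:
        # Keyword discussed in this thread.
        return AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage,
                                       affinity="precomputed")


# "ward" only accepts euclidean, so use average linkage with the matrix.
labels = precomputed_agglomerative(2).fit_predict(D)
print(labels)
```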
>
> Users may be unsure whether they should provide a distance
> or a similarity matrix to AgglomerativeClustering.
> The documentation of fit and fit_predict can easily be updated.
> Renaming the 'affinity' parameter is more difficult, since it
> involves an API change.
>
>
> What do you think of these potential updates to the documentation?
>
> Cheers,
>
> Anaël Beaugnon
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn