[scikit-learn] Inconsistencies in clustering documentations
Beaugnon Anael
anael.beaugnon at ssi.gouv.fr
Wed May 23 12:53:44 EDT 2018
Thanks for your answers.
DBSCAN has the correct doc because the fit_predict method is not
inherited, but it has its own implementation (because of the additional
parameter sample_weight).
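
To illustrate the point about inheritance, here is a minimal sketch. The classes are toy stand-ins with hypothetical names (ToyClusterMixin, InheritingEstimator, OverridingEstimator), not scikit-learn code. They only show that an inherited fit_predict keeps the mixin's generic docstring, while an estimator that overrides the method carries its own.

```python
class ToyClusterMixin:
    """Toy stand-in for a shared clustering mixin (hypothetical name)."""

    def fit_predict(self, X, y=None):
        """X: Input data."""
        self.fit(X)
        return self.labels_


class InheritingEstimator(ToyClusterMixin):
    """Defines fit only, so fit_predict (and its docstring) is inherited."""

    def fit(self, X, y=None):
        """X: Data matrix or, if affinity is precomputed, matrix of similarities."""
        self.labels_ = [0 for _ in X]  # dummy labels, no real clustering
        return self


class OverridingEstimator(ToyClusterMixin):
    """Overrides fit_predict to accept sample_weight, so it has its own docstring."""

    def fit(self, X, y=None, sample_weight=None):
        """X: A feature array, or array of distances if metric='precomputed'."""
        self.labels_ = [0 for _ in X]  # dummy labels, no real clustering
        return self

    def fit_predict(self, X, y=None, sample_weight=None):
        """X: A feature array, or array of distances if metric='precomputed'."""
        self.fit(X, sample_weight=sample_weight)
        return self.labels_


# The inherited method shows the generic text, the overridden one its own text.
print(InheritingEstimator.fit_predict.__doc__)
print(OverridingEstimator.fit_predict.__doc__)
```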
I have forked the sklearn repo. I work in a virtualenv (virtualenv venv3
--no-site-packages --python python3.5).
*python3 setup.py install* completes, but *make test-code* and *make
doc-noplot* fail.
Do you have any idea where these errors come from?
I intend to work on the Python 3 version. When I run make
test-code, I am surprised to see references to /usr/lib/python2.7/.
Thanks for your help,
Anaël Beaugnon
*make doc-noplot*
Exception occurred:
  File "/usr/lib/python3.5/zipfile.py", line 1435, in write
    st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/<dir>/scikit-learn/doc/auto_examples/plot_digits_pipe.ipynb'
The full traceback has been saved in /tmp/sphinx-err-ivjeif0v.log, if
you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error
message can be provided next time.
A bug report can be filed in the tracker at
<https://github.com/sphinx-doc/sphinx/issues>. Thanks!
File /tmp/sphinx-err-ivjeif0v.log
# Sphinx version: 1.7.4
# Python version: 3.5.3 (CPython)
# Docutils version: 0.14
# Jinja2 version: 2.10
# Last messages:
# Loaded extensions:
Traceback (most recent call last):
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/cmdline.py", line 303, in main
    args.warningiserror, args.tags, args.verbosity, args.jobs)
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/application.py", line 233, in __init__
    self._init_builder()
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/application.py", line 311, in _init_builder
    self.emit('builder-inited')
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/application.py", line 444, in emit
    return self.events.emit(event, self, *args)
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/events.py", line 79, in emit
    results.append(callback(*args))
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx_gallery/gen_gallery.py", line 247, in generate_gallery_rst
    download_fhindex = generate_zipfiles(gallery_dir)
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx_gallery/downloads.py", line 115, in generate_zipfiles
    jy_zipfile = python_zip(listdir, gallery_dir, ".ipynb")
  File "/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx_gallery/downloads.py", line 69, in python_zip
    zipf.write(file_src, os.path.relpath(file_src, gallery_path))
  File "/usr/lib/python3.5/zipfile.py", line 1435, in write
    st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/<dir>/scikit-learn/doc/auto_examples/plot_digits_pipe.ipynb'
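
The last frames of this traceback show zipfile's write() calling os.stat() on a notebook file that does not exist. The failure mode can be reproduced in isolation with the sketch below. It is not sphinx-gallery's code: the helper names find_missing and try_zip and the file names are made up for illustration, and it does not explain why the .ipynb file was never generated.

```python
import os
import tempfile
import zipfile


def find_missing(paths):
    """Return the paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


def try_zip(paths, archive_path):
    """Zip the given paths; return the name of the exception raised, or None."""
    try:
        with zipfile.ZipFile(archive_path, "w") as zipf:
            for path in paths:
                # ZipFile.write() stats the source file, so a missing file fails here.
                zipf.write(path, os.path.basename(path))
    except FileNotFoundError as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "plot_present.ipynb")
    absent = os.path.join(tmp, "plot_absent.ipynb")  # deliberately never created
    with open(present, "w") as handle:
        handle.write("{}")

    missing = find_missing([present, absent])
    outcome = try_zip([present, absent], os.path.join(tmp, "notebooks.zip"))
    print("missing:", [os.path.basename(p) for p in missing])
    print("zip outcome:", outcome)
```

Listing the missing files before zipping, as find_missing does, is one way to see which generated notebooks are absent from doc/auto_examples.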
*make test-code*
=============================== ERRORS ===============================
__________________________ ERROR collecting __________________________
/usr/lib/python2.7/dist-packages/py/_path/common.py:366: in visit
    for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
/usr/lib/python2.7/dist-packages/py/_path/common.py:405: in gen
    if p.check(dir=1) and (rec is None or rec(p))])
/usr/lib/python2.7/dist-packages/_pytest/main.py:682: in _recurse
    ihook = self.gethookproxy(path)
/usr/lib/python2.7/dist-packages/_pytest/main.py:587: in gethookproxy
    my_conftestmodules = pm._getconftestmodules(fspath)
/usr/lib/python2.7/dist-packages/_pytest/config.py:339: in _getconftestmodules
    mod = self._importconftest(conftestpath)
/usr/lib/python2.7/dist-packages/_pytest/config.py:364: in _importconftest
    raise ConftestImportFailure(conftestpath, sys.exc_info())
E ConftestImportFailure: ImportError('No module named
_check_build\n___________________________________________________________________________\nContents
of /<dir>/scikit-learn/sklearn/__check_build:\n__pycache__
setup.py __init__.pyc\n_check_build.pyx
_check_build.cpython-35m-x86_64-linux-gnu.so_check_build.c\n__init__.py\n___________________________________________________________________________\nIt
seems that scikit-learn has not been built correctly.\n\nIf you have
installed scikit-learn from source, please do not forget\nto build the
package before using it: run `python setup.py install` or\n`make` in the
source directory.\n\nIf you have used an installer, please check that it
is suited for your\nPython version, your operating system and your
platform.',)
E   File "/<dir>/scikit-learn/sklearn/__init__.py", line 63, in <module>
E     from . import __check_build
E   File "/<dir>/scikit-learn/sklearn/__check_build/__init__.py", line 46, in <module>
E     raise_build_error(e)
E   File "/<dir>/scikit-learn/sklearn/__check_build/__init__.py", line 41, in raise_build_error
E     %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
====================== 1 error in 0.27 seconds ======================
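
One possible reading of the output above, offered as an assumption and not as a confirmed diagnosis: the pytest that ran comes from /usr/lib/python2.7/dist-packages, while the only compiled extension in sklearn/__check_build is a cpython-35m build, so a Python 2.7 interpreter would not be able to import it. The sketch below prints which interpreter is running and which pytest executable is first on PATH. The function names in_virtualenv and describe_environment are made up for illustration.

```python
import shutil
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_environment():
    """Collect the interpreter path, its version, and the pytest found on PATH."""
    return {
        "interpreter": sys.executable,
        "python_version": "%d.%d" % sys.version_info[:2],
        "pytest_on_path": shutil.which("pytest"),  # None if no pytest is on PATH
        "in_virtualenv": in_virtualenv(),
    }


for key, value in sorted(describe_environment().items()):
    print("%s: %s" % (key, value))
```

If pytest_on_path points outside the virtualenv, the tests may be collected by a different Python than the one used for the build.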
On 23/05/2018 at 18:09, Andreas Mueller wrote:
>
> +1 for a PR on fit_predict docs. This is probably due to the
> inheritance structure.
> Though it's weird that DBSCAN has the correct docs.
>
> I'm not sure about renaming affinity, but we can discuss that. I agree
> it's misleading.
>
>
> On 5/23/18 8:01 AM, Tom DLT wrote:
>> Hi Anaël,
>>
>> Thanks for spotting these inconsistencies.
>> You are very welcome to open pull-requests and/or issues on the
>> GitHub tracker
>> (cf. http://scikit-learn.org/stable/developers/contributing.html#contributing-code)
>> The documentation issue should be straightforward.
>> The parameter renaming would need a proper deprecation cycle (cf
>> http://scikit-learn.org/stable/developers/contributing.html#deprecation).
>>
>> See you on GitHub,
>>
>> Tom
>>
>> 2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaugnon at ssi.gouv.fr
>> <mailto:anael.beaugnon at ssi.gouv.fr>>:
>>
>> Dear all,
>>
>> Three clustering algorithms can take as input distance or
>> similarity matrices instead of the observations
>> (AgglomerativeClustering
>> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>,
>> AffinityPropagation
>> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>,
>> and DBSCAN
>> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>),
>> but there are inconsistencies in their documentation.
>>
>>
>> *DBSCAN:*
>> The documentation explains clearly how to run DBSCAN with a
>> precomputed distance matrix.
>> Constructor:
>>     metric: If metric is “precomputed”, X is assumed to be a
>>     distance matrix and must be square.
>> fit / fit_predict:
>>     X: A feature array, or array of distances between samples
>>     if metric='precomputed'.
>>
>> *AffinityPropagation:*
>> Constructor:
>>     affinity: Which affinity to use. At the moment precomputed and
>>     euclidean are supported. euclidean uses the negative squared
>>     euclidean distance between points.
>> fit:
>>     X: Data matrix or, if affinity is precomputed, matrix of
>>     similarities / affinities.
>> fit_predict:
>>     X: Input data.
>> Can X also be a matrix of similarities? Should fit and fit_predict
>> share the same documentation for the input X?
>>
>> *AgglomerativeClustering:*
>> Constructor:
>>     affinity: Metric used to compute the linkage. Can be
>>     “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or ‘precomputed’.
>>     If linkage is “ward”, only “euclidean” is accepted.
>> The name of the parameter 'affinity' seems misleading,
>> since it does not correspond to similarity functions, but to
>> distance functions.
>> fit:
>>     X: The samples a.k.a. observations.
>> fit_predict:
>>     X: Input data.
>> The documentation of fit and fit_predict does not
>> specify that X can also be a matrix of distances.
>>
>> Users may be unsure whether they should provide a distance
>> or a similarity matrix to AgglomerativeClustering.
>> The documentation of fit and fit_predict can be easily updated.
>> Renaming the 'affinity' parameter is more difficult,
>> since it involves an API change.
>>
>>
>> What do you think of these potential updates to the documentation?
>>
>> Cheers,
>>
>> Anaël Beaugnon
>>
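
The distance/similarity distinction raised in the original message can be made concrete with a small sketch. It uses only the statement quoted above that the 'euclidean' affinity of AffinityPropagation is the negative squared Euclidean distance. The helper names euclidean_distance_matrix and negative_squared_similarities are hypothetical and are not scikit-learn functions.

```python
import math


def euclidean_distance_matrix(points):
    """Pairwise Euclidean distances: small values mean 'close'."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def negative_squared_similarities(distances):
    """Turn distances into similarities: large values mean 'close'."""
    return [[-(d ** 2) for d in row] for row in distances]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
D = euclidean_distance_matrix(points)
S = negative_squared_similarities(D)

# Points 0 and 1 are neighbours; point 2 is far from both.
print("distance   0-1:", D[0][1], " 0-2:", D[0][2])
print("similarity 0-1:", S[0][1], " 0-2:", S[0][2])
```

The ordering flips between the two matrices: the closest pair has the smallest distance but the largest similarity. This is why it matters that the documentation of each estimator states which kind of precomputed matrix X must be.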