<div dir="ltr">Hello Pavel and Joel,<div><br></div><div>I forked the repository and cloned it on my machine. I'm using PyCharm on a Mac, and while looking at text.py I get an unresolved reference for "xrange" at line 28: </div><div><br></div><div><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:9pt"><span style="color:rgb(0,0,128);font-weight:bold">from </span>..externals.six.moves <span style="color:rgb(0,0,128);font-weight:bold">import </span>range</pre><pre><div style="color:rgb(34,34,34);font-family:arial,sans-serif;white-space:normal">PyCharm says "Function 'six.py' is too large to analyze", so I'm not sure whether this error is related to that. I decided to build the code as a sanity check, but I can't find any reliable instructions for doing so. Naively, I opened Terminal, changed to the directory above the "scikit-learn" folder (where I had cloned my fork), and tried to run:</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;white-space:normal"><br></div><div><p style="color:rgb(34,34,34);white-space:normal;font-family:Menlo;margin:0px;font-size:11px;line-height:normal"><span style="">$ python3 setup.py install</span></p><p style="color:rgb(34,34,34);white-space:normal;font-family:Menlo;margin:0px;font-size:11px;line-height:normal"><span style=""><br></span></p><p style="color:rgb(34,34,34);white-space:normal;font-family:Menlo;margin:0px;line-height:normal"><span style="font-family:arial,sans-serif">That didn't work.
I got this error:</span><span style="font-size:11px"><br></span></p><p style="color:rgb(34,34,34);white-space:normal;font-family:Menlo;margin:0px;line-height:normal"><span style="font-family:arial,sans-serif"><br></span></p><p style="color:rgb(34,34,34);white-space:normal;font-family:Menlo;margin:0px;font-size:11px;line-height:normal"><span style="">ImportError: No module named 'sklearn'</span></p><p style="color:rgb(34,34,34);white-space:normal;font-family:Menlo;margin:0px;font-size:11px;line-height:normal"><span style=""><br></span></p><p style="margin:0px;line-height:normal"><font face="arial, sans-serif"><span style="white-space:normal">Can someone point me in the right direction? And how can the code try to import sklearn if it doesn't exist yet? Note I haven't installed the release version of scikit-learn using pip or any other tool, but I should be able to bootstrap it from the source code, right? </span></font></p><p style="margin:0px;line-height:normal"><font face="arial, sans-serif"><span style="white-space:normal"><br></span></font></p><p style="margin:0px;line-height:normal"><font face="arial, sans-serif"><span style="white-space:normal">Here's the full error message if it helps. Forgive me if it's a silly mistake, but I haven't found any reliable guidelines online. </span></font></p><p style="margin:0px;line-height:normal"><font face="arial, sans-serif"><span style="white-space:normal"><br></span></font></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "setup.py", line 84, in &lt;module&gt;</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    from numpy.distutils.core import setup</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/distutils/core.py", line 26, in &lt;module&gt;</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    from numpy.distutils.command import config, config_compiler, \</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/distutils/command/build_ext.py", line 18, in &lt;module&gt;</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    from numpy.distutils.system_info import combine_paths</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/distutils/system_info.py", line 232, in &lt;module&gt;</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    triplet = str(p.communicate()[0].decode().strip())</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 791, in communicate</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    stdout = _eintr_retry_call(self.stdout.read)</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    return func(*args)</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">KeyboardInterrupt</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Basils-MacBook-Pro:sklearn basilbeirouti$ python3 setup.py install</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">non-existing path in '__check_build': '_check_build.c'</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.__check_build configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.__check_build')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn._build_utils configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn._build_utils')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.covariance configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.covariance')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.covariance/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.covariance/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.cross_decomposition configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.cross_decomposition')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.cross_decomposition/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.cross_decomposition/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.feature_selection configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.feature_selection')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.feature_selection/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.feature_selection/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.gaussian_process configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.gaussian_process')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.gaussian_process/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.gaussian_process/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.mixture configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.mixture')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.mixture/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.mixture/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.model_selection configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.model_selection')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.model_selection/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.model_selection/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.neural_network configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.neural_network')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.neural_network/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.neural_network/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.preprocessing configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.preprocessing')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.preprocessing/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.preprocessing/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.semi_supervised configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.semi_supervised')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Appending sklearn.semi_supervised/tests configuration to sklearn</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.semi_supervised/tests')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">Warning: Assuming default configuration (./_build_utils/{setup__build_utils,setup}.py was not found)
Warning: Assuming default configuration (./covariance/{setup_covariance,setup}.py was not found)
Warning: Assuming default configuration (./covariance/tests/setup_covariance/{setup_covariance/tests,setup}.py was not found)
Warning: Assuming default configuration (./cross_decomposition/{setup_cross_decomposition,setup}.py was not found)
Warning: Assuming default configuration (./cross_decomposition/tests/setup_cross_decomposition/{setup_cross_decomposition/tests,setup}.py was not found)
Warning: Assuming default configuration (./feature_selection/{setup_feature_selection,setup}.py was not found)
Warning: Assuming default configuration (./feature_selection/tests/setup_feature_selection/{setup_feature_selection/tests,setup}.py was not found)
Warning: Assuming default configuration (./gaussian_process/{setup_gaussian_process,setup}.py was not found)
Warning: Assuming default configuration (./gaussian_process/tests/setup_gaussian_process/{setup_gaussian_process/tests,setup}.py was not found)
Warning: Assuming default configuration (./mixture/{setup_mixture,setup}.py was not found)
Warning: Assuming default configuration (./mixture/tests/setup_mixture/{setup_mixture/tests,setup}.py was not found)
Warning: Assuming default configuration (./model_selection/{setup_model_selection,setup}.py was not found)
Warning: Assuming default configuration (./model_selection/tests/setup_model_selection/{setup_model_selection/tests,setup}.py was not found)
Warning: Assuming default configuration (./neural_network/{setup_neural_network,setup}.py was not found)
Warning: Assuming default configuration (./neural_network/tests/setup_neural_network/{setup_neural_network/tests,setup}.py was not found)
Warning: Assuming default configuration (./preprocessing/{setup_preprocessing,setup}.py was not found)
Warning: Assuming default configuration (./preprocessing/tests/setup_preprocessing/{setup_preprocessing/tests,setup}.py was not found)
Warning: Assuming default configuration (./semi_supervised/{setup_semi_supervised,setup}.py was not found)
Warning: Assuming default configuration (./semi_supervised/tests/setup_semi_supervised/{setup_semi_supervised/tests,setup}.py was not found)
Traceback (most recent call last):</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "setup.py", line 85, in &lt;module&gt;</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    setup(**configuration(top_path='').todict())</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "setup.py", line 44, in configuration</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    config.add_subpackage('cluster')</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/distutils/misc_util.py", line 1003, in add_subpackage</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    caller_level = 2)</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/distutils/misc_util.py", line 972, in get_subpackage</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    caller_level = caller_level + 1)</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/distutils/misc_util.py", line 884, in _get_configuration_from_setup_py</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    ('.py', 'U', 1))</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/imp.py", line 234, in load_module</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    return load_source(name, filename, file)</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/imp.py", line 172, in load_source</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    module = _load(spec)</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "&lt;frozen importlib._bootstrap&gt;", line 693, in _load</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "&lt;frozen importlib._bootstrap&gt;", line 673, in _load_unlocked</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "&lt;frozen importlib._bootstrap_external&gt;", line 662, in exec_module</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "&lt;frozen importlib._bootstrap&gt;", line 222, in _call_with_frames_removed</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">  File "./cluster/setup.py", line 8, in &lt;module&gt;</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">    from sklearn._build_utils import get_blas_info</span></p><p style="margin:0px;line-height:normal">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="">ImportError: No module named 'sklearn'</span></p></div></pre><div><div class="gmail_extra"><div class="gmail_quote">On Tue, Jun 14, 2016 at 11:41 AM,  <span dir="ltr"><<a href="mailto:scikit-learn-request@python.org" target="_blank">scikit-learn-request@python.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Send scikit-learn mailing list submissions to<br>
        <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
        <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
or, via email, send a message with subject or body 'help' to<br>
        <a href="mailto:scikit-learn-request@python.org">scikit-learn-request@python.org</a><br>
<br>
You can reach the person managing the list at<br>
        <a href="mailto:scikit-learn-owner@python.org">scikit-learn-owner@python.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of scikit-learn digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
   1. Re: Adding BM25 relevance function (Pavel Soriano)<br>
   2. Re: The culture of commit squashing (Andreas Mueller)<br>
   3. Re: The culture of commit squashing (Tom DLT)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Tue, 14 Jun 2016 16:11:10 +0000<br>
From: Pavel Soriano <<a href="mailto:sorianopavel@gmail.com">sorianopavel@gmail.com</a>><br>
To: Scikit-learn user and developer mailing list<br>
        <<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] Adding BM25 relevance function<br>
Message-ID:<br>
        <<a href="mailto:CAN0wWk93r2aw9No65CGiCW5hQG7-oFYVZaMJQpXpegTXMSqPLg@mail.gmail.com">CAN0wWk93r2aw9No65CGiCW5hQG7-oFYVZaMJQpXpegTXMSqPLg@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Hey,<br>
<br>
Good thing that you are trying to finish this.<br>
<br>
Well, I looked into my old notes, and the Delta tf-idf comes from the "Delta<br>
TFIDF: An Improved Feature Space for Sentiment Analysis"<br>
<<a href="http://ebiquity.umbc.edu/_file_directory_/papers/446.pdf" rel="noreferrer" target="_blank">http://ebiquity.umbc.edu/_file_directory_/papers/446.pdf</a>> paper. I guess<br>
it is not very popular and apparently it has a drawback: it does not take<br>
into account the number of times a word occurs in each document while<br>
calculating the distribution amongst classes. At least that is what I wrote<br>
in my notes...<br>
<br>
As for the delta idf... If it helps, I can look into my old code, because I do<br>
not know what I was talking about. I guess it is somehow related to the<br>
paper cited before.<br>
<br>
Cheers,<br>
<br>
Pavel Soriano<br>
<br>
<br>
<br>
<br>
On Tue, Jun 14, 2016 at 5:49 PM Basil Beirouti <<a href="mailto:basilbeirouti@gmail.com">basilbeirouti@gmail.com</a>><br>
wrote:<br>
<br>
> Hi Joel,<br>
><br>
> Thanks for your response and for digging up that archived thread, it gives<br>
> me a lot of clarity.<br>
><br>
> I see your point about BM25, but I think in most cases where TFIDF makes<br>
> sense, BM25 makes sense as well, although it could be "overkill".<br>
><br>
> Consider that TFIDF does not produce normalized results either<br>
> <<a href="http://scikit-learn.org/stable/auto_examples/text/document_clustering.html#example-text-document-clustering-py" rel="noreferrer" target="_blank">http://scikit-learn.org/stable/auto_examples/text/document_clustering.html#example-text-document-clustering-py</a>>.<br>
> If BM25 requires dimensionality reduction (e.g. using LSA), so too would<br>
> TFIDF. The term-document matrix is the same size no matter which weighting<br>
> scheme is used. The only difference is that BM25 produces better results<br>
> when the corpus is large enough that the term frequency in a document, and<br>
> the document frequency in the corpus, can vary considerably across a broad<br>
> range of values. Maybe you could even say TFIDF and BM25 are the same<br>
> equation except BM25 has a few additional hyperparameters (b and k).<br>
><br>
> So is the advantage that BM25 provides for large, diverse corpora worth it,<br>
> or is it marginal? Perhaps you can point me to some more examples where<br>
> TFIDF is used (preferably in a supervised setting) and I can plug in BM25 in<br>
> place of TFIDF and see how it compares. Here are some I found:<br>
><br>
><br>
> <a href="http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html" rel="noreferrer" target="_blank">http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html</a><br>
> *(supervised)*<br>
><br>
> <a href="http://scikit-learn.org/stable/auto_examples/text/document_clustering.html#example-text-document-clustering-py" rel="noreferrer" target="_blank">http://scikit-learn.org/stable/auto_examples/text/document_clustering.html#example-text-document-clustering-py</a><br>
> *(unsupervised)*<br>
><br>
> Thank you!<br>
> Basil<br>
><br>
> PS: By the way, I'm not familiar with the delta-idf transform that Pavel<br>
> mentions in the archive you linked, I'll have to delve deeper into that. I<br>
> agree with the response to Pavel that he should be putting it in a separate<br>
> class, not adding on to the TFIDF. I think it would take me about 6-8 weeks<br>
> to adapt my code to the fit transform model and submit a pull request.<br>
><br>
><br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> scikit-learn mailing list<br>
> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>
> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
><br>
--<br>
Pavel SORIANO<br>
<br>
PhD Student<br>
ERIC Laboratory<br>
Université de Lyon<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Tue, 14 Jun 2016 12:13:29 -0400<br>
From: Andreas Mueller <<a href="mailto:t3kcit@gmail.com">t3kcit@gmail.com</a>><br>
To: Scikit-learn user and developer mailing list<br>
        <<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] The culture of commit squashing<br>
Message-ID: <<a href="mailto:57602D29.1070203@gmail.com">57602D29.1070203@gmail.com</a>><br>
Content-Type: text/plain; charset="windows-1252"; Format="flowed"<br>
<br>
I'm +1 for using the button when appropriate.<br>
I think it should be up to the merging person to make a call whether a<br>
squash is a better<br>
logical unit than all the commits.<br>
I would set like a soft limit at ~5 commits or something. If your PR has<br>
more than 5 separate<br>
big logical units, it's probably too big.<br>
<br>
The button is enabled in the settings but I can't see it.<br>
Am I being stupid?<br>
<br>
On 06/14/2016 06:58 AM, Joel Nothman wrote:<br>
> Sounds good to me. Thank goodness someone reads the documentation!<br>
><br>
> On 14 June 2016 at 19:51, Alexandre Gramfort<br>
> <<a href="mailto:alexandre.gramfort@telecom-paristech.fr">alexandre.gramfort@telecom-paristech.fr</a>> wrote:<br>
><br>
>     > We could stop squashing during development, and use the new Squash-and-Merge<br>
>     > button on GitHub.<br>
>     > What do you think?<br>
><br>
>     +1<br>
><br>
>     the reason I see for squashing during dev is to avoid killing the<br>
>     browser when reviewing. It really rarely happens though.<br>
><br>
>     A<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Tue, 14 Jun 2016 18:40:39 +0200<br>
From: Tom DLT <<a href="mailto:tom.duprelatour@orange.fr">tom.duprelatour@orange.fr</a>><br>
To: Scikit-learn user and developer mailing list<br>
        <<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] The culture of commit squashing<br>
Message-ID:<br>
        <CAGKmC=sRMbwo1Pjm=<a href="mailto:ph3R6OqsmvZUZDBMjvj09yJwkk0%2BYq4EA@mail.gmail.com">ph3R6OqsmvZUZDBMjvj09yJwkk0+Yq4EA@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
@Andreas<br>
It's a bit hidden: You need to click on "Merge pull-request", then do *not*<br>
click on "Confirm merge", but on the small arrow to the right, and select<br>
"Squash and merge".<br>
<br>
2016-06-14 18:13 GMT+02:00 Andreas Mueller <<a href="mailto:t3kcit@gmail.com">t3kcit@gmail.com</a>>:<br>
<br>
> I'm +1 for using the button when appropriate.<br>
> I think it should be up to the merging person to make a call whether a<br>
> squash is a better<br>
> logical unit than all the commits.<br>
> I would set like a soft limit at ~5 commits or something. If your PR has<br>
> more than 5 separate<br>
> big logical units, it's probably too big.<br>
><br>
> The button is enabled in the settings but I can't see it.<br>
> Am I being stupid?<br>
><br>
><br>
> On 06/14/2016 06:58 AM, Joel Nothman wrote:<br>
><br>
> Sounds good to me. Thank goodness someone reads the documentation!<br>
><br>
> On 14 June 2016 at 19:51, Alexandre Gramfort <<br>
> <a href="mailto:alexandre.gramfort@telecom-paristech.fr">alexandre.gramfort@telecom-paristech.fr</a>> wrote:<br>
><br>
>> > We could stop squashing during development, and use the new<br>
>> Squash-and-Merge<br>
>> > button on GitHub.<br>
>> > What do you think?<br>
>><br>
>> +1<br>
>><br>
>> the reason I see for squashing during dev is to avoid killing the<br>
>> browser when reviewing. It really rarely happens though.<br>
>><br>
>> A<br>
<br>
------------------------------<br>
<br>
Subject: Digest Footer<br>
<br>
_______________________________________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
<br>
<br>
------------------------------<br>
<br>
End of scikit-learn Digest, Vol 3, Issue 27<br>
*******************************************<br>
</blockquote></div><br></div></div></div></div>
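<div>To make the TFIDF versus BM25 comparison in the quoted thread a little more concrete, here is a minimal sketch. It is not taken from scikit-learn or from my own code: the function names are hypothetical, the smoothed IDF is an assumed variant that I share between both schemes so that only the term-frequency part differs, and the defaults k=1.2 and b=0.75 are just commonly quoted values.</div>

```python
import math


def idf(n_docs, doc_freq):
    """Smoothed inverse document frequency (an assumed variant, shared by both schemes)."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


def tfidf_weight(tf, n_docs, doc_freq):
    """Plain TF-IDF: the weight grows linearly with the raw term frequency."""
    return tf * idf(n_docs, doc_freq)


def bm25_weight(tf, n_docs, doc_freq, doc_len, avg_doc_len, k=1.2, b=0.75):
    """BM25-style weight: term frequency saturates (k) and is length-normalized (b)."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf(n_docs, doc_freq) * tf * (k + 1) / (tf + k * length_norm)


if __name__ == "__main__":
    # Hypothetical corpus: 1000 documents, term appears in 50 of them,
    # document of length 120 against an average length of 100.
    for tf in (1, 5, 25):
        print(tf, tfidf_weight(tf, 1000, 50), bm25_weight(tf, 1000, 50, 120, 100))
```

<div>In this sketch, setting b=0 and making k very large brings the BM25 weight arbitrarily close to the plain TFIDF weight, which is one way to read the remark that the two are the same equation apart from the extra hyperparameters.</div>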